This is an archive of the discontinued LLVM Phabricator instance.

Differential D104271

llvm-dwarfdump: Print warnings on invalid DWARF
ClosedPublic

Authored by jankratochvil on Jun 14 2021, 3:03 PM.

Details

Reviewers

dblaikie
jhenderson

Commits

rGc19a28919fc9: llvm-dwarfdump: Print warnings on invalid DWARF

Summary

llvm-dwarfdump was silent even when the DWARF was malformed or contained constructs that llvm-dwarfdump does not understand or support. This can be confusing, since llvm-dwarfdump is a tool for developing DWARF producers and consumers.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jankratochvil created this revision. Jun 14 2021, 3:03 PM

Herald added a reviewer: jhenderson. Jun 14 2021, 3:03 PM

Herald added subscribers: cmtice, hiraditya.

jankratochvil requested review of this revision. Jun 14 2021, 3:03 PM

Herald added a subscriber: MaskRay. Jun 14 2021, 3:03 PM

Harbormaster completed remote builds in B109203: Diff 352004. Jun 14 2021, 3:42 PM

dblaikie added inline comments. Jun 14 2021, 9:05 PM

llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37–39	Could probably include the bounds in the message? Maybe something like: "DWARF compile unit extends beyond its bounds [x, y) to z"?
52–55	Maybe include some details about the offset to the debug_abbrev contribution, and the range of valid abbreviation values? (I forget if the abbrev table is necessarily contiguous - if it isn't, then maybe that's too complicated)
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
387–406	What was the motivation to move this code and add the Depth check in? I guess this didn't actually work/was untested, maybe? Could you explain why it didn't work, etc?

jhenderson added inline comments. Jun 15 2021, 12:00 AM

llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
1	The test deserves an introductory comment describing what it is testing. Additionally, the name is too generic and possibly slightly misleading: "format-warnings.s" suggests that the main aim of the test is to test the formatting of warning messages in general, whereas this test is more about invalid debug abbrev/debug info, so perhaps it could just be debug-entry-invalid.s or something like that. I have a personal preference to not use stdin to drive llvm-dwarfdump, and instead create an object file on disk with llvm-mc. This makes it easier to debug the test should something go wrong, since you can inspect the object without needing to change the test code.
3	It would probably be a good idea if you used a non-zero value for the cu index. That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number). Same goes for the remaining messages. Here and below, you can drop the "CHECK-" bit of the check prefixes, to make them more concise.
21	Indentation here is a little inconsistent.
31	Here and elsewhere, consider lining up your comments vertically. They're a bit all over the place currently.

dblaikie added inline comments. Jun 15 2021, 11:08 AM

llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
1	There is somewhat of a convention to prefer piping, because it means the input file name (whatever it happens to be on the buildbot/local filesystem) does not appear in the output - this reduces the chance that FileCheck commands might accidentally match on some part of the input file name - making the test more hermetic/reliable.

To satisfy all the required detailed reporting, I had to extend the patch far more than I originally intended. Is it OK this way?

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
387–406	`DIE.extractFast` simply returned on both errors and the successful end of DIEs. As it did not report any errors, the return code (`false`) was the same in both cases, so this one specific error was handled outside. But now that we do all the detailed error reporting, we need to do it from inside `DIE.extractFast`, as that has all the info available. Therefore this error has moved inside `DIE.extractFast` (`if (Offset >= UEndOffset)` there).
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
3	> It would probably be a good idea if you used a non-zero value for the cu index.
Done.
> That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).
That would not show anything more, as on 64-bit platforms a variadic function extends all parameters to (at least) 64 bits. Therefore even a 32-bit format will still read a 64-bit variadic parameter.
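As a rough illustration of the `DIE.extractFast` change described in this comment, the sketch below uses a hypothetical, much simplified `extractEntry` (it is not the LLVM function). It returns `false` both at the normal end of the entries and on invalid data, but only the invalid case calls a warning handler, so the caller does not need a separate check. The `WarningHandler` alias, the byte-vector input, and the message text are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the warning handler obtained from the context.
using WarningHandler = std::function<void(const std::string &)>;

// Hypothetical, simplified entry extractor (not the LLVM function).
// Returns true when an entry was read. Returns false both at the normal end
// of the entries (a zero code) and on invalid data; only the invalid case
// reports a warning, because only this function knows the details.
bool extractEntry(const std::vector<uint8_t> &Data, uint64_t &Offset,
                  uint64_t UnitEnd, const WarningHandler &Warn) {
  assert(Warn && "a warning handler is required");
  if (Offset >= UnitEnd || Offset >= Data.size()) {
    Warn("unit extends beyond its bounds at offset " + std::to_string(Offset));
    return false;
  }
  const uint8_t Code = Data[Offset++];
  return Code != 0; // Zero marks the normal end of the entries: no warning.
}
```

With this shape the caller can treat every `false` the same way, since anything worth reporting has already been reported.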

jankratochvil updated this revision to Diff 352845. Jun 17 2021, 2:25 PM

jankratochvil marked 2 inline comments as done.

jankratochvil added inline comments.

llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
103 ↗	(On Diff #352845)	This function looks to me like too much code for too little benefit, but since @dblaikie has requested it, why not.

jankratochvil updated this revision to Diff 352847. Jun 17 2021, 2:30 PM

Not had time to review the test cases yet, but will do that next week.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
372–375 ↗	(On Diff #352847)	I'd rename this function to `getSupportedAddressSizes()`, which reads slightly better. Also, it can be simplified, as shown inline. String literals don't have lifetime issues, so there's no need for the static local variable.
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
74 ↗	(On Diff #352847)	Lambdas follow variable naming style, so `flush` -> `Flush`.
103 ↗	(On Diff #352845)	I'm not entirely convinced we need to print all valid abbrev values, even in ranged form, since typically, the abbrevs will go from 1-max in a contiguous set. That being said, I'm not opposed to it. I think the code could be simplified anyway. Something like the following is more readable to me. It also handles adjacent codes being non-contiguous within the set of abbrevs:

    std::string DWARFAbbreviationDeclarationSet::getCodeRange() const {
      // Create a sorted list of all abbrev codes.
      std::vector<uint32_t> Codes;
      Codes.reserve(Decls.size());
      std::transform(Decls.begin(), Decls.end(), std::back_inserter(Codes),
                     [](const DWARFAbbreviationDeclaration &Decl) {
                       return Decl.getCode();
                     });
      std::sort(Codes.begin(), Codes.end());

      std::string Buffer = "";
      raw_string_ostream Stream(Buffer);
      // Each iteration through this loop represents a single contiguous range
      // in the set of codes.
      for (auto Current = Codes.begin(), End = Codes.end(); Current != End;) {
        uint32_t RangeStart = *Current;
        // Add the current range start.
        Stream << *Current;
        uint32_t RangeEnd = RangeStart;
        // Find the end of the current range.
        while (++Current != End && *Current == RangeEnd + 1)
          ++RangeEnd;
        // If there is more than one value in the range, add the range end too.
        if (RangeStart != RangeEnd)
          Stream << "-" << RangeEnd;
        // If there is at least one more range, add a separator.
        if (Current != End)
          Stream << ", ";
      }
      return Buffer;
    }
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37	It's not going to be clear to the end user that these two values represent offsets. I'd be more explicit: "DWARF unit from offset x to offset y ..." Same applies below.
56	It's not clear to me (when reading the message without the code context) what these last two offsets represent. I suspect that the last offset is actually unnecessary, since the OffsetPtr location for this case is going to be fixed within the unit header, if I'm not mistaken.
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
364	I think this is a little clearer.
llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s
1 ↗	(On Diff #352847)	Nit: I'm trying to encourage new tests to use '##' for comments, to help distinguish them from lit and FileCheck directives.
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
3	Right, but not all supported platforms are 64-bit. I've actually seen bugs precisely because of this sort of issue in similar code.

Harbormaster completed remote builds in B109805: Diff 352847. Jun 18 2021, 6:27 AM

dblaikie added inline comments. Jun 18 2021, 11:49 AM

llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
103 ↗	(On Diff #352845)	Yep, all this complexity would fall under the clause I mentioned in the original feedback: " (I forget if the abbrev table is necessarily contiguous - if it isn't, then maybe that's too complicated)" - so the table isn't contiguous, and maybe this is too complicated to be worth it? I really don't mind either way, at this point. @jhenderson's version seems nice, if we're going to do this.

jankratochvil marked 9 inline comments as done. Jun 20 2021, 2:09 PM

jankratochvil added inline comments.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
372–375 ↗	(On Diff #352847)	> String literals don't have lifetime issues, so there's no need for the static local variable.
True, I am stupid.
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
103 ↗	(On Diff #352845)	TBH I intentionally did not sort the Abbrevs. This debugging output is for DWARF developers, not for end users. Sorting disconnects the message too much from what is really written in the DWARF. Moreover, the Abbrevs are usually sorted anyway. But I have accepted your version. The look-ahead instead of a lambda flusher is an interesting idea.
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37	I haven't changed this yet. I disagree with "from offset x to offset y" as that would need to be rather "from offset x incl. to offset y excl.", which already looks too talkative to me, primarily as this message is for DWARF developers, not for end users. And then we should change an already existing error message: "while reading [0x%x, 0x%x)"
56	> I suspect that the last offset is actually unnecessary, since the OffsetPtr location for this case is going to be fixed within the unit header, if I'm not mistaken.
I agree, thanks.
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
3	It is true 32-bit buildbots would catch it.
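The sorting question discussed above for `getCodeRange()` can be tried out with a small standalone sketch. It uses the same look-ahead loop as the suggested version, but with `std::ostringstream` instead of `raw_string_ostream`. The function name `formatCodeRanges` and the `Sort` flag are hypothetical and exist only to compare the two behaviours; this is not the code that was committed.

```cpp
#include <algorithm>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Formats abbreviation codes as comma-separated ranges such as "1-3, 7".
// With Sort == false the codes are reported in the order they were given,
// so only runs that are already adjacent and ascending are merged.
std::string formatCodeRanges(std::vector<uint32_t> Codes, bool Sort) {
  if (Sort)
    std::sort(Codes.begin(), Codes.end());
  std::ostringstream Stream;
  // Each iteration of this loop emits one contiguous range.
  for (auto Current = Codes.begin(), End = Codes.end(); Current != End;) {
    const uint32_t RangeStart = *Current;
    Stream << RangeStart;
    uint32_t RangeEnd = RangeStart;
    // Look ahead while the next code continues the current range.
    while (++Current != End && *Current == RangeEnd + 1)
      ++RangeEnd;
    if (RangeStart != RangeEnd)
      Stream << "-" << RangeEnd;
    if (Current != End)
      Stream << ", ";
  }
  return Stream.str();
}
```

For the codes 3, 1, 2, 7 the sorted form gives `1-3, 7`, while the unsorted form gives `3, 1-2, 7`, which stays closer to the order in the abbreviation table.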

jankratochvil updated this revision to Diff 353241. Jun 20 2021, 2:10 PM

jankratochvil marked 3 inline comments as done.

Herald added a subscriber: mgrang. Jun 20 2021, 2:10 PM

Harbormaster completed remote builds in B110100: Diff 353241. Jun 20 2021, 2:55 PM

jhenderson added inline comments. Jun 21 2021, 12:45 AM

llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
103 ↗	(On Diff #352845)	I don't have a strong opinion about whether they should be sorted or not, and if you would prefer dropping the sorting, that's fine (I think the rest of the code will just work, but am not 100% certain without spending more time than I care to thinking about it). I'm also happy if you'd prefer to drop the entire thing.
81 ↗	(On Diff #353241)
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37	With at least offsets, I would never read "from offset X to offset Y" as inclusive at both ends, because of the nature of what an offset represents. But maybe it is a real issue. You could achieve the same meaning, without the ambiguity risk by saying "with length X at offset Y" instead, for example. The length is actually the thing that's encoded in the DWARF after all. You could make it slightly less talkative like this: "DWARF unit (offset 0x1234, length 0x4321) tries to ...". I don't think the two error messages are quite equivalent. In the DataExtractor one, the message talks about reading the range, and therefore it's somewhat clearer that you're dealing with an offset, whereas here the numbers in the range aren't things that are mentioned as being read or similar, so you lose that context.
54	In this case, you don't need the end offset of the unit, as it has no impact here - only the start offset is actually important, so you can identify the unit that is being read.
66	`AbbrCode` is a `uint64_t`. The test will fail on some platforms due to either bitness or endianness issues.
80	`dwarf::Form` is specified to be a `uint16_t` in its declaration.
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
283	I wonder if this error needs putting earlier, in case the header is truncated?
287	`debug_info.size()` returns a `size_t`, so may not always be 64 bits.
288	As this is for DWARF developers, maybe it would be best to use the actual field name as defined by the DWARF standard (specifically "type_offset"), for something like: "DWARF type unit (offset 0x1234, length 0x4321) type_offset 0x1111 points inside the header or past the unit end". (I also included my suggestions from above, and a couple of other wording suggestions - I'm not a massive fan of specifying "relative boundary" because it's not clear to me what the boundary is relative to).
290	`Size` is a `uint8_t`.
291	I'd put this before the type_offset check, as a different DWARF version might not have the type offset field at all etc.
295	`getVersion()` returns a `uint16_t`.
303	As above, I don't think you need the end offset here, as it isn't relevant for this message.
304	`getAddressByteSize()` returns a `uint8_t`.
llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s
25 ↗	(On Diff #353241)	I'd test both the cases where the offset points to within the header and past the end of the unit.
93 ↗	(On Diff #353241)	Nit: all the other comments use upper-case for their first letter.
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
3	I've gone through and highlighted where else I see a type mismatch in your print formats versus the type being used. I think it would be a good idea to match the types, as whilst the implementation eventually calls a function that uses variadic arguments, I don't think there's any strict requirement for it to do so. Plus, the integer promotion is subtle, and unnecessary code subtlety harms maintainability. Finally, some of those might actually result in bugs. From my understanding, integer promotion of variadic arguments is only as far as `int`/`unsigned int`. The size of `int` is implementation defined and not necessarily the same as `uint32_t` (though admittedly I don't know of any cases where it isn't currently), nor anything necessarily to do with the host system bitness. Thus, when using `uint*_t` types, you should use their corresponding macros for printing.
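One way to keep print formats in step with fixed-width types, as suggested above, is to use the `<cinttypes>` macros. The helper below is a hypothetical standalone sketch built on `std::snprintf` rather than LLVM's `createStringError`; its name and message text are assumptions made for this example.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of LLVM): every argument is printed with the
// macro that matches its declared type, so the format does not depend on the
// width of int or long on the host platform.
std::string formatInvalidAbbrev(uint64_t UnitOffset, uint16_t Version,
                                uint64_t AbbrCode) {
  char Buffer[128];
  const int Written = std::snprintf(
      Buffer, sizeof(Buffer),
      "unit 0x%8.8" PRIx64 " (version %" PRIu16
      ") contains invalid abbreviation %" PRIu64,
      UnitOffset, Version, AbbrCode);
  assert(Written > 0 && static_cast<std::size_t>(Written) < sizeof(Buffer));
  (void)Written; // Silence the unused warning when asserts are disabled.
  return Buffer;
}
```

Using a value above the 32-bit range in a test, for example 5000000000 for the abbreviation code, makes a mismatched 32-bit format visible on any platform.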

jankratochvil marked 13 inline comments as done. Jun 21 2021, 7:24 AM

jankratochvil added inline comments.

llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37	> without the ambiguity risk by saying "with length X at offset Y" instead
That is definitely ambiguous, as it can mean length `b-a` of `[a, b)` or it can mean the length stored in the binary, which is `b-a-4` (for DWARF32). I used the offsets with "incl." and "excl." to move forward.
> The length is actually the thing that's encoded in the DWARF after all.
So you did mean the `b-a-4`.

Changed the messages to use "incl." and "excl.", as I hope that is acceptable for both of us.
I now use `joinErrors` there, which I originally intended to do but then forgot about. It creates a two-line error:

warning: DWARF unit at 0x0000002c cannot be parsed: 
warning: unexpected end of data at offset 0x2d while reading [0x2c, 0x30)
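The two readings of "length" discussed in this thread can be made concrete with a small calculation. This sketch assumes the standard DWARF initial-length encoding (4 bytes in 32-bit DWARF, 12 bytes in 64-bit DWARF); the function names are hypothetical and only loosely mirror `getUnitLengthFieldByteSize()` and `getNextUnitOffset()`.

```cpp
#include <cstdint>

// Size of the initial-length field: 4 bytes in 32-bit DWARF, 12 bytes in
// 64-bit DWARF (a 4-byte 0xffffffff escape followed by an 8-byte length).
constexpr uint64_t unitLengthFieldByteSize(bool IsDwarf64) {
  return IsDwarf64 ? 12 : 4;
}

// Exclusive end offset b of a unit occupying [a, b), given its start offset a
// and the unit_length value stored in its header. The stored length does not
// count the length field itself.
constexpr uint64_t nextUnitOffset(uint64_t Offset, uint64_t UnitLength,
                                  bool IsDwarf64) {
  return Offset + unitLengthFieldByteSize(IsDwarf64) + UnitLength;
}
```

For a DWARF32 unit at offset 0x2c with a stored unit_length of 0x10, the unit occupies [0x2c, 0x40), so `b-a` is 0x14 while the stored length is `b-a-4`.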

jankratochvil marked 4 inline comments as done. Jun 21 2021, 7:26 AM

Harbormaster completed remote builds in B110196: Diff 353364. Jun 21 2021, 8:03 AM

LGTM, with one possible suggestion, but also happy if this is committed as-is. You might want to wait for @dblaikie too.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
372–375 ↗	(On Diff #353364)	This is probably absolutely fine, but I was thinking about it and wondering whether it would make some sense to factor out the commonality into some sort of container, that the string function can iterate over to generate a string, and the bool function can just compare values against. What do you think?
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37	Fair point. Let's stick with your latest version.
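The refactoring proposed above for `DWARFContext.h` (one container that both the string function and the bool function read) might look roughly like the following standalone sketch. The container name, the use of `std::array`, and the concrete sizes 2, 4 and 8 are assumptions for illustration and are not taken from the patch.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <string>

// Assumed list of supported address sizes; the single source of truth.
constexpr std::array<uint8_t, 3> SupportedAddressSizes = {2, 4, 8};

// The bool function only compares against the shared container.
bool isAddressSizeSupported(unsigned AddressSize) {
  return std::find(SupportedAddressSizes.begin(), SupportedAddressSizes.end(),
                   AddressSize) != SupportedAddressSizes.end();
}

// The string function iterates over the same container, so the message
// cannot drift out of sync with the check.
std::string formatSupportedAddressSizes() {
  std::string Result;
  for (uint8_t Size : SupportedAddressSizes) {
    if (!Result.empty())
      Result += ", ";
    Result += std::to_string(Size);
  }
  return Result;
}
```

Adding or removing a size then requires touching only the container.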

This revision is now accepted and ready to land. Jun 23 2021, 12:59 AM

jankratochvil updated this revision to Diff 353934. Jun 23 2021, 5:20 AM

jankratochvil marked 2 inline comments as done. Jun 23 2021, 5:22 AM

jankratochvil added inline comments.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
372–375 ↗	(On Diff #353364)	I did not want to code that myself at first, as it looked overengineered to me, and it still does. But why not, if there is agreement on it. I agree that there is no duplication of information then.
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
75 ↗	(On Diff #353934)	I have used a simple iteration there instead of `llvm::transform`, as it is shorter and easier to read.

Just some clang-tidy warnings.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
377 ↗	(On Diff #353934)
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
302	Nit: fix clang-tidy warnings.

Fixed clang-tidy warnings.

jankratochvil marked 2 inline comments as done. Jun 23 2021, 5:45 AM

Harbormaster completed remote builds in B110606: Diff 353937. Jun 23 2021, 6:25 AM

This revision was landed with ongoing or failed builds. Jun 27 2021, 2:41 AM

Closed by commit rGc19a28919fc9: llvm-dwarfdump: Print warnings on invalid DWARF (authored by jankratochvil).

This revision was automatically updated to reflect the committed changes.

jankratochvil added a commit: rGc19a28919fc9: llvm-dwarfdump: Print warnings on invalid DWARF.

jankratochvil mentioned this in rGa7afaf901914: Fix lld testsuite after llvm-dwarfdump now errors on invalid DWARF. Jun 27 2021, 3:28 AM

MaskRay mentioned this in rG251640ab5756: [ELF][test] Terminate .debug_info with a null entry to fix warnings. Feb 22 2022, 9:35 PM

Revision Contents

Path	Size
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp	19 lines
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp	13 lines
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s	43 lines

Diff 352004

llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp

 //===- DWARFDebugInfoEntry.cpp --------------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 #include "llvm/DebugInfo/DWARF/DWARFDebugInfoEntry.h"
 #include "llvm/ADT/Optional.h"
+#include "llvm/DebugInfo/DWARF/DWARFContext.h"
 #include "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"
 #include "llvm/DebugInfo/DWARF/DWARFFormValue.h"
 #include "llvm/DebugInfo/DWARF/DWARFUnit.h"
 #include "llvm/Support/DataExtractor.h"
 #include <cstddef>
 #include <cstdint>
 using namespace llvm;
 using namespace dwarf;
 bool DWARFDebugInfoEntry::extractFast(const DWARFUnit &U,
                                       uint64_t *OffsetPtr) {
   DWARFDataExtractor DebugInfoData = U.getDebugInfoExtractor();
   const uint64_t UEndOffset = U.getNextUnitOffset();
   return extractFast(U, OffsetPtr, DebugInfoData, UEndOffset, 0);
 }
 bool DWARFDebugInfoEntry::extractFast(const DWARFUnit &U, uint64_t *OffsetPtr,
                                       const DWARFDataExtractor &DebugInfoData,
                                       uint64_t UEndOffset, uint32_t D) {
   Offset = *OffsetPtr;
   Depth = D;
-  if (Offset >= UEndOffset || !DebugInfoData.isValidOffset(Offset))
+  if (Offset >= UEndOffset || !DebugInfoData.isValidOffset(Offset)) {
+    U.getContext().getWarningHandler()(
+        createStringError(errc::invalid_argument,
+                          "DWARF compile unit extends beyond its "

+                          "bounds cu 0x%8.8" PRIx64 " at 0x%8.8" PRIx64,
+                          U.getOffset(), Offset));

     return false;
+  }
   uint64_t AbbrCode = DebugInfoData.getULEB128(OffsetPtr);
   if (0 == AbbrCode) {
     // NULL debug tag entry.
     AbbrevDecl = nullptr;
     return true;
   }
   if (const auto *AbbrevSet = U.getAbbreviations())
     AbbrevDecl = AbbrevSet->getAbbreviationDeclaration(AbbrCode);
   if (nullptr == AbbrevDecl) {
+    U.getContext().getWarningHandler()(
+        createStringError(errc::invalid_argument,
+                          "DWARF abbreviation 0x%" PRIx64 " is not valid "
+                          "in cu 0x%8.8" PRIx64 " at 0x%8.8" PRIx64,

+                          AbbrCode, U.getOffset(), Offset));

     // Restore the original offset.

     *OffsetPtr = Offset;
     return false;
   }
   // See if all attributes in this DIE have fixed byte sizes. If so, we can
   // just add this size to the offset to skip to the next DIE.
   if (Optional<size_t> FixedSize = AbbrevDecl->getFixedAttributesByteSize(U)) {
     *OffsetPtr += *FixedSize;
     return true;
   }

jhenderson:

  "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
- "contains invalid abbreviation %u at "
+ "contains invalid abbreviation %" PRIu64 " at "
  "offset 0x%8.8" PRIx64 ", valid abbreviations are %s",

`AbbrCode` is a `uint64_t`. The test will fail on some platforms due to either bitness or endianness issues.

   // Skip all data in the .debug_info for the attributes
   for (const auto &AttrSpec : AbbrevDecl->attributes()) {
     // Check if this attribute has a fixed byte size.
     if (auto FixedSize = AttrSpec.getByteSize(U)) {
       // Attribute byte size if fixed, just add the size to the offset.
       *OffsetPtr += *FixedSize;
     } else if (!DWARFFormValue::skipValue(AttrSpec.Form, DebugInfoData,
                                           OffsetPtr, U.getFormParams())) {
       // We failed to skip this attribute's value, restore the original offset
       // and return the failure status.
+      U.getContext().getWarningHandler()(
+          createStringError(errc::invalid_argument,
+                            "DWARF FORM_* 0x%x is not valid "
+                            "in cu 0x%8.8" PRIx64 " at 0x%8.8" PRIx64,

jhenderson:

  "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
- "contains invalid FORM_* 0x%x at offset 0x%8.8" PRIx64,
+ "contains invalid FORM_* 0x%" PRIx16 " at offset 0x%8.8" PRIx64,
  U.getOffset(), U.getNextUnitOffset(), AttrSpec.Form, *OffsetPtr));

`dwarf::Form` is specified to be a `uint16_t` in its declaration.

+                            AttrSpec.Form, U.getOffset(), *OffsetPtr));
       *OffsetPtr = Offset;
       return false;
     }
   }
   return true;
 }

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp

[274 lines not shown]

   bool TypeOffsetOK =
       TypeOffset < getLength() + getUnitLengthFieldByteSize();
   bool LengthOK = debug_info.isValidOffset(getNextUnitOffset() - 1);
   bool VersionOK = DWARFContext::isSupportedVersion(getVersion());
   bool AddrSizeOK = DWARFContext::isAddressSizeSupported(getAddressByteSize());
   if (!LengthOK || !VersionOK || !AddrSizeOK || !TypeOffsetOK)
     return false;
   // Keep track of the highest DWARF version we encounter across all units.

   Context.setMaxVersionIfGreater(getVersion());
   return true;
 }

jhenderson:

  "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
- "extends past section size 0x%8.8" PRIx64,
+ "extends past section size 0x%8.8zx",
  Offset, NextCUOffset, debug_info.size()));

`debug_info.size()` returns a `size_t`, so may not always be 64 bits.

 bool DWARFUnitHeader::applyIndexEntry(const DWARFUnitIndex::Entry *Entry) {

   assert(Entry);
   assert(!IndexEntry);

jhenderson:

  "outside of its relative boundary "
- "[0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ")",
+ "[0x%8.8" PRIx8 ", 0x%8.8" PRIx64 ")",
  Offset, NextCUOffset, TypeOffset, Size,

`Size` is a `uint8_t`.

   IndexEntry = Entry;

   if (AbbrOffset)
     return false;
   auto *UnitContrib = IndexEntry->getContribution();
   if (!UnitContrib ||

jhendersonUnsubmitted

Done

"DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "

- "has unsupported version %u, supported are 2-%u",

+ "has unsupported version %" PRIu16 ", supported are 2-%u",

Offset, NextCUOffset, getVersion(),

getVersion() returns a uint16_t.


UnitContrib->Length != (getLength() + getUnitLengthFieldByteSize())) UnitContrib->Length != (getLength() + getUnitLengthFieldByteSize()))

return false; return false;

auto *AbbrEntry = IndexEntry->getContribution(DW_SECT_ABBREV); auto *AbbrEntry = IndexEntry->getContribution(DW_SECT_ABBREV);

if (!AbbrEntry) if (!AbbrEntry)

return false; return false;

AbbrOffset = AbbrEntry->Offset; AbbrOffset = AbbrEntry->Offset;

return true; return true;

jhendersonUnsubmitted

Done

Nit: fix clang-tidy warnings.


} }

jhendersonUnsubmitted

Done

As above, I don't think you need the end offset here, as it isn't relevant for this message.


jhendersonUnsubmitted

Done

"DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "

- "has unsupported address size %u, supported are %s",

+ "has unsupported address size %" PRIu8 ", supported are %s",

Offset, NextCUOffset, getAddressByteSize(),

getAddressByteSize() returns a uint8_t.


// Parse the rangelist table header, including the optional array of offsets // Parse the rangelist table header, including the optional array of offsets

// following it (DWARF v5 and later). // following it (DWARF v5 and later).

template<typename ListTableType> template<typename ListTableType>

static Expected<ListTableType> static Expected<ListTableType>

parseListTableHeader(DWARFDataExtractor &DA, uint64_t Offset, parseListTableHeader(DWARFDataExtractor &DA, uint64_t Offset,

DwarfFormat Format) { DwarfFormat Format) {

// We are expected to be called with Offset 0 or pointing just past the table // We are expected to be called with Offset 0 or pointing just past the table

// header. Correct Offset in the latter case so that it points to the start // header. Correct Offset in the latter case so that it points to the start

(43 lines not shown)

if (!AppendCUDie && !AppendNonCUDies)

return; return;

// Set the offset to that of the first DIE and calculate the start of the // Set the offset to that of the first DIE and calculate the start of the

// next compilation unit header. // next compilation unit header.

uint64_t DIEOffset = getOffset() + getHeaderSize(); uint64_t DIEOffset = getOffset() + getHeaderSize();

uint64_t NextCUOffset = getNextUnitOffset(); uint64_t NextCUOffset = getNextUnitOffset();

DWARFDebugInfoEntry DIE; DWARFDebugInfoEntry DIE;

DWARFDataExtractor DebugInfoData = getDebugInfoExtractor(); DWARFDataExtractor DebugInfoData = getDebugInfoExtractor();

uint32_t Depth = 0; uint32_t Depth = 0;

jhendersonUnsubmitted

Done

DWARFDataExtractor DebugInfoData = getDebugInfoExtractor();

- // It has been already checked by DWARFUnitHeader::extract.

+ // The end offset has been already checked by DWARFUnitHeader::extract.

assert(DebugInfoData.isValidOffset(NextCUOffset - 1));

I think this is a little clearer.


bool IsCUDie = true; bool IsCUDie = true;

while (DIE.extractFast(*this, &DIEOffset, DebugInfoData, NextCUOffset, while (DIE.extractFast(*this, &DIEOffset, DebugInfoData, NextCUOffset,

Depth)) { Depth)) {

if (IsCUDie) { if (IsCUDie) {

if (AppendCUDie) if (AppendCUDie)

Dies.push_back(DIE); Dies.push_back(DIE);

if (!AppendNonCUDies) if (!AppendNonCUDies)

break; break;

// The average bytes per DIE entry has been seen to be // The average bytes per DIE entry has been seen to be

// around 14-20 so let's pre-reserve the needed memory for // around 14-20 so let's pre-reserve the needed memory for

// our DIE entries accordingly. // our DIE entries accordingly.

Dies.reserve(Dies.size() + getDebugInfoSize() / 14); Dies.reserve(Dies.size() + getDebugInfoSize() / 14);

IsCUDie = false; IsCUDie = false;

} else { } else {

Dies.push_back(DIE); Dies.push_back(DIE);

} }

if (const DWARFAbbreviationDeclaration *AbbrDecl = if (const DWARFAbbreviationDeclaration *AbbrDecl =

DIE.getAbbreviationDeclarationPtr()) { DIE.getAbbreviationDeclarationPtr()) {

// Normal DIE // Normal DIE

if (AbbrDecl->hasChildren()) if (AbbrDecl->hasChildren())

++Depth; ++Depth;

else if (Depth == 0)

break; // This unit has a single DIE with no children.

} else { } else {

// NULL DIE. // NULL DIE.

if (Depth > 0) if (Depth > 0)

--Depth; --Depth;

if (Depth == 0) if (Depth == 0)

break; // We are done with this compile unit! break; // We are done with this compile unit!

} }

// Give a little bit of info if we encounter corrupt DWARF (our offset

// should always terminate at or before the start of the next compilation

// unit header).

if (DIEOffset > NextCUOffset)

Context.getWarningHandler()(

createStringError(errc::invalid_argument,

"DWARF compile unit extends beyond its "

"bounds cu 0x%8.8" PRIx64 " "

"at 0x%8.8" PRIx64 "\n",

getOffset(), DIEOffset));

dblaikieUnsubmitted

Done

What was the motivation to move this code and add the Depth check in? I guess this didn't actually work/was untested, maybe? Could you explain why it didn't work, etc?


jankratochvilAuthorUnsubmitted

Done

Previously, DIE.extractFast simply returned on both errors and the successful end of the DIEs. Since it did not report any errors, the return value (false) was the same in both cases, so this one specific error was handled outside, in the caller. Now that we do detailed error reporting, it has to be done from inside DIE.extractFast, as that has all the information available. This error has therefore moved inside DIE.extractFast (see if (Offset >= UEndOffset) there).
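The idea in this reply can be illustrated in isolation. The following is a simplified sketch, not the code from the patch: `WarningHandler`, `extractEntry`, `countEntries`, the fixed entry size and the message text are all assumptions made for the example. It shows an extractor that still returns false on failure but reports the failure itself, because only it knows the offsets involved, while the caller stops on its own termination condition.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>

// Hypothetical stand-in for a warning callback; not the LLVM handler type.
using WarningHandler = std::function<void(const std::string &)>;

// Hypothetical extractor for fixed-size entries. On failure it returns
// false, but it emits the warning itself, since the unit bounds and the
// failing offset are only known here.
bool extractEntry(uint64_t UnitOffset, uint64_t UnitEnd, uint64_t EntrySize,
                  uint64_t &Offset, const WarningHandler &Warn) {
  if (Offset >= UnitEnd || EntrySize > UnitEnd - Offset) {
    char Buf[160];
    std::snprintf(Buf, sizeof(Buf),
                  "unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64
                  ") has an entry extending past its end at 0x%8.8" PRIx64,
                  UnitOffset, UnitEnd, Offset);
    Warn(Buf);
    return false;
  }
  Offset += EntrySize;
  return true;
}

// The caller ends the walk on its own condition (here an expected entry
// count, standing in for the Depth == 0 check), so a false return from
// extractEntry always means an error that has already been reported.
unsigned countEntries(uint64_t UnitOffset, uint64_t UnitEnd,
                      uint64_t EntrySize, unsigned Expected,
                      const WarningHandler &Warn) {
  uint64_t Offset = UnitOffset;
  unsigned Count = 0;
  while (Count < Expected &&
         extractEntry(UnitOffset, UnitEnd, EntrySize, Offset, Warn))
    ++Count;
  return Count;
}
```

With a unit covering [0, 12) and 4-byte entries, asking for three entries succeeds silently, while asking for a fourth stops at three and produces exactly one warning.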


} }

void DWARFUnit::extractDIEsIfNeeded(bool CUDieOnly) { void DWARFUnit::extractDIEsIfNeeded(bool CUDieOnly) {

if (Error e = tryExtractDIEsIfNeeded(CUDieOnly)) if (Error e = tryExtractDIEsIfNeeded(CUDieOnly))

Context.getRecoverableErrorHandler()(std::move(e)); Context.getRecoverableErrorHandler()(std::move(e));

} }

Error DWARFUnit::tryExtractDIEsIfNeeded(bool CUDieOnly) { Error DWARFUnit::tryExtractDIEsIfNeeded(bool CUDieOnly) {

(remaining 542 lines not shown)

llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s

This file was added.

				# RUN: llvm-mc -triple x86_64-pc-linux %s -filetype=obj --defsym=CUEND=1 \
jhendersonUnsubmitted

Done

The test deserves an introductory comment describing what it is testing.

Additionally, the name is too generic and possibly slightly misleading: "format-warnings.s" suggests that the main aim of the test is to test the formatting of warning messages in general, whereas this test is more about invalid debug abbrev/debug info, so perhaps it could just be debug-entry-invalid.s or something like that.

I have a personal preference to not use stdin to drive llvm-dwarfdump, and instead create an object file on disk with llvm-mc. This makes it easier to debug the test should something go wrong, since you can inspect the object without needing to change the test code.

dblaikieUnsubmitted

Done

There is somewhat of a convention to prefer piping, because it means the input file name (whatever it happens to be on the buildbot/local filesystem) does not appear in the output. This reduces the chance that FileCheck commands might accidentally match on some part of the input file name, making the test more hermetic/reliable.

				# RUN: | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=CHECK-CUEND %s
				# CHECK-CUEND: warning: DWARF compile unit extends beyond its bounds cu 0x00000000 at 0x0000001f
jhendersonUnsubmitted

Done

It would probably be a good idea if you used a non-zero value for the cu index. That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number). Same goes for the remaining messages.

Here and below, you can drop the "CHECK-" bit of the check prefixes, to make them more concise.

jankratochvilAuthorUnsubmitted

Done

> It would probably be a good idea if you used a non-zero value for the cu index.

Done.

> That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).

That would not show anything more, as on 64-bit platforms a variadic function extends all parameters to (at least) 64 bits. Therefore even a 32-bit format will still read a 64-bit variadic parameter.

jhendersonUnsubmitted

Done

Right, but not all supported platforms are 64-bit. I've actually seen bugs precisely because of this sort of issue in similar code.

jankratochvilAuthorUnsubmitted

Done

It is true 32-bit buildbots would catch it.

jhendersonUnsubmitted

Not Done

I've gone through and highlighted where else I see a type mismatch in your print formats versus the type being used. I think it would be a good idea to match the types, as whilst the implementation eventually calls a function that uses variadic arguments, I don't think there's any strict requirement for it to do so. Plus, the integer promotion is subtle, and unnecessary code subtlety harms maintainability.

Finally, some of those might actually result in bugs. From my understanding, integer promotion of variadic arguments is only as far as `int`/`unsigned int`. The size of `int` is implementation defined and not necessarily the same as `uint32_t` (though admittedly I don't know of any cases where it isn't currently), nor anything necessarily to do with the host system bitness. Thus, when using `uint*_t` types, you should use their corresponding macros for printing.
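The recommendation to match each fixed-width type with its printing macro can be sketched as follows. `formatUnitInfo` is a hypothetical helper written for this illustration and uses plain `snprintf`; it is not code from the patch. It pairs `uint64_t` with `PRIx64`, `size_t` with `%zx`, `uint16_t` with `PRIu16` and `uint8_t` with `PRIu8`, the same pairings suggested in the inline comments above.

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of LLVM): formats one value of each type
// discussed in the review, each with its matching conversion specifier.
std::string formatUnitInfo(uint64_t Offset, size_t SectionSize,
                           uint16_t Version, uint8_t AddrSize) {
  char Buf[128];
  std::snprintf(Buf, sizeof(Buf),
                "unit 0x%8.8" PRIx64  // uint64_t: PRIx64
                " section size 0x%zx" // size_t: z length modifier
                " version %" PRIu16   // uint16_t: PRIu16
                " address size %" PRIu8, // uint8_t: PRIu8
                Offset, SectionSize, Version, AddrSize);
  return Buf;
}
```

Using an offset wider than 32 bits, for example `0x100000001f`, is the kind of input that exposes a mismatched format on platforms where `int` or `long` is narrower than 64 bits.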

				# RUN: llvm-mc -triple x86_64-pc-linux %s -filetype=obj --defsym=ABBREVNO=2 \
				# RUN: | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=CHECK-ABBREVNO %s
				# CHECK-ABBREVNO: warning: DWARF abbreviation 0x2 is not valid in cu 0x00000000 at 0x0000000b

				# RUN: llvm-mc -triple x86_64-pc-linux %s -filetype=obj --defsym=FORMNO=0xdead \
				# RUN: | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=CHECK-FORMNO %s
				# CHECK-FORMNO: warning: DWARF FORM_* 0xdead is not valid in cu 0x00000000 at 0x0000000c

				.section .debug_abbrev,"",@progbits
				.uleb128 1 # Abbreviation Code
				.uleb128 17 # DW_TAG_compile_unit
				.uleb128 1 # DW_CHILDREN_yes
				.uleb128 37 # DW_AT_producer
				.ifndef FORMNO
				.uleb128 8 # DW_FORM_string
				.else
				.uleb128 FORMNO
jhendersonUnsubmitted

Done

Indentation here is a little inconsistent.

				.endif
				.uleb128 0 # end DW_AT_*
				.uleb128 0 # end DW_FORM_*
				.uleb128 0 # end abbreviation code

				.section .debug_info,"",@progbits
				.Lcu_begin0:
				.long .Lcu_end0-.Lcu_start0 # Length of Unit
				.Lcu_start0:
				.short 4 # DWARF version number
jhendersonUnsubmitted

Done

Here and elsewhere, consider lining up your comments vertically. They're a bit all over the place currently.

				.long .debug_abbrev # Offset Into Abbrev. Section
				.byte 8 # Address Size (in bytes)
				.ifndef ABBREVNO
				.uleb128 1 # Abbrev [1] DW_TAG_compile_unit
				.else
				.uleb128 ABBREVNO
				.endif
				.asciz "hand-written DWARF" # DW_AT_producer
				.ifndef CUEND
				.uleb128 0 # End Of Children Mark
				.endif
				.Lcu_end0:
