This is an archive of the discontinued LLVM Phabricator instance.

Differential D104271

llvm-dwarfdump: Print warnings on invalid DWARF
ClosedPublic

Authored by jankratochvil on Jun 14 2021, 3:03 PM.

Details

Reviewers

dblaikie
jhenderson

Commits

rGc19a28919fc9: llvm-dwarfdump: Print warnings on invalid DWARF

Summary

llvm-dwarfdump was silent even when the format of DWARF was invalid and/or llvm-dwarfdump did not understand/support some of the constructs. This can be pretty confusing as llvm-dwarfdump is a tool for DWARF producers+consumers development.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
120 ms	x64 windows > lld.ELF::non-abs-reloc.s

Event Timeline

jankratochvil created this revision. Jun 14 2021, 3:03 PM

Herald added a reviewer: jhenderson. Jun 14 2021, 3:03 PM

Herald added subscribers: cmtice, hiraditya.

jankratochvil requested review of this revision. Jun 14 2021, 3:03 PM

Herald added a subscriber: MaskRay. Jun 14 2021, 3:03 PM

Harbormaster completed remote builds in B109203: Diff 352004. Jun 14 2021, 3:42 PM

dblaikie added inline comments. Jun 14 2021, 9:05 PM

llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37–39	Could probably include the bounds in the message? Maybe something like: "DWARF compile unit extends beyond its bounds [x, y) to z"?
52–55	Maybe include some details about the offset to the debug_abbrev contribution, and the range of valid abbreviation values? (I forget if the abbrev table is necessarily contiguous - if it isn't, then maybe that's too complicated)
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
387–406	What was the motivation to move this code and add the Depth check in? I guess this didn't actually work/was untested, maybe? Could you explain why it didn't work, etc?

jhenderson added inline comments. Jun 15 2021, 12:00 AM

llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
1 ↗	(On Diff #352004)	The test deserves an introductory comment describing what it is testing. Additionally, the name is too generic and possibly slightly misleading: "format-warnings.s" suggests that the main aim of the test is to test the formatting of warning messages in general, whereas this test is more about invalid debug abbrev/debug info, so perhaps it could just be debug-entry-invalid.s or something like that. I have a personal preference to not use stdin to drive llvm-dwarfdump, and instead create an object file on disk with llvm-mc. This makes it easier to debug the test should something go wrong, since you can inspect the object without needing to change the test code.
3 ↗	(On Diff #352004)	It would probably be a good idea if you used a non-zero value for the cu index. That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number). Same goes for the remaining messages. Here and below, you can drop the "CHECK-" bit of the check prefixes, to make them more concise.
21 ↗	(On Diff #352004)	Indentation here is a little inconsistent.
31 ↗	(On Diff #352004)	Here and elsewhere, consider lining up your comments vertically. They're a bit all over the place currently.

dblaikie added inline comments. Jun 15 2021, 11:08 AM

llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
1 ↗	(On Diff #352004)	There is somewhat of a convention to prefer piping, because it means the input file name (whatever it happens to be on the buildbot/local filesystem) does not appear in the output - this reduces the chance that FileCheck commands might accidentally match on some part of the input file name - making the test more hermetic/reliable.

To satisfy all the required detailed reporting I had to extend the patch far more than I originally intended. Is it OK this way?

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
387–406	That `DIE.extractFast` just exited on both errors and successful end of DIEs. As it did not report any errors the return code (`false`) was the same. So this one specific error was handled outside. But now when we do all the detailed error reporting we need to do it from inside `DIE.extractFast` as that has all the info available. Therefore this error has moved inside `DIE.extractFast` (`if (Offset >= UEndOffset)` there).
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
3 ↗	(On Diff #352004)	> It would probably be a good idea if you used a non-zero value for the cu index.
Done.
> That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).
That would not show anything more, as on 64-bit platforms a variadic function extends all parameters to (at least) 64 bits. Therefore even a 32-bit format will still read a 64-bit variadic parameter.
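
The reporting problem described for `DIE.extractFast` above (the same `false` return for a normal stop and for malformed input) can be illustrated with a minimal sketch. It assumes a hypothetical `extractNext` function and a hypothetical `WarningHandler` callback type; neither is LLVM API, and the real patch reports through the DWARFContext warning handler instead.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical callback type standing in for a warning handler.
using WarningHandler = std::function<void(const std::string &)>;

// Hypothetical, simplified extractor step. It returns false both when the
// caller is finished and when the offset is out of bounds, but only the
// out-of-bounds case is reported, from inside the function that knows the
// offending offsets.
bool extractNext(uint64_t Offset, uint64_t UnitEnd, bool Finished,
                 const WarningHandler &Warn) {
  if (Finished)
    return false; // Normal stop: nothing to report.
  if (Offset >= UnitEnd) {
    Warn("offset " + std::to_string(Offset) + " is at or past the unit end " +
         std::to_string(UnitEnd));
    return false; // Error: same return value, but the caller was told why.
  }
  return true;
}
```

With this shape the caller no longer has to guess which of the two `false` cases occurred, which is the motivation given for moving the check inside the extractor.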

jankratochvil updated this revision to Diff 352845. Jun 17 2021, 2:25 PM

jankratochvil marked 2 inline comments as done.

jankratochvil added inline comments.

llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
103	This function looks to me like too much code for too little benefit, but since @dblaikie has requested it, why not.

jankratochvil updated this revision to Diff 352847. Jun 17 2021, 2:30 PM

Not had time to review the test cases yet, but will do that next week.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
372–375	I'd rename this function to `getSupportedAddressSizes()`, which reads slightly better. Also, it can be simplified, as shown in line. String literals don't have lifetime issues, so there's no need for the static local variable.
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
74	Lambdas follow variable naming style, so `flush` -> `Flush`.
103	I'm not entirely convinced we need to print all valid abbrev values, even in ranged form, since typically, the abbrevs will go from 1-max in a contiguous set. That being said, I'm not opposed to it. I think the code could be simplified anyway. Something like the following is more readable to me. It also handles adjacent codes being non-contiguous within the set of abbrevs:

std::string DWARFAbbreviationDeclarationSet::getCodeRange() const {
  // Create a sorted list of all abbrev codes.
  std::vector<uint32_t> Codes;
  Codes.reserve(Decls.size());
  std::transform(Decls.begin(), Decls.end(), std::back_inserter(Codes),
    [](const DWARFAbbreviationDeclaration &Decl){
      return Decl.getCode();
  });
  std::sort(Codes.begin(), Codes.end());

  std::string Buffer = "";
  raw_string_ostream Stream(Buffer);
  // Each iteration through this look represents a single contiguous range in the set of codes.
  for(auto Current = Codes.begin(), End = Codes.end(); Current != End;) {
    uint32_t RangeStart = *Current;
    // Add the current range start.
    Stream << *Current;
    uint32_t RangeEnd = RangeStart;
    // Find the end of the current range.
    while(++Current != End && *Current == RangeEnd + 1)
      ++RangeEnd;
    // If there is more than one value in the range, add the range end too.
    if (RangeStart != RangeEnd)
      Stream << "-" << RangeEnd;
    // If there is at least one more range, add a separator.
    if (Current != End)
      Stream << ", ";
  }
  return Buffer;
}
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37	It's not going to be clear to the end user that these two values represent offsets. I'd be more explicit: "DWARF unit from offset x to offset y ..." Same applies below.
56	It's not clear to me (when reading the message without the code context) what these last two offsets represent. I suspect that the last offset is actually unnecessary, since the OffsetPtr location for this case is going to be fixed within the unit header, if I'm not mistaken.
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
411	I think this is a little clearer.
llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s
2	Nit: I'm trying to encourage new tests to use '##' for comments, to help distinguish them from lit and FileCheck directives.
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
3 ↗	(On Diff #352004)	Right, but not all supported platforms are 64-bit. I've actually seen bugs precisely because of this sort of issue in similar code.

Harbormaster completed remote builds in B109805: Diff 352847. Jun 18 2021, 6:27 AM

dblaikie added inline comments. Jun 18 2021, 11:49 AM

llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
103	Yep, all this complexity would fall under the clause I mentioned in the original feedback: " (I forget if the abbrev table is necessarily contiguous - if it isn't, then maybe that's too complicated)" - so the table isn't contiguous, and maybe this is too complicated to be worth it? I really don't mind either way, at this point. @jhenderson's version seems nice, if we're going to do this.

jankratochvil marked 9 inline comments as done. Jun 20 2021, 2:09 PM

jankratochvil added inline comments.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
372–375	> String literals don't have lifetime issues, so there's no need for the static local variable.
True, I am stupid.
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
103	TBH I did not sort the Abbrevs intentionally. This debugging output is for DWARF developers, not for end users. By sorting it the message gets too disconnected from what is really written in the DWARF. Moreover, the Abbrevs are usually sorted anyway. But I have accepted your version. The look ahead instead of a lambda flusher is an interesting idea.
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37	I haven't changed this yet. I disagree with "from offset x to offset y" as that would need to be rather "from offset x incl. to offset y excl." which already looks to me too talkative. Primarily as this message is for DWARF developers, not for end users. And then we should change an already existing error message: "while reading [0x%x, 0x%x)"
56	> I suspect that the last offset is actually unnecessary, since the OffsetPtr location for this case is going to be fixed within the unit header, if I'm not mistaken.
I agree, thanks.
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
3 ↗	(On Diff #352004)	It is true 32-bit buildbots would catch it.

jankratochvil updated this revision to Diff 353241. Jun 20 2021, 2:10 PM

jankratochvil marked 3 inline comments as done.

Herald added a subscriber: mgrang. Jun 20 2021, 2:10 PM

Harbormaster completed remote builds in B110100: Diff 353241. Jun 20 2021, 2:55 PM

jhenderson added inline comments. Jun 21 2021, 12:45 AM

llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
81
103	I don't have a strong opinion about whether they should be sorted or not, and if you would prefer dropping the sorting, that's fine (I think the rest of the code will just work, but am not 100% certain without spending more time than I care to thinking about it). I'm also happy if you'd prefer to drop the entire thing.
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37	With at least offsets, I would never read "from offset X to offset Y" as inclusive at both ends, because of the nature of what an offset represents. But maybe it is a real issue. You could achieve the same meaning, without the ambiguity risk by saying "with length X at offset Y" instead, for example. The length is actually the thing that's encoded in the DWARF after all. You could make it slightly less talkative like this: "DWARF unit (offset 0x1234, length 0x4321) tries to ...". I don't think the two error messages are quite equivalent. In the DataExtractor one, the message talks about reading the range, and therefore it's somewhat clearer that you're dealing with an offset, whereas here the numbers in the range aren't things that are mentioned as being read or similar, so you lose that context.
54	In this case, you don't need the end offset of the unit, as it has no impact here - only the start offset is actually important, so you can identify the unit that is being read.
66	`AbbrCode` is a `uint64_t`. The test will fail on some platforms due to either bitness or endianness issues.
94	`dwarf::Form` is specified to be a `uint16_t` in its declaration.
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
305	As this is for DWARF developers, maybe it would be best to use the actual field name as defined by the DWARF standard (specifically "type_offset"), for something like: "DWARF type unit (offset 0x1234, length 0x4321) type_offset 0x1111 points inside the header or past the unit end". (I also included my suggestions from above, and a couple of other wording suggestions - I'm not a massive fan of specifying "relative boundary" because it's not clear to me what the boundary is relative to).
307	`Size` is a `uint8_t`.
329	I wonder if this error needs putting earlier, in case the header is truncated?
333	`debug_info.size()` returns a `size_t`, so may not always be 64 bits.
337	I'd put this before the type_offset check, as a different DWARF version might not have the type offset field at all etc.
341	`getVersion()` returns a `uint16_t`.
349	As above, I don't think you need the end offset here, as it isn't relevant for this message.
350	`getAddressByteSize()` returns a `uint8_t`.
llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s
26	I'd test both the cases where the offset points to within the header and past the end of the unit.
94	Nit: all the other comments use upper-case for their first letter.
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
3 ↗	(On Diff #352004)	I've gone through and highlighted where else I see a type mismatch in your print formats versus the type being used. I think it would be a good idea to match the types, as whilst the implementation eventually calls a function that uses variadic arguments, I don't think there's any strict requirement for it to do so. Plus, the integer promotion is subtle, and unnecessary code subtlety harms maintainability. Finally, some of those might actually result in bugs. From my understanding, integer promotion of variadic arguments is only as far as `int`/`unsigned int`. The size of `int` is implementation defined and not necessarily the same as `uint32_t` (though admittedly I don't know of any cases where it isn't currently), nor anything necessarily to do with the host system bitness. Thus, when using `uint*_t` types, you should use their corresponding macros for printing.
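
The point about matching print formats to `uint*_t` types can be sketched as follows. This is only an illustration using `std::snprintf` and the `<cinttypes>` macros; the helper names `formatOffset` and `formatAddressSize` are hypothetical and are not part of the patch, which formats its messages through LLVM's own error utilities.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: prints a 64-bit offset with the macro that matches
// uint64_t, so the result does not depend on the width of int or long.
std::string formatOffset(uint64_t Offset) {
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "0x%8.8" PRIx64, Offset);
  return Buf;
}

// Hypothetical helper: prints a uint8_t value (such as an address size)
// with PRIu8 rather than relying on a hand-picked conversion specifier.
std::string formatAddressSize(uint8_t Size) {
  char Buf[8];
  std::snprintf(Buf, sizeof(Buf), "%" PRIu8, Size);
  return Buf;
}
```

Using a value wider than 32 bits in a test, as suggested earlier in the thread, is what would expose a mismatched specifier on a platform where the argument is not silently widened.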

jankratochvil marked 13 inline comments as done. Jun 21 2021, 7:24 AM

jankratochvil added inline comments.

llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37	> without the ambiguity risk by saying "with length X at offset Y" instead
That is definitely ambiguous, as it can mean the length `b-a` of `[a, b)` or it can mean the length stored in the binary, which is `b-a-4` (for DWARF32). I used the offsets with "incl." and "excl." to move forward.
> The length is actually the thing that's encoded in the DWARF after all.
So you did mean the `b-a-4`.

Changed the messages to use "incl." and "excl.", as I hope that can be acceptable for both of us.
I have newly used `joinErrors` there, which I intended originally and then forgot about. It creates a two-line error:

warning: DWARF unit at 0x0000002c cannot be parsed: 
warning: unexpected end of data at offset 0x2d while reading [0x2c, 0x30)
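
The two readings of "length" discussed in this thread can be written out as a small sketch. The `UnitRange` struct and both function names are hypothetical and exist only for illustration; the sketch assumes DWARF32, where the `unit_length` field occupies 4 bytes and does not count itself.

```cpp
#include <cstdint>

// Hypothetical description of a DWARF32 unit as a half-open range [Begin, End).
struct UnitRange {
  uint64_t Begin; // Offset of the unit's first byte (inclusive).
  uint64_t End;   // Offset one past the unit's last byte (exclusive).
};

// Reading 1: total bytes occupied by the unit, i.e. b-a for [a, b).
constexpr uint64_t totalSize(UnitRange R) { return R.End - R.Begin; }

// Reading 2: the value stored in the 4-byte DWARF32 unit_length field,
// which excludes the field itself, i.e. b-a-4.
constexpr uint64_t encodedLength32(UnitRange R) { return R.End - R.Begin - 4; }
```

For a unit spanning `[0x100, 0x140)`, the first reading gives 0x40 and the second 0x3c, which is why a bare "length" in a message could be taken either way.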

jankratochvil marked 4 inline comments as done. Jun 21 2021, 7:26 AM

Harbormaster completed remote builds in B110196: Diff 353364. Jun 21 2021, 8:03 AM

LGTM, with one possible suggestion, but also happy if this is committed as-is. You might want to wait for @dblaikie too.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
372–375	This is probably absolutely fine, but I was thinking about it and wondering whether it would make some sense to factor out the commonality into some sort of container, that the string function can iterate over to generate a string, and the bool function can just compare values against. What do you think?
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37	Fair point. Let's stick with your latest version.

This revision is now accepted and ready to land. Jun 23 2021, 12:59 AM

jankratochvil updated this revision to Diff 353934. Jun 23 2021, 5:20 AM

jankratochvil marked 2 inline comments as done. Jun 23 2021, 5:22 AM

jankratochvil added inline comments.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
372–375	I did not want to code it that way myself at first, as it looked overengineered to me and it still looks so. But why not, if there is an agreement upon it. I agree there is no information duplication then.
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
75	I have put there a simple iterator instead of `llvm::transform` as it is really shorter and easier to read.

Just some clang-tidy warnings.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
377
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
319	Nit: fix clang-tidy warnings.

Fixed clang-tidy warnings.

jankratochvil marked 2 inline comments as done. Jun 23 2021, 5:45 AM

Harbormaster completed remote builds in B110606: Diff 353937. Jun 23 2021, 6:25 AM

This revision was landed with ongoing or failed builds. Jun 27 2021, 2:41 AM

Closed by commit rGc19a28919fc9: llvm-dwarfdump: Print warnings on invalid DWARF (authored by jankratochvil).

This revision was automatically updated to reflect the committed changes.

jankratochvil added a commit: rGc19a28919fc9: llvm-dwarfdump: Print warnings on invalid DWARF.

jankratochvil mentioned this in rGa7afaf901914: Fix lld testsuite after llvm-dwarfdump now errors on invalid DWARF. Jun 27 2021, 3:28 AM

MaskRay mentioned this in rG251640ab5756: [ELF][test] Terminate .debug_info with a null entry to fix warnings. Feb 22 2022, 9:35 PM

Revision Contents

Path	Size
llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h	9 lines
llvm/include/llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h	2 lines
llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h	2 lines
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp	29 lines
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp	39 lines
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp	86 lines
llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s	111 lines

Diff 353937

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h

[358 lines not shown; context: public:]
    DIInliningInfo getInliningInfoForAddress(
        object::SectionedAddress Address,
        DILineInfoSpecifier Specifier = DILineInfoSpecifier()) override;
    std::vector<DILocal>
    getLocalsForAddress(object::SectionedAddress Address) override;
    bool isLittleEndian() const { return DObj->isLittleEndian(); }
+   static unsigned getMaxSupportedVersion() { return 5; }
    static bool isSupportedVersion(unsigned version) {
-     return version == 2 || version == 3 || version == 4 || version == 5;
+     return version >= 2 && version <= getMaxSupportedVersion();
    }
+   static SmallVector<uint8_t, 3> getSupportedAddressSizes() {
+     return {2, 4, 8};
+   }
    static bool isAddressSizeSupported(unsigned AddressSize) {

jhenderson (Unsubmitted, Done)

      return version >= 2 && version <= getMaxSupportedVersion();
    }
-   static const char *getAddressSizeSupported() {
-     static const char s[] = "2, 4, 8";
-     return s;
-   }
+   static const char *getAddressSizeSupported() { return "2, 4, 8"; }
    static bool isAddressSizeSupported(unsigned AddressSize) {

I'd rename this function to getSupportedAddressSizes(), which reads slightly better.

Also, it can be simplified, as shown in line. String literals don't have lifetime issues, so there's no need for the static local variable.

jankratochvil (Author, Unsubmitted, Done)

> String literals don't have lifetime issues, so there's no need for the static local variable.

True, I am stupid.

jhenderson (Unsubmitted, Done)

This is probably absolutely fine, but I was thinking about it and wondering whether it would make some sense to factor out the commonality into some sort of container, that the string function can iterate over to generate a string, and the bool function can just compare values against. What do you think?

jankratochvil (Author, Unsubmitted, Done)

I did not want to code it that way myself at first, as it looked overengineered to me and it still looks so. But why not, if there is an agreement upon it. I agree there is no information duplication then.

-     return AddressSize == 2 || AddressSize == 4 || AddressSize == 8;
+     return llvm::any_of(getSupportedAddressSizes(),
+                         [=](auto Elem) { return Elem == AddressSize; });

jhenderson (Unsubmitted, Done)

      return llvm::any_of(getSupportedAddressSizes(),
-                         [=](auto elem) { return elem == AddressSize; });
+                         [=](auto Elem) { return Elem == AddressSize; });
    }
    std::shared_ptr<DWARFContext> getDWOContext(StringRef AbsolutePath);

    }
    std::shared_ptr<DWARFContext> getDWOContext(StringRef AbsolutePath);
    const MCRegisterInfo *getRegisterInfo() const { return RegInfo.get(); }
    function_ref<void(Error)> getRecoverableErrorHandler() {
      return RecoverableErrorHandler;
[51 lines not shown]
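
The refactoring discussed in this thread (one container of supported sizes, used both for the membership check and for building the message text) can be sketched with the standard library alone. This is an illustrative stand-in, not the LLVM code: `supportedAddressSizes` and `supportedAddressSizesText` are hypothetical names, and `std::array` with `std::any_of` replaces `SmallVector` and `llvm::any_of`.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a single source of truth for supported sizes.
static std::array<uint8_t, 3> supportedAddressSizes() { return {2, 4, 8}; }

// Membership check that compares against the shared container.
static bool isAddressSizeSupported(unsigned AddressSize) {
  const auto Sizes = supportedAddressSizes();
  return std::any_of(Sizes.begin(), Sizes.end(),
                     [=](uint8_t Elem) { return Elem == AddressSize; });
}

// Hypothetical helper that builds the "2, 4, 8" text for a warning message
// from the same container, so the list is not written out twice.
static std::string supportedAddressSizesText() {
  std::string Text;
  for (uint8_t Size : supportedAddressSizes()) {
    if (!Text.empty())
      Text += ", ";
    Text += std::to_string(Size);
  }
  return Text;
}
```

The trade-off raised in the review still applies: the container removes the duplicated list of values at the cost of a little more code than the original one-line comparison.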

llvm/include/llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h

[41 lines not shown; context: public:]
    const_iterator begin() const {
      return Decls.begin();
    }

    const_iterator end() const {
      return Decls.end();
    }

+   std::string getCodeRange() const;

  private:
    void clear();
  };

  class DWARFDebugAbbrev {
    using DWARFAbbreviationDeclarationSetMap =
        std::map<uint64_t, DWARFAbbreviationDeclarationSet>;

[30 lines not shown]

llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h

[351 lines not shown; context: uint8_t getDwarfStringOffsetsByteSize() const {]
      return StringOffsetsTableContribution->getDwarfOffsetByteSize();
    }

    uint64_t getStringOffsetsBase() const {
      assert(StringOffsetsTableContribution);
      return StringOffsetsTableContribution->Base;
    }

+   uint64_t getAbbreviationsOffset() const { return Header.getAbbrOffset(); }

    const DWARFAbbreviationDeclarationSet *getAbbreviations() const;

    static bool isMatchingUnitTypeAndTag(uint8_t UnitType, dwarf::Tag Tag) {
      switch (UnitType) {
      case dwarf::DW_UT_compile:
        return Tag == dwarf::DW_TAG_compile_unit;
      case dwarf::DW_UT_type:
        return Tag == dwarf::DW_TAG_type_unit;
[159 lines not shown]

llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp

[61 lines not shown; context: if (FirstAbbrCode == UINT32_MAX) {]
      }
      return nullptr;
    }
    if (AbbrCode < FirstAbbrCode || AbbrCode >= FirstAbbrCode + Decls.size())
      return nullptr;
    return &Decls[AbbrCode - FirstAbbrCode];
  }

+ std::string DWARFAbbreviationDeclarationSet::getCodeRange() const {
+   // Create a sorted list of all abbrev codes.
+   std::vector<uint32_t> Codes;
+   Codes.reserve(Decls.size());
+   for (const auto &Decl : Decls)

jhenderson (Unsubmitted, Done)

Lambdas follow variable naming style, so flush -> Flush.

+     Codes.push_back(Decl.getCode());

jankratochvil (Author, Unsubmitted, Done)

I have put there a simple iterator instead of llvm::transform as it is really shorter and easier to read.

+   std::string Buffer = "";
+   raw_string_ostream Stream(Buffer);
+   // Each iteration through this loop represents a single contiguous range in
+   // the set of codes.
+   for (auto Current = Codes.begin(), End = Codes.end(); Current != End;) {

jhenderson (Unsubmitted, Done)

    raw_string_ostream Stream(Buffer);
-   // Each iteration through this look represents a single contiguous range in
+   // Each iteration through this loop represents a single contiguous range in
    // the set of codes.

+     uint32_t RangeStart = *Current;
+     // Add the current range start.
+     Stream << *Current;
+     uint32_t RangeEnd = RangeStart;
+     // Find the end of the current range.
+     while (++Current != End && *Current == RangeEnd + 1)
+       ++RangeEnd;
+     // If there is more than one value in the range, add the range end too.
+     if (RangeStart != RangeEnd)
+       Stream << "-" << RangeEnd;
+     // If there is at least one more range, add a separator.
+     if (Current != End)
+       Stream << ", ";
+   }
+   return Buffer;
+ }

  DWARFDebugAbbrev::DWARFDebugAbbrev() { clear(); }

  void DWARFDebugAbbrev::clear() {
    AbbrDeclSets.clear();
    PrevAbbrOffsetPos = AbbrDeclSets.end();

jankratochvil (Author, Unsubmitted, Done)

This function looks to me like too much code for too little benefit, but since @dblaikie has requested it, why not.

jhenderson (Unsubmitted, Done)

I'm not entirely convinced we need to print all valid abbrev values, even in ranged form, since typically, the abbrevs will go from 1-max in a contiguous set. That being said, I'm not opposed to it.

I think the code could be simplified anyway. Something like the following is more readable to me. It also handles adjacent codes being non-contiguous within the set of abbrevs:

std::string DWARFAbbreviationDeclarationSet::getCodeRange() const {
  // Create a sorted list of all abbrev codes.
  std::vector<uint32_t> Codes;
  Codes.reserve(Decls.size());
  std::transform(Decls.begin(), Decls.end(), std::back_inserter(Codes),
    [](const DWARFAbbreviationDeclaration &Decl){
      return Decl.getCode();
  });
  std::sort(Codes.begin(), Codes.end());

  std::string Buffer = "";
  raw_string_ostream Stream(Buffer);
  // Each iteration through this look represents a single contiguous range in the set of codes.
  for(auto Current = Codes.begin(), End = Codes.end(); Current != End;) {
    uint32_t RangeStart = *Current;
    // Add the current range start.
    Stream << *Current;
    uint32_t RangeEnd = RangeStart;
    // Find the end of the current range.
    while(++Current != End && *Current == RangeEnd + 1)
      ++RangeEnd;
    // If there is more than one value in the range, add the range end too.
    if (RangeStart != RangeEnd)
      Stream << "-" << RangeEnd;
    // If there is at least one more range, add a separator.
    if (Current != End)
      Stream << ", ";
  }
  return Buffer;
}

dblaikie (Unsubmitted, Done)

Yep, all this complexity would fall under the clause I mentioned in the original feedback: " (I forget if the abbrev table is necessarily contiguous - if it isn't, then maybe that's too complicated)" - so the table isn't contiguous, and maybe this is too complicated to be worth it? I really don't mind either way, at this point.

@jhenderson's version seems nice, if we're going to do this.

jankratochvil (Author, Unsubmitted, Done)

TBH I did not sort the Abbrevs intentionally. This debugging output is for DWARF developers, not for end users. By sorting it the message gets too disconnected from what is really written in the DWARF. Moreover, the Abbrevs are usually sorted anyway. But I have accepted your version.
The look ahead instead of a lambda flusher is an interesting idea.

jhenderson (Unsubmitted, Done)

I don't have a strong opinion about whether they should be sorted or not, and if you would prefer dropping the sorting, that's fine (I think the rest of the code will just work, but am not 100% certain without spending more time than I care to thinking about it).

I'm also happy if you'd prefer to drop the entire thing.

  }

  void DWARFDebugAbbrev::extract(DataExtractor Data) {
    clear();
    this->Data = Data;
  }

  void DWARFDebugAbbrev::parse() const {
[56 lines not shown]
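
The look-ahead range formatting reviewed above can be tried outside LLVM with a short standalone sketch. It follows the reviewer's suggested variant (including the `std::sort` call) but is a hypothetical free function, `formatCodeRanges`, that takes the codes directly and builds a `std::string` instead of using `raw_string_ostream`.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical standalone version of the range-collapsing loop: turns a set
// of abbreviation codes into text such as "1-3, 5, 7-8".
std::string formatCodeRanges(std::vector<uint32_t> Codes) {
  std::sort(Codes.begin(), Codes.end());
  std::string Out;
  // Each iteration handles one contiguous run of codes.
  for (auto Current = Codes.begin(), End = Codes.end(); Current != End;) {
    uint32_t RangeStart = *Current;
    Out += std::to_string(RangeStart);
    uint32_t RangeEnd = RangeStart;
    // Look ahead while the next code continues the current run.
    while (++Current != End && *Current == RangeEnd + 1)
      ++RangeEnd;
    // A run of more than one value is printed as "start-end".
    if (RangeStart != RangeEnd)
      Out += "-" + std::to_string(RangeEnd);
    // Separate this run from the next one, if any.
    if (Current != End)
      Out += ", ";
  }
  return Out;
}
```

For example, `formatCodeRanges({1, 2, 3, 5, 7, 8})` yields `1-3, 5, 7-8`. Dropping the sort, as debated in the thread, would keep the output closer to the order in which the codes appear in the DWARF.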

llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp

//===- DWARFDebugInfoEntry.cpp --------------------------------------------===// //===- DWARFDebugInfoEntry.cpp --------------------------------------------===//

// //

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information. // See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

// //

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

#include "llvm/DebugInfo/DWARF/DWARFDebugInfoEntry.h" #include "llvm/DebugInfo/DWARF/DWARFDebugInfoEntry.h"

#include "llvm/ADT/Optional.h" #include "llvm/ADT/Optional.h"

#include "llvm/DebugInfo/DWARF/DWARFContext.h"

#include "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h" #include "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"

#include "llvm/DebugInfo/DWARF/DWARFFormValue.h" #include "llvm/DebugInfo/DWARF/DWARFFormValue.h"

#include "llvm/DebugInfo/DWARF/DWARFUnit.h" #include "llvm/DebugInfo/DWARF/DWARFUnit.h"

#include "llvm/Support/DataExtractor.h" #include "llvm/Support/DataExtractor.h"

#include <cstddef> #include <cstddef>

#include <cstdint> #include <cstdint>

using namespace llvm; using namespace llvm;

using namespace dwarf; using namespace dwarf;

bool DWARFDebugInfoEntry::extractFast(const DWARFUnit &U, bool DWARFDebugInfoEntry::extractFast(const DWARFUnit &U,

uint64_t *OffsetPtr) { uint64_t *OffsetPtr) {

DWARFDataExtractor DebugInfoData = U.getDebugInfoExtractor(); DWARFDataExtractor DebugInfoData = U.getDebugInfoExtractor();

const uint64_t UEndOffset = U.getNextUnitOffset(); const uint64_t UEndOffset = U.getNextUnitOffset();

return extractFast(U, OffsetPtr, DebugInfoData, UEndOffset, 0); return extractFast(U, OffsetPtr, DebugInfoData, UEndOffset, 0);

} }

bool DWARFDebugInfoEntry::extractFast(const DWARFUnit &U, uint64_t *OffsetPtr, bool DWARFDebugInfoEntry::extractFast(const DWARFUnit &U, uint64_t *OffsetPtr,

const DWARFDataExtractor &DebugInfoData, const DWARFDataExtractor &DebugInfoData,

uint64_t UEndOffset, uint32_t D) { uint64_t UEndOffset, uint32_t D) {

Offset = *OffsetPtr; Offset = *OffsetPtr;

Depth = D; Depth = D;

- if (Offset >= UEndOffset || !DebugInfoData.isValidOffset(Offset))

+ if (Offset >= UEndOffset) {

U.getContext().getWarningHandler()(

createStringError(errc::invalid_argument,

"DWARF unit from offset 0x%8.8" PRIx64 " incl. "

jhenderson:

It's not going to be clear to the end user that these two values represent offsets. I'd be more explicit: "DWARF unit from offset x to offset y ..."

Same applies below.

jankratochvil (author):

I haven't changed this yet. I disagree with "from offset x to offset y" as that would need to be rather "from offset x incl. to offset y excl." which already looks to me too talkative. Primarily as this message is for DWARF developers, not for end users.
And then we should change an already existing error message: "while reading [0x%x, 0x%x)"

jhenderson:

With at least offsets, I would never read "from offset X to offset Y" as inclusive at both ends, because of the nature of what an offset represents. But maybe it is a real issue. You could achieve the same meaning, without the ambiguity risk by saying "with length X at offset Y" instead, for example. The length is actually the thing that's encoded in the DWARF after all. You could make it slightly less talkative like this: "DWARF unit (offset 0x1234, length 0x4321) tries to ...".

I don't think the two error messages are quite equivalent. In the DataExtractor one, the message talks about reading the range, and therefore it's somewhat clearer that you're dealing with an offset, whereas here the numbers in the range aren't things that are mentioned as being read or similar, so you lose that context.

jankratochvil (author):

> without the ambiguity risk by saying "with length X at offset Y" instead

That is definitely ambiguous as it can mean length b-a of [a, b) or it can mean length stored in the binary which is b-a-4 (for DWARF32). I used the offsets with "incl." and "excl." to move forward.

> The length is actually the thing that's encoded in the DWARF after all.

So you did mean the b-a-4.

jhenderson:

Fair point. Let's stick with your latest version.

"to offset 0x%8.8" PRIx64 " excl. "

"tries to read DIEs at offset 0x%8.8" PRIx64,

dblaikie:

Could probably include the bounds in the message? Maybe something like:

"DWARF compile unit extends beyond its bounds [x, y) to z"?

U.getOffset(), U.getNextUnitOffset(), *OffsetPtr));

return false; return false;

}

assert(DebugInfoData.isValidOffset(UEndOffset - 1));

uint64_t AbbrCode = DebugInfoData.getULEB128(OffsetPtr); uint64_t AbbrCode = DebugInfoData.getULEB128(OffsetPtr);

if (0 == AbbrCode) { if (0 == AbbrCode) {

// NULL debug tag entry. // NULL debug tag entry.

AbbrevDecl = nullptr; AbbrevDecl = nullptr;

return true; return true;

} }

- if (const auto *AbbrevSet = U.getAbbreviations())

+ const auto *AbbrevSet = U.getAbbreviations();

if (!AbbrevSet) {

U.getContext().getWarningHandler()(

createStringError(errc::invalid_argument,

"DWARF unit at offset 0x%8.8" PRIx64 " "

jhenderson:

In this case, you don't need the end offset of the unit, as it has no impact here - only the start offset is actually important, so you can identify the unit that is being read.

"contains invalid abbreviation set offset 0x%" PRIx64,

dblaikie:

Maybe include some details about the offset to the debug_abbrev contribution, and the range of valid abbreviation values? (I forget if the abbrev table is necessarily contiguous - if it isn't, then maybe that's too complicated)

U.getOffset(), U.getAbbreviationsOffset()));

jhenderson:

It's not clear to me (when reading the message without the code context) what these last two offsets represent. I suspect that the last offset is actually unnecessary, since the OffsetPtr location for this case is going to be fixed within the unit header, if I'm not mistaken.

jankratochvil (author):

> I suspect that the last offset is actually unnecessary, since the OffsetPtr location for this case is going to be fixed within the unit header, if I'm not mistaken.

I agree, thanks.

// Restore the original offset.

*OffsetPtr = Offset;

return false;

}

AbbrevDecl = AbbrevSet->getAbbreviationDeclaration(AbbrCode); AbbrevDecl = AbbrevSet->getAbbreviationDeclaration(AbbrCode);

- if (nullptr == AbbrevDecl) {

+ if (!AbbrevDecl) {

U.getContext().getWarningHandler()(

createStringError(errc::invalid_argument,

"DWARF unit at offset 0x%8.8" PRIx64 " "

"contains invalid abbreviation %" PRIu64 " at "

jhenderson:

"DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "

- "contains invalid abbreviation %u at "

+ "contains invalid abbreviation %" PRIu64 " at "

"offset 0x%8.8" PRIx64 ", valid abbreviations are %s",

AbbrCode is a uint64_t. The test will fail on some platforms due to either bitness or endianness issues.

"offset 0x%8.8" PRIx64 ", valid abbreviations are %s",

U.getOffset(), AbbrCode, *OffsetPtr,

AbbrevSet->getCodeRange().c_str()));

// Restore the original offset. // Restore the original offset.

*OffsetPtr = Offset; *OffsetPtr = Offset;

return false; return false;

} }

// See if all attributes in this DIE have fixed byte sizes. If so, we can // See if all attributes in this DIE have fixed byte sizes. If so, we can

// just add this size to the offset to skip to the next DIE. // just add this size to the offset to skip to the next DIE.

if (Optional<size_t> FixedSize = AbbrevDecl->getFixedAttributesByteSize(U)) { if (Optional<size_t> FixedSize = AbbrevDecl->getFixedAttributesByteSize(U)) {

*OffsetPtr += *FixedSize; *OffsetPtr += *FixedSize;

return true; return true;

} }

// Skip all data in the .debug_info for the attributes // Skip all data in the .debug_info for the attributes

for (const auto &AttrSpec : AbbrevDecl->attributes()) { for (const auto &AttrSpec : AbbrevDecl->attributes()) {

// Check if this attribute has a fixed byte size. // Check if this attribute has a fixed byte size.

if (auto FixedSize = AttrSpec.getByteSize(U)) { if (auto FixedSize = AttrSpec.getByteSize(U)) {

// Attribute byte size if fixed, just add the size to the offset. // Attribute byte size if fixed, just add the size to the offset.

*OffsetPtr += *FixedSize; *OffsetPtr += *FixedSize;

} else if (!DWARFFormValue::skipValue(AttrSpec.Form, DebugInfoData, } else if (!DWARFFormValue::skipValue(AttrSpec.Form, DebugInfoData,

OffsetPtr, U.getFormParams())) { OffsetPtr, U.getFormParams())) {

// We failed to skip this attribute's value, restore the original offset // We failed to skip this attribute's value, restore the original offset

// and return the failure status. // and return the failure status.

U.getContext().getWarningHandler()(createStringError(

errc::invalid_argument,

"DWARF unit at offset 0x%8.8" PRIx64 " "

"contains invalid FORM_* 0x%" PRIx16 " at offset 0x%8.8" PRIx64,

jhenderson:

"DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "

- "contains invalid FORM_* 0x%x at offset 0x%8.8" PRIx64,

+ "contains invalid FORM_* 0x%" PRIx16" at offset 0x%8.8" PRIx64,

U.getOffset(), U.getNextUnitOffset(), AttrSpec.Form, *OffsetPtr));

dwarf::Form is specified to be a uint16_t in its declaration.

U.getOffset(), AttrSpec.Form, *OffsetPtr));

*OffsetPtr = Offset; *OffsetPtr = Offset;

return false; return false;

} }

return true; return true;

} }
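Several review comments in this file concern matching printf-style format specifiers to the exact integer type of each argument (`PRIu64` for the `uint64_t` abbreviation code, `PRIx64` for offsets). The following is a minimal standalone sketch of that idea using only the C++ standard library. The helper name `formatInvalidAbbrev` is hypothetical and is not part of LLVM; the patch itself goes through `createStringError`.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not an LLVM API): builds the text of the
// "invalid abbreviation" warning with type-correct format macros.
std::string formatInvalidAbbrev(uint64_t UnitOffset, uint64_t AbbrCode,
                                uint64_t DieOffset) {
  char Buf[160];
  // PRIx64 / PRIu64 expand to the right length modifier for uint64_t on
  // the current platform, so the varargs are read with the correct width.
  std::snprintf(Buf, sizeof(Buf),
                "DWARF unit at offset 0x%8.8" PRIx64
                " contains invalid abbreviation %" PRIu64
                " at offset 0x%8.8" PRIx64,
                UnitOffset, AbbrCode, DieOffset);
  return Buf;
}
```

Calling `formatInvalidAbbrev(0xc, 2, 0x18)` yields the leading part of the ABBREVNO message checked by the test in this revision. Using a plain `%u` for a `uint64_t` argument, as in the earlier diff, would read the wrong number of bytes from the argument list on some targets.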

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp

...

bool DWARFUnitHeader::extract(DWARFContext &Context,

} }

if (isTypeUnit()) { if (isTypeUnit()) {

TypeHash = debug_info.getU64(offset_ptr, &Err); TypeHash = debug_info.getU64(offset_ptr, &Err);

TypeOffset = debug_info.getUnsigned( TypeOffset = debug_info.getUnsigned(

offset_ptr, FormParams.getDwarfOffsetByteSize(), &Err); offset_ptr, FormParams.getDwarfOffsetByteSize(), &Err);

} else if (UnitType == DW_UT_split_compile || UnitType == DW_UT_skeleton) } else if (UnitType == DW_UT_split_compile || UnitType == DW_UT_skeleton)

DWOId = debug_info.getU64(offset_ptr, &Err); DWOId = debug_info.getU64(offset_ptr, &Err);

- if (errorToBool(std::move(Err)))

+ if (Err) {

Context.getWarningHandler()(joinErrors(

createStringError(

errc::invalid_argument,

"DWARF unit at 0x%8.8" PRIx64 " cannot be parsed:", Offset),

std::move(Err)));

return false; return false;

}

// Header fields all parsed, capture the size of this unit header. // Header fields all parsed, capture the size of this unit header.

assert(*offset_ptr - Offset <= 255 && "unexpected header size"); assert(*offset_ptr - Offset <= 255 && "unexpected header size");

Size = uint8_t(*offset_ptr - Offset); Size = uint8_t(*offset_ptr - Offset);

uint64_t NextCUOffset = Offset + getUnitLengthFieldByteSize() + getLength();

if (!debug_info.isValidOffset(getNextUnitOffset() - 1)) {

Context.getWarningHandler()(

createStringError(errc::invalid_argument,

"DWARF unit from offset 0x%8.8" PRIx64 " incl. "

"to offset 0x%8.8" PRIx64 " excl. "

"extends past section size 0x%8.8zx",

Offset, NextCUOffset, debug_info.size()));

return false;

}

if (!DWARFContext::isSupportedVersion(getVersion())) {

Context.getWarningHandler()(createStringError(

errc::invalid_argument,

"DWARF unit at offset 0x%8.8" PRIx64 " "

"has unsupported version %" PRIu16 ", supported are 2-%u",

Offset, getVersion(), DWARFContext::getMaxSupportedVersion()));

return false;

}

// Type offset is unit-relative; should be after the header and before // Type offset is unit-relative; should be after the header and before

// the end of the current unit. // the end of the current unit.

- bool TypeOffsetOK =

- !isTypeUnit()

- ? true

- : TypeOffset >= Size &&

- TypeOffset < getLength() + getUnitLengthFieldByteSize();

- bool LengthOK = debug_info.isValidOffset(getNextUnitOffset() - 1);

- bool VersionOK = DWARFContext::isSupportedVersion(getVersion());

- bool AddrSizeOK = DWARFContext::isAddressSizeSupported(getAddressByteSize());

+ if (isTypeUnit() && TypeOffset < Size) {

+ Context.getWarningHandler()(

+ createStringError(errc::invalid_argument,

+ "DWARF type unit at offset "

+ "0x%8.8" PRIx64 " "

+ "has its relocated type_offset 0x%8.8" PRIx64 " "

+ "pointing inside the header",

+ Offset, Offset + TypeOffset));

return false;

jhenderson:

As this is for DWARF developers, maybe it would be best to use the actual field name as defined by the DWARF standard (specifically "type_offset"), for something like: "DWARF type unit (offset 0x1234, length 0x4321) type_offset 0x1111 points inside the header or past the unit end".

(I also included my suggestions from above, and a couple of other wording suggestions - I'm not a massive fan of specifying "relative boundary" because it's not clear to me what the boundary is relative to).

}

if (isTypeUnit() &&

jhenderson:

"outside of its relative boundary "

- "[0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ")",

+ "[0x%8.8" PRIx8 ", 0x%8.8" PRIx64 ")",

Offset, NextCUOffset, TypeOffset, Size,

Size is a uint8_t.

TypeOffset >= getUnitLengthFieldByteSize() + getLength()) {

Context.getWarningHandler()(createStringError(

errc::invalid_argument,

"DWARF type unit from offset 0x%8.8" PRIx64 " incl. "

"to offset 0x%8.8" PRIx64 " excl. has its "

"relocated type_offset 0x%8.8" PRIx64 " pointing past the unit end",

Offset, NextCUOffset, Offset + TypeOffset));

return false;

}

- if (!LengthOK || !VersionOK || !AddrSizeOK || !TypeOffsetOK)

+ if (!DWARFContext::isAddressSizeSupported(getAddressByteSize())) {

SmallVector<std::string, 3> Sizes;

jhenderson:

Nit: fix clang-tidy warnings.

for (auto Size : DWARFContext::getSupportedAddressSizes())

Sizes.push_back(std::to_string(Size));

Context.getWarningHandler()(createStringError(

errc::invalid_argument,

"DWARF unit at offset 0x%8.8" PRIx64 " "

"has unsupported address size %" PRIu8 ", supported are %s",

Offset, getAddressByteSize(), llvm::join(Sizes, ", ").c_str()));

return false; return false;

}

jhenderson:

I wonder if this error needs putting earlier, in case the header is truncated?

// Keep track of the highest DWARF version we encounter across all units. // Keep track of the highest DWARF version we encounter across all units.

Context.setMaxVersionIfGreater(getVersion()); Context.setMaxVersionIfGreater(getVersion());

return true; return true;

} }

jhenderson:

"DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "

- "extends past section size 0x%8.8" PRIx64,

+ "extends past section size 0x%8.8zx",

Offset, NextCUOffset, debug_info.size()));

debug_info.size() returns a size_t, so may not always be 64 bits.
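The bounds checks that `DWARFUnitHeader::extract` performs after this change can be illustrated with a simplified standalone model. This is only a sketch under stated assumptions: `UnitHeaderModel` and `validateUnitHeader` are hypothetical names, not LLVM types, and the version and address-size checks are left out. It follows the same half-open interval convention, where the unit occupies [Offset, Offset + length field size + unit_length) and the length field is 4 bytes for DWARF32 and 12 bytes for DWARF64.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for a parsed unit header.
struct UnitHeaderModel {
  uint64_t Offset;         // Start of the unit within .debug_info.
  uint64_t Length;         // Value stored in the unit_length field.
  uint8_t LengthFieldSize; // 4 for DWARF32, 12 for DWARF64.
  uint8_t HeaderSize;      // Bytes from Offset to the first DIE.
  bool IsTypeUnit;
  uint64_t TypeOffset;     // Unit-relative type_offset (type units only).
};

// Returns an empty string if the header passes, else a short reason.
std::string validateUnitHeader(const UnitHeaderModel &H,
                               uint64_t SectionSize) {
  // Exclusive end offset of the unit, i.e. the start of the next unit.
  const uint64_t End = H.Offset + H.LengthFieldSize + H.Length;
  // The last byte of the unit (End - 1) must lie inside the section.
  if (End > SectionSize)
    return "extends past section size";
  // type_offset must point after the header and before the unit end.
  if (H.IsTypeUnit && H.TypeOffset < H.HeaderSize)
    return "type_offset pointing inside the header";
  if (H.IsTypeUnit && H.TypeOffset >= H.LengthFieldSize + H.Length)
    return "type_offset pointing past the unit end";
  return "";
}
```

With the values from the test file in this revision, a unit at 0xc with unit_length 0x1d in a 0x2c-byte section fails the first check. A type unit at 0x2c with unit_length 0x15 and a 24-byte header fails the second check for type_offset 1 and the third for type_offset 0x100.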

bool DWARFUnitHeader::applyIndexEntry(const DWARFUnitIndex::Entry *Entry) { bool DWARFUnitHeader::applyIndexEntry(const DWARFUnitIndex::Entry *Entry) {

assert(Entry); assert(Entry);

assert(!IndexEntry); assert(!IndexEntry);

jhenderson:

I'd put this before the type_offset check, as a different DWARF version might not have the type offset field at all etc.

IndexEntry = Entry; IndexEntry = Entry;

if (AbbrOffset) if (AbbrOffset)

return false; return false;

auto *UnitContrib = IndexEntry->getContribution(); auto *UnitContrib = IndexEntry->getContribution();

jhenderson:

"DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "

- "has unsupported version %u, supported are 2-%u",

+ "has unsupported version %" PRIu16 ", supported are 2-%u",

Offset, NextCUOffset, getVersion(),

getVersion() returns a uint16_t.

if (!UnitContrib || if (!UnitContrib ||

UnitContrib->Length != (getLength() + getUnitLengthFieldByteSize())) UnitContrib->Length != (getLength() + getUnitLengthFieldByteSize()))

return false; return false;

auto *AbbrEntry = IndexEntry->getContribution(DW_SECT_ABBREV); auto *AbbrEntry = IndexEntry->getContribution(DW_SECT_ABBREV);

if (!AbbrEntry) if (!AbbrEntry)

return false; return false;

AbbrOffset = AbbrEntry->Offset; AbbrOffset = AbbrEntry->Offset;

return true; return true;

jhenderson:

As above, I don't think you need the end offset here, as it isn't relevant for this message.

} }

jhenderson:

"DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "

- "has unsupported address size %u, supported are %s",

+ "has unsupported address size %" PRIu8 ", supported are %s",

Offset, NextCUOffset, getAddressByteSize(),

getAddressByteSize() returns a uint8_t.

// Parse the rangelist table header, including the optional array of offsets // Parse the rangelist table header, including the optional array of offsets

// following it (DWARF v5 and later). // following it (DWARF v5 and later).

template<typename ListTableType> template<typename ListTableType>

static Expected<ListTableType> static Expected<ListTableType>

parseListTableHeader(DWARFDataExtractor &DA, uint64_t Offset, parseListTableHeader(DWARFDataExtractor &DA, uint64_t Offset,

DwarfFormat Format) { DwarfFormat Format) {

// We are expected to be called with Offset 0 or pointing just past the table // We are expected to be called with Offset 0 or pointing just past the table

...

if (!AppendCUDie && !AppendNonCUDies)

return; return;

// Set the offset to that of the first DIE and calculate the start of the // Set the offset to that of the first DIE and calculate the start of the

// next compilation unit header. // next compilation unit header.

uint64_t DIEOffset = getOffset() + getHeaderSize(); uint64_t DIEOffset = getOffset() + getHeaderSize();

uint64_t NextCUOffset = getNextUnitOffset(); uint64_t NextCUOffset = getNextUnitOffset();

DWARFDebugInfoEntry DIE; DWARFDebugInfoEntry DIE;

DWARFDataExtractor DebugInfoData = getDebugInfoExtractor(); DWARFDataExtractor DebugInfoData = getDebugInfoExtractor();

// The end offset has been already checked by DWARFUnitHeader::extract.

jhenderson:

DWARFDataExtractor DebugInfoData = getDebugInfoExtractor();

- // It has been already checked by DWARFUnitHeader::extract.

+ // The end offset has been already checked by DWARFUnitHeader::extract.

assert(DebugInfoData.isValidOffset(NextCUOffset - 1));

I think this is a little clearer.

assert(DebugInfoData.isValidOffset(NextCUOffset - 1));

uint32_t Depth = 0; uint32_t Depth = 0;

bool IsCUDie = true; bool IsCUDie = true;

while (DIE.extractFast(*this, &DIEOffset, DebugInfoData, NextCUOffset, while (DIE.extractFast(*this, &DIEOffset, DebugInfoData, NextCUOffset,

Depth)) { Depth)) {

if (IsCUDie) { if (IsCUDie) {

if (AppendCUDie) if (AppendCUDie)

Dies.push_back(DIE); Dies.push_back(DIE);

if (!AppendNonCUDies) if (!AppendNonCUDies)

break; break;

// The average bytes per DIE entry has been seen to be // The average bytes per DIE entry has been seen to be

// around 14-20 so let's pre-reserve the needed memory for // around 14-20 so let's pre-reserve the needed memory for

// our DIE entries accordingly. // our DIE entries accordingly.

Dies.reserve(Dies.size() + getDebugInfoSize() / 14); Dies.reserve(Dies.size() + getDebugInfoSize() / 14);

IsCUDie = false; IsCUDie = false;

} else { } else {

Dies.push_back(DIE); Dies.push_back(DIE);

} }

if (const DWARFAbbreviationDeclaration *AbbrDecl = if (const DWARFAbbreviationDeclaration *AbbrDecl =

DIE.getAbbreviationDeclarationPtr()) { DIE.getAbbreviationDeclarationPtr()) {

// Normal DIE // Normal DIE

if (AbbrDecl->hasChildren()) if (AbbrDecl->hasChildren())

++Depth; ++Depth;

else if (Depth == 0)

break; // This unit has a single DIE with no children.

} else { } else {

// NULL DIE. // NULL DIE.

if (Depth > 0) if (Depth > 0)

--Depth; --Depth;

if (Depth == 0) if (Depth == 0)

break; // We are done with this compile unit! break; // We are done with this compile unit!

} }

// Give a little bit of info if we encounter corrupt DWARF (our offset

// should always terminate at or before the start of the next compilation

// unit header).

if (DIEOffset > NextCUOffset)

Context.getWarningHandler()(

createStringError(errc::invalid_argument,

"DWARF compile unit extends beyond its "

"bounds cu 0x%8.8" PRIx64 " "

"at 0x%8.8" PRIx64 "\n",

getOffset(), DIEOffset));

dblaikie:

What was the motivation to move this code and add the Depth check in? I guess this didn't actually work/was untested, maybe? Could you explain why it didn't work, etc?

jankratochvil (author):

DIE.extractFast returned false both on errors and at the successful end of the DIEs, and since it did not report any errors the two cases could not be told apart. So this one specific error was handled outside. Now that we do detailed error reporting, it has to happen inside DIE.extractFast, which has all the information available. Therefore this error has moved inside DIE.extractFast (see if (Offset >= UEndOffset) there).

} }
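The depth tracking in the loop above, including the added early exit for a unit whose only DIE has no children, can be sketched in isolation. This is a simplified model and not the LLVM code: the `Entry` enum and `countUnitEntries` are hypothetical names, and a pre-decoded vector stands in for repeated calls to `extractFast`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical classification of one decoded entry in a unit.
enum class Entry { Null, NoChildren, HasChildren };

// Returns how many entries belong to the first complete DIE tree.
std::size_t countUnitEntries(const std::vector<Entry> &Entries) {
  uint32_t Depth = 0;
  std::size_t Consumed = 0;
  for (Entry E : Entries) {
    ++Consumed;
    if (E == Entry::HasChildren) {
      ++Depth; // Following entries are children of this DIE.
    } else if (E == Entry::NoChildren) {
      if (Depth == 0)
        break; // The unit has a single DIE with no children.
    } else {
      // NULL entry: closes the current sibling list.
      if (Depth > 0)
        --Depth;
      if (Depth == 0)
        break; // Back at the top level, the unit is complete.
    }
  }
  return Consumed;
}
```

Because the walk now stops as soon as the tree is complete, running out of input before that point is always an error in this model. That matches the author's reasoning for reporting the out-of-bounds read from inside `extractFast` rather than after the loop.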

void DWARFUnit::extractDIEsIfNeeded(bool CUDieOnly) { void DWARFUnit::extractDIEsIfNeeded(bool CUDieOnly) {

if (Error e = tryExtractDIEsIfNeeded(CUDieOnly)) if (Error e = tryExtractDIEsIfNeeded(CUDieOnly))

Context.getRecoverableErrorHandler()(std::move(e)); Context.getRecoverableErrorHandler()(std::move(e));

} }

Error DWARFUnit::tryExtractDIEsIfNeeded(bool CUDieOnly) { Error DWARFUnit::tryExtractDIEsIfNeeded(bool CUDieOnly) {

...

if (DieArray[I].getDepth() == Depth + 1 &&

return DWARFDie(this, &DieArray[I]); return DWARFDie(this, &DieArray[I]);

assert(DieArray[I].getDepth() > Depth && "Not processing children?"); assert(DieArray[I].getDepth() > Depth && "Not processing children?");

} }

return DWARFDie(); return DWARFDie();

} }

const DWARFAbbreviationDeclarationSet *DWARFUnit::getAbbreviations() const { const DWARFAbbreviationDeclarationSet *DWARFUnit::getAbbreviations() const {

if (!Abbrevs) if (!Abbrevs)

- Abbrevs = Abbrev->getAbbreviationDeclarationSet(Header.getAbbrOffset());

+ Abbrevs = Abbrev->getAbbreviationDeclarationSet(getAbbreviationsOffset());

return Abbrevs; return Abbrevs;

} }

llvm::Optional<object::SectionedAddress> DWARFUnit::getBaseAddress() { llvm::Optional<object::SectionedAddress> DWARFUnit::getBaseAddress() {

if (BaseAddr) if (BaseAddr)

return BaseAddr; return BaseAddr;

DWARFDie UnitDie = getUnitDIE(); DWARFDie UnitDie = getUnitDIE();

llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s

This file was added.

## Test that llvm-dwarfdump detects and reports invalid DWARF in the file.

jhenderson:

Nit: I'm trying to encourage new tests to use '##' for comments, to help distinguish them from lit and FileCheck directives.

# RUN: llvm-mc -triple x86_64-pc-linux %s -filetype=obj --defsym=CUEND=1 \

# RUN: | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=CUEND %s

# CUEND: warning: DWARF unit from offset 0x0000000c incl. to offset 0x0000002b excl. tries to read DIEs at offset 0x0000002b

# RUN: llvm-mc -triple x86_64-pc-linux %s -filetype=obj --defsym=ABBREVSETINVALID=1 \

# RUN: | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=ABBREVSETINVALID %s

# ABBREVSETINVALID: warning: DWARF unit at offset 0x0000000c contains invalid abbreviation set offset 0x0

# RUN: llvm-mc -triple x86_64-pc-linux %s -filetype=obj --defsym=ABBREVNO=2 \

# RUN: | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=ABBREVNO %s

# ABBREVNO: warning: DWARF unit at offset 0x0000000c contains invalid abbreviation 2 at offset 0x00000018, valid abbreviations are 1, 5, 3-4

# RUN: llvm-mc -triple x86_64-pc-linux %s -filetype=obj --defsym=FORMNO=0xdead \

# RUN: | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=FORMNO %s

# FORMNO: warning: DWARF unit at offset 0x0000000c contains invalid FORM_* 0xdead at offset 0x00000018

# RUN: llvm-mc -triple x86_64-pc-linux %s -filetype=obj --defsym=SHORTINITLEN=1 \

# RUN: | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=SHORTINITLEN %s

# SHORTINITLEN: warning: DWARF unit at 0x0000002c cannot be parsed:

# SHORTINITLEN-NEXT: warning: unexpected end of data at offset 0x2d while reading [0x2c, 0x30)

# RUN: llvm-mc -triple x86_64-pc-linux %s -filetype=obj --defsym=BADTYPEUNIT=1 \

# RUN: | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=BADTYPEUNITBEFORE %s

# BADTYPEUNITBEFORE: warning: DWARF type unit at offset 0x0000002c has its relocated type_offset 0x0000002d pointing inside the header

jhenderson:

I'd test both the cases where the offset points to within the header and past the end of the unit.

# RUN: llvm-mc -triple x86_64-pc-linux %s -filetype=obj --defsym=BADTYPEUNIT=0x100 \

# RUN: | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=BADTYPEUNITAFTER %s

# BADTYPEUNITAFTER: warning: DWARF type unit from offset 0x0000002c incl. to offset 0x00000045 excl. has its relocated type_offset 0x0000012c pointing past the unit end

# RUN: llvm-mc -triple x86_64-pc-linux %s -filetype=obj --defsym=TOOLONG=1 \

# RUN: | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=TOOLONG %s

# TOOLONG: warning: DWARF unit from offset 0x0000000c incl. to offset 0x0000002d excl. extends past section size 0x0000002c

.section .debug_abbrev,"",@progbits

.ifndef ABBREVSETINVALID

.uleb128 1 # Abbreviation Code

.uleb128 17 # DW_TAG_compile_unit

.uleb128 1 # DW_CHILDREN_yes

.uleb128 37 # DW_AT_producer

.ifndef FORMNO

.uleb128 8 # DW_FORM_string

.else

.uleb128 FORMNO

.endif

.uleb128 0 # end abbrev 1 DW_AT_*

.uleb128 0 # end abbrev 1 DW_FORM_*

.uleb128 5 # Abbreviation Code

.uleb128 10 # DW_TAG_label

.uleb128 0 # DW_CHILDREN_no

.uleb128 0 # end abbrev 5 DW_AT_*

.uleb128 0 # end abbrev 5 DW_FORM_*

.uleb128 3 # Abbreviation Code

.uleb128 10 # DW_TAG_label

.uleb128 0 # DW_CHILDREN_no

.uleb128 0 # end abbrev 3 DW_AT_*

.uleb128 0 # end abbrev 3 DW_FORM_*

.uleb128 4 # Abbreviation Code

.uleb128 10 # DW_TAG_label

.uleb128 0 # DW_CHILDREN_no

.uleb128 0 # end abbrev 4 DW_AT_*

.uleb128 0 # end abbrev 4 DW_FORM_*

.uleb128 0 # end abbrevs section

.endif

.section .debug_info,"",@progbits

## The first CU only shifts the CU under test to a non-zero offset, so that the

## offsets printed in the error messages are checked more thoroughly.

.long .Lcu_endp-.Lcu_startp # Length of Unit

.Lcu_startp:

.short 4 # DWARF version number

.long .debug_abbrev # Offset Into Abbrev. Section

.byte 8 # Address Size (in bytes)

.uleb128 0 # End Of Children Mark

.Lcu_endp:

.ifndef TOOLONG

.equ TOOLONG, 0

.endif

.long .Lcu_end0-.Lcu_start0 + TOOLONG # Length of Unit

.Lcu_start0:

.short 4 # DWARF version number

.long .debug_abbrev # Offset Into Abbrev. Section

.byte 8 # Address Size (in bytes)

.ifndef ABBREVNO

.uleb128 1 # Abbrev [1] DW_TAG_compile_unit

.else

.uleb128 ABBREVNO

.endif

.asciz "hand-written DWARF" # DW_AT_producer

.ifndef CUEND

.uleb128 0 # End Of Children Mark

.endif

jhenderson:

.ifdef SHORTINITLEN

- .byte 0x55 # too short Length of Unit

+ .byte 0x55 # Too short Length of Unit

.endif

Nit: all the other comments use upper-case for their first letter.

.Lcu_end0:

.ifdef SHORTINITLEN

.byte 0x55 # Too short Length of Unit

.endif

.ifdef BADTYPEUNIT

.long .Lcu_end1-.Lcu_start1 # Length of Unit

.Lcu_start1:

.short 5 # DWARF version number

.byte 2 # DW_UT_type

.byte 8 # Address Size (in bytes)

.long .debug_abbrev # Offset Into Abbrev. Section

.quad 0xbaddefacedfacade # Type Signature

.long BADTYPEUNIT # Type DIE Offset

.uleb128 0 # End Of Children Mark

.Lcu_end1:

.endif
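The ABBREVNO check line above expects the list of valid abbreviation codes to be printed as "1, 5, 3-4" for a table that declares codes 1, 5, 3 and 4 in that order. One plausible way to produce such a string is sketched below. It is an assumption inferred from the expected output, not a copy of `getCodeRange()`; the function name `formatCodeRange` is hypothetical. It keeps the table order and collapses only runs of consecutive codes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: formats codes in table order, collapsing each run
// of consecutive values into "first-last".
std::string formatCodeRange(const std::vector<uint32_t> &Codes) {
  std::string Out;
  for (std::size_t I = 0; I < Codes.size();) {
    const uint32_t Start = Codes[I];
    uint32_t End = Start;
    // Extend the run while the next code is exactly one larger.
    while (++I < Codes.size() && Codes[I] == End + 1)
      ++End;
    Out += std::to_string(Start);
    if (End != Start)
      Out += "-" + std::to_string(End);
    // Separate this run from the next one, if any.
    if (I < Codes.size())
      Out += ", ";
  }
  return Out;
}
```

Leaving the codes unsorted means the message mirrors the layout of the abbreviation table, which is why code 5 appears between 1 and 3-4 in the expected output.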

Revision Contents

Diff 353937
