This is an archive of the discontinued LLVM Phabricator instance.

Differential D63591

DWARFDebugLoc: Make parsing and error reporting more robust
Closed · Public

Authored by labath on Jun 20 2019, 3:01 AM.

Details

Reviewers

dblaikie
JDevlieghere
probinson

Commits

rGbd546e59026d: DWARFDebugLoc: Make parsing and error reporting more robust
rL370363: DWARFDebugLoc: Make parsing and error reporting more robust

Summary

While examining this class for possible use in lldb, I noticed two
things:

- it spits out parsing errors directly to stderr
- the loclists parser can incorrectly return valid location lists when parsing malformed (truncated) data

I improve the stderr situation by making the parseOneLocationList
functions return Expected<T>s. The errors are still dumped to stderr by
their callers, so this is only a partial fix, but it is enough for my
use case, as I intend to parse the location lists one by one.

I fix the behavior in the truncated scenario by using the newly
introduced DataExtractor Cursor API.

I also add tests for handling the error cases, as they currently have no
coverage.
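
As a rough illustration of the first change, the sketch below hands either a parsed list or an error message back to the caller instead of printing to stderr. It is not the LLVM code: `ParseResult`, `parseOneList`, `readLE`, `remaining` and the fixed 4-byte little-endian address size are assumptions made for the example, and `std::variant` merely stands in for `Expected<T>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the LLVM types; not the real API.
struct Entry {
  uint32_t Begin = 0;
  uint32_t End = 0;
  std::vector<uint8_t> Loc;
};
struct LocationList {
  size_t Offset = 0;
  std::vector<Entry> Entries;
};
// Either a parsed list or an error message, in the spirit of Expected<T>.
using ParseResult = std::variant<LocationList, std::string>;

// Number of bytes left at Offset (0 if Offset is already past the end).
static size_t remaining(const std::vector<uint8_t> &Data, size_t Offset) {
  return Offset <= Data.size() ? Data.size() - Offset : 0;
}

// Read a little-endian integer of Size bytes; the caller has checked bounds.
static uint32_t readLE(const std::vector<uint8_t> &Data, size_t &Offset,
                       size_t Size) {
  assert(Size <= 4 && remaining(Data, Offset) >= Size);
  uint32_t Value = 0;
  for (size_t I = 0; I < Size; ++I)
    Value |= uint32_t(Data[Offset++]) << (8 * I);
  return Value;
}

// Parse one debug_loc-style list, assuming 4-byte addresses. On truncated
// input the error goes back to the caller; nothing is printed here.
ParseResult parseOneList(const std::vector<uint8_t> &Data, size_t &Offset) {
  const std::string Overflow = "location list overflows the debug_loc section";
  LocationList LL;
  LL.Offset = Offset;
  while (true) {
    if (remaining(Data, Offset) < 8)
      return Overflow;
    Entry E;
    E.Begin = readLE(Data, Offset, 4);
    E.End = readLE(Data, Offset, 4);
    if (E.Begin == 0 && E.End == 0)
      return LL; // end-of-list entry
    if (remaining(Data, Offset) < 2)
      return Overflow;
    size_t Bytes = readLE(Data, Offset, 2);
    if (remaining(Data, Offset) < Bytes)
      return Overflow;
    E.Loc.assign(Data.begin() + Offset, Data.begin() + Offset + Bytes);
    Offset += Bytes;
    LL.Entries.push_back(std::move(E));
  }
}
```

A caller that still wants the old behaviour can print the string alternative itself, which mirrors how the callers in this patch keep dumping the errors to stderr.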

Diff Detail

Repository

rL LLVM

Build Status

Buildable 33675
Build 33674: arc lint + arc unit

Event Timeline

labath created this revision. Jun 20 2019, 3:01 AM

Herald added a project: Restricted Project. Jun 20 2019, 3:01 AM

Harbormaster completed remote builds in B33675: Diff 205770. Jun 20 2019, 3:01 AM

Looks pretty good, and thanks especially for the error-case tests!
I'll give other folks a chance to chime in if they want to.

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
101	This identical createError call occurs many times, maybe add a createLocListOverflowError() helper?
115	You could do `SavedOffset = Offset;` here, and then add a `SavedOffset == Offset` check to the next one. There's no harm to calling a `get*` function with an invalid offset.
218	Maybe put an llvm_unreachable here.
test/DebugInfo/X86/dwarfdump-debug-loc-error-cases.s
1	I was not aware of `--defsym` that looks incredibly useful! In a test that generates multiple .o files I prefer to give each one a unique name, e.g. `%t0.o` and `%t1.o` etc. It can make it easier to debug a broken test.

add createOverflowError helper
use unique file names in tests

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
115	The debug_loc function doesn't use the SavedOffset pattern, because it is always reading data in fixed-size chunks. I think it would be better to keep it that way, as this is slightly more readable.
test/DebugInfo/X86/dwarfdump-debug-loc-error-cases.s
1	I'm writing a bunch of tests in assembly these days, so I've learned a lot of interesting tricks there. :) I'll update the tests to use distinct file names.
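
The `SavedOffset == Offset` check discussed for line 115 relies on the extractor leaving the offset untouched when a read fails. A minimal sketch of that idea follows; `TinyExtractor` and `readLength` are hypothetical names and this is not the real DataExtractor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical miniature extractor: like the get* functions discussed in
// the review, a failed read returns 0 and leaves the offset unchanged.
struct TinyExtractor {
  std::vector<uint8_t> Data;

  uint16_t getU16(size_t *Offset) const {
    assert(Offset && "offset pointer must not be null");
    if (*Offset > Data.size() || Data.size() - *Offset < 2)
      return 0; // out of bounds: offset is not advanced
    uint16_t V = uint16_t(Data[*Offset] | (Data[*Offset + 1] << 8));
    *Offset += 2;
    return V;
  }
};

// Returns true only if the read actually consumed bytes.
bool readLength(const TinyExtractor &E, size_t *Offset, uint16_t &Length) {
  size_t SavedOffset = *Offset;
  Length = E.getU16(Offset);
  return SavedOffset != *Offset;
}
```

The returned value alone cannot distinguish a genuine zero from a failed read, which is why the comparison is made on the offset.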

Harbormaster completed remote builds in B33682: Diff 205792. Jun 20 2019, 6:25 AM

LGTM but give the West Coast folks a chance to look at it.

This revision is now accepted and ready to land. Jun 20 2019, 6:41 AM

dblaikie added inline comments. Jun 20 2019, 12:47 PM

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
27	I guess "Ts &&... Vals" should be "const Ts &... Vals" since they're taken by const ref by createStringError anyway - no need for the fancy &&.
31	Should this be StringRef rather than const char*?
167–181	Looks to me like getULEB128 doesn't quite have the right error handling, if I'm reading it correctly:

  unsigned shift = 0;
  uint32_t offset = *offset_ptr;
  uint8_t byte = 0;
  while (isValidOffset(offset)) {
    byte = Data[offset++];
    result |= uint64_t(byte & 0x7f) << shift;
    shift += 7;
    if ((byte & 0x80) == 0)
      break;
  }
  *offset_ptr = offset;
  return result;

I /imagine/ it shouldn't update offset_ptr if it breaks out of the loop via !isValidOffset, rather than via the break?

More broadly, I wonder if we should consider a more convenient way to do error handling here - since it's a bit unfortunate that you've had to split the logic for parsing these things across two switch statements - makes it a bit hard to follow what shape each LLE entry has, since it's spread out like this.
217–218	Given the loop condition is "while (true)" this unreachable seems a bit unnecessary (& the function has non-void return, so if there was a path that got through the loop I imagine the compiler would warn us about that?) Or is this working around a compiler that warns here despite the lack of any path out of the loop?
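
To make the getULEB128 concern from lines 167–181 concrete, here is a self-contained sketch of a decoder that only commits the new offset once it has seen a terminating byte. The name `getULEB128Sketch` and the `std::vector` input are assumptions for illustration; the actual fix was handled in a separate patch and may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode a ULEB128 value starting at *OffsetPtr. If the data ends before a
// byte without the continuation bit (0x80) is seen, return 0 and leave
// *OffsetPtr unchanged so the caller can detect the failure.
uint64_t getULEB128Sketch(const std::vector<uint8_t> &Data,
                          size_t *OffsetPtr) {
  assert(OffsetPtr && "offset pointer must not be null");
  uint64_t Result = 0;
  unsigned Shift = 0;
  size_t Offset = *OffsetPtr;
  while (Offset < Data.size()) {
    uint8_t Byte = Data[Offset++];
    if (Shift < 64) // bits beyond 64 are simply dropped in this sketch
      Result |= uint64_t(Byte & 0x7f) << Shift;
    Shift += 7;
    if ((Byte & 0x80) == 0) {
      *OffsetPtr = Offset; // commit only on a complete value
      return Result;
    }
  }
  return 0; // truncated: offset deliberately not updated
}
```

With this shape, a `SavedOffset == *Offset` comparison after the call reliably signals truncated input.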

probinson added inline comments. Jun 20 2019, 1:41 PM

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
217–218	I have had to add llvm_unreachable before in this kind of situation, IIRC, which is why I suggested it. Might not be necessary, if all 3 supported toolchains are smart enough nowadays.

remove fancy references
remove llvm_unreachable

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
31	createStringError uses a printf format string. Taking a StringRef would mean I'd have to add `.str().c_str()` blurb.
167–181	Nice catch about the getULEB function. I'll create a separate patch for that.

Overall, I'm not really happy about how the error handling is implemented here. The DataExtractor functions seem to be really good at making sure you don't crash while using them, but they also make it incredibly hard to check the result for errors. We could change them to return an Optional<T> or something, but that would make using them a lot more verbose. It sounds to me like it may be best to have something similar to what std::istream has, i.e., an object which encapsulates three things:
- the data to parse
- the current offset in that data
- an error flag

This would mean one can still call GetXXX functions in sequence without additional error checking. However, at a suitable point in time (e.g., after parsing a single record/DIE/...), one can have a peek at the error flag to verify that the data he got is actually valid. WDYT?
217–218	The debug_loc parser already uses this pattern without the terminating llvm_unreachable so I'd say we can assume the current compilers are fine with that..
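
The std::istream-like object proposed in the reply for lines 167–181 could look roughly like the sketch below. `StickyReader`, `Record` and `readRecord` are hypothetical names, and little-endian data is assumed; this is one possible shape, not the design that was eventually adopted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reader bundling the data, the current offset, and a sticky
// error flag.
class StickyReader {
  const std::vector<uint8_t> &Data;
  size_t Offset = 0;
  bool Failed = false;

public:
  explicit StickyReader(const std::vector<uint8_t> &D) : Data(D) {}

  // Read Size little-endian bytes. Once a read fails, every later read
  // returns 0 and the offset no longer moves.
  uint64_t getUnsigned(size_t Size) {
    assert(Size <= sizeof(uint64_t) && "value would not fit in 64 bits");
    if (Failed || Data.size() - Offset < Size) {
      Failed = true;
      return 0;
    }
    uint64_t V = 0;
    for (size_t I = 0; I < Size; ++I)
      V |= uint64_t(Data[Offset++]) << (8 * I);
    return V;
  }
  uint8_t getU8() { return uint8_t(getUnsigned(1)); }
  uint16_t getU16() { return uint16_t(getUnsigned(2)); }

  bool ok() const { return !Failed; }
  size_t tell() const { return Offset; }
};

struct Record {
  uint8_t Kind = 0;
  uint16_t Length = 0;
};

// Reads a whole record with no per-field checks, then looks at the flag once.
bool readRecord(StickyReader &R, Record &Out) {
  Record Tmp;
  Tmp.Kind = R.getU8();
  Tmp.Length = R.getU16();
  if (!R.ok())
    return false;
  Out = Tmp;
  return true;
}
```

The caller writes the reads in sequence and checks once per record, which is the trade-off described above: less noise per call, at the cost of carrying state in the reader.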

Harbormaster completed remote builds in B33724: Diff 205973. Jun 21 2019, 4:56 AM

labath mentioned this in D63645: [Support] Fix error handling in DataExtractor::get[US]LEB128. Jun 21 2019, 5:08 AM

Removing that llvm_unreachable is fine, in that case.
The idea for error handling for DataExtractor sounds reasonable, looks like adding an error flag wouldn't even increase the size.

In D63591#1553416, @probinson wrote:

The idea for error handling for DataExtractor sounds reasonable, looks like adding an error flag wouldn't even increase the size.

Hmm... Originally I was thinking of building something on top of DataExtractor. Putting the logic *into* the DataExtractor is an interesting idea. I kind of like it (it would solve the problem I had of how to capture the DataExtractor vs. DWARFDataExtractor relationship in the "on top" model), but there's also something that bothers me about that. I think it's the fact that this would make the DataExtractor class stateful, whereas previously it was completely stateless. That may not be all bad, but it would mean the transition has to be done more carefully (watch out for thread races, and other unintended effects of the error bit leaking out). However, it also feels weird to have the error flag be a part of DataExtractor state, while the offset isn't. So, e.g., if one extracts from the DataExtractor using two independent offsets simultaneously, the error state set by one extraction would impact the other. This would be most obvious with the strtab-style data extractors, which almost always get a bunch of completely independent queries.

One way to achieve this while keeping the DataExtractor stateless would be to pass the error flag as an additional argument to the extraction methods, just like the offset is now. But that would make things more verbose, which means one might still want to implement some kind of an abstraction on top of that to keep these things together...

Ah, hadn't considered statefulness. But if you layer another class on top of DataExtractor to handle the error flag, it would have to be replicating all the offset-is-valid checks, because of course DataExtractor itself doesn't return errors.

I have a couple more ideas to toss out there...

A DataExtractorBase class that returns Optional<whatever>, and then DataExtractor layers on top and converts None to zero, which preserves the non-statefulness as well as the current API. This adds some runtime overhead, not sure how much.
Or, a template DataExtractorBase that takes an error-handling class as a parameter (sort of like how STL containers take an allocator) and a DataExtractor specialization uses a no-op error-handling class. Should avoid runtime overhead at the cost of template cruft.
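
The first of these two ideas can be sketched as follows. `CheckedExtractor`, `ZeroingExtractor` and `tryGetU32` are hypothetical names, `std::optional` stands in for `Optional`, and little-endian data is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical base layer: stateless, and reports failure as std::nullopt.
class CheckedExtractor {
  std::vector<uint8_t> Data;

public:
  explicit CheckedExtractor(std::vector<uint8_t> D) : Data(std::move(D)) {}

  std::optional<uint32_t> tryGetU32(size_t *Offset) const {
    assert(Offset && "offset pointer must not be null");
    if (*Offset > Data.size() || Data.size() - *Offset < 4)
      return std::nullopt; // offset is left unchanged on failure
    uint32_t V = 0;
    for (size_t I = 0; I < 4; ++I)
      V |= uint32_t(Data[*Offset + I]) << (8 * I);
    *Offset += 4;
    return V;
  }
};

// Hypothetical compatibility layer: keeps the "zero on failure" behaviour
// of the existing API by converting an empty optional to 0.
class ZeroingExtractor : public CheckedExtractor {
public:
  using CheckedExtractor::CheckedExtractor;

  uint32_t getU32(size_t *Offset) const {
    return tryGetU32(Offset).value_or(0);
  }
};
```

Callers that care about errors use the optional-returning function and pay for an explicit check on every read; everyone else keeps the existing call shape.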

In D63591#1553600, @probinson wrote:

Ah, hadn't considered statefulness. But if you layer another class on top of DataExtractor to handle the error flag, it would have to be replicating all the offset-is-valid checks, because of course DataExtractor itself doesn't return errors.

Not really. The data extractor kind of does return errors, as we've seen in this patch; it's just that they're incredibly hard to check for. I was thinking of having the new class rely on the SavedOffset == Offset behavior, but only internally. In my prototype, I managed to tuck it all away into a single template function which takes a member function pointer argument :D. Unfortunately, that was ICE-ing on gcc :P, but that was because I was being too clever -- I'm sure it can be dumbed down a bit.

I have a couple more ideas to toss out there...

A DataExtractorBase class that returns Optional<whatever>, and then DataExtractor layers on top and converts None to zero, which preserves the non-statefulness as well as the current API. This adds some runtime overhead, not sure how much.

That would work. It wouldn't even have to be a separate class, if you just make sure the function names are somehow different. However, it would mean that one has to explicitly check the result of every get operation. Not the end of the world, but it would make the code using it more noisy.

Or, a template DataExtractorBase that takes an error-handling class as a parameter (sort of like how STL containers take an allocator) and a DataExtractor specialization uses a no-op error-handling class. Should avoid runtime overhead at the cost of template cruft.

I'm not exactly sure how you imagined that, but I'm sure it could be made to work, as templates can be made to do almost anything. :P I'm not sure if it would be simpler than having a wrapper class or not. I have a feeling it might end up looking fairly similar from the outside.

Pick whatever mechanism you like, we should debate it in that patch not here. :-)

Given figuring out error handling for DataExtractor is perhaps a wider issue - if you want to go ahead with this change (continue with the review & defer error handling improvements for later, leave a FIXME, etc) that seems fine.

labath mentioned this in rL364169: [Support] Fix error handling in DataExtractor::get[US]LEB128. Jun 24 2019, 2:12 AM

labath mentioned this in rGbb6d0b8e7b0d: [Support] Fix error handling in DataExtractor::get[US]LEB128.

Leave a TODO in the code.

Harbormaster completed remote builds in B33785: Diff 206205. Jun 24 2019, 6:08 AM

In D63591#1553757, @dblaikie wrote:

Given figuring out error handling for DataExtractor is perhaps a wider issue - if you want to go ahead with this change (continue with the review & defer error handling improvements for later, leave a FIXME, etc) that seems fine.

How about this? Theoretically I could also back out the SavedOffset changes. The main thing I was trying to fix is the stderr messages; this is just something I found while trying to write tests for the error handling code. I'm not too worried about the extra "zero" location lists being reported, as those are unlikely to be valid (but it would definitely be nice to fix them).

I also have a kind of a WIP patch for doing the error handling in a better way. I'm going to put that up separately so we can discuss it there.

PS: I'm going to have about two more patches here to make this stuff usable from lldb.

labath mentioned this in D63713: Add error handling to the DataExtractor class. Jun 24 2019, 6:29 AM

DataExtractor is a copy of the one from LLDB from a while back and changes have been made to adapt it to llvm. DataExtractor was designed so that you can have one of them (like for .debug_info or any other DWARF section) and use this same extractor from multiple threads. This is why it is currently stateless.

One solution to allowing for correct error handling would be to replace the current "uint32_t *offset_ptr" arguments to DataExtractor decoding functions with a "DataCursor &Pos" where DataCursor is something like:

class DataCursor {
  llvm::Expected<uint32_t> OffsetOrError;
};

Then all of the state, like the offset and any error state, would live in the cursor. Or it could be two members, an offset and an error.

The main issue is to not decrease parsing performance by introducing error checking on each byte. The current DataExtractor will return zeroes when things fail to extract, which is kind of tuned for DWARF since zeros are not valid DW_TAG, DW_AT, DW_FORM and many other DWARF values. But it does allow for fast parsing. The idea was to quickly try and parse a bunch of data, and then make sure things are ok after doing some work (like parsing an entire DIE). So be careful with any changes to ensure DWARF parsing doesn't seriously regress.
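
A minimal sketch of the two-member cursor variant, with the extractor kept stateless so that it can be shared and the check done once per record. All names here (`StatelessExtractor`, `FieldPair`, `parsePair`, and the members given to `DataCursor`) are hypothetical, and little-endian data is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical cursor with two members, an offset and an error.
struct DataCursor {
  size_t Offset = 0;
  std::string Error; // empty means no error so far
  bool ok() const { return Error.empty(); }
};

// The extractor holds only the data; all mutable state is in the cursor.
struct StatelessExtractor {
  std::vector<uint8_t> Data;

  // Read Size little-endian bytes. On failure, or if Pos already holds an
  // error, return 0 and leave the offset where it was.
  uint64_t getUnsigned(DataCursor &Pos, size_t Size) const {
    assert(Size <= sizeof(uint64_t) && "value would not fit in 64 bits");
    if (!Pos.ok())
      return 0;
    if (Pos.Offset > Data.size() || Data.size() - Pos.Offset < Size) {
      Pos.Error =
          "unexpected end of data at offset " + std::to_string(Pos.Offset);
      return 0;
    }
    uint64_t V = 0;
    for (size_t I = 0; I < Size; ++I)
      V |= uint64_t(Data[Pos.Offset++]) << (8 * I);
    return V;
  }
};

struct FieldPair {
  uint64_t First = 0;
  uint64_t Second = 0;
};

// Parse a whole record first, then check the cursor once.
bool parsePair(const StatelessExtractor &E, DataCursor &Pos, FieldPair &Out) {
  FieldPair Tmp;
  Tmp.First = E.getUnsigned(Pos, 1);
  Tmp.Second = E.getUnsigned(Pos, 4);
  if (!Pos.ok())
    return false;
  Out = Tmp;
  return true;
}
```

Because the error lives in the cursor, two independent cursors over the same extractor do not affect each other, and the per-read cost is a single branch on the error state rather than a check by the caller after every call.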

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
181	We should switch the LEB functions in DataExtractor over to use the ones from:

  #include <llvm/Support/LEB128.h>

and use the:

  inline uint64_t decodeULEB128(const uint8_t *p, unsigned *n = nullptr, const uint8_t *end = nullptr, const char **error = nullptr);
  inline int64_t decodeSLEB128(const uint8_t *p, unsigned *n = nullptr, const uint8_t *end = nullptr, const char **error = nullptr);

functions... They have all the error checking and are quite efficient. Since DataExtractor had been converted from LLDB over into LLVM, the person that moved DataExtractor into LLVM (might have been me) hadn't realized these functions were there when the move happened.

labath mentioned this in rGb1f29cec2511: Add error handling to the DataExtractor class. Aug 27 2019, 4:27 AM

labath mentioned this in rL370042: Add error handling to the DataExtractor class. Aug 27 2019, 4:33 AM

Rebase the patch on top of DataExtractor Cursor changes.

labath requested review of this revision. Aug 27 2019, 5:22 AM

labath edited the summary of this revision.

Harbormaster completed remote builds in B37356: Diff 217378. Aug 27 2019, 5:23 AM

LGTM

JDevlieghere accepted this revision. Aug 27 2019, 8:35 AM

This revision is now accepted and ready to land. Aug 27 2019, 8:35 AM

Closed by commit rL370363: DWARFDebugLoc: Make parsing and error reporting more robust (authored by labath). Aug 29 2019, 7:25 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/llvm/DebugInfo/DWARF/DWARFDebugLoc.h	6 lines
lib/DebugInfo/DWARF/DWARFDebugLoc.cpp	86 lines
lib/DebugInfo/DWARF/DWARFDie.cpp	12 lines
test/DebugInfo/X86/dwarfdump-debug-loc-error-cases.s	52 lines
test/DebugInfo/X86/dwarfdump-debug-loclists-error-cases.s	62 lines

Diff 205770

include/llvm/DebugInfo/DWARF/DWARFDebugLoc.h

[62 lines hidden; context: public:]

/// Parse the debug_loc section accessible via the 'data' parameter using the		/// Parse the debug_loc section accessible via the 'data' parameter using the
/// address size also given in 'data' to interpret the address ranges.		/// address size also given in 'data' to interpret the address ranges.
void parse(const DWARFDataExtractor &data);		void parse(const DWARFDataExtractor &data);

/// Return the location list at the given offset or nullptr.		/// Return the location list at the given offset or nullptr.
LocationList const *getLocationListAtOffset(uint64_t Offset) const;		LocationList const *getLocationListAtOffset(uint64_t Offset) const;

Optional<LocationList> parseOneLocationList(DWARFDataExtractor Data,		static Expected<LocationList> parseOneLocationList(DWARFDataExtractor Data,
uint32_t *Offset);		uint32_t *Offset);
};		};

class DWARFDebugLoclists {		class DWARFDebugLoclists {
public:		public:
struct Entry {		struct Entry {
uint8_t Kind;		uint8_t Kind;
uint64_t Value0;		uint64_t Value0;
uint64_t Value1;		uint64_t Value1;
[20 lines hidden]
public:		public:
void parse(DataExtractor data, unsigned Version);		void parse(DataExtractor data, unsigned Version);
void dump(raw_ostream &OS, uint64_t BaseAddr, const MCRegisterInfo *RegInfo,		void dump(raw_ostream &OS, uint64_t BaseAddr, const MCRegisterInfo *RegInfo,
Optional<uint64_t> Offset) const;		Optional<uint64_t> Offset) const;

/// Return the location list at the given offset or nullptr.		/// Return the location list at the given offset or nullptr.
LocationList const *getLocationListAtOffset(uint64_t Offset) const;		LocationList const *getLocationListAtOffset(uint64_t Offset) const;

static Optional<LocationList>		static Expected<LocationList>
parseOneLocationList(DataExtractor Data, unsigned *Offset, unsigned Version);		parseOneLocationList(DataExtractor Data, unsigned *Offset, unsigned Version);
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_DEBUGINFO_DWARF_DWARFDEBUGLOC_H		#endif // LLVM_DEBUGINFO_DWARF_DWARFDEBUGLOC_H

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp

[17 lines hidden]
#include "llvm/Support/WithColor.h"		#include "llvm/Support/WithColor.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <algorithm>		#include <algorithm>
#include <cinttypes>		#include <cinttypes>
#include <cstdint>		#include <cstdint>

using namespace llvm;		using namespace llvm;

		template <typename... Ts>
		static llvm::Error createError(const char *Fmt, Ts &&... Vals) {
		return createStringError(inconvertibleErrorCode(), Fmt, Vals...);
		}

// When directly dumping the .debug_loc without a compile unit, we have to guess		// When directly dumping the .debug_loc without a compile unit, we have to guess
// at the DWARF version. This only affects DW_OP_call_ref, which is a rare		// at the DWARF version. This only affects DW_OP_call_ref, which is a rare
// expression that LLVM doesn't produce. Guessing the wrong version means we		// expression that LLVM doesn't produce. Guessing the wrong version means we
// won't be able to pretty print expressions in DWARF2 binaries produced by		// won't be able to pretty print expressions in DWARF2 binaries produced by
// non-LLVM tools.		// non-LLVM tools.
static void dumpExpression(raw_ostream &OS, ArrayRef<char> Data,		static void dumpExpression(raw_ostream &OS, ArrayRef<char> Data,
bool IsLittleEndian, unsigned AddressSize,		bool IsLittleEndian, unsigned AddressSize,
const MCRegisterInfo *MRI, DWARFUnit *U) {		const MCRegisterInfo *MRI, DWARFUnit *U) {
DWARFDataExtractor Extractor(StringRef(Data.data(), Data.size()),		DWARFDataExtractor Extractor(StringRef(Data.data(), Data.size()),
[43 lines hidden; context: if (Offset) {]
return;		return;
}		}

for (const LocationList &L : Locations) {		for (const LocationList &L : Locations) {
DumpLocationList(L);		DumpLocationList(L);
}		}
}		}

Optional<DWARFDebugLoc::LocationList>		Expected<DWARFDebugLoc::LocationList>
DWARFDebugLoc::parseOneLocationList(DWARFDataExtractor Data, unsigned *Offset) {		DWARFDebugLoc::parseOneLocationList(DWARFDataExtractor Data, unsigned *Offset) {
LocationList LL;		LocationList LL;
LL.Offset = *Offset;		LL.Offset = *Offset;

// 2.6.2 Location Lists		// 2.6.2 Location Lists
// A location list entry consists of:		// A location list entry consists of:
while (true) {		while (true) {
Entry E;		Entry E;
if (!Data.isValidOffsetForDataOfSize(*Offset, 2 * Data.getAddressSize())) {		if (!Data.isValidOffsetForDataOfSize(*Offset, 2 * Data.getAddressSize()))
WithColor::error() << "location list overflows the debug_loc section.\n";		return createError("location list overflows the debug_loc section");
return None;
}

// 1. A beginning address offset. ...		// 1. A beginning address offset. ...
E.Begin = Data.getRelocatedAddress(Offset);		E.Begin = Data.getRelocatedAddress(Offset);

// 2. An ending address offset. ...		// 2. An ending address offset. ...
E.End = Data.getRelocatedAddress(Offset);		E.End = Data.getRelocatedAddress(Offset);

// The end of any given location list is marked by an end of list entry,		// The end of any given location list is marked by an end of list entry,
// which consists of a 0 for the beginning address offset and a 0 for the		// which consists of a 0 for the beginning address offset and a 0 for the
// ending address offset.		// ending address offset.
if (E.Begin == 0 && E.End == 0)		if (E.Begin == 0 && E.End == 0)
return LL;		return LL;

if (!Data.isValidOffsetForDataOfSize(*Offset, 2)) {		if (!Data.isValidOffsetForDataOfSize(*Offset, 2))
WithColor::error() << "location list overflows the debug_loc section.\n";		return createError("location list overflows the debug_loc section");
return None;
}

unsigned Bytes = Data.getU16(Offset);		unsigned Bytes = Data.getU16(Offset);
if (!Data.isValidOffsetForDataOfSize(*Offset, Bytes)) {		if (!Data.isValidOffsetForDataOfSize(*Offset, Bytes))
WithColor::error() << "location list overflows the debug_loc section.\n";		return createError("location list overflows the debug_loc section");
return None;
}
// A single location description describing the location of the object...		// A single location description describing the location of the object...
StringRef str = Data.getData().substr(*Offset, Bytes);		StringRef str = Data.getData().substr(*Offset, Bytes);
*Offset += Bytes;		*Offset += Bytes;
E.Loc.reserve(str.size());		E.Loc.reserve(str.size());
llvm::copy(str, std::back_inserter(E.Loc));		llvm::copy(str, std::back_inserter(E.Loc));
LL.Entries.push_back(std::move(E));		LL.Entries.push_back(std::move(E));
}		}
}		}

void DWARFDebugLoc::parse(const DWARFDataExtractor &data) {		void DWARFDebugLoc::parse(const DWARFDataExtractor &data) {
IsLittleEndian = data.isLittleEndian();		IsLittleEndian = data.isLittleEndian();
AddressSize = data.getAddressSize();		AddressSize = data.getAddressSize();

uint32_t Offset = 0;		uint32_t Offset = 0;
while (data.isValidOffset(Offset + data.getAddressSize() - 1)) {		while (data.isValidOffset(Offset + data.getAddressSize() - 1)) {
if (auto LL = parseOneLocationList(data, &Offset))		if (auto LL = parseOneLocationList(data, &Offset))
Locations.push_back(std::move(*LL));		Locations.push_back(std::move(*LL));
else		else {
		logAllUnhandledErrors(LL.takeError(), WithColor::error());
break;		break;
}		}
		}
if (data.isValidOffset(Offset))		if (data.isValidOffset(Offset))
WithColor::error() << "failed to consume entire .debug_loc section\n";		WithColor::error() << "failed to consume entire .debug_loc section\n";
}		}

Optional<DWARFDebugLoclists::LocationList>		Expected<DWARFDebugLoclists::LocationList>
DWARFDebugLoclists::parseOneLocationList(DataExtractor Data, unsigned *Offset,		DWARFDebugLoclists::parseOneLocationList(DataExtractor Data, unsigned *Offset,
unsigned Version) {		unsigned Version) {
LocationList LL;		LocationList LL;
LL.Offset = *Offset;		LL.Offset = *Offset;

// dwarf::DW_LLE_end_of_list_entry is 0 and indicates the end of the list.		while (true) {
while (auto Kind =		if (!Data.isValidOffset(*Offset))
static_cast<dwarf::LocationListEntry>(Data.getU8(Offset))) {		return createError("location list overflows the debug_loclists section");

Entry E;		Entry E;
E.Kind = Kind;		E.Kind = static_cast<dwarf::LocationListEntry>(Data.getU8(Offset));
switch (Kind) {		if (E.Kind == dwarf::DW_LLE_end_of_list)
		return LL;

		unsigned SavedOffset = *Offset;
		switch (E.Kind) {
case dwarf::DW_LLE_startx_length:		case dwarf::DW_LLE_startx_length:
		case dwarf::DW_LLE_offset_pair:
E.Value0 = Data.getULEB128(Offset);		E.Value0 = Data.getULEB128(Offset);
		break;
		case dwarf::DW_LLE_start_length:
		case dwarf::DW_LLE_base_address:
		E.Value0 = Data.getAddress(Offset);
		break;
		default:
		return createError("LLE of kind %x not implemented", (int)E.Kind);
		}
		if (SavedOffset == *Offset)
		return createError("location list overflows the debug_loclists section");

		SavedOffset = *Offset;
		switch (E.Kind) {
		case dwarf::DW_LLE_startx_length:
// Pre-DWARF 5 has different interpretation of the length field. We have		// Pre-DWARF 5 has different interpretation of the length field. We have
// to support both pre- and standartized styles for the compatibility.		// to support both pre- and standartized styles for the compatibility.
if (Version < 5)		if (Version < 5)
E.Value1 = Data.getU32(Offset);		E.Value1 = Data.getU32(Offset);
else		else
E.Value1 = Data.getULEB128(Offset);		E.Value1 = Data.getULEB128(Offset);
break;		break;
case dwarf::DW_LLE_start_length:		case dwarf::DW_LLE_start_length:
E.Value0 = Data.getAddress(Offset);
E.Value1 = Data.getULEB128(Offset);
break;
case dwarf::DW_LLE_offset_pair:		case dwarf::DW_LLE_offset_pair:
E.Value0 = Data.getULEB128(Offset);
E.Value1 = Data.getULEB128(Offset);		E.Value1 = Data.getULEB128(Offset);
break;		break;
case dwarf::DW_LLE_base_address:		case dwarf::DW_LLE_base_address:
E.Value0 = Data.getAddress(Offset);
break;		break;
default:
WithColor::error() << "dumping support for LLE of kind " << (int)Kind
<< " not implemented\n";
return None;
}		}
		if (E.Kind != dwarf::DW_LLE_base_address) {
		if (SavedOffset == *Offset)
		return createError(
		"location list overflows the debug_loclists section");

if (Kind != dwarf::DW_LLE_base_address) {		SavedOffset = *Offset;
unsigned Bytes =		unsigned Bytes =
Version >= 5 ? Data.getULEB128(Offset) : Data.getU16(Offset);		Version >= 5 ? Data.getULEB128(Offset) : Data.getU16(Offset);
		if (SavedOffset == *Offset ||
		!Data.isValidOffsetForDataOfSize(*Offset, Bytes))
		return createError(
		"location list overflows the debug_loclists section");

// A single location description describing the location of the object...		// A single location description describing the location of the object...
StringRef str = Data.getData().substr(*Offset, Bytes);		StringRef str = Data.getData().substr(*Offset, Bytes);
*Offset += Bytes;		*Offset += Bytes;
E.Loc.resize(str.size());		E.Loc.resize(str.size());
llvm::copy(str, E.Loc.begin());		llvm::copy(str, E.Loc.begin());
}		}

LL.Entries.push_back(std::move(E));		LL.Entries.push_back(std::move(E));
}		}
return LL;
}		}
		probinson: Maybe put an llvm_unreachable here.
		dblaikie: Given the loop condition is "while (true)" this unreachable seems a bit unnecessary (& the function has non-void return, so if there was a path that got through the loop I imagine the compiler would warn us about that?) Or is this working around a compiler that warns here despite the lack of any path out of the loop?
		probinson: I have had to add llvm_unreachable before in this kind of situation, IIRC, which is why I suggested it. Might not be necessary, if all 3 supported toolchains are smart enough nowadays.
		labath (author): The debug_loc parser already uses this pattern without the terminating llvm_unreachable, so I'd say we can assume the current compilers are fine with that.
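The pattern under discussion can be illustrated with a minimal, self-contained sketch. It is not code from this patch: `countEntries` and its byte format (entries terminated by a 0 byte) are hypothetical, and `std::optional` merely stands in for an error-carrying return type. The point is only that every path out of the `while (true)` loop is a `return`, so nothing follows the loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical parser loop (not from the patch): counts the bytes that
// precede a 0 terminator. Returns std::nullopt if the data ends before a
// terminator is seen.
std::optional<std::size_t> countEntries(const std::vector<std::uint8_t> &Data) {
  std::size_t Pos = 0;
  std::size_t Count = 0;
  while (true) {
    if (Pos >= Data.size())
      return std::nullopt; // truncated input
    if (Data[Pos++] == 0)
      return Count; // end-of-list marker
    ++Count;
  }
  // Nothing here: the only ways out of the loop are the returns above.
}
```

Calling `countEntries({1, 2, 0})` yields 2, while `countEntries({1, 2})` yields an empty optional. Whether a particular toolchain warns about the missing trailing statement is the open question in the thread; this sketch does not settle it.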

void DWARFDebugLoclists::parse(DataExtractor data, unsigned Version) {		void DWARFDebugLoclists::parse(DataExtractor data, unsigned Version) {
IsLittleEndian = data.isLittleEndian();		IsLittleEndian = data.isLittleEndian();
AddressSize = data.getAddressSize();		AddressSize = data.getAddressSize();

uint32_t Offset = 0;		uint32_t Offset = 0;
while (data.isValidOffset(Offset)) {		while (data.isValidOffset(Offset)) {
if (auto LL = parseOneLocationList(data, &Offset, Version))		if (auto LL = parseOneLocationList(data, &Offset, Version))
Locations.push_back(std::move(*LL));		Locations.push_back(std::move(*LL));
else		else {
		logAllUnhandledErrors(LL.takeError(), WithColor::error());
return;		return;
}		}
}		}
		}
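The two ideas in this hunk (a read that does not advance the offset is treated as truncated data, and the caller stops at the first failed list) can be sketched without LLVM. Everything below is a hypothetical stand-in: `ByteReader` only imitates a reader that returns 0 and leaves the offset untouched on a failed read, and `std::variant` plays the role that `Expected<T>` plays in the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical reader: a failed read returns 0 and does not move the offset.
struct ByteReader {
  std::vector<std::uint8_t> Bytes;
  std::uint8_t getU8(std::size_t *Offset) const {
    if (*Offset >= Bytes.size())
      return 0;
    return Bytes[(*Offset)++];
  }
};

using List = std::vector<std::uint8_t>;
// Stand-in for Expected<List>: either a parsed list or an error message.
using ListOrError = std::variant<List, std::string>;

// Reads entries up to a 0 terminator. An offset that did not advance means
// the read ran past the end of the data, so an error is returned.
ListOrError parseOneList(const ByteReader &R, std::size_t *Offset) {
  List L;
  while (true) {
    std::size_t Saved = *Offset;
    std::uint8_t V = R.getU8(Offset);
    if (Saved == *Offset)
      return std::string("list overflows the section");
    if (V == 0)
      return L;
    L.push_back(V);
  }
}

// Parses lists back to back; on the first error it records the message and
// returns the lists parsed so far.
std::vector<List> parseAll(const ByteReader &R, std::string *Err) {
  std::vector<List> Out;
  std::size_t Offset = 0;
  while (Offset < R.Bytes.size()) {
    ListOrError LL = parseOneList(R, &Offset);
    if (auto *E = std::get_if<std::string>(&LL)) {
      *Err = *E;
      return Out;
    }
    Out.push_back(std::get<List>(std::move(LL)));
  }
  return Out;
}
```

With the bytes `{1, 2, 0, 3}`, `parseAll` returns one list (`{1, 2}`) and sets the error string, because the second list has no terminator. Without the offset check, the 0 returned by the failed read would be indistinguishable from a real terminator, which is the kind of silent acceptance of truncated data the summary describes.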

DWARFDebugLoclists::LocationList const *		DWARFDebugLoclists::LocationList const *
DWARFDebugLoclists::getLocationListAtOffset(uint64_t Offset) const {		DWARFDebugLoclists::getLocationListAtOffset(uint64_t Offset) const {
auto It = llvm::bsearch(		auto It = llvm::bsearch(
Locations, [=](const LocationList &L) { return Offset <= L.Offset; });		Locations, [=](const LocationList &L) { return Offset <= L.Offset; });
if (It != Locations.end() && It->Offset == Offset)		if (It != Locations.end() && It->Offset == Offset)
return &(*It);		return &(*It);
return nullptr;		return nullptr;
(59 lines not shown)

lib/DebugInfo/DWARF/DWARFDie.cpp

(91 lines not shown)	static void dumpLocation(raw_ostream &OS, DWARFFormValue &FormValue,

FormValue.dump(OS, DumpOpts);		FormValue.dump(OS, DumpOpts);
if (FormValue.isFormClass(DWARFFormValue::FC_SectionOffset)) {		if (FormValue.isFormClass(DWARFFormValue::FC_SectionOffset)) {
uint32_t Offset = *FormValue.getAsSectionOffset();		uint32_t Offset = *FormValue.getAsSectionOffset();
if (!U->isDWOUnit() && !U->getLocSection()->Data.empty()) {		if (!U->isDWOUnit() && !U->getLocSection()->Data.empty()) {
DWARFDebugLoc DebugLoc;		DWARFDebugLoc DebugLoc;
DWARFDataExtractor Data(Obj, *U->getLocSection(), Ctx.isLittleEndian(),		DWARFDataExtractor Data(Obj, *U->getLocSection(), Ctx.isLittleEndian(),
Obj.getAddressSize());		Obj.getAddressSize());
auto LL = DebugLoc.parseOneLocationList(Data, &Offset);		if (auto LL = DebugLoc.parseOneLocationList(Data, &Offset)) {
if (LL) {
uint64_t BaseAddr = 0;		uint64_t BaseAddr = 0;
if (Optional<object::SectionedAddress> BA = U->getBaseAddress())		if (Optional<object::SectionedAddress> BA = U->getBaseAddress())
BaseAddr = BA->Address;		BaseAddr = BA->Address;
LL->dump(OS, Ctx.isLittleEndian(), Obj.getAddressSize(), MRI, U,		LL->dump(OS, Ctx.isLittleEndian(), Obj.getAddressSize(), MRI, U,
BaseAddr, Indent);		BaseAddr, Indent);
} else		} else
OS << "error extracting location list.";		logAllUnhandledErrors(LL.takeError(), OS,
		"error extracting location list:");
return;		return;
}		}

bool UseLocLists = !U->isDWOUnit();		bool UseLocLists = !U->isDWOUnit();
StringRef LoclistsSectionData =		StringRef LoclistsSectionData =
UseLocLists ? Obj.getLoclistsSection().Data : U->getLocSectionData();		UseLocLists ? Obj.getLoclistsSection().Data : U->getLocSectionData();

if (!LoclistsSectionData.empty()) {		if (!LoclistsSectionData.empty()) {
(9 lines not shown)	if (!LoclistsSectionData.empty()) {

uint64_t BaseAddr = 0;		uint64_t BaseAddr = 0;
if (Optional<object::SectionedAddress> BA = U->getBaseAddress())		if (Optional<object::SectionedAddress> BA = U->getBaseAddress())
BaseAddr = BA->Address;		BaseAddr = BA->Address;

if (LL)		if (LL)
LL->dump(OS, BaseAddr, Ctx.isLittleEndian(), Obj.getAddressSize(), MRI,		LL->dump(OS, BaseAddr, Ctx.isLittleEndian(), Obj.getAddressSize(), MRI,
U, Indent);		U, Indent);
else		else {
OS << "error extracting location list.";		logAllUnhandledErrors(LL.takeError(), OS,
		"error extracting location list:");
		}
}		}
}		}
}		}

/// Dump the name encoded in the type tag.		/// Dump the name encoded in the type tag.
static void dumpTypeTagName(raw_ostream &OS, dwarf::Tag T) {		static void dumpTypeTagName(raw_ostream &OS, dwarf::Tag T) {
StringRef TagStr = TagString(T);		StringRef TagStr = TagString(T);
if (!TagStr.startswith("DW_TAG_") || !TagStr.endswith("_type"))		if (!TagStr.startswith("DW_TAG_") || !TagStr.endswith("_type"))
(597 lines not shown)

test/DebugInfo/X86/dwarfdump-debug-loc-error-cases.s

This file was added.

				# RUN: llvm-mc %s -filetype obj -triple x86_64-pc-linux --defsym CASE1=0 -o %t.o
				probinson: I was not aware of `--defsym`; that looks incredibly useful! In a test that generates multiple .o files I prefer to give each one a unique name, e.g. `%t0.o` and `%t1.o` etc. It can make it easier to debug a broken test.
				labath (author): I'm writing a bunch of tests in assembly these days, so I've learned a lot of interesting tricks there. :) I'll update the tests to use distinct file names.
				# RUN: llvm-dwarfdump -debug-loc %t.o 2>&1 | FileCheck %s --check-prefix=CONSUME

				# RUN: llvm-mc %s -filetype obj -triple x86_64-pc-linux --defsym CASE2=0 -o %t.o
				# RUN: llvm-dwarfdump -debug-loc %t.o 2>&1 | FileCheck %s --check-prefix=CONSUME

				# RUN: llvm-mc %s -filetype obj -triple x86_64-pc-linux --defsym CASE3=0 -o %t.o
				# RUN: llvm-dwarfdump -debug-loc %t.o 2>&1 | FileCheck %s

				# RUN: llvm-mc %s -filetype obj -triple x86_64-pc-linux --defsym CASE4=0 -o %t.o
				# RUN: llvm-dwarfdump -debug-loc %t.o 2>&1 | FileCheck %s

				# RUN: llvm-mc %s -filetype obj -triple x86_64-pc-linux --defsym CASE5=0 -o %t.o
				# RUN: llvm-dwarfdump -debug-loc %t.o 2>&1 | FileCheck %s

				# CONSUME: error: failed to consume entire .debug_loc section

				# CHECK: error: location list overflows the debug_loc section

				.section .debug_loc,"",@progbits
				.ifdef CASE1
				.byte 1 # bogus
				.endif
				.ifdef CASE2
				.long 0 # starting offset
				.endif
				.ifdef CASE3
				.long 0 # starting offset
				.long 1 # ending offset
				.endif
				.ifdef CASE4
				.long 0 # starting offset
				.long 1 # ending offset
				.word 0 # Loc expr size
				.endif
				.ifdef CASE5
				.long 0 # starting offset
				.long 1 # ending offset
				.word 0 # Loc expr size
				.long 0 # starting offset
				.endif

				# A minimal compile unit is needed to deduce the address size of the location
				# lists
				.section .debug_info,"",@progbits
				.long .Lcu_end0-.Lcu_begin0 # Length of Unit
				.Lcu_begin0:
				.short 4 # DWARF version number
				.long 0 # Offset Into Abbrev. Section
				.byte 8 # Address Size (in bytes)
				.byte 0 # End Of Children Mark
				.Lcu_end0:

test/DebugInfo/X86/dwarfdump-debug-loclists-error-cases.s

This file was added.

				# RUN: llvm-mc %s -filetype obj -triple x86_64-pc-linux --defsym CASE1=0 -o %t.o
				# RUN: llvm-dwarfdump -debug-loclists %t.o 2>&1 | FileCheck %s

				# RUN: llvm-mc %s -filetype obj -triple x86_64-pc-linux --defsym CASE2=0 -o %t.o
				# RUN: llvm-dwarfdump -debug-loclists %t.o 2>&1 | FileCheck %s

				# RUN: llvm-mc %s -filetype obj -triple x86_64-pc-linux --defsym CASE3=0 -o %t.o
				# RUN: llvm-dwarfdump -debug-loclists %t.o 2>&1 | FileCheck %s

				# RUN: llvm-mc %s -filetype obj -triple x86_64-pc-linux --defsym CASE4=0 -o %t.o
				# RUN: llvm-dwarfdump -debug-loclists %t.o 2>&1 | FileCheck %s

				# RUN: llvm-mc %s -filetype obj -triple x86_64-pc-linux --defsym CASE5=0 -o %t.o
				# RUN: llvm-dwarfdump -debug-loclists %t.o 2>&1 | FileCheck %s

				# RUN: llvm-mc %s -filetype obj -triple x86_64-pc-linux --defsym CASE6=0 -o %t.o
				# RUN: llvm-dwarfdump -debug-loclists %t.o 2>&1 | FileCheck %s --check-prefix=UNIMPL

				# CHECK: error: location list overflows the debug_loclists section

				# UNIMPL: error: LLE of kind 47 not implemented

				.section .debug_loclists,"",@progbits
				.long .Ldebug_loclist_table_end0-.Ldebug_loclist_table_start0
				.Ldebug_loclist_table_start0:
				.short 5 # Version.
				.byte 8 # Address size.
				.byte 0 # Segment selector size.
				.long 0 # Offset entry count.
				.Lloclists_table_base0:
				.Ldebug_loc0:
				.ifdef CASE1
				.byte 4 # DW_LLE_offset_pair
				.endif
				.ifdef CASE2
				.byte 4 # DW_LLE_offset_pair
				.uleb128 0x0 # starting offset
				.endif
				.ifdef CASE3
				.byte 4 # DW_LLE_offset_pair
				.uleb128 0x0 # starting offset
				.uleb128 0x10 # ending offset
				.endif
				.ifdef CASE4
				.byte 4 # DW_LLE_offset_pair
				.uleb128 0x0 # starting offset
				.uleb128 0x10 # ending offset
				.byte 1 # Loc expr size
				.endif
				.ifdef CASE5
				.byte 4 # DW_LLE_offset_pair
				.uleb128 0x0 # starting offset
				.uleb128 0x10 # ending offset
				.byte 1 # Loc expr size
				.byte 117 # DW_OP_breg5
				.endif
				.ifdef CASE6
				.byte 0x47
				.endif

				.Ldebug_loclist_table_end0:
