This is an archive of the discontinued LLVM Phabricator instance.

Differential D83050

[DebugInfo] Add more checks to parsing .debug_pub* sections.
ClosedPublic

Authored by ikudrin on Jul 2 2020, 7:16 AM.

Details

Reviewers

jhenderson
dblaikie
aprantl
probinson
MaskRay
labath
• espindola

Commits

rGca4d8da0c33c: [DebugInfo] Add more checks to parsing .debug_pub* sections.

Summary

The patch adds checks for various potential issues when parsing name lookup tables and reports them as recoverable errors, as is already done for other tables.
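
As an illustration of the reporting style the summary describes, here is a minimal standalone sketch, not the patch's code: a parser that hands each problem to a caller-supplied handler and keeps going instead of returning at the first error. The names `parseAll` and `ErrorHandler`, and the use of `std::string` for errors, are assumptions made for this example; the patch itself works with `llvm::Error` and `function_ref<void(Error)>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <string>
#include <vector>

// Hypothetical handler type for this sketch; the patch passes
// function_ref<void(Error)> instead.
using ErrorHandler = std::function<void(const std::string &)>;

// Parses each input as a decimal number. Malformed inputs are reported
// through the handler and skipped, so one bad item does not hide the
// items that follow it.
std::vector<long> parseAll(const std::vector<std::string> &Inputs,
                           const ErrorHandler &RecoverableErrorHandler) {
  std::vector<long> Values;
  for (std::size_t I = 0; I != Inputs.size(); ++I) {
    const std::string &S = Inputs[I];
    char *End = nullptr;
    long V = std::strtol(S.c_str(), &End, 10);
    if (S.empty() || *End != '\0') {
      RecoverableErrorHandler("item " + std::to_string(I) +
                              " parsing failed: '" + S + "'");
      continue; // Recoverable: keep parsing the remaining items.
    }
    Values.push_back(V);
  }
  return Values;
}
```

A caller that only wants warnings can pass a lambda that prints each message; a stricter caller can collect the messages and fail afterwards. Either way the parser itself never decides to stop.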

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ikudrin created this revision. (Jul 2 2020, 7:16 AM)

Herald added a reviewer: • espindola. (Jul 2 2020, 7:16 AM)

Herald added subscribers: arphaman, hiraditya, arichardson, emaste.

ikudrin added a parent revision: D83049: [DebugInfo] Do not hang when parsing a malformed .debug_pub* section. (Jul 2 2020, 7:16 AM)

ikudrin edited reviewers, added: labath; removed: • espindola. (Jul 2 2020, 7:22 AM)

Herald added a reviewer: • espindola. (Jul 2 2020, 7:22 AM)

Harbormaster completed remote builds in B62681: Diff 275107. (Jul 2 2020, 8:38 AM)

dblaikie added inline comments. (Jul 2 2020, 3:52 PM)

llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
58	Is it worth saying "name lookup table" here (& in related errors) - seems a bit redundant when the caller will add the specific section name?
76–77	Hmm, why only do this on the last one? If the goal is to be able to parse/dump things that might be a bit broken (which seems generally good) I think we should parse from the length to however long the length says (unless it extends beyond the section) - terminate early if the list terminates early (& warn about the fact that it terminated before its length) & then parse the next thing at current start + length. Don't think that needs a special case for the last one.
78–91	I think phrasing of these two might use some improvement. "terminated prematurely" actually would make me think of the second case - where the list had a terminator before the prefix-encoded length was reached, rather than that the prefix-encoded length was reached before the list ended. Perhaps "terminated before the expected length was reached" and "reached the expected length without encountering a terminator"? They're both a bit of a mouthful though... open to ideas.
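
The strategy described in the 76–77 comment (parse up to the stated length, warn on a mismatch, then resume at start + length) can be sketched on a toy format. Everything here is invented for illustration: the 1-byte length field, the single-byte entries, and the names `ToySet` and `parseSets` are assumptions and do not mirror the real `.debug_pub*` layout or the patch's code. The two warning texts reuse the wording proposed in the 78–91 comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy format, invented for this sketch: each set is a 1-byte length followed
// by that many body bytes; the body is a run of non-zero entries ending in 0.
struct ToySet {
  std::size_t Offset = 0;
  std::vector<std::uint8_t> Entries;
};

std::vector<ToySet>
parseSets(const std::vector<std::uint8_t> &Data,
          const std::function<void(const std::string &)> &Warn) {
  std::vector<ToySet> Sets;
  std::size_t Offset = 0;
  while (Offset < Data.size()) {
    const std::size_t SetOffset = Offset;
    const std::size_t Length = Data[Offset];
    std::size_t Pos = Offset + 1;
    // The next set starts where the length field says, clamped to the section.
    std::size_t End = Pos + Length;
    if (End > Data.size()) {
      Warn("set at " + std::to_string(SetOffset) +
           " extends beyond the section");
      End = Data.size();
    }
    Offset = End; // Resume here no matter how the body parses.

    ToySet Set;
    Set.Offset = SetOffset;
    bool SawTerminator = false;
    while (Pos < End) {
      std::uint8_t Entry = Data[Pos++];
      if (Entry == 0) {
        SawTerminator = true;
        break;
      }
      Set.Entries.push_back(Entry);
    }
    if (!SawTerminator)
      Warn("set at " + std::to_string(SetOffset) +
           " reached the expected length without encountering a terminator");
    else if (Pos != End)
      Warn("set at " + std::to_string(SetOffset) +
           " terminated before the expected length was reached");
    Sets.push_back(Set); // Keep whatever was parsed so it can still be dumped.
  }
  return Sets;
}
```

Because `Offset` is advanced from the length field before the body is parsed, a broken set cannot make the parser lose track of where the next one begins, and no special case is needed for the final set.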

ikudrin marked 3 inline comments as done. (Jul 2 2020, 7:34 PM)

ikudrin added inline comments.

llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
58	I would be really grateful for a better wording.
76–77	Well, it looks like the code does exactly what you say. Maybe I was not clear enough in the comment. I meant the set that was just read. The error handlers before the `while` loop drop the last added set with `Sets.pop_back()` because even its header cannot be parsed. The error handlers after the loop preserve it, and the comment was meant to explain that difference.
78–91	These wordings are already better than mine. Thanks!

jhenderson mentioned this in D83049: [DebugInfo] Do not hang when parsing a malformed .debug_pub* section. (Jul 3 2020, 12:27 AM)

jhenderson added inline comments. (Jul 3 2020, 12:48 AM)

llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
27–30	What's behind the reasoning for no longer using the `Cursor` throughout?
31	FWIW, I'd prefer this to remain not `auto`, as it isn't clear to me what the type of `Set` is from the immediate context.
46	You probably want to include the expected length of the table in this data extractor too, to stop reading into the next table under any circumstance (e.g. the length would partially truncate the final terminator).
58	It seems to me that the caller doesn't add the section name to the message itself, at least in some cases? See the test case. Personally, I think this is fine, although I'd be tempted to be specific and say something about the "pub..." section, probably using a better word, to distinguish it from a .debug_names section.
76–77	In the .debug_line code, we dump as much of the prologue as possible, despite the values not necessarily all having been read. For example, if the standard opcode lengths array was truncated, we'd still dump the values for the header fields. I think it would make sense to drop the `pop_back` calls entirely, with the possible exception of the one to do with the initial length field, although even then I'm not 100% sure.
78–91	How about the first one be just generic, allowing the cursor's error to provide the context (something like "name lookup table at offset 0x12345678 parsing failed: ..."). I'm actually okay with @ikudrin's current wording for the second one, since @dblaikie's suggestion is as much of a mouthful when you add in the other context.

ikudrin updated this revision to Diff 275404. (Jul 3 2020, 8:34 AM)

ikudrin marked 7 inline comments as done.

Herald added a subscriber: cmtice. (Jul 3 2020, 8:34 AM)

ikudrin added inline comments. (Jul 3 2020, 8:34 AM)

llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
27–30	The method now reports all encountered errors through `RecoverableErrorHandler` and does not return `Error`. The `Cursor` requires its error state to be checked in any case. While the former code could simply return the error state, now this checking is a bit inconvenient, and, moreover, useless.
31	OK. I'll rename the variable to `NewSet` then.
46	The second parameter, `Offset`, is the limiter. Note that it is just updated to point to the start of the next table which is the same as the end of the current one.
58	While the term might be a bit confusing, note that the wording "Name lookup tables" is used in DWARFv4 to refer to these sections. The collocation is not used in DWARFv5, where the sections are deprecated. Anyway, I am open to suggestions.
76–77	OK. I'll preserve the set for the case when the complete header is not read. You are right, some fields can be dumped even in that case, at least, the length.

jhenderson marked an inline comment as done. (Jul 6 2020, 12:38 AM)

jhenderson added inline comments.

llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
27–30	I'm not sure I follow. As far as my understanding of `Cursor` goes, you can have:

    DataExtractor::Cursor C(0);
    while (C && Data.isValidOffset(C.tell())) {
      // Parse the length
      if (!C) {
        /* report invalid length, using C.takeError() */
        return;
      }
      // Parse the header
      while (C) {
        /* parse entries */
      }
      if (C && C.tell() != Offset) {
        /* report bad terminator */
      }
    }
    if (!C) {
      /* report parsing error using C.takeError() */
    }

The `Cursor` is checked by either the final error check outside the loop in most cases, or by the invalid length report, so we're good (note that `C.takeError()` does not need calling if the `Cursor` is in a success state, much like `Expected`). The only case where it might be different is if `Cursor` is in an error state due to some error other than a running-off-the-end error, in which case it would abort early. If you want to continue instead, you could do almost the same as you've got:

    while (Offset) {
      DataExtractor::Cursor C(Offset);
      ... = Data.getInitialLength(C);
      if (!C) {
        /* report invalid length, using C.takeError() */
        return;
      }
      // Parse the header
      while (C) {
        /* parse entries */
      }
      if (C && C.tell() != Offset) {
        /* report bad terminator */
      }
      if (!C) {
        /* report parsing error using C.takeError() */
      }
    }

I'm not sure I see how the latter is any more complex or inconvenient than instantiating a different Error variable and passing pointers around?
37	Perhaps "newly" instead of "lastly".
46	Thanks I misread.
55	Same "lastly" -> "newly" maybe. I feel like it reads a little better. Also "field" -> "fields"
llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s
1	I'd probably fold in this test case now into the other file. I don't think there's any benefit having them separate. Alternatively, this lives separately, and move the other test case into the library testing. The idea is that we test the code in detail with the library tests, and at a high level in the tool tests (i.e. showing we handle the reported output). I don't mind either approach.
llvm/test/tools/llvm-dwarfdump/X86/debug_pubnames_error_cases.s
8 ↗	(On Diff #275404)	Strictly speaking, we should have an error check for each individual field, not just the header in general. This is because we could be using the non-checking version of the `get*` functions. This check currently only checks parsing of the offset field, but there's also the version and size fields. Similar comment applies for the individual entries.
30 ↗	(On Diff #275404)	preparly -> properly
61 ↗	(On Diff #275404)	contein -> contain
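
The `Cursor` behaviour this exchange relies on (one check can cover several reads) can be modelled with a toy class. This is a sketch under stated assumptions, not LLVM's `DataExtractor::Cursor`: `MiniCursor`, `Header`, `readHeader`, the one-byte fields, and the error text are all hypothetical stand-ins chosen to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// A toy stand-in for the idea behind a cursor with a sticky error state.
class MiniCursor {
  std::size_t Offset;
  std::string Err; // Empty while the cursor is in a success state.

public:
  explicit MiniCursor(std::size_t Offset) : Offset(Offset) {}
  explicit operator bool() const { return Err.empty(); }
  std::size_t tell() const { return Offset; }

  // Hands the stored error to the caller and resets the cursor to success.
  std::string takeError() {
    std::string E;
    E.swap(Err);
    return E;
  }

  // Reads one byte. After the first failure every later read is a no-op that
  // returns 0, so a run of reads needs only one check at the end.
  std::uint8_t getU8(const std::vector<std::uint8_t> &Data) {
    if (!Err.empty())
      return 0;
    if (Offset >= Data.size()) {
      Err = "unexpected end of data at offset " + std::to_string(Offset);
      return 0;
    }
    return Data[Offset++];
  }
};

// Hypothetical three-field header, one byte per field for simplicity.
struct Header {
  std::uint8_t Version = 0, Offset = 0, Size = 0;
};

// Reads the three fields back to back and checks the cursor once.
bool readHeader(const std::vector<std::uint8_t> &Data, Header &H,
                std::string &Error) {
  MiniCursor C(0);
  H.Version = C.getU8(Data);
  H.Offset = C.getU8(Data);
  H.Size = C.getU8(Data);
  if (C)
    return true;
  Error = C.takeError();
  return false;
}
```

With a truncated input, the fields read before the failure keep their values and the rest stay zero, which is why a partially read header can still be dumped. It also shows why the tests are asked to truncate each field separately: a single check at the end passes or fails the same way regardless of which read went wrong, so only per-field test inputs reveal whether every read really goes through the cursor.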

Thanks, @jhenderson!

Use Cursor in the loop from the beginning.
Fix typos.
Extended tests.
Removed the old test; Moved the checks to the new test file.

llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
27–30	I'll take the second one, thanks!
llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s
1	OK, I'll move that into the new test. I find using gtest unit tests for things like dumping and error reporting clumsy because they require lots of boilerplate code.

dblaikie added inline comments. (Jul 6 2020, 5:59 PM)

llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
78–91	The suggestion wasn't for brevity, but clarity. I found the original messages unclear & was hoping to clarify them. What are the two messages in total (with all the added context, for both too short and too long) & how clear are they?

jhenderson added inline comments. (Jul 7 2020, 2:21 AM)

llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
78–91	Taken from the test case:

    error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72

(the "no null terminated" bit might differ depending on the exact failure, e.g. "unexpected end of data at offset 0x4c while reading [0x4c, 0x4d)")

    error: name lookup table at offset 0x75 has an unexpected terminator at offset 0x8c
llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s
4	I think this is the first time I've heard the term "public name sections" being used. Is this called that in the standard? Otherwise, I might suggest using a different phrasing (though don't necessarily know what).
8–9	I don't mind too much either way, especially given the difficulties I recently had with the debug line equivalent test, but is there a particular reason you've kept the two streams separate? By combining them you can show the relative position of output for the common case of the streams being combined.
18	does not -> do not
45–46	For consistency, either offset -> Offset or Length -> length (here and below).

Fixed typos (again). Thanks, @jhenderson!
Updated the test to shrink the number of tool runs.

llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
78–91	Thanks, @jhenderson! @dblaikie are you OK with these messages or going to suggest a better alternative?
llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s
4	Well, the standard sometimes uses the term "name lookup tables". Do you think the comment sounds better now?
8–9	That is done to improve readability. The error messages are printed during parsing and dumping of all sets in the section comes after that. Thus, if we want to check all the messages at once, the error messages (or dumping messages) have to be separated from the corresponding lines of source code.

LGTM, I think, but please wait for @dblaikie.

llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s
4	Looks okay to me.
8–9	Thanks, makes sense.

This revision is now accepted and ready to land. (Jul 7 2020, 6:01 AM)

dblaikie added inline comments. (Jul 7 2020, 5:37 PM)

llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
78–91	This one sounds OK (guess it could be more precise in this case "bounds reached without finding expected null terminator" perhaps - but I realize that's fairly orthogonal to this patch & could be improved in the general DataExtractor infrastructure) - honestly the verbosity of these messages doesn't seem like a problem to me. They should be pretty rare & when they do come up, the more explicit/precise the better, it seems to me.

    error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72

This one

    error: name lookup table at offset 0x75 has an unexpected terminator at offset 0x8c

Still seems like it could be more precise - exactly why was the terminator unexpected? "has a terminator at 0x8c before the expected end at 0x??" perhaps.

jhenderson added inline comments. (Jul 8 2020, 12:09 AM)

llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
78–91	> "has a terminator at 0x8c before the expected end at 0x??" perhaps.
Sounds good to me.
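
For reference, the agreed wording can be assembled as in the following sketch. The format string is the one that appears in the final diff of `DWARFDebugPubTable.cpp`; the helper name `prematureTerminatorMessage`, the fixed-size buffer, and the use of `std::snprintf` in place of LLVM's `createStringError` are assumptions made to keep the example standalone. The three offsets are plain inputs here, and how the patch derives them is not reproduced.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Builds the premature-terminator message from three offsets.
std::string prematureTerminatorMessage(std::uint64_t SetOffset,
                                       std::uint64_t TerminatorOffset,
                                       std::uint64_t ExpectedEnd) {
  // 90 characters of fixed text plus at most 16 hex digits per offset.
  char Buf[160];
  std::snprintf(Buf, sizeof(Buf),
                "name lookup table at offset 0x%" PRIx64
                " has a terminator at offset 0x%" PRIx64
                " before the expected end at 0x%" PRIx64,
                SetOffset, TerminatorOffset, ExpectedEnd);
  return Buf;
}
```

Calling it with the offsets from the thread plus a made-up expected end, say `prematureTerminatorMessage(0x75, 0x8c, 0x90)`, yields a message that states both where the terminator was found and where the table was supposed to end. The value `0x90` is purely illustrative, since the thread leaves the expected end as `0x??`.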

Updated the wording for the reporting of premature termination.

Great, thanks!

Latest update LGTM.

Closed by commit rGca4d8da0c33c: [DebugInfo] Add more checks to parsing .debug_pub* sections. (authored by ikudrin). (Jul 9 2020, 5:16 AM)

This revision was automatically updated to reflect the committed changes.

Revision Contents

lld/ELF/SyntheticSections.cpp (3 lines)
lld/test/ELF/Inputs/gdb-index.s (2 lines)
lld/test/ELF/gdb-index-invalid-pubnames.s (2 lines)
lld/test/ELF/gdb-index.s (2 lines)
llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h (3 lines)
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp (3 lines)
llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp (74 lines)
llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s (150 lines)
llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s

Diff 276709

lld/ELF/SyntheticSections.cpp

readPubNamesAndTypes(const LLDDwarfObj<ELFT> &obj,
const std::vector<GdbIndexSection::CuEntry> &cus) {		const std::vector<GdbIndexSection::CuEntry> &cus) {
const LLDDWARFSection &pubNames = obj.getGnuPubnamesSection();		const LLDDWARFSection &pubNames = obj.getGnuPubnamesSection();
const LLDDWARFSection &pubTypes = obj.getGnuPubtypesSection();		const LLDDWARFSection &pubTypes = obj.getGnuPubtypesSection();

std::vector<GdbIndexSection::NameAttrEntry> ret;		std::vector<GdbIndexSection::NameAttrEntry> ret;
for (const LLDDWARFSection *pub : {&pubNames, &pubTypes}) {		for (const LLDDWARFSection *pub : {&pubNames, &pubTypes}) {
DWARFDataExtractor data(obj, *pub, config->isLE, config->wordsize);		DWARFDataExtractor data(obj, *pub, config->isLE, config->wordsize);
DWARFDebugPubTable table;		DWARFDebugPubTable table;
if (Error e = table.extract(data, /*GnuStyle=*/true))		table.extract(data, /*GnuStyle=*/true, [&](Error e) {
warn(toString(pub->sec) + ": " + toString(std::move(e)));		warn(toString(pub->sec) + ": " + toString(std::move(e)));
		});
for (const DWARFDebugPubTable::Set &set : table.getData()) {		for (const DWARFDebugPubTable::Set &set : table.getData()) {
// The value written into the constant pool is kind << 24 | cuIndex. As we		// The value written into the constant pool is kind << 24 | cuIndex. As we
// don't know how many compilation units precede this object to compute		// don't know how many compilation units precede this object to compute
// cuIndex, we compute (kind << 24 | cuIndexInThisObject) instead, and add		// cuIndex, we compute (kind << 24 | cuIndexInThisObject) instead, and add
// the number of preceding compilation units later.		// the number of preceding compilation units later.
uint32_t i = llvm::partition_point(cus,		uint32_t i = llvm::partition_point(cus,
[&](GdbIndexSection::CuEntry cu) {		[&](GdbIndexSection::CuEntry cu) {
return cu.cuOffset < set.Offset;		return cu.cuOffset < set.Offset;

lld/test/ELF/Inputs/gdb-index.s

	.uleb128 0x17			.uleb128 0x17
	.uleb128 0x2131			.uleb128 0x2131
	.uleb128 0x7			.uleb128 0x7
	.byte 0			.byte 0
	.byte 0			.byte 0
	.byte 0			.byte 0

	.section .debug_gnu_pubnames,"",@progbits			.section .debug_gnu_pubnames,"",@progbits
	.long 0x18			.long 0x24
	.value 0x2			.value 0x2
	.long 0			.long 0
	.long 0x33			.long 0x33
	.long 0x18			.long 0x18
	.byte 0x30			.byte 0x30
	.string "aaaaaaaaaaaaaaaa"			.string "aaaaaaaaaaaaaaaa"
	.long 0			.long 0
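
The length change in this input (0x18 to 0x24) can be checked by adding up the directives that follow the length field. The sketch below assumes the 32-bit DWARF layout used by this test, with `.value` emitting 2 bytes and `.long` emitting 4 on x86; the helper name `singleEntrySetLength` is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes after the 4-byte length field for a set that holds exactly
// one entry, following the directives in the test input one by one.
constexpr std::uint64_t singleEntrySetLength(std::size_t NameLen) {
  return 2                 // .value 0x2: version
         + 4               // .long 0: offset field
         + 4               // .long 0x33: size field
         + 4               // .long 0x18: entry offset
         + 1               // .byte 0x30: entry descriptor
         + (NameLen + 1)   // .string: name plus its NUL terminator
         + 4;              // .long 0: list terminator
}

// 16 'a' characters in Inputs/gdb-index.s.
static_assert(singleEntrySetLength(16) == 0x24, "length for the 16-char name");
// "entrypoint" (10 characters) in gdb-index.s.
static_assert(singleEntrySetLength(10) == 0x1e, "length for \"entrypoint\"");
```

Both results match the corrected `.long` values in the two test files, so the old value of 0x18 understated the set length in each case.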

lld/test/ELF/gdb-index-invalid-pubnames.s

	# REQUIRES: x86			# REQUIRES: x86
	# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t			# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t
	# RUN: ld.lld --gdb-index %t -o /dev/null 2>&1 | FileCheck %s			# RUN: ld.lld --gdb-index %t -o /dev/null 2>&1 | FileCheck %s

	# CHECK: warning: {{.*}}(.debug_gnu_pubnames): unexpected end of data at offset 0x1 while reading [0x0, 0x4)			# CHECK: warning: {{.*}}(.debug_gnu_pubnames): name lookup table at offset 0x0 parsing failed: unexpected end of data at offset 0x1 while reading [0x0, 0x4)

	.section .debug_abbrev,"",@progbits			.section .debug_abbrev,"",@progbits
	.byte 1 # Abbreviation Code			.byte 1 # Abbreviation Code
	.byte 17 # DW_TAG_compile_unit			.byte 17 # DW_TAG_compile_unit
	.byte 0 # DW_CHILDREN_no			.byte 0 # DW_CHILDREN_no
	.byte 0 # EOM(1)			.byte 0 # EOM(1)
	.byte 0 # EOM(2)			.byte 0 # EOM(2)
	.byte 0 # EOM(3)			.byte 0 # EOM(3)

lld/test/ELF/gdb-index.s

	.uleb128 0x17			.uleb128 0x17
	.uleb128 0x2131			.uleb128 0x2131
	.uleb128 0x7			.uleb128 0x7
	.byte 0			.byte 0
	.byte 0			.byte 0
	.byte 0			.byte 0

	.section .debug_gnu_pubnames,"",@progbits			.section .debug_gnu_pubnames,"",@progbits
	.long 0x18			.long 0x1e
	.value 0x2			.value 0x2
	.long 0			.long 0
	.long 0x33			.long 0x33
	.long 0x18			.long 0x18
	.byte 0x30			.byte 0x30
	.string "entrypoint"			.string "entrypoint"
	.long 0			.long 0

llvm/include/llvm/DebugInfo/DWARF/DWARFDebugPubTable.h

private:

/// gnu styled tables contains additional information.		/// gnu styled tables contains additional information.
/// This flag determines whether or not section we parse is debug_gnu* table.		/// This flag determines whether or not section we parse is debug_gnu* table.
bool GnuStyle = false;		bool GnuStyle = false;

public:		public:
DWARFDebugPubTable() = default;		DWARFDebugPubTable() = default;

Error extract(DWARFDataExtractor Data, bool GnuStyle);		void extract(DWARFDataExtractor Data, bool GnuStyle,
		function_ref<void(Error)> RecoverableErrorHandler);

void dump(raw_ostream &OS) const;		void dump(raw_ostream &OS) const;

ArrayRef<Set> getData() { return Sets; }		ArrayRef<Set> getData() { return Sets; }
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_DEBUGINFO_DWARF_DWARFDEBUGPUBTABLE_H		#endif // LLVM_DEBUGINFO_DWARF_DWARFDEBUGPUBTABLE_H

llvm/lib/DebugInfo/DWARF/DWARFContext.cpp

while (Data.isValidOffset(Offset)) {
}		}
Offset = EndOffset;		Offset = EndOffset;
}		}
}		}

static void dumpPubTableSection(raw_ostream &OS, DIDumpOptions DumpOpts,		static void dumpPubTableSection(raw_ostream &OS, DIDumpOptions DumpOpts,
DWARFDataExtractor Data, bool GnuStyle) {		DWARFDataExtractor Data, bool GnuStyle) {
DWARFDebugPubTable Table;		DWARFDebugPubTable Table;
if (Error E = Table.extract(Data, GnuStyle))		Table.extract(Data, GnuStyle, DumpOpts.RecoverableErrorHandler);
DumpOpts.RecoverableErrorHandler(std::move(E));
Table.dump(OS);		Table.dump(OS);
}		}

void DWARFContext::dump(		void DWARFContext::dump(
raw_ostream &OS, DIDumpOptions DumpOpts,		raw_ostream &OS, DIDumpOptions DumpOpts,
std::array<Optional<uint64_t>, DIDT_ID_Count> DumpOffsets) {		std::array<Optional<uint64_t>, DIDT_ID_Count> DumpOffsets) {
uint64_t DumpType = DumpOpts.DumpType;		uint64_t DumpType = DumpOpts.DumpType;

llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp

	//===- DWARFDebugPubTable.cpp ---------------------------------------------===//			//===- DWARFDebugPubTable.cpp ---------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "llvm/DebugInfo/DWARF/DWARFDebugPubTable.h"			#include "llvm/DebugInfo/DWARF/DWARFDebugPubTable.h"
	#include "llvm/DebugInfo/DWARF/DWARFDataExtractor.h"			#include "llvm/DebugInfo/DWARF/DWARFDataExtractor.h"
	#include "llvm/ADT/StringRef.h"			#include "llvm/ADT/StringRef.h"
	#include "llvm/BinaryFormat/Dwarf.h"			#include "llvm/BinaryFormat/Dwarf.h"
	#include "llvm/Support/DataExtractor.h"			#include "llvm/Support/DataExtractor.h"
				#include "llvm/Support/Errc.h"
	#include "llvm/Support/Format.h"			#include "llvm/Support/Format.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"
	#include <cstdint>			#include <cstdint>

	using namespace llvm;			using namespace llvm;
	using namespace dwarf;			using namespace dwarf;

	Error DWARFDebugPubTable::extract(DWARFDataExtractor Data, bool GnuStyle) {			void DWARFDebugPubTable::extract(
				DWARFDataExtractor Data, bool GnuStyle,
				function_ref<void(Error)> RecoverableErrorHandler) {
	this->GnuStyle = GnuStyle;			this->GnuStyle = GnuStyle;
	Sets.clear();			Sets.clear();
	DataExtractor::Cursor C(0);			uint64_t Offset = 0;
	while (C && Data.isValidOffset(C.tell())) {			while (Data.isValidOffset(Offset)) {
				uint64_t SetOffset = Offset;
	Sets.push_back({});			Sets.push_back({});
	Set &SetData = Sets.back();			Set &NewSet = Sets.back();

	std::tie(SetData.Length, SetData.Format) = Data.getInitialLength(C);			DataExtractor::Cursor C(Offset);
	const unsigned OffsetSize = dwarf::getDwarfOffsetByteSize(SetData.Format);			std::tie(NewSet.Length, NewSet.Format) = Data.getInitialLength(C);
				if (!C) {
	SetData.Version = Data.getU16(C);			// Drop the newly added set because it does not contain anything useful
	SetData.Offset = Data.getRelocatedValue(C, OffsetSize);			// to dump.
	SetData.Size = Data.getUnsigned(C, OffsetSize);			Sets.pop_back();
				RecoverableErrorHandler(createStringError(
				errc::invalid_argument,
				"name lookup table at offset 0x%" PRIx64 " parsing failed: %s",
				SetOffset, toString(C.takeError()).c_str()));
				return;
				}

				Offset = C.tell() + NewSet.Length;
				DWARFDataExtractor SetData(Data, Offset);
				const unsigned OffsetSize = dwarf::getDwarfOffsetByteSize(NewSet.Format);

				NewSet.Version = SetData.getU16(C);
				NewSet.Offset = SetData.getRelocatedValue(C, OffsetSize);
				NewSet.Size = SetData.getUnsigned(C, OffsetSize);

				if (!C) {
				// Preserve the newly added set because at least some fields of the header
				// are read and can be dumped.
				RecoverableErrorHandler(
				createStringError(errc::invalid_argument,
				"name lookup table at offset 0x%" PRIx64
				" does not have a complete header: %s",
				SetOffset, toString(C.takeError()).c_str()));
				continue;
				}

	while (C) {			while (C) {
	uint64_t DieRef = Data.getUnsigned(C, OffsetSize);			uint64_t DieRef = SetData.getUnsigned(C, OffsetSize);
	if (DieRef == 0)			if (DieRef == 0)
	break;			break;
	uint8_t IndexEntryValue = GnuStyle ? Data.getU8(C) : 0;			uint8_t IndexEntryValue = GnuStyle ? SetData.getU8(C) : 0;
	StringRef Name = Data.getCStrRef(C);			StringRef Name = SetData.getCStrRef(C);
	SetData.Entries.push_back(			if (C)
				NewSet.Entries.push_back(
	{DieRef, PubIndexEntryDescriptor(IndexEntryValue), Name});			{DieRef, PubIndexEntryDescriptor(IndexEntryValue), Name});
	}			}

				if (!C) {
				RecoverableErrorHandler(createStringError(
				errc::invalid_argument,
				"name lookup table at offset 0x%" PRIx64 " parsing failed: %s",
				SetOffset, toString(std::move(C.takeError())).c_str()));
				continue;
				}
				if (C.tell() != Offset)
				RecoverableErrorHandler(createStringError(
				errc::invalid_argument,
				"name lookup table at offset 0x%" PRIx64
				" has a terminator at offset 0x%" PRIx64
				" before the expected end at 0x%" PRIx64,
				SetOffset, C.tell() - OffsetSize, Offset - OffsetSize));
	}			}
	return C.takeError();
	}			}
				dblaikie: I think the phrasing of these two might use some improvement. "terminated prematurely" actually would make me think of the second case - where the list had a terminator before the prefix-encoded length was reached, rather than that the prefix-encoded length was reached before the list ended. Perhaps "terminated before the expected length was reached" and "reached the expected length without encountering a terminator"? They're both a bit of a mouthful though... open to ideas.
				ikudrin (author): These wordings are already better than mine. Thanks!
				jhenderson: How about the first one be just generic, allowing the cursor's error to provide the context (something like "name lookup table at offset 0x12345678 parsing failed: ..."). I'm actually okay with @ikudrin's current wording for the second one, since @dblaikie's suggestion is as much of a mouthful when you add in the other context.
				dblaikie: The suggestion wasn't for brevity, but clarity. I found the original messages unclear & was hoping to clarify them. What are the two messages in total (with all the added context, for both too short and too long) & how clear are they?
				jhenderson: Taken from the test case:
					error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72
				(the "no null terminated" bit might differ depending on the exact failure, e.g. "unexpected end of data at offset 0x4c while reading [0x4c, 0x4d)")
					error: name lookup table at offset 0x75 has an unexpected terminator at offset 0x8c
				ikudrin (author): Thanks, @jhenderson! @dblaikie are you OK with these messages or going to suggest a better alternative?
				dblaikie:
				> error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72
				This one sounds OK (guess it could be more precise in this case, "bounds reached without finding expected null terminator" perhaps - but I realize that's fairly orthogonal to this patch & could be improved in the general DataExtractor infrastructure) - honestly the verbosity of these messages doesn't seem like a problem to me. They should be pretty rare & when they do come up, the more explicit/precise the better, it seems to me.
				> error: name lookup table at offset 0x75 has an unexpected terminator at offset 0x8c
				This one still seems like it could be more precise - exactly why was the terminator unexpected? "has a terminator at 0x8c before the expected end at 0x??" perhaps.
				jhenderson:
				> "has a terminator at 0x8c before the expected end at 0x??" perhaps.
				Sounds good to me.
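
The two diagnostics settled on above can be illustrated with a small standalone sketch. It is not the LLVM implementation: `parseSet`, `SetResult`, `readLE`, and `hex` are hypothetical names, the layout modeled is a simplified DWARF32 `.debug_pubnames` set (no GNU index-entry byte), and the message text is shortened relative to what `DataExtractor` produces. The point is only that the length field bounds the set, that a truncated entry yields a generic "parsing failed" error, and that a terminator found before the bound yields the "has a terminator ... before the expected end" error.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

struct SetResult {
  std::vector<std::string> Names; // Entries read before parsing stopped.
  std::string Error;              // Empty if the set was well formed.
  uint64_t NextOffset = 0;        // Where the following set should start.
};

static std::string hex(uint64_t V) {
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "0x%llx", static_cast<unsigned long long>(V));
  return Buf;
}

// Reads a little-endian integer of Size bytes from [Cur, End).
// Returns false, leaving Cur unchanged, if fewer than Size bytes remain.
static bool readLE(const Bytes &D, uint64_t &Cur, uint64_t End, unsigned Size,
                   uint64_t &Out) {
  if (End - Cur < Size)
    return false;
  Out = 0;
  for (unsigned I = 0; I < Size; ++I)
    Out |= static_cast<uint64_t>(D[Cur + I]) << (8 * I);
  Cur += Size;
  return true;
}

SetResult parseSet(const Bytes &D, uint64_t SetOffset) {
  assert(SetOffset <= D.size() && "set must start inside the section");
  SetResult R;
  const std::string Prefix = "name lookup table at offset " + hex(SetOffset);
  uint64_t Cur = SetOffset, Length = 0, Field = 0;
  R.NextOffset = D.size();
  if (!readLE(D, Cur, D.size(), 4, Length)) {
    R.Error = Prefix + " parsing failed: unexpected end of data";
    return R;
  }
  // The length field bounds everything else in the set; the next set starts
  // at this bound no matter how parsing of the current one ends.
  const uint64_t End = std::min<uint64_t>(Cur + Length, D.size());
  R.NextOffset = End;
  for (unsigned Size : {2u, 4u, 4u}) // version, unit offset, unit size
    if (!readLE(D, Cur, End, Size, Field)) {
      R.Error = Prefix + " does not have a complete header";
      return R;
    }
  for (;;) {
    uint64_t DieOffset = 0;
    if (!readLE(D, Cur, End, 4, DieOffset)) {
      R.Error = Prefix + " parsing failed: unexpected end of data";
      return R;
    }
    if (DieOffset == 0)
      break; // A zero offset terminates the list.
    const uint64_t NameStart = Cur;
    std::string Name;
    while (Cur < End && D[Cur] != 0)
      Name += static_cast<char>(D[Cur++]);
    if (Cur == End) {
      R.Error = Prefix + " parsing failed: no null terminated string at offset " +
                hex(NameStart);
      return R;
    }
    ++Cur; // Skip the NUL.
    R.Names.push_back(Name);
  }
  if (Cur != End)
    R.Error = Prefix + " has a terminator at offset " + hex(Cur - 4) +
              " before the expected end at " + hex(End - 4);
  return R;
}
```

Calling `parseSet(Section, 0)` and then `parseSet(Section, R.NextOffset)` walks the section set by set, which is the "parse the next thing at current start + length" behavior discussed earlier in the thread.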

	void DWARFDebugPubTable::dump(raw_ostream &OS) const {			void DWARFDebugPubTable::dump(raw_ostream &OS) const {
	for (const Set &S : Sets) {			for (const Set &S : Sets) {
	int OffsetDumpWidth = 2 * dwarf::getDwarfOffsetByteSize(S.Format);			int OffsetDumpWidth = 2 * dwarf::getDwarfOffsetByteSize(S.Format);
	OS << "length = " << format("0x%0*" PRIx64, OffsetDumpWidth, S.Length);			OS << "length = " << format("0x%0*" PRIx64, OffsetDumpWidth, S.Length);
	OS << ", format = " << dwarf::FormatString(S.Format);			OS << ", format = " << dwarf::FormatString(S.Format);
	OS << ", version = " << format("0x%04x", S.Version);			OS << ", version = " << format("0x%04x", S.Version);
	OS << ", unit_offset = "			OS << ", unit_offset = "
	... (19 lines not shown)
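
As a side note on the `OffsetDumpWidth` computation in `dump()` above: offsets occupy 4 bytes in DWARF32 and 8 bytes in DWARF64, so twice the byte size gives the number of hex digits to print. The sketch below reproduces that formatting outside LLVM; `DwarfFormat`, `offsetByteSize`, and `formatOffset` are hypothetical stand-ins for `dwarf::DwarfFormat`, `dwarf::getDwarfOffsetByteSize`, and LLVM's `format()` helper.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

enum class DwarfFormat { DWARF32, DWARF64 };

// Offsets are 4 bytes wide in DWARF32 and 8 bytes wide in DWARF64.
constexpr int offsetByteSize(DwarfFormat F) {
  return F == DwarfFormat::DWARF64 ? 8 : 4;
}

// Prints Value as zero-padded hex, two digits per offset byte.
std::string formatOffset(uint64_t Value, DwarfFormat F) {
  const int Width = 2 * offsetByteSize(F);
  char Buf[32];
  const int N = std::snprintf(Buf, sizeof(Buf), "0x%0*" PRIx64, Width, Value);
  assert(N > 0 && static_cast<size_t>(N) < sizeof(Buf));
  (void)N; // Silence unused-variable warnings when NDEBUG is defined.
  return Buf;
}
```

With this, a DWARF32 length of 1 is rendered as `0x00000001`, which is the form the `CHECK-NEXT: length = ...` lines in the test below expect.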

llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s

This file was added.

				# RUN: llvm-mc -triple x86_64 %s -filetype=obj -o %t

				## All four name lookup table sections share the same parser, but slightly
				## different code paths are used to reach it. Do a comprehensive check for one
				jhenderson: I think this is the first time I've heard the term "public name sections" being used. Is this called that in the standard? Otherwise, I might suggest using a different phrasing (though don't necessarily know what).
				ikudrin (author): Well, the standard sometimes uses the term "name lookup tables". Do you think the comment sounds better now?
				jhenderson: Looks okay to me.
				## of the sections and minimal checks for the others.

				# RUN: not llvm-dwarfdump -debug-gnu-pubnames %t 2> %t.err | FileCheck %s
				# RUN: FileCheck %s --input-file=%t.err --check-prefix=ERR

				jhenderson: I don't mind too much either way, especially given the difficulties I recently had with the debug line equivalent test, but is there a particular reason you've kept the two streams separate? By combining them you can show the relative position of output for the common case of the streams being combined.
				ikudrin (author): That is done to improve readability. The error messages are printed during parsing, and dumping of all sets in the section comes after that. Thus, if we want to check all the messages at once, the error messages (or dumping messages) have to be separated from the corresponding lines of source code.
				jhenderson: Thanks, makes sense.
				# RUN: not llvm-dwarfdump -debug-pubnames -debug-pubtypes -debug-gnu-pubtypes %t 2>&1 | \
				# RUN: FileCheck %s --check-prefix=ERR-MIN

				.section .debug_gnu_pubnames,"",@progbits
				# CHECK: .debug_gnu_pubnames contents:

				## The next few sets do not contain all required fields in the header.
				# ERR: error: name lookup table at offset 0x0 does not have a complete header: unexpected end of data at offset 0x5 while reading [0x4, 0x6)
				# CHECK-NEXT: length = 0x00000001, format = DWARF32, version = 0x0000, unit_offset = 0x00000000, unit_size = 0x00000000
				jhenderson: does not -> do not
				# CHECK-NEXT: Offset Linkage Kind Name
				# CHECK-NOT: 0x
				.long .LSet0End-.LSet0 # Length
				.LSet0:
				.byte 1 # Version (truncated)
				.LSet0End:

				# ERR: error: name lookup table at offset 0x5 does not have a complete header: unexpected end of data at offset 0xe while reading [0xb, 0xf)
				# CHECK-NEXT: length = 0x00000005, format = DWARF32, version = 0x0002, unit_offset = 0x00000000, unit_size = 0x00000000
				# CHECK-NEXT: Offset Linkage Kind Name
				# CHECK-NOT: 0x
				.long .LSet1End-.LSet1 # Length
				.LSet1:
				.short 2 # Version
				.byte 1, 2, 3 # Debug Info Offset (truncated)
				.LSet1End:

				# ERR: error: name lookup table at offset 0xe does not have a complete header: unexpected end of data at offset 0x1b while reading [0x18, 0x1c)
				# CHECK-NEXT: length = 0x00000009, format = DWARF32, version = 0x0002, unit_offset = 0x00000032, unit_size = 0x00000000
				# CHECK-NEXT: Offset Linkage Kind Name
				# CHECK-NOT: 0x
				.long .LSet2End-.LSet2 # Length
				.LSet2:
				.short 2 # Version
				.long 0x32 # Debug Info Offset
				.byte 1, 2, 3 # Debug Info Length (truncated)
				.LSet2End:

				jhenderson: For consistency, either offset -> Offset or Length -> length (here and below).
				## This set is terminated just after the header.
				# ERR: error: name lookup table at offset 0x1b parsing failed: unexpected end of data at offset 0x29 while reading [0x29, 0x2d)
				# CHECK-NEXT: length = 0x0000000a, format = DWARF32, version = 0x0002, unit_offset = 0x00000048, unit_size = 0x00000064
				# CHECK-NEXT: Offset Linkage Kind Name
				# CHECK-NOT: 0x
				.long .LSet3End-.LSet3 # Length
				.LSet3:
				.short 2 # Version
				.long 0x48 # Debug Info Offset
				.long 0x64 # Debug Info Length
				.LSet3End:

				## The offset in the first pair is truncated.
				# ERR: error: name lookup table at offset 0x29 parsing failed: unexpected end of data at offset 0x3a while reading [0x37, 0x3b)
				# CHECK-NEXT: length = 0x0000000d, format = DWARF32, version = 0x0002, unit_offset = 0x000000ac, unit_size = 0x00000036
				# CHECK-NEXT: Offset Linkage Kind Name
				# CHECK-NOT: 0x
				.long .LSet4End-.LSet4 # Length
				.LSet4:
				.short 2 # Version
				.long 0xac # Debug Info Offset
				.long 0x36 # Debug Info Length
				.byte 1, 2, 3 # Offset (truncated)
				.LSet4End:

				## The set is truncated just after the offset of the first pair.
				# ERR: error: name lookup table at offset 0x3a parsing failed: unexpected end of data at offset 0x4c while reading [0x4c, 0x4d)
				# CHECK-NEXT: length = 0x0000000e, format = DWARF32, version = 0x0002, unit_offset = 0x000000e2, unit_size = 0x00000015
				# CHECK-NEXT: Offset Linkage Kind Name
				# CHECK-NOT: 0x
				.long .LSet5End-.LSet5 # Length
				.LSet5:
				.short 2 # Version
				.long 0xe2 # Debug Info Offset
				.long 0x15 # Debug Info Length
				.long 0xf4 # Offset
				.LSet5End:

				## The set is truncated just after the index entry field of the first pair.
				# ERR: error: name lookup table at offset 0x4c parsing failed: no null terminated string at offset 0x5f
				# CHECK-NEXT: length = 0x0000000f, format = DWARF32, version = 0x0002, unit_offset = 0x000000f7, unit_size = 0x00000010
				# CHECK-NEXT: Offset Linkage Kind Name
				# CHECK-NOT: 0x
				.long .LSet6End-.LSet6 # Length
				.LSet6:
				.short 2 # Version
				.long 0xf7 # Debug Info Offset
				.long 0x10 # Debug Info Length
				.long 0xf4 # Offset
				.byte 0x30 # Index Entry
				.LSet6End:

				## This set contains a string which is not properly terminated.
				# ERR: error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72
				# CHECK-NEXT: length = 0x00000012, format = DWARF32, version = 0x0002, unit_offset = 0x00000107, unit_size = 0x0000004b
				# CHECK-NEXT: Offset Linkage Kind Name
				# CHECK-NOT: 0x
				.long .LSet7End-.LSet7 # Length
				.LSet7:
				.short 2 # Version
				.long 0x107 # Debug Info Offset
				.long 0x4b # Debug Info Length
				.long 0x111 # Offset
				.byte 0x30 # Index Entry
				.ascii "foo" # The string does not terminate before the set data ends.
				.LSet7End:

				## This set occupies some space after the terminator.
				# ERR: error: name lookup table at offset 0x75 has a terminator at offset 0x8c before the expected end at 0x8d
				# CHECK-NEXT: length = 0x00000018, format = DWARF32, version = 0x0002, unit_offset = 0x00000154, unit_size = 0x000002ac
				# CHECK-NEXT: Offset Linkage Kind Name
				# CHECK-NEXT: 0x0000018e EXTERNAL FUNCTION "foo"
				# CHECK-NOT: 0x
				.long .LSet8End-.LSet8 # Length
				.LSet8:
				.short 2 # Version
				.long 0x154 # Debug Info Offset
				.long 0x2ac # Debug Info Length
				.long 0x18e # Offset
				.byte 0x30 # Index Entry
				.asciz "foo" # Name
				.long 0 # Terminator
				.space 1
				.LSet8End:

				## The remaining space in the section is too short to even contain a unit length
				## field.
				# ERR: error: name lookup table at offset 0x91 parsing failed: unexpected end of data at offset 0x94 while reading [0x91, 0x95)
				# CHECK-NOT: length =
				.space 3

				# ERR-MIN: .debug_pubnames contents:
				# ERR-MIN-NEXT: error: name lookup table at offset 0x0 parsing failed: unexpected end of data at offset 0x1 while reading [0x0, 0x4)
				# ERR-MIN: .debug_pubtypes contents:
				# ERR-MIN-NEXT: error: name lookup table at offset 0x0 parsing failed: unexpected end of data at offset 0x1 while reading [0x0, 0x4)
				# ERR-MIN: .debug_gnu_pubtypes contents:
				# ERR-MIN-NEXT: error: name lookup table at offset 0x0 parsing failed: unexpected end of data at offset 0x1 while reading [0x0, 0x4)

				.section .debug_pubnames,"",@progbits
				.byte 0
				.section .debug_pubtypes,"",@progbits
				.byte 0
				.section .debug_gnu_pubtypes,"",@progbits
				.byte 0

llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s

This file was deleted.

	# RUN: llvm-mc -triple x86_64 %s -filetype=obj -o %t
	# RUN: not llvm-dwarfdump -v %t 2>&1 | FileCheck %s

	# CHECK: .debug_pubnames contents:
	# CHECK-NEXT: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4)

	# CHECK: .debug_pubtypes contents:
	# CHECK-NEXT: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4)

	# CHECK: .debug_gnu_pubnames contents:
	# CHECK-NEXT: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4)

	# CHECK: .debug_gnu_pubtypes contents:
	# CHECK-NEXT: error: unexpected end of data at offset 0x1 while reading [0x0, 0x4)

	.section .debug_pubnames,"",@progbits
	.byte 0

	.section .debug_pubtypes,"",@progbits
	.byte 0

	.section .debug_gnu_pubnames,"",@progbits
	.byte 0

	.section .debug_gnu_pubtypes,"",@progbits
	.byte 0

Revision Contents

Diff 276709
