This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] Fix v5 debug_line parsing of prologues with many files
Closed · Public

Authored by labath on Mar 20 2020, 7:35 AM.


Details

Reviewers

dblaikie
jhenderson

Commits

rGd381b6a8d3e8: [DWARF] Fix v5 debug_line parsing of prologues with many files

Summary

The directories_count and file_names_count fields are supposed to be
uleb128s (section 6.2.4 of the DWARF5 spec), not bytes. This bug meant
that it was not possible to correctly parse headers with 128 or more
files or directories.

I found this bug by code inspection, though the limit is so small that
someone would have run into it for real sooner or later. I've verified
that the producer side handles many files correctly, and that we are
able to parse such files after this fix.
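
To illustrate the summary, here is a minimal standalone sketch of why a single-byte read only agrees with a uleb128 read for small counts. The helpers `readU8` and `readULEB128` are hypothetical stand-ins written for this example; they are not the LLVM DataExtractor methods touched by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reads a single byte, as the old code did for the counts.
uint64_t readU8(const std::vector<uint8_t> &Data, size_t &Offset) {
  assert(Offset < Data.size() && "read past end of data");
  return Data[Offset++];
}

// Hypothetical helper: decodes an unsigned LEB128 value. Each byte carries
// 7 payload bits (least significant group first); the high bit says whether
// another byte follows. Overflow is not diagnosed in this sketch.
uint64_t readULEB128(const std::vector<uint8_t> &Data, size_t &Offset) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  uint8_t Byte;
  do {
    assert(Offset < Data.size() && "read past end of data");
    Byte = Data[Offset++];
    if (Shift < 64)
      Value |= uint64_t(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  return Value;
}
```

A count of 300 is encoded as the two bytes 0xac 0x02. The byte read returns 172 and leaves 0x02 behind to be misinterpreted as the start of the first entry, while counts up to 127 happen to decode identically under both reads.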

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

labath created this revision. Mar 20 2020, 7:35 AM

Herald added a project: Restricted Project. Mar 20 2020, 7:35 AM

Herald added subscribers: hiraditya, aprantl.

Harbormaster completed remote builds in B49889: Diff 251638. Mar 20 2020, 8:38 AM

Thank you for finding and fixing my embarrassing mistake.

I suppose this should lead to questions about whether mis-encoded ULEBs can cause parsing sadness, and are those handled in a reasonable way?

In D76498#1933793, @probinson wrote:

I suppose this should lead to questions about whether mis-encoded ULEBs can cause parsing sadness, and are those handled in a reasonable way?

If I'm not mistaken, the only way a misencoded ULEB can be detected is by it running off the end of the data (although in practice a value greater than the maximum uint64_t will also be detected, IIRC). I have a vague memory that the data extractor records an error if the appropriate style of parsing is used, but I think we are not currently using that style in some cases, so the error is ignored.

llvm/test/tools/llvm-dwarfdump/X86/debug_line_many_files.s
1	Perhaps worth a brief comment saying why many files is interesting? Also, perhaps worth renaming this test to indicate it's specific to version 5? At least, this should be in the comment.
65	Nit: too many blank lines at EOF.

rename test, add comment

In D76498#1936245, @jhenderson wrote:

In D76498#1933793, @probinson wrote:

I suppose this should lead to questions about whether mis-encoded ULEBs can cause parsing sadness, and are those handled in a reasonable way?

If I'm not mistaken, the only way a misencoded ULEB can be detected is by it running off the end of the data (although in practice a value greater than the maximum uint64_t will also be detected, IIRC). I have a vague memory that the data extractor records an error if the appropriate style of parsing is used, but I think we are not currently using that style in some cases, so the error is ignored.

Yeah, there isn't enough (or any) redundancy in ULEBs to detect an invalid value. These kinds of errors will manifest themselves as random failures further down the line when the parser gets out of sync with the data.

I have looked at changing this function to the Error-aware overloads (I'm currently prototyping the truncating data extractor -- that's how I found this patch), but it gets tricky since some of the parsing is delegated to DWARFFormValue, which is also not error-aware.
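
The two detectable failure modes discussed above (running off the end of the data, and a value that does not fit in 64 bits) can be sketched as follows. This is a standalone illustration under assumptions: `decodeULEB128Checked` is a hypothetical name, and the code is not the Error-aware DataExtractor interface mentioned in the comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical checked decoder (not an LLVM API). Returns std::nullopt if the
// value runs off the end of Data or does not fit in 64 bits. Offset is only
// advanced when decoding succeeds.
std::optional<uint64_t> decodeULEB128Checked(const std::vector<uint8_t> &Data,
                                             size_t &Offset) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  size_t Pos = Offset;
  while (true) {
    if (Pos >= Data.size())
      return std::nullopt; // Truncated: continuation bit set on the last byte.
    uint8_t Byte = Data[Pos++];
    uint64_t Slice = Byte & 0x7f;
    if (Shift >= 64) {
      if (Slice != 0)
        return std::nullopt; // Payload bits beyond bit 63.
    } else {
      if ((Slice << Shift) >> Shift != Slice)
        return std::nullopt; // Payload bits shifted out of the 64-bit value.
      Value |= Slice << Shift;
    }
    Shift += 7;
    if (!(Byte & 0x80))
      break;
  }
  Offset = Pos;
  return Value;
}
```

Note what the sketch cannot catch: if the continuation bit of the first byte of 0xac 0x02 is lost, the input 0x2c 0x02 decodes cleanly to 44 after one byte, and the stray 0x02 is then consumed as whatever field comes next. That is the "out of sync" failure described above.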

LGTM, with one minor suggestion.

llvm/test/tools/llvm-dwarfdump/X86/debug_line_many_files_v5.s
1 ↗	(On Diff #251970)	Perhaps "An object with..." to avoid ambiguity between file/files

This revision is now accepted and ready to land. Mar 23 2020, 3:19 AM

Harbormaster completed remote builds in B50083: Diff 251970. Mar 23 2020, 3:48 AM

LGTM too, you two are better suited to work out the error handling part.

labath marked an inline comment as done. Mar 24 2020, 7:12 AM

Closed by commit rGd381b6a8d3e8: [DWARF] Fix v5 debug_line parsing of prologues with many files (authored by labath). Mar 24 2020, 7:30 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                          Size
llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp                   8 lines
llvm/test/tools/llvm-dwarfdump/X86/debug_line_many_files.s    65 lines

Diff 251638

llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp

@@ parseV5DirFileTables(const DWARFDataExtractor &DebugLineData,
                      std::vector<DWARFDebugLine::FileNameEntry> &FileNames) {
   // Get the directory entry description.
   llvm::Expected<ContentDescriptors> DirDescriptors =
       parseV5EntryFormat(DebugLineData, OffsetPtr, nullptr);
   if (!DirDescriptors)
     return DirDescriptors.takeError();

   // Get the directory entries, according to the format described above.
-  int DirEntryCount = DebugLineData.getU8(OffsetPtr);
-  for (int I = 0; I != DirEntryCount; ++I) {
+  uint64_t DirEntryCount = DebugLineData.getULEB128(OffsetPtr);
+  for (uint64_t I = 0; I != DirEntryCount; ++I) {
     for (auto Descriptor : *DirDescriptors) {
       DWARFFormValue Value(Descriptor.Form);
       switch (Descriptor.Type) {
       case DW_LNCT_path:
         if (!Value.extractValue(DebugLineData, OffsetPtr, FormParams, &Ctx, U))
           return createStringError(errc::invalid_argument,
                                    "failed to parse directory entry because "
                                    "extracting the form value failed.");
@@ (10 lines not shown) parseV5DirFileTables(const DWARFDataExtractor &DebugLineData,

   // Get the file entry description.
   llvm::Expected<ContentDescriptors> FileDescriptors =
       parseV5EntryFormat(DebugLineData, OffsetPtr, &ContentTypes);
   if (!FileDescriptors)
     return FileDescriptors.takeError();

   // Get the file entries, according to the format described above.
-  int FileEntryCount = DebugLineData.getU8(OffsetPtr);
-  for (int I = 0; I != FileEntryCount; ++I) {
+  uint64_t FileEntryCount = DebugLineData.getULEB128(OffsetPtr);
+  for (uint64_t I = 0; I != FileEntryCount; ++I) {
     DWARFDebugLine::FileNameEntry FileEntry;
     for (auto Descriptor : *FileDescriptors) {
       DWARFFormValue Value(Descriptor.Form);
       if (!Value.extractValue(DebugLineData, OffsetPtr, FormParams, &Ctx, U))
         return createStringError(errc::invalid_argument,
                                  "failed to parse file entry because "
                                  "extracting the form value failed.");
       switch (Descriptor.Type) {

llvm/test/tools/llvm-dwarfdump/X86/debug_line_many_files.s

This file was added.

				# RUN: llvm-mc -triple x86_64-pc-linux -filetype=obj %s -o %t
				# RUN: llvm-dwarfdump -debug-line %t | FileCheck %s

				# CHECK: include_directories[ 0] = "/d000"
				# CHECK: include_directories[299] = "/d299"
				# CHECK: file_names[ 0]:
				# CHECK-NEXT: name: "000.c"
				# CHECK-NEXT: dir_index: 0
				# CHECK: file_names[299]:
				# CHECK-NEXT: name: "299.c"
				# CHECK-NEXT: dir_index: 299

				.section .debug_line,"",@progbits
				.long .Lunit_end0-.Lunit_start0 # Length of Unit
				.Lunit_start0:
				.short 5 # DWARF version number
				.byte 8 # Address Size
				.byte 0 # Segment Selector Size
				.long .Lunit_header_end0 - .Lunit_params0 # Length of Prologue (invalid)
				.Lunit_params0:
				.byte 1 # Minimum Instruction Length
				.byte 1 # Maximum Operations per Instruction
				.byte 1 # Default is_stmt
				.byte -5 # Line Base
				.byte 14 # Line Range
				.byte 13 # Opcode Base
				.byte 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 # Standard Opcode Lengths

				# Directory table format
				.byte 1 # One element per directory entry
				.byte 1 # DW_LNCT_path
				.byte 0x08 # DW_FORM_string

				# Directory table entries
				.uleb128 300 # 300 directories
				.irpc a,012
				.irpc b,0123456789
				.irpc c,0123456789
				.byte '/', 'd', '0'+\a, '0'+\b, '0'+\c, 0
				.endr
				.endr
				.endr

				# File table format
				.byte 2 # 2 elements per file entry
				.byte 1 # DW_LNCT_path
				.byte 0x08 # DW_FORM_string
				.byte 2 # DW_LNCT_directory_index
				.byte 0x05 # DW_FORM_data2

				# File table entries
				.uleb128 300 # 300 files
				.irpc a,012
				.irpc b,0123456789
				.irpc c,0123456789
				.byte '0'+\a, '0'+\b, '0'+\c, '.', 'c', 0 # File name
				.word \a*100+\b*10+\c # Dir index
				.endr
				.endr
				.endr

				.Lunit_header_end0:
				.byte 0, 1, 1 # DW_LNE_end_sequence
				.Lunit_end0:
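
As a reading aid for the file table above, the three nested .irpc loops can be mirrored in C++ as a rough sketch. The function name `expandFileTable` is made up for this illustration, and the directory index formula (a*100 + b*10 + c) is inferred from the CHECK lines, which expect file 299 to have dir_index 299.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration of what the nested .irpc loops emit for the file
// table: one (name, directory index) pair per digit combination, with
// a in "012" and b, c in "0123456789".
std::vector<std::pair<std::string, uint16_t>> expandFileTable() {
  std::vector<std::pair<std::string, uint16_t>> Entries;
  for (char A : std::string("012"))
    for (char B : std::string("0123456789"))
      for (char C : std::string("0123456789")) {
        std::string Name = {A, B, C, '.', 'c'}; // e.g. "042.c"
        auto DirIndex = static_cast<uint16_t>((A - '0') * 100 +
                                              (B - '0') * 10 + (C - '0'));
        Entries.emplace_back(Name, DirIndex);
      }
  return Entries;
}
```

The expansion yields 3 * 10 * 10 = 300 entries, which matches the `.uleb128 300` count preceding the table and exceeds what a one-byte count could express.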
