This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] Fix v5 debug_line parsing of prologues with many files
ClosedPublic

Authored by labath on Mar 20 2020, 7:35 AM.

Details

Summary

The directory_count and file_name_count fields are (section 6.2.4 of
DWARF5 spec) supposed to be uleb128s, not bytes. This bug meant that it
was not possible to correctly parse headers with more than 128 files or
directories.

I've found this bug by code inspection, though the limit is so small
someone would have run into it for real sooner or later. I've verified
that the producer side handles many files correctly, and that we are
able to parse such files after this fix.

Diff Detail

Event Timeline

labath created this revision.Mar 20 2020, 7:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2020, 7:35 AM

Thank you for finding and fixing my embarrassing mistake.

I suppose this should lead to questions about whether mis-encoded ULEBs can cause parsing sadness, and are those handled in a reasonable way?

I suppose this should lead to questions about whether mis-encoded ULEBs can cause parsing sadness, and are those handled in a reasonable way?

If I'm not mistaken, the only way a ULEB could be misencoded can be detected is by it running off the end of the data (although in practice a value greater than max uint64_t will also be detected IIRC). I have a vague memory that the data extractor records an error, if using the appropriate style of parsing, but I don't think we are currently using that type in some cases, so the error is ignored.

llvm/test/tools/llvm-dwarfdump/X86/debug_line_many_files.s
1 ↗(On Diff #251638)

Perhaps worth a brief comment saying why many files is interesting?

Also, perhaps worth renaming this test to indicate it's specific to version 5? At least, this should be in the comment.

65 ↗(On Diff #251638)

Nit: too many blank lines at EOF.

labath updated this revision to Diff 251970.Mar 23 2020, 2:45 AM
  • rename test, add comment

I suppose this should lead to questions about whether mis-encoded ULEBs can cause parsing sadness, and are those handled in a reasonable way?

If I'm not mistaken, the only way a ULEB could be misencoded can be detected is by it running off the end of the data (although in practice a value greater than max uint64_t will also be detected IIRC). I have a vague memory that the data extractor records an error, if using the appropriate style of parsing, but I don't think we are currently using that type in some cases, so the error is ignored.

Yeah, there isn't enough (or any) redundancy in ULEBs to detect an invalid value. These kinds of errors will manifest themselves as random failures further down the line when the parser gets out of sync with the data.

I have looked at changing this function to the Error-aware overloads (I'm currently prototyping the truncating data extractor -- that's how I found this patch), but it gets tricky since some of the parsing is delegated to DWARFFormValue, which is also not error-aware.

jhenderson accepted this revision.Mar 23 2020, 3:19 AM

LGTM, with one minor suggestion.

llvm/test/tools/llvm-dwarfdump/X86/debug_line_many_files_v5.s
2

Perhaps "An object with..." to avoid ambiguity between file/files

This revision is now accepted and ready to land.Mar 23 2020, 3:19 AM

LGTM too, you two are better suited to work out the error handling part.

labath marked an inline comment as done.Mar 24 2020, 7:12 AM
This revision was automatically updated to reflect the committed changes.