This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Add more checks to parsing .debug_pub* sections.
ClosedPublic

Authored by ikudrin on Jul 2 2020, 7:16 AM.

Details

Summary

The patch adds checking for various potential issues in parsing name lookup tables and reporting them as recoverable errors, similarly as we do for other tables.

Diff Detail

Event Timeline

ikudrin created this revision.Jul 2 2020, 7:16 AM
ikudrin edited reviewers, added: labath; removed: espindola.Jul 2 2020, 7:22 AM
dblaikie added inline comments.Jul 2 2020, 3:52 PM
llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
58

Is it worth saying "name lookup table" here (& in related errors) - seems a bit redundant when the caller will add the specific section name?

76–77

Hmm, why only do this on the last one?
If the goal is to be able to parse/dump things that might be a bit broken (which seems generally good) I think we should parse from the length to however long the length says (unless it extends beyond the section) - terminate early if the list terminates early (& warn about the fact that it terminated before its length) & then parse the next thing at current start + length. Don't think that needs a special case for the last one.

78–91

I think phrasing of these two might use some improvement. "terminated prematurely" actually would make me think of the second case - where the list had a terminator before the prefix-encoded length was reached, rather than that the prefix-encoded length was reached before the list ended.

Perhaps "terminated before the expected length was reached" and "reached the expected length without encountering a terminator"? They're both a bit of a mouthful though... open to ideas.

ikudrin marked 3 inline comments as done.Jul 2 2020, 7:34 PM
ikudrin added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
58

I would be really grateful for the better wording.

76–77

Well, it looks the code does exactly that you say. Maybe I was not clear enough in the comment. I meant the set that was just read. The error handlers before the while loop drop the last added set with Sets.pop_back() because even header for it can not be parsed. The error handlers after the loop preserve it and the comment was aimed to explain that difference.

78–91

These wordings are already better than mine. Thanks!

jhenderson added inline comments.Jul 3 2020, 12:48 AM
llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
27–30

What's behind the reasoning for no longer using the Cursor throughout?

31

FWIW, I'd prefer this to remain not auto, as it isn't clear to me what the type of Set is from the immediate context.

46

You probably want to include the expected length of the table in this data extractor too, to stop reading into the next table under any circumstance (e.g. the length would partially truncate the final terminator).

58

It seems to me that the caller doesn't add the section name to the message itself, at least in some cases? See the test case. Personally, I think this is fine, although I'd be tempted to be specific and say something about the "pub..." section, probably using a better word, to distinguish it from a .debug_names section.

76–77

In the .debug_line code, we dump as much of the prologue as possible, despite the values not necessarily all having been read. For example, if the standard opcode lengths array was truncated, we'd still dump the values for the header fields. I think it would make sense to drop the pop_back calls entirely, with the possible exception of the one to do with the initial length field, although even then I'm not 100% sure.

78–91

How about the first one be just generic, allowing the cursor's error to provide the context (something like "name lookup table at offset 0x12345678 parsing failed: ..."). I'm actually okay with @ikudrin's current wording for the second one, since @dblaikie's suggestion is as much of a mouthful when you add in the other context.

ikudrin updated this revision to Diff 275404.Jul 3 2020, 8:34 AM
ikudrin marked 7 inline comments as done.
ikudrin added inline comments.Jul 3 2020, 8:34 AM
llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
27–30

The method now reports all encountered errors through RecoverableErrorHandler and does not return Error. The Cursor requires its error state to be checked in any case. While the former code could simply return the error state, now this checking is a bit inconvenient, and, moreover, useless.

31

OK. I'll rename the variable to NewSet then.

46

The second parameter, Offset, is the limiter. Note that it is just updated to point to the start of the next table which is the same as the end of the current one.

58

While the term might be a bit confusing, note that the wording "Name lookup tables" is used in DWARFv4 to refer to these sections. The collocation is not used in DWARFv5, where the sections are deprecated. Anyway, I open to suggestions.

76–77

OK. I'll preserve the set for the case when the complete header is not read. You are right, some fields can be dumped even in that case, at least, the length.

jhenderson marked an inline comment as done.Jul 6 2020, 12:38 AM
jhenderson added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
27–30

I'm not sure I follow.

As far as my understanding of Cursor goes, you can have:

DataExtractor::Cursor C(0);
while (C && Data.isValidOffset(C.tell())) {
  // Parse the length
  if (!C) { /* report invalid length, using C.takeError() */ return; }
  // Parse the header
  while (C) { /* parse entries */ }
  if (C && C.tell() != Offset) { /* report bad terminator */ }
}
if (!C) { /* report parsing error using C.takeError() */

The Cursor is checked by either the final error check outside the loop in most cases, or by the invalid length report, so we're good (note that C.takeError() does not need calling if the Cursor is in a success state, much like Expected). The only case where it might be different is if Cursor is in an error state due to some error other than a running-off-the-end error, in which case it would abort early. If you want to continue instead, you could do almost the same as you've got:

while (Offset) {
  DataExtractor::Cursor C(Offset);
  ... = Data.getInitialLength(C);
  if (!C) { /* report invalid length, using C.takeError() */ return; }
  // Parse the header
  while (C) { /* parse entries */ }
  if (C && C.tell() != Offset) { /* report bad terminator */ }
  if (!C) { /* report parsing error using C.takeError() */
}

I'm not sure I see how the latter is any more complex or inconvenient than instantiating a different Error variable and passing pointers around?

37

Perhaps "newly" instead of "lastly".

46

Thanks I misread.

55

Same "lastly" -> "newly" maybe. I feel like it reads a little better.

Also "field" -> "fields"

llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s
1

I'd probably fold in this test case now into the other file. I don't think there's any benefit having them separate. Alternatively, this lives separately, and move the other test case into the library testing. The idea is that we test the code in detail with the library tests, and at a high level in the tool tests (i.e. showing we handle the reported output). I don't mind either approach.

llvm/test/tools/llvm-dwarfdump/X86/debug_pubnames_error_cases.s
8 ↗(On Diff #275404)

Strictly speaking, we should have an error check for each individual field, not just the header in general. This is because we could be using the non-checking version of the get* functions. This check currently only checks parsing of the offset field, but there's also the version and size fields.

Similar comment applies for the individual entries.

30 ↗(On Diff #275404)

preparly -> properly

61 ↗(On Diff #275404)

contein -> contain

ikudrin updated this revision to Diff 275708.Jul 6 2020, 7:37 AM
ikudrin marked 10 inline comments as done.

Thanks, @jhenderson!

  • Use Cursor in the loop from the beginning.
  • Fix typos.
  • Extended tests.
  • Removed the old test; Moved the checks to the new test file.
llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
27–30

I'll take the second one, thanks!

llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s
1

OK, I'll move that into the new test. I find using gtest unit tests for things like dumping and error reporting clumsy because they require lots of boilerplate code.

dblaikie added inline comments.Jul 6 2020, 5:59 PM
llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
78–91

The suggestion wasn't for brevity, but clarity. I found the original messages unclear & was hoping to clarify them.

What are the two messages in total (with all the added context, for both too short and too long) & how clear are they?

jhenderson added inline comments.Jul 7 2020, 2:21 AM
llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
78–91

Taken from the test case:

error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72

(the "no null teminated" bit might differ depending on the exact failure, e.g. "unexpected end of data at offset 0x4c while reading [0x4c, 0x4d)")

error: name lookup table at offset 0x75 has an unexpected terminator at offset 0x8c
llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s
4

I think this is the first time I've heard the term "public name sections" being used. Is this called that in the standard? Otherwise, I might suggest using a different phrasing (though don't necessarily know what).

8–9

I don't mind too much either way, especially given the difficulties I recently had with the debug line equivalent test, but is there a particular reason you've kept the two streams separate? By combining them you can show the relative position of output for the common case of the streams being combined.

18

does not -> do not

45–46

For consistency, either offset -> Offset or Length -> length (here and below).

ikudrin updated this revision to Diff 276028.Jul 7 2020, 5:57 AM
ikudrin marked 6 inline comments as done.
  • Fixed typos (again). Thanks, @jhenderson!
  • Updated the test to shrink the number of tool runs.
llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
78–91

Thanks, @jhenderson! @dblaikie are you OK with these messages or going to suggest a better alternative?

llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s
4

Well, the standard sometimes uses the term "name lookup tables". Do you think now the comment sound better?

8–9

That is done to improve readability. The error messages are printed during parsing and dumping of all sets in the section comes after that. Thus, if we want to check all the messages at once, the error messages (or dumping messages) have to be separated from the corresponding lines of source code.

jhenderson accepted this revision.Jul 7 2020, 6:01 AM

LGTM, I think, but please wait for @dblaikie.

llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_error_cases.s
4

Looks okay to me.

8–9

Thanks, makes sense.

This revision is now accepted and ready to land.Jul 7 2020, 6:01 AM
dblaikie added inline comments.Jul 7 2020, 5:37 PM
llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
78–91

This one sounds OK (guess it could be more precise in this case "bounds reached without finding expected null terminator" perhaps - but I realize that's fairly orthogonal to this patch & could be improved in the general DataExtractor infrastructure) - honestly the verbosity of these messages doesn't seem like a problem to me. They should be pretty rare & when they do come up, the more explicit/precise the better, it seems to me.

error: name lookup table at offset 0x5f parsing failed: no null terminated string at offset 0x72

This one

error: name lookup table at offset 0x75 has an unexpected terminator at offset 0x8c

Still seems like it could be more precise - exactly why was the terminator unexpected? "has a terminator at 0x8c before the expected end at 0x??" perhaps.

jhenderson added inline comments.Jul 8 2020, 12:09 AM
llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp
78–91

"has a terminator at 0x8c before the expected end at 0x??" perhaps.

Sounds good to me.

ikudrin updated this revision to Diff 276456.Jul 8 2020, 8:55 AM
  • Updated the wording for the reporting of premature termination.
dblaikie accepted this revision.Jul 8 2020, 11:36 AM

Great, thanks!

jhenderson accepted this revision.Jul 9 2020, 12:54 AM

Latest update LGTM.

This revision was automatically updated to reflect the committed changes.