This is an archive of the discontinued LLVM Phabricator instance.

llvm-dwarfdump: Print warnings on invalid DWARF
ClosedPublic

Authored by jankratochvil on Jun 14 2021, 3:03 PM.

Details

Summary

llvm-dwarfdump was silent even when the format of DWARF was invalid and/or llvm-dwarfdump did not understand/support some of the constructs. This can be pretty confusing as llvm-dwarfdump is a tool for DWARF producers+consumers development.

Diff Detail

Event Timeline

jankratochvil created this revision.Jun 14 2021, 3:03 PM
jankratochvil requested review of this revision.Jun 14 2021, 3:03 PM
dblaikie added inline comments.Jun 14 2021, 9:05 PM
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37–39

Could probably include the bounds in the message? Maybe something like:

"DWARF compile unit extends beyond its bounds [x, y) to z"?

52–55

Maybe include some details about the offset to the debug_abbrev contribution, and the range of valid abbreviation values? (I forget if the abbrev table is necessarily contiguous - if it isn't, then maybe that's too complicated)

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
387–406

What was the motivation to move this code and add the Depth check in? I guess this didn't actually work/was untested, maybe? Could you explain why it didn't work, etc?

jhenderson added inline comments.Jun 15 2021, 12:00 AM
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
1 ↗(On Diff #352004)

The test deserves an introductory commetn describing what it is testing.

Additionally, the name is too generic and possibly slightly misleading "format-warnings.s" suggests that the main aim of the test is to test the formatting of warning messages in general, whereas this test is more about invalid debug abbrev/debug info, so perhaps it could just be debug-entry-invalid.s or something like that.

I have a personal preference to not use stdin to drive llvm-dwarfdump, and instead create an object file on disk with llvm-mc. This makes it easier to debug the test should something go wrong, since you can inspect the object without needing to change the test code.

3 ↗(On Diff #352004)

It would probably be a good idea if you used a non-zero value for the cu index. That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).

Same goes for the remaining messages.

Here and below, you can drop the "CHECK-" bit of the check prefixes, to make them more concise.

21 ↗(On Diff #352004)

Indentation here is a little inconsistent.

31 ↗(On Diff #352004)

Here and elsewhere, consider lining up your comments vertically. They're a bit all over the place currently.

dblaikie added inline comments.Jun 15 2021, 11:08 AM
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
1 ↗(On Diff #352004)

There is somewhat of a convention to prefer piping, because it means the input file name (whatever it happens to be on the buildbot/local filesystem) does not appear in the output - this reduces the chance that FileCheck commands might accidentally match on some part of the input file name - making the test more hermetic/reliable.

jankratochvil marked 7 inline comments as done.Jun 17 2021, 2:24 PM

To satisfy all the required detailed reporting I had to extend the patch far more than I originally inteded. Is it OK this way?

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
387–406

That DIE.extractFast just exited on both errors and successful end of DIEs. As it did not report any errors the return code (false) was the same. So this one specific error was handled outside. But now when we do all the detailed error reporting we need to do it from inside DIE.extractFast as that has all the info available. Therefore this error has moved inside DIE.extractFast (if (Offset >= UEndOffset) there).

llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
3 ↗(On Diff #352004)

It would probably be a good idea if you used a non-zero value for the cu index.

Done.

That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).

That would not show anything more as on 64-bit platforms variadic function extends all parameters to (at least) 64 bits. Therefore even 32-bit format will still read 64-bit variadic parameter.

jankratochvil marked 2 inline comments as done.
jankratochvil added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
103

This function look to me as too much code for too little benefit but when @dblaikie has requested it then why not.

Not had time to review the test cases yet, but will do that next week.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
372–375

I'd rename this function to getSupportedAddressSizes(), which reads slightly better.

Also, it can be simplified, as shown in line. String literals don't have lifetime issues, so there's no need for the static local variable.

llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
74

Lambdas follow variable naming style, so flush -> Flush.

103

I'm not entirely convinced we need to print all valid abbrev values, even in ranged form, since typically, the abbrevs will go from 1-max in a contiguous set. That being said, I'm not opposed to it.

I think the code could be simplified anyway. Something like the following is more readable to me. It also handles adjacent codes being non-contiguous within the set of abbrevs:

std::string DWARFAbbreviationDeclarationSet::getCodeRange() const {
  // Create a sorted list of all abbrev codes.
  std::vector<uint32_t> Codes;
  Codes.reserve(Decls.size());
  std::transform(Decls.begin(), Decls.end(), std::back_inserter(Codes),
    [](const DWARFAbbreviationDeclaration &Decl){
      return Decl.getCode();
  });
  std::sort(Codes.begin(), Codes.end());

  std::string Buffer = "";
  raw_string_ostream Stream(Buffer);
  // Each iteration through this look represents a single contiguous range in the set of codes.
  for(auto Current = Codes.begin(), End = Codes.end(); Current != End;) {
    uint32_t RangeStart = *Current;
    // Add the current range start.
    Stream << *Current;
    uint32_t RangeEnd = RangeStart;
    // Find the end of the current range.
    while(++Current != End && *Current == RangeEnd + 1)
      ++RangeEnd;
    // If there is more than one value in the range, add the range end too.
    if (RangeStart != RangeEnd)
      Stream << "-" << RangeEnd;
    // If there is at least one more range, add a separator.
    if (Current != End)
      Stream << ", ";
  }
  return Buffer;
}
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37

It's not going to be clear to the end user that these two values represent offsets. I'd be more explicit: "DWARF unit from offset x to offset y ..."

Same applies below.

56

It's not clear to me (when reading the message without the code context) what these last two offsets represent. I suspect that the last offset is actually unnecessary, since the OffsetPtr location for this case is going to be fixed within the unit header, if I'm not mistaken.

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
411

I think this is a little clearer.

llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s
2

Nit: I'm trying to encourage new tests to use '##' for comments, to help distinguish them from lit and FileCheck directives.

llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
3 ↗(On Diff #352004)

Right, but not all supported platforms are 64-bit. I've actually seen bugs precisely because of this sort of issue in similar code.

dblaikie added inline comments.Jun 18 2021, 11:49 AM
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
103

Yep, all this complexity would fall under the clause I mentioned in the original feedback: " (I forget if the abbrev table is necessarily contiguous - if it isn't, then maybe that's too complicated)" - so the table isn't contiguous, and maybe this is too complicated to be worth it? I really don't mind either way, at this point.

@jhenderson's version seems nice, if we're going to do this.

jankratochvil marked 9 inline comments as done.Jun 20 2021, 2:09 PM
jankratochvil added inline comments.
llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
372–375

String literals don't have lifetime issues, so there's no need for the static local variable.

True, I am stupid.

llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
103

TBH I did not sort the Abbrevs intentionally. This debugging output is for DWARF developers, not for end users. By sorting it the message gets too disconnected from what is really written in the DWARF. Moreover when usually the Abbrevs are sorted. But I have accepted your version.
The look ahead instead of a lambda flusher is an interesting idea.

llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37

I haven't changed this yet. I disagree with "from offset x to offset y" as that would need to be rather "from offset x incl. to offset y excl." which already looks to me too talkative. Primarily as this message is for DWARF developers, not for end users.
And then we should change an already existing error message: "while reading [0x%x, 0x%x)"

56

I suspect that the last offset is actually unnecessary, since the OffsetPtr location for this case is going to be fixed within the unit header, if I'm not mistaken.

I agree, thanks.

llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
3 ↗(On Diff #352004)

It is true 32-bit buildbots would catch it.

jankratochvil marked 3 inline comments as done.
jhenderson added inline comments.Jun 21 2021, 12:45 AM
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
81
103

I don't have a strong opinion about whether they should be sorted or not, and if you would prefer dropping the sorting, that's fine (I think the rest of the code will just work, but am not 100% certain without spending more time than I care to thinking about it).

I'm also happy if you'd prefer to drop the entire thing.

llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37

With at least offsets, I would never read "from offset X to offset Y" as inclusive at both ends, because of the nature of what an offset represents. But maybe it is a real issue. You could achieve the same meaning, without the ambiguity risk by saying "with length X at offset Y" instead, for example. The length is actually the thing that's encoded in the DWARF after all. You could make it slightly less talkative like this: "DWARF unit (offset 0x1234, length 0x4321) tries to ...".

I don't think the two error messages are quite equivalent. In the DataExtractor one, the message talks about reading the range, and therefore it's somewhat clearer that you're dealing with an offset, whereas here the numbers in the range aren't things that are mentioned as being read or similar, so you lose that context.

54

In this case, you don't need the end offset of the unit, as it has no impact here - only the start offset is actually important, so you can identify the unit that is being read.

66

AbbrCode is a uint64_t. The test will fail on some platforms due to either bitness or endianness issues.

94

dwarf::Form is specified to be a uint16_t in its declaration.

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
305

As this is for DWARF developers, maybe it would be best to use the actual field name as defined by the DWARF standard (specifically "type_offset"), for something like: DWARF type unit (offset 0x1234, length 0x4321) type_offset 0x1111 points inside the header or past the unit end".

(I also included my suggestions from above, and a couple of other wording suggestions - I'm not a massive fan of specifying "relative boundary" because it's not clear to me what the boundary is relative to).

307

Size is a uint8_t.

329

I wonder if this error needs putting earlier, in case the header is truncated?

333

debug_info.size() returns a size_t, so may not always be 64 bits.

337

I'd put this before the type_offset check, as a different DWARF version might not have the type offset field at all etc.

341

getVersion() returns a uint16_t.

349

As above, I don't think you need the end offset here, as it isn't relevant for this message.

350

getAddressByteSize() returns a uint8_t.

llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s
26

I'd test both the cases where the offset points to within the header and past the end of the unit.

94

Nit: all the other comments use upper-case for their first letter.

llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s
3 ↗(On Diff #352004)

It would probably be a good idea if you used a non-zero value for the cu index.

Done.

That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).

That would not show anything more as on 64-bit platforms variadic function extends all parameters to (at least) 64 bits. Therefore even 32-bit format will still read 64-bit variadic parameter.

3 ↗(On Diff #352004)

I've gone through and highlighted where else I see a type mismatch in your print formats versus the type being used. I think it would be a good idea to match the types, as whilst the implementation eventually calls a function that uses variadic arguments, I don't think there's any strict requirement for it to do so. Plus, the integer promotion is subtle, and unnecessary code subtlety harms maintainability. Finally, some of those might actually result in bugs.

From my understanding, integer promotion of variadic arguments is only as far as int/unsigned int. The size of int is implementation defined and not necessarily the same as uint32_t (though admittedly I don't know of any cases where it isn't currently), nor anything necessarily to do with the host system bitness. Thus, when using uint*_t types, you should use their corresponding macros for printing.

jankratochvil marked 13 inline comments as done.Jun 21 2021, 7:24 AM
jankratochvil added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37

without the ambiguity risk by saying "with length X at offset Y" instead

That is definitely ambiguous as it can mean length b-a of [a, b) or it can mean length stored in the binary which is b-a-4 (for DWARF32). I used the offsets with "incl." and "excl." to move forward.

The length is actually the thing that's encoded in the DWARF after all.

So you did mean the b-a-4.

jankratochvil marked an inline comment as done.

Changes the messages to "incl." and "excl." as I hope that can be acceptable for both of us.
I have newly used joinErrors there which I intended originally and then I forgot about it. It creates a two-line error:

warning: DWARF unit at 0x0000002c cannot be parsed: 
warning: unexpected end of data at offset 0x2d while reading [0x2c, 0x30)
jankratochvil marked 4 inline comments as done.Jun 21 2021, 7:26 AM
jhenderson accepted this revision.Jun 23 2021, 12:59 AM

LGTM, with one possible suggestion, but also happy if this is committed as-is. You might want to wait for @dblaikie too.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
372–375

This is probably absolutely fine, but I was thinking about it and wondering whether it would make some sense to factor out the commonality into some sort of container, that the string function can iterate over to generate a string, and the bool function can just compare values against. What do you think?

llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
37

Fair point. Let's stick with your latest version.

This revision is now accepted and ready to land.Jun 23 2021, 12:59 AM
jankratochvil marked 2 inline comments as done.Jun 23 2021, 5:22 AM
jankratochvil added inline comments.
llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
372–375

I did not want to code it that myself first as it looked overengineered to me and it still looks so. But why not if there is an agreement upon it. I agree there is no information duplication then.

llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp
75

I have put there a simple iterator instead of llvm::transform as it is really shorter and easier to read.

Just some clang-tidy warnings.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
377
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
319

Nit: fix clang-tidy warnings.

jankratochvil marked an inline comment as done.

Fixed clang-tidy warnings.

jankratochvil marked 2 inline comments as done.Jun 23 2021, 5:45 AM
This revision was landed with ongoing or failed builds.Jun 27 2021, 2:41 AM
This revision was automatically updated to reflect the committed changes.