This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Check that we do not run past a line table end when parsing
AbandonedPublic

Authored by jhenderson on Jan 29 2020, 3:41 AM.

Details

Summary

This change adds another check to the debug line table parsing code, namely to make sure the final offset after parsing a table matches the end offset as expected based on the unit length field. These two won't match if the unit length points to part way through an opcode, as the parsing does not stop midway through opcodes.

If the problem is detected, the offset will be reset to the expected value and an error reported.

Diff Detail

Event Timeline

jhenderson created this revision.Jan 29 2020, 3:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 29 2020, 3:41 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
ikudrin added inline comments.Jan 30 2020, 2:50 AM
llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
924

I guess that the comment is not 100% accurate. Let's imagine the following purely illustrative and meaningless sequence at the end of the section: 0, 5, DW_LNE_set_address, 0, 1, DW_LNE_end_sequence. The extractor will not read an argument of DW_LNE_set_address and will not increment the offset, but after that, something which looks like a correct termination will be read.

927

Maybe it is more consistent to put this code before the previous check. I mean, this check is comparatively low level, and if it triggers, the input is probably corrupted so deeply that it makes no sense to interpret it, no?

jhenderson marked 2 inline comments as done.Jan 30 2020, 3:08 AM
jhenderson added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
924

In your example, the set address opcode would consume the three bytes comprising the end sequence opcode, followed by reading a single zero value for the final byte. The offset would be adjusted to the end of the three readable bytes (thus going past the end of the DW_LNE_end_sequence opcode). The parser would then stop as *OffsetPtr == EndOffset. This warning wouldn't then be triggered but a no end sequence one would be.

Would changing this statement "As the offset isn't incremented by the data extractor when reading past the end of the available data" to "As the offset isn't incremented by the data extractor past the end of the available data" be clearer?

927

Just to be clear, are you saying that we shouldn't report the no end sequence error if we hit this case? That's fine with me. If I did that, I'd probably pull the sort below to before this check, as that would allow me to simply return after the error has been reported, whilst still allowing clients to use the data, if they want to.

ikudrin added inline comments.Jan 30 2020, 4:44 AM
llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
924

Ah, yes, I forgot about the special code which fixes OffsetPtr at the end of the branch for Opcode == 0. Well, I cannot imagine another malicious sequence. It is sad that the whole code is not straightforward and requires a long explanatory comment.

I don't insist on any specific wording, but maybe it is worth to add a note about the code at the end of the Opcode == 0 branch which I missed.

927

Right, I believe that we do not need to report about the unterminated sequence if we have reported the incorrect offset. The latter is more fundamental, to my taste.

llvm/unittests/DebugInfo/DWARF/DwarfGenerator.cpp
178

Do I understand it right that this fix along with the corresponding changes in other places may be extracted into a separate patch?

@labath - maybe some other parts of the DWARF parsing that could benefit from a constrained DWARFDataExtractor

llvm/test/tools/llvm-dwarfdump/X86/debug_line_invalid.test
39

Should this dump in verbose mode to show more clearly which operations were parsed and which ones were not? So they match up with the LNE_* descriptions in the comments in the test?

185

(similar to other comment) - this warning sounds problematic to me & any chunk of debug info that has a specified length shouldn't be read beyond that length (as if the section itself ended at the end of the length - we should get the same error messages in both those cases)

llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
568–569

This phrasing doesn't match up with not reading past the end of a specified length - I'd expect something more like "last opcode in line table at offset 0x..1 is incomplete/truncated at offset 0x...34" or the like. Maybe with "expected to extend to 0x35" if we know that some localized implied/expressed length extends beyond some broader length that was specified.

794–796

Could you explain this further? What's incorrect about the existing usage (where is the opcode length field? Is that the 0x2 in the line above? Why would it be too short? (should the DWARFGenerator API be changed? is it computing a length that's too short for the table?))

@labath - maybe some other parts of the DWARF parsing that could benefit from a constrained DWARFDataExtractor

I think that pretty much everything would benefit from a data extractor constrained in this way. Prefixing the content with length is used in nearly every dwarf section, and so in theory, everything should be checking that it does not cross the specified length. I've seen code which attempts to do that via something like while(!endReached() && data.isValidOffset(*Offset) && *Offset < EndOffset) parseOneThing(Offset), but that is:
a) complicated
b) probably incorrect, because the end boundary is only checked at the end of the parseOneThing call, so we can still cross that boundary if the "one thing" is sitting on both sides of the boundary

If we had a "constrained" data extractor, then we wouldn't need the *Offset < EndOffset check, because the extractor would check that for us (and it would do that _everywhere_). It would also allow us to treat the "'thing' crosses a contribution boundary, but there is another contribution after it" and "'thing' crosses a contribution boundary, but hits the end of the section" cases uniformly, because as far as the code would be concerned, everything would be at the end of the section.

@labath - maybe some other parts of the DWARF parsing that could benefit from a constrained DWARFDataExtractor

I think that pretty much everything would benefit from a data extractor constrained in this way. Prefixing the content with length is used in nearly every dwarf section, and so in theory, everything should be checking that it does not cross the specified length. I've seen code which attempts to do that via something like while(!endReached() && data.isValidOffset(*Offset) && *Offset < EndOffset) parseOneThing(Offset), but that is:
a) complicated
b) probably incorrect, because the end boundary is only checked at the end of the parseOneThing call, so we can still cross that boundary if the "one thing" is sitting on both sides of the boundary

If we had a "constrained" data extractor, then we wouldn't need the *Offset < EndOffset check, because the extractor would check that for us (and it would do that _everywhere_). It would also allow us to treat the "'thing' crosses a contribution boundary, but there is another contribution after it" and "'thing' crosses a contribution boundary, but hits the end of the section" cases uniformly, because as far as the code would be concerned, everything would be at the end of the section.

Ok, I think I now understand (after reading the other emails) that you were letting me know of the use cases, and not asking me to provide some. :)

jhenderson marked 5 inline comments as done.Jan 31 2020, 2:26 AM

Given the proposed changes to the DWARFDataExtractor to limit it to a certain region, I think this and similar changes I have in the pipeline should probably be shelved, as the behaviour will change and the error messages will need updating. I'll focus on other areas instead in the meantime.

llvm/test/tools/llvm-dwarfdump/X86/debug_line_invalid.test
39

This last check is just for the last (good) section, so doesn't really need to show things in that much detail. However, I could certainly see a benefit for checking things more verbosely. We already do verbose dumps on line 29 of the test, but the checks don't check the operands. I'll look at updating that, but it should be a separate change.

185

That's fair. Saying it's been truncated at this point is somewhat lying, but would be correct once we switch to the limited data extractor.

llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
568–569

I think your proposal makes sense with the proposed data extractor changes, but possibly not before. See my out-of-line comment.

794–796

addExtendedOpcode takes a length for an extended opcode, followed by the opcode itself, and then data for the operands. I realise this byte should be moved into the operands argument, so I'll fix that. I don't think the API here needs changing, as it would prevent us creating broken situations like this (i.e. an extended opcode without sufficient data for its claimed length).

llvm/unittests/DebugInfo/DWARF/DwarfGenerator.cpp
178

It probably can be. It's a requirement for this patch, but dosen't need to be a part of the same commit. I'lll look at splitting it out.

dblaikie added inline comments.Feb 3 2020, 5:12 PM
llvm/test/tools/llvm-dwarfdump/X86/debug_line_invalid.test
39

Oh, I figured this change related to the other change in this review (Case 13) which wasn't super clear to me & figured verbosity might help.

185

Right - I meant that the warning implies the implementation isn't what I'd hope/expect (& that both message and implementation should be fixed in the direction of not reading more than the prefixed length indicates for any given entity).

llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
568–569

Right right - implementation and messaging should be fixed in that direction.

794–796

What's the mention of the "table end" here, then? I guess "table end" is indicated by the table length field elsewhere? Is it autogenerated (with a possible override for invalid situations)? Should it be autogenerated differently?

jhenderson marked 2 inline comments as done.Feb 4 2020, 1:01 AM
jhenderson added inline comments.
llvm/test/tools/llvm-dwarfdump/X86/debug_line_invalid.test
39

It's only indirectly related. Case 13 required adding new data to the input, which pushed the last table later, meaning the offset needed updating.

llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
794–796

Yeah, the table length is auto-generated, but it's auto-generated based on the raw data we add (i.e. not the interpretation of the data, but the physical number of bytes we add). See writeDefaultPrologue in DWARFGenerator.cpp. The ability to override the length is provided by the createBasicPrologue function which returns a line table prologue that can be updated for use in the test. See e.g. ErrorForTooLargePrologueLength. I think this is a reasonable approach.

In this test case, we are overriding the DW_LNE_end_sequence length with something different, by specifying the "2" argument. Previously, by not specifying any data after it in the test, the majority of the bytes covered by this size were past the end of the table, which is itself an interesting test case, albeit not the focus of this test. Consequently it was triggering the new error, which I didn't want. When I next update this patch, I'll replace this and the previous line of code and comment with:
LT.addExtendedOpcode(0x2, DW_LNE_end_sequence, {{LineTable::Byte, 0xaa});

jhenderson abandoned this revision.Jun 9 2020, 1:58 AM

Abandoning in favour of D80796 and related patches.