This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] Add more error handling to debug line parser.
ClosedPublic

Authored by JDevlieghere on Jul 10 2019, 6:59 PM.

Details

Summary

This patch exnteds the error handling in the debug line parser to get rid of the existing MD5 assertion. I want to reuse the debug line parser from LLVM in LLDB where we cannot crash on invalid input.

Diff Detail

Event Timeline

JDevlieghere created this revision.Jul 10 2019, 6:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 10 2019, 6:59 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
JDevlieghere marked an inline comment as done.Jul 10 2019, 6:59 PM
JDevlieghere added inline comments.
llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
357–359

Looks like clang-format needs a little help here. I'll reflow this.

  • Reflow string in unit test
  • Update debug_line_invalid.test
JDevlieghere retitled this revision from [DWAF] Add more error handling to debug line parser. to [DWARF] Add more error handling to debug line parser..Jul 11 2019, 2:52 PM

Looks like this code already had rudimentary error handling along this codepath where the assert is (the function returns empty, which returns false, which becomes an error further up, I think?) - and adding more descriptive/precise error handling along this path is probably good - but probably also means more testing is required to demonstrate all the new error results/messages/codepaths?

I'm not sure if there's a better way to test this without checking in a binary. I ended up modifying MCDwarf to emit a data8 instead of a data16 form for the MD5 hash to trigger this code path.

I was rather hoping for test coverage for all the new error messages this change introduced - is that unrealistic/impractical?

Yeah, the line table is especially tricky to hand-craft compared to checking in an object file. I think it technically can still be hand-crafted assembly (no line directives, etc, just a debug_line section with raw byte (etc) directives) - might be plausible & make it clearer what the input is? (checked in assembly, assembled with llvm-mc then run through llvm-dwarfdump to test the parsing)

I was rather hoping for test coverage for all the new error messages this change introduced - is that unrealistic/impractical?

Yeah, the line table is especially tricky to hand-craft compared to checking in an object file. I think it technically can still be hand-crafted assembly (no line directives, etc, just a debug_line section with raw byte (etc) directives) - might be plausible & make it clearer what the input is? (checked in assembly, assembled with llvm-mc then run through llvm-dwarfdump to test the parsing)

Hear hear. It looks like this patch is all about diagnosing issues in the v5 prologue/file-table, and it should be relatively easy to construct bogus examples in assembler. You don't need a line-number program at all. There ought to be examples of *valid* v5 line-table headers lying around, I know I did all the dumper work with assembler tests before I ever generated any v5 headers.

I was rather hoping for test coverage for all the new error messages this change introduced - is that unrealistic/impractical?

Oh, sure, I was focussed on the MD5 one because that's the only one that's "new". I'll make sure the other ones are covered too.

Yeah, the line table is especially tricky to hand-craft compared to checking in an object file. I think it technically can still be hand-crafted assembly (no line directives, etc, just a debug_line section with raw byte (etc) directives) - might be plausible & make it clearer what the input is? (checked in assembly, assembled with llvm-mc then run through llvm-dwarfdump to test the parsing)

Hear hear. It looks like this patch is all about diagnosing issues in the v5 prologue/file-table, and it should be relatively easy to construct bogus examples in assembler. You don't need a line-number program at all. There ought to be examples of *valid* v5 line-table headers lying around, I know I did all the dumper work with assembler tests before I ever generated any v5 headers.

How would you be able to emit a wrong form value for the MD5 hash from assembly? The directive (which I think you came up with?) is just md5 [data]? Deciding which form to emit is hard-coded in MCDwarf.

How would you be able to emit a wrong form value for the MD5 hash from assembly? The directive (which I think you came up with?) is just md5 [data]? Deciding which form to emit is hard-coded in MCDwarf.

Only if you use the .file directive to emit it. As David said, you can put whatever you want into a hand-coded .debug_line section.
See for example llvm/test/tools/llvm-dwarfdump/X86/Inputs/debug_line_malformed.s

I was rather hoping for test coverage for all the new error messages this change introduced - is that unrealistic/impractical?

Oh, sure, I was focussed on the MD5 one because that's the only one that's "new". I'll make sure the other ones are covered too.

Yeah, the line table is especially tricky to hand-craft compared to checking in an object file. I think it technically can still be hand-crafted assembly (no line directives, etc, just a debug_line section with raw byte (etc) directives) - might be plausible & make it clearer what the input is? (checked in assembly, assembled with llvm-mc then run through llvm-dwarfdump to test the parsing)

Hear hear. It looks like this patch is all about diagnosing issues in the v5 prologue/file-table, and it should be relatively easy to construct bogus examples in assembler. You don't need a line-number program at all. There ought to be examples of *valid* v5 line-table headers lying around, I know I did all the dumper work with assembler tests before I ever generated any v5 headers.

How would you be able to emit a wrong form value for the MD5 hash from assembly? The directive (which I think you came up with?) is just md5 [data]? Deciding which form to emit is hard-coded in MCDwarf.

Not using line directives to create the line table - using normal/raw assembly (the same way we have assembly for debug_info sections for some tests) to create the bytes in the debug_line section. Does that make sense?

Got it, thanks! I'll give that a shot :-)

Add more tests!

ormris removed a subscriber: ormris.Jul 18 2019, 9:55 AM
dblaikie accepted this revision.Jul 22 2019, 3:15 PM

Generally looks good (optional thoughT: maybe the .test file and the .s file could be merged (so the CHECK lines could sit near the assembly they're checking, would make it easier to eyeball whether the tests are testing the right thing, etc)

This revision is now accepted and ready to land.Jul 22 2019, 3:15 PM

Generally looks good (optional thoughT: maybe the .test file and the .s file could be merged (so the CHECK lines could sit near the assembly they're checking, would make it easier to eyeball whether the tests are testing the right thing, etc)

I agree with the general idea, but given that this test is checking the same prefixes with two inputs (debug_line_reserved_length.s and debug_line_malformed.s) I think it would just increate the complexity within the test.

This revision was automatically updated to reflect the committed changes.

Hi @JDevlieghere, thanks for doing this! If you do any more new error handling for the debug line parser, could you subscibe me too, please? We've got a number of local patches to add additional error handling which I've been meaning to put up for review, but simply haven't had the time (and don't foresee having the time in the near future). If you subscribe me, I could probably provide you with bits that might be useful though, as there's a reasonable chance we already have the test case for example! It'll also help with our merging in of your change into our downstream repo.