In trying to hoist errors further up this callstack, I discovered that
if the data in the debug_abbrev section is invalid entirely, the code
that parses the debug_abbrev section may do strange or unpredictable
things. The underlying issue is that DataExtractor will return a value
of 0 when it encounters an error in extracting a LEB128 value. It's thus
difficult to determine if there was an error just by looking at the
return value. This patch aims to bail at the first sight of an error in
the debug_abbrev parsing code.
Details
- Reviewers
JDevlieghere aprantl fdeazeve jhenderson int3 - Group Reviewers
Restricted Project - Commits
- rG6836a47b7e6b: [DebugInfo] Add error checking around data extraction in…
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
There are five new places that the function you've changed can bail out with an error, but you've only got one test. I think it makes sense to test the other new places too?
llvm/unittests/DebugInfo/DWARF/DWARFDebugAbbrevTest.cpp | ||
---|---|---|
47 | Might be worth adding an assert that Size is a multiple of 4? | |
129–130 | Nit: it seems unnecessary to have this over two lines? Is it also worth checking that Offset is some specific value? |
llvm/unittests/DebugInfo/DWARF/DWARFDebugAbbrevTest.cpp | ||
---|---|---|
45 | Small nit but it would be helpful for readers if we improve the naming here a bit. There is nothing inherently "Bad" or inherently "Abbreviation" in the function, this is a form of write_n algorithm. Is it bad because we always write an odd number of elements? Or is it bad because we never have the proper zero terminators? Or because the contents of Data, when splat Size times, would fail to be decoded as an ULEB? I think the answer to this is related to @jhenderson's suggestion for different tests In all those cases, my suggestion would be to rename this to writeValueNTimes and then on the callee side we name the data/size to BadData and BadSize and maybe add a comment on why it's bad |
- Add more rigorous testing. Specifically, I'm testing every possible place where we we can bail early from DWARFAbbreviationDeclaration::extract.
- Replace the writeBadAbbreviationDeclarationData with more specific functions to handle both malformed input and input that is too large (both kinds of LEB128 extraction errors).
Looks like some lld-macho tests are failing... I'm looking at them and the debug_abbrev data in them looks malformed. I'll update those tests here as well.
@int3 Hey can you take a look at the lld tests to make sure I didn't make a mistake? The debug_abbrev sections were malformed, I've manually added the necessary terminating 0s into the tests. They do pass on my machine after the change.
Seems reasonable, I probably deleted those bytes when trying to make a 'minimal' test case
Small nit but it would be helpful for readers if we improve the naming here a bit. There is nothing inherently "Bad" or inherently "Abbreviation" in the function, this is a form of write_n algorithm.
Is it bad because we always write an odd number of elements? Or is it bad because we never have the proper zero terminators? Or because the contents of Data, when splat Size times, would fail to be decoded as an ULEB? I think the answer to this is related to @jhenderson's suggestion for different tests
In all those cases, my suggestion would be to rename this to writeValueNTimes and then on the callee side we name the data/size to BadData and BadSize and maybe add a comment on why it's bad