This is an archive of the discontinued LLVM Phabricator instance.

Restructure the minidump loading path and add early & explicit consistency checks
ClosedPublic

Authored by lemo on Jul 11 2018, 1:03 PM.

Details

Summary

Corrupted minidumps was leading to unpredictable behavior.

This change adds explicit consistency checks for the minidump early on. The checks are not comprehensive but they should catch obvious structural violations:

  • streams with type == 0
  • duplicate streams (same type)
  • overlapping streams
  • truncated minidumps

Another early check is to make sure we actually support the minidump architecture instead of crashing at a random place deep inside LLDB.

Diff Detail

Repository
rL LLVM

Event Timeline

lemo created this revision.Jul 11 2018, 1:03 PM
amccarth added inline comments.Jul 11 2018, 2:18 PM
source/Plugins/Process/minidump/MinidumpParser.cpp
23 ↗(On Diff #155049)

Why add <unordered_set>? It looks like your new map is just a vector.

367 ↗(On Diff #155049)

This looks like an inadvertent change. Please change "i" back to "index".

454 ↗(On Diff #155049)

For consistency, please consider renaming result to error.

558 ↗(On Diff #155049)

This is a long function (which, unfortunately, is not unusual in LLVM/LLDB), but the comments here are really good, so I'm OK with it.

source/Plugins/Process/minidump/MinidumpParser.h
93 ↗(On Diff #155049)

I'm not a big fan of 2-step initialization, but that seems to be the way of LLDB. :-(

source/Plugins/Process/minidump/ProcessMinidump.cpp
172 ↗(On Diff #155049)

Should the architecture check be in the MinidumpParser::Initialize with the other checks?

I don't know the answer; I'm just asking for your thinking about this.

193 ↗(On Diff #155049)

I realize nothing is perfectly consistent, but I think LLVM tends to start error strings with a lowercase letter (except for proper nouns). Can you revert this case change and make sure your new error strings follow that pattern?

unittests/Process/minidump/MinidumpParserTest.cpp
52 ↗(On Diff #155049)

I would rather see this function return the result of the Initialize and let the individual tests check the expectation explicitly.

I know that will lead to a little bit of duplication in the tests, but it will make the individual tests easier to understand in isolation. It also makes it possible for each test to decide whether to ASSERT or EXPECT. And it eliminates the need for the bool parameter which is hard to decipher at the call sites.

80 ↗(On Diff #155049)

See comment on SetUpData. As much as practical, I'd rather have the EXPECTs and ASSERTs directly in the tests rather than hiding them in a helper function. Also note that, while the first two arguments are pretty intuitive, there's no way to understand the true/false in the third argument without going to look up to see what SetUpData does.

lemo updated this revision to Diff 155069.Jul 11 2018, 2:55 PM
lemo marked 9 inline comments as done.

Incorporating CR feedback

source/Plugins/Process/minidump/MinidumpParser.cpp
23 ↗(On Diff #155049)

Good catch

558 ↗(On Diff #155049)

I agree size-wise it's borderline. It's also a bit smelly in that it does two things (checks + initialization).

I originally had a separate "ConsistencyCheck", but there was significant duplication (the directory traversal) and the combined version won by a tiny margin :) If this grows with more checks I'd be happy to revisit and refactor it.

source/Plugins/Process/minidump/MinidumpParser.h
93 ↗(On Diff #155049)

Agreed. I don't see how to avoid this w/o much bigger changes though (we don't have a chance to return meaningful errors during process creation).

source/Plugins/Process/minidump/ProcessMinidump.cpp
172 ↗(On Diff #155049)

Good question, here's my take: the checks are for consistency and a minidump with a currently unsupported architecture is a valid minidump.

So I think it's better to have this check external since the architecture support is not a minidump parser concern. WDYT?

193 ↗(On Diff #155049)

Sigh, sure. I was hoping it's the other way around.

unittests/Process/minidump/MinidumpParserTest.cpp
52 ↗(On Diff #155049)

Good idea, although gunit doesn't let us ASSERT in non-void returning functions.

I agree that the bool parameter is ugly, so I created a separate TruncateMinidump() helper (which cleans up the SetUpData since the load_size was only used for truncation)

Also, I'm not seeing tests for the other consistency checks you're adding (like whether there are any streams or whether the streams overlap). Is it difficult to create such malformed minidumps?

unittests/Process/minidump/MinidumpParserTest.cpp
52 ↗(On Diff #155049)

This isn't quit what I meant. I'd rather not have the ASSERTs in helper functions at all (regardless of return type). The helpers should return a status and the actual test code should do whatever ASSERT or EXPECT is appropriate.

amccarth added inline comments.Jul 11 2018, 3:23 PM
source/Plugins/Process/minidump/ProcessMinidump.cpp
172 ↗(On Diff #155049)

Yes, this makes sense to me.

lemo added a comment.Jul 11 2018, 3:33 PM

Regarding test for the other checks, I'll try to fabricate a few invalid minidumps (although it would obviously provide limited coverage)

unittests/Process/minidump/MinidumpParserTest.cpp
52 ↗(On Diff #155049)

So how would we handle the existing checks in SetupData()?

lemo updated this revision to Diff 155076.Jul 11 2018, 3:57 PM

Adding a few ill-formed minidumps for testing the new checks

lemo updated this revision to Diff 155080.Jul 11 2018, 4:16 PM
amccarth accepted this revision.Jul 12 2018, 8:37 AM

LGTM if you don't want to consider my remaining suggestion for this patch.

Thanks for the extra tests.

unittests/Process/minidump/MinidumpParserTest.cpp
52 ↗(On Diff #155049)

SetUpData would just return an error status instead of ASSERTing. The actual ASSERTs would be placed in the tests that call SetUpData. As I said, that would add some duplication, because individual tests would have to ASSERT (or EXPECT) on the result of SetUpData, but it makes the tests easier to read and maintain the tests.

Since there are other tests using SetUpData, they would have to be updated, so maybe you want to declare this suggestion as out-of-scope for this patch. Anyway, I'm happy that you eliminated the boolean argument.

This revision is now accepted and ready to land.Jul 12 2018, 8:37 AM
lemo added a comment.Jul 12 2018, 9:47 AM

Thanks Adrian for the thorough review.

unittests/Process/minidump/MinidumpParserTest.cpp
52 ↗(On Diff #155049)

I think I understand the idea and I agree it's tempting. The problem is that not all the checks in SetupData() are status based, so there's no direct mapping from each operation to a return value.

It doesn't seem terribly bad to let the helper encapsulate some of the checks.

This revision was automatically updated to reflect the committed changes.

I don't agree with the two-stage initialization of the MinidumpParser class being introduced here. We deliberately introduced the Create static function to avoid this. If this Initialize function in checking invariants which are assumed to be hold by other parser methods, then it should be done by the Create function. Ideally this would be done before even constructing the parser object, but if this is impractical for some reason then you can make the Initialize function private and call it directly from Create. This way a user will never be able to see an malformed parser object. To make sure you propagate the error, you can change the return type of Create to llvm::Expected<MinidumpParser (the only reason we did not do this back then was that there was no precedent for using Expected in LLDB, but this is no longer the case).

The problem is not returning an error from Minidump::Create() - if that was
the case this could easily be improved indeed. The two-phase initialization
is a consequence of the LLDB plugin lookup:

  1. Target::CreateProcess() calls Process::FindPlugin()
  2. ProcessMinidump::CreateInstance() then has to inspect the core file to

see if it's a minidump
2b. ... if it is a minidump, we need to create a ProcessMinidump (which
calls MinidumpParser::Create())

  1. once the plugin is selected, Process::LoadCore() is finally called and

this the earliest we can do minidump-specific error checking

Note that at step 2b. we don't have a way to propagate the error since
we're just doing the plugin lookup (the most we can pass on the lookup to
the rest of the plugins). We can't easily defer the
MinidumpParser::Create() as step 2b either since that only morphs into a
different kind of two-stage initialization (having a ProcessMinidump w/o a
parser).

I agree that it would be nicer with a one step initialization but overall
changing the LLDB plugin lookup is too intrusive for the relatively small
benefit. If you have any suggestions I'd love to hear them.

I have an upcoming patch for adding ARM and ARM64 support to the minidump parser and i was curious how you created the invalid minidump files that are part of this patch?

lemo added a comment.Jul 17 2018, 11:16 AM

Great timing! ARM support would be most welcome. Are you planning to
support the Breakpad flavor or ARM minidumps or the Microsoft one? (Mark
Mentovai just reminded me that the ARM support was added independently and
some of the structures are different)

Regarding the invalid minidumps, I used my favorite hex editor
https://www.sweetscape.com/010editor/ to manually corrupt the minidump
streams directory.

Great timing! ARM support would be most welcome. Are you planning to
support the Breakpad flavor or ARM minidumps or the Microsoft one? (Mark
Mentovai just reminded me that the ARM support was added independently and
some of the structures are different)

Currently I am adding support for the Breakpad ARM and ARM64 architectures.

Regarding the invalid minidumps, I used my favorite hex editor
https://www.sweetscape.com/010editor/ to manually corrupt the minidump
streams directory.

Gotcha. I have a minidump generator I am writing as part of a python module that can save minidumps from any live process so I will continue to use that.