Restructure the minidump loading path and add early & explicit consistency checks

Authored by lemo on Wed, Jul 11, 1:03 PM.



Corrupted minidumps was leading to unpredictable behavior.

This change adds explicit consistency checks for the minidump early on. The checks are not comprehensive but they should catch obvious structural violations:

  • streams with type == 0
  • duplicate streams (same type)
  • overlapping streams
  • truncated minidumps

Another early check is to make sure we actually support the minidump architecture instead of crashing at a random place deep inside LLDB.

Diff Detail

lemo created this revision.Wed, Jul 11, 1:03 PM
amccarth added inline comments.Wed, Jul 11, 2:18 PM
23 ↗(On Diff #155049)

Why add <unordered_set>? It looks like your new map is just a vector.

367 ↗(On Diff #155049)

This looks like an inadvertent change. Please change "i" back to "index".

454 ↗(On Diff #155049)

For consistency, please consider renaming result to error.

558 ↗(On Diff #155049)

This is a long function (which, unfortunately, is not unusual in LLVM/LLDB), but the comments here are really good, so I'm OK with it.

93 ↗(On Diff #155049)

I'm not a big fan of 2-step initialization, but that seems to be the way of LLDB. :-(

172 ↗(On Diff #155049)

Should the architecture check be in the MinidumpParser::Initialize with the other checks?

I don't know the answer; I'm just asking for your thinking about this.

193 ↗(On Diff #155049)

I realize nothing is perfectly consistent, but I think LLVM tends to start error strings with a lowercase letter (except for proper nouns). Can you revert this case change and make sure your new error strings follow that pattern?

52 ↗(On Diff #155049)

I would rather see this function return the result of the Initialize and let the individual tests check the expectation explicitly.

I know that will lead to a little bit of duplication in the tests, but it will make the individual tests easier to understand in isolation. It also makes it possible for each test to decide whether to ASSERT or EXPECT. And it eliminates the need for the bool parameter which is hard to decipher at the call sites.

80 ↗(On Diff #155049)

See comment on SetUpData. As much as practical, I'd rather have the EXPECTs and ASSERTs directly in the tests rather than hiding them in a helper function. Also note that, while the first two arguments are pretty intuitive, there's no way to understand the true/false in the third argument without going to look up to see what SetUpData does.

lemo updated this revision to Diff 155069.Wed, Jul 11, 2:55 PM
lemo marked 9 inline comments as done.

Incorporating CR feedback

23 ↗(On Diff #155049)

Good catch

558 ↗(On Diff #155049)

I agree size-wise it's borderline. It's also a bit smelly in that it does two things (checks + initialization).

I originally had a separate "ConsistencyCheck", but there was significant duplication (the directory traversal) and the combined version won by a tiny margin :) If this grows with more checks I'd be happy to revisit and refactor it.

93 ↗(On Diff #155049)

Agreed. I don't see how to avoid this w/o much bigger changes though (we don't have a chance to return meaningful errors during process creation).

172 ↗(On Diff #155049)

Good question, here's my take: the checks are for consistency and a minidump with a currently unsupported architecture is a valid minidump.

So I think it's better to have this check external since the architecture support is not a minidump parser concern. WDYT?

193 ↗(On Diff #155049)

Sigh, sure. I was hoping it's the other way around.

52 ↗(On Diff #155049)

Good idea, although gunit doesn't let us ASSERT in non-void returning functions.

I agree that the bool parameter is ugly, so I created a separate TruncateMinidump() helper (which cleans up the SetUpData since the load_size was only used for truncation)

Also, I'm not seeing tests for the other consistency checks you're adding (like whether there are any streams or whether the streams overlap). Is it difficult to create such malformed minidumps?

52 ↗(On Diff #155049)

This isn't quit what I meant. I'd rather not have the ASSERTs in helper functions at all (regardless of return type). The helpers should return a status and the actual test code should do whatever ASSERT or EXPECT is appropriate.

amccarth added inline comments.Wed, Jul 11, 3:23 PM
172 ↗(On Diff #155049)

Yes, this makes sense to me.

lemo added a comment.Wed, Jul 11, 3:33 PM

Regarding test for the other checks, I'll try to fabricate a few invalid minidumps (although it would obviously provide limited coverage)

52 ↗(On Diff #155049)

So how would we handle the existing checks in SetupData()?

lemo updated this revision to Diff 155076.Wed, Jul 11, 3:57 PM

Adding a few ill-formed minidumps for testing the new checks

lemo updated this revision to Diff 155080.Wed, Jul 11, 4:16 PM
amccarth accepted this revision.Thu, Jul 12, 8:37 AM

LGTM if you don't want to consider my remaining suggestion for this patch.

Thanks for the extra tests.

52 ↗(On Diff #155049)

SetUpData would just return an error status instead of ASSERTing. The actual ASSERTs would be placed in the tests that call SetUpData. As I said, that would add some duplication, because individual tests would have to ASSERT (or EXPECT) on the result of SetUpData, but it makes the tests easier to read and maintain the tests.

Since there are other tests using SetUpData, they would have to be updated, so maybe you want to declare this suggestion as out-of-scope for this patch. Anyway, I'm happy that you eliminated the boolean argument.

This revision is now accepted and ready to land.Thu, Jul 12, 8:37 AM
lemo added a comment.Thu, Jul 12, 9:47 AM

Thanks Adrian for the thorough review.

52 ↗(On Diff #155049)

I think I understand the idea and I agree it's tempting. The problem is that not all the checks in SetupData() are status based, so there's no direct mapping from each operation to a return value.

It doesn't seem terribly bad to let the helper encapsulate some of the checks.

This revision was automatically updated to reflect the committed changes.

I don't agree with the two-stage initialization of the MinidumpParser class being introduced here. We deliberately introduced the Create static function to avoid this. If this Initialize function in checking invariants which are assumed to be hold by other parser methods, then it should be done by the Create function. Ideally this would be done before even constructing the parser object, but if this is impractical for some reason then you can make the Initialize function private and call it directly from Create. This way a user will never be able to see an malformed parser object. To make sure you propagate the error, you can change the return type of Create to llvm::Expected<MinidumpParser (the only reason we did not do this back then was that there was no precedent for using Expected in LLDB, but this is no longer the case).

The problem is not returning an error from Minidump::Create() - if that was
the case this could easily be improved indeed. The two-phase initialization
is a consequence of the LLDB plugin lookup:

  1. Target::CreateProcess() calls Process::FindPlugin()
  2. ProcessMinidump::CreateInstance() then has to inspect the core file to

see if it's a minidump
2b. ... if it is a minidump, we need to create a ProcessMinidump (which
calls MinidumpParser::Create())

  1. once the plugin is selected, Process::LoadCore() is finally called and

this the earliest we can do minidump-specific error checking

Note that at step 2b. we don't have a way to propagate the error since
we're just doing the plugin lookup (the most we can pass on the lookup to
the rest of the plugins). We can't easily defer the
MinidumpParser::Create() as step 2b either since that only morphs into a
different kind of two-stage initialization (having a ProcessMinidump w/o a

I agree that it would be nicer with a one step initialization but overall
changing the LLDB plugin lookup is too intrusive for the relatively small
benefit. If you have any suggestions I'd love to hear them.