Page MenuHomePhabricator

[Object] Add basic minidump support

Authored by labath on Mar 13 2019, 3:42 AM.



This patch adds basic support for reading minidump files. It contains
the definitions of various important minidump data structures (header,
stream directory), and of one minidump stream (SystemInfo). The ability
to read other streams will be added in follow-up patches. However, all
streams can be read even now as raw data, which means lldb's minidump
support (where this code is taken from) can be immediately rebased on
top of this patch as soon as it lands.

As we don't have any support for generating minidump files (yet), this
tests the code via unit tests with some small handcrafted binaries in
the form of c char arrays.

Diff Detail


Event Timeline

labath created this revision.Mar 13 2019, 3:42 AM

Is it possible to unit-test some of the Object and BinaryFormat changes? That would allow you to split those parts away from the obj2yaml changes. Also, if you added support in yaml2obj, you could test that, then use that in testing the obj2yaml changes more easily, rather than relying on a canned binary.

The BinaryFormat changes are basically just defining a bunch of structs and enums, so there isn't much to test there. The only actual functional change there is the addition of the magic recognition code in Magic.cpp. That could easily be unit-tested, but that would test just those 5 lines of changes. These would be covered by any other minidump tests anyway, which is why I guess there are no existing unit tests for the magic recognition code.

The Object changes are more interesting. These could be unit tested, but I think these tests would require "baked" binaries (perhaps in the form of inline C char arrays) since, well, the Object code is all about opening binaries, and we have no way to produce those at that point. Overall, I think a small, well-targeted set of small test binaries is useful for "rooting" the tests of code like this for two reasons:

  • this ultimately needs to interoperate with other producers/consumers, and if everything goes through one (de)serialization point, there nothing to guard against accidental changes in the binary format (it has happened to me while transcribing the lldb structure definitions)
  • some things (particularly various boundary conditions and malformed binaries) cannot be expressed in yaml form. For instance, the yaml format I'm proposing here will not be able to generate a minidump where one of the streams runs off the end of the file (much like the elf yaml representation cannot express truncated sections)

So, if we want to split this up more, I can propose the following:

  • split off the BinaryFormat&Object changes into a separate patch, test it with some inline C char arrays
  • write the yaml2obj equivalent of this patch, test it via unit tests and the Object minidump reader
  • finally, come back to this patch (obj2yaml), test it via yamlobj|obj2yaml round-tripping

What do you think?

So, if we want to split this up more, I can propose the following:

  • split off the BinaryFormat&Object changes into a separate patch, test it with some inline C char arrays
  • write the yaml2obj equivalent of this patch, test it via unit tests and the Object minidump reader
  • finally, come back to this patch (obj2yaml), test it via yamlobj|obj2yaml round-tripping

    What do you think?

This sounds like a reasonable breakdown to me. Another possibility for testing yaml2obj is some form of minidump support in llvm-readobj or similar, which uses the Object minidump reader to verify things in lit tests.

labath planned changes to this revision.Mar 13 2019, 7:03 AM

Ok, let me try to do that.

aprantl added inline comments.Mar 13 2019, 9:02 AM
26 ↗(On Diff #190392)

Can you please make sure to Doxygen-ify all the public headers?

labath updated this revision to Diff 190638.Mar 14 2019, 8:27 AM
  • remove yaml stuff, making this patch only about BinaryFormat&Object changes
  • Add unit tests for the object layer
  • Add a bunch of doxygen comments
  • a couple of tiny implementation tweaks
labath retitled this revision from Add basic minidump support to obj2yaml to [Object] Add basic minidump support.Mar 14 2019, 8:28 AM
labath edited the summary of this revision. (Show Details)
aprantl accepted this revision.Mar 14 2019, 9:45 AM
This revision is now accepted and ready to land.Mar 14 2019, 9:45 AM

I haven't really looked at the behaviour to make sure it makes sense, but I've made a number of comments on the comments and one or two other things.

25 ↗(On Diff #190638)

it's -> its

26 ↗(On Diff #190638)

Nit, you don't need the ',' after 'i.e.'.

42 ↗(On Diff #190638)

raw the -> the raw

61 ↗(On Diff #190638)

I've seen in some places that the error handling automatically appends a full stop, so this would result in two full stops, which is probably not ideal. Also, having a full stop prevents a consumer producing the error message in the middle of a quote or similar, e.g. "the library produced the following error 'Unexpected EOF' and will terminate" or whatever.

Same comment applies in a few other places.

68 ↗(On Diff #190638)

of given -> of a given

77 ↗(On Diff #190638)

Are you deliberately making a copy of StreamMap? I would normally expect this to be passed by some form of reference.

zturner added inline comments.Mar 14 2019, 10:28 AM
77 ↗(On Diff #190638)

It's actually not making a copy. This gives the caller the ability to write std::move(StreamMap), which will actually *prevent* a copy. If you change this to a reference, then in order to store it in the class by value a copy must be made. By passing it by value in the constructor, you can prevent *any* copies from being made, because the caller can move it in at construction time (and indeed, this is what happens later on).

However, I would also ask why we're using std::unordered_map<> instead of llvm::DenseMap<>, which is generally more efficient.

labath updated this revision to Diff 190832.Mar 15 2019, 8:15 AM
labath marked 8 inline comments as done.
labath edited the summary of this revision. (Show Details)
  • fix comments and error messages
  • use DenseMap
77 ↗(On Diff #190638)

I think Zachary answered the "copy" part very well.

As for unordered_map, my reason for doing that was two-fold:

  • laziness to define DenseMapInfo for StreamType
  • the fact that setting aside some values for the empty and tombstone keys this would render some minidumps invalid (although only a couple of dozen values of the entire 32-bit range is used, we must be prepared to handle any input)

However, neither of these two reasons is very strong, so I now define the DenseMapInfo traits, and also hard-reject any minidumps that contain one of the two stream types (to avoid asserting when trying to insert them into the map). We can always switch the special key values, if we find someone uses these for something else.

zturner accepted this revision.Mar 19 2019, 10:48 AM

This LGTM, but let's give it a day to see if anyone else chimes in with comments.

This revision was automatically updated to reflect the committed changes.

I'm pretty sure this was caused by and not by this patch because LLDB does not use this code (yet).

Incidentally, just today I was working on adding string reading to the llvm parser, and noticed this potential misaligned access in lldb code. I don't think it can ever occur in practice for real (valid) minidumps, because those will have those fields correctly aligned, but I believe the reason that patch caught this problem is that the hand-crafted minidumps Greg created did not respect the natural alignment of the types. I think the fix is to regenerate the minidump binaries to correctly align the string fields, and also teach the lldb string reader to reject misaligned strings. The second part may not be so important, as I am going to delete that code soon anyway.

Alternatively, we could teach the lldb string reader to handle the misaligned strings correctly (most likely by memcpying them into aligned storage)...