This is an archive of the discontinued LLVM Phabricator instance.

Object/Minidump: Add support for reading the ModuleList stream
ClosedPublic

Authored by labath on Apr 2 2019, 5:13 AM.

Details

Summary

The ModuleList stream consists of an integer giving the number of
entries in the list, followed by the list itself. Each entry in the list
describes a module (dynamically loaded objects which were loaded in the
process when it crashed (or when the minidump was generated).

The code for reading the list is relatively straight-forward, with a
single gotcha. Some minidump writers are emitting padding after the
"count" field in order to align the subsequent list on 8 byte boundary
(this depends on how their ModuleList type was defined and the native
alignment of various types on their platform). Fortunately, the minidump
format contains enough redundancy (in the form of the stream length
field in the stream directory), which allows us to detect this situation
and correct it.

This patch just adds the ability to parse the stream. Code for
conversion to/from yaml will come in a follow-up patch.

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.Apr 2 2019, 5:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 2 2019, 5:13 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript

Code basically looks fine, but somebody else should confirm that the format is correct.

lib/Object/Minidump.cpp
55 ↗(On Diff #193257)

I wonder whether it would be worth a new class of errors for minidump files? After all, invalid_section_index feels a bit forced for a format without sections!

unittests/Object/MinidumpTest.cpp
317 ↗(On Diff #193257)

It's a bit jarring seeing both hex and decimal numbers in here at the same time. Is there a particular reason you've used hex here, but decimal for e.g. 116 below?

labath marked 2 inline comments as done.Apr 4 2019, 7:13 AM

Code basically looks fine, but somebody else should confirm that the format is correct.

BTW, the structure definitions are just copied from lldb (https://github.com/llvm-mirror/lldb/blob/master/source/Plugins/Process/minidump/MinidumpTypes.h and other files in that directory). I'm also porting the lldb parsing code to use the new llvm stuff as soon as the llvm stuff is ready as a way to ensure things remain interoperable with the existing minidumps and related tests.

lib/Object/Minidump.cpp
55 ↗(On Diff #193257)

Yes, I've been wondering about that too. In practice, I don't expect that the consumers will wan't to differentiate the error cases too much here (it usually does not matter if the stream was not present in the file, or if it was truncated -- the end result is the same -- you cannot use it). However, for clarity it might be better to do have a separate type anyway. I'll whip up a separate patch for that.

unittests/Object/MinidumpTest.cpp
317 ↗(On Diff #193257)

Yeah, I agree this is not ideal. I started with hex here because it looked more "binary", but then I realized that:
a) decimal is shorter
b) the "size" fields are present in the static_asserts as decimal, and so it was easier to copy the numbers from there as decimal

I'll replace all of these to use decimal, except the "version" field, as that is defined in hex in the header.

labath updated this revision to Diff 193713.Apr 4 2019, 8:01 AM
labath marked an inline comment as done.
  • avoid using mismatched error codes
  • use decimal numbers more consistently in the tests
labath marked an inline comment as done.Apr 4 2019, 8:02 AM
labath added inline comments.
lib/Object/Minidump.cpp
55 ↗(On Diff #193257)

After some soul-searching I decided to just abandon the idea of returning error codes here, and just return the generic parse_failed error. That's what most other Object formats do (presumably they don't have a good use for distinguishing the codes either), and having an own error type would require some non-obvious design choices. (Like, since I'd still inherit from BinaryError, which inherits from ECError, I'd still have to interoperate with std::error_code somehow. This means I'd still have to map things to the object_error enum, or I'd have to define a new error_category, which is a deprecated way of doing things).

jhenderson accepted this revision.Apr 4 2019, 8:08 AM

Okay, no more comments from me.

This revision is now accepted and ready to land.Apr 4 2019, 8:08 AM
clayborg accepted this revision.Apr 4 2019, 5:11 PM
This revision was automatically updated to reflect the committed changes.