This is an archive of the discontinued LLVM Phabricator instance.

[ObjectYAML] Add basic minidump generation support
ClosedPublic

Authored by labath on Mar 18 2019, 3:09 AM.

Details

Summary

This patch adds the ability to read a yaml form of a minidump file and
write it out as binary. Apart from the minidump header and the stream
directory, only three basic stream kinds are supported:

  • Text: This kind is used for streams which contain textual data. This is typically the contents of a /proc file on linix (e.g. /proc/PID/maps). In this case, we just put the raw stream contents into the yaml.
  • SystemInfo: This stream contains various bits of information about the host system in binary form. We expose the data in a structured form.
  • Hex: This kind is used as a fallback when we don't have any special knowledge about the stream. In this case, we just print the stream contents in hex.

For this code to be really useful, more stream kinds will need to be
added (particularly for things like lists of memory regions and loaded
modules). However, these can be added incrementally.

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.Mar 18 2019, 3:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 18 2019, 3:09 AM
Herald added a subscriber: mgorny. · View Herald Transcript

I've not reviewed the unit test yet, but the bulk of the body looks fine, apart from some minor details.

include/llvm/ObjectYAML/MinidumpYAML.h
27 ↗(On Diff #191063)

I think it is worth making this an enum class.

lib/ObjectYAML/MinidumpYAML.cpp
55 ↗(On Diff #191063)

Perhaps worth a quick message:

assert(OS.tell() == BeginOffset + NextOffset && "some message here");

70 ↗(On Diff #191063)

Maybe worth calling this mapRequiredAs, to line up with the mapRequired name.

96 ↗(On Diff #191063)

Similar to above, perhaps mapRequiredHex?

100 ↗(On Diff #191063)

optionam?

181 ↗(On Diff #191063)

Is this definitely necessary? What's the type of Info.ProcessorFeatures? Similar comments for Vendor ID below.

183 ↗(On Diff #191063)

If there is too much data to fit, should this emit an error, rather than silently truncate? Same below with Vendor ID.

306 ↗(On Diff #191063)

I don't think yin matches our (current) variable naming guidelines. It should be Yin at least, though it wasn't immediately clear what was meant even then at first to me, so perhaps a different name might be better (simply Input would work for me).

tools/yaml2obj/yaml2obj.cpp
61 ↗(On Diff #191063)

I feel like it would make more sense for this to match the other versions above, i.e. I think this should be at least return yaml2minidump(...);

labath marked 11 inline comments as done.Mar 18 2019, 6:52 AM
labath added inline comments.
lib/ObjectYAML/MinidumpYAML.cpp
181 ↗(On Diff #191063)

No, it's not necessary, it's just that I was too lazy to implement proper error checking for this (and this field isn't that important -- I don't believe we actually use it for anything right now, but I have to print it in some way). I've now implemented proper checking for this and the VendorID fields (including tests).

306 ↗(On Diff #191063)

There's definitely plenty of places that use yin, with the yaml library itself being the most prominent user. I think you could argue that it fits the naming guidelines due to the "emulates c++ stl interface" (i.e cin) exception, but I don't think the name really matters that much here, so I've just renamed it to Input.

tools/yaml2obj/yaml2obj.cpp
61 ↗(On Diff #191063)

Yeah, I had the same feeling, but then I also didn't like creating a yaml2minidump.cpp just to put a two-line function in it (and putting this function into the ObjectYAML library also didn't feel right).

The root cause here is that I've chose to fully library-ize the minidump code, which doesn't fit in with how the other object files work (though there is a kind of a precedent for this in DWARFYAML). My reason for doing that was that I think it would be very useful to have this functionality accessible to lldb unit tests (and I've also gotten interest from a downstream user who wanted to use this functionality to synthesize minidumps in their unit tests).

Long story short, I've now put the created yaml2minidump.cpp and put the yaml2minidump in it.

labath updated this revision to Diff 191079.Mar 18 2019, 6:52 AM
labath marked 11 inline comments as done.

Updating to address review comments (thank you for the super quick turnaround).

ormris removed a subscriber: ormris.Mar 18 2019, 9:01 AM
jhenderson accepted this revision.Mar 20 2019, 5:03 AM

LGTM. I don't really know much about the minidump stuff, but the general structure looks fine to me.

This revision is now accepted and ready to land.Mar 20 2019, 5:03 AM

Thanks for the review.

I've added some extra people who should be familiar with the minidump format (in addition to @clayborg, and @zturner, who should also have some knowledge of it), just in case they have any thoughts/suggestions about this.

clayborg added inline comments.Mar 20 2019, 7:19 AM
include/llvm/ObjectYAML/MinidumpYAML.h
27 ↗(On Diff #191079)

Maybe "Encoding" might be a better name?

28 ↗(On Diff #191079)

Maybe "Binary" instead of "Hex"? Or maybe "RawBytes"?

48 ↗(On Diff #191079)

Name "ByteStream" maybe? Does hex make sense?

50 ↗(On Diff #191079)

Is the "Size" necessary here? Does yaml::BinaryRef not contain a size?

81 ↗(On Diff #191079)

Maybe further specify the content like "UTF8Stream" or "ASCIIStream"? Not sure what the encoding is expected to be, but might be better to indicate the exact text encoding here?

lib/ObjectYAML/MinidumpYAML.cpp
132 ↗(On Diff #191079)

This is where encoding might be a bit more clear?

Encoding E = getEncoding(Type);
labath marked 8 inline comments as done.Mar 20 2019, 8:58 AM
labath added inline comments.
include/llvm/ObjectYAML/MinidumpYAML.h
27 ↗(On Diff #191079)

On it's own I agree Encoding would be better, but the term Kind is already used by the EFLYAML implementation, so I think it's better to be consistent with that.

28 ↗(On Diff #191079)

Makes sense. In fact ELF already uses the name RawContent for the equivalent functionality, so I went with that. I also changed Text to TextContent.

50 ↗(On Diff #191079)

This is another bit I stole from ELFYAML. The idea is that you can specify only the size, in case you don't actually care about the content, or specify the size+content if you only care about the initial few bytes of the section. However, it looks like I didn't finish copying the idea, so I was missing the bits that check that Size >= Content.size() and the bit which pads the size when serializing. Now I fix that and add some additional tests.

81 ↗(On Diff #191079)

Unfortunately, I don't think there is a well-defined encoding here. I think the files will be mostly ASCII, but for instance the encoding of the "filename" part of /proc/PID/maps can really be anything. So, I think we should just keep this bit deliberately vague.

labath updated this revision to Diff 191508.Mar 20 2019, 8:59 AM
labath marked 2 inline comments as done.
  • rename Text/Hex streams
  • fix Hex stream size handling and add some tests for it
labath updated this revision to Diff 191653.Mar 21 2019, 3:44 AM

It occurred to me that the "invalid input" test cases could be easily rewritten
in lit, as they don't require parsing of the generated binary (because there
isn't one).

In this update I just port those tests to lit, which enables us to actually
assert the error message (which isn't possible from the unit test as that
unfortunately goes to stderr).

jhenderson accepted this revision.Mar 21 2019, 3:49 AM

Test updates LGTM.

This revision was automatically updated to reflect the committed changes.