
Introduce core2yaml tool

Authored by labath on Mar 5 2019, 8:27 AM.



This patch introduces the core2yaml tool, whose purpose is to dump core
files in a human readable (and, ideally, editable) format. It's very
similar to llvm's obj2yaml, except that it works with core files.

Currently, I only add basic support for dumping minidump files, but in
the future, the goal is to be able to dump ELF core files as well (hence
the generic name). Most of the code is implemented as a library (to
enable it being used from unit tests), with the actual executable only
being a very thin wrapper around that.

This patch only sets up the basic plumbing, and implements enough
functionality to be able to dump one of the simple minidump files we
have lying around. More functionality (including the tool for doing the
opposite conversion) will come in subsequent patches.
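To make the idea of dumping a minidump in a human-readable form concrete, here is a minimal sketch. It is not the code from this patch. It assumes the publicly documented 32-byte MINIDUMP_HEADER layout (little-endian signature "MDMP", version, stream count, stream directory offset, then checksum, timestamp and flags). The names MinidumpHeader, parseHeader and headerToYaml, and the key names in the emitted text, are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, trimmed-down view of the first four header fields.
struct MinidumpHeader {
  uint32_t signature;
  uint32_t version;
  uint32_t number_of_streams;
  uint32_t stream_directory_rva;
};

// The documented header is 32 bytes; the trailing checksum, timestamp and
// flags fields are skipped in this sketch.
constexpr std::size_t kHeaderSize = 32;
// "MDMP" interpreted as a little-endian 32-bit integer.
constexpr uint32_t kSignature = 0x504d444d;

// Decode a little-endian 32-bit value independently of host byte order.
static uint32_t readLE32(const uint8_t *p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}

// Returns std::nullopt if the buffer is too short or the signature is wrong.
std::optional<MinidumpHeader> parseHeader(const std::vector<uint8_t> &bytes) {
  if (bytes.size() < kHeaderSize)
    return std::nullopt;
  MinidumpHeader h;
  h.signature = readLE32(bytes.data());
  h.version = readLE32(bytes.data() + 4);
  h.number_of_streams = readLE32(bytes.data() + 8);
  h.stream_directory_rva = readLE32(bytes.data() + 12);
  if (h.signature != kSignature)
    return std::nullopt;
  return h;
}

// Emit the parsed fields as YAML-style text (key names are illustrative).
std::string headerToYaml(const MinidumpHeader &h) {
  std::ostringstream os;
  os << "header:\n"
     << "  version: " << h.version << "\n"
     << "  number_of_streams: " << h.number_of_streams << "\n"
     << "  stream_directory_rva: " << h.stream_directory_rva << "\n";
  return os.str();
}
```

Keeping parseHeader and headerToYaml as plain functions mirrors the library-plus-thin-wrapper split described above: a unit test can call them directly, and an executable would only need to read the file and print the result.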

Event Timeline

labath created this revision. Mar 5 2019, 8:27 AM

Do we want this in LLVM instead of lldb?

labath added a comment. Mar 5 2019, 8:51 AM

Do we want this in LLVM instead of lldb?

The thought has crossed my mind, but for that I would have to also move the minidump parser into llvm. It's already pretty standalone, so it shouldn't be a problem technically, but it's not clear to me whether there is any use case for it in llvm. This is a core file format, and I don't know of any other llvm tool/project working with core files.

clayborg added inline comments. Mar 5 2019, 9:07 AM

I worry about this going stale when the owner of the data lets it go and we crash now that we don't have strong ownership. If this is common in LLVM, then we need to document this in the header file.

Strong ownership is needed for this class IMHO because it hands out pointers to native data

Strong ownership is needed for this class IMHO because it hands out pointers to native data

I disagree here, see my previous comment. Binaries grow large very quickly, and if we always copied data around whenever we wanted to hand out an internal pointer, memory usage would explode. Luckily, the scenario that strong ownership attempts to prevent is very rare. Specifically, it only addresses the case where you open a file, parse a bunch of data, close the file, and then still expect to be able to do something with the file's internal data structures. I haven't ever seen this be a problem in practice, as the "natural" order is to open a file, process it, then close the file. And in that case this is fine.

I don't mind having it be documented in the class header, but given that the pattern is fairly pervasive in LLVM (e.g. all of lib/Object works this way, among other things), I'm also fine with just letting it be implicitly understood.


This is a fairly pervasive pattern, as well as being an important optimization. We don't want to be copying data around from binary files, because binaries get quite large and the amount of data being copied could quickly spiral out of control. The semantics are that the data is valid as long as the backing file remains open. Anyone using the class needs to be aware of this, and if they want a lifetime that is not tied to the life of the backing file (which is a fairly uncommon scenario), they need to explicitly copy any data they need to outlive the file.
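The lifetime rule discussed in this thread can be sketched as follows. This is an illustration under assumptions, not code from the patch or from LLVM: BackingFile, RecordParser and copyFieldThatOutlivesFile are hypothetical names, and std::string_view stands in for the non-owning reference types that LLVM code would normally use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for an open file: it owns the raw bytes.
struct BackingFile {
  std::vector<char> bytes;
};

// Hypothetical parser that hands out views into the file's bytes instead of
// copying them. It does not own the data.
class RecordParser {
public:
  explicit RecordParser(const BackingFile &file)
      : data_(file.bytes.data(), file.bytes.size()) {}

  // The returned view is only valid while the BackingFile is alive.
  std::string_view field(std::size_t offset, std::size_t size) const {
    return data_.substr(offset, size);
  }

private:
  std::string_view data_;
};

std::string copyFieldThatOutlivesFile() {
  std::string kept;
  {
    auto file = std::make_unique<BackingFile>();
    file->bytes = {'M', 'D', 'M', 'P', 'd', 'a', 't', 'a'};
    RecordParser parser(*file);

    // "Natural" order: open, process, close. No copy is needed here.
    std::string_view signature = parser.field(0, 4);
    assert(signature == "MDMP");

    // Uncommon case: the data must outlive the file, so copy it explicitly.
    kept = std::string(parser.field(4, 4));
  } // The file is destroyed here; any string_view into it is now dangling.
  return kept; // Safe: this is an owned copy.
}
```

Returning the std::string_view from this function instead of the copied std::string would compile but leave the caller with a dangling view, which is the failure mode the ownership concern is about.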

labath abandoned this revision. Mar 21 2019, 3:54 AM

Instead of a fresh tool, minidump support will be added to obj2yaml.