This is an archive of the discontinued LLVM Phabricator instance.

[CodeView] Add CodeViewRecordIO for reading and writing, and use it for reading type records.
ClosedPublic

Authored by zturner on Oct 27 2016, 11:45 AM.

Details

Summary

This is the first of two patches intended to merge the reading and writing codepaths for CodeView symbol and type records. This patch contains the following pieces:

  1. Introduces CodeViewRecordIO, which is similar to YamlIO in that it provides a single interface for reading or writing native types from an underlying byte stream, depending on what mode the class is in.
  2. Introduces TypeRecordMapping which uses CodeViewRecordIO to define the layout of CodeView Type records.
  3. Hook TypeRecordMapping up to the existing Pdb -> Yaml code to prove that it works in "reading" (e.g. bytes -> semantic record) mode.

What this patch does NOT do:

  1. Add any of the writing support. Doing so is somewhat non-trivial, as it requires taking all of the logic from TypeRecordBuilder, TypeTableBuilder, ListRecordBuilder, and dealing with record splitting and many other intricacies. That will be part 2 of this review sequence. I have that patch fully working and tested, but it will make this quite large and difficult to review to do it all at once.

Just to be clear: After patch 2, TypeTableBuilder, TypeRecordBuilder, ListRecordBuilder, and MemoryTypeTableBuilder will be deleted. The only thing left will be TypeRecordMapping and perhaps a small facade on top of that to simplify the interface.

Looking out further than that, with patch 3 (which I have not started working on yet), I plan to introduce a SymbolRecordMapping class. This will allow us to then delete all of the deserialize methods in SymbolRecord.h, the entire files RecordSerialization.h and RecordSerialization.cpp, and additionally give us Symbol serialization (which we do not yet have) for free.

Diff Detail

Event Timeline

zturner updated this revision to Diff 76073.Oct 27 2016, 11:45 AM
zturner retitled this revision from to [CodeView] Add CodeViewRecordIO for reading and writing, and use it for reading type records..
zturner updated this object.
zturner added reviewers: rnk, amccarth, inglorion.
zturner added a subscriber: llvm-commits.
zturner updated this object.Oct 27 2016, 11:45 AM
rnk added a reviewer: ruiu.Oct 27 2016, 4:26 PM
ruiu edited edge metadata.Oct 27 2016, 5:40 PM

How much is this similar to YamlIO? From my experience, I have to say that YamlIO is extremely slow when we give a fairly large (but not that large, like a few megabytes) data. I wonder if this is fast enough to handle hundreds of megabytes of debug info.

ruiu added inline comments.Oct 27 2016, 6:09 PM
include/llvm/DebugInfo/CodeView/CodeViewRecordIO.h
130–132

This invariant

IsWriting == (Writer != nullptr)

always holds, so IsWriting is redundant. You could replace all occurrences of if (IsWriting) with if (Writer).

include/llvm/DebugInfo/CodeView/TypeRecord.h
55–56

explicit for a ctor with three parameters?

include/llvm/DebugInfo/CodeView/TypeRecordMapping.h
37

Don't you want to #undef these macros after #include for safety?

zturner added inline comments.Oct 28 2016, 8:19 AM
include/llvm/DebugInfo/CodeView/TypeRecordMapping.h
37

This is such a common pattern that they're already #undef'ed at the end of the include file.

inglorion edited edge metadata.Oct 28 2016, 11:49 AM

I like how this gives us basically a single definition of how to map between objects and byte streams that we can then use for reading and writing. I am a bit unhappy with the repeated if (IsWriting) ... in CodeViewRecordIO. How would you feel about making two different classes - one to implement the reading and one to implement the writing?

I like how this gives us basically a single definition of how to map between objects and byte streams that we can then use for reading and writing. I am a bit unhappy with the repeated if (IsWriting) ... in CodeViewRecordIO. How would you feel about making two different classes - one to implement the reading and one to implement the writing?

Luckily CodeViewRecordIO is really low level and not intended to be the external interface to this thing. For that you use TypeRecordMapping (and in the future, SymbolRecordMapping). If we had something like std::variant<> this would be cleaner, but for now there's always going to be a check. We could hide it through a virtual function indirection, where you make an abstract base class with all the map methods, and then make a ReadMapper and a WriteMapper, and TypeRecordMapper just stores a std::unique_ptr<MapperBase>.

zturner updated this revision to Diff 76254.Oct 28 2016, 3:00 PM
zturner edited edge metadata.

Fixed suggestions from ruiu.

I looked into using virtual function indirection as a means of hiding the branches, and it's not entirely straightforward since there's some template functions to handle many types at once. I could make it work, but I'm not sure it's a win.

Another idea is to raise this whole thing up into CodeView/MSF and make it more generic by using traits classes instead of Mapping functors. That would allow a single Map method that uses template specializations to find mapping classes, and allow the mapping classes to provide read() and write() methods that separate the logic. I gave it a try but it was a little hairy and gets into some design issues, but it seems like a reasonable avenue for improvement in a followup patch.

Thanks for looking into my suggestions, @zturner. I also thought we might want to do traits in a follow-up. I'm good with this if @ruiu is good with it, so I'll let him accept.

Ping. Anyone can stamp this or offer further suggestions?

ruiu accepted this revision.Nov 1 2016, 1:22 PM
ruiu edited edge metadata.

LGTM

This revision is now accepted and ready to land.Nov 1 2016, 1:22 PM
This revision was automatically updated to reflect the committed changes.