This is an archive of the discontinued LLVM Phabricator instance.

[CodeView] Re-write TypeSerializer and TypeTableBuilder
ClosedPublic

Authored by zturner on Nov 27 2017, 1:41 PM.

Details

Summary

The motivation behind this patch is that future directions require us to be able to compute the hash value of records independently of actually using them for de-duplication.

The current structure of TypeSerializer / TypeTableBuilder being a single entry point that takes an unserialized type record, and then hashes and de-duplicates it is not flexible enough to allow this.

At the same time, the existing TypeSerializer is already extremely complex for this very reason -- it tries to be too many things. In addition to serializing, hashing, and de-duplicating, ti also supports splitting up field list records and adding continuations. All of this functionality crammed into this one class makes it very complicated to work with and hard to maintain.

To solve all of these problems, I've re-written everything from scratch and split the functionality into separate pieces that can easily be reused. The end result is that one class TypeSerializer is turned into 3 new classes SimpleTypeSerializer, ContinuationRecordBuilder, and TypeTableBuilder, each of which in isolation is simple and straightforward.

A quick summary of these new classes and their responsibilities are:

  • SimpleTypeSerializer : Turns a non-FieldList leaf type into a series of bytes. Does not do any hashing. Every time you call it, it will re-serialize and return bytes again. The same instance can be re-used over and over to avoid re-allocations, and in exchange for this optimization the bytes returned by the serializer only live until the caller attempts to serialize a new record.
  • ContinuationRecordBuilder : Turns a FieldList-like record into a series of fragments. Does not do any hashing. Like SimpleTypeSerializer, returns references to privately owned bytes, so the storage is invalidated as soon as the caller tries to re-use the instance. Works equally well for LF_FIELDLIST as it does for LF_METHODLIST, solving a long-standing theoretical limitation of the previous implementation.
  • TypeTableBuilder - Accepts sequences of bytes that the user has already serialized, and inserts them by de-duplicating with a hash table. For the sake of convenience and efficiency, this class internally stores a SimpleTypeSerializer so that it can accept unserialized records. The same is not true of ContinuationRecordBuilder. The user is required to create their own instance of ContinuationRecordBuilder.

Because of the separation of responsibilities, the proposed algorithm for content hashing now becomes straightforward to implement.

auto Hashes = readHashStream(Obj);
auto Types = readTypeStream(Obj);
for (int I=0; I < Hashes.size(); ++I) {
   TTB.insert_as(Types[I], Hashes[I]);
}

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Nov 27 2017, 1:41 PM
rnk added inline comments.Nov 27 2017, 2:35 PM
llvm/include/llvm/DebugInfo/CodeView/ContinuationRecordBuilder.h
55 ↗(On Diff #124455)

I see how this is supposed to work, but maybe stamping out overloads for all of the member types in the header and redirecting them to use a common template implementation in the .cpp file is a better way to do this.

Compilers sometimes warn when an implicit template instantiation is requested from a function template declaration with no body.

llvm/include/llvm/DebugInfo/CodeView/SimpleTypeSerializer.h
41 ↗(On Diff #124455)

Ditto, re: stamping out overloads.

llvm/include/llvm/DebugInfo/CodeView/TypeTableBuilder.h
1–2 ↗(On Diff #124455)

Fix that so it's on one line.

11 ↗(On Diff #124455)

Fix the header guard macro name

llvm/lib/DebugInfo/CodeView/ContinuationRecordBuilder.cpp
8 ↗(On Diff #124455)

I hate unicorn initialization =P

108 ↗(On Diff #124455)

nit: reflow

114 ↗(On Diff #124455)

nit: reflow

zturner added inline comments.Nov 27 2017, 2:49 PM
llvm/include/llvm/DebugInfo/CodeView/ContinuationRecordBuilder.h
55 ↗(On Diff #124455)

I had it like you suggested originally, but I actually did this on purpose because it's easier to look at a header file with a single declaration than it is to look at a bunch of template goo and list of 100 method overloads (or alternatively, the macro and #include goo that also works).

I explicitly instantiate every template in the implementation file using the macro / #include trick, but at least that way it's hidden in the implementation file. I thought explicit instantiations were correctly supported by all compilers?

zturner added inline comments.Nov 27 2017, 2:58 PM
llvm/lib/DebugInfo/CodeView/ContinuationRecordBuilder.cpp
8 ↗(On Diff #124455)

If you mean the {} syntax, it's actually necessary in this case since it requires an implicit conversion. I couldn't get it to work with an = syntax.

zturner updated this revision to Diff 124478.Nov 27 2017, 2:59 PM

Updated with suggestions.

rnk accepted this revision.Nov 27 2017, 3:08 PM

lgtm

llvm/include/llvm/DebugInfo/CodeView/ContinuationRecordBuilder.h
55 ↗(On Diff #124455)

It'll work, but template declarations in headers without inline definitions below are kind of a code smell. At least, I start looking around going "how can this work". At least put a comment explaining that these are instantiated for all member types in the .cpp file. If it's warning free with that, sounds good, ship it.

llvm/lib/DebugInfo/CodeView/ContinuationRecordBuilder.cpp
8 ↗(On Diff #124455)

Yeah, I figured. I'm just complaining.

This revision is now accepted and ready to land.Nov 27 2017, 3:08 PM
This revision was automatically updated to reflect the committed changes.
llvm/trunk/lib/DebugInfo/CodeView/TypeRecordMapping.cpp