This is an archive of the discontinued LLVM Phabricator instance.

Provide a common interface for all collections of types.
ClosedPublic

Authored by zturner on May 17 2017, 11:53 AM.

Details

Summary

Right now we have multiple notions of things that represent collections of types. Most commonly used are TypeDatabase, which is supposed to keep mappings from TypeIndex to type name when reading a type stream, which happens when reading PDBs. And also TypeTableBuilder, which is used to build up a collection of types dynamically which we will later serialize (i.e. when writing PDBs).

But often you just want to do some operation on a collection of types, and you may want to do the same operation on any kind of collection. For example, you might want to merge two TypeTableBuilders or you might want to merge two type streams that you loaded from various files.

This dichotomy between reading and writing is responsible for a lot of the existing code duplication and overlapping responsibilities in the existing CodeView library classes. For example, after building up a TypeTableBuilder with a bunch of type records, if we want to dump it we have to re-invent a bunch of extra glue because our dumper takes a TypeDatabase or a CVTypeArray, which are both incompatible with TypeTableBuilder.

This patch introduces an abstract base class called TypeCollection which is shared between the various type collection like things. Wherever we previously stored a TypeDatabase& in some common class, we now store a TypeCollection&.

The advantage of this is that all the details of how the collection are implemented, such as lazy deserialization of partial type streams, is completely transparent and you can just treat any collection of types the same regardless of where it came from.

This is a large patch, but I cannot come up with a way to make it smaller. I have D33229 in the queue for review, and D33245 already submitted which themselves were already broken off from this to make it smaller, so I think this may be the best I can do.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.May 17 2017, 11:53 AM
rnk edited edge metadata.May 17 2017, 12:38 PM

Do you think all these TypeCollections should eventually just be the same thing? So, I could read a TypeCollection from a PDB, and then add more type records to it like I would with TypeTableBuilder. Then we would have a full read/write/edit API for types, which is something we seem to want for incremental PDB writing.

llvm/include/llvm/DebugInfo/CodeView/LazyRandomTypeCollection.h
48 ↗(On Diff #99332)

I feel like LazyRandom is really implementation-detail oriented. Basically, this is a TypeCollection that's efficient for PDBs. "Random" also suggests that it generates random results, which would be weird.

Maybe call it PDBTypeCollection? Or TypeStreamCollection? It seems awkward to have a class starting with PDB in DebugInfo/CodeView.

In D33293#757740, @rnk wrote:

Do you think all these TypeCollections should eventually just be the same thing? So, I could read a TypeCollection from a PDB, and then add more type records to it like I would with TypeTableBuilder. Then we would have a full read/write/edit API for types, which is something we seem to want for incremental PDB writing.

Maybe, it's something that I've been thinking about, but it's not entirely obvious how it would work. The reading side is optimized around memory usage, meaning that since we already have the entire PDB file mapped into memory, we don't really want to allocate hundreds of megabytes more memory just to copy the data from some type records. On the other hand, When you're writing you have no choice but to allocate records individually on the fly as you determine what kind of records you're going to need.

It's not entirely obvious how to merge these into one class that doesn't sacrifice memory efficiency on the reading side, but still allows each record to be allocated from private storage

llvm/include/llvm/DebugInfo/CodeView/LazyRandomTypeCollection.h
48 ↗(On Diff #99332)

I don't think it's only efficient for PDBs. It's just most efficient for PDBs. It's still very useful when all you have is a .degug$T section from an object file. Even though you don't have any partial offsets, random access is still a very useful feature.

Heck, we could even put a TypeIndexOffset array into the object file in a magic section, and get all the benefits of lazy deserialization even without a PDB at all.

This revision was automatically updated to reflect the committed changes.
zturner reopened this revision.May 17 2017, 3:20 PM

doh, I didn't actually submit this. I attached the wrong review to the commit message in Phabricator.

rnk added a comment.May 18 2017, 3:54 PM

I don't think you uploaded the right patch

This revision was automatically updated to reflect the committed changes.
llvm/trunk/lib/DebugInfo/CodeView/LazyRandomTypeCollection.cpp