This is an archive of the discontinued LLVM Phabricator instance.

[codeview] Add type stream merging prototype
ClosedPublic

Authored by rnk on May 10 2016, 11:26 AM.

Details

Summary

This code is intended to be used as part of LLD's PDB writing. Until
that exists, this is exposed via llvm-readobj for testing purposes.

The .cpp file implementing the core merging algorithm only weighs in at
~400 lines of code.

Type stream merging uses the following algorithm:

  • Begin with a new empty stream, and a new empty hash table that maps from type record contents to new type index.
  • For each new type stream, maintain a map from source type index to destination type index.
  • For each record, copy it and rewrite its type indices to be valid in the destination type stream.
  • If the new type record is not already present in the destination stream hash table, append it to the destination type stream, assign it the next type index, and update the two hash tables.
  • If the type record already exists in the destination stream, discard it and update the type index map to forward the source type index to the existing destination type index.

Diff Detail

Repository
rL LLVM

Event Timeline

rnk updated this revision to Diff 56768.May 10 2016, 11:26 AM
rnk retitled this revision from to [codeview] Add type stream merging prototype.
rnk updated this object.
rnk added reviewers: zturner, ruiu.
rnk added a subscriber: llvm-commits.
ruiu edited edge metadata.May 11 2016, 1:39 PM

I think this is going to be used in a speed critical path. I'm wondering if this is the fastest way to merge debug types -- one hash table lookup for each derived type may be a bit slow. But at least I don't come up with a better way, so hash table lookup might be unavoidable.

include/llvm/DebugInfo/CodeView/MemoryTypeTableBuilder.h
62 ↗(On Diff #56768)

Why don't you use DenseMap?

lib/DebugInfo/CodeView/TypeStreamMerger.cpp
38–39 ↗(On Diff #56768)

It doesn't look like you are using hash table, but instead you are using vectors to map indices.

Ah, so the hash table is in MemoryTypeTableBuilder.h?

rnk added a comment.May 12 2016, 5:11 PM

Hm, the latest changes to TypeVisitor made this a lot, lot harder. With the old visitor, I duplicated a lot of the layout knowledge, but now I'm having a hard time remapping all the type indices because the current high level record types are immutable.

zturner edited edge metadata.May 12 2016, 5:15 PM
zturner added a subscriber: zturner.

Can't you make them mutable? I agree it's a bit annoying to change all
those classes, but it will probably need to happen eventually anyway

rnk added a comment.May 13 2016, 10:46 AM

I have a working patch, but it's much larger now because it uses the high level TypeTableBuilder APIs, and those needed to be fleshed out. I'll see if I can cut it down a bit.

include/llvm/DebugInfo/CodeView/MemoryTypeTableBuilder.h
62 ↗(On Diff #56768)

I didn't write it. I intend to come back here later and make it efficient, once it's correct and we can profile it.

lib/DebugInfo/CodeView/TypeStreamMerger.cpp
38–39 ↗(On Diff #56768)

Yeah, it's part of the implementation of TypeTableBuilder.

rnk updated this revision to Diff 57236.May 13 2016, 1:21 PM
rnk edited edge metadata.
  • Rewrite type stream merging to use new visitor
  • Update TypeTableBuilder serialization code to work

It is possible to do it faster if the miss rate is high, and the hash
function is expensive.
Otherwise, it's not.

zturner added inline comments.May 13 2016, 4:39 PM
lib/DebugInfo/CodeView/TypeRecord.cpp
45–48 ↗(On Diff #57236)

Will this short circuit if anything fails? Perhaps you should write Success = Success && remapIndex(IndexMap, ReturnType) etc.

zturner accepted this revision.May 13 2016, 4:39 PM
zturner edited edge metadata.
This revision is now accepted and ready to land.May 13 2016, 4:39 PM
rnk added inline comments.May 13 2016, 4:49 PM
lib/DebugInfo/CodeView/TypeRecord.cpp
45–48 ↗(On Diff #57236)

The idea is that it won't short circuit. It will remap all the indices to the "untranslated" index, and then you can decide if you want to warn the user that some of the input was invalid. It seemed useful in the context of writing a PDB, where some object files might be busted, but you don't want to abort the whole link.

This revision was automatically updated to reflect the committed changes.