This is an archive of the discontinued LLVM Phabricator instance.

[pdb] Apply zero-copy algorithms to symbol stream iteration
ClosedPublic

Authored by zturner on May 26 2016, 10:37 PM.

Details

Summary

There is one PDB stream for each module (compiland). This stream contains (among other things) all of the symbols for each compiland. Generally speaking this is where a large portion of memory associated with the PDB lies.

This patch slightly improves the VarStreamArray class by templatizing it on a value type and changing the length conversion function to a value / length extraction function. In doing so, we can now apply the VarStreamArray to symbol streams and iterate them exactly the same as before, but without copying the entire symbol stream into a contiguous buffer.

Measurements:
PDB File Size: 6,598,656 bytes
Combined size of all symbol streams: 1,985,508 bytes

So this is a savings of 30% of the PDB size. We should be able to claim another 30% or so by applying this to the type info streams.

Diff Detail

Event Timeline

zturner updated this revision to Diff 58755.May 26 2016, 10:37 PM
zturner retitled this revision from to [pdb] Apply zero-copy algorithms to symbol stream iteration.
zturner updated this object.
zturner added a reviewer: ruiu.
zturner added a subscriber: llvm-commits.

Also worth pointing out that eventually, once this works with types, we can delete RecordIterator.h. That was the "generic" method that was supposed to be able to be used to treat type and symbol streams the same, but VarStreamArray<T> works with type streams, symbol streams, and any other arbitrary type of variable length record array such as the module info array.

ruiu added inline comments.May 27 2016, 7:50 AM
include/llvm/DebugInfo/CodeView/RecordIterator.h
29

Do you need a separate traits class? Looks like you can merge it into the non-traits class if you change the definition of VarStreamArray.

Not sure I understand, can you elaborate?

ruiu edited edge metadata.May 27 2016, 8:13 AM

I mean it seems you can define static uint32_t extract(const StreamInterface &Stream, CVRecord &Item) for CVRecord class and template VarStreamArray with CVRecord (instead of the traits class).

So you are currently doing this.

VarStreamArrayIterator(const VarStreamArray<TraitsType> &Array)
    : Array(&Array), IterRef(Array.Stream) {
  ThisLen = TraitsType::extract(IterRef, ThisValue);
}

But looks like it can be written like this if you remove the template class and directly use CVRecord class.

VarStreamArrayIterator(const VarStreamArray<T> &Array)
    : Array(&Array), IterRef(Array.Stream) {
  ThisLen = T::extract(IterRef, ThisValue);
}

no?

Ahh, I didn't do this because I imagined cases where you might want
ValueType to be something you can't change like ArrayRef, or have 2
different VarStreamArrays both with the same ValueType but using a
different algorithm to extract them. Currently we don't but I thought we
might need it later

ruiu added a comment.May 27 2016, 8:40 AM

OK, I wouldn't make a trait class at this moment as it has already lots of meta programming, but it's your call.

include/llvm/DebugInfo/CodeView/RecordIterator.h
41–44

Don't you want to propagate the error up to the caller? Hiding errors in a deep function call seems risky.

I could pass 2 template arguments instead, ValueType and extraction functor
separately. Similar to std::map how you can pass a comparison functor

ruiu added a comment.May 27 2016, 8:51 AM

Ah, that would be much better, I think.

zturner updated this revision to Diff 58805.May 27 2016, 10:19 AM
zturner edited edge metadata.

Don't use a traits class, instead specify ValueType and extraction functor as separate template arguments.

Following the pattern of STL data structures like std::map, I've specified a default value for the extraction functor, so if you define a specialization of this template, you don't need to pass the second argument.

ruiu accepted this revision.May 27 2016, 10:23 AM
ruiu edited edge metadata.

LGTM

This revision is now accepted and ready to land.May 27 2016, 10:23 AM
Eugene.Zelenko added a subscriber: Eugene.Zelenko.

Committed in r271025.