We map the entire PDB into the process, and then, when reading the various streams, we copy records out of the stream into separate storage in order to guarantee that certain structures are contiguous.
This is very memory inefficient: loading a PDB roughly doubles the memory used relative to the file size, since almost all of the important data structures end up being copied.
To address this, I've introduced a number of steps:
- Introduce a class called StreamView. This is analogous to an ArrayRef. It provides a limited window on top of a larger stream (which can be any type of stream, including another StreamView). This is useful for constraining stream operations to specific substreams or fields, for example when one large stream is broken into multiple logical sections.
- Introduce a set of 3 "stream data structures". These are currently StreamString, FixedStreamArray<T>, and VarStreamArray. These classes all share the same underlying purpose: return references to values in the source byte stream when they are contiguous there, and copy them into temporary storage otherwise. The first wraps a string, the second wraps an array of fixed-size records, and the third wraps an array of variable-length records. VarStreamArray will prove particularly useful, because in order to return a reference to the data in the source byte stream, the entire array need not be contiguous, only the single record being requested (see the sketch after this list).
- Update the StreamReader class to be able to read values of type StreamString, FixedStreamArray<T>, and VarStreamArray.
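For illustration, here is a rough sketch of the contiguous-if-possible, copy-otherwise idea. The class names mirror the ones introduced here, but the member functions, layout, and the ByteStream helper are simplified placeholders, not the actual interfaces in this patch:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical minimal stream interface. A stream knows its length and can
// either hand back a pointer into contiguous backing storage, or copy the
// requested bytes into caller-provided scratch space when they are not
// contiguous.
struct Stream {
  virtual ~Stream() = default;
  virtual uint64_t length() const = 0;
  virtual const uint8_t *bytes(uint64_t offset, uint64_t size,
                               std::vector<uint8_t> &scratch) const = 0;
};

// A limited window over a larger stream: all offsets are constrained to
// [start, start + len) of the inner stream.
struct StreamView : Stream {
  const Stream &inner;
  uint64_t start, len;
  StreamView(const Stream &s, uint64_t start, uint64_t len)
      : inner(s), start(start), len(len) {}
  uint64_t length() const override { return len; }
  const uint8_t *bytes(uint64_t offset, uint64_t size,
                       std::vector<uint8_t> &scratch) const override {
    assert(offset + size <= len && "read past end of view");
    return inner.bytes(start + offset, size, scratch);
  }
};

// Fixed-size-record array on top of a stream: element i lives at i*sizeof(T).
// A record is only copied into scratch when the stream cannot hand it back
// contiguously. Assumes T is trivially copyable and suitably aligned.
template <typename T> struct FixedStreamArray {
  explicit FixedStreamArray(const Stream &s) : stream(s) {}
  uint64_t size() const { return stream.length() / sizeof(T); }
  const T &operator[](uint64_t i) {
    const uint8_t *p = stream.bytes(i * sizeof(T), sizeof(T), scratch);
    return *reinterpret_cast<const T *>(p);
  }
  const Stream &stream;
  std::vector<uint8_t> scratch; // used only when a copy is unavoidable
};

// A trivially contiguous stream over an in-memory buffer, for demonstration.
struct ByteStream : Stream {
  std::vector<uint8_t> data;
  uint64_t length() const override { return data.size(); }
  const uint8_t *bytes(uint64_t offset, uint64_t size,
                       std::vector<uint8_t> &) const override {
    assert(offset + size <= data.size());
    return data.data() + offset; // contiguous, so no copy is ever needed
  }
};

int main() {
  ByteStream file;
  file.data.resize(32);
  for (uint32_t i = 0; i < 8; ++i) {
    uint32_t v = i * 10;
    std::memcpy(&file.data[i * 4], &v, 4);
  }
  // A view covering records 2..5 (bytes 8..24) of the larger stream.
  StreamView view(file, 8, 16);
  FixedStreamArray<uint32_t> arr(view);
  assert(arr.size() == 4);
  assert(arr[0] == 20 && arr[3] == 50);
  return 0;
}
```

The important property is visible in ByteStream::bytes: when the backing storage is already contiguous, the array hands back pointers into the mapped file and never allocates; the scratch buffer only comes into play for stream implementations that have to stitch a record together from discontiguous blocks.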
I updated DBIStream to use these new classes in a number of places, but there is not much memory savings yet because they are not being used on the type and symbol record streams, which comprise 95% of the file size. I plan to do that in a subsequent patch; I just wanted to get the infrastructure in place first.
Is "discontiguous blocks in a file" better?