Type streams are notoriously slow to access, because records are variable length, and so given a particular TypeIndex, finding the record with the given TypeIndex is O(n). But if we know the offsets of *some* of the elements, we can do better.
The type visitor here provides lazy, random access to an arbitrary type stream, only ever doing as much work as is necessary to find the requested type, and never doing a linear scan more than once for any record, even if it is visited more than once.
To do so, it makes use of a "type index offset array", which is essentially an array of pairs, where the first item is a TypeIndex, and the second item is a byte offset into the global type stream. Such an array is present as part of the serialized format of a PDB file, but in theory we could construct these on the fly for non-serialization purposes as well.
Using this visitor, When a given TypeIndex is requested, there are three possible outcomes.
- The TypeIndex has already been requested by the user before. In this case the absolute offset of this record will be cached. We just seek to the appropriate offset, deserialize one record, and invoke the user's callback immediately.
- The TypeIndex has not been requested by the user before, but it *has* visited internally while doing a linear scan for a different TypeIndex that the user did request. In this case, we also know the absolute offset of the record (see #3 below for how), and we proceed as in #1
- Neither 1 or 2 above apply. In this case we begin by finding the nearest item from the type index offset array that comes before the requested item. This is a log(M) operation where M is the number of known offsets, assuming the entries are roughly evenly spaced. There is guaranteed to be at least one, because TypeIndex 0 has an offset of 0.
Then, we also look for the nearest TypeIndex that meets the criteria in either 1 or 2 above. Whichever of these two is larger gives us the nearest known offset to the requested index. From there, we do a linear scan forward to the requested TypeIndex, caching offsets along the way as we deserialize each record. On subsequent attempts to fetch a record with a given TypeIndex, any of the records scanned during the execution of this step will have O(1) lookups.
While this process is going on, a TypeDatabase that the user supplies is being populated with the various individual records providing quick lookup to type names and data of known records without having to go through the visitor.