*IN PROGRESS* This is in progress and is not supposed to be final. Many interfaces are incomplete / unimplemented. I'm submitting this early to solicit comments on the overall design, so that anyone interested can help shape the design of the library.
I approached this design with the following assumptions in mind:
1. The interface to the PDB reading code should expose no Windows-specific types.
2. LLVMDebugInfoPDB should compile on non-Windows platforms.
3. LLVMDebugInfoPDB should support multiple simultaneous implementations of a PDB reader, and you should be able to choose which one you create at runtime.
#1 means that, essentially, I need to "shim" or clone the interface to DIA. A full description of DIA and its available interfaces can be found here: https://msdn.microsoft.com/en-us/library/108e9y6d.aspx. Since DIA is COM-based, there is really no way to hide the Windows nature of these methods and interfaces without making an entirely new interface that looks about the same, and has a DIA version as one possible implementation.
#3 is assumed in order to leave open the possibility that someone may create a different implementation of this PDB reader in the future. Even if that happens, though, I believe we may still want a DIA-based version of the interface in the code. The reason is that it lets us understand how PDBs evolve with new versions of Visual Studio and new releases of the DIA SDK, which may expose functionality that a non-DIA-based implementation does not recognize.
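To make the runtime choice in #3 concrete, here is a minimal sketch of one way a factory could select a reader backend. Every name in it (PDBReaderKind, IPDBSessionSketch, createSession, the SKETCH_HAVE_DIA_SDK macro) is hypothetical and not taken from the patch; the DIA branch is only a stub so that the example builds on any platform.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical names throughout; none of these come from the patch.
enum class PDBReaderKind { DIA, Native };

// Platform-neutral interface: no COM or Windows types appear here.
class IPDBSessionSketch {
public:
  virtual ~IPDBSessionSketch() = default;
  virtual std::string backendName() const = 0;
};

// Stand-in for a future non-DIA reader.
class NativeSessionSketch : public IPDBSessionSketch {
public:
  std::string backendName() const override { return "native"; }
};

#ifdef SKETCH_HAVE_DIA_SDK
// Stand-in for the DIA-backed reader; a real one would wrap COM objects
// and would only be compiled where the DIA SDK is available.
class DIASessionSketch : public IPDBSessionSketch {
public:
  std::string backendName() const override { return "dia"; }
};
#endif

// Returns the requested backend, or nullptr if it was not built in.
std::unique_ptr<IPDBSessionSketch> createSession(PDBReaderKind Kind) {
  switch (Kind) {
  case PDBReaderKind::DIA:
#ifdef SKETCH_HAVE_DIA_SDK
    return std::make_unique<DIASessionSketch>();
#else
    return nullptr;
#endif
  case PDBReaderKind::Native:
    return std::make_unique<NativeSessionSketch>();
  }
  return nullptr;
}
```

Callers only ever see the abstract interface, so both backends can coexist in one binary and a missing backend shows up as a null result rather than a build failure.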
The DIA SDK has a somewhat awkward object model, so before thinking about this patch it helps to understand at a fundamental level how PDBs are queried in DIA. The full set of interfaces is at the aforementioned URL, but one of them is of particular importance: the IDiaSymbol interface [https://msdn.microsoft.com/en-us/library/w0edf0x4.aspx]. This is the *only* access to symbols in the PDB. This means functions, executables, global variables, and more are all queried through this interface (a complete list of symbol types is enumerated in the PDB_SymType enum included in this patch). The value of this enum for a particular symbol determines which subset of methods on the IDiaSymbol are valid to call. For example, the methods which are valid to call on an IDiaSymbol whose symbol type is PDB_SymType::Function are documented here [https://msdn.microsoft.com/en-us/library/62w760s9.aspx].
I have attempted to model this in my patch as follows:
- An abstract interface named IPDBSymbolBase includes the same set of methods defined by IDiaSymbol. Implementers of a particular PDB reading strategy (e.g. DIA) are expected to implement this interface.
- For each concrete symbol type, a non-abstract class is provided. In this initial patch, this includes PDBFunction and PDBExecutable, each of which exposes exactly the set of methods that are valid for its symbol type. Each is essentially a wrapper around the base type, whose purpose is to limit the set of methods which the user can call.
- A user can obtain one of the concrete wrapper types either by manually constructing it with an IPDBSymbolBase pointer, or by using a dyn_cast<>-like template called pdb_symbol_cast<>.
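The three points above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in the patch: the method names (getSymTag, getName, getLength, getSymbolsFileName), the Exe enumerator, the static Tag member, the FakeSymbol test double, and the exact signature and return type of pdb_symbol_cast<> are all made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Two-value subset for illustration; "Exe" is an assumed enumerator name.
enum class PDB_SymType { Exe, Function };

// Abstract base mirroring IDiaSymbol: every method exists on every symbol,
// whether or not it is meaningful for that symbol's tag.
class IPDBSymbolBase {
public:
  virtual ~IPDBSymbolBase() = default;
  virtual PDB_SymType getSymTag() const = 0;
  virtual std::string getName() const = 0;
  virtual std::uint64_t getLength() const = 0;         // functions
  virtual std::string getSymbolsFileName() const = 0;  // executables
};

// Concrete wrapper: exposes only what is valid for a function symbol.
class PDBFunction {
public:
  static constexpr PDB_SymType Tag = PDB_SymType::Function;
  explicit PDBFunction(const IPDBSymbolBase *Symbol) : Symbol(Symbol) {}
  std::string getName() const { return Symbol->getName(); }
  std::uint64_t getLength() const { return Symbol->getLength(); }

private:
  const IPDBSymbolBase *Symbol;
};

// Concrete wrapper: exposes only what is valid for an executable symbol.
class PDBExecutable {
public:
  static constexpr PDB_SymType Tag = PDB_SymType::Exe;
  explicit PDBExecutable(const IPDBSymbolBase *Symbol) : Symbol(Symbol) {}
  std::string getName() const { return Symbol->getName(); }
  std::string getSymbolsFileName() const {
    return Symbol->getSymbolsFileName();
  }

private:
  const IPDBSymbolBase *Symbol;
};

// dyn_cast<>-like helper: yields a wrapper only when the tag matches.
template <typename T>
std::unique_ptr<T> pdb_symbol_cast(const IPDBSymbolBase *Symbol) {
  if (!Symbol || Symbol->getSymTag() != T::Tag)
    return nullptr;
  return std::make_unique<T>(Symbol);
}

// Hypothetical in-memory implementation, standing in for a DIA-backed one.
class FakeSymbol : public IPDBSymbolBase {
public:
  FakeSymbol(PDB_SymType Tag, std::string Name, std::uint64_t Length)
      : Tag(Tag), Name(std::move(Name)), Length(Length) {}
  PDB_SymType getSymTag() const override { return Tag; }
  std::string getName() const override { return Name; }
  std::uint64_t getLength() const override { return Length; }
  std::string getSymbolsFileName() const override { return std::string(); }

private:
  PDB_SymType Tag;
  std::string Name;
  std::uint64_t Length;
};
```

With this shape, pdb_symbol_cast<PDBFunction>(&Sym) yields a usable wrapper only for function symbols, and calling getSymbolsFileName() on a function becomes a compile error instead of a runtime failure.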
Note that DIA provides access to other information from a PDB that is not just symbols. For example, line numbers, source files, and various other things are simply independent interfaces and do not follow this tag-based / casting model. For those, I have simply cloned the interface and will expect the implementor to provide an implementation, and the user to use it through a base pointer.
Sorry for the long-winded introduction, but since some people on this list have no experience with DIA, I hope this helps. I'm open to completely gutting this / re-working it if people think a different design would be more appropriate.
In theory the LLVM coding convention requires that types with vtables have an anchor function (one non-inline virtual function).
I'm not sure whether that matters if the type /only/ has pure virtual functions (but it probably does: even a vtable of only pure virtual functions still has to exist, I think).
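For reference, a minimal sketch of the anchor pattern applied to an otherwise pure-virtual interface. The class and method names are placeholders, and the header/.cpp split is only indicated by comments so that the example fits in one file. As I understand the Itanium C++ ABI, a class with no non-inline, non-pure virtual function has no key function, so its vtable may be emitted in every translation unit that needs it; the out-of-line anchor() gives it a single home.

```cpp
#include <cassert>

// --- would live in the header ---
class IPDBSymbolSketch {
public:
  virtual ~IPDBSymbolSketch() = default;
  virtual int getSymTag() const = 0;

private:
  // Declared but not defined inline: this is the anchor.
  virtual void anchor();
};

// --- would live in exactly one .cpp file ---
// The single out-of-line definition pins the vtable to this file.
void IPDBSymbolSketch::anchor() {}

// Hypothetical implementer, present only to show the interface still works.
class DerivedSymbolSketch : public IPDBSymbolSketch {
public:
  int getSymTag() const override { return 5; }
};
```

The anchor is never called; it exists only so that the class has one non-inline virtual function.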