[PDB] Begin adding documentation for the PDB file format.
Differential Revision: https://reviews.llvm.org/D26374 llvm-svn: 286491
Zachary Turner committed Nov 10, 2016
1 parent 58ddb8d, commit 218ce83
Showing 10 changed files with 306 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
===================================== | ||
The PDB DBI (Debug Info) Stream | ||
===================================== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
===================================== | ||
The PDB Global Symbol Stream | ||
===================================== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
===================================== | ||
The TPI & IPI Hash Streams | ||
===================================== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
===================================== | ||
The Module Information Stream | ||
===================================== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,121 @@ | ||
===================================== | ||
The MSF File Format | ||
===================================== | ||
|
||
.. contents:: | ||
:local: | ||
|
||
.. _msf_superblock: | ||
|
||
The Superblock | ||
============== | ||
At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as | ||
follows: | ||
|
||
.. code-block:: c++ | ||
|
||
  struct SuperBlock { | ||
    char FileMagic[sizeof(Magic)]; | ||
    ulittle32_t BlockSize; | ||
    ulittle32_t FreeBlockMapBlock; | ||
    ulittle32_t NumBlocks; | ||
    ulittle32_t NumDirectoryBytes; | ||
    ulittle32_t Unknown; | ||
    ulittle32_t BlockMapAddr; | ||
  }; | ||
|
||
- **FileMagic** - Must be equal to ``"Microsoft C/C++ MSF 7.00\r\n"`` | ||
followed by the bytes ``1A 44 53 00 00 00`` (32 bytes in total). | ||
- **BlockSize** - The block size of the internal file system. Valid values are | ||
512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary | ||
depending on the block sizes. For the purposes of LLVM, we handle only block | ||
sizes of 4KiB, and all further discussion assumes a block size of 4KiB. | ||
- **FreeBlockMapBlock** - The index of a block within the file, at which begins | ||
a bitfield representing the set of all blocks within the file which are "free" | ||
(i.e. the data within that block is not used). This bitfield is spread across | ||
the MSF file at ``BlockSize`` intervals. | ||
**Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``! This field | ||
is designed to support incremental and atomic updates of the underlying MSF | ||
file. While writing to an MSF file, if the value of this field is ``1``, you | ||
can write your new modified bitfield to block 2, and vice versa. Only when | ||
you commit the file to disk do you need to swap the value in the SuperBlock | ||
to point to the new ``FreeBlockMapBlock``. | ||
- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize`` | ||
should equal the size of the file on disk. | ||
- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream | ||
directory contains information about each stream's size and the set of blocks | ||
that it occupies. It will be described in more detail later. | ||
- **BlockMapAddr** - The index of a block within the MSF file. At this block is | ||
an array of ``ulittle32_t``'s listing the blocks that the stream directory | ||
resides on. For large MSF files, the stream directory (which describes the | ||
block layout of each stream) may not fit entirely on a single block. As a | ||
result, this extra layer of indirection is introduced, whereby this block | ||
contains the list of blocks that the stream directory occupies, and the stream | ||
directory itself can be stitched together accordingly. The number of | ||
``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``. | ||
|
||
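The field descriptions above translate fairly directly into a validation routine. The following is a minimal sketch, not LLVM's actual implementation: ``SuperBlockInfo``, ``readLE32``, ``kMsfMagic``, and ``parseSuperBlock`` are hypothetical names introduced for illustration, and treating a file-size mismatch as a hard error is an assumption (the text only says the sizes "should" match).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// The 32-byte magic: the text string followed by 1A 44 53 00 00 00.
// The literal's implicit terminating NUL supplies the final 00 byte.
static const char kMsfMagic[] = "Microsoft C/C++ MSF 7.00\r\n\x1a" "DS\0\0";
static_assert(sizeof(kMsfMagic) == 32, "MSF magic must be 32 bytes");

// Hypothetical host-side copy of the SuperBlock fields that follow the magic.
struct SuperBlockInfo {
  uint32_t BlockSize, FreeBlockMapBlock, NumBlocks;
  uint32_t NumDirectoryBytes, Unknown, BlockMapAddr;
  uint32_t NumDirectoryBlocks; // derived: ceil(NumDirectoryBytes / BlockSize)
};

// Decodes a little-endian 32-bit value regardless of host byte order.
static uint32_t readLE32(const uint8_t *P) {
  return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
         (uint32_t(P[3]) << 24);
}

// Returns true and fills Out if File begins with a SuperBlock that passes the
// checks described in the text; returns false (leaving Out alone) otherwise.
bool parseSuperBlock(const std::vector<uint8_t> &File, SuperBlockInfo &Out) {
  const std::size_t MagicSize = sizeof(kMsfMagic);
  if (File.size() < MagicSize + 6 * sizeof(uint32_t))
    return false;
  if (std::memcmp(File.data(), kMsfMagic, MagicSize) != 0)
    return false;

  const uint8_t *P = File.data() + MagicSize;
  SuperBlockInfo SB;
  SB.BlockSize = readLE32(P + 0);
  SB.FreeBlockMapBlock = readLE32(P + 4);
  SB.NumBlocks = readLE32(P + 8);
  SB.NumDirectoryBytes = readLE32(P + 12);
  SB.Unknown = readLE32(P + 16);
  SB.BlockMapAddr = readLE32(P + 20);

  if (SB.BlockSize != 512 && SB.BlockSize != 1024 && SB.BlockSize != 2048 &&
      SB.BlockSize != 4096)
    return false;
  if (SB.FreeBlockMapBlock != 1 && SB.FreeBlockMapBlock != 2)
    return false;
  // Assumption: a size mismatch is rejected rather than tolerated.
  if (uint64_t(SB.NumBlocks) * SB.BlockSize != File.size())
    return false;
  if (SB.BlockMapAddr >= SB.NumBlocks)
    return false;

  // Number of ulittle32_t entries stored in the block at BlockMapAddr.
  SB.NumDirectoryBlocks = uint32_t(
      (uint64_t(SB.NumDirectoryBytes) + SB.BlockSize - 1) / SB.BlockSize);
  assert(uint64_t(SB.NumDirectoryBlocks) * SB.BlockSize >=
         SB.NumDirectoryBytes);
  Out = SB;
  return true;
}
```

Decoding each field byte by byte, rather than casting the buffer to a struct, keeps the sketch independent of host endianness and alignment.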
The Stream Directory | ||
==================== | ||
The Stream Directory is the root of all access to the other streams in an MSF | ||
file. Beginning at byte 0 of the stream directory is the following structure: | ||
|
||
.. code-block:: c++ | ||
|
||
  struct StreamDirectory { | ||
    ulittle32_t NumStreams; | ||
    ulittle32_t StreamSizes[NumStreams]; | ||
    ulittle32_t StreamBlocks[NumStreams][]; | ||
  }; | ||
|
||
This structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes. | ||
Note that both arrays are of variable length, and in particular that | ||
``StreamBlocks`` is jagged: row ``i`` holds | ||
``ceil(StreamSizes[i] / BlockSize)`` block indices. | ||
|
||
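As a rough illustration of how a consumer might walk this structure, the sketch below decodes a stream directory whose bytes have already been stitched together into one contiguous buffer. ``ParsedDirectory``, ``readLE32``, and ``parseStreamDirectory`` are hypothetical names, and the sketch assumes every entry in ``StreamSizes`` is an ordinary byte count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of the directory.
struct ParsedDirectory {
  std::vector<uint32_t> StreamSizes;
  std::vector<std::vector<uint32_t>> StreamBlocks; // jagged: one row per stream
};

// Decodes a little-endian 32-bit value regardless of host byte order.
static uint32_t readLE32(const uint8_t *P) {
  return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
         (uint32_t(P[3]) << 24);
}

// Dir holds the NumDirectoryBytes bytes of the directory, already assembled
// from its blocks. Returns false if the buffer is too short for its contents.
bool parseStreamDirectory(const std::vector<uint8_t> &Dir, uint32_t BlockSize,
                          ParsedDirectory &Out) {
  assert(BlockSize != 0);
  const std::size_t NumWords = Dir.size() / 4;
  if (NumWords < 1)
    return false;

  std::size_t Pos = 0; // index of the next unread 32-bit word
  const uint32_t NumStreams = readLE32(&Dir[4 * Pos++]);
  if (NumStreams > NumWords - Pos)
    return false;

  ParsedDirectory PD;
  for (uint32_t I = 0; I < NumStreams; ++I)
    PD.StreamSizes.push_back(readLE32(&Dir[4 * Pos++]));

  for (uint32_t Size : PD.StreamSizes) {
    // Row length is ceil(Size / BlockSize), computed without overflow.
    const uint64_t NumBlocks = (uint64_t(Size) + BlockSize - 1) / BlockSize;
    if (NumBlocks > NumWords - Pos)
      return false;
    std::vector<uint32_t> Blocks;
    for (uint64_t B = 0; B < NumBlocks; ++B)
      Blocks.push_back(readLE32(&Dir[4 * Pos++]));
    PD.StreamBlocks.push_back(Blocks);
  }
  Out = PD;
  return true;
}
```

The row lengths are never stored explicitly; they can only be recovered from ``StreamSizes`` and the block size, which is why the sizes must be read before any block index.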
**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4 | ||
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}. | ||
|
||
Stream 0: ceil(1000 / 4096) = 1 block | ||
|
||
Stream 1: ceil(8000 / 4096) = 2 blocks | ||
|
||
Stream 2: ceil(16000 / 4096) = 4 blocks | ||
|
||
Stream 3: ceil(9000 / 4096) = 3 blocks | ||
|
||
In total, 10 blocks are used. Let's see what the stream directory might look | ||
like: | ||
|
||
.. code-block:: c++ | ||
|
||
  struct StreamDirectory { | ||
    ulittle32_t NumStreams = 4; | ||
    ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000}; | ||
    ulittle32_t StreamBlocks[][] = { | ||
      {4}, | ||
      {5, 6}, | ||
      {11, 9, 7, 8}, | ||
      {10, 15, 12} | ||
    }; | ||
  }; | ||
|
||
In total, this occupies ``15 * 4 = 60`` bytes (one ``NumStreams`` field, four | ||
stream sizes, and ten block indices), so ``SuperBlock->NumDirectoryBytes`` | ||
would equal ``60``, and the block at ``SuperBlock->BlockMapAddr`` would hold an | ||
array of one ``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``. | ||
|
||
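The arithmetic in this example can be checked mechanically. The helpers below are a small sketch with hypothetical names (``blocksForStream``, ``directoryBytes``, ``blockMapEntries``); they only restate the formulas used above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ceil(StreamSize / BlockSize): the number of blocks a stream occupies.
uint64_t blocksForStream(uint32_t StreamSize, uint32_t BlockSize) {
  assert(BlockSize != 0);
  return (uint64_t(StreamSize) + BlockSize - 1) / BlockSize;
}

// Size in bytes of a stream directory describing the given streams:
// one NumStreams word, one size word per stream, one word per block.
uint64_t directoryBytes(const std::vector<uint32_t> &StreamSizes,
                        uint32_t BlockSize) {
  uint64_t Words = 1 + StreamSizes.size();
  for (uint32_t Size : StreamSizes)
    Words += blocksForStream(Size, BlockSize);
  return Words * 4;
}

// Number of ulittle32_t entries in the block at SuperBlock->BlockMapAddr.
uint64_t blockMapEntries(uint64_t NumDirectoryBytes, uint32_t BlockSize) {
  assert(BlockSize != 0);
  return (NumDirectoryBytes + BlockSize - 1) / BlockSize;
}
```

For the four streams above, ``directoryBytes({1000, 8000, 16000, 9000}, 4096)`` evaluates to ``60`` and ``blockMapEntries(60, 4096)`` to ``1``.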
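Note also that the streams are discontiguous and interleaved: block 10, the | ||
first block of stream 3, lies between blocks 9 and 11 of stream 2, and the | ||
blocks of stream 2 are not even listed in ascending order. You cannot assume | ||
anything about the layout of the blocks! | ||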
|
||
Alignment and Block Boundaries | ||
============================== | ||
As may be clear by now, it is possible for a single field (whether it be a high | ||
level record, a long string field, or even a single ``uint16``) to begin and | ||
end in separate blocks. For example, if the block size is 4096 bytes, and a | ||
``uint16`` field begins at the last byte of the current block, then it would | ||
need to end on the first byte of the stream's next block. Since blocks are not | ||
necessarily contiguously laid out in the file, this means that both the consumer | ||
and the producer of an MSF file must be prepared to split data apart | ||
accordingly. In the aforementioned example, because fields are stored | ||
little-endian, the low byte of the ``uint16`` would be written to the last byte | ||
of the stream's block N, and the high byte would be written to the first byte | ||
of the stream's block N+1, which could be tens of thousands of bytes later | ||
(or even earlier!) in the file, depending on what the stream directory says. |
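
One way a consumer might cope with such a split is to translate every stream offset to a file offset individually. The sketch below assumes the whole file is in memory and that the stream's block list has already been read from the stream directory; ``streamToFileOffset`` and ``readStreamLE16`` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Maps a byte offset within a stream to a byte offset within the MSF file,
// using the stream's block list from the stream directory.
uint64_t streamToFileOffset(const std::vector<uint32_t> &Blocks,
                            uint32_t BlockSize, uint64_t StreamOffset) {
  assert(BlockSize != 0);
  assert(StreamOffset / BlockSize < Blocks.size());
  return uint64_t(Blocks[StreamOffset / BlockSize]) * BlockSize +
         StreamOffset % BlockSize;
}

// Reads a little-endian uint16 at StreamOffset. Each byte is located
// separately, so the value may straddle two blocks that are far apart
// (or in reverse order) within the file.
uint16_t readStreamLE16(const std::vector<uint8_t> &File,
                        const std::vector<uint32_t> &Blocks,
                        uint32_t BlockSize, uint64_t StreamOffset) {
  const uint64_t LoAt = streamToFileOffset(Blocks, BlockSize, StreamOffset);
  const uint64_t HiAt = streamToFileOffset(Blocks, BlockSize, StreamOffset + 1);
  assert(LoAt < File.size() && HiAt < File.size());
  return uint16_t(File[LoAt] | (File[HiAt] << 8));
}
```

With a stream whose block list is ``{2, 0}`` and a block size of 4096, a ``uint16`` at stream offset 4095 has its low byte at file offset 12287 and its high byte at file offset 0. A real reader would more likely copy whole runs of bytes per block than translate each byte, but the mapping is the same.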
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
======================================== | ||
The PDB Info Stream (aka the PDB Stream) | ||
======================================== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
===================================== | ||
The PDB Public Symbol Stream | ||
===================================== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
===================================== | ||
The PDB TPI Stream | ||
===================================== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,160 @@ | ||
===================================== | ||
The PDB File Format | ||
===================================== | ||
|
||
.. contents:: | ||
:local: | ||
|
||
.. _pdb_intro: | ||
|
||
Introduction | ||
============ | ||
|
||
PDB (Program Database) is a file format invented by Microsoft. It contains | ||
debug information that can be consumed by debuggers and other tools. Since | ||
officially supported APIs exist on Windows for querying debug information from | ||
PDBs even without the user understanding the internals of the file format, a | ||
large ecosystem of tools has been built for Windows to consume this format. In | ||
order for Clang to be able to generate programs that can interoperate with these | ||
tools, it is necessary for us to generate PDB files ourselves. | ||
|
||
At the same time, LLVM has a long history of being able to cross-compile from | ||
any platform to any platform, and we wish for the same to be true here. So it | ||
is necessary for us to understand the PDB file format at the byte-level so that | ||
we can generate PDB files entirely on our own. | ||
|
||
This manual describes what we know about the PDB file format today: the layout | ||
of the file, the various streams contained within, the format of individual | ||
records within, and more. | ||
|
||
We would like to extend our heartfelt gratitude to Microsoft, without whom we | ||
would not be where we are today. Much of the knowledge contained within this | ||
manual was learned through reading code published by Microsoft on their `GitHub | ||
repo <https://github.com/Microsoft/microsoft-pdb>`__. | ||
|
||
.. _pdb_layout: | ||
|
||
File Layout | ||
=========== | ||
|
||
.. toctree:: | ||
:hidden: | ||
|
||
MsfFile | ||
PdbStream | ||
TpiStream | ||
DbiStream | ||
ModiStream | ||
PublicStream | ||
GlobalStream | ||
HashStream | ||
|
||
.. _msf: | ||
|
||
The MSF Container | ||
----------------- | ||
A PDB file is really just a special case of an MSF (Multi-Stream Format) file. | ||
An MSF file is actually a miniature "file system within a file". It contains | ||
multiple streams (aka files) which can represent arbitrary data, and these | ||
streams are divided into blocks which may not necessarily be contiguously | ||
laid out within the file (aka fragmented). Additionally, the MSF contains a | ||
stream directory (aka MFT) which describes how the streams (files) are laid | ||
out within the MSF. | ||
|
||
For more information about the MSF container format, stream directory, and | ||
block layout, see :doc:`MsfFile`. | ||
|
||
.. _streams: | ||
|
||
Streams | ||
------- | ||
The PDB format contains a number of streams which describe various information | ||
such as the types, symbols, source files, and compilands (e.g. object files) | ||
of a program. Additional streams contain hash tables that debuggers and other | ||
tools use for fast lookup of records and types by name, as well as other | ||
information about how the program was compiled, such as the specific toolchain | ||
used. A summary of streams contained in a PDB file is as follows: | ||
|
||
+--------------------+------------------------------+-------------------------------------------+ | ||
| Name | Stream Index | Contents | | ||
+====================+==============================+===========================================+ | ||
| Old Directory | - Fixed Stream Index 0 | - Previous MSF Stream Directory | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
| PDB Stream | - Fixed Stream Index 1 | - Basic File Information | | ||
| | | - Fields to match EXE to this PDB | | ||
| | | - Map of named streams to stream indices | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
| TPI Stream | - Fixed Stream Index 2 | - CodeView Type Records | | ||
| | | - Index of TPI Hash Stream | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
| DBI Stream | - Fixed Stream Index 3 | - Module/Compiland Information | | ||
| | | - Indices of individual module streams | | ||
| | | - Indices of public / global streams | | ||
| | | - Section Contribution Information | | ||
| | | - Source File Information | | ||
| | | - FPO / PGO Data | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
| IPI Stream | - Fixed Stream Index 4 | - CodeView Type Records | | ||
| | | - Index of IPI Hash Stream | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
| /LinkInfo | - Contained in PDB Stream | - Unknown | | ||
| | Named Stream map | | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
| /src/headerblock | - Contained in PDB Stream | - Unknown | | ||
| | Named Stream map | | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
| /names | - Contained in PDB Stream | - PDB-wide global string table used for | | ||
| | Named Stream map | string de-duplication | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
| Module Info Stream | - Contained in DBI Stream | - CodeView Symbol Records for this module | | ||
| | - One for each compiland | - Line Number Information | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
| Public Stream | - Contained in DBI Stream | - Public (Exported) Symbol Records | | ||
| | | - Index of Public Hash Stream | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
| Global Stream | - Contained in DBI Stream | - Global Symbol Records | | ||
| | | - Index of Global Hash Stream | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
| TPI Hash Stream | - Contained in TPI Stream | - Hash table for looking up TPI records | | ||
| | | by name | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
| IPI Hash Stream | - Contained in IPI Stream | - Hash table for looking up IPI records | | ||
| | | by name | | ||
+--------------------+------------------------------+-------------------------------------------+ | ||
|
||
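The fixed stream indices in the table above could be captured in code roughly as follows. This is only a sketch: the enumerator and function names are hypothetical and are not taken from LLVM's sources. Streams without a fixed index (the named streams and those referenced from the DBI, TPI, and IPI streams) have to be discovered at run time and so do not appear here.

```cpp
#include <cassert>
#include <cstdint>

// Streams that always live at the same index, per the table above.
enum FixedStreamIndex : uint32_t {
  OldDirectoryStream = 0, // previous MSF stream directory
  PdbInfoStream = 1,      // basic file information, named stream map
  TpiStream = 2,          // CodeView type records
  DbiStream = 3,          // module/compiland information
  IpiStream = 4           // CodeView type records
};

// True if Index refers to one of the fixed streams listed above.
bool isFixedStreamIndex(uint32_t Index) { return Index <= IpiStream; }

// Returns the table's name for a fixed stream index.
const char *fixedStreamName(uint32_t Index) {
  assert(isFixedStreamIndex(Index));
  switch (Index) {
  case OldDirectoryStream:
    return "Old Directory";
  case PdbInfoStream:
    return "PDB Stream";
  case TpiStream:
    return "TPI Stream";
  case DbiStream:
    return "DBI Stream";
  default:
    return "IPI Stream";
  }
}
```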
More information about the structure of each of these can be found on the | ||
following pages: | ||
|
||
:doc:`PdbStream` | ||
Information about the PDB Info Stream and how it is used to match PDBs to EXEs. | ||
|
||
:doc:`TpiStream` | ||
Information about the TPI stream and the CodeView records contained within. | ||
|
||
:doc:`DbiStream` | ||
Information about the DBI stream and relevant substreams including the Module Substreams, | ||
source file information, and CodeView symbol records contained within. | ||
|
||
:doc:`ModiStream` | ||
Information about the Module Information Stream, of which there is one for each compilation | ||
unit, and the format of the symbols contained within. | ||
|
||
:doc:`PublicStream` | ||
Information about the Public Symbol Stream. | ||
|
||
:doc:`GlobalStream` | ||
Information about the Global Symbol Stream. | ||
|
||
:doc:`HashStream` | ||
Information about the Hash Table stream, and how it can be used to quickly look up records | ||
by name. | ||
|
||
CodeView | ||
======== | ||
CodeView is another format which comes into the picture. While MSF defines | ||
the structure of the overall file, and PDB defines the set of streams that | ||
appear within the MSF file and the format of those streams, CodeView defines | ||
the format of **symbol and type records** that appear within specific streams. | ||
Refer to the pages on `CodeView Symbol Records` and `CodeView Type Records` for | ||
more information about the CodeView format. |