This is a pretty classic optimization. Instead of processing symbol
records and copying them to temporary storage, do a first pass to
measure how large the module symbol stream will be, and then copy the
data into place in the PDB file. This requires defering relocation until
much later, which accounts for most of the complexity in this patch.
This patch avoids copying the contents of all live .debug$S sections
into heap memory, which is worth about 20% of private memory usage when
making PDBs. However, this is not an unmitigated performance win,
because it can be faster to read dense, temporary, heap data than it is
to iterate symbol records in object file backed memory a second time.
Results on release chrome.dll:
peak mem: 5164.89MB -> 4072.19MB (-1,092.7MB, -21.2%)
wall-j1: 0m30.844s -> 0m32.094s (slightly slower)
wall-j3: 0m20.968s -> 0m20.312s (slightly faster)
wall-j8: 0m19.062s -> 0m17.672s (meaningfully faster)
I gathered similar numbers for a debug, component build of content.dll
in Chrome, and the performance impact of this change was in the noise.
The memory usage reduction was visible and similar.
Because of the new parallelism in the PDB commit phase, more cores makes
the new approach faster. I'm assuming that most C++ developer machines
these days are at least quad core, so I think this is a win.
This is the buggy bounds check. At this point, we don't know how large the relocation is, so it's hard to check. We have an implicit assumption here that all relocations are entirely within or outside of the subsection that is being relocated. The assumption is correct for well-formed debug information.
However, we have what appear to be invalid input objects in the pdb-file-static test case. These input objects are yaml-ified object files from MSVC. It seems that obj2yaml/yambl2obj of MSVC object files does not round trip. The relocations, which are maintained separately from the symbol records, do not correspond to the symbol record offsets that yaml2obj generates. We know that MSVC does not align symbol records to 4 bytes, but LLVM does. This may be the source of the discrepancy.
This bug is actually already present in LLD. An object file with a relocation that points to the last byte of a section will cause LLD to do an OOB write. Since LLD applies relocations directly to the output file memory, it is hard to observe or exploit this bug. However, given this recent refactoring, I think it might be worth going back and fixing it.