Previously, our algorithm to compute a build id involved hashing the executable and storing that as the GUID in the CV Debug Record chunk, and setting the age to 1.
This breaks down in one very obvious case: a user adds some newlines to a file and rebuilds, but changes nothing else. New line information and new file checksums get written to the PDB, so the debug info is different, but the generated code is the same, so we would write the same build id again, with an age of 1.
Anyone using a symbol cache would now have a problem: the debugger would open the executable, look at the age and GUID, find a matching (but stale) PDB in the symbol cache, and load it. It would never copy the new PDB to the symbol cache.
There are various other scenarios where this can arise as well, but this one is both the easiest to describe and the one people will run into most often.
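The failure can be reproduced in miniature. This is only a sketch: the symbol cache is modeled as a dictionary keyed by (GUID, age), `publish_pdb` and `old_build_id` are hypothetical names, and MD5 stands in for whatever hash was actually used.

```python
import hashlib
from typing import Dict, Tuple

BuildKey = Tuple[str, int]  # (guid, age)


def old_build_id(executable_bytes: bytes) -> BuildKey:
    """Old scheme: the GUID is a hash of the executable and the age is always 1."""
    # MD5 is an assumption; it is used here only because it yields 128 bits.
    guid = hashlib.md5(executable_bytes).hexdigest()
    return guid, 1


# Hypothetical symbol cache: one PDB per (guid, age) key.
symbol_cache: Dict[BuildKey, bytes] = {}


def publish_pdb(key: BuildKey, pdb_bytes: bytes) -> bool:
    """Copy a PDB into the cache unless an entry with this key already exists."""
    if key in symbol_cache:
        return False  # looks like a match, so the new PDB is never copied
    symbol_cache[key] = pdb_bytes
    return True


exe = b"identical generated code"

# First build.
first = publish_pdb(old_build_id(exe), b"pdb with line table v1")

# Newlines were added to a source file: same code, different line info.
second = publish_pdb(old_build_id(exe), b"pdb with line table v2")
```

After both builds the cache still holds the first PDB, because the key never changed even though the debug info did.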
PE files are ultimately matched against PDBs using 3 pieces of information:
- A 128-bit GUID. When an executable is being written for the first time, a new GUID is generated; otherwise the existing GUID is re-used.
- An age. When an executable is being written for the first time, the age is set to 1; otherwise the age is incremented.
- A timestamp. Every time an executable is written, the timestamp field is updated.
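The update rules for the three fields can be sketched as follows. `BuildId` and `next_build_id` are hypothetical names chosen for illustration, and a random UUID stands in for however the GUID is actually generated.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BuildId:
    guid: uuid.UUID
    age: int
    timestamp: int


def next_build_id(existing: Optional[BuildId], now: Optional[int] = None) -> BuildId:
    """Return the id to write, given the id found in a previous output (if any)."""
    stamp = int(time.time()) if now is None else now
    if existing is None:
        # First write: fresh GUID, age starts at 1.
        return BuildId(guid=uuid.uuid4(), age=1, timestamp=stamp)
    # Rewrite: keep the GUID, bump the age, refresh the timestamp.
    return BuildId(guid=existing.guid, age=existing.age + 1, timestamp=stamp)
```

The `now` parameter exists only so the timestamp can be pinned in a test; a real writer would take the current time.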
Then, the debugger goes through these steps when looking for a PDB:
1. Is there a PDB with a matching GUID? If not, there's no debug info; otherwise go to step 2.
2. Does the PDB have a matching age? If not, warn the user that the debug info might be stale; otherwise go to step 3.
3. Does the PDB have a matching timestamp? If not, warn the user that the debug info might be stale; otherwise we're good to go.
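The lookup steps above amount to a three-stage comparison. The sketch below follows that description only; `DebugKey`, `Match`, and `check_pdb` are hypothetical names, not a real debugger API.

```python
from dataclasses import dataclass
from enum import Enum


class Match(Enum):
    NO_DEBUG_INFO = "no debug info"
    POSSIBLY_STALE = "debug info might be stale"
    OK = "ok"


@dataclass(frozen=True)
class DebugKey:
    guid: str
    age: int
    timestamp: int


def check_pdb(exe: DebugKey, pdb: DebugKey) -> Match:
    """Compare the key stored in an executable with the key stored in a PDB."""
    if pdb.guid != exe.guid:
        return Match.NO_DEBUG_INFO   # step 1: GUID must match
    if pdb.age != exe.age:
        return Match.POSSIBLY_STALE  # step 2: age mismatch, warn
    if pdb.timestamp != exe.timestamp:
        return Match.POSSIBLY_STALE  # step 3: timestamp mismatch, warn
    return Match.OK
```

For example, `check_pdb(DebugKey("g", 2, 10), DebugKey("g", 1, 10))` returns `Match.POSSIBLY_STALE`, since the GUID matches but the age does not.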
This patch implements the first two of these fields. We're still not writing a timestamp, but this is at least better than before. Unfortunately, this hurts reproducibility, since a newly generated GUID differs from one build to the next, but if we want PDBs to work correctly, we don't have much of a choice. We can still get reproducibility through additional flags, or via a tool that runs as a post-processing step to strip out the non-reproducible pieces.
Could this talk about build determinism a bit? rnk mentioned that upthread.
Is there a list of all sources of build indeterminism in lld? Deterministic builds are one of the motivations for the whole clang-cl thing, so it'd be good if we had a list of things needed to get there :-)