This is an archive of the discontinued LLVM Phabricator instance.

Delay writing the PDB build id until just before file commit.
ClosedPublic

Authored by zturner on Feb 28 2018, 2:39 PM.

Details

Summary

The idea here is that we want to be able to create a reproducible hash of the PDB file before and use this as the timestamp field in the PDB and set the timestamp of the executable to match.

In order to do this, we need a way to defer emitting the signature until after everything else in the PDB file has been written. This way when we hash the PDB file, we hash it as if it had default values (e.g. all zeros) for the build id, and then update the build id field.

For now, this is NFC intended only to make the necessary refactoring to do this kind of deferred emission of the build id possible. No actual hashing of the PDB is done yet. We'll do that in a followup.

Diff Detail

Event Timeline

zturner created this revision.Feb 28 2018, 2:39 PM
ruiu added inline comments.Feb 28 2018, 2:48 PM
lld/COFF/PDB.cpp
1124

Remove llvm::.

1128–1136

Does this unconditionally set time(nullptr) to the build id? If so, can't you pass it as an argument to Builder ctor or something?

llvm/lib/DebugInfo/PDB/Native/InfoStreamBuilder.cpp
30

This is suspicious -- is sizeof(Guid) == sizeof(Guid.Guid)?

zturner added inline comments.Feb 28 2018, 3:01 PM
lld/COFF/PDB.cpp
1128–1136

Yes, but for now I'm trying to keep this no functional change. This is exactly what the code did before, so I wanted to leave it the same. In followup patches when we actually use a hash, the call to time will just go away.

llvm/lib/DebugInfo/PDB/Native/InfoStreamBuilder.cpp
30

Yes, but I could add a static_assert to that effect.

ruiu added inline comments.Feb 28 2018, 3:04 PM
lld/COFF/PDB.cpp
1128–1136

I wonder if you could pass a lambda to compute a hash value instead. Lambda is a convenient way to pass a piece of code to other module. I'm wondering if it's applicable here.

llvm/lib/DebugInfo/PDB/Native/InfoStreamBuilder.cpp
30

But why don't you do

::memset(Guid.Guid, 0, sizeof(Guid.Guid));

which is always correct regardless of Guid and Guid.Guid size?

zturner updated this revision to Diff 136406.Feb 28 2018, 3:05 PM

Updated based on suggestions from Rui.

zturner added inline comments.Feb 28 2018, 3:08 PM
lld/COFF/PDB.cpp
1128–1136

In practice this value is going to be a hash of the PDB. So instead of calling time(nullptr) I just call a function to compute the hash. Then somehow I pass this value back to lld::coff::Writer so it can write the same value to the Debug Directory in the PE. So there won't be a need to pass a lambda, because the PDB Linker will still be the one to compute it. So in a followup patch, I'll delete this, replace it with something like like Result.IdPtr->Signature = hashPdb(*Result.File); and then return the value of Result.IdPtr->Signature back to the caller so it can write the same value. So I don't think lambdas or anything fancy are needed.

compnerd added inline comments.Feb 28 2018, 3:11 PM
lld/COFF/PDB.cpp
1059

Can we sink this to the same site too? I think that the PDB writer indicating what it emitted would be nicer.

zturner added inline comments.Feb 28 2018, 3:14 PM
lld/COFF/PDB.cpp
1059

We could, but this value is useless for the PE writer and is definitely PDB specific. It's not part of the build id or anything and is really more of an implementation detail of the PDB, so that's why I left it here.

ruiu added inline comments.Feb 28 2018, 3:14 PM
lld/COFF/PDB.cpp
1128–1136

I'm little confused. Builder knows the pdb file it is creating, so it is not clear to me why only the hash computation needs to be done on the caller side. Why can't you do that inside Builder and make Builder return a Signature?

zturner added inline comments.Feb 28 2018, 3:18 PM
lld/COFF/PDB.cpp
1128–1136

It might work. I can try.

zturner updated this revision to Diff 136429.Feb 28 2018, 4:35 PM

Re-write this to have PDBFileBuilder emit the signature. It's still done as the last step though, so that we can drop in a hash in a followup.

This was a little awkward, because in the case of something like yamlToPdb we actually want to use the exact value in the yaml and not recompute it. I made it work by having the InfoStreamBuilder allow you to optionally set the signature, and if it's not set, it will use the computed value.

ruiu accepted this revision.Feb 28 2018, 4:59 PM

LGTM

lld/COFF/PDB.cpp
1058–1059

Uuid

This revision is now accepted and ready to land.Feb 28 2018, 4:59 PM
This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.