This is an archive of the discontinued LLVM Phabricator instance.

Differential D51956

lld-link: Set PDB GUID to hash of PDB contents instead of to a random byte sequence.
ClosedPublic

Authored by thakis on Sep 11 2018, 4:20 PM.

Details

Reviewers

zturner
ruiu

Commits

rG0bd2d304e672: lld-link: Set PDB GUID to hash of PDB contents instead of to a random byte…
rG205ca68b8db3: Give InfoStreamBuilder an opt-in method to write a hash of the PDB as GUID.
rL342334: lld-link: Set PDB GUID to hash of PDB contents instead of to a random byte…
rLLD342334: lld-link: Set PDB GUID to hash of PDB contents instead of to a random byte…
rL342333: Give InfoStreamBuilder an opt-in method to write a hash of the PDB as GUID.

Summary

Previously, lld-link would use a random byte sequence as the PDB GUID. Instead, use a hash of the PDB file contents.

Naively computing the hash after the PDB data has been generated is in practice as fast as other approaches I tried. I also tried online-computing the hash as parts of the PDB were written out (https://reviews.llvm.org/D51887; that's also where all the measuring data is) and computing the hash in parallel (https://reviews.llvm.org/D51957). This approach here is simplest, without being slower.

To not disturb llvm-pdbutil pdb2yaml, make the hash generation an opt-in feature on InfoStreamBuilder and let lld/COFF/PDB.cpp always set it.

Since writing the PDB computes this ID which also goes in the exe, the PDB writing code now must be called before writeBuildId(). writeBuildId() for that reason is no longer included in the "Code Layout" timer.

Since the PDB GUID is now a function of the PDB contents, the PDB Age is always set to 1. There was a long comment above loadExistingBuildId (now gone) about how not changing the GUID and only incrementing the age was important, but according to the discussion in PR35914 that comment was incorrect.
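The way the diff further down fills in the GUID, Signature, and Age can be sketched in isolation. This is an illustration rather than the patch's code: `fnv1a64`, `PdbId`, and `makePdbId` are hypothetical names, and FNV-1a stands in for xxHash64 only so that the example builds without LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string_view>

// Stand-in 64-bit hash (FNV-1a). The patch itself calls llvm::xxHash64;
// FNV-1a is an assumption made here to keep the sketch free of LLVM.
inline uint64_t fnv1a64(std::string_view Data) {
  uint64_t H = 14695981039346656037ULL; // FNV offset basis
  for (unsigned char C : Data) {
    H ^= C;
    H *= 1099511628211ULL; // FNV prime
  }
  return H;
}

// Hypothetical holder for the three identity fields of the info stream.
struct PdbId {
  std::array<uint8_t, 16> Guid;
  uint32_t Signature;
  uint32_t Age;
};

// Derive the identity from the finished PDB bytes: the first 8 GUID bytes
// are the digest (host byte order), the last 8 are the fixed "LLD PDB."
// tag, the Signature is the low 32 bits of the digest, and Age is 1.
inline PdbId makePdbId(std::string_view PdbBytes) {
  static const char Tag[] = "LLD PDB.";
  static_assert(sizeof(Tag) - 1 == 8, "tag must fill half of the GUID");
  uint64_t Digest = fnv1a64(PdbBytes);
  PdbId Id;
  std::memcpy(Id.Guid.data(), &Digest, 8);
  std::memcpy(Id.Guid.data() + 8, Tag, 8);
  Id.Signature = static_cast<uint32_t>(Digest);
  Id.Age = 1;
  return Id;
}
```

Calling `makePdbId` twice on identical bytes yields identical GUIDs, which is the property that makes the output deterministic; any change to the bytes changes the digest half of the GUID unless the hash collides.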

Diff Detail

Repository: rL LLVM

Event Timeline

thakis created this revision. Sep 11 2018, 4:20 PM

Herald added a subscriber: hiraditya. Sep 11 2018, 4:20 PM

This replaces https://reviews.llvm.org/D51957.

Actually, this replaces https://reviews.llvm.org/D51887

rebase, minor cleanups

Please take a look!

ruiu added inline comments. Sep 14 2018, 9:48 AM

lld/COFF/PDB.cpp
133 ↗	(On Diff #165515)	Its return type is void. I'd change the comment or change the return type.
llvm/lib/DebugInfo/PDB/Native/PDBFileBuilder.cpp
28 ↗	(On Diff #165515)	As long as you are using a non-crypto hash function, there is a risk of generating the same build id, and the probability is not negligible if you have a lot of executables due to the birthday problem. Is this okay?

thakis added inline comments. Sep 14 2018, 10:27 AM

lld/COFF/PDB.cpp
133 ↗	(On Diff #165515)	Will do.
llvm/lib/DebugInfo/PDB/Native/PDBFileBuilder.cpp
28 ↗	(On Diff #165515)	The 8 byte hash still gives decent hash collision resistance for up to 2**32 different pdb files, and since pdbs are keyed by executable name on the symbol server that's per binary. Projects tend to have far fewer revisions than 4 billion. Does that make sense?
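
The birthday estimate behind this exchange can be checked numerically. A minimal sketch, assuming the 64-bit digest behaves like a uniformly random value (an idealization, not something the thread establishes for xxhash); `collisionProbability` is a hypothetical helper, not part of LLVM:

```cpp
#include <cmath>

// Approximate probability that at least two of NumFiles uniformly random
// HashBits-bit digests coincide: p ~= 1 - exp(-n(n-1) / (2 * 2^HashBits)).
inline double collisionProbability(double NumFiles, unsigned HashBits = 64) {
  double Space = std::ldexp(1.0, static_cast<int>(HashBits)); // 2^HashBits
  double Exponent = -NumFiles * (NumFiles - 1.0) / (2.0 * Space);
  return -std::expm1(Exponent); // 1 - e^Exponent, accurate for tiny values
}
```

Under that assumption, 2**32 PDBs for a single binary name give roughly a 39% chance of at least one collision, so 2**32 is about where collisions start to become likely, while a million PDBs give a chance of about 3e-8.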

improve comment

ruiu added inline comments. Sep 14 2018, 10:39 AM

llvm/lib/DebugInfo/PDB/Native/PDBFileBuilder.cpp
28 ↗	(On Diff #165515)	Maybe it is safe. But what could happen if two executables have the same hash? Since xxhash is not cryptographically safe, you could easily generate two executables having the same ID. Are there any security risks caused by that possibility? If the probability is small and the result of a hash collision is not that bad, xxhash is probably okay.

thakis added inline comments. Sep 14 2018, 11:25 AM

llvm/lib/DebugInfo/PDB/Native/PDBFileBuilder.cpp
28 ↗	(On Diff #165515)	The main use case for this guid is to match an executable to its pdb file. The common workflow is that a build server builds an executable and its pdb, then uploads both to a symbol server (under the namespace of the exe, the exe in a subdir containing the exe's pe timestamp and size, and the pdb under the guid). If the executable crashes, it produces a minidump. From the minidump, crash infrastructure can obtain the full executable and the pdb. I can't think of anything where being able to produce a pdb with a given guid that is an xxhash buys you anything: since nothing forces the guid to be a hash of the pdb data, you can just produce a pdb and set its guid field to whatever you want anyway.

The symptoms of a collision are just going to be you can’t debug the
program, so not very severe imo, especially since it would almost certainly
be resolved on the next incremental build

In D51956#1235313, @llvm-commits wrote:

The symptoms of a collision are just going to be you can’t debug the
program, so not very severe imo, especially since it would almost certainly
be resolved on the next incremental build

Can you explain how it would lead to you not being able to debug the program?

Do you mean for local builds? If so, if two back-to-back builds end up with the same pdb guid in the exe and pdb by chance even though the pdb changes, the debugger should still load the new pdb off disk fine (?)

Do you mean if a build server produces PDBs with the same guid for different builds? If so, that would probably produce an error during pdb upload and make the build fail, not debugging (?)

If you’re uploading to a build server, I don’t think it would be an error; it
would either overwrite or not. If it does overwrite, debugging the exe
matching the pdb that was there before wouldn’t work, and if it did not
overwrite, debugging the new exe would fail.

That said, my point was mainly that the probability of this being an issue
in practice is negligible

LGTM

It sounds like I worried a bit too much about hash collisions.

This revision is now accepted and ready to land. Sep 14 2018, 1:18 PM

Closed by commit rL342333: Give InfoStreamBuilder an opt-in method to write a hash of the PDB as GUID. (authored by nico). Sep 15 2018, 11:37 AM

This revision was automatically updated to reflect the committed changes.

https://reviews.llvm.org/rLLD342334 too. Thanks!

thakis mentioned this in D89418: [lld-macho] Implement LC_UUID. Nov 16 2020, 3:52 PM

Revision Contents

Path                                                               Size
llvm/trunk/include/llvm/DebugInfo/PDB/Native/InfoStreamBuilder.h   11 lines
llvm/trunk/include/llvm/DebugInfo/PDB/Native/PDBFileBuilder.h      4 lines
llvm/trunk/lib/DebugInfo/PDB/Native/InfoStreamBuilder.cpp          11 lines
llvm/trunk/lib/DebugInfo/PDB/Native/PDBFileBuilder.cpp             33 lines
llvm/trunk/tools/llvm-pdbutil/llvm-pdbutil.cpp                     7 lines

Diff 165653

llvm/trunk/include/llvm/DebugInfo/PDB/Native/InfoStreamBuilder.h

@@ 29 lines not shown @@
 class InfoStreamBuilder {
 public:
   InfoStreamBuilder(msf::MSFBuilder &Msf, NamedStreamMap &NamedStreams);
   InfoStreamBuilder(const InfoStreamBuilder &) = delete;
   InfoStreamBuilder &operator=(const InfoStreamBuilder &) = delete;

   void setVersion(PdbRaw_ImplVer V);
+  void addFeature(PdbRaw_FeatureSig Sig);
+
+  // If this is true, the PDB contents are hashed and this hash is used as
+  // PDB GUID and as Signature. The age is always 1.
+  void setHashPDBContentsToGUID(bool B);
+
+  // These only have an effect if hashPDBContentsToGUID() is false.
   void setSignature(uint32_t S);
   void setAge(uint32_t A);
   void setGuid(codeview::GUID G);
-  void addFeature(PdbRaw_FeatureSig Sig);

+  bool hashPDBContentsToGUID() const { return HashPDBContentsToGUID; }
   uint32_t getAge() const { return Age; }
   codeview::GUID getGuid() const { return Guid; }
   Optional<uint32_t> getSignature() const { return Signature; }

   uint32_t finalize();

   Error finalizeMsfLayout();

   Error commit(const msf::MSFLayout &Layout,
                WritableBinaryStreamRef Buffer) const;

 private:
   msf::MSFBuilder &Msf;

   std::vector<PdbRaw_FeatureSig> Features;
   PdbRaw_ImplVer Ver;
   uint32_t Age;
   Optional<uint32_t> Signature;
   codeview::GUID Guid;

+  bool HashPDBContentsToGUID = false;
+
   NamedStreamMap &NamedStreams;
 };
 }
 }

 #endif

llvm/trunk/include/llvm/DebugInfo/PDB/Native/PDBFileBuilder.h

@@ 47 lines not shown @@ public:
   msf::MSFBuilder &getMsfBuilder();
   InfoStreamBuilder &getInfoBuilder();
   DbiStreamBuilder &getDbiBuilder();
   TpiStreamBuilder &getTpiBuilder();
   TpiStreamBuilder &getIpiBuilder();
   PDBStringTableBuilder &getStringTableBuilder();
   GSIStreamBuilder &getGsiBuilder();

-  Error commit(StringRef Filename);
+  // If HashPDBContentsToGUID is true on the InfoStreamBuilder, Guid is filled
+  // with the computed PDB GUID on return.
+  Error commit(StringRef Filename, codeview::GUID *Guid);

   Expected<uint32_t> getNamedStreamIndex(StringRef Name) const;
   Error addNamedStream(StringRef Name, StringRef Data);
   void addInjectedSource(StringRef Name, std::unique_ptr<MemoryBuffer> Buffer);

 private:
   struct InjectedSourceDescriptor {
     // The full name of the stream that contains the contents of this injected
@@ 45 lines not shown @@

llvm/trunk/lib/DebugInfo/PDB/Native/InfoStreamBuilder.cpp

@@ 26 lines not shown @@ InfoStreamBuilder::InfoStreamBuilder(msf::MSFBuilder &Msf,
                                      NamedStreamMap &NamedStreams)
     : Msf(Msf), Ver(PdbRaw_ImplVer::PdbImplVC70), Age(0),
       NamedStreams(NamedStreams) {
   ::memset(&Guid, 0, sizeof(Guid));
 }

 void InfoStreamBuilder::setVersion(PdbRaw_ImplVer V) { Ver = V; }

+void InfoStreamBuilder::addFeature(PdbRaw_FeatureSig Sig) {
+  Features.push_back(Sig);
+}
+
+void InfoStreamBuilder::setHashPDBContentsToGUID(bool B) {
+  HashPDBContentsToGUID = B;
+}
+
 void InfoStreamBuilder::setAge(uint32_t A) { Age = A; }

 void InfoStreamBuilder::setSignature(uint32_t S) { Signature = S; }

 void InfoStreamBuilder::setGuid(GUID G) { Guid = G; }

-void InfoStreamBuilder::addFeature(PdbRaw_FeatureSig Sig) {
-  Features.push_back(Sig);
-}
-
 Error InfoStreamBuilder::finalizeMsfLayout() {
   uint32_t Length = sizeof(InfoStreamHeader) +
                     NamedStreams.calculateSerializedLength() +
                     (Features.size() + 1) * sizeof(uint32_t);
   if (auto EC = Msf.setStreamSize(StreamPDB, Length))
     return EC;
   return Error::success();
@@ 27 lines not shown @@

llvm/trunk/lib/DebugInfo/PDB/Native/PDBFileBuilder.cpp

@@ 19 lines not shown @@
 #include "llvm/DebugInfo/PDB/Native/PDBStringTableBuilder.h"
 #include "llvm/DebugInfo/PDB/Native/RawError.h"
 #include "llvm/DebugInfo/PDB/Native/TpiStream.h"
 #include "llvm/DebugInfo/PDB/Native/TpiStreamBuilder.h"
 #include "llvm/Support/BinaryStream.h"
 #include "llvm/Support/BinaryStreamWriter.h"
 #include "llvm/Support/JamCRC.h"
 #include "llvm/Support/Path.h"
+#include "llvm/Support/xxhash.h"

 using namespace llvm;
 using namespace llvm::codeview;
 using namespace llvm::msf;
 using namespace llvm::pdb;
 using namespace llvm::support;

 PDBFileBuilder::PDBFileBuilder(BumpPtrAllocator &Allocator)
@@ 220 lines not shown @@ auto SourceStream = WritableMappedBlockStream::createIndexedStream(
         Layout, MsfBuffer, SN, Allocator);
     BinaryStreamWriter SourceWriter(*SourceStream);
     assert(SourceWriter.bytesRemaining() == IS.Content->getBufferSize());
     cantFail(SourceWriter.writeBytes(
         arrayRefFromStringRef(IS.Content->getBuffer())));
   }
 }

-Error PDBFileBuilder::commit(StringRef Filename) {
+Error PDBFileBuilder::commit(StringRef Filename, codeview::GUID *Guid) {
   assert(!Filename.empty());
   if (auto EC = finalizeMsfLayout())
     return EC;

   MSFLayout Layout;
-  auto ExpectedMsfBuffer = Msf->commit(Filename, Layout);
+  Expected<FileBufferByteStream> ExpectedMsfBuffer =
+      Msf->commit(Filename, Layout);
   if (!ExpectedMsfBuffer)
     return ExpectedMsfBuffer.takeError();
   FileBufferByteStream Buffer = std::move(*ExpectedMsfBuffer);

   auto ExpectedSN = getNamedStreamIndex("/names");
   if (!ExpectedSN)
     return ExpectedSN.takeError();

@@ 45 lines not shown @@ uint64_t InfoStreamFileOffset =
       blockToOffset(InfoStreamBlocks.front(), Layout.SB->BlockSize);
   InfoStreamHeader *H = reinterpret_cast<InfoStreamHeader *>(
       Buffer.getBufferStart() + InfoStreamFileOffset);

   commitInjectedSources(Buffer, Layout);

   // Set the build id at the very end, after every other byte of the PDB
   // has been written.
-  // FIXME: Use a hash of the PDB rather than time(nullptr) for the signature.
-  H->Age = Info->getAge();
-  H->Guid = Info->getGuid();
-  Optional<uint32_t> Sig = Info->getSignature();
-  H->Signature = Sig.hasValue() ? *Sig : time(nullptr);
+  if (Info->hashPDBContentsToGUID()) {
+    // Compute a hash of all sections of the output file.
+    uint64_t Digest =
+        xxHash64({Buffer.getBufferStart(), Buffer.getBufferEnd()});
+
+    H->Age = 1;
+
+    memcpy(H->Guid.Guid, &Digest, 8);
+    // xxhash only gives us 8 bytes, so put some fixed data in the other half.
+    memcpy(H->Guid.Guid + 8, "LLD PDB.", 8);
+
+    // Put the hash in the Signature field too.
+    H->Signature = static_cast<uint32_t>(Digest);
+
+    // Return GUID to caller.
+    memcpy(Guid, H->Guid.Guid, 16);
+  } else {
+    H->Age = Info->getAge();
+    H->Guid = Info->getGuid();
+    Optional<uint32_t> Sig = Info->getSignature();
+    H->Signature = Sig.hasValue() ? *Sig : time(nullptr);
+  }

   return Buffer.commit();
 }

llvm/trunk/tools/llvm-pdbutil/llvm-pdbutil.cpp

@@ 797 lines not shown @@ static void yamlToPdb(StringRef Path) {
   IpiBuilder.setVersionHeader(Ipi.Version);
   for (const auto &R : Ipi.Records) {
     CVType Type = R.toCodeViewRecord(TS);
     IpiBuilder.addTypeRecord(Type.RecordData, None);
   }

   Builder.getStringTableBuilder().setStrings(*Strings.strings());

-  ExitOnErr(Builder.commit(opts::yaml2pdb::YamlPdbOutputFile));
+  codeview::GUID IgnoredOutGuid;
+  ExitOnErr(Builder.commit(opts::yaml2pdb::YamlPdbOutputFile, &IgnoredOutGuid));
 }

 static PDBFile &loadPDB(StringRef Path, std::unique_ptr<IPDBSession> &Session) {
   ExitOnErr(loadDataForPDB(PDB_ReaderType::Native, Path, Session));

   NativeSession *NS = static_cast<NativeSession *>(Session.get());
   return NS->getPDBFile();
 }
@@ 443 lines not shown @@ static void mergePdbs() {
   });
   Builder.getInfoBuilder().addFeature(PdbRaw_FeatureSig::VC140);

   SmallString<64> OutFile(opts::merge::PdbOutputFile);
   if (OutFile.empty()) {
     OutFile = opts::merge::InputFilenames[0];
     llvm::sys::path::replace_extension(OutFile, "merged.pdb");
   }
-  ExitOnErr(Builder.commit(OutFile));
+
+  codeview::GUID IgnoredOutGuid;
+  ExitOnErr(Builder.commit(OutFile, &IgnoredOutGuid));
 }

 static void explain() {
   std::unique_ptr<IPDBSession> Session;
   InputFile IF =
       ExitOnErr(InputFile::open(opts::explain::InputFilename.front(), true));

   for (uint64_t Off : opts::explain::Offsets) {
@@ 222 lines not shown @@