This is an archive of the discontinued LLVM Phabricator instance.

lld-link: Set PDB GUID to hash of PDB contents instead of to a random byte sequence.
Abandoned · Public

Authored by thakis on Sep 10 2018, 12:49 PM.

Details

Reviewers
zturner
Summary

This is not quite ready for real review. I still need to collect performance data of this, and of the more naive approach that just hashes in a second pass after writing the file.

So feel free to ignore for now, but I figured I'd upload what I have. It's probably done enough that I can use it for benchmarking.

Actual patch description:

Previously, lld-link would use a random byte sequence as the PDB GUID. Instead, use a hash of the PDB file contents.

To compute it, introduce HashingFileBufferByteStream, which computes a running hash of the bytes it writes.

Since we already use xxhash, add its streaming parts to llvm/Support/xxhash. xxhash gives only 8 bytes of content hash, so put a fixed string in the other 8 bytes available in the PDB GUID.
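To illustrate the idea (this is a minimal sketch, not the patch's actual code), here is what such a hashing stream might look like, assuming writes arrive in file order. RunningHash is a stand-in (FNV-1a) for the streaming xxHash64 API; the class name, method names, and filler string are hypothetical:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in running hash (FNV-1a); the real patch streams xxHash64 instead.
struct RunningHash {
  uint64_t State = 14695981039346656037ull; // FNV-1a offset basis
  void update(const uint8_t *Data, size_t Size) {
    for (size_t I = 0; I < Size; ++I) {
      State ^= Data[I];
      State *= 1099511628211ull; // FNV-1a prime
    }
  }
  uint64_t digest() const { return State; }
};

// Buffers the output file and hashes every byte as it is written, so the
// data is still hot in cache when the hash sees it.
class HashingStream {
  std::vector<uint8_t> Buffer;
  RunningHash Hash;

public:
  void writeBytes(const uint8_t *Data, size_t Size) {
    Buffer.insert(Buffer.end(), Data, Data + Size);
    Hash.update(Data, Size); // running hash, no second pass over the file
  }

  // 16-byte PDB GUID: 8 bytes of content hash plus 8 fixed bytes, since
  // the hash yields only a 64-bit digest.
  void makeGuid(uint8_t Guid[16]) const {
    uint64_t H = Hash.digest();
    std::memcpy(Guid, &H, sizeof(H));
    std::memcpy(Guid + 8, "LLD PDB.", 8); // filler string is illustrative
  }
};
```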

To avoid disturbing llvm-pdbutil pdb2yaml, make the hash generation an opt-in feature on InfoStreamBuilder and let lld/COFF/PDB.cpp always set it.

Since writing the PDB computes this ID, which also goes into the exe, the PDB writing code must now be called before writeBuildId(). For that reason, writeBuildId() is no longer included in the "Code Layout" timer.

Since the PDB GUID is now a function of the PDB contents, the PDB Age is always set to 1. There was a long comment above loadExistingBuildId (now removed) about how it was important to keep the GUID fixed and only increment the Age, but according to the discussion in PR35914 that comment was incorrect.

Diff Detail

Event Timeline

thakis created this revision. Sep 10 2018, 12:49 PM
ruiu added a comment. Sep 10 2018, 1:28 PM

I wonder if you can just compute a hash for a mmap'ed PDB file instead of defining a new type of Stream. That's what we do for ELF when computing a build-id, and I found that's easy to do. It is also easy to parallelize hash computation by making it a tree hash.
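For concreteness, a sketch of that alternative, assuming the finished PDB is available as one contiguous (e.g. mmap'ed) buffer and using llvm::xxHash64 from llvm/Support/xxhash.h for the leaves; the treeHash name and the 1 MiB chunk size are made up here:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Support/xxhash.h"
#include <cstdint>
#include <vector>

// Two-level tree hash: hash fixed-size chunks independently, then hash
// the concatenated chunk digests. The per-chunk loop is embarrassingly
// parallel and could be dispatched with e.g. llvm::parallelForEachN.
uint64_t treeHash(llvm::ArrayRef<uint8_t> Data) {
  const size_t ChunkSize = 1 << 20; // 1 MiB leaves; tuning is a guess
  size_t NumChunks = (Data.size() + ChunkSize - 1) / ChunkSize;
  std::vector<uint64_t> Leaves(NumChunks);
  for (size_t I = 0; I != NumChunks; ++I)
    Leaves[I] =
        llvm::xxHash64(Data.slice(I * ChunkSize).take_front(ChunkSize));
  // Hash the leaf digests once more to get the root.
  return llvm::xxHash64(llvm::ArrayRef<uint8_t>(
      reinterpret_cast<const uint8_t *>(Leaves.data()),
      Leaves.size() * sizeof(uint64_t)));
}
```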

rnk added a comment. Sep 10 2018, 4:22 PM

I wonder if you can just compute a hash for a mmap'ed PDB file instead of defining a new type of Stream. That's what we do for ELF when computing a build-id, and I found that's easy to do. It is also easy to parallelize hash computation by making it a tree hash.

If we use a streaming interface to write the file in the first place, hashing as we write will probably provide better temporal locality to the data being hashed. If the hash computation is cheap, fetching the PDB data from RAM into cache will dominate. It's hard to say whether parallelizing the hash will be more of a win than the locality of doing it in the streaming interface, but given that it's not clearly better, I would lean towards doing this because it's convenient.

I wonder if you can just compute a hash for a mmap'ed PDB file instead of defining a new type of Stream. That's what we do for ELF when computing a build-id, and I found that's easy to do. It is also easy to parallelize hash computation by making it a tree hash.

I considered this, but I figured that's much worse caching-wise. Data passed to the stream is in memory, while walking the file again will likely cause lots of page faults, given how bad Windows's file system cache is. Since the new stream turned out to be very little code, I'm pretty happy with this approach.

If you want, I can prototype the other version too and see how much slower it is and how the code looks there. Most of the patch will stay the same in both cases.

ruiu added a comment. Sep 10 2018, 4:29 PM

Sequential file access might be fast enough, so I don't know which is better performance-wise. In such a situation, I'd probably implement the easier one first (hashing a mmap'ed buffer) to see if it is satisfactory, and implement both to compare only if it is not, to keep things as simple as possible.

rnk added a comment. Sep 10 2018, 4:32 PM

Sequential file access might be fast enough, so I don't know which is better performance-wise. In such a situation, I'd probably implement the easier one first (hashing a mmap'ed buffer) to see if it is satisfactory, and implement both to compare only if it is not, to keep things as simple as possible.

I guess we have different ideas about what is simple. :)

ruiu added a comment. Sep 10 2018, 4:39 PM

Oh, I read "this" as my suggestion. :)

But computing a hash of a contiguous region in memory is much easier than defining a new Stream, no? If you take a look at the code in ELF that computes a build-id, I think you'll notice it's pretty short. It also eliminates the need to modify xxhash.h.

(Landed r341945 to address the 2 differing bytes that caused the guid for rsds.test to be different. It's still different due to different /pdbaltpath: flags, but the pdb contents are now identical except for guid and timestamp. I don't understand yet how /pdbaltpath: makes it into the hash, but for the same /pdbaltpath: it seems to be deterministic now. Looking more…)

(I don't understand yet how /pdbaltpath: makes it into the hash)

(That's because EnvBlockSym stores the linker args under its "cmd" entry, so all flags appear in the pdb. So rsds.test can't use different /pdbaltpath:s if it expects the guids to match. All that's left now is to collect perf data with both approaches.)

thakis updated this revision to Diff 164875. Sep 11 2018, 7:34 AM
thakis edited the summary of this revision.

tests pass

Did some measurements on my Linux box (didn't have access to my Windows box). There, naively calling xxHash64 on the 1.4GB chrome.dll.pdb file after creating it, doing the parallel xxHash64 computation, using this patch, and using the old nondeterministic GUID code all take about the same amount of time (around 32s; the min-of-4 link time for each approach is within 0.2s of that). I'll try this on Windows tomorrow.

(Naive: https://reviews.llvm.org/D51956 Parallel: https://reviews.llvm.org/D51957)

ruiu added a comment. Sep 12 2018, 9:14 AM

Did some measurements on my Linux box (didn't have access to my Windows box). There, naively calling xxHash64 on the 1.4GB chrome.dll.pdb file after creating it, doing the parallel xxHash64 computation, using this patch, and using the old nondeterministic GUID code all take about the same amount of time (around 32s; the min-of-4 link time for each approach is within 0.2s of that). I'll try this on Windows tomorrow.

If it is so cheap to compute a hash, maybe we should use MD5 instead of xxhash to eliminate the possibility of hash collision?
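(For reference, a sketch of what the MD5 variant could look like over a buffer holding the finished PDB, using llvm::MD5 from llvm/Support/MD5.h; the function name is hypothetical. MD5's 16-byte digest would fill the GUID exactly, with no fixed filler needed:)

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Support/MD5.h"
#include <cstdint>
#include <cstring>

// All 16 GUID bytes come from the content hash; no fixed filler string.
void computeGuidMD5(llvm::ArrayRef<uint8_t> PdbContents, uint8_t Guid[16]) {
  llvm::MD5 Hash;
  Hash.update(PdbContents);
  llvm::MD5::MD5Result Result;
  Hash.final(Result);
  std::memcpy(Guid, Result.Bytes.data(), sizeof(Result.Bytes));
}
```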

Did some measurements on my Linux box (didn't have access to my Windows box). There, naively calling xxHash64 on the 1.4GB chrome.dll.pdb file after creating it, doing the parallel xxHash64 computation, using this patch, and using the old nondeterministic GUID code all take about the same amount of time (around 32s; the min-of-4 link time for each approach is within 0.2s of that). I'll try this on Windows tomorrow.

(Naive: https://reviews.llvm.org/D51956 Parallel: https://reviews.llvm.org/D51957)

I'm a little surprised it's this fast. Are you sure it's even doing anything? What kind of hard drive are you using? A SATA III bus interface, which is still probably the single most common interface used by SSDs, has a theoretical maximum transfer rate of 600MB/s, so you'd be adding at least 2.5 seconds to the link, and that's under optimal circumstances. A SATA II interface, which is also still common enough, is 300MB/s. NVMe and PCIe interfaces can reach gigabytes per second, but we definitely shouldn't be basing measurements off of those. I don't really have a super strong opinion, but it's something to think about.

ruiu added a comment. Sep 12 2018, 9:58 AM

When you are reading back a file that you just created, that read access doesn't actually hit the disk, no?

Not if it's small, but I would imagine a 1.4GB file could not fit in the cache.
Still though, it's a good point.

Did some measurements on my Linux box (didn't have access to my Windows box). There, naively calling xxHash64 on the 1.4GB chrome.dll.pdb file after creating it, doing the parallel xxHash64 computation, using this patch, and using the old nondeterministic GUID code all take about the same amount of time (around 32s; the min-of-4 link time for each approach is within 0.2s of that). I'll try this on Windows tomorrow.

(Naive: https://reviews.llvm.org/D51956 Parallel: https://reviews.llvm.org/D51957)

I'm a little surprised it's this fast. Are you sure it's even doing anything? What kind of hard drive are you using? A SATA III bus interface, which is still probably the single most common interface used by SSDs, has a theoretical maximum transfer rate of 600MB/s, so you'd be adding at least 2.5 seconds to the link, and that's under optimal circumstances. A SATA II interface, which is also still common enough, is 300MB/s. NVMe and PCIe interfaces can reach gigabytes per second, but we definitely shouldn't be basing measurements off of those. I don't really have a super strong opinion, but it's something to think about.

I'm surprised too. I did check that the guid in the pdb was computed by the new algorithm, though, and that the "size" var in front of the xxHash64 call contains 1.4GB. The Linux disk cache is pretty good, though, so I'm guessing it was all in memory. I'm doing Windows next, where I expect the disk cache to be worse.

The situation on Windows is the same as on Linux. I built chrome.dll with our pinned lld (which was compiled with clang), with a locally-built lld (built by MSVC 2017), and then with this patch and the other two patches linked from here. Min-of-5 times are comparable (full data in parens):

pinned lld, chrome.dll: 41.2s (42.0s, 45.0s, 41.9s, 43.5s, 41.2s)
trunk lld, chrome.dll: 48.1s (48.3s, 48.1s, 49.0s, 51.0s, 49.3s)
patched lld, chrome.dll: 48.6s (48.6s, 50.7s, 51.1s, 48.8s, 48.8s)
patched lld, naive hash, chrome.dll: 47.4s (48.8s, 51.0s, 51.5s, 51.0s, 47.4s)
patched lld, parallel hash, chrome.dll: 46.7s (50.2s, 47.9s, 47.8s, 46.7s, 48.8s)

After looking at this data again, the "naive hash" line looked a bit slower than the rest, so I re-ran those 5 links and got:

patched lld, naive hash, chrome.dll, again: 45.8s (46.9s, 45.8s, 46.1s, 45.8s, 48.5s)

So measurement noise is pretty high :-/ Pinned lld is consistently faster (due to being built by clang), but how exactly the hash is computed doesn't matter. So I suppose https://reviews.llvm.org/D51956 is the way to go. I'll give that a real patch description and send it out.

To get the link time, I first built all of chrome_dll, then deleted chrome.dll and re-ran the build target like so: ninja -C out\gn chrome_dll ; del out\gn\chrome.dll; ninja -C out\gn chrome_dll -v -d keeprsp

Then I ran the link command with my locally-built lld like so: C:\src\chrome\src\out\gn>..\..\..\..\tim\tim ninja -t msvc -e environment.x64 -- ../../../../llvm-mono/out/gn/bin/lld-link.exe /nologo /IMPLIB:./chrome.dll.lib /DLL /OUT:./chrome.dll /PDB:./chrome.dll.pdb @./chrome.dll.rsp

(tim is https://github.com/sgraham/tim)

(re md5: Let's do that in a follow-up, since it needs re-measuring and will be a single-file change then. The 8-byte hash still gives decent hash collision resistance for up to 2**32 different pdb files, and since pdbs are keyed by executable name on the symbol server, that limit is per binary. Projects tend to have far fewer revisions than 4 billion.)
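(The 2**32 figure is the usual birthday bound; for an ideal 64-bit hash over n distinct inputs, the collision probability is roughly:

```latex
\[
  p_{\text{collision}} \approx 1 - e^{-n(n-1)/2^{65}} \approx \frac{n^2}{2^{65}},
\]
```

which only reaches about 1/2 at n = 2^32 and stays negligible for realistic numbers of pdb revisions per executable name.)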

ruiu added a comment. Sep 14 2018, 1:30 PM

Thank you for sharing the results of the benchmark!

thakis abandoned this revision. Sep 15 2018, 4:42 PM