Use of this flag causes LLD to try to use the .debug$H section of each input object file in which it is present. The original code path (plain /DEBUG) is unchanged and still uses non-cryptographic, non-tree hashes that are computed entirely by the linker.
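As a rough illustration of the per-object decision this flag enables, here is a minimal C++ sketch. The names (ObjFile, GlobalTypeHash, hashTypesSerially, getOrComputeGHashes) are hypothetical placeholders, not the actual LLD symbols, and the hashing itself is elided:

    #include <cstdint>
    #include <vector>

    // Placeholder for an SHA1-derived global type hash; the real layout is
    // defined by the .debug$H format, which this sketch does not model.
    struct GlobalTypeHash { std::vector<uint8_t> Bytes; };

    struct ObjFile {
      // Hashes read straight out of .debug$H, if the compiler emitted one.
      std::vector<GlobalTypeHash> PrecomputedHashes;
      // Raw CodeView type records, hashed only as a fallback.
      std::vector<std::vector<uint8_t>> TypeRecords;
    };

    // Stand-in for linker-side hashing; the real SHA1 tree hashing is elided.
    std::vector<GlobalTypeHash> hashTypesSerially(const ObjFile &F) {
      std::vector<GlobalTypeHash> Hashes(F.TypeRecords.size());
      // ... compute a hash per type record here ...
      return Hashes;
    }

    std::vector<GlobalTypeHash> getOrComputeGHashes(const ObjFile &F) {
      if (!F.PrecomputedHashes.empty())
        return F.PrecomputedHashes; // fast path: reuse hashes from .debug$H
      return hashTypesSerially(F);  // slow path: hash the records ourselves
    }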
Some preliminary timings show modest performance gains on real world programs. For example, on my machine linking clang with /DEBUG takes about 23 seconds, and with /DEBUG:GHASH takes about 17 seconds, so this is a 1.35x speedup. That said, this implementation is completely unoptimized. I haven't profiled to find out what the current bottlenecks are, and it's possible we are doing something simple wrong whose fix could lead to further gains.
Note that this implementation does not compute missing hashes in parallel. That's one avenue for exploration, but I added some debugging metrics and found that when linking clang, we only computed about 31,000 SHA1 hashes serially. So there is not likely a significant performance win to be had from computing missing hashes in parallel, and it also suggests there is not a big win to be had from post-processing system libraries to produce a .debug$H section. Having an application large enough that the number of application-defined types dominates the number of system-defined types is sufficient to get good performance.
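For reference, the parallel fallback mentioned above could look roughly like the following sketch. It reuses the hypothetical ObjFile, GlobalTypeHash, and hashTypesSerially declarations from the earlier example and does not reflect actual LLD code:

    #include <future>
    #include <vector>

    std::vector<std::vector<GlobalTypeHash>>
    computeMissingHashesInParallel(const std::vector<const ObjFile *> &NeedsHash) {
      // Kick off one asynchronous hashing job per object that lacks .debug$H.
      std::vector<std::future<std::vector<GlobalTypeHash>>> Jobs;
      Jobs.reserve(NeedsHash.size());
      for (const ObjFile *F : NeedsHash)
        Jobs.push_back(std::async(std::launch::async,
                                  [F] { return hashTypesSerially(*F); }));

      // Collect the results in the original input order.
      std::vector<std::vector<GlobalTypeHash>> Results;
      Results.reserve(Jobs.size());
      for (auto &J : Jobs)
        Results.push_back(J.get());
      return Results;
    }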
On the other hand, if we don't compute hashes in parallel, it does lead to poor worst-case performance. I also tried using /DEBUG:GHASH when none of the object files contained a .debug$H section. In that case, we were twice as slow as /DEBUG. This is consistent with my earlier findings that swapping the hash algorithm from CityHash to SHA1 and making no other changes led to a 2x slowdown.