We're getting to the point that some MS tools (e.g. DIA) can recognize our PDBs but others (e.g. link.exe) cannot. I think the way forward is to improve our tooling to help us find differences more easily. For example, if we can compile the same program with clang-cl and cl and have a tool tell us all the places where the PDBs differ, this could tell us what we're doing wrong. It's tricky though, because there are a lot of "benign" differences in a PDB. For example, if the string table in one PDB consists of "foo" followed by "bar" and in the other PDB it consists of "bar" followed by "foo", this is not necessarily a critical difference, as long as the uses of these strings also refer to the correct location. On the other hand, if the second PDB doesn't even contain the string "foo" at all, this is a critical difference.
diff mode has been in llvm-pdbutil for quite a while, but because of the above challenge along with some others, it's been hard to make it useful.  I think this patch addresses that.  It looks for all the same things, but it now prints the output in tabular format (carefully formatted and aligned into tables and fields), and it highlights critical differences in red, non-critical differences in yellow, and identical fields in green.  Example:
This makes it easy to spot the places we differ, and the general concept of outputting arbitrary fields in tabular format can be extended to provide analysis into many of the different types of information that show up in a PDB.
I don't quite get this. Basically we have a comparator which says "if they're different, no big deal, they're actually equivalent". Is that more or less the idea? Maybe instead of asking the caller to pass a comparator we should have a different printEquivalent method? It seems shorter and less magical.