For now we only support diffing the MSF Super Block and the Stream Directory. Subsequent patches will drill down into individual streams.
Details
Diff Detail
Event Timeline
Compared to dumping two files using pdb2yaml and diffing them, what is the advantage of this tool?
llvm/tools/llvm-pdbdump/Diff.cpp | ||
---|---|---|
103 | No newline at end? |
Well, as an extreme (but obviously hypothetical) example, consider the case where pdb2yaml outputs a constant yaml file no matter what input it gets. The diff would always be the same, but the underlying pdbs could be entirely unrelated.
This is obviously contrived, but the point is that if there are bugs in tool X, it doesn't make sense to use the output of tool X to verify that tool X is working correctly.
Beyond that though, pdb2yaml is only intended to give you a textual representation of the contents. It's useful to be able to understand how 2 PDB files differ at the byte level. It's entirely possible that two PDBs which differ in subtle ways could produce the same YAML. Also, even with a 50MB pdb, pdb2yaml takes an extremely long time to run. If we wanted to do it on something the order of chrome.dll.pdb which is maybe 1GB, it would be impractical.
Minor updates.
As one more point about "why not just diff the YAML", I just ran pdb2yaml and yaml2pdb on a random PDB on my drive. Then I ran llvm-pdbdump diff on it. With just the code in this patch, I get the following output:
Found 8 common streams. Searching for intra-stream differences. DBI Stream (Left: 283128 bytes, Right: 160570 bytes) Named Stream "/names" (Left: 58244 bytes, Right: 55870 bytes) Old MSF Directory (Left: 40 bytes, Right: 0 bytes) PDB Stream (Left: 118 bytes, Right: 114 bytes) TPI Stream (Left: 382784 bytes, Right: 382632 bytes)
Note that the TPI stream on the left (the original file) is 152 bytes larger. If I use pdb2yaml again on the new file and diff their contents, nothing appears different about the TPI stream. So we need other tools to figure this kind of thing out.
It's also possible to have two PDBs with equivalent type streams but different type index numberings. This diffing tool would make it possible to look past those superficial differences.
llvm/tools/llvm-pdbdump/Diff.cpp | ||
---|---|---|
117 | Should you report if P.size() != Q.size()? | |
122 | Should that be !=? (Next line as well.) | |
150 | Is the Comparator necessary? Isn't the default comparator std::less, which should do exactly the same thing? | |
166 | In practice, I suspect .data() will work, but .begin() would seem more principled, since these algorithms work with pairs of iterators. The fact that SmallVector iterators are pointers (as returned by .data()) is an implementation detail I wouldn't rely on. | |
llvm/tools/llvm-pdbdump/StreamUtil.h | ||
20 | Could you add a comment clarifying what this method does? |
llvm/tools/llvm-pdbdump/Diff.cpp | ||
---|---|---|
117 | That should get reported indirectly (because we will just report that one file has streams not present in the other file). | |
122 | Yes, fixed in latest diff. | |
150 | PI is the value type of the enumerator range, which in this case is something like result_pair<SmallString<32>>. Even if that weren't the case and it were just std::pair<size_t, std::string> though, this comparator is specifically designed to only look at the second item of the pair (the name) and not the index. | |
166 | begin() would actually be wrong here. I've only called reserve, not resize, so size() == 0. SmallVector<> has this optimization so that you can copy objects into it without default constructing them first, and then calling set_size() (which is different than resize()) after you write into its internal buffer. |
llvm/tools/llvm-pdbdump/Diff.cpp | ||
---|---|---|
166 | I understand you've only reserved space using this SmallVector feature. I would expect begin on an empty SmallVector with a reservation to return an iterator at the beginning of the buffer and for end to return the same thing. In fact, the implementation of SmallVector::data() is to return begin() coerced to the container's pointer type (which happens to be identical to its iterator type). /// Return a pointer to the vector's buffer, even if empty(). pointer data() { return pointer(begin()); } I normally wouldn't go diving into a container's implementation details, but it's the only place I can find that documents this feature of using the reserved buffer of an empty container. Mixing pointers and iterators of a SmallVector works because they're interchangeable, but it seems to make this code fragile if someone were ever to come along and select a different container type, where the same effect could be achieved with a reserve and a back_insert_iterator. |
No newline at end?