This is an archive of the discontinued LLVM Phabricator instance.

[llvm-pdbdump] Add the ability to merge PDBs
Needs ReviewPublic

Authored by zturner on May 18 2017, 10:11 AM.

Details

Reviewers
rnk
inglorion
ruiu
Summary

By adding a new merge subcommand, we can now give arbitrarily many PDBs to llvm-pdbdump as inputs and get back one PDB. Currently it only merges type streams and ignores everything else. In the future it can merge symbol streams, module information, and other types of stuff.

Having this command allows us to write a test for type merging, which is also done in this patch.

Diff Detail

Event Timeline

zturner created this revision.May 18 2017, 10:11 AM
zturner updated this revision to Diff 99459.May 18 2017, 10:34 AM

Added a slightly more complicated test case involving two identical procedure types.

inglorion edited edge metadata.May 18 2017, 11:19 AM

I feel like llvm-pdbdump already does a ton of things. Having a tool to merge pdbs seems useful, but I'm not sure llvm-pdbdump should be that tool. If we want all that functionality in a single tool I think we should at least rename it to something like llvm-pdbtool (because we're not really dumping pdbs now), but I would actually prefer having this be a separate tool.

What would be the advantage of having it be a separate tool? I kind of like the fact that I don't have to remember which tool to use when I want something, which kind of solves the problem of tools with overlapping responsibilities. For example, I can never remember whether to use llvm-objdump or llvm-readobj. Seems to me they should be the same tool. Like "if you want to do something to a PDB, use llvm-pdbdump". That said, I do agree it does more than just dumping now, so perhaps a new name is in order soon. I wouldn't want to mix a change like that in with other changes though.

What would be the advantage of having it be a separate tool?

Partially the usual "less code, so easier to understand, faster to compile, fewer opportunities to create strong coupling".

I also seem to have the opposite problem when figuring out how to do something. Instead of struggling to remember which of the tools I know exist I need to use, I struggle to discover that a tool exists at all (I can create pdbs from YAML?) or struggle to remember what the subcommand is called (ls or autocomplete will tell me what the name of the binary is, but I then need to run it to figure out if a subcommand exists and how it's spelled).

I kind of like the fact that I don't have to remember which tool to use when I want something, which kind of solves the problem of tools with overlapping responsibilities. For example, I can never remember whether to use llvm-objdump or llvm-readobj.

In this particular example, I suspect that one is inspired by objdump and one is inspired by readobj. Those do indeed largely do the same things, so it's unfortunate that there are two. On the other hand, things like inspecting pdbs and merging pdbs seem pretty distinct to me.

Seems to me they should be the same tool. Like "if you want to do something to a PDB, use llvm-pdbdump". That said, I do agree it does more than just dumping now, so perhaps a new name is in order soon. I wouldn't want to mix a change like that in with other changes though.

Agreed that if we're going to rename the tool, that should go in a separate change.

Then again, I don't feel strongly enough about my preference for separate tools to really hold this up. I'm ok with just renaming this to llvm-pdbtool or some such. The code looks fine to me apart from the thing I mentioned inline.

llvm/tools/llvm-pdbdump/llvm-pdbdump.cpp
1019

The other subcommands take function arguments, instead of accessing the globals. Can we do that here as well?

zturner added inline comments.May 18 2017, 4:01 PM
llvm/tools/llvm-pdbdump/llvm-pdbdump.cpp
1019

Actually it's a little bit of both. There are so many options that it's impractical to pass the entire set of options to every subcommand's implementation, and they frequently access the globals. In this instance I did it this way because passing a reference to a cl::list<std::string> felt a little awkward. In the other cases we don't have this problem because there are a fixed number of input files, but in this case there can be arbitrarily many.

inglorion added inline comments.May 19 2017, 11:35 AM
llvm/tools/llvm-pdbdump/llvm-pdbdump.cpp
1019

You're right, the others use a combination of arguments and globals. I'd still prefer it if you made this one an argument. I agree that passing a reference to a cl::list feels a little awkward. How about an ArrayRef? IIRC, you can get those out of cl::lists using the operator ArrayRef.

zturner added inline comments.May 19 2017, 11:40 AM
llvm/tools/llvm-pdbdump/llvm-pdbdump.cpp
1019

Didn't realize there was an operator ArrayRef. If that's the case, I agree that seems like a good solution.