This is an archive of the discontinued LLVM Phabricator instance.

Improve consistency and flexibility with llvm-pdbdump subcommands
ClosedPublic

Authored by zturner on Jun 7 2017, 9:21 AM.

Details

Summary

We had the pdb2yaml and raw options which each dumped pdb in a different way. But many of the options were effectively the same, but spelled differently. For example, we had:

    pdb2yaml            |     raw
---------------------------------------
-dbi-module-syms        | -module-syms
-dbi-module-info        | -modules
-dbi-module-source-info | -module-files
-dbi-module-lines	     | -line-info

Furthermore, they didn't appear in the same categories in their respective subcommands, had different help text, stored the results in different variables, etc. So this patch aims to fix all of that by having one set of variables that are shared among subcommands.

At the same time, we need to improve the granularity by which we can dump module subsections. Previously, line-info and dbi-module-lines really means "dump all the subsections of each module's debug stream". But some of those subsections are actually not about lines at all. There's Frame Data, Checksums, even symbols can show up. What we'd really like is a way to specify exactly which subsections are dumped. So the -line-info is changed to an enum value that looks like this:

-subsections              - dump subsections from each module's debug stream
  =cme                    -   Cross module exports (DEBUG_S_CROSSSCOPEEXPORTS subsection)
  =cmi                    -   Cross module imports (DEBUG_S_CROSSSCOPEIMPORTS subsection)
  =fc                     -   File checksums (DEBUG_S_CHECKSUMS subsection)
  =ilines                 -   Inlinee lines (DEBUG_S_INLINEELINES subsection)
  =lines                  -   Lines (DEBUG_S_LINES subsection)
  =all                    -   All known subsections

and you can specify one or more of these values separated by commas. This makes it easy to add support for other subsections in future patches. Another benefit is that it allows us to consolidate some of the tests. I was able to convert 3 different tests which did mostly the same thing but with different subsections into a single test which tests all subsections at once.

Diff Detail

Event Timeline

zturner created this revision.Jun 7 2017, 9:21 AM
inglorion edited edge metadata.Jun 8 2017, 10:23 AM

I would like to change the changes to these option names to coincide with renaming llvm-pdbdump to llvm-pdbtool (or some such) so that we only break the interface once.

While we're at it - how much value is there in having both yaml and raw that is almost the same, but with slightly different syntax? I've added some code to dump things in raw format before we understood what they meant, but I think now that we can dump pretty much everything as YAML, we should consider removing raw. Again, that breaks the API, so the introduction of llvm-pdbtool seems like a good time to do it.

llvm/test/DebugInfo/PDB/Inputs/debug-subsections.yaml
1

Why does this change?

Oops, I submitted that before I was done reviewing. More comments probably coming soon.

I don't mind submitting it as a followup, but I think it's a bad idea to submit it at the same time. While you're right that it only breaks the interface once to do it together, I don't think anyone is really depending on the interface that much, and changing the name is already going to be a rather huge change that is like 100% boilerplate

llvm/test/DebugInfo/PDB/Inputs/debug-subsections.yaml
1

It shouldn't, that looks like a copy paste error.

Thanks for doing this, zturner! I have some more inline comments and questions.

llvm/test/DebugInfo/PDB/pdbdump-readwrite.test
1–2

While thinking about this: What does -dbi-stream mean? Is it useful to specify -dbi-stream without any of the additional enabling flags like -modules?

Also, should we at some point change this test to use -all instead of the various individual options?

llvm/tools/llvm-pdbdump/YAMLOutputStyle.cpp
40–41

Please remove the commented-out code (also further down).

96–97

Since you're changing this - would it make sense to rewrite this

if (!opts::pdb2yaml::StringTable && !opts::shared::DumpModuleFiles && opts::shared::DumpModuleSubsections.empty()))
  return Error::success();

?

llvm/tools/llvm-pdbdump/llvm-pdbdump.cpp
887

Can you make a function that enables all shared options and call that from the places where we handle -all for pdb2yaml and raw? That way, the code that handles -all for pdb2yaml is all in one place, and the code that handles -all for raw is all in one place, instead of both being partially here and partially where they are now.

Hmm, good point. I hadn't considered that changing the name in the same change would mix a lot of mechanical changes with actual interesting to review changes. Let's keep them in two separate changes, then. Can you land them close together in time, though?

While we're at it - how much value is there in having both yaml and raw that is almost the same, but with slightly different syntax? I've added some code to dump things in raw format before we understood what they meant, but I think now that we can dump pretty much everything as YAML, we should consider removing raw.

I've been thinking about what to do here, and I think there is a better option. We should change raw mode to be more concise. raw was originally supposed to be for human consumption while yaml was for machine consumption. But they are equally verbose, and there's no clear argument that raw is more readable than yaml. It might even be worse. So what I've been thinking about is changing raw to be very compressed. For example, instead of this:

D:\foo> llvm-pdbdump raw -all foo.pdb
FileHeaders {
  BlockSize: 4096
  FreeBlockMap: 2
  NumBlocks: 1145
  NumDirectoryBytes: 5484
  Unknown1: 0
  BlockMapAddr: 1143
  NumDirectoryBlocks: 2
  DirectoryBlocks: [1141, 1142]
  NumStreams: 236
}
Streams [
  Stream 0: [Old MSF Directory] (40 bytes)
  Stream 1: [PDB Stream] (118 bytes)
  Stream 2: [TPI Stream] (407284 bytes)
  Stream 3: [DBI Stream] (210538 bytes)
  Stream 4: [IPI Stream] (496844 bytes)
  Stream 5: [Named Stream "/LinkInfo"] (0 bytes)
...
PDB Stream {
  Version: 20000404
  Signature: 0x59303684
  Age: 2
  Guid: {6EE2E173-08F2-BF4D-AA11-4734E99F4974}
  Features: 0x1
  Named Streams {
    /names: 6
    /LinkInfo: 5
    /src/headerblock: 233
  }
}
Type Info Stream (TPI) {
  TPI Version: 20040203
  Record count: 8478
  Records [
    {
      ArgList (0x1000) {
        TypeLeafKind: LF_ARGLIST (0x1201)
        NumArgs: 0
        Arguments [
        ]
      }
    }
    {
      Procedure (0x1001) {
        TypeLeafKind: LF_PROCEDURE (0x1008)
        ReturnType: void (0x3)
        CallingConvention: NearC (0x0)
        FunctionOptions [ (0x0)
        ]
        NumParameters: 0
        ArgListType: () (0x1000)
      }
    }

We print this:

D:\foo> llvm-pdbdump raw -all foo.pdb
MSF [BlockSize=4096, FreeBlockMap=2, NumBlocks=1145, DirSize=5484, BlockMapAddr=1143, NumDirBlocks=2, Streams=236]

Streams [
  Stream 0: [Old MSF Directory] (40 bytes)
  Stream 1: [PDB Stream] (118 bytes)
  Stream 2: [TPI Stream] (407284 bytes)
  Stream 3: [DBI Stream] (210538 bytes)
  Stream 4: [IPI Stream] (496844 bytes)
  Stream 5: [Named Stream "/LinkInfo"] (0 bytes)
  ...
]

PDB Stream [Ver: 20000404, Sig: 0x59303684, Age: 2, Guid: {6EE2E173-08F2-BF4D-AA11-4734E99F4974}, Features: 0x1]
  Named Streams {
    /names: 6
    /LinkInfo: 5
    /src/headerblock: 233
  }
  
TPI Stream [Ver: 20040203]
  Records [
    TI: 0x1000 - LF_ARGLIST (0x1201) {}
    TI: 0x1001 - LF_PROCEDURE (0x1001) { Ret: void (0x03), Conv: NearC (0x0), Options: 0x0, Args: () (0x1000)}
    ...
  ]

or some such.

Hmm, good point. I hadn't considered that changing the name in the same change would mix a lot of mechanical changes with actual interesting to review changes. Let's keep them in two separate changes, then. Can you land them close together in time, though?

I can try. I have about 5 changes in the pipeline, and I need to get those in first or I will have to do a ton of work to rewrite them after changing the tool name. But I can do it after I land those changes.

zturner added inline comments.Jun 8 2017, 11:29 AM
llvm/test/DebugInfo/PDB/pdbdump-readwrite.test
1–2

If you specify -dbi-stream with nothing else it just dumps the header fields. Perhaps a better way to organize all these options is to make them all by enum lists like I did with -subsections, then you could write: -dbi=headers,modules,files,etc. Another option would be to make it so we don't do the hierarchical display at all, but instead make each option independent. Then if you specified -dbi-headers -modules -files you'd get something like:

DBI Headers
--------------
...

Modules
--------------
...

Files
--------------

It's a little bit easier to do things from this method, but it's a little more separated from the underlying file structure which is kind of disadvantageous.

zturner updated this revision to Diff 101950.Jun 8 2017, 11:32 AM

Updated with suggestions.

inglorion added inline comments.Jun 8 2017, 2:44 PM
llvm/tools/llvm-pdbdump/YAMLOutputStyle.cpp
40–41

Looks like there's still a line of commented-out code.

llvm/tools/llvm-pdbdump/llvm-pdbdump.cpp
901

That takes care of some of the scattering of the code, but most of the code for pdb2yaml::All is still in YAMLOutput.cpp. Did you intend to move that here, too?

zturner marked an inline comment as done.Jun 8 2017, 2:47 PM
zturner added inline comments.
llvm/tools/llvm-pdbdump/YAMLOutputStyle.cpp
40–41

Removed offline.

llvm/tools/llvm-pdbdump/llvm-pdbdump.cpp
901

No because none of that is related to the shared options. You only mentioned handling the shared options in a common place.

inglorion added inline comments.Jun 8 2017, 3:09 PM
llvm/tools/llvm-pdbdump/llvm-pdbdump.cpp
901

Apologies if I did not express myself clearly. What I meant is that the implementation of -all is scattered. For raw, we have this block here, and the block a few lines down under if (opts::RawSubcommand). For pdb2yaml, we have this block here, and the code in YAMLOutputStyle::dump in YAMLOutputStyle.cpp. So there are two places where we check opts::raw::RawAll and do something, and two places where we check opts::pdb2yaml::All and do something. My proposal was intended to bring that down to one place for opts::raw::RawAll and one for opts::pdb2yaml::All.

Alternatively, I would also be ok if you moved all the code for handing both opts::raw::RawAll and opts::pdb2yaml::All here. That's what I thought you were doing, but I see now that it wasn't.

I have a slight preference for the first way, but the second way is fine with me, too. The main thing I want is to have neither pdb2yaml::All or raw::RawAll broken up into pieces that are far apart.

zturner marked an inline comment as done.Jun 8 2017, 3:12 PM
zturner added inline comments.
llvm/tools/llvm-pdbdump/llvm-pdbdump.cpp
901

Ok, I can do that. I think the second way might be easier, just because pdb2yaml All and raw All have some subtle differences.

In the future, if I go forwad with the idea to change raw to be more compressed for human readability, then we can abandon the requirement for the output to map closely to the file format (since we have pdb2yaml for that), and then a lot of the option decisions can be simplified I think.

zturner updated this revision to Diff 101970.Jun 8 2017, 3:41 PM
zturner updated this revision to Diff 101971.

So the lines being deleted thing is happening when one patch uses full context and the next doesn't. Re-upped with full context

inglorion accepted this revision.Jun 8 2017, 4:03 PM

Thanks! lgtm

This revision is now accepted and ready to land.Jun 8 2017, 4:03 PM
This revision was automatically updated to reflect the committed changes.
llvm/test/DebugInfo/PDB/pdbdump-write.test