This is an archive of the discontinued LLVM Phabricator instance.

[pdb] Add -type-stats and sort stats by descending size
Closed, Public

Authored by rnk on Mar 21 2019, 3:14 PM.

Details

Summary

It prints this on chromium browser_tests.exe.pdb:

Types
         Total: 5647475 entries ( 371,897,512 bytes,   65.85 avg)
--------------------------------------------------------------------------
      LF_CLASS:  397894 entries ( 119,537,780 bytes,  300.43 avg)
  LF_STRUCTURE:  236351 entries (  83,208,084 bytes,  352.05 avg)
  LF_FIELDLIST:  291003 entries (  66,087,920 bytes,  227.10 avg)
  LF_MFUNCTION: 1884176 entries (  52,756,928 bytes,   28.00 avg)
    LF_POINTER: 1149030 entries (  13,877,344 bytes,   12.08 avg)
    LF_ARGLIST:  789980 entries (  12,436,752 bytes,   15.74 avg)
 LF_METHODLIST:  361498 entries (   8,351,008 bytes,   23.10 avg)
       LF_ENUM:   16069 entries (   6,108,340 bytes,  380.13 avg)
  LF_PROCEDURE:  269374 entries (   4,309,984 bytes,   16.00 avg)
   LF_MODIFIER:  235602 entries (   2,827,224 bytes,   12.00 avg)
      LF_UNION:    9131 entries (   2,072,168 bytes,  226.94 avg)
    LF_VFTABLE:     323 entries (     207,784 bytes,  643.29 avg)
      LF_ARRAY:    6639 entries (     106,380 bytes,   16.02 avg)
    LF_VTSHAPE:     126 entries (       6,472 bytes,   51.37 avg)
   LF_BITFIELD:     278 entries (       3,336 bytes,   12.00 avg)
      LF_LABEL:       1 entries (           8 bytes,    8.00 avg)

The PDB is 1.9 GB overall, so the LF_CLASS and LF_STRUCTURE records
(about 203 MB combined) account for roughly 10% of the file size. I was
surprised to find that LF_FIELDLIST records are short on average. Maybe
this is because there are many more types with short member lists than
there are instantiations with lots of members, like std::vector.
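
The table above is a per-kind tally ordered by total bytes. As a rough illustration only (this is not the code from this patch; `KindStat` and `summarize` are hypothetical names, and records are modeled as plain name/length pairs rather than real CodeView records), the bookkeeping could be sketched as:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-kind accumulator; not a type from llvm-pdbutil.
struct KindStat {
  std::string Kind;
  std::uint64_t Count = 0;
  std::uint64_t Bytes = 0;
  // Average record size in bytes, or 0 when nothing was counted.
  double average() const {
    return Count ? static_cast<double>(Bytes) / static_cast<double>(Count) : 0.0;
  }
};

using Record = std::pair<std::string, std::uint32_t>; // (kind name, length)

// Tally records by kind, then order the kinds by descending total size.
std::vector<KindStat> summarize(const std::vector<Record> &Records) {
  std::map<std::string, KindStat> ByKind;
  for (const Record &R : Records) {
    assert(!R.first.empty() && "record kind name must not be empty");
    KindStat &S = ByKind[R.first];
    S.Kind = R.first;
    ++S.Count;
    S.Bytes += R.second;
  }
  std::vector<KindStat> Sorted;
  for (const std::pair<const std::string, KindStat> &Entry : ByKind)
    Sorted.push_back(Entry.second);
  // stable_sort keeps kinds with equal totals in name order.
  std::stable_sort(Sorted.begin(), Sorted.end(),
                   [](const KindStat &A, const KindStat &B) {
                     return A.Bytes > B.Bytes;
                   });
  return Sorted;
}
```

Sorting by bytes rather than by entry count is what puts LF_CLASS ahead of LF_MFUNCTION in the output above, even though LF_MFUNCTION has far more entries.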

Diff Detail

Repository
rL LLVM

Event Timeline

rnk created this revision. Mar 21 2019, 3:14 PM

This is what I'm seeing for a large PDB (2 GB). Things are a bit different from your use-case:

                     Type Record Stats
============================================================

  Types
           Total: 7050382 entries ( 536,798,320 bytes,   76.14 avg)
  --------------------------------------------------------------------------
    LF_FIELDLIST:  369709 entries ( 116,277,336 bytes,  314.51 avg)
      LF_VFTABLE:   48428 entries ( 106,221,244 bytes, 2193.38 avg)
    LF_STRUCTURE:  502958 entries (  92,283,348 bytes,  183.48 avg)
    LF_MFUNCTION: 2699622 entries (  75,589,416 bytes,   28.00 avg)
        LF_CLASS:  323609 entries (  72,441,564 bytes,  223.86 avg)
         LF_ENUM:  122798 entries (  24,322,364 bytes,  198.07 avg)
      LF_POINTER: 1262870 entries (  15,468,832 bytes,   12.25 avg)
      LF_ARGLIST:  794482 entries (  12,368,496 bytes,   15.57 avg)
   LF_METHODLIST:  462337 entries (  11,367,124 bytes,   24.59 avg)
        LF_UNION:   29152 entries (   4,260,840 bytes,  146.16 avg)
    LF_PROCEDURE:  236117 entries (   3,777,872 bytes,   16.00 avg)
     LF_MODIFIER:  189660 entries (   2,275,920 bytes,   12.00 avg)
        LF_ARRAY:    7766 entries (     124,912 bytes,   16.08 avg)
      LF_VTSHAPE:     175 entries (      10,668 bytes,   60.96 avg)
     LF_BITFIELD:     698 entries (       8,376 bytes,   12.00 avg)
        LF_LABEL:       1 entries (           8 bytes,    8.00 avg)

llvm/test/DebugInfo/PDB/udt-stats.test
12 ↗(On Diff #191789)

Why is <simple type> zero size?

llvm/tools/llvm-pdbutil/DumpOutputStyle.cpp
339 ↗(On Diff #191789)

Would you mind increasing {1,7} to {1,8} and {2,8:N} to {2,10:N} please? The output is misaligned on my end:

S_UNAMESPACE:   45458 entries (  904548 bytes)
  S_REGREL32: 4138872 entries (115147908 bytes)
     S_LOCAL: 2487396 entries (57712788 bytes)
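
The number after the comma in an `llvm::formatv` replacement field is a minimum field width, so a value with more digits than the width pushes the rest of the row to the right. A minimal sketch of the effect, using `std::snprintf` instead of `formatv` so that it stays self-contained (`padLeft` is a hypothetical helper, not part of the patch):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: right-align Value in a column of at least Width
// characters. Wider values are printed in full and overflow the column.
std::string padLeft(std::uint64_t Value, int Width) {
  assert(Width >= 0 && Width < 32 && "width must fit the local buffer");
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%*llu", Width,
                static_cast<unsigned long long>(Value));
  return std::string(Buf);
}
```

With a width of 8, `padLeft(904548, 8)` yields 8 characters but `padLeft(115147908, 8)` yields 9, which is the one-column drift visible in the rows above. A width of 10 holds every value shown.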

702 ↗(On Diff #191789)

Rui says no auto in LLD (when the type isn't obvious). Is that a policy that should apply everywhere in LLVM?

rnk marked 3 inline comments as done. Mar 22 2019, 1:23 PM

I happen to know that @zturner is busy from today until next Thursday, so I'm going to go ahead and land this with some tweaks. It's the dumper, so I think post-commit review is fine if he has any suggestions for improving this or simplifying the code.

This is what I'm seeing for a large PDB (2 GB). Things are a bit different from your use-case:

                     Type Record Stats
============================================================

  Types
           Total: 7050382 entries ( 536,798,320 bytes,   76.14 avg)
  --------------------------------------------------------------------------
    LF_FIELDLIST:  369709 entries ( 116,277,336 bytes,  314.51 avg)

That's interesting; it's more consistent with what I expected to find. I think this indicates that your codebase has templates with more members, and therefore longer field (and method) lists. I was surprised that the LF_CLASS and LF_STRUCTURE records dominated browser_tests.exe.pdb.

LF_VFTABLE:   48428 entries ( 106,221,244 bytes, 2193.38 avg)

This is an interesting result. LLVM can't produce these records, only MSVC can. I asked Dave Bartolomeo and YongKang about their purpose, and they told me that they are used for devirtualization at LTCG time. Given that we don't use them, I bet we can just discard them from the PDB by default, with a flag to explicitly request them if desired. These records include the long mangled names of all virtual methods, so they add up.

LF_STRUCTURE:  502958 entries (  92,283,348 bytes,  183.48 avg)
LF_MFUNCTION: 2699622 entries (  75,589,416 bytes,   28.00 avg)

I guess this LF_MFUNCTION result is consistent with long LF_FIELDLIST records: it probably indicates that classes with many methods are instantiated repeatedly with varying parameters across the codebase.

     LF_CLASS:  323609 entries (  72,441,564 bytes,  223.86 avg)
      LF_ENUM:  122798 entries (  24,322,364 bytes,  198.07 avg)
   LF_POINTER: 1262870 entries (  15,468,832 bytes,   12.25 avg)
   LF_ARGLIST:  794482 entries (  12,368,496 bytes,   15.57 avg)
LF_METHODLIST:  462337 entries (  11,367,124 bytes,   24.59 avg)
     LF_UNION:   29152 entries (   4,260,840 bytes,  146.16 avg)
 LF_PROCEDURE:  236117 entries (   3,777,872 bytes,   16.00 avg)
  LF_MODIFIER:  189660 entries (   2,275,920 bytes,   12.00 avg)
     LF_ARRAY:    7766 entries (     124,912 bytes,   16.08 avg)
   LF_VTSHAPE:     175 entries (      10,668 bytes,   60.96 avg)
  LF_BITFIELD:     698 entries (       8,376 bytes,   12.00 avg)
     LF_LABEL:       1 entries (           8 bytes,    8.00 avg)

llvm/test/DebugInfo/PDB/udt-stats.test
12 ↗(On Diff #191789)

I think this is the size of the type record being referenced, not the S_UDT record. In the case of simple types, there are no type records. This would correspond to something like typedef void *voidptr_t;.
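
To illustrate why a simple type would report zero bytes: CodeView reserves type indices below 0x1000 for built-in (simple) types, which are encoded in the index itself and have no record in the type stream. The sketch below is only a model of that idea, not the llvm-pdbutil implementation; `referencedRecordSize` and the `RecordSizes` table are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CodeView reserves type indices below 0x1000 for simple (built-in) types.
constexpr std::uint32_t FirstNonSimpleIndex = 0x1000;

constexpr bool isSimpleTypeIndex(std::uint32_t TI) {
  return TI < FirstNonSimpleIndex;
}

// Hypothetical lookup: RecordSizes[I] holds the byte size of the type record
// whose type index is FirstNonSimpleIndex + I.
std::uint32_t referencedRecordSize(std::uint32_t TI,
                                   const std::vector<std::uint32_t> &RecordSizes) {
  if (isSimpleTypeIndex(TI))
    return 0; // no type record exists, so there is nothing to measure
  std::size_t Slot = TI - FirstNonSimpleIndex;
  assert(Slot < RecordSizes.size() && "type index out of range");
  return RecordSizes[Slot];
}
```

For example, 0x0603 is the simple type index for a 64-bit pointer to void, so a typedef such as `voidptr_t` on a 64-bit target would refer to it directly and contribute a size of 0 under this model.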

llvm/tools/llvm-pdbutil/DumpOutputStyle.cpp
339 ↗(On Diff #191789)

Definitely, I actually went up to {2,12:N} for types, which is shown down below.

702 ↗(On Diff #191789)

Yeah, I'll remove this. @zturner wrote a lot of this code, and I think he has a different attitude towards auto usage.
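
For context, the LLVM Coding Standards say to use auto if and only if it makes the code more readable or easier to maintain. The snippet below is a generic illustration of that distinction, not the code at line 702; `SizeTable`, `countValues`, and `firstEntrySize` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using SizeTable = std::map<std::string, std::vector<int>>; // hypothetical

std::size_t countValues(const SizeTable &Table) {
  std::size_t Total = 0;
  // The iterator type is long and evident from begin()/end(), so auto
  // arguably helps readability here.
  for (auto It = Table.begin(), End = Table.end(); It != End; ++It)
    Total += It->second.size();
  return Total;
}

std::size_t firstEntrySize(const SizeTable &Table) {
  assert(!Table.empty() && "table must have at least one entry");
  // Spelled out: the reader sees the element type and that this is a
  // reference, not a copy, without looking up SizeTable.
  const std::vector<int> &Values = Table.begin()->second;
  return Values.size();
}
```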

aganea accepted this revision. Edited Mar 22 2019, 1:37 PM

LGTM other than what I've already commented.

This revision is now accepted and ready to land. Mar 22 2019, 1:37 PM
This revision was automatically updated to reflect the committed changes.