[CodeView] Don't output nested typedefs that are scoped to classes.
AcceptedPublic

Authored by zturner on Sep 1 2017, 5:56 PM.

Details

Reviewers
rnk
Summary

S_UDT records are basically the "bridge" between the debugger's expression evaluator and the type information. If you type (Foo *)nullptr into the watch window, the debugger looks for an S_UDT record named Foo. If it can find one, it displays your type. Otherwise you get an error.

We have always understood this to mean that if you have code like this:

struct A {
  int X;
};

struct B {
  typedef A AT;
  AT Member;
};

that you will get 3 S_UDT records. "A", "B", and "B::AT". Because if you were to type (B::AT*)nullptr into the debugger, it would need to find an S_UDT record named "B::AT".

But "B::AT" is actually the S_UDT record that would be generated if B were a namespace, not a struct. So the debugger needs to be able to distinguish this case. So what it does is:

  1. Look for an S_UDT named "B::AT". If it finds one, it knows that AT is in a namespace.
  2. If it doesn't find one, split at the scope resolution operator, and look for an S_UDT named B. If it finds one, look up the type for B, and then look for AT as one of its members.

With this algorithm, S_UDT records for nested typedefs are not just unnecessary, but actually wrong!

The results of implementing this in clang are dramatic. It cuts our /DEBUG:FASTLINK PDB sizes by more than 50%, and we go from being ~20% larger than MSVC PDBs on average, to ~40% smaller.

It also slightly speeds up link time. We get about 10% faster links than without this patch.

Diff Detail

zturner created this revision.Sep 1 2017, 5:56 PM
rnk accepted this revision.Sep 3 2017, 8:17 AM

Lgtm!

This revision is now accepted and ready to land.Sep 3 2017, 8:17 AM
rnk added a comment.Sep 3 2017, 8:17 AM

Does this really not affect existing tests? This should have at least one test.

Yea I actually have a test for this already. I guess in my haste to get the CL uploaded before I left for the day, it somehow didn't end up making it. I'll re-upload on Tuesday.

Nice description of the problem.

I'm surprised the size difference is so dramatic. It's hard to imagine that many classes/structs having that many typedefs, except for the standard containers.

What about:

struct X {
  typedef struct {} Y;
};

?

rnk added a comment.Sep 5 2017, 9:33 AM

Thanks for fixing this, I think this was a regression from my change rL310410. The intention of that change was just to get LF_NESTED records inside the LF_FIELDLISTs, but I didn't notice the extra S_UDTs that were created, which lead to this explosion.

zturner added a comment.EditedSep 5 2017, 10:08 AM

Nice description of the problem.

I'm surprised the size difference is so dramatic. It's hard to imagine that many classes/structs having that many typedefs, except for the standard containers.

@amccarth: The problem is two-fold. First, there is the issue of just the S_UDT records themselves. You're right that most of them come from standard containers, but there's so many. Every instantiation, gets one, and since some classes are instantiated with other template classes, you get a large explosion of template instantiations, and thusly a large number of these S_UDT records. I ran llvm-pdbutil dump -udt-stats before and after, and came up with this:

             Patch    No Patch  % Decrease
S_UDT Count   6,769    89,421     92%
S_UDT Size   81,936   835,664     90%

So as you can see, the records themselves are only responsible for a size reduction of ~753KB.

But this isn't the real source of the problem. /DEBUG:FASTLINK PDBs don't work the same way as regular PDBs. Normally the linker would merge all the S_UDT records into a single global symbol stream, eliminating duplicates, and take them out of the module stream, but with a /DEBUG:FASTLINK PDB, all of the symbols stay in the original module stream. This means you end up with 10s of thousands of duplicate S_UDT records in a /DEBUG:FASTLINK PDB. So that 835KB size increase, multiply it by a couple of orders of magnitude and there's your size increase.

To prove this, I ran llvm-pdbutil dump -sym-stats on the fastlink versions of the before & after PDBs.

With Patch

D:\analyze>llvm-pdbutil dump -sym-stats libclang-fastlink-patch.pdb

  Summary |

      Symbols
                                         Total: 2853675 entries (341483776 bytes)
      --------------------------------------------------------------------------
                                     S_THUNK32:     681 entries (   31356 bytes)
                                     S_SECTION:       8 entries (     224 bytes)
                                    S_ENVBLOCK:       2 entries (     556 bytes)
                                      S_EXPORT:     353 entries (   13872 bytes)
                                  S_TRAMPOLINE:  297459 entries ( 5949180 bytes)
                                         S_END:     681 entries (    2724 bytes)
                                    S_FASTLINK: 2553073 entries (335435108 bytes)
                                     S_OBJNAME:     686 entries (   16552 bytes)
                                    S_COMPILE3:     686 entries (   32928 bytes)
                                   S_COFFGROUP:      46 entries (    1276 bytes)

      Chunks
                                         Total:  500501 entries (96025508 bytes)
      --------------------------------------------------------------------------
                            DEBUG_S_FILECHKSMS:    1300 entries (  447104 bytes)
                       DEBUG_S_COFF_SYMBOL_RVA:    1990 entries (51430364 bytes)
                                 DEBUG_S_LINES:  497211 entries (44148040 bytes)

There are 2,553,073 S_FASTLINK records here. If we do the same on the fastlink PDB that doesn't have the patch, we get:

Without Patch:

D:\analyze>llvm-pdbutil dump -sym-stats libclang-fastlink-nopatch.pdb

      Symbols
                                         Total: 6521428 entries (909728300 bytes)
      --------------------------------------------------------------------------
                                     S_THUNK32:     681 entries (   31356 bytes)
                                     S_SECTION:       8 entries (     224 bytes)
                                    S_ENVBLOCK:       2 entries (     556 bytes)
                                      S_EXPORT:     353 entries (   13872 bytes)
                                  S_TRAMPOLINE:  297459 entries ( 5949180 bytes)
                                         S_END:     681 entries (    2724 bytes)
                                    S_FASTLINK: 6220826 entries (903679632 bytes)
                                     S_OBJNAME:     686 entries (   16552 bytes)
                                    S_COMPILE3:     686 entries (   32928 bytes)
                                   S_COFFGROUP:      46 entries (    1276 bytes)

      Chunks
                                         Total:  500501 entries (96025412 bytes)
      --------------------------------------------------------------------------
                            DEBUG_S_FILECHKSMS:    1300 entries (  447104 bytes)
                       DEBUG_S_COFF_SYMBOL_RVA:    1990 entries (51430364 bytes)
                                 DEBUG_S_LINES:  497211 entries (44147944 bytes)

And here there are 6,220,826. So even though we only eliminated 82,652 merged S_UDT records totalling a miniscule 753KB , when they're not merged, it accounts for a reduction of 3.6 million S_UDT records totalling 568MB.

@majnemer: It does output an S_UDT in that case, but so do we, so our behavior matches.

zturner updated this revision to Diff 113889.Sep 5 2017, 11:52 AM

Updated test.

rnk accepted this revision.Sep 5 2017, 12:51 PM

Test case lgtm