Page MenuHomePhabricator

[modules] [pch] Do not deserialize all lazy template specializations when looking for one.
Needs ReviewPublic

Authored by v.g.vassilev on Dec 19 2017, 2:55 PM.

Details

Summary

Currently, we load all lazy template specializations when we search whether there is a suitable template specialization for a template. This is especially suboptimal with modules. If module B specializes a template from module A the ASTReader would only read the specialization DeclID. This is observed to be especially pathological when module B is stl. However, the template instantiator (almost immediately after) will call findSpecialization. In turn, findSpecialization will load all lazy specializations to give an answer.

This patch teaches findSpecialization to work with lazy specializations without having to deserialize their full content. It provides along with the DeclID an cross-TU stable ODRHash of the template arguments which is enough to decide if we have already specialization and which are the exact ones (we load all decls with the same hash to avoid potential collisions) to deserialize.

While we make finding a template specialization more memory-efficient we are far from being done. There are still a few places which trigger eager deserialization of template specializations: the places where we require completion of the redeclaration chain.

Diff Detail

Event Timeline

v.g.vassilev created this revision.Dec 19 2017, 2:55 PM
v.g.vassilev added a reviewer: rtrieu.

Fixed a comment typo.

I think you should track lazy partial specializations separately from lazy full specializations -- we need to load all the partial specializations when doing partial specialization selection, and don't want to load all full specializations at the same time.

lib/Serialization/ASTReaderDecl.cpp
99

Move these subexpressions to separate statements; the two "read" calls are not sequenced here.

3999

Likewise here.

v.g.vassilev marked 2 inline comments as done.

Address inline comments: order the read operations.

Teach ASTReader::CompleteRedeclChain to load only the template specializations with the same template argument list.

lib/Serialization/ASTReaderDecl.cpp
99

Good catch!

rsmith added a comment.Jan 2 2018, 2:08 PM

This is looking like a really good start; we'll still want to do something about partial specializations, but this should already help a lot.

Can you produce some sample performance numbers for this change? This is replacing a linear-time algorithm (with a high constant) with a quadratic-time algorithm (with a lower constant), and it'd be useful to confirm that's not going to be a pressing problem in the short term. (In the medium term, we should move to using an on-disk hash table for template specialization lookup, at least for templates with large numbers of specializations.)

include/clang/AST/DeclTemplate.h
762–765

No space between operator and (, please.

lib/AST/DeclTemplate.cpp
216

Can we remove the loaded specializations from the list / zero them out, so know we've already asked the ExternalASTSource for them?

260–268

I'm a little concerned about this: a debug build could perform more deserialization than an optimized build here. It's probably not a big deal, but it could lead to some surprising debugging experiences down the line (where a debug build works but an optimized build misbehaves).

Perhaps we could assert that there are no lazy specializations with the relevant hash in our list before performing this lookup? (This'd require that we remove them as we load them.)

lib/Serialization/ASTReader.cpp
7004–7021

} and else on same line, please.

v.g.vassilev marked 4 inline comments as done.

Address comments:

  • Fix style;
  • Do not potentially deserialize a specialization in debug mode.

Loading of lazy partial template specializations should not trigger loading of full template specializations.

Often during selection process we need to load all partial template specializations. It is very rare to do so for full template specialization.

I have created a simple (-ish) benchmark targeted to stress test modules with a good number of template specializations. I use the mp11 library from boost. I am building two modules which contain around 1K specializations (mostly full specializations). The example can be found here. I use a small fraction of them. This patch deserializes (only) 1117 specializations for around 0.2 seconds whereas without it clang deserializes 1905 specializations for 0.4 s.

This I believe can be further improved and I am investigating but that should not be a blocker for this patch.

Zero IsPartial and improve comments in assert and free text.

Reverse template specialization order.

Reduce hash collisions for template specializations.

Currently when we hash a tag type the visitor calls ODRHash::AddDecl which mostly relies on the decl name give distinct hash value. The types coming from template specializations have very similar properties (including decl names). For those we need to provide more information in order to disambiguate them.

This patch adds the template arguments for the template specialization decl corresponding to its type. We manage to reduce further the amount of deserializations from 1117 down to 451.

Reduce further the deserializations from 451 to 449 by providing a more complete implementation of ODRHash::AddTemplateArgument.

Kudos @rtrieu!

Finer grained clang stats can be seen here.

rsmith added a comment.Jan 5 2018, 3:55 PM

I just tried this on one of our larger builds. The good news is that it appears to work: the compilation still succeeds, and significantly fewer nodes are deserialized. The bad news is that the final .cc file compilation got slower, from 42s to 86s. (There's a lot of noise here, and the builds might have been on vastly different machines, so take this with a pinch of salt for now.) Sample stats:

  • "types read" is down from 30% to 17%
  • "declarations read" is down from 34% to 23%
  • number of ClassTemplateSpecializations read has decreased by 30%, number of CXXRecordDecls read is down 25%
  • total ASTContext memory usage is down by 12%

Add sanity check.

We assert that the ODR hash of the template arguments should be the same as the one for from the actually found template specialization.

This way we managed to catch a few collisions in the ODRHash logic.