This is an archive of the discontinued LLVM Phabricator instance.

Avoid Decl::getASTContext in hot paths where possible, Part 1
Changes PlannedPublic

Authored by riccibruno on Dec 13 2018, 9:27 AM.

Details

Summary

Decl::getASTContext and DeclContext::getParentASTContext are not that cheap
since they must walk back to the TUDecl, potentially after many cache misses along the way.

Instrumentation shows that most of the iterations in getTranslationUnitDecl could be
eliminated by passing a ref to the ASTContext as a function parameter.
The goal here is not to remove all the calls to getASTContext, but instead
eliminate a good fraction of the iterations in getTranslationUnitDecl.

This patch deals with:

VarDecl::isThisDeclarationADefinition
VarTemplateDecl::isThisDeclarationADefinition
Decl::canBeWeakImported
Decl::isWeakImported
VarDecl::getActingDefinition
ValueDecl::isWeak
VarDecl::checkInitIsICE
ComparisonCategoryInfo::ValueInfo::getIntValue
ComparisonCategoryInfo::ValueInfo::hasValidIntValue
VarDecl::isKnownToBeDefined
VarDecl::getDefinition
VarDecl::hasDefinition
VarTemplateDecl::getDefinition
VarDecl::getTemplateInstantiationPattern

Diff Detail

Repository
rC Clang

Event Timeline

riccibruno created this revision.Dec 13 2018, 9:27 AM
aaron.ballman added a subscriber: rsmith.

I'd like @rsmith's opinion on whether this is a good churn or not. I think it's mostly reasonable, but it's also a lot of changes for identical behavior and in some cases the changes are a bit unfortunate.

include/clang/ASTMatchers/ASTMatchers.h
4504

I don't care for this refactoring -- the new code repeats the types from the function definition and is considerably harder to read.

lib/AST/Decl.cpp
2378

Bad use of auto, though it is used above so maybe it should stay for local consistency.

lib/Sema/SemaDecl.cpp
2849

I'm not keen on the code duplication. :-(

Just to expend on the instrumentation results: The measurement was done with all of Boost. I would take the estimation of the time wasted with
a grain of salt since this is just num_iterations * 10ns which is obviously a very rough estimation.
However on my machine I get that removing half of the iterations in getTranslationUnitDecl reduce the run-time of an fsyntax-only by a little more than 1%.

Extrapolating from this, this could be worth about 2% if we remove, say 80%, of the iterations in getTranslationUnitDecl.

I guess that the question is: Is this churn worth about 2% ? I would like to argue that yes since:

  1. 2% is a lot for something that is morally a simple getter.
  2. The methods of the expression classes must already take a ref to the ASTContext if they need it.
  3. In the vast majority of cases the context is already easily available. It is true though that this requires propagating it in some functions.
  4. This figure do not include some potential gains in the time needed to do the AST -> LLVM IR code generation.
  5. I suspect (but have not done the measurement) that the gain is even greater during a typical compilation where more than one thread is used since possibly more iterations are going to cause an expensive LLC miss.
riccibruno marked 3 inline comments as done.

Addressed inline comments

vsk added a subscriber: vsk.Dec 15 2018, 10:32 AM

Thanks for working on this :). It’d be interesting to see some pre/post patch compile time numbers on CTMark.

Naive/strawman alternative here, but: what’s the impact on peak RSS and compile time of just storing an ASTContext pointer in Decl/DeclContext? If it’s not out of the question to grow Decl etc. and it gives a nice speed up, that might be a simpler approach.

riccibruno added a comment.EditedDec 15 2018, 10:44 AM
In D55658#1332240, @vsk wrote:

Thanks for working on this :). It’d be interesting to see some pre/post patch compile time numbers on CTMark.

Naive/strawman alternative here, but: what’s the impact on peak RSS and compile time of just storing an ASTContext pointer in Decl/DeclContext? If it’s not out of the question to grow Decl etc. and it gives a nice speed up, that might be a simpler approach.

I will try to get some numbers from CTMark.

I don't think that storing a ref/pointer to the ASTContext in each DeclContext (no point in storing it in Decl) is a great idea.
I experimented with this and the speedup from avoiding the lookup of the ASTContext is almost completely offset by
the slowdown from the increased memory usage.

vsk added a comment.Dec 15 2018, 11:02 AM
In D55658#1332240, @vsk wrote:

Thanks for working on this :). It’d be interesting to see some pre/post patch compile time numbers on CTMark.

Naive/strawman alternative here, but: what’s the impact on peak RSS and compile time of just storing an ASTContext pointer in Decl/DeclContext? If it’s not out of the question to grow Decl etc. and it gives a nice speed up, that might be a simpler approach.

I will try to get some numbers from CTMark.

I don't think that storing a ref/pointer to the ASTContext in each DeclContext (no point in storing it in Decl) is a great idea.
I experimented with this and the speedup from avoiding the lookup of the ASTContext is almost completely offset by
the slowdown from the increased memory usage.

Good to know, thanks for trying it out!

riccibruno planned changes to this revision.Jan 18 2019, 8:06 AM