This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/lib/ASTMatchers/
-
lib/
-
ASTMatchers/
3
ASTMatchFinder.cpp

Differential D80202

[ASTMatchers] Performance optimization for memoization
Needs ReviewPublic

Authored by njames93 on May 19 2020, 4:03 AM.

Download Raw Diff

Details

Reviewers

klimek
sammccall

Summary

A simple optimization to avoid unnecessary copies of BoundNodesTreeBuilders when querying the cache for memoized matchers.
This hasn't been benchmarked as I'm unsure how the memoized matchers were first benchmarked so it shouldn't be merged unless there is a performance improvement.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

njames93 created this revision.May 19 2020, 4:03 AM

Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2020, 4:03 AM

Herald added a subscriber: cfe-commits. · View Herald Transcript

grandinj added a subscriber: grandinj.May 19 2020, 5:28 AM

grandinj added inline comments.

clang/lib/ASTMatchers/ASTMatchFinder.cpp
84	note that making fields const tends to disable various move optimisations, so better to mark the whole struct const in code that uses it

Harbormaster completed remote builds in B57187: Diff 264846.May 19 2020, 5:55 AM

Given the extra complexity I'd like to see that it matters - bound nodes tend to be small.

In D80202#2046266, @klimek wrote:

Given the extra complexity I'd like to see that it matters - bound nodes tend to be small.

I put that in the description, but this is where i need help. whats the best way to benchmark the matchers?
Also, do you know how it was benchmarked when MaxMemoizationEntries was decided upon?
There was also comments about some making some micro benchmarks but I don't think that was acted upon.

In D80202#2046321, @njames93 wrote:

In D80202#2046266, @klimek wrote:

Given the extra complexity I'd like to see that it matters - bound nodes tend to be small.

I put that in the description, but this is where i need help. whats the best way to benchmark the matchers?
Also, do you know how it was benchmarked when MaxMemoizationEntries was decided upon?
There was also comments about some making some micro benchmarks but I don't think that was acted upon.

I do remember benchmarking it when I wrote it, but that was 10 years ago :)
I mainly did benchmarks on large ASTs, trying various matchers.
Today we can benchmark for example whether running clang-tidy on all of LLVM makes a difference.
Micro-BMs would be nice to add, but are somewhat hard to get right.

Running most of the clang tidy checks on the clang-tidy folder yields these results

=================================BeforePatch===================================

RUN1:
4045.17user 83.93system 11:28.80elapsed 599%CPU (0avgtext+0avgdata 534024maxresident)k
0inputs+0outputs (0major+27584683minor)pagefaults 0swaps

RUN2:
4078.06user 84.99system 11:35.99elapsed 598%CPU (0avgtext+0avgdata 506912maxresident)k
55312inputs+0outputs (663major+27661947minor)pagefaults 0swaps

RUN3:
4040.77user 86.02system 11:28.85elapsed 599%CPU (0avgtext+0avgdata 547096maxresident)k
0inputs+0outputs (0major+27698937minor)pagefaults 0swaps

==================================AfterPatch===================================

RUN1: 
4025.33user 83.32system 11:27.00elapsed 598%CPU (0avgtext+0avgdata 530568maxresident)k
0inputs+0outputs (0major+27689512minor)pagefaults 0swaps

RUN2:
4056.93user 83.36system 11:32.31elapsed 598%CPU (0avgtext+0avgdata 529120maxresident)k
3752inputs+0outputs (19major+27794845minor)pagefaults 0swaps

RUN3:
4029.05user 85.45system 11:26.31elapsed 599%CPU (0avgtext+0avgdata 533508maxresident)k
0inputs+0outputs (0major+27730918minor)pagefaults 0swaps

Shows a consistent improvement but there tests are very noisy and dont focus on just the matching, they also include all the other boilderplate when running clang-tidy over a database. not to mention a small sample size

I decided to do some more stringent benchmarking, just focusing on the matching, not the boilerplate of running clang-tidy.

=================================BeforePatch===================================
Matcher Timings:   116.0756 user  29.1397 system  145.2154 process  145.2168 wall
Matcher Timings:   117.7205 user  29.2475 system  146.9681 process  146.9158 wall
Matcher Timings:   116.8313 user  29.5170 system  146.3483 process  146.2655 wall
Matcher Timings:   117.9491 user  29.0969 system  147.0459 process  146.9678 wall
Matcher Timings:   117.6309 user  29.1864 system  146.8173 process  146.7687 wall

user:    117.2+-0.753
system:  29.24+-0.166
process: 146.5+-0.760

==================================AfterPatch===================================
Matcher Timings:   110.5497 user  28.3316 system  138.8813 process  138.7960 wall
Matcher Timings:   112.5151 user  28.8616 system  141.3767 process  141.3003 wall
Matcher Timings:   116.1578 user  28.9472 system  145.1049 process  145.0785 wall
Matcher Timings:   107.1089 user  27.2752 system  134.3841 process  134.3459 wall
Matcher Timings:   105.9242 user  27.0338 system  132.9580 process  132.9010 wall

user:    110.4+-4.159
system:  28.09+-0.890
process: 138.6+-4.979

This is showing ~5% improvement when running every clang-tidy check on a translation unit (Specifically clang-tools-extra/clang-tidy/modernize/LoopConvertCheck.cpp)

rebase

+Sam for additional sanity checking.

clang/lib/ASTMatchers/ASTMatchFinder.cpp
904	Ok, if this actually matters, we should also not use a std::map here, but a llvm::DenseMap (or do we rely on iteration invalidation semantics here?).

Harbormaster failed remote builds in B61233: Diff 272433!Jun 22 2020, 8:35 AM

Fix checks failing

Harbormaster completed remote builds in B61256: Diff 272470.Jun 22 2020, 10:45 AM

sammccall added inline comments.Jul 7 2020, 2:30 AM

clang/lib/ASTMatchers/ASTMatchFinder.cpp
904	Belated +1. 5% seems like this performance does matter, so DenseMap::find_as should be yet faster. It looks like BoundNodesTreeBuilder would need a DenseMapInfo, but all the other members are hashable already.

Revision Contents

Path

Size

clang/

lib/

ASTMatchers/

ASTMatchFinder.cpp

73 lines

Diff 272470

clang/lib/ASTMatchers/ASTMatchFinder.cpp

Show First 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	struct MatchKey {

bool operator<(const MatchKey &Other) const {		bool operator<(const MatchKey &Other) const {
return std::tie(Traversal, Type, MatcherID, Node, BoundNodes) <		return std::tie(Traversal, Type, MatcherID, Node, BoundNodes) <
std::tie(Other.Traversal, Other.Type, Other.MatcherID, Other.Node,		std::tie(Other.Traversal, Other.Type, Other.MatcherID, Other.Node,
Other.BoundNodes);		Other.BoundNodes);
}		}
};		};

		/// Performance optimization for querying the memoized map without the expensive
		/// copy of the \c BoundNodesTreeBuilder and to a lesser extent the \c
		/// DynTypedNode.
		struct MatchKeyRef {
		const DynTypedMatcher::MatcherIDType MatcherID;
		const DynTypedNode &Node;
		const BoundNodesTreeBuilder &BoundNodes;
		const TraversalKind Traversal;
		grandinjUnsubmitted Not Done Reply Inline Actions note that making fields const tends to disable various move optimisations, so better to mark the whole struct const in code that uses it grandinj: note that making fields const tends to disable various move optimisations, so better to mark…
		MatchType Type;

		MatchKey toKey() const {
		return MatchKey{MatcherID, Node, BoundNodes, Traversal, Type};
		}
		};

		bool operator<(const MatchKey &LHS, const MatchKeyRef &RHS) {
		return std::tie(LHS.Traversal, LHS.Type, LHS.MatcherID, LHS.Node,
		LHS.BoundNodes) < std::tie(RHS.Traversal, RHS.Type,
		RHS.MatcherID, RHS.Node,
		RHS.BoundNodes);
		}

		bool operator<(const MatchKeyRef &LHS, const MatchKey &RHS) {
		return std::tie(LHS.Traversal, LHS.Type, LHS.MatcherID, LHS.Node,
		LHS.BoundNodes) < std::tie(RHS.Traversal, RHS.Type,
		RHS.MatcherID, RHS.Node,
		RHS.BoundNodes);
		}

// Used to store the result of a match and possibly bound nodes.		// Used to store the result of a match and possibly bound nodes.
struct MemoizedMatchResult {		struct MemoizedMatchResult {
bool ResultOfMatch;		bool ResultOfMatch;
BoundNodesTreeBuilder Nodes;		BoundNodesTreeBuilder Nodes;
};		};

// A RecursiveASTVisitor that traverses all children or all descendants of		// A RecursiveASTVisitor that traverses all children or all descendants of
// a node.		// a node.
▲ Show 20 Lines • Show All 367 Lines • ▼ Show 20 Lines	bool memoizedMatchesRecursively(const DynTypedNode &Node, ASTContext &Ctx,
BoundNodesTreeBuilder *Builder, int MaxDepth,		BoundNodesTreeBuilder *Builder, int MaxDepth,
TraversalKind Traversal, BindKind Bind) {		TraversalKind Traversal, BindKind Bind) {
// For AST-nodes that don't have an identity, we can't memoize.		// For AST-nodes that don't have an identity, we can't memoize.
// When doing a single-level match, we don't need to memoize		// When doing a single-level match, we don't need to memoize
if (!Node.getMemoizationData() \|\| !Builder->isComparable() \|\| MaxDepth == 1)		if (!Node.getMemoizationData() \|\| !Builder->isComparable() \|\| MaxDepth == 1)
return matchesRecursively(Node, Matcher, Builder, MaxDepth, Traversal,		return matchesRecursively(Node, Matcher, Builder, MaxDepth, Traversal,
Bind);		Bind);

MatchKey Key;
Key.MatcherID = Matcher.getID();
Key.Node = Node;
// Note that we key on the bindings before the match.		// Note that we key on the bindings before the match.
Key.BoundNodes = *Builder;		MatchKeyRef KeyRef = {Matcher.getID(), Node, *Builder,
Key.Traversal = Ctx.getParentMapContext().getTraversalKind();		Ctx.getParentMapContext().getTraversalKind(),
Key.Type = MatchType::Descendants;		MatchType::Descendants};
MemoizationMap::iterator I = ResultCache.find(Key);
		MemoizationMap::iterator I = ResultCache.find(KeyRef);
if (I != ResultCache.end()) {		if (I != ResultCache.end()) {
*Builder = I->second.Nodes;		*Builder = I->second.Nodes;
return I->second.ResultOfMatch;		return I->second.ResultOfMatch;
}		}

MemoizedMatchResult Result;		MemoizedMatchResult Result;
Result.Nodes = *Builder;		Result.Nodes = *Builder;
Result.ResultOfMatch = matchesRecursively(Node, Matcher, &Result.Nodes,		Result.ResultOfMatch = matchesRecursively(Node, Matcher, &Result.Nodes,
MaxDepth, Traversal, Bind);		MaxDepth, Traversal, Bind);

MemoizedMatchResult &CachedResult = ResultCache[Key];		auto Cached = ResultCache.emplace(KeyRef.toKey(), std::move(Result));
CachedResult = std::move(Result);		assert(Cached.second && "Emplace should succeed");
		*Builder = Cached.first->second.Nodes;
*Builder = CachedResult.Nodes;		return Cached.first->second.ResultOfMatch;
return CachedResult.ResultOfMatch;
}		}

// Matches children or descendants of 'Node' with 'BaseMatcher'.		// Matches children or descendants of 'Node' with 'BaseMatcher'.
bool matchesRecursively(const DynTypedNode &Node,		bool matchesRecursively(const DynTypedNode &Node,
const DynTypedMatcher &Matcher,		const DynTypedMatcher &Matcher,
BoundNodesTreeBuilder *Builder, int MaxDepth,		BoundNodesTreeBuilder *Builder, int MaxDepth,
TraversalKind Traversal, BindKind Bind) {		TraversalKind Traversal, BindKind Bind) {
MatchChildASTVisitor Visitor(		MatchChildASTVisitor Visitor(
▲ Show 20 Lines • Show All 209 Lines • ▼ Show 20 Lines	bool memoizedMatchesAncestorOfRecursively(const DynTypedNode &Node,
BoundNodesTreeBuilder *Builder,		BoundNodesTreeBuilder *Builder,
AncestorMatchMode MatchMode) {		AncestorMatchMode MatchMode) {
// For AST-nodes that don't have an identity, we can't memoize.		// For AST-nodes that don't have an identity, we can't memoize.
// When doing a single-level match, we don't need to memoize		// When doing a single-level match, we don't need to memoize
if (!Builder->isComparable() \|\| MatchMode == AncestorMatchMode::AMM_ParentOnly)		if (!Builder->isComparable() \|\| MatchMode == AncestorMatchMode::AMM_ParentOnly)
return matchesAncestorOfRecursively(Node, Ctx, Matcher, Builder,		return matchesAncestorOfRecursively(Node, Ctx, Matcher, Builder,
MatchMode);		MatchMode);

MatchKey Key;		// Note that we key on the bindings before the match.
Key.MatcherID = Matcher.getID();		MatchKeyRef KeyRef = {Matcher.getID(), Node, *Builder,
Key.Node = Node;		Ctx.getParentMapContext().getTraversalKind(),
Key.BoundNodes = *Builder;		MatchType::Ancestors};
Key.Traversal = Ctx.getParentMapContext().getTraversalKind();
Key.Type = MatchType::Ancestors;

// Note that we cannot use insert and reuse the iterator, as recursive		// Note that we cannot use insert and reuse the iterator, as recursive
// calls to match might invalidate the result cache iterators.		// calls to match might invalidate the result cache iterators.
MemoizationMap::iterator I = ResultCache.find(Key);		MemoizationMap::iterator I = ResultCache.find(KeyRef);
if (I != ResultCache.end()) {		if (I != ResultCache.end()) {
*Builder = I->second.Nodes;		*Builder = I->second.Nodes;
return I->second.ResultOfMatch;		return I->second.ResultOfMatch;
}		}

MemoizedMatchResult Result;		MemoizedMatchResult Result;
Result.Nodes = *Builder;		Result.Nodes = *Builder;
Result.ResultOfMatch = matchesAncestorOfRecursively(		Result.ResultOfMatch = matchesAncestorOfRecursively(
Node, Ctx, Matcher, &Result.Nodes, MatchMode);		Node, Ctx, Matcher, &Result.Nodes, MatchMode);

MemoizedMatchResult &CachedResult = ResultCache[Key];		auto Cached = ResultCache.emplace(KeyRef.toKey(), std::move(Result));
CachedResult = std::move(Result);		assert(Cached.second && "Emplace should succeed");
		*Builder = Cached.first->second.Nodes;
*Builder = CachedResult.Nodes;		return Cached.first->second.ResultOfMatch;
return CachedResult.ResultOfMatch;
}		}

bool matchesAncestorOfRecursively(const DynTypedNode &Node, ASTContext &Ctx,		bool matchesAncestorOfRecursively(const DynTypedNode &Node, ASTContext &Ctx,
const DynTypedMatcher &Matcher,		const DynTypedMatcher &Matcher,
BoundNodesTreeBuilder *Builder,		BoundNodesTreeBuilder *Builder,
AncestorMatchMode MatchMode) {		AncestorMatchMode MatchMode) {
const auto &Parents = ActiveASTContext->getParents(Node);		const auto &Parents = ActiveASTContext->getParents(Node);
if (Parents.empty()) {		if (Parents.empty()) {
▲ Show 20 Lines • Show All 132 Lines • ▼ Show 20 Lines	#endif
llvm::DenseMap<const Type, std::set<const TypedefNameDecl> > TypeAliases;		llvm::DenseMap<const Type, std::set<const TypedefNameDecl> > TypeAliases;

// Maps an Objective-C interface to its ObjCCompatibleAliasDecls.		// Maps an Objective-C interface to its ObjCCompatibleAliasDecls.
llvm::DenseMap<const ObjCInterfaceDecl *,		llvm::DenseMap<const ObjCInterfaceDecl *,
llvm::SmallPtrSet<const ObjCCompatibleAliasDecl *, 2>>		llvm::SmallPtrSet<const ObjCCompatibleAliasDecl *, 2>>
CompatibleAliases;		CompatibleAliases;

// Maps (matcher, node) -> the match result for memoization.		// Maps (matcher, node) -> the match result for memoization.
typedef std::map<MatchKey, MemoizedMatchResult> MemoizationMap;		typedef std::map<MatchKey, MemoizedMatchResult, std::less<>> MemoizationMap;
		klimekUnsubmitted Not Done Reply Inline Actions Ok, if this actually matters, we should also not use a std::map here, but a llvm::DenseMap (or do we rely on iteration invalidation semantics here?). klimek: Ok, if this actually matters, we should also not use a std::map here, but a llvm::DenseMap (or…
		sammccallUnsubmitted Not Done Reply Inline Actions Belated +1. 5% seems like this performance does matter, so DenseMap::find_as should be yet faster. It looks like BoundNodesTreeBuilder would need a DenseMapInfo, but all the other members are hashable already. sammccall: Belated +1. 5% seems like this performance does matter, so DenseMap::find_as should be yet…
MemoizationMap ResultCache;		MemoizationMap ResultCache;
};		};

static CXXRecordDecl *		static CXXRecordDecl *
getAsCXXRecordDeclOrPrimaryTemplate(const Type *TypeNode) {		getAsCXXRecordDeclOrPrimaryTemplate(const Type *TypeNode) {
if (auto *RD = TypeNode->getAsCXXRecordDecl())		if (auto *RD = TypeNode->getAsCXXRecordDecl())
return RD;		return RD;

▲ Show 20 Lines • Show All 274 Lines • Show Last 20 Lines