expandedTokens(SourceRange) used to do a binary search to get the
expanded tokens belonging to a source range. Each binary search uses
isBeforeInTranslationUnit to order two source locations. This is
inherently very slow.
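
For reference, a minimal sketch of that slow path, modeled on TokenBuffer's pre-patch lookup (simplified, not the verbatim implementation; expandedTokensSlow is a name made up for this sketch). Each of the O(log N) binary-search steps pays the full cost of isBeforeInTranslationUnit, which must reason about macro expansions and include stacks to order two arbitrary locations:

  #include "clang/Basic/SourceManager.h"
  #include "clang/Tooling/Syntax/Tokens.h"
  #include "llvm/ADT/STLExtras.h"

  llvm::ArrayRef<clang::syntax::Token>
  expandedTokensSlow(llvm::ArrayRef<clang::syntax::Token> ExpandedTokens,
                     clang::SourceRange R, const clang::SourceManager &SM) {
    // First token whose start is not before R.getBegin().
    const auto *Begin = llvm::partition_point(
        ExpandedTokens, [&](const clang::syntax::Token &T) {
          return SM.isBeforeInTranslationUnit(T.location(), R.getBegin());
        });
    // One past the last token whose start is not after R.getEnd().
    const auto *End = llvm::partition_point(
        ExpandedTokens, [&](const clang::syntax::Token &T) {
          return !SM.isBeforeInTranslationUnit(R.getEnd(), T.location());
        });
    if (Begin > End)
      return {};
    return {Begin, End};
  }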
By profiling clangd we found that callers like clangd::SelectionTree
spend ~95% of their time in isBeforeInTranslationUnit. It is also worth
noting that callers of expandedTokens(SourceRange) mostly query this
function with ranges provided by the AST. The ranges provided by the AST
are token ranges: they start at the beginning of one token and end at
the beginning of another token.
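
As an illustration of such a caller (dumpNodeTokens is hypothetical, but the pattern matches how clangd::SelectionTree and similar users drive the API): an AST node's getSourceRange() begins at the first token of the node and ends at the *start* of its last token, so both endpoints are starts of expanded tokens.

  #include "clang/AST/Stmt.h"
  #include "clang/Tooling/Syntax/Tokens.h"
  #include "llvm/Support/raw_ostream.h"

  void dumpNodeTokens(const clang::Stmt *S,
                      const clang::syntax::TokenBuffer &TB) {
    // For `a + b`, getSourceRange() spans from the start of `a` to the
    // start of `b` -- a token range, not a half-open character range.
    for (const clang::syntax::Token &T :
         TB.expandedTokens(S->getSourceRange()))
      llvm::errs() << T.text(TB.sourceManager()) << " ";
    llvm::errs() << "\n";
  }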
Therefore we can avoid the binary search in the majority of cases by
maintaining an index of the expanded tokens keyed by their
SourceLocations. We still fall back to binary search for ranges that are
not token ranges, but such cases are rare.
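
A minimal sketch of the idea (buildIndex and expandedTokensFast are illustrative names, and keying on getRawEncoding() is only to keep the sketch self-contained; the actual patch integrates this into TokenBuffer): build a hash map from each expanded token's start location to its position once, then resolve token ranges with two O(1) lookups.

  #include "clang/Basic/SourceLocation.h"
  #include "clang/Tooling/Syntax/Tokens.h"
  #include "llvm/ADT/DenseMap.h"

  // Maps each expanded token's start location (raw encoding) to its
  // index in the expanded-token stream. Built once per TokenBuffer.
  using TokenIndex = llvm::DenseMap<clang::SourceLocation::UIntTy, unsigned>;

  TokenIndex buildIndex(llvm::ArrayRef<clang::syntax::Token> ExpandedTokens) {
    TokenIndex Index;
    Index.reserve(ExpandedTokens.size());
    for (unsigned I = 0, E = ExpandedTokens.size(); I != E; ++I)
      Index[ExpandedTokens[I].location().getRawEncoding()] = I;
    return Index;
  }

  // Fast path: two hash lookups instead of two binary searches.
  llvm::ArrayRef<clang::syntax::Token>
  expandedTokensFast(llvm::ArrayRef<clang::syntax::Token> ExpandedTokens,
                     const TokenIndex &Index, clang::SourceRange R) {
    auto B = Index.find(R.getBegin().getRawEncoding());
    auto E = Index.find(R.getEnd().getRawEncoding());
    if (B == Index.end() || E == Index.end() || B->second > E->second)
      return {}; // Not a token range: fall back to binary search.
    // R.getEnd() is the *start* of the last token, so include it (+1).
    return ExpandedTokens.slice(B->second, E->second - B->second + 1);
  }

Building the index is one linear pass over the expanded tokens; the "ExpandedToken iteration" numbers below correspond to that one-time cost.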
With a cached AST we see wins even in normal clangd usage: invoking location-based features has the same performance in batch (--check) mode and in normal server mode.
Performance:
~/build/bin/clangd --check=clang/lib/Serialization/ASTReader.cpp
Before: took 2m10s to complete.
Now: took 1m13s to complete.
~44% less time (1.8x faster).
For medium-sized buffers (like XRefs.cpp) the index-building iteration is negligible compared to building the ParsedAST; the wins are smaller, 4-8%, for such buffers.
~/build/bin/clangd --check=clang-tools-extra/clangd/XRefs.cpp
Preamble: 5s
ParsedAST: 0.54s
Location features: 0.51s (5% speedup)
ExpandedToken iteration: 0.00018s (0.03% of ParsedAST)
Huge wins for macro-heavy files (like unit tests):
~/build/bin/clangd --check=clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp
Preamble: 2.3s
ParsedAST: 0.23s
ExpandedToken iteration: 0.002s (0.85% of ParsedAST)
Location features: 1702s -> 363s (~79% less time, 4.7x faster)
nit: the first sentence just repeats the function's doc - I'd just say "Makes SelectionTree build much faster".