This is an archive of the discontinued LLVM Phabricator instance.

Paths

- clangd/index/
  - Index.h
  - Index.cpp
  - SymbolCollector.h
  - SymbolCollector.cpp
- unittests/clangd/
  - SymbolCollectorTests.cpp

Differential D50385

[clangd] Collect symbol occurrences in SymbolCollector
ClosedPublic

Authored by hokein on Aug 7 2018, 6:14 AM.


Details

Reviewers

ilya-biryukov
ioeric
sammccall

Summary

SymbolCollector will be used for two cases:

collect Symbols only, used for indexing the preamble AST;
collect Symbols and SymbolOccurrences, used for indexing the main AST.

Finding local references in the AST will be implemented in other ways.

Diff Detail

Repository

rCTE Clang Tools Extra

Build Status

Buildable 22001
Build 22001: arc lint + arc unit

Event Timeline

hokein created this revision. Aug 7 2018, 6:14 AM

Herald added subscribers: arphaman, mgrang, jkorous and 2 others. Aug 7 2018, 6:14 AM

Harbormaster completed remote builds in B21181: Diff 159495. Aug 7 2018, 6:14 AM

2 high-level questions:

What's the reason for having a separate SymbolOccurrenceSlab? Could we store occurrences as an extra payload of Symbol?

Could we merge SymbolOccurrenceCollector into the existing SymbolCollector? They look a lot alike. Having another index data consumer seems like more overhead on the user side.

In D50385#1191914, @ioeric wrote:

2 high-level questions:

What's the reason for having a separate SymbolOccurrenceSlab? Could we store occurrences as an extra payload of Symbol?

Storing occurrences in the Symbol structure is easy for users to misuse, IMO -- if we go this way, we will end up with a getOccurrences-like method in the Symbol structure. Once users get a Symbol instance, it is natural for them to call getOccurrences to get all occurrences of the symbol. However, this getOccurrences method doesn't do what users expect (it returns an incomplete or empty set of results). To query symbol occurrences, we should always use the index interface.

Therefore, I think we should try to avoid these confusions in the design.

Could we merge SymbolOccurrenceCollector into the existing SymbolCollector? They look a lot alike. Having another index data consumer seems like more overhead on the user side.

The SymbolCollector has many responsibilities (collecting declarations, definitions, code completion information, etc.), and the code is growing complex. Merging the SymbolOccurrenceCollector into it will make it more complicated -- we will introduce more option flags like collect-symbol-only and collect-occurrence-only to configure it for our different use cases (we need to know the implementation details well in order to choose the correct options for SymbolCollector). And I can foresee these two collectors being run at different points (runWithPreamble vs runWithAST) in the dynamic index.

They might use the same facilities, but we could always share them.

In D50385#1193545, @hokein wrote:

In D50385#1191914, @ioeric wrote:

2 high-level questions:

What's the reason for having a separate SymbolOccurrenceSlab? Could we store occurrences as an extra payload of Symbol?

Storing occurrences in the Symbol structure is easy for users to misuse, IMO -- if we go this way, we will end up with a getOccurrences-like method in the Symbol structure. Once users get a Symbol instance, it is natural for them to call getOccurrences to get all occurrences of the symbol. However, this getOccurrences method doesn't do what users expect (it returns an incomplete or empty set of results). To query symbol occurrences, we should always use the index interface.

Therefore, I think we should try to avoid these confusions in the design.

Hmm, I think this is the same for other symbol payloads, e.g. a definition can be missing for a symbol. And it seems to me that the concern is at the SymbolSlab level: if a slab is for a single TU, users should expect missing information; if a slab is merged from all TUs, then users can expect "complete" information. I think it's reasonable to assume that users of SymbolSlab are aware of this, and it's probably not worth the overhead of maintaining and using two separate slabs.

Could we merge SymbolOccurrenceCollector into the existing SymbolCollector? They look a lot alike. Having another index data consumer seems like more overhead on the user side.

The SymbolCollector has many responsibilities (collecting declarations, definitions, code completion information, etc.), and the code is growing complex. Merging the SymbolOccurrenceCollector into it will make it more

Although the existing SymbolCollector supports different options, I think it still has a pretty well-defined responsibility: gathering information about symbols. IMO, cross-references are one of the properties of a symbol, and I don't see strong reasons to keep them separate.

complicated -- we will introduce more option flags like collect-symbol-only and collect-occurrence-only to configure it for our different use cases (we need to know the implementation details well in order to choose the correct options for SymbolCollector).

I think these options are reasonable if they turn out to be necessary. Making the SymbolCollector more complicated doesn't seem to be a problem if we are indeed doing more complicated work, and I don't think it would turn into a big problem, as the xrefs logic seems pretty isolated. Conversely, I think implementing xrefs in a separate class would likely cause more duplication and maintenance, e.g. two sets of options, two sets of initialization or lifetime tracking for collectors (they look a lot alike), the same boilerplate factory code in tests, and passing around two collectors in user code.

And I can foresee these two collectors being run at different points (runWithPreamble vs runWithAST) in the dynamic index.

With some options, this shouldn't be a problem, I think?

In D50385#1193600, @ioeric wrote:


Hmm, I think this is the same for other symbol payloads, e.g. a definition can be missing for a symbol. And it seems to me that the concern is at the SymbolSlab level: if a slab is for a single TU, users should expect missing information; if a slab is merged from all TUs, then users can expect "complete" information. I think it's reasonable to assume that users of SymbolSlab are aware of this, and it's probably not worth the overhead of maintaining and using two separate slabs.

I think it's reasonable to keep occurrences away from Symbol's Detail field. Stashing them together is only fine for the collector API; having any way to directly access occurrences through Symbol would be totally confusing for all the other users.
E.g., Index::lookup() will not provide occurrences in the Symbol instances it returns, and having accessors for them there would only add confusion. So +1 to keeping them out of the Symbol class.

On the other hand, SymbolSlab feels like a perfectly reasonable place to store the occurrences in addition to the symbols themselves, and it feels like we should reuse its memory arena for storing any strings we need to allocate, etc.


+1 to merging into the SymbolCollector. Keeping the responsibilities separate inside a single class should be easy, e.g. something like this should be simple enough:

bool SymbolCollector::handleDeclOccurence(/* args */) {
  processForSymbol(/* args */);      // keeps the Symbol structure up to date, i.e. adds definition locations, etc.
  processForOccurrences(/* args */); // appends occurrences to a list of xrefs.
  return true;                       // continue indexing
}

The main advantage we get is less clang-specific boilerplate. The fewer IndexDataConsumers, FrontendActionFactorys and FrontendActions we create, the more focused and concise our code is.
And in that case, SymbolCollector is already handling those responsibilities for us, so reusing it looks like a good idea.
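A self-contained sketch of the single-consumer idea, with hypothetical names throughout: `SketchCollector`, `onOccurrence` and the string IDs stand in for clangd's `SymbolCollector`, clang's `handleDeclOccurence` callback and `SymbolID`, and no clang headers are used. One callback keeps the per-symbol record up to date and, only when enabled, also appends an occurrence.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical role values; not clang's index::SymbolRole.
enum class Role : unsigned { Declaration = 1, Definition = 2, Reference = 4 };

struct Occurrence {
  std::string File;
  unsigned Line;
  Role Kind;
};

struct SymbolInfo {
  std::string Name;
  bool HasDefinition = false;
};

class SketchCollector {
public:
  explicit SketchCollector(bool CollectOccurrences)
      : CollectOccurrences(CollectOccurrences) {}

  // One callback per AST occurrence; both concerns are handled here.
  bool onOccurrence(const std::string &ID, const std::string &Name, Role R,
                    const std::string &File, unsigned Line) {
    processForSymbol(ID, Name, R);
    if (CollectOccurrences)
      processForOccurrences(ID, R, File, Line);
    return true; // keep traversing
  }

  const std::unordered_map<std::string, SymbolInfo> &symbols() const {
    return Symbols;
  }
  const std::unordered_map<std::string, std::vector<Occurrence>> &
  occurrences() const {
    return Occurrences;
  }

private:
  // Keeps the per-symbol record up to date (e.g. notes a definition).
  void processForSymbol(const std::string &ID, const std::string &Name,
                        Role R) {
    SymbolInfo &S = Symbols[ID];
    S.Name = Name;
    if (R == Role::Definition)
      S.HasDefinition = true;
  }
  // Appends the occurrence to the per-ID list of xrefs.
  void processForOccurrences(const std::string &ID, Role R,
                             const std::string &File, unsigned Line) {
    Occurrences[ID].push_back({File, Line, R});
  }

  bool CollectOccurrences;
  std::unordered_map<std::string, SymbolInfo> Symbols;
  std::unordered_map<std::string, std::vector<Occurrence>> Occurrences;
};
```

The boolean in the constructor plays the role of a collector option: the same traversal serves both the preamble case (symbols only) and the main-AST case (symbols plus occurrences).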

Hmm, I think this is the same for other symbol payloads, e.g. a definition can be missing for a symbol. And it seems to me that the concern is at the SymbolSlab level: if a slab is for a single TU, users should expect missing information; if a slab is merged from all TUs, then users can expect "complete" information. I think it's reasonable to assume that users of SymbolSlab are aware of this, and it's probably not worth the overhead of maintaining and using two separate slabs.

My concerns of storing occurrences as an extra payload of Symbol are:

SymbolSlab is more of an implementation detail. Users of SymbolIndex are not aware of it; they only get Symbol objects, so any occurrence-related interface or member in Symbol would easily confuse them, and we would need a long comment explaining its correct behavior. It'd be better to avoid this confusion at the API level.
The fields in the Symbol structure are symbol properties and can be kept in memory. Occurrences are different: we can't guarantee that they fit in memory.
It seems that we are coupling ID, Symbol and SymbolOccurrence together: in the index implementation, we would go through ID=>Symbol=>Occurrences rather than ID=>Occurrences.
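To make the last point concrete, here is a hypothetical layout (not clangd's actual types) in which occurrences are keyed directly by symbol ID. A lookup goes ID=>Occurrences, and `Symbol` has no occurrence member at all.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: a string ID instead of clangd's SHA1-based SymbolID.
using SymbolID = std::string;

// Symbol carries only symbol properties; no occurrence data.
struct Symbol {
  SymbolID ID;
  std::string Name;
};

struct Occurrence {
  std::string FileURI;
  unsigned Line = 0;
};

struct SketchIndex {
  std::map<SymbolID, Symbol> Symbols;                      // ID => Symbol
  std::map<SymbolID, std::vector<Occurrence>> Occurrences; // ID => Occurrences

  // Occurrences are found by ID alone, without going through Symbols.
  std::vector<Occurrence> findOccurrences(const SymbolID &ID) const {
    auto It = Occurrences.find(ID);
    return It == Occurrences.end() ? std::vector<Occurrence>{} : It->second;
  }
};
```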

I think these options are reasonable if they turn out to be necessary.

I think they are necessary. For collecting all occurrences of local symbols from the AST, we only need symbol occurrence information; other information (e.g. declaration and definition locations, #include) should be discarded. The index for code completion should not collect symbol occurrences.

And making the SymbolCollector more complicated doesn't seem to be a problem if we are indeed doing more complicated work, and I don't think this would turn into a big problem, as the xrefs logic seems pretty isolated.

If xrefs is quite isolated, I think that is a good signal to have a dedicated class handling it.

I think implementing xrefs in a separate class would likely cause more duplication and maintenance, e.g. two sets of options, two sets of initialization or lifetime tracking for collectors (they look a lot alike), the same boilerplate factory code in tests, and passing around two collectors in user code.

Merging xrefs into SymbolCollector wouldn't avoid these problems; I think it is a matter of where we put this code:

different initializations of SymbolCollector for different use cases (e.g. setting different flags in SymbolCollectorOptions);
for the dynamic index, indexing for xrefs and for code completion would be triggered at different points: indexing for xrefs should happen when the AST is ready, while indexing for code completion happens when the preamble is ready; we might end up with two slab instances in the dynamic index (1 symbol slab + 1 occurrence slab vs. 2 symbol slabs).

The duplication is mainly in the AST frontend action boilerplate code. To eliminate it, we could do some refactoring:

get rid of the clang AST action code in SymbolCollector and SymbolOccurrenceCollector;
introduce an IndexSymbol class which is a subclass of index::IndexDataConsumer;
the IndexSymbol has two modes (indexing symbols or indexing occurrences), and dispatches AST information to SymbolCollector/SymbolOccurrenceCollector.

Update the patch based on our offline discussion:

only one implementation of the clang interfaces, and move finding references into the current symbol collector;
store references in SymbolSlab.

Harbormaster completed remote builds in B21775: Diff 161927. Aug 22 2018, 5:29 AM

Herald added a subscriber: kadircet. Aug 22 2018, 5:29 AM

ilya-biryukov added inline comments.Aug 22 2018, 6:34 AM

clangd/index/Index.cpp
134	NIT: remove the lambda? Using `<` is the default.
140	NIT: remove the lambda? Using `==` is the default.
153	Is this used for debugging? If so, maybe consider having a user-readable representation instead of the number?
clangd/index/Index.h
46	NIT: having friend decls inside the classes themselves might prove to be more readable. Not opposed to the current approach either, feel free to ignore.
292	Maybe add a comment or remove the empty line?
293	Any chance we could store occurrences in a file-centric manner? E.g.
/// Occurrences inside a single file.
class FileOccurences { StringRef File; vector<pair<Point, OccurenceKind>> Locations; };
// ....
DenseMap<SymbolID, vector<FileOccurences>> SymbolOccurences;
As discussed previously, this representation is better suited for both merging and serialization.
clangd/index/SymbolCollector.cpp
272	NIT: maybe use early exits and inverted conditions to keep the nesting down?
321	If we have `Options` here, why have an extra `CollectorSymbolOptions`?
clangd/index/SymbolCollector.h
59	Could you elaborate on what this option will be used for? How do we know in advance which symbols we're interested in?
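Regarding the two lambda NITs: assuming the lambdas in `Index.cpp` only re-implement `<` and `==`, the standard algorithms already default to those operators once they are defined. A minimal sketch with a simplified `Position`, modeled on `SymbolLocation::Position` in `Index.h` but not the real type:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Simplified stand-in for SymbolLocation::Position.
struct Position {
  uint32_t Line = 0;
  uint32_t Column = 0;
};
inline bool operator==(const Position &L, const Position &R) {
  return std::tie(L.Line, L.Column) == std::tie(R.Line, R.Column);
}
inline bool operator<(const Position &L, const Position &R) {
  return std::tie(L.Line, L.Column) < std::tie(R.Line, R.Column);
}

// Sorts and deduplicates without lambdas: std::sort uses operator< and
// std::unique uses operator== by default.
inline void sortAndDedup(std::vector<Position> &Ps) {
  std::sort(Ps.begin(), Ps.end());
  Ps.erase(std::unique(Ps.begin(), Ps.end()), Ps.end());
}
```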

Address review comments.

Harbormaster completed remote builds in B21786: Diff 161962. Aug 22 2018, 8:07 AM

Add one more comment.

Harbormaster completed remote builds in B21789: Diff 161972. Aug 22 2018, 8:44 AM

hokein added inline comments.Aug 22 2018, 9:02 AM

clangd/index/Index.h
46	These operator implementations seem less interesting than the members of the structure; putting them inside the structure would probably add some noise for readers.
293	The file-centric manner doesn't seem to suit our current model: whenever we update the index for the main AST, we just replace the symbol slab with the new one; and for index merging, we only use the index `findOccurrences` interface. It would save the memory used by `StringRef File`, but AFAIK, the memory usage of the current model is relatively small (compared with the SymbolSlab for code completion) since we only store occurrences in the main file (~50KB for `CodeComplete.cpp`). I'd leave it as it is now, and we could revisit it later.
clangd/index/SymbolCollector.h
59	This is used for finding references in the AST as part of the xref implementation. Basically, the workflow would be:
1. find SymbolIDs of the symbol under the cursor, using `DeclarationAndMacrosFinder`;
2. run the symbol collector to find all occurrences in the main AST for all SymbolIDs from #1;
3. query the index to get more occurrences;
4. merge them.
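The four-step workflow could be sketched as follows. Everything here is hypothetical: the callbacks stand in for `DeclarationAndMacrosFinder`, an AST-based collector and an index query, and `Ref` is a simplified occurrence record. Merging is done by sorting and dropping exact duplicates, which is one plausible reading of "merge them".

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical occurrence record; not clangd's SymbolOccurrence.
struct Ref {
  std::string File;
  unsigned Line = 0, Column = 0;
};
inline bool operator<(const Ref &L, const Ref &R) {
  return std::tie(L.File, L.Line, L.Column) < std::tie(R.File, R.Line, R.Column);
}
inline bool operator==(const Ref &L, const Ref &R) {
  return std::tie(L.File, L.Line, L.Column) == std::tie(R.File, R.Line, R.Column);
}

using FindIDs = std::function<std::vector<std::string>()>;                // step 1
using FindRefs = std::function<std::vector<Ref>(const std::string &ID)>; // steps 2, 3

// Gathers references from the main AST and from the index for every ID
// under the cursor, then merges them without duplicates (step 4).
inline std::vector<Ref> findReferences(const FindIDs &IDsUnderCursor,
                                       const FindRefs &FromAST,
                                       const FindRefs &FromIndex) {
  std::vector<Ref> All;
  for (const std::string &ID : IDsUnderCursor()) {
    for (const Ref &R : FromAST(ID))
      All.push_back(R);
    for (const Ref &R : FromIndex(ID))
      All.push_back(R);
  }
  std::sort(All.begin(), All.end());
  All.erase(std::unique(All.begin(), All.end()), All.end());
  return All;
}
```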

ilya-biryukov added inline comments.Aug 23 2018, 5:58 AM

clangd/index/Index.h
46	Ok, LG outside too
293	Isn't the merging model different for the occurrences? When merging, we would actually have to drop all references from the older index if the new one contains locations in the same file. If the merge is file-centric, the file-based representation makes more sense in the first place. Apart from simplifying the merging code, the file-based representation also buys us more efficient serialization for the static index, arguably efficient enough to stash all the occurrences even in our YAML index. Postponing till later is also fine, but I'm not sure it buys us much now. These arguments only apply if we think the file-centric approach is the right final design, though.
clangd/index/SymbolCollector.h
59	Can we instead find all the occurrences in `DeclarationAndMacrosFinder` directly? An extra run of `SymbolCollector` means another AST traversal, which is slow by itself, and SymbolCollector is designed for a much hairier problem; its interface is just not well suited for collecting only occurrences. The latter seems to be a simpler problem, and we can have a simpler interface to solve it (possibly shared between SymbolCollector and DeclarationAndMacrosFinder). WDYT?
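A minimal sketch of the file-centric merge described above, assuming a hypothetical per-symbol layout that groups occurrences by file (in the spirit of the `FileOccurences` suggestion, not clangd's types). When the newer data has any occurrences in a file, they replace the older entries for that file wholesale.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout: for one symbol, occurrences grouped by file.
using Point = std::pair<unsigned, unsigned>; // (line, column)
using FileCentricOccurrences = std::map<std::string, std::vector<Point>>;

// Merges two file-centric sets for the same symbol. Any file present in
// Newer replaces everything Older had for that file, so stale references
// from an outdated version of the file are dropped.
inline FileCentricOccurrences merge(const FileCentricOccurrences &Older,
                                    const FileCentricOccurrences &Newer) {
  FileCentricOccurrences Result = Older;
  for (const auto &Entry : Newer)
    Result[Entry.first] = Entry.second;
  return Result;
}
```

A real implementation would also need to know which files the newer index covers, so that a file in which the symbol no longer appears gets cleared too; this sketch ignores that case.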

ioeric added inline comments.Aug 24 2018, 1:42 AM

clangd/index/SymbolCollector.h
67	Use `llvm::Optional`?

ioeric added inline comments.Aug 24 2018, 2:52 AM

clangd/index/SymbolCollector.cpp
241	I don't see a strong reason for the separation of `CollectOccurrence` and `CollectSymbol`. There are some pieces that are only used by one of them, but they seem cheap enough to ignore? Intuitively, it seems to me that reference collection could just be a member function of `SymbolCollector`.

hokein added a reviewer: sammccall.Aug 24 2018, 3:03 AM

sammccall added inline comments.Aug 24 2018, 8:10 AM

clangd/index/Index.h
267	As discussed offline: the merge of occurrences into SymbolSlab seems problematic to me. On the consumer side, we have a separation between Symbol APIs and SymbolOccurrence APIs - they don't really interact. The Symbol type can often only be used with SymbolSlab, so including occurrences drags them into the mess for consumers that don't care about them. Producers (index implementations) will usually have both, and they may want to share arena storage. But this probably doesn't matter much, and if it does, we can use another mechanism (like allowing SymbolSlabBuilder and SymbolOccurrenceSlab to share a UniqueStringSaver).
clangd/index/SymbolCollector.cpp
327	note that here we've done basically all the work needed to record the occurrence. If you add a DenseMap<Decl*, {SourceLocation, SymbolRole}> then you'll have enough info at the end to fill in the occurrences, like we do with referenceddecls -> references.
clangd/index/SymbolCollector.h
40	Not sure this split is justified. If IDs goes away (see below), all that's left can be represented in a SymbolOccurrenceKind filter (which is 0 to collect no occurrences).
59	Yeah, I don't think we need this. For "find references in the AST" we have an implementation in XRefs for highlights which we don't need to share.
69	Collecting symbols doesn't actually need to be optional, I think - it's the core responsibility of this class, and "find occurrences of a decl in an AST" can be implemented more easily in other ways.
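A sketch of the two suggestions above, using hypothetical stand-ins: `const void *` for `const Decl *`, a callback for SymbolID generation, and assumed bit values for the kinds. During traversal it only records `(location, kind)` per decl, applying a kind filter where 0 means "collect nothing"; decls are resolved to IDs once, at the end.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Bitmask modeled on SymbolOccurrenceKind (bit values assumed).
enum class OccurrenceKind : uint8_t {
  Unknown = 0,
  Declaration = 1 << 0,
  Definition = 1 << 1,
  Reference = 1 << 2,
};
inline OccurrenceKind operator&(OccurrenceKind A, OccurrenceKind B) {
  return static_cast<OccurrenceKind>(static_cast<uint8_t>(A) & static_cast<uint8_t>(B));
}
inline OccurrenceKind operator|(OccurrenceKind A, OccurrenceKind B) {
  return static_cast<OccurrenceKind>(static_cast<uint8_t>(A) | static_cast<uint8_t>(B));
}

struct Loc {
  unsigned Line = 0, Column = 0;
};
using Entries = std::vector<std::pair<Loc, OccurrenceKind>>;

class DeferredOccurrences {
public:
  // Filter == Unknown (0) means "collect no occurrences".
  explicit DeferredOccurrences(OccurrenceKind Filter) : Filter(Filter) {}

  // Called during traversal: just remember (decl, location, kind).
  void record(const void *Decl, Loc L, OccurrenceKind K) {
    if ((Filter & K) != OccurrenceKind::Unknown)
      Pending[Decl].push_back({L, K});
  }

  // Called once at the end: each decl is resolved to an ID a single time;
  // decls sharing an ID (e.g. redeclarations) are appended to one list.
  template <typename GetID>
  std::unordered_map<std::string, Entries> finish(GetID IDOf) const {
    std::unordered_map<std::string, Entries> Out;
    for (const auto &Entry : Pending) {
      Entries &Dest = Out[IDOf(Entry.first)];
      Dest.insert(Dest.end(), Entry.second.begin(), Entry.second.end());
    }
    return Out;
  }

private:
  OccurrenceKind Filter;
  std::unordered_map<const void *, Entries> Pending;
};
```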

Update the patch based on our new discussion

SymbolOccurrenceSlab for storing underlying occurrence data
reuse SymbolCollector to collect symbol occurrences
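A rough sketch of what a separate occurrence slab could look like, under assumed names (`OccurrenceSlabSketch`, `Occ`) rather than the actual `SymbolOccurrenceSlab` API. Occurrences are appended while building, `freeze()` sorts and deduplicates them, and lookups assert that the slab is frozen and return a reference rather than a copy.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical occurrence record.
struct Occ {
  std::string FileURI;
  unsigned Line = 0, Column = 0;
};
inline bool operator<(const Occ &L, const Occ &R) {
  return std::tie(L.FileURI, L.Line, L.Column) < std::tie(R.FileURI, R.Line, R.Column);
}
inline bool operator==(const Occ &L, const Occ &R) {
  return std::tie(L.FileURI, L.Line, L.Column) == std::tie(R.FileURI, R.Line, R.Column);
}

class OccurrenceSlabSketch {
public:
  void insert(const std::string &ID, Occ O) {
    assert(!Frozen && "insert() after freeze()");
    Occurrences[ID].push_back(std::move(O));
  }

  // Sorts and deduplicates each list; the slab is read-only afterwards.
  void freeze() {
    for (auto &Entry : Occurrences) {
      std::vector<Occ> &V = Entry.second;
      std::sort(V.begin(), V.end());
      V.erase(std::unique(V.begin(), V.end()), V.end());
    }
    Frozen = true;
  }

  // Returns a reference into the slab (not a copy), valid while the slab lives.
  const std::vector<Occ> &find(const std::string &ID) const {
    assert(Frozen && "looking up in a non-frozen slab is probably a mistake");
    static const std::vector<Occ> Empty;
    auto It = Occurrences.find(ID);
    return It == Occurrences.end() ? Empty : It->second;
  }

private:
  bool Frozen = false;
  std::unordered_map<std::string, std::vector<Occ>> Occurrences;
};
```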

hokein retitled this revision from [clangd] Collect symbol occurrences from AST. to [clangd] Collect symbol occurrences in SymbolCollector.Aug 26 2018, 10:42 PM

hokein edited the summary of this revision. (Show Details)

This looks pretty good!

clangd/index/Index.h
377	Assert frozen? Looking up in a non-frozen array is probably a mistake. If we choose to optimize this, it probably won't be possible.
378	return Occurrences.lookup(ID)?
clangd/index/SymbolCollector.cpp
234	nit: toOccurrenceKind
236	If you want to filter out the unsupported bits, maybe just add an explicit `AllOccurrenceKinds` constant to the header file, and `return AllOccurrenceKinds & Roles` here? (plus casts)
330	just compute the spelling loc once and reuse?
331	you get the spelling loc on the previous line to check for mainfile - so surely we should be using spelling loc here?
457	nit: const auto& for clarity since we're not mutating
462	so this seems maybe gratuitously inefficient, we're copying the filename then going through the URI conversion dance for each reference - even though the filename is the same for each. consider splitting out part of `getTokenLocation` into `getTokenRange(SymbolLocation&)` and only calling that here.
clangd/index/SymbolCollector.h
54–58	this should be next to OccurrenceFilter, they're very closely related (the name mismatch is a little unfortunate)
124	please move next to ReferencedDecls/ReferencedMacros so the comment applies to this too
unittests/clangd/SymbolCollectorTests.cpp
469	this is cute - if possible, consider adding a matcher factory function for readability here, so you can write `EXPECT_THAT(..., HaveRanges(Main.ranges("foo"))`
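For the `AllOccurrenceKinds` suggestion, a minimal sketch with stand-in enums. It assumes, as the suggested cast implies, that the declaration, definition and reference bits of the indexer's role set line up with the occurrence kinds; the values here are illustrative, not copied from clang.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in role bits (assumed values); only the low three bits have a
// matching occurrence kind.
enum RoleBits : unsigned {
  RoleDeclaration = 1u << 0,
  RoleDefinition = 1u << 1,
  RoleReference = 1u << 2,
  RoleRead = 1u << 3, // no matching occurrence kind
};

enum class OccurrenceKind : uint8_t {
  Unknown = 0,
  Declaration = 1 << 0,
  Definition = 1 << 1,
  Reference = 1 << 2,
};
constexpr OccurrenceKind operator|(OccurrenceKind A, OccurrenceKind B) {
  return static_cast<OccurrenceKind>(static_cast<uint8_t>(A) |
                                     static_cast<uint8_t>(B));
}
constexpr OccurrenceKind AllOccurrenceKinds = OccurrenceKind::Declaration |
                                              OccurrenceKind::Definition |
                                              OccurrenceKind::Reference;

// Keeps only the supported bits: AllOccurrenceKinds & Roles (plus casts).
inline OccurrenceKind toOccurrenceKind(unsigned Roles) {
  return static_cast<OccurrenceKind>(
      static_cast<unsigned>(AllOccurrenceKinds) & Roles);
}
```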

Address review comments and fix code style.

Minor cleanup.

Harbormaster completed remote builds in B21999: Diff 162854. Aug 28 2018, 7:24 AM

Harbormaster completed remote builds in B22001: Diff 162856.

hokein added inline comments.Aug 28 2018, 7:25 AM

clangd/index/Index.h
378	`DenseMap::lookup` returns a copy of the `Value` (a `vector`), which doesn't suit our use case :( -- we would be returning an `ArrayRef` that references a local `vector` object.
unittests/clangd/SymbolCollectorTests.cpp
469	Wrapped this into `HaveRanges`.
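To illustrate the `DenseMap::lookup` concern with standard containers: a lookup that returns the mapped vector by value (as `lookup` does) leaves nothing for a non-owning view to point into, whereas going through `find` yields a view into the map's own storage. `View` is a hypothetical stand-in for `llvm::ArrayRef`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal non-owning view, standing in for llvm::ArrayRef (hypothetical).
template <typename T> struct View {
  const T *Data = nullptr;
  std::size_t Size = 0;
  bool empty() const { return Size == 0; }
  const T &operator[](std::size_t I) const { return Data[I]; }
};

using Table = std::unordered_map<std::string, std::vector<int>>;

// Like DenseMap::lookup: returns the mapped value by copy (or an empty one).
inline std::vector<int> lookupCopy(const Table &T, const std::string &Key) {
  auto It = T.find(Key);
  return It == T.end() ? std::vector<int>{} : It->second;
}

// Safe: the view points into the table's own storage.
inline View<int> lookupView(const Table &T, const std::string &Key) {
  auto It = T.find(Key);
  if (It == T.end())
    return {};
  return {It->second.data(), It->second.size()};
}

// Unsafe (do not do this): the vector returned by lookupCopy() is a
// temporary that is destroyed at the end of the full expression.
//   const int *Dangling = lookupCopy(T, Key).data();
```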

Address review comments in D51279.

Harbormaster completed remote builds in B22138: Diff 163512. Aug 31 2018, 5:25 AM

hokein mentioned this in D51279: [clangd] Implement findOccurrences interface in dynamic index. Aug 31 2018, 5:29 AM

sammccall accepted this revision. Aug 31 2018, 5:35 AM

This revision is now accepted and ready to land. Aug 31 2018, 5:35 AM

Committed in rL341208.

Revision Contents

Path

Size

clangd/

index/

81 lines

43 lines

15 lines

75 lines

unittests/

clangd/

SymbolCollectorTests.cpp

80 lines

Diff 162856

clangd/index/Index.h

... (25 lines not shown)

 struct SymbolLocation {
   // Specify a position (Line, Column) of symbol. Using Line/Column allows us to
   // build LSP responses without reading the file content.
   struct Position {
     uint32_t Line = 0; // 0-based
     // Using UTF-16 code units.
     uint32_t Column = 0; // 0-based
-    bool operator==(const Position& P) const {
-      return Line == P.Line && Column == P.Column;
-    }
   };
 
   // The URI of the source file where a symbol occurs.
   llvm::StringRef FileURI;
 
   /// The symbol range, using half-open range [Start, End).
   Position Start;
   Position End;
 
   explicit operator bool() const { return !FileURI.empty(); }
-  bool operator==(const SymbolLocation& Loc) const {
-    return std::tie(FileURI, Start, End) ==
-           std::tie(Loc.FileURI, Loc.Start, Loc.End);
-  }
 };
+inline bool operator==(const SymbolLocation::Position &L,
+                       const SymbolLocation::Position &R) {
+  return std::tie(L.Line, L.Column) == std::tie(R.Line, R.Column);
+}
+inline bool operator<(const SymbolLocation::Position &L,
+                      const SymbolLocation::Position &R) {
+  return std::tie(L.Line, L.Column) < std::tie(R.Line, R.Column);
+}
+inline bool operator==(const SymbolLocation &L, const SymbolLocation &R) {
+  return std::tie(L.FileURI, L.Start, L.End) ==
+         std::tie(R.FileURI, R.Start, R.End);
+}
+inline bool operator<(const SymbolLocation &L, const SymbolLocation &R) {
+  return std::tie(L.FileURI, L.Start, L.End) <
+         std::tie(R.FileURI, R.Start, R.End);
+}
 llvm::raw_ostream &operator<<(llvm::raw_ostream &, const SymbolLocation &);
 
 // The class identifies a particular C++ symbol (class, function, method, etc).
 //
 // As USRs (Unified Symbol Resolution) could be large, especially for functions
 // with long type arguments, SymbolID is using 160-bits SHA1(USR) values to
 // guarantee the uniqueness of symbols while using a relatively small amount of
 // memory (vs storing USRs directly).
... (190 lines not shown)
 public:
   size_t size() const { return Symbols.size(); }
   // Estimates the total memory usage.
   size_t bytes() const {
     return sizeof(*this) + Arena.getTotalMemory() +
            Symbols.capacity() * sizeof(Symbol);
   }
 
   // SymbolSlab::Builder is a mutable container that can 'freeze' to SymbolSlab.
   // The frozen SymbolSlab will use less memory.
   class Builder {
   public:
     Builder() : UniqueStrings(Arena) {}
 
     // Adds a symbol, overwriting any existing one with the same ID.
     // This is a deep copy: underlying strings will be owned by the slab.
     void insert(const Symbol &S);
 
     // Returns the symbol with an ID, if it exists. Valid until next insert().
     const Symbol *find(const SymbolID &ID) {
       auto I = SymbolIndex.find(ID);
       return I == SymbolIndex.end() ? nullptr : &Symbols[I->second];
     }
 
     // Consumes the builder to finalize the slab.
     SymbolSlab build() &&;
 
   private:
     llvm::BumpPtrAllocator Arena;
     // Intern table for strings. Contents are on the arena.
     llvm::UniqueStringSaver UniqueStrings;
     std::vector<Symbol> Symbols;
     // Values are indices into Symbols vector.
     llvm::DenseMap<SymbolID, size_t> SymbolIndex;
   };
 
 private:
   SymbolSlab(llvm::BumpPtrAllocator Arena, std::vector<Symbol> Symbols)
       : Arena(std::move(Arena)), Symbols(std::move(Symbols)) {}
 
   llvm::BumpPtrAllocator Arena; // Owns Symbol data that the Symbols do not.
   std::vector<Symbol> Symbols;  // Sorted by SymbolID to allow lookup.
 };

… inline SymbolOccurrenceKind &operator|=(SymbolOccurrenceKind &L,
SymbolOccurrenceKind R) {		SymbolOccurrenceKind R) {
return L = L | R;		return L = L | R;
}		}
inline SymbolOccurrenceKind operator&(SymbolOccurrenceKind A,		inline SymbolOccurrenceKind operator&(SymbolOccurrenceKind A,
SymbolOccurrenceKind B) {		SymbolOccurrenceKind B) {
return static_cast<SymbolOccurrenceKind>(static_cast<uint8_t>(A) &		return static_cast<SymbolOccurrenceKind>(static_cast<uint8_t>(A) &
static_cast<uint8_t>(B));		static_cast<uint8_t>(B));
}		}
		static const SymbolOccurrenceKind AllOccurrenceKinds =
		SymbolOccurrenceKind::Declaration | SymbolOccurrenceKind::Definition |
		SymbolOccurrenceKind::Reference;

// Represents a symbol occurrence in the source file. It could be a		// Represents a symbol occurrence in the source file. It could be a
// declaration/definition/reference occurrence.		// declaration/definition/reference occurrence.
//		//
// WARNING: Location does not own the underlying data - Copies are shallow.		// WARNING: Location does not own the underlying data - Copies are shallow.
struct SymbolOccurrence {		struct SymbolOccurrence {
// The location of the occurrence.		// The location of the occurrence.
SymbolLocation Location;		SymbolLocation Location;
SymbolOccurrenceKind Kind = SymbolOccurrenceKind::Unknown;		SymbolOccurrenceKind Kind = SymbolOccurrenceKind::Unknown;
};		};
		inline bool operator<(const SymbolOccurrence &L, const SymbolOccurrence &R) {
		return std::tie(L.Location, L.Kind) < std::tie(R.Location, R.Kind);
		}
		inline bool operator==(const SymbolOccurrence &L, const SymbolOccurrence &R) {
		return std::tie(L.Location, L.Kind) == std::tie(R.Location, R.Kind);
		}
		llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,
		const SymbolOccurrence &Occurrence);

		// An efficient structure for storing a large set of symbol occurrences in memory.
		// Filenames are deduplicated.
		class SymbolOccurrenceSlab {
		public:
		using const_iterator =
		llvm::DenseMap<SymbolID, std::vector<SymbolOccurrence>>::const_iterator;
		using iterator = const_iterator;

		SymbolOccurrenceSlab() : UniqueStrings(Arena) {}

		// Define move semantics for the slab, allowing assignment from an rvalue.
		// Implicit move assignment is deleted by the compiler because
		// StringSaver has a reference type member.
		SymbolOccurrenceSlab(SymbolOccurrenceSlab &&Slab) = default;
		SymbolOccurrenceSlab &operator=(SymbolOccurrenceSlab &&RHS) {
		assert(RHS.Frozen &&
		"SymbolOccurrenceSlab must be frozen when move assigned!");
		Arena = std::move(RHS.Arena);
		Frozen = true;
		Occurrences = std::move(RHS.Occurrences);
		return *this;
		}

		const_iterator begin() const { return Occurrences.begin(); }
		const_iterator end() const { return Occurrences.end(); }

		// Adds a symbol occurrence.
		// This is a deep copy: underlying FileURI will be owned by the slab.
		void insert(const SymbolID &SymID, const SymbolOccurrence &Occurrence);

		llvm::ArrayRef<SymbolOccurrence> find(const SymbolID &ID) const {
		sammccall: assert frozen? Looking up in a non-frozen array is probably a mistake, and if we choose to optimize this, it probably won't be possible.
		assert(Frozen && "SymbolOccurrenceSlab must be frozen before looking up!");
		sammccall: return Occurrences.lookup(ID)?
		hokein: `DenseMap::lookup` returns a copy of the `Value` (a `vector`), which doesn't suit our use case :( We return an `ArrayRef`, which would then refer to a local `vector` object.
		auto It = Occurrences.find(ID);
		if (It == Occurrences.end())
		return {};
		return It->second;
		}

		void freeze();

		private:
		bool Frozen = false;
		llvm::BumpPtrAllocator Arena;
		llvm::UniqueStringSaver UniqueStrings;
		llvm::DenseMap<SymbolID, std::vector<SymbolOccurrence>> Occurrences;
		};
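hokein's reply boils down to this: `find` must return a view into storage owned by the slab, not into a temporary copy. A minimal standard-C++ sketch of that pattern, assuming a hand-rolled `View` in place of `llvm::ArrayRef` and an `int` payload in place of `SymbolOccurrence` (`View` and `OccurrenceTable` are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// A tiny non-owning view, standing in for llvm::ArrayRef (hypothetical).
template <typename T> struct View {
  const T *Data = nullptr;
  std::size_t Size = 0;
  const T *begin() const { return Data; }
  const T *end() const { return Data + Size; }
  bool empty() const { return Size == 0; }
};

class OccurrenceTable {
public:
  void insert(const std::string &ID, int Occurrence) {
    assert(!Frozen && "cannot insert after freeze()");
    Map[ID].push_back(Occurrence);
  }
  void freeze() { Frozen = true; }

  // Returns a view into the vector stored in the map, so nothing is copied.
  // A view over a by-value lookup result would dangle once the copy died.
  View<int> find(const std::string &ID) const {
    assert(Frozen && "look up only after freeze()");
    auto It = Map.find(ID);
    if (It == Map.end())
      return {};
    return {It->second.data(), It->second.size()};
  }

private:
  bool Frozen = false;
  std::unordered_map<std::string, std::vector<int>> Map;
};
```

The view stays valid only while the table is alive and unmodified, which is one reason the slab asserts it is frozen before lookups.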

struct FuzzyFindRequest {		struct FuzzyFindRequest {
/// \brief A query string for the fuzzy find. This is matched against symbols'		/// \brief A query string for the fuzzy find. This is matched against symbols'
/// un-qualified identifiers and should not contain qualifiers like "::".		/// un-qualified identifiers and should not contain qualifiers like "::".
std::string Query;		std::string Query;
/// \brief If this is non-empty, symbols must be in at least one of the scopes		/// \brief If this is non-empty, symbols must be in at least one of the scopes
/// (e.g. namespaces) excluding nested scopes. For example, if a scope "xyz::"		/// (e.g. namespaces) excluding nested scopes. For example, if a scope "xyz::"
/// is provided, the matched symbols must be defined in namespace xyz but not		/// is provided, the matched symbols must be defined in namespace xyz but not
…

clangd/index/Index.cpp

… SymbolSlab SymbolSlab::Builder::build() && {
// We may have unused strings from overwritten symbols. Build a new arena.		// We may have unused strings from overwritten symbols. Build a new arena.
BumpPtrAllocator NewArena;		BumpPtrAllocator NewArena;
llvm::UniqueStringSaver Strings(NewArena);		llvm::UniqueStringSaver Strings(NewArena);
for (auto &S : Symbols)		for (auto &S : Symbols)
own(S, Strings, NewArena);		own(S, Strings, NewArena);
return SymbolSlab(std::move(NewArena), std::move(Symbols));		return SymbolSlab(std::move(NewArena), std::move(Symbols));
}		}

		raw_ostream &operator<<(raw_ostream &OS, SymbolOccurrenceKind K) {
		if (K == SymbolOccurrenceKind::Unknown)
		return OS << "Unknown";
		static const std::vector<const char *> Messages = {"Decl", "Def", "Ref"};
		ilya-biryukov: NIT: remove the lambda? Using `<` is the default.
		bool VisitedOnce = false;
		for (unsigned I = 0; I < Messages.size(); ++I) {
		if (static_cast<uint8_t>(K) & 1u << I) {
		if (VisitedOnce)
		OS << ", ";
		OS << Messages[I];
		ilya-biryukov: NIT: remove the lambda? Using `==` is the default.
		VisitedOnce = true;
		}
		}
		return OS;
		}
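The `operator<<` above prints every set bit of the kind as a comma-separated list. A standalone version, using `std::ostream` instead of `llvm::raw_ostream`; the bit values are an assumption here, chosen to match the Decl/Def/Ref order of the message table, and `toString` is a hypothetical helper added for testing:

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Assumed bit layout, mirroring the "Decl", "Def", "Ref" message order.
enum class SymbolOccurrenceKind : uint8_t {
  Unknown = 0,
  Declaration = 1 << 0,
  Definition = 1 << 1,
  Reference = 1 << 2,
};

inline SymbolOccurrenceKind operator|(SymbolOccurrenceKind L,
                                      SymbolOccurrenceKind R) {
  return static_cast<SymbolOccurrenceKind>(static_cast<uint8_t>(L) |
                                           static_cast<uint8_t>(R));
}

// Prints the set bits as a comma-separated list, e.g. "Decl, Ref".
std::ostream &operator<<(std::ostream &OS, SymbolOccurrenceKind K) {
  if (K == SymbolOccurrenceKind::Unknown)
    return OS << "Unknown";
  static const char *const Messages[] = {"Decl", "Def", "Ref"};
  bool VisitedOnce = false;
  for (unsigned I = 0; I < 3; ++I) {
    if (static_cast<uint8_t>(K) & (1u << I)) {
      if (VisitedOnce)
        OS << ", ";
      OS << Messages[I];
      VisitedOnce = true;
    }
  }
  return OS;
}

// Hypothetical helper: renders a kind to a string.
std::string toString(SymbolOccurrenceKind K) {
  std::ostringstream OS;
  OS << K;
  return OS.str();
}
```

For example, `toString(SymbolOccurrenceKind::Declaration | SymbolOccurrenceKind::Reference)` yields `"Decl, Ref"`, which is more readable in test failure messages than the raw number.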

		llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,
		const SymbolOccurrence &Occurrence) {
		OS << Occurrence.Location << ":" << Occurrence.Kind;
		return OS;
		}

		void SymbolOccurrenceSlab::insert(const SymbolID &SymID,
		ilya-biryukov: Is this used for debugging? In that case, maybe consider having a user-readable representation instead of the number?
		const SymbolOccurrence &Occurrence) {
		assert(!Frozen &&
		"Can't insert a symbol occurrence after the slab has been frozen!");
		auto &SymOccurrences = Occurrences[SymID];
		SymOccurrences.push_back(Occurrence);
		SymOccurrences.back().Location.FileURI =
		UniqueStrings.save(Occurrence.Location.FileURI);
		}

		void SymbolOccurrenceSlab::freeze() {
		// Deduplicate symbol occurrences.
		for (auto &IDAndOccurrence : Occurrences) {
		auto &Occurrence = IDAndOccurrence.getSecond();
		std::sort(Occurrence.begin(), Occurrence.end());
		Occurrence.erase(std::unique(Occurrence.begin(), Occurrence.end()),
		Occurrence.end());
		}
		Frozen = true;
		}

} // namespace clangd		} // namespace clangd
} // namespace clang		} // namespace clang
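`SymbolOccurrenceSlab::insert` deep-copies each file URI into the slab, and `freeze()` sorts and deduplicates each symbol's occurrences. A standard-C++ sketch of the same idea, assuming a `std::set<std::string>` as the string pool in place of `llvm::UniqueStringSaver` and a simplified `Occurrence` without ranges or kinds (`Occurrence` and `OccurrenceSlab` here are hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <string_view>
#include <tuple>
#include <vector>

struct Occurrence {
  std::string_view FileURI; // Non-owning; points into the slab's string pool.
  unsigned Line = 0;
  unsigned Column = 0;
};

inline bool operator<(const Occurrence &L, const Occurrence &R) {
  return std::tie(L.FileURI, L.Line, L.Column) <
         std::tie(R.FileURI, R.Line, R.Column);
}
inline bool operator==(const Occurrence &L, const Occurrence &R) {
  return std::tie(L.FileURI, L.Line, L.Column) ==
         std::tie(R.FileURI, R.Line, R.Column);
}

class OccurrenceSlab {
public:
  OccurrenceSlab() = default;
  // Copying would leave FileURI views pointing into the source's pool.
  OccurrenceSlab(const OccurrenceSlab &) = delete;
  OccurrenceSlab &operator=(const OccurrenceSlab &) = delete;
  OccurrenceSlab(OccurrenceSlab &&) = default; // std::set moves its nodes.

  void insert(const std::string &ID, Occurrence O) {
    assert(!Frozen && "cannot insert after freeze()");
    // Deep copy: intern the URI so the caller's buffer may go away.
    O.FileURI = *Strings.insert(std::string(O.FileURI)).first;
    Occurrences[ID].push_back(O);
  }

  // Sorts each symbol's occurrences and drops exact duplicates.
  void freeze() {
    for (auto &Entry : Occurrences) {
      auto &V = Entry.second;
      std::sort(V.begin(), V.end());
      V.erase(std::unique(V.begin(), V.end()), V.end());
    }
    Frozen = true;
  }

  const std::vector<Occurrence> &find(const std::string &ID) const {
    assert(Frozen && "look up only after freeze()");
    static const std::vector<Occurrence> Empty;
    auto It = Occurrences.find(ID);
    return It == Occurrences.end() ? Empty : It->second;
  }

private:
  bool Frozen = false;
  std::set<std::string> Strings; // Node-based, so element addresses are stable.
  std::map<std::string, std::vector<Occurrence>> Occurrences;
};
```

`std::unique` only removes adjacent duplicates, which is why the sort has to come first, as it does in the patch.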

clangd/index/SymbolCollector.h

	…
	/// See also shouldCollectSymbol(...).			/// See also shouldCollectSymbol(...).
	///			///
	/// Clients (e.g. clangd) can use SymbolCollector together with			/// Clients (e.g. clangd) can use SymbolCollector together with
	/// index::indexTopLevelDecls to retrieve all symbols when the source file is			/// index::indexTopLevelDecls to retrieve all symbols when the source file is
	/// changed.			/// changed.
	class SymbolCollector : public index::IndexDataConsumer {			class SymbolCollector : public index::IndexDataConsumer {
	public:			public:
	struct Options {			struct Options {
	/// When symbol paths cannot be resolved to absolute paths (e.g. files in			/// When symbol paths cannot be resolved to absolute paths (e.g. files in
				sammccall: Not sure this split is justified. If IDs go away (see below), all that's left can be represented in a SymbolOccurrenceKind filter (which is 0 to collect no occurrences).
	/// VFS that does not have absolute path), combine the fallback directory			/// VFS that does not have absolute path), combine the fallback directory
	/// with symbols' paths to get absolute paths. This must be an absolute			/// with symbols' paths to get absolute paths. This must be an absolute
	/// path.			/// path.
	std::string FallbackDir;			std::string FallbackDir;
	/// Specifies URI schemes that can be used to generate URIs for file paths			/// Specifies URI schemes that can be used to generate URIs for file paths
	/// in symbols. The list of schemes will be tried in order until a working			/// in symbols. The list of schemes will be tried in order until a working
	/// scheme is found. If no scheme works, symbol location will be dropped.			/// scheme is found. If no scheme works, symbol location will be dropped.
	std::vector<std::string> URISchemes = {"file"};			std::vector<std::string> URISchemes = {"file"};
	bool CollectIncludePath = false;			bool CollectIncludePath = false;
	/// If set, this is used to map symbol #include path to a potentially			/// If set, this is used to map symbol #include path to a potentially
	/// different #include path.			/// different #include path.
	const CanonicalIncludes *Includes = nullptr;			const CanonicalIncludes *Includes = nullptr;
	// Populate the Symbol.References field.			// Populate the Symbol.References field.
	bool CountReferences = false;			bool CountReferences = false;
				/// The symbol occurrence kind that will be collected.
				/// If not set (Unknown), SymbolCollector will not collect any symbol
				/// occurrences.
				SymbolOccurrenceKind OccurrenceFilter = SymbolOccurrenceKind::Unknown;
				sammccall: This should be next to OccurrenceFilter; they're very closely related (the name mismatch is a little unfortunate).
	// Every symbol collected will be stamped with this origin.			// Every symbol collected will be stamped with this origin.
				ilya-biryukov: Could you elaborate on what this option will be used for? How do we know in advance which symbols we're interested in?
				hokein: This is used for finding references in the AST as part of the xref implementation. Basically, the workflow would be:
				1. find the SymbolIDs of the symbol under the cursor, using `DeclarationAndMacrosFinder`;
				2. run SymbolCollector to find all occurrences in the main AST of all SymbolIDs from #1;
				3. query the index to get more occurrences;
				4. merge them.
				ilya-biryukov: Can we instead find all the occurrences in `DeclarationAndMacrosFinder` directly? An extra run of `SymbolCollector` means another AST traversal, which is slow by itself, and SymbolCollector is designed for a much hairier problem; its interface is just not well suited to collecting only occurrences. The latter seems to be a simpler problem, and we can have a simpler interface to solve it (possibly shared between SymbolCollector and DeclarationAndMacrosFinder). WDYT?
				sammccall: Yeah, I don't think we need this. For "find references in the AST" we have an implementation in XRefs for highlights, which we don't need to share.
	SymbolOrigin Origin = SymbolOrigin::Unknown;			SymbolOrigin Origin = SymbolOrigin::Unknown;
	/// Collect macros.			/// Collect macros.
	/// Note that SymbolCollector must be run with preprocessor in order to			/// Note that SymbolCollector must be run with preprocessor in order to
	/// collect macros. For example, `indexTopLevelDecls` will not index any			/// collect macros. For example, `indexTopLevelDecls` will not index any
	/// macro even if this is true.			/// macro even if this is true.
	bool CollectMacro = false;			bool CollectMacro = false;
	};			};

				ioeric: Use `llvm::Optional`?
	SymbolCollector(Options Opts);			SymbolCollector(Options Opts);

				sammccall: Collecting symbols doesn't actually need to be optional, I think: it's the core responsibility of this class, and "find occurrences of a decl in an AST" can be implemented more easily in other ways.
	/// Returns true if \p ND should be collected.			/// Returns true if \p ND should be collected.
	/// AST matchers require non-const ASTContext.			/// AST matchers require non-const ASTContext.
	static bool shouldCollectSymbol(const NamedDecl &ND, ASTContext &ASTCtx,			static bool shouldCollectSymbol(const NamedDecl &ND, ASTContext &ASTCtx,
	const Options &Opts);			const Options &Opts);

	void initialize(ASTContext &Ctx) override;			void initialize(ASTContext &Ctx) override;

	void setPreprocessor(std::shared_ptr<Preprocessor> PP) override {			void setPreprocessor(std::shared_ptr<Preprocessor> PP) override {
	this->PP = std::move(PP);			this->PP = std::move(PP);
	}			}

	bool			bool
	handleDeclOccurence(const Decl *D, index::SymbolRoleSet Roles,			handleDeclOccurence(const Decl *D, index::SymbolRoleSet Roles,
	ArrayRef<index::SymbolRelation> Relations,			ArrayRef<index::SymbolRelation> Relations,
	SourceLocation Loc,			SourceLocation Loc,
	index::IndexDataConsumer::ASTNodeInfo ASTNode) override;			index::IndexDataConsumer::ASTNodeInfo ASTNode) override;

	bool handleMacroOccurence(const IdentifierInfo *Name, const MacroInfo *MI,			bool handleMacroOccurence(const IdentifierInfo *Name, const MacroInfo *MI,
	index::SymbolRoleSet Roles,			index::SymbolRoleSet Roles,
	SourceLocation Loc) override;			SourceLocation Loc) override;

	SymbolSlab takeSymbols() { return std::move(Symbols).build(); }			SymbolSlab takeSymbols() { return std::move(Symbols).build(); }

				SymbolOccurrenceSlab takeOccurrences() {
				return std::move(SymbolOccurrences);
				}

	void finish() override;			void finish() override;

	private:			private:
	const Symbol *addDeclaration(const NamedDecl &, SymbolID);			const Symbol *addDeclaration(const NamedDecl &, SymbolID);
	void addDefinition(const NamedDecl &, const Symbol &DeclSymbol);			void addDefinition(const NamedDecl &, const Symbol &DeclSymbol);

	// All Symbols collected from the AST.			// All Symbols collected from the AST.
	SymbolSlab::Builder Symbols;			SymbolSlab::Builder Symbols;
	ASTContext *ASTCtx;			ASTContext *ASTCtx;
	std::shared_ptr<Preprocessor> PP;			std::shared_ptr<Preprocessor> PP;
	std::shared_ptr<GlobalCodeCompletionAllocator> CompletionAllocator;			std::shared_ptr<GlobalCodeCompletionAllocator> CompletionAllocator;
	std::unique_ptr<CodeCompletionTUInfo> CompletionTUInfo;			std::unique_ptr<CodeCompletionTUInfo> CompletionTUInfo;
	Options Opts;			Options Opts;
				using DeclOccurrence = std::pair<SourceLocation, index::SymbolRoleSet>;
	// Symbols referenced from the current TU, flushed on finish().			// Symbols referenced from the current TU, flushed on finish().
	llvm::DenseSet<const NamedDecl *> ReferencedDecls;			llvm::DenseSet<const NamedDecl *> ReferencedDecls;
	llvm::DenseSet<const IdentifierInfo *> ReferencedMacros;			llvm::DenseSet<const IdentifierInfo *> ReferencedMacros;
				llvm::DenseMap<const NamedDecl *, std::vector<DeclOccurrence>>
				DeclOccurrences;
	// Maps canonical declaration provided by clang to canonical declaration for			// Maps canonical declaration provided by clang to canonical declaration for
	// an index symbol, if clangd prefers a different declaration than that			// an index symbol, if clangd prefers a different declaration than that
	// provided by clang. For example, friend declaration might be considered			// provided by clang. For example, friend declaration might be considered
	// canonical by clang but should not be considered canonical in the index			// canonical by clang but should not be considered canonical in the index
	// unless it's a definition.			// unless it's a definition.
	llvm::DenseMap<const Decl *, const Decl *> CanonicalDecls;			llvm::DenseMap<const Decl *, const Decl *> CanonicalDecls;
				// All symbol occurrences collected from the AST, assembled on finish().
				// Only symbols declared in the preamble (from #includes) and references from the
				// main file will be included.
				sammccall: Please move this next to ReferencedDecls/ReferencedMacros so the comment applies to it too.
				SymbolOccurrenceSlab SymbolOccurrences;
	};			};

	} // namespace clangd			} // namespace clangd
	} // namespace clang			} // namespace clang

clangd/index/SymbolCollector.cpp

… getIncludeHeader(llvm::StringRef QName, const SourceManager &SM,
if (Opts.Includes) {		if (Opts.Includes) {
Header = Opts.Includes->mapHeader(Headers, QName);		Header = Opts.Includes->mapHeader(Headers, QName);
if (Header.startswith("<") || Header.startswith("\""))		if (Header.startswith("<") || Header.startswith("\""))
return Header.str();		return Header.str();
}		}
return toURI(SM, Header, Opts);		return toURI(SM, Header, Opts);
}		}

// Return the symbol location of the token at \p Loc.		// Return the symbol range of the token at \p TokLoc.
		std::pair<SymbolLocation::Position, SymbolLocation::Position>
		getTokenRange(SourceLocation TokLoc, const SourceManager &SM,
		const LangOptions &LangOpts) {
		auto CreatePosition = [&SM](SourceLocation Loc) {
		auto LSPLoc = sourceLocToPosition(SM, Loc);
		SymbolLocation::Position Pos;
		Pos.Line = LSPLoc.line;
		Pos.Column = LSPLoc.character;
		return Pos;
		};

		auto TokenLength = clang::Lexer::MeasureTokenLength(TokLoc, SM, LangOpts);
		return {CreatePosition(TokLoc),
		CreatePosition(TokLoc.getLocWithOffset(TokenLength))};
		}
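`getTokenRange` turns a token's start location plus its lexed length into a start/end position pair. The arithmetic can be illustrated without Clang; this simplified sketch works on a raw buffer offset, counts columns in bytes, and assumes the token does not contain a line break. The real code delegates to `sourceLocToPosition` and `Lexer::MeasureTokenLength`; `offsetToPosition` and `tokenRange` below are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>

struct Position {
  unsigned Line = 0;   // Zero-based.
  unsigned Column = 0; // Zero-based, counted in bytes in this sketch.
};

// Converts a byte offset in Buffer to a zero-based (line, column) position.
Position offsetToPosition(std::string_view Buffer, std::size_t Offset) {
  assert(Offset <= Buffer.size());
  Position P;
  for (std::size_t I = 0; I < Offset; ++I) {
    if (Buffer[I] == '\n') {
      ++P.Line;
      P.Column = 0;
    } else {
      ++P.Column;
    }
  }
  return P;
}

// Half-open range [start, end) of a token, given its offset and length.
std::pair<Position, Position> tokenRange(std::string_view Buffer,
                                         std::size_t TokOffset,
                                         std::size_t TokLength) {
  return {offsetToPosition(Buffer, TokOffset),
          offsetToPosition(Buffer, TokOffset + TokLength)};
}
```

For `"int x;\nvoid foo();\n"`, the token `foo` starts at byte offset 12 with length 3, giving the range (1, 5) to (1, 8).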

		// Return the symbol location of the token at \p TokLoc.
llvm::Optional<SymbolLocation>		llvm::Optional<SymbolLocation>
getTokenLocation(SourceLocation TokLoc, const SourceManager &SM,		getTokenLocation(SourceLocation TokLoc, const SourceManager &SM,
const SymbolCollector::Options &Opts,		const SymbolCollector::Options &Opts,
const clang::LangOptions &LangOpts,		const clang::LangOptions &LangOpts,
std::string &FileURIStorage) {		std::string &FileURIStorage) {
auto U = toURI(SM, SM.getFilename(TokLoc), Opts);		auto U = toURI(SM, SM.getFilename(TokLoc), Opts);
if (!U)		if (!U)
return llvm::None;		return llvm::None;
FileURIStorage = std::move(*U);		FileURIStorage = std::move(*U);
SymbolLocation Result;		SymbolLocation Result;
Result.FileURI = FileURIStorage;		Result.FileURI = FileURIStorage;
auto TokenLength = clang::Lexer::MeasureTokenLength(TokLoc, SM, LangOpts);		auto Range = getTokenRange(TokLoc, SM, LangOpts);
		Result.Start = Range.first;
auto CreatePosition = [&SM](SourceLocation Loc) {		Result.End = Range.second;
auto LSPLoc = sourceLocToPosition(SM, Loc);
SymbolLocation::Position Pos;
Pos.Line = LSPLoc.line;
Pos.Column = LSPLoc.character;
return Pos;
};

Result.Start = CreatePosition(TokLoc);
auto EndLoc = TokLoc.getLocWithOffset(TokenLength);
Result.End = CreatePosition(EndLoc);

return std::move(Result);		return std::move(Result);
}		}

// Checks whether \p ND is a definition of a TagDecl (class/struct/enum/union)		// Checks whether \p ND is a definition of a TagDecl (class/struct/enum/union)
// in a header file, in which case clangd would prefer to use ND as a canonical		// in a header file, in which case clangd would prefer to use ND as a canonical
// declaration.		// declaration.
// FIXME: handle symbol types that are not TagDecl (e.g. functions), if using		// FIXME: handle symbol types that are not TagDecl (e.g. functions), if using
// the first seen declaration as canonical declaration is not a good enough		// the first seen declaration as canonical declaration is not a good enough
// heuristic.		// heuristic.
bool isPreferredDeclaration(const NamedDecl &ND, index::SymbolRoleSet Roles) {		bool isPreferredDeclaration(const NamedDecl &ND, index::SymbolRoleSet Roles) {
using namespace clang::ast_matchers;		using namespace clang::ast_matchers;
return (Roles & static_cast<unsigned>(index::SymbolRole::Definition)) &&		return (Roles & static_cast<unsigned>(index::SymbolRole::Definition)) &&
llvm::isa<TagDecl>(&ND) &&		llvm::isa<TagDecl>(&ND) &&
match(decl(isExpansionInMainFile()), ND, ND.getASTContext()).empty();		match(decl(isExpansionInMainFile()), ND, ND.getASTContext()).empty();
}		}

		SymbolOccurrenceKind toOccurrenceKind(index::SymbolRoleSet Roles) {
		sammccall: nit: toOccurrenceKind
		return static_cast<SymbolOccurrenceKind>(
		static_cast<unsigned>(AllOccurrenceKinds) & Roles);
		sammccall: If you want to filter out the unsupported bits, maybe just add an explicit `AllOccurrenceKinds` constant to the header file and `return AllOccurrenceKinds & Roles` here (plus casts)?
		}
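`toOccurrenceKind` works because the occurrence-kind bits are assumed to line up with the low bits of clang's `index::SymbolRole` (Declaration = 1 << 0, Definition = 1 << 1, Reference = 1 << 2), so masking with `AllOccurrenceKinds` drops every other role bit. A standalone sketch of that masking and of the `OccurrenceFilter & Roles` check in `handleDeclOccurence`; `OtherRoleBit` and `shouldRecord` are hypothetical, and `AllOccurrenceKinds` is a plain `unsigned` here rather than a `SymbolOccurrenceKind`:

```cpp
#include <cstdint>

// Assumed to share its bit values with the low bits of index::SymbolRole.
enum class SymbolOccurrenceKind : uint8_t {
  Unknown = 0,
  Declaration = 1 << 0,
  Definition = 1 << 1,
  Reference = 1 << 2,
};

// Declaration | Definition | Reference.
constexpr unsigned AllOccurrenceKinds = 0x7;

// Hypothetical example of an unrelated role bit that must be filtered out.
constexpr unsigned OtherRoleBit = 1u << 3;

// Keeps only the role bits that have a SymbolOccurrenceKind counterpart.
SymbolOccurrenceKind toOccurrenceKind(unsigned Roles) {
  return static_cast<SymbolOccurrenceKind>(AllOccurrenceKinds & Roles);
}

// Mirrors the OccurrenceFilter check: an Unknown (zero) filter matches nothing.
bool shouldRecord(SymbolOccurrenceKind Filter, unsigned Roles) {
  return (static_cast<unsigned>(Filter) & Roles) != 0;
}
```

With a zero default filter, the same field doubles as the on/off switch for occurrence collection, which is what sammccall suggested in place of a separate option.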

} // namespace		} // namespace

SymbolCollector::SymbolCollector(Options Opts) : Opts(std::move(Opts)) {}		SymbolCollector::SymbolCollector(Options Opts) : Opts(std::move(Opts)) {}
		ioeric: I don't see a strong reason for separating `CollectOccurrence` and `CollectSymbol`. Some pieces are only used by one of them, but they seem cheap enough to ignore. Intuitively, reference collection could just be a member function of `SymbolCollector`.

void SymbolCollector::initialize(ASTContext &Ctx) {		void SymbolCollector::initialize(ASTContext &Ctx) {
ASTCtx = &Ctx;		ASTCtx = &Ctx;
CompletionAllocator = std::make_shared<GlobalCodeCompletionAllocator>();		CompletionAllocator = std::make_shared<GlobalCodeCompletionAllocator>();
CompletionTUInfo =		CompletionTUInfo =
llvm::make_unique<CodeCompletionTUInfo>(CompletionAllocator);		llvm::make_unique<CodeCompletionTUInfo>(CompletionAllocator);
}		}

… bool SymbolCollector::shouldCollectSymbol(const NamedDecl &ND,
// In real world projects, we have a relatively large set of header files		// In real world projects, we have a relatively large set of header files
// that define static variables (like "static const int A = 1;"), we still		// that define static variables (like "static const int A = 1;"), we still
// want to collect these symbols, although they cause potential ODR		// want to collect these symbols, although they cause potential ODR
// violations.		// violations.
if (ND.isInAnonymousNamespace())		if (ND.isInAnonymousNamespace())
return false;		return false;

// We want most things but not "local" symbols such as symbols inside		// We want most things but not "local" symbols such as symbols inside
// FunctionDecl, BlockDecl, ObjCMethodDecl and OMPDeclareReductionDecl.		// FunctionDecl, BlockDecl, ObjCMethodDecl and OMPDeclareReductionDecl.
		ilya-biryukov: NIT: maybe use early exits and inverted conditions to keep the nesting down?
// FIXME: Need a matcher for ExportDecl in order to include symbols declared		// FIXME: Need a matcher for ExportDecl in order to include symbols declared
// within an export.		// within an export.
auto InNonLocalContext = hasDeclContext(anyOf(		auto InNonLocalContext = hasDeclContext(anyOf(
translationUnitDecl(), namespaceDecl(), linkageSpecDecl(), recordDecl(),		translationUnitDecl(), namespaceDecl(), linkageSpecDecl(), recordDecl(),
enumDecl(), objcProtocolDecl(), objcInterfaceDecl(), objcCategoryDecl(),		enumDecl(), objcProtocolDecl(), objcInterfaceDecl(), objcCategoryDecl(),
objcCategoryImplDecl(), objcImplementationDecl()));		objcCategoryImplDecl(), objcImplementationDecl()));
// Don't index template specializations and expansions in main files.		// Don't index template specializations and expansions in main files.
auto IsSpecialization =		auto IsSpecialization =
… bool SymbolCollector::handleDeclOccurence(
// picked a replacement for D		// picked a replacement for D
if (D->getFriendObjectKind() != Decl::FriendObjectKind::FOK_None)		if (D->getFriendObjectKind() != Decl::FriendObjectKind::FOK_None)
D = CanonicalDecls.try_emplace(D, ASTNode.OrigD).first->second;		D = CanonicalDecls.try_emplace(D, ASTNode.OrigD).first->second;
const NamedDecl *ND = llvm::dyn_cast<NamedDecl>(D);		const NamedDecl *ND = llvm::dyn_cast<NamedDecl>(D);
if (!ND)		if (!ND)
return true;		return true;

// Mark D as referenced if this is a reference coming from the main file.		// Mark D as referenced if this is a reference coming from the main file.
// D may not be an interesting symbol, but it's cheaper to check at the end.		// D may not be an interesting symbol, but it's cheaper to check at the end.
		ilya-biryukov: If we already have `Options` here, why have an extra `CollectorSymbolOptions`?
auto &SM = ASTCtx->getSourceManager();		auto &SM = ASTCtx->getSourceManager();
		auto SpellingLoc = SM.getSpellingLoc(Loc);
if (Opts.CountReferences &&		if (Opts.CountReferences &&
(Roles & static_cast<unsigned>(index::SymbolRole::Reference)) &&		(Roles & static_cast<unsigned>(index::SymbolRole::Reference)) &&
SM.getFileID(SM.getSpellingLoc(Loc)) == SM.getMainFileID())		SM.getFileID(SpellingLoc) == SM.getMainFileID())
ReferencedDecls.insert(ND);		ReferencedDecls.insert(ND);
		sammccall: Note that here we've done basically all the work needed to record the occurrence. If you add a DenseMap<Decl*, {SourceLocation, SymbolRole}>, you'll have enough info at the end to fill in the occurrences, like we do with ReferencedDecls -> references.

		if ((static_cast<unsigned>(Opts.OccurrenceFilter) & Roles) &&
		SM.getFileID(SpellingLoc) == SM.getMainFileID())
		sammccall: Just compute the spelling loc once and reuse it?
		DeclOccurrences[ND].emplace_back(SpellingLoc, Roles);
		sammccall: You get the spelling loc on the previous line to check for the main file, so surely we should be using the spelling loc here?

// Don't continue indexing if this is a mere reference.		// Don't continue indexing if this is a mere reference.
if (!(Roles & static_cast<unsigned>(index::SymbolRole::Declaration) ||		if (!(Roles & static_cast<unsigned>(index::SymbolRole::Declaration) ||
Roles & static_cast<unsigned>(index::SymbolRole::Definition)))		Roles & static_cast<unsigned>(index::SymbolRole::Definition)))
return true;		return true;
if (!shouldCollectSymbol(ND, ASTCtx, Opts))		if (!shouldCollectSymbol(ND, ASTCtx, Opts))
return true;		return true;

auto ID = getSymbolID(ND);		auto ID = getSymbolID(ND);
… if (Opts.CollectMacro) {
for (const IdentifierInfo *II : ReferencedMacros) {		for (const IdentifierInfo *II : ReferencedMacros) {
llvm::SmallString<128> USR;		llvm::SmallString<128> USR;
if (const auto *MI = PP->getMacroDefinition(II).getMacroInfo())		if (const auto *MI = PP->getMacroDefinition(II).getMacroInfo())
if (!index::generateUSRForMacro(II->getName(), MI->getDefinitionLoc(),		if (!index::generateUSRForMacro(II->getName(), MI->getDefinitionLoc(),
PP->getSourceManager(), USR))		PP->getSourceManager(), USR))
IncRef(SymbolID(USR));		IncRef(SymbolID(USR));
}		}
}		}

		const auto &SM = ASTCtx->getSourceManager();
		sammccall: nit: const auto& for clarity, since we're not mutating.
		auto* MainFileEntry = SM.getFileEntryForID(SM.getMainFileID());

		if (auto MainFileURI = toURI(SM, MainFileEntry->getName(), Opts)) {
		std::string MainURI = *MainFileURI;
		for (const auto &It : DeclOccurrences) {
		sammccall: This seems maybe gratuitously inefficient: we're copying the filename and then going through the URI conversion dance for each reference, even though the filename is the same for each. Consider splitting out part of `getTokenLocation` into `getTokenRange(SymbolLocation&)` and only calling that here.
		if (auto ID = getSymbolID(It.first)) {
		if (Symbols.find(*ID)) {
		for (const auto &LocAndRole : It.second) {
		SymbolOccurrence Occurrence;
		auto Range =
		getTokenRange(LocAndRole.first, SM, ASTCtx->getLangOpts());
		Occurrence.Location.Start = Range.first;
		Occurrence.Location.End = Range.second;
		Occurrence.Location.FileURI = MainURI;
		Occurrence.Kind = toOccurrenceKind(LocAndRole.second);
		SymbolOccurrences.insert(*ID, Occurrence);
		}
		}
		}
		}
		} else {
		log("Failed to create URI for main file: {0}", MainFileEntry->getName());
		}

		SymbolOccurrences.freeze();
ReferencedDecls.clear();		ReferencedDecls.clear();
ReferencedMacros.clear();		ReferencedMacros.clear();
		DeclOccurrences.clear();
}		}
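The deferred scheme used here (remember `(SpellingLoc, Roles)` per decl while indexing, then convert in `finish()` only for decls that ended up as collected symbols, computing the main-file URI once) can be sketched without Clang. Everything below is a hypothetical simplification: `DeclKey` stands in for `const NamedDecl *`, locations are pre-resolved line/column pairs with an `InMainFile` flag, and the role bits follow the assumed layout Declaration = 1, Definition = 2, Reference = 4.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the Clang/clangd types involved.
using DeclKey = int; // Stands in for const NamedDecl *.
using SymbolID = std::string;
struct Loc {
  unsigned Line, Column;
  bool InMainFile;
};

struct Occurrence {
  std::string FileURI;
  unsigned Line, Column;
  unsigned Roles;
};

class DeferredOccurrences {
public:
  explicit DeferredOccurrences(unsigned Filter) : Filter(Filter) {}

  // Called for each indexed occurrence: cheap, it only remembers
  // (location, roles) for main-file occurrences that pass the filter.
  void record(DeclKey D, Loc L, unsigned Roles) {
    if ((Filter & Roles) && L.InMainFile)
      Pending[D].push_back({L, Roles});
  }

  // Called once at the end: keeps occurrences only for decls that became
  // symbols, reusing the main-file URI computed once by the caller.
  std::map<SymbolID, std::vector<Occurrence>>
  finish(const std::map<DeclKey, SymbolID> &CollectedSymbols,
         const std::optional<std::string> &MainFileURI) {
    std::map<SymbolID, std::vector<Occurrence>> Result;
    if (MainFileURI) {
      for (const auto &[D, Occs] : Pending) {
        auto It = CollectedSymbols.find(D);
        if (It == CollectedSymbols.end())
          continue; // Not an indexed symbol; drop its occurrences.
        for (const auto &[L, Roles] : Occs)
          Result[It->second].push_back({*MainFileURI, L.Line, L.Column, Roles});
      }
    }
    Pending.clear();
    return Result;
  }

private:
  unsigned Filter;
  std::map<DeclKey, std::vector<std::pair<Loc, unsigned>>> Pending;
};
```

If the main-file URI cannot be created, nothing is emitted, matching the `log(...)` branch above; in both cases the pending map is cleared, like `DeclOccurrences.clear()`.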

const Symbol *SymbolCollector::addDeclaration(const NamedDecl &ND,		const Symbol *SymbolCollector::addDeclaration(const NamedDecl &ND,
SymbolID ID) {		SymbolID ID) {
auto &Ctx = ND.getASTContext();		auto &Ctx = ND.getASTContext();
auto &SM = Ctx.getSourceManager();		auto &SM = Ctx.getSourceManager();

Symbol S;		Symbol S;
…

unittests/clangd/SymbolCollectorTests.cpp

…
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Support/MemoryBuffer.h"		#include "llvm/Support/MemoryBuffer.h"
#include "gmock/gmock.h"		#include "gmock/gmock.h"
#include "gtest/gtest.h"		#include "gtest/gtest.h"

#include <memory>		#include <memory>
#include <string>		#include <string>

		namespace clang {
		namespace clangd {

		namespace {

using testing::AllOf;		using testing::AllOf;
using testing::Eq;		using testing::Eq;
using testing::Field;		using testing::Field;
		using testing::IsEmpty;
using testing::Not;		using testing::Not;
using testing::UnorderedElementsAre;		using testing::UnorderedElementsAre;
using testing::UnorderedElementsAreArray;		using testing::UnorderedElementsAreArray;

// GMock helpers for matching Symbol.		// GMock helpers for matching Symbol.
MATCHER_P(Labeled, Label, "") {		MATCHER_P(Labeled, Label, "") {
return (arg.Name + arg.Signature).str() == Label;		return (arg.Name + arg.Signature).str() == Label;
}		}
	return std::tie(arg.Definition.Start.Line,
arg.Definition.End.Column) ==		arg.Definition.End.Column) ==
std::tie(Pos.start.line, Pos.start.character, Pos.end.line,		std::tie(Pos.start.line, Pos.start.character, Pos.end.line,
Pos.end.character);		Pos.end.character);
}		}
MATCHER_P(Refs, R, "") { return int(arg.References) == R; }		MATCHER_P(Refs, R, "") { return int(arg.References) == R; }
MATCHER_P(ForCodeCompletion, IsIndexedForCodeCompletion, "") {		MATCHER_P(ForCodeCompletion, IsIndexedForCodeCompletion, "") {
return arg.IsIndexedForCodeCompletion == IsIndexedForCodeCompletion;		return arg.IsIndexedForCodeCompletion == IsIndexedForCodeCompletion;
}		}
		MATCHER(OccurrenceRange, "") {
namespace clang {		const SymbolOccurrence &Pos = testing::get<0>(arg);
namespace clangd {		const Range &Range = testing::get<1>(arg);
		return std::tie(Pos.Location.Start.Line, Pos.Location.Start.Column,
namespace {		Pos.Location.End.Line, Pos.Location.End.Column) ==
		std::tie(Range.start.line, Range.start.character, Range.end.line,
		Range.end.character);
		}
		testing::Matcher<const std::vector<SymbolOccurrence> &>
		HaveRanges(const std::vector<Range> Ranges) {
		return testing::UnorderedPointwise(OccurrenceRange(), Ranges);
		}
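`HaveRanges` wraps `testing::UnorderedPointwise`, which succeeds when the actual occurrences can be paired one-to-one with the expected ranges under the `OccurrenceRange` tuple matcher, in any order. Because `OccurrenceRange` is plain field-by-field equality, that amounts to comparing both sides as multisets. A dependency-free sketch of the equivalent check, using a hypothetical `SimpleRange` type and `haveRangesUnordered` helper (neither is part of clangd or gMock):

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical simplified range; the real test compares
// SymbolOccurrence::Location fields against LSP Range fields instead.
struct SimpleRange {
  unsigned StartLine, StartCol, EndLine, EndCol;
  bool operator<(const SimpleRange &O) const {
    return std::tie(StartLine, StartCol, EndLine, EndCol) <
           std::tie(O.StartLine, O.StartCol, O.EndLine, O.EndCol);
  }
  bool operator==(const SimpleRange &O) const {
    return std::tie(StartLine, StartCol, EndLine, EndCol) ==
           std::tie(O.StartLine, O.StartCol, O.EndLine, O.EndCol);
  }
};

// True if Actual and Expected hold the same ranges with the same
// multiplicities, in any order. For an equality predicate this matches
// what UnorderedPointwise(OccurrenceRange(), Expected) accepts.
bool haveRangesUnordered(std::vector<SimpleRange> Actual,
                         std::vector<SimpleRange> Expected) {
  if (Actual.size() != Expected.size())
    return false;
  std::sort(Actual.begin(), Actual.end());
  std::sort(Expected.begin(), Expected.end());
  return Actual == Expected;
}
```

Sorting works here only because the predicate is exact equality; gMock's matcher also handles non-transitive predicates by searching for a bipartite matching.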

class ShouldCollectSymbolTest : public ::testing::Test {		class ShouldCollectSymbolTest : public ::testing::Test {
public:		public:
void build(StringRef HeaderCode, StringRef Code = "") {		void build(StringRef HeaderCode, StringRef Code = "") {
File.HeaderFilename = HeaderName;		File.HeaderFilename = HeaderName;
File.Filename = FileName;		File.Filename = FileName;
File.HeaderCode = HeaderCode;		File.HeaderCode = HeaderCode;
File.Code = Code;		File.Code = Code;
	tooling::ToolInvocation Invocation(
std::make_shared<PCHContainerOperations>());		std::make_shared<PCHContainerOperations>());

InMemoryFileSystem->addFile(TestHeaderName, 0,		InMemoryFileSystem->addFile(TestHeaderName, 0,
llvm::MemoryBuffer::getMemBuffer(HeaderCode));		llvm::MemoryBuffer::getMemBuffer(HeaderCode));
InMemoryFileSystem->addFile(TestFileName, 0,		InMemoryFileSystem->addFile(TestFileName, 0,
llvm::MemoryBuffer::getMemBuffer(MainCode));		llvm::MemoryBuffer::getMemBuffer(MainCode));
Invocation.run();		Invocation.run();
Symbols = Factory->Collector->takeSymbols();		Symbols = Factory->Collector->takeSymbols();
		SymbolOccurrences = Factory->Collector->takeOccurrences();
return true;		return true;
}		}

protected:		protected:
llvm::IntrusiveRefCntPtr<vfs::InMemoryFileSystem> InMemoryFileSystem;		llvm::IntrusiveRefCntPtr<vfs::InMemoryFileSystem> InMemoryFileSystem;
std::string TestHeaderName;		std::string TestHeaderName;
std::string TestHeaderURI;		std::string TestHeaderURI;
std::string TestFileName;		std::string TestFileName;
std::string TestFileURI;		std::string TestFileURI;
SymbolSlab Symbols;		SymbolSlab Symbols;
		SymbolOccurrenceSlab SymbolOccurrences;
SymbolCollector::Options CollectorOpts;		SymbolCollector::Options CollectorOpts;
std::unique_ptr<CommentHandler> PragmaHandler;		std::unique_ptr<CommentHandler> PragmaHandler;
};		};
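The fixture stores the result of `takeOccurrences()` in a `SymbolOccurrenceSlab`. From the calls visible in this diff (`insert(ID, Occurrence)` and `freeze()` in the collector, `find(ID)` in the tests, whose result `IsEmpty()` can match), the slab behaves like a build-then-freeze multimap from symbol ID to occurrences. The sketch below is a hypothetical model of that shape (`OccurrenceSlabSketch`, `Occ`, string keys); in particular, it assumes `freeze()` sorts and de-duplicates, which the real class in `Index.h` may or may not do.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical simplified occurrence: only a start position.
struct Occ {
  unsigned StartLine, StartCol;
  bool operator<(const Occ &O) const {
    return std::tie(StartLine, StartCol) < std::tie(O.StartLine, O.StartCol);
  }
  bool operator==(const Occ &O) const {
    return StartLine == O.StartLine && StartCol == O.StartCol;
  }
};

class OccurrenceSlabSketch {
public:
  // Records one occurrence for ID; only valid before freeze().
  void insert(const std::string &ID, const Occ &O) {
    assert(!Frozen && "insert after freeze");
    Occurrences[ID].push_back(O);
  }
  // Assumed behavior: sort and de-duplicate each symbol's occurrences,
  // then disallow further inserts.
  void freeze() {
    for (auto &Entry : Occurrences) {
      std::vector<Occ> &V = Entry.second;
      std::sort(V.begin(), V.end());
      V.erase(std::unique(V.begin(), V.end()), V.end());
    }
    Frozen = true;
  }
  // Returns the occurrences for ID, or an empty vector if none were recorded.
  const std::vector<Occ> &find(const std::string &ID) const {
    static const std::vector<Occ> Empty;
    auto It = Occurrences.find(ID);
    return It == Occurrences.end() ? Empty : It->second;
  }

private:
  std::map<std::string, std::vector<Occ>> Occurrences;
  bool Frozen = false;
};
```

Keeping occurrences in a separate slab rather than on `Symbol` matches the rationale given earlier in the review: callers must go through the index to query them.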

TEST_F(SymbolCollectorTest, CollectSymbols) {		TEST_F(SymbolCollectorTest, CollectSymbols) {
const std::string Header = R"(		const std::string Header = R"(
class Foo {		class Foo {
Foo() {}		Foo() {}
	EXPECT_THAT(
DefRange(Main.range("clsdef"))),		DefRange(Main.range("clsdef"))),
AllOf(QName("print"), DeclRange(Header.range("printdecl")),		AllOf(QName("print"), DeclRange(Header.range("printdecl")),
DefRange(Main.range("printdef"))),		DefRange(Main.range("printdef"))),
AllOf(QName("Z"), DeclRange(Header.range("zdecl"))),		AllOf(QName("Z"), DeclRange(Header.range("zdecl"))),
AllOf(QName("foo"), DeclRange(Header.range("foodecl")))		AllOf(QName("foo"), DeclRange(Header.range("foodecl")))
));		));
}		}

		TEST_F(SymbolCollectorTest, Occurrences) {
		Annotations Header(R"(
		class $foo[[Foo]] {
		public:
		$foo[[Foo]]() {}
		$foo[[Foo]](int);
		};
		class $bar[[Bar]];
		void $func[[func]]();
		)");
		Annotations Main(R"(
		class $bar[[Bar]] {};

		void $func[[func]]();

		void fff() {
		$foo[[Foo]] foo;
		$bar[[Bar]] bar;
		$func[[func]]();
		int abc = 0;
		$foo[[Foo]] foo2 = abc;
		}
		)");
		Annotations SymbolsOnlyInMainCode(R"(
		int a;
		void b() {}
		static const int c = 0;
		class d {};
		)");
		CollectorOpts.OccurrenceFilter = SymbolOccurrenceKind::Declaration |
		SymbolOccurrenceKind::Definition |
		SymbolOccurrenceKind::Reference;
		runSymbolCollector(Header.code(),
		(Main.code() + SymbolsOnlyInMainCode.code()).str());
		auto HeaderSymbols = TestTU::withHeaderCode(Header.code()).headerSymbols();

		EXPECT_THAT(SymbolOccurrences.find(findSymbol(Symbols, "Foo").ID),
		HaveRanges(Main.ranges("foo")));
		EXPECT_THAT(SymbolOccurrences.find(findSymbol(Symbols, "Bar").ID),
		sammccall: This is cute. If possible, consider adding a matcher factory function for readability here, so you can write `EXPECT_THAT(..., HaveRanges(Main.ranges("foo")))`.
		hokein: Wrapped this into `HaveRanges`.
		HaveRanges(Main.ranges("bar")));
		EXPECT_THAT(SymbolOccurrences.find(findSymbol(Symbols, "func").ID),
		HaveRanges(Main.ranges("func")));

		// Retrieve IDs for symbols defined only in the main file, and verify that
		// no occurrences are collected for them.
		auto MainSymbols =
		TestTU::withHeaderCode(SymbolsOnlyInMainCode.code()).headerSymbols();
		EXPECT_THAT(SymbolOccurrences.find(findSymbol(MainSymbols, "a").ID),
		IsEmpty());
		EXPECT_THAT(SymbolOccurrences.find(findSymbol(MainSymbols, "b").ID),
		IsEmpty());
		EXPECT_THAT(SymbolOccurrences.find(findSymbol(MainSymbols, "c").ID),
		IsEmpty());
		}
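The test enables every kind by OR-ing `SymbolOccurrenceKind` values into `CollectorOpts.OccurrenceFilter`, which implies the kind is used as a bit mask. The enum's definition and the collector's exact filtering check are not part of this excerpt, so the sketch below assumes a hypothetical `OccurrenceKind` scoped enum with bitwise operators and a hypothetical `passesFilter` helper, purely to illustrate the pattern.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a bit-flag occurrence kind; the real enumerator
// values of SymbolOccurrenceKind are not shown in this diff.
enum class OccurrenceKind : std::uint8_t {
  Unknown = 0,
  Declaration = 1 << 0,
  Definition = 1 << 1,
  Reference = 1 << 2,
};

// Scoped enums have no built-in bitwise operators, so define them.
inline OccurrenceKind operator|(OccurrenceKind L, OccurrenceKind R) {
  return static_cast<OccurrenceKind>(static_cast<std::uint8_t>(L) |
                                     static_cast<std::uint8_t>(R));
}
inline OccurrenceKind operator&(OccurrenceKind L, OccurrenceKind R) {
  return static_cast<OccurrenceKind>(static_cast<std::uint8_t>(L) &
                                     static_cast<std::uint8_t>(R));
}

// Keeps an occurrence if any of its kind bits is enabled in the filter.
inline bool passesFilter(OccurrenceKind Filter, OccurrenceKind Kind) {
  return (Filter & Kind) != OccurrenceKind::Unknown;
}
```

A filter of only `Declaration` would therefore drop plain references such as the `$foo[[Foo]] foo;` uses in `Main`.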

TEST_F(SymbolCollectorTest, References) {		TEST_F(SymbolCollectorTest, References) {
const std::string Header = R"(		const std::string Header = R"(
class W;		class W;
class X {};		class X {};
class Y;		class Y;
class Z {}; // not used anywhere		class Z {}; // not used anywhere
Y* y = nullptr; // used in header doesn't count		Y* y = nullptr; // used in header doesn't count
#define GLOBAL_Z(name) Z name;		#define GLOBAL_Z(name) Z name;