This is an archive of the discontinued LLVM Phabricator instance.

clang-tools-extra/clangd/index/Index.h
26	One thing I'm wondering about is: would it be better to just use `clang::index::SymbolRole` here (which has `Relation___Of` entries that cover what we're interested in) instead of having our own enum?

This mostly looks good, one high level comment:
I believe it makes sense to deduplicate SymbolIDs for RelationSlab.
Up until now, we mostly had only one occurrence of a SymbolID in a Slab, but RelationSlab does not follow that assumption.

Also, can you add a few tests after the above-mentioned check to make sure interning SymbolIDs works as expected?

clang-tools-extra/clangd/index/Index.h
26	I totally agree, but that type is 4 bytes. I am not sure it is worth the trade-off when storing the RelationSlab, but since this patch is not related to serialization, let's get rid of RelationKind and make use of clang::index::SymbolRole. When landing the serialization part, we can either split clang::index::SymbolRole into two parts or add some custom mapping to serialize only the relevant bits, etc.
60	use capacity instead of size
68	I am not sure the comment currently applies to RelationSlab internals; since everything is of value type anyway, there are no references.

kadircet added a reviewer: gribozavr. Mar 15 2019, 5:26 AM

gribozavr added inline comments. Mar 15 2019, 6:35 AM

clang-tools-extra/clangd/index/Index.cpp
35 ↗	(On Diff #190785)	Use `ArrayRef::copy()`, for example: https://reviews.llvm.org/D58782
clang-tools-extra/clangd/index/Index.h
44	`struct Relation`? And in the comments for it, please explain which way the relationship is directed (is the SymbolID in the key the subtype? or is the SymbolID in the ArrayRef the subtype?).
89	Please move all new declarations into `Relation.h`.

kadircet added inline comments. Mar 15 2019, 6:55 AM

clang-tools-extra/clangd/index/Index.h
44	Ah, exactly my thoughts; I forgot to mention this. I believe the current usage is the counter-intuitive one. For example, we will most likely query with something like `getRelations(SymbolID, baseOf)` to get all relations where `SymbolID` is `baseOf` something else (which means "get the children of `SymbolID`"). That way this value_type can be read as: `SymbolID` is `RelationKind` every `SymbolID inside array`. WDYT?

In D59407#1430656, @kadircet wrote:

I believe it makes sense to deduplicate SymbolIDs for RelationSlab.
Up until now, we mostly had only one occurrence of a SymbolID in a Slab, but RelationSlab does not follow that assumption.

Just to make sure I understand, do you mean:

(A) When adding a SymbolID to an entry's value, check that it's not already there; or
(B) Try to conserve space by not storing SymbolIDs directly in the entries, but storing an index into a separate list of unique SymbolIDs.

If it's (B), how many bytes should the index be? Are the space gains worth the complexity, given that SymbolID is only 8 bytes to begin with? (As compared to say, the filenames in Ref, which can be much longer, making this sort of optimization more clearly worth it.)

nridge marked 7 inline comments as done. Mar 17 2019, 12:20 PM

nridge added inline comments. Mar 17 2019, 12:20 PM

clang-tools-extra/clangd/index/Index.h
44	The way I was thinking of it is that `getRelations(SymbolID, baseOf)` would return all the bases of `SymbolID`. However, the opposite interpretation is also reasonable, we just need to pick one and document it. I'm happy to go with your suggested one.
60	Note, `RefSlab::bytes()` (which is where I copied this from) uses `size()` as well. Should I change that too?

Address review comments, except for the deduplication which is still under discussion

Herald added a subscriber: mgorny. Mar 17 2019, 12:21 PM

Harbormaster completed remote builds in B29273: Diff 191036. Mar 17 2019, 12:21 PM

gribozavr accepted this revision. Mar 18 2019, 1:53 AM

gribozavr added inline comments.

clang-tools-extra/clangd/index/Index.h
60	I'd say yes -- in a separate patch though. Thanks for catching it!
clang-tools-extra/clangd/index/Relation.h
2	"--- Relation.h --------------------"
42	Three slashes for doc comments, please.
46	Lift it up into the `clang::clangd` namespace? (like `Symbol` and `Ref`)
69	No need to repeat the type name being documented. "A mutable container that can ..."

This revision is now accepted and ready to land. Mar 18 2019, 1:53 AM

In D59407#1432543, @nridge wrote:

If it's (B), how many bytes should the index be? Are the space gains worth the complexity, given that SymbolID is only 8 bytes to begin with? (As compared to say, the filenames in Ref, which can be much longer, making this sort of optimization more clearly worth it.)

That was the case. I agree with you that we should rather do that while serializing, where we have variable-length integers and duplication is more severe (all three slabs make use of the same SymbolID pool).

clang-tools-extra/clangd/index/Index.h
44	It looks like IndexingAPI is also using the interpretation I suggested, so let's move with that one if you don't have any other concerns.

nridge marked 8 inline comments as done. Mar 21 2019, 7:01 PM

nridge added inline comments.

clang-tools-extra/clangd/index/Index.h
44	Already updated to this interpretation :)
clang-tools-extra/clangd/index/Relation.h
46	This comment made me realize that I haven't addressed your previous comment properly: I haven't changed `RelationSlab::value_type` from `std::pair<RelationKey, llvm::ArrayRef<SymbolID>>` to `Relation`. I tried to make that change this time, and ran into a problem: In the rest of the subtypes patch (D58880), one of the things I do is extend the `MemIndex` constructor so that, in addition to taking a symbol range and a ref range, it takes a relation range. That constructor assumes that the elements of that range have members of some name - either `first` and `second` (as currently in D58880), or `Key` and `Value`. However, that constructor has two call sites. In `MemIndex::build()`, we pass in the slabs themselves as the ranges. So, if we make this change, the field names for that call site will be `Key` and `Value`. However, for the second call site in `FileSymbols::buildIndex()`, the ranges that are passed in are `DenseMap`s, and therefore their elements' field names are necessarily `first` and `second`. The same constructor therefore cannot accept both ranges. How do you propose we address this? Scrap `struct Relation`, and keep `value_type` as `std::pair<RelationKey, llvm::ArrayRef<SymbolID>>`? Keep `struct Relation`, but make its fields named `first` and `second`? Split the constructor of `MemIndex` into two constructors, to accommodate both sets of field names? Something else?

gribozavr added inline comments. Mar 22 2019, 6:04 AM

clang-tools-extra/clangd/index/Relation.h
2	Not done?
42	Not done?
46	I guess we should scrap it then. Kadir, WDYT?
69	Not done?

nridge marked 2 inline comments as done. Mar 22 2019, 8:04 AM

nridge added inline comments.

clang-tools-extra/clangd/index/Relation.h
2	(Sorry, I have these comments addressed locally, was just waiting for the resolution of the remaining issue before uploading.)

Scrapped 'struct Relation' and addressed other comments

Harbormaster completed remote builds in B29500: Diff 191886. Mar 22 2019, 8:28 AM

As this is the first of a series of patches adding support for relations, and then building type hierarchy subtypes on top (as discussed here), how should we go about landing this -- should we land each patch in the series as soon as it's ready, or should we wait to land them all together?

Submitting code as it becomes ready is the usual practice here.

Ok, cool. In that case, I think this patch is ready to be committed, and would appreciate it if someone could commit it. Thanks!

(Sorry to arrive late at this discussion, I'm just back from leave.
I have some suggestions and would appreciate your thoughts, but if this simply feels too much like going around in circles, I'm happy to work out how we can address this after this patch lands instead.
Have discussed offline with @gribozavr and I think we're roughly on the same page. @kadircet is now out on leave)

I think the data model might be overly complicated here.
I can see how the discussion got here: it mirrors the other *Slab types quite closely. But:

those types were designed to solve specific problems that we don't have here (string deduplication and symbol merging)
I think the class-parent relation is pretty sparse (maybe 1 edge per 10 symbols?) so lots of options will work fine
I don't think we yet know what the more resource-critical (denser) relations and queries are, so it's unclear what to optimize for

I think the simplest model that can work here is something like:

Relation is struct { SymbolID Subject; SymbolRole Relation; SymbolID Object }
this is a value type, so could be passed around simply as std::vector<Relation>. If RelationSlab is desired for symmetry, it could be an alias or simple wrapper around std::vector<Relation>.
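For illustration only, the proposed triple could be written as the minimal C++ sketch below. It makes two assumptions. First, `SymbolID` and `SymbolRole` are simplified stand-ins (an 8-byte array and a 32-bit enum with arbitrary enumerator values), not the real clangd/clang headers. Second, the middle field is named `Predicate` here to avoid reusing the struct's own name `Relation`. The comment records the direction the reviewers settled on earlier ("Subject is Predicate of Object").

```cpp
#include <array>
#include <cstdint>
#include <tuple>

// Simplified stand-ins (assumptions, not clangd's real types):
// SymbolID is 8 bytes; SymbolRole is a 32-bit enum with arbitrary values.
using SymbolID = std::array<std::uint8_t, 8>;
enum class SymbolRole : std::uint32_t { BaseOf = 1, OverrideOf = 2 };

// A self-contained value type, read as "Subject <Predicate> Object".
// E.g. {Base, BaseOf, Derived} means "Base is a base of Derived", so a
// query on (Base, BaseOf) yields the subtypes of Base.
struct Relation {
  SymbolID Subject;
  SymbolRole Predicate;
  SymbolID Object;

  bool operator==(const Relation &Other) const {
    return std::tie(Subject, Predicate, Object) ==
           std::tie(Other.Subject, Other.Predicate, Other.Object);
  }
  bool operator!=(const Relation &Other) const { return !(*this == Other); }
  // SPO order: subject first, then predicate, then object.
  bool operator<(const Relation &Other) const {
    return std::tie(Subject, Predicate, Object) <
           std::tie(Other.Subject, Other.Predicate, Other.Object);
  }
};

// 8 + 4 + 8 bytes with no padding on common ABIs (alignment is 4).
static_assert(sizeof(Relation) == 20, "expected a 20-byte triple");
```

A plain `std::vector<Relation>` can then hold a whole relation set, and `std::sort` on it uses the SPO `operator<`.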

This has a few advantages:

the Relation value_type is self-contained, has clearer semantics, and works as the result of any type of query: based on subject and/or predicate and/or object. (Similar to how it's convenient that Symbol contains SymbolID).
if lookup is desired, lookup by subject, subject+predicate, or full triple is possible by sorting in SPO order and using binary search. The return type is a simple ArrayRef<Relation>. (Though fancy lookup may belong better in MemIndex/Dex than in the slab.) For more query types, storing copies in OSP and/or POS order is pretty cheap too.
memory efficiency: it costs 20*distinct(subject, predicate, object) bytes, vs 28*distinct(subject, predicate) + 8 * distinct(subject, predicate, object). Unless the average number of objects for a (subject, predicate) pair (that has at least one object) is >2.3, the former is smaller. For the current use case, this average is certainly <1.1.
simplicity: no arenas, easy mental model.
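The break-even in the memory-efficiency point can be spelled out as follows. This sketch assumes 8-byte SymbolIDs, a 4-byte SymbolRole, a 16-byte ArrayRef (pointer plus length on a 64-bit target) and no padding. Here n is distinct(subject, predicate), m is distinct(subject, predicate, object), and k = m/n is the average number of objects per (subject, predicate) pair:

```latex
\begin{aligned}
\text{triples:}\quad & (8 + 4 + 8)\,m = 20m \\
\text{grouped:}\quad & (8 + 4 + 16)\,n + 8m = 28n + 8m \\
20m < 28n + 8m \iff 12m < 28n \iff & \; k = \tfrac{m}{n} < \tfrac{28}{12} \approx 2.33
\end{aligned}
```

With k below 1.1 for class parents, the triple layout comes out smaller.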

I see discussion of (something like) this option stalled around the index constructors, but I don't see a fundamental block there. The concept that the index constructor should be templated over would be Iterable<Relation>. LMK if I'm missing something.
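A constructor templated over Iterable<Relation> is shown in the sketch below, together with a (subject, predicate) lookup done by binary search over SPO-sorted storage. `RelationIndex`, `lookup` and the stand-in types are hypothetical names for illustration; this is not clangd's `MemIndex`.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <iterator>
#include <tuple>
#include <utility>
#include <vector>

// Simplified stand-ins (assumptions, not clangd's real types).
using SymbolID = std::array<std::uint8_t, 8>;
enum class SymbolRole : std::uint32_t { BaseOf = 1, OverrideOf = 2 };

struct Relation {
  SymbolID Subject;
  SymbolRole Predicate;
  SymbolID Object;
};

// Hypothetical index: accepts any iterable of Relation and answers
// (subject, predicate) lookups by binary search.
class RelationIndex {
public:
  using Iterator = std::vector<Relation>::const_iterator;

  // Templated over the range type, so a std::vector<Relation>, a slab that
  // iterates Relation, or any other Iterable<Relation> can be passed in.
  template <typename RelationRange>
  explicit RelationIndex(const RelationRange &Relations)
      : Sorted(std::begin(Relations), std::end(Relations)) {
    // Keep the storage in SPO order so lookups can use binary search.
    std::sort(Sorted.begin(), Sorted.end(),
              [](const Relation &A, const Relation &B) {
                return std::tie(A.Subject, A.Predicate, A.Object) <
                       std::tie(B.Subject, B.Predicate, B.Object);
              });
  }

  // Returns the contiguous run of relations matching (Subject, Predicate).
  std::pair<Iterator, Iterator> lookup(const SymbolID &Subject,
                                       SymbolRole Predicate) const {
    return std::equal_range(Sorted.begin(), Sorted.end(),
                            Key{Subject, Predicate}, ByKey{});
  }

private:
  using Key = std::pair<SymbolID, SymbolRole>;
  // Heterogeneous comparator: compares only the (subject, predicate) prefix.
  struct ByKey {
    bool operator()(const Relation &R, const Key &K) const {
      return std::tie(R.Subject, R.Predicate) < std::tie(K.first, K.second);
    }
    bool operator()(const Key &K, const Relation &R) const {
      return std::tie(K.first, K.second) < std::tie(R.Subject, R.Predicate);
    }
  };

  std::vector<Relation> Sorted;
};
```

With this shape, call sites only need to provide ranges whose elements are `Relation`. They no longer have to agree on `first`/`second` versus `Key`/`Value` member names.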

@sammccall, thank you for having a look at this.

I have no objection to revising the data model if there's agreement on a better one.

In D59407#1446464, @sammccall wrote:

I don't think we yet know what the more resource-critical (denser) relations and queries are, so it's unclear what to optimize for

Based on a brief mental survey of C++ IDE features I'm familiar with, I can think of the following additional uses of the relations capability:

A call hierarchy feature (which is also proposed for LSP, with client and server implementation efforts) would need every caller-callee relationship to be recorded in the index (RelationCalledBy).
Given a virtual method declaration, a user may want to see a list of implementations (overriders) and navigate to one or more of them. This would need every overrider relationship to be recorded in the index (RelationOverrideOf).

Intuitively, it seems like RelationOverrideOf would be slightly denser than RelationChildOf (though not by much), while RelationCalledBy would be significantly denser. In terms of queries, I believe the key for lookups for both of the above would be a (subject, predicate) pair, just like for subtypes.

Does that change your analysis at all?

I guess I should clear the "Accepted" status until we settle the question of the data model.

In D59407#1447070, @nridge wrote:

@sammccall, thank you for having a look at this.

I have no objection to revising the data model if there's agreement on a better one.

In D59407#1446464, @sammccall wrote:

I don't think we yet know what the more resource-critical (denser) relations and queries are, so it's unclear what to optimize for

Based on a brief mental survey of C++ IDE features I'm familiar with, I can think of the following additional uses of the relations capability:

A call hierarchy feature (which is also proposed for LSP, with client and server implementation efforts) would need every caller-callee relationship to be recorded in the index (RelationCalledBy).

Given a virtual method declaration, a user may want to see a list of implementations (overriders) and navigate to one or more of them. This would need every overrider relationship to be recorded in the index (RelationOverrideOf).

Intuitively, it seems like RelationOverrideOf would be slightly denser than RelationChildOf (though not by much), while RelationCalledBy would be significantly denser. In terms of queries, I believe the key for lookups for both of the above would be a (subject, predicate) pair, just like for subtypes.

Does that change your analysis at all?

Sorry for the slow response here. Override and callgraph are great examples!
As you say, override is probably pretty sparse and it's probably not worth worrying about the storage too much.

If we stored a callgraph we'd definitely need to worry about the representation though. The space-saving hierarchy in this case would be map<relationtype, map<callee, caller>>, I guess. Maybe storing one vector<pair<Subject, Object>> for each relationship type would work here; querying for a bunch of relationship types is rare.
One thing that strikes me here is that this case is very similar to our existing Ref data - it's basically a subset, but with a symbolid payload instead of location. We could consider just adding the SymbolID to Refs - it'd blow up the size of that by 50%, but we may not do much better with some other representation, and it would avoid adding any new complexity.
Ultimately it may also be that supporting both find references and callgraph (which provide closely overlapping functionality) isn't a good use of index memory.

In D59407#1456394, @sammccall wrote:

One thing that strikes me here is that this case is very similar to our existing Ref data - it's basically a subset, but with a symbolid payload instead of location. We could consider just adding the SymbolID to Refs - it'd blow up the size of that by 50%, but we may not do much better with some other representation, and it would avoid adding any new complexity.

Note that this was considered and rejected earlier (see here and here), though that discussion did not consider denser relations like callgraph.

Given the additional information and perspectives from the conversation here, I have two suggestions for potential ways forward:

Approach 1: Add SymbolID to Refs

This would be initially wasteful when only subtypes use it, but if we then build callgraph on top of it as well it will become significantly less wasteful.
This has the benefit that we don't have duplicate information for find-references and callgraph (they both use Refs).
This approach probably adds the least amount of complexity overall.

Approach 2: Add a RelationSlab storing (subject, predicate, object) triples, intended for sparse relations

This would allow us to implement features that require sparse relations, such as subtypes and overrides, without any significant increase to index size.
If we later want to add dense relations like callgraph, we'd use a different mechanism for them (possibly by adding SymbolID to Refs as in approach 1).
If we do end up adding SymbolID to Refs for callgraph, and want to start using that for sparse relations too, we can rip out RelationSlab.
This also adds relatively little complexity, though slightly more than approach 1, and we are throwing away some code if we end up adding SymbolID to Refs eventually.

Any thoughts on which of the approaches to take, or something else altogether? It would be good to have some confidence that the chosen approach will pass code review before I implement it :)

Hi, any update here? I would appreciate some guidance so I can move forward with this.

Hi Nathan, sorry for the stall here, and for repeatedly going over the same issues.
The design space here is pretty complicated.

I think the conclusion of recent offline discussions is:

refs and relations can be the same thing-ish when seen through a certain lens.
Unifying them probably isn't space-prohibitive *if* we implement callgraph: it's something like +15% to overall memory usage, and storing callgraph in another way is likely to be similar.
however unifying the concepts seems likely to be awkward in practice. LSP features seem to want one or the other but not both - symbolID is generally better than location if it's precise enough, and useless if not. We're storing redundant information (the location for the method override ref is the same as the declaration of the related symbol). We risk tying ourselves in knots maintaining a model that doesn't map well onto our problems.
In terms of storage size, relation-major (map<SymbolRole, vector<pair<SymbolID, SymbolID>>> or so) seems like a quick win. But it's a small enough one that we should try to live without it first.
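The relation-major layout from the last point can be sketched as below, again with simplified stand-in types that are assumptions rather than clangd's. Because the predicate is the map key, each stored edge is just a (subject, object) pair. `addRelation` is a hypothetical helper.

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Simplified stand-ins (assumptions, not clangd's real types).
using SymbolID = std::array<std::uint8_t, 8>;
enum class SymbolRole : std::uint32_t { BaseOf = 1, CalledBy = 2 };

// Relation-major storage: one edge list per relation type.
using Edge = std::pair<SymbolID, SymbolID>; // (subject, object)
using RelationMajorStore = std::map<SymbolRole, std::vector<Edge>>;

// Each edge costs two SymbolIDs; the predicate is stored once per map entry.
static_assert(sizeof(Edge) == 16, "two 8-byte SymbolIDs per edge");

// Hypothetical helper for this sketch.
inline void addRelation(RelationMajorStore &Store, const SymbolID &Subject,
                        SymbolRole Predicate, const SymbolID &Object) {
  Store[Predicate].emplace_back(Subject, Object);
}
```

Under these assumptions, each edge costs 16 bytes instead of 20 for a full triple. That is the small "quick win" described above.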

So if you can stomach it, I think

Approach 2: Add a RelationSlab storing (subject, predicate, object) triples, intended for sparse relations

is certainly fine with us (having spoken with @kadircet @ilya-biryukov @sammccall @gribozavr - @hokein is on vacation but not nearly as stubborn as I am!)

In D59407#1478794, @sammccall wrote:

So if you can stomach it, I think

Approach 2: Add a RelationSlab storing (subject, predicate, object) triples, intended for sparse relations

is certainly fine with us (having spoken with @kadircet @ilya-biryukov @sammccall @gribozavr - @hokein is on vacation but not nearly as stubborn as I am!)

Yep, I'm happy to move forward with that approach. Thanks for the guidance!

nridge mentioned this in D60953: [clangd] Respect clang-tidy suppression comments. May 17 2019, 8:05 AM

Implemented discussed design approach ("Add a RelationSlab storing (subject, predicate, object) triples, intended for sparse relations")

Herald added a subscriber: mgrang. May 22 2019, 7:27 AM

Harbormaster completed remote builds in B32310: Diff 200735. May 22 2019, 7:28 AM

Mostly LG from my side, thanks!

clang-tools-extra/clangd/index/Relation.cpp
19	I would suggest adding another version which returns all the relations a given `Subject` participates in, but I suppose we currently don't have any use cases. Feel free to add it or skip it; it can be added when necessary.
24	`std::tie`
clang-tools-extra/clangd/index/Relation.h
31	nit: could you use `std::tie`
35	again `std::tie`
52	I believe `RelationSlab` should still be immutable; with the `build` operation below, `RelationSlab` is mutable. Let's introduce the builder class and move the mutating operations and `build` into it, so that it is more symmetric with the other slabs we have. Also, are there any use cases requiring RelationSlab to be mutable?
60	there seem to be some typos in the comment
72	maybe rename to `insert` to keep symmetry with other slabs? also it suggests a semantics on the operation that we don't care about.
79	maybe even get rid of sort by inlining it into build?
82	again, we could get rid of this comment if slab was immutable
89	Do we really need this?

Address review comments

clang-tools-extra/clangd/index/Relation.cpp
19	I'd rather add this when a use case arises.
clang-tools-extra/clangd/index/Relation.h
52	I'm not aware of any use-cases requiring it to be mutable. I'm making it immutable as suggested.
89	I think so. The builder needs a way to construct the slab. Aggregate initialization doesn't work because the class already has declared default and copy constructors.
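Here is a sketch of the immutable slab plus builder discussed in this round, using the same simplified stand-in types (assumptions, not clangd's). The names `RelationSlab::Builder`, `insert` and `build` follow the review comments, but the code is illustrative rather than the committed `Relation.h`. It also reflects two earlier points: `build()` does the sorting, and `bytes()` uses `capacity()` rather than `size()`. The private constructor is what lets the builder create a slab, since aggregate initialization is not available here.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <utility>
#include <vector>

// Simplified stand-ins (assumptions, not clangd's real types).
using SymbolID = std::array<std::uint8_t, 8>;
enum class SymbolRole : std::uint32_t { BaseOf = 1 };

struct Relation {
  SymbolID Subject;
  SymbolRole Predicate;
  SymbolID Object;
  // SPO order.
  bool operator<(const Relation &Other) const {
    return std::tie(Subject, Predicate, Object) <
           std::tie(Other.Subject, Other.Predicate, Other.Object);
  }
};

// Immutable once built: there are no mutating member functions.
class RelationSlab {
public:
  using const_iterator = std::vector<Relation>::const_iterator;

  RelationSlab() = default;
  const_iterator begin() const { return Relations.begin(); }
  const_iterator end() const { return Relations.end(); }
  std::size_t size() const { return Relations.size(); }
  // Uses capacity(): reserved-but-unused storage still occupies memory.
  std::size_t bytes() const {
    return sizeof(*this) + sizeof(Relation) * Relations.capacity();
  }

  // All mutation lives in the builder.
  class Builder {
  public:
    void insert(const Relation &R) { Relations.push_back(R); }
    // Consumes the builder; sorting happens here, not in a separate call.
    RelationSlab build() &&;

  private:
    std::vector<Relation> Relations;
  };

private:
  // Needed so Builder can construct a slab; nested classes may access it.
  explicit RelationSlab(std::vector<Relation> Rels)
      : Relations(std::move(Rels)) {}

  std::vector<Relation> Relations;
};

inline RelationSlab RelationSlab::Builder::build() && {
  std::sort(Relations.begin(), Relations.end());
  return RelationSlab(std::move(Relations));
}
```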

Harbormaster completed remote builds in B32499: Diff 201427. May 25 2019, 4:20 PM

nridge mentioned this in D62459: [clangd] Serialization support for RelationSlab. May 25 2019, 7:21 PM

nridge mentioned this in D62471: [clangd] SymbolCollector support for relations. May 26 2019, 4:48 PM

LGTM, except the duplication issue. Thanks!

clang-tools-extra/clangd/index/Relation.h
76	maybe use a set so that we can be sure that there won't be any duplicates. sorry for missing that in the previous iteration.
clang-tools-extra/clangd/unittests/IndexTests.cpp
79	could you also add a test case that's inserting duplicate relations?

This revision is now accepted and ready to land. May 27 2019, 12:40 AM

Awesome, thank you!

clang-tools-extra/clangd/index/Relation.h
76	(you're sorting the array at the end anyway, so sort + unique + erase seems neater)
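The suggested sort + unique + erase idiom could look like the following sketch. It uses the same simplified stand-in types, and `sortAndDeduplicate` is a hypothetical helper name. `std::unique` only collapses adjacent equal elements, which is why it must run after the sort.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <tuple>
#include <vector>

// Simplified stand-ins (assumptions, not clangd's real types).
using SymbolID = std::array<std::uint8_t, 8>;
enum class SymbolRole : std::uint32_t { BaseOf = 1 };

struct Relation {
  SymbolID Subject;
  SymbolRole Predicate;
  SymbolID Object;

  bool operator==(const Relation &Other) const {
    return std::tie(Subject, Predicate, Object) ==
           std::tie(Other.Subject, Other.Predicate, Other.Object);
  }
  bool operator<(const Relation &Other) const {
    return std::tie(Subject, Predicate, Object) <
           std::tie(Other.Subject, Other.Predicate, Other.Object);
  }
};

// Sorts in SPO order, then removes exact duplicate triples in place.
inline void sortAndDeduplicate(std::vector<Relation> &Relations) {
  std::sort(Relations.begin(), Relations.end());
  // std::unique keeps the first element of each run of equal elements and
  // returns the new logical end; erase drops the leftover tail.
  Relations.erase(std::unique(Relations.begin(), Relations.end()),
                  Relations.end());
}
```

Inserting the same triple twice, as in the test requested above, should leave a single copy.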

btw, I believe you have enough good quality patches to apply for commit access.

Are there any obstacles that are keeping you from applying?

Address latest review comments

Harbormaster completed remote builds in B32717: Diff 202369. May 30 2019, 10:32 PM

nridge marked 3 inline comments as done. May 30 2019, 10:33 PM

Still LG, thanks for the patch!

Closed by commit rL362352: [clangd] Add RelationSlab (authored by nridge). Jun 2 2019, 9:54 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Jun 2 2019, 9:54 PM

Herald added a subscriber: llvm-commits.

nridge mentioned this in rL362353: [clangd] Serialization support for RelationSlab. Jun 2 2019, 10:05 PM

nridge mentioned this in rG92524f9bf84d: [clangd] Serialization support for RelationSlab. Jun 2 2019, 10:10 PM

nridge mentioned this in rL362467: [clangd] SymbolCollector support for relations. Jun 3 2019, 9:24 PM

nridge mentioned this in rG73e6f47da249: [clangd] SymbolCollector support for relations.

nridge mentioned this in D62839: [clangd] Index API and implementations for relations. Jun 4 2019, 10:10 PM

nridge mentioned this in D58880: [clangd] Type hierarchy subtypes. Jun 4 2019, 10:13 PM

nridge mentioned this in rL363481: [clangd] Index API and implementations for relations. Jun 14 2019, 7:23 PM

nridge mentioned this in rGf1e6f5713caa: [clangd] Index API and implementations for relations.

nridge mentioned this in rL363506: [clangd] Type hierarchy subtypes. Jun 15 2019, 7:28 PM

nridge mentioned this in rGa552508841ad: [clangd] Type hierarchy subtypes.

Revision Contents

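Path                                                  Size
clang-tools-extra/clangd/CMakeLists.txt               1 line
clang-tools-extra/clangd/index/Index.h                1 line
clang-tools-extra/clangd/index/Relation.h             98 lines
clang-tools-extra/clangd/index/Relation.cpp           37 lines
clang-tools-extra/clangd/unittests/IndexTests.cpp     23 lines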

Diff 200735

clang-tools-extra/clangd/CMakeLists.txt

 add_clang_library(clangDaemon
 ...
 index/BackgroundIndexStorage.cpp
 index/CanonicalIncludes.cpp
 index/FileIndex.cpp
 index/Index.cpp
 index/IndexAction.cpp
 index/MemIndex.cpp
 index/Merge.cpp
 index/Ref.cpp
+index/Relation.cpp
 index/Serialization.cpp
 index/Symbol.cpp
 index/SymbolCollector.cpp
 index/SymbolID.cpp
 index/SymbolLocation.cpp
 index/SymbolOrigin.cpp
 index/YAMLSerialization.cpp
 ...

clang-tools-extra/clangd/index/Index.h

 //===--- Index.h -------------------------------------------------*- C++-*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_INDEX_H
 #define LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_INDEX_H

 #include "Ref.h"
+#include "Relation.h"
 #include "Symbol.h"
 #include "SymbolID.h"
 #include "llvm/ADT/DenseSet.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Support/JSON.h"
 #include <mutex>
 #include <string>

 namespace clang {
 namespace clangd {

 struct FuzzyFindRequest {
   /// \brief A query string for the fuzzy find. This is matched against symbols'
   /// un-qualified identifiers and should not contain qualifiers like "::".
   std::string Query;
   /// \brief If this is non-empty, symbols must be in at least one of the scopes
   /// (e.g. namespaces) excluding nested scopes. For example, if a scope "xyz::"
   /// is provided, the matched symbols must be defined in namespace xyz but not
   /// namespace xyz::abc.
   ///
   /// The global scope is "", a top level scope is "foo::", etc.
   std::vector<std::string> Scopes;
   /// If set to true, allow symbols from any scope. Scopes explicitly listed
   /// above will be ranked higher.
   bool AnyScope = false;
   /// \brief The number of top candidates to return. The index may choose to
   /// return more than this, e.g. if it doesn't know which candidates are best.
   llvm::Optional<uint32_t> Limit;
   /// If set to true, only symbols for completion support will be considered.
   bool RestrictForCodeCompletion = false;
   /// Contextually relevant files (e.g. the file we're code-completing in).
   /// Paths should be absolute.
   std::vector<std::string> ProximityPaths;
   /// Preferred types of symbols. These are raw representation of `OpaqueType`.
   std::vector<std::string> PreferredTypes;

   bool operator==(const FuzzyFindRequest &Req) const {
     return std::tie(Query, Scopes, Limit, RestrictForCodeCompletion,
                     ProximityPaths, PreferredTypes) ==
            std::tie(Req.Query, Req.Scopes, Req.Limit,
                     Req.RestrictForCodeCompletion, Req.ProximityPaths,
                     Req.PreferredTypes);
   }
   bool operator!=(const FuzzyFindRequest &Req) const { return !(*this == Req); }
 };
 bool fromJSON(const llvm::json::Value &Value, FuzzyFindRequest &Request);
 llvm::json::Value toJSON(const FuzzyFindRequest &Request);

 struct LookupRequest {
   llvm::DenseSet<SymbolID> IDs;
 };

 struct RefsRequest {
   llvm::DenseSet<SymbolID> IDs;
	RefKind Filter = RefKind::All;			RefKind Filter = RefKind::All;
	/// If set, limit the number of refers returned from the index. The index may			/// If set, limit the number of refers returned from the index. The index may
	/// choose to return less than this, e.g. it tries to avoid returning stale			/// choose to return less than this, e.g. it tries to avoid returning stale
	/// results.			/// results.
	llvm::Optional<uint32_t> Limit;			llvm::Optional<uint32_t> Limit;
	};			};

	/// Interface for symbol indexes that can be used for searching or
	/// matching symbols among a set of symbols based on names or unique IDs.
	class SymbolIndex {
	public:
	virtual ~SymbolIndex() = default;

	/// \brief Matches symbols in the index fuzzily and applies \p Callback on
	/// each matched symbol before returning.
	/// If returned Symbols are used outside Callback, they must be deep-copied!
	///
	/// Returns true if there may be more results (limited by Req.Limit).
	virtual bool
	fuzzyFind(const FuzzyFindRequest &Req,
	llvm::function_ref<void(const Symbol &)> Callback) const = 0;
				gribozavr: Please move all new declarations into `Relation.h`.

	/// Looks up symbols with any of the given symbol IDs and applies \p Callback
	/// on each matched symbol.
	/// The returned symbol must be deep-copied if it's used outside Callback.
	virtual void
	lookup(const LookupRequest &Req,
	llvm::function_ref<void(const Symbol &)> Callback) const = 0;

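The `capacity()` suggestion for `bytes()` matters because a `std::vector` can own more storage than its `size()` reports (for example after `reserve()` or geometric growth), so a `size()`-based estimate undercounts memory. A minimal standalone sketch; both helpers are hypothetical and not part of clangd:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not clangd API): counts every allocated slot,
// used or not, which is what the vector actually holds on the heap.
template <typename T>
std::size_t estimateBytesByCapacity(const std::vector<T> &V) {
  return sizeof(T) * V.capacity();
}

// Hypothetical helper (not clangd API): counts only the live elements,
// ignoring reserved-but-unused slots, so it can undercount.
template <typename T>
std::size_t estimateBytesBySize(const std::vector<T> &V) {
  return sizeof(T) * V.size();
}
```

After `reserve(100)` followed by one `push_back`, the size-based estimate reports a single element while the capacity-based one accounts for at least 100 slots.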

clang-tools-extra/clangd/index/Relation.h

This file was added.

				//===--- Relation.h ----------------------------------------------- C++--===//
				//
				gribozavr: "--- Relation.h --------------------"
				gribozavr: Not done?
				nridge (author): (Sorry, I have these comments addressed locally; I was just waiting for the resolution of the remaining issue before uploading.)
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_RELATION_H
				#define LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_RELATION_H

				#include "SymbolID.h"
				#include "SymbolLocation.h"
				#include "clang/Index/IndexSymbol.h"
				#include "llvm/ADT/iterator_range.h"
				#include <cstdint>
				#include <utility>

				namespace clang {
				namespace clangd {

				/// Represents a relation between two symbols.
				/// For example, "A is a base class of B" may be represented
				/// as { Subject = A, Predicate = RelationBaseOf, Object = B }.
				struct Relation {
				SymbolID Subject;
				index::SymbolRole Predicate;
				SymbolID Object;

				bool operator==(const Relation &Other) const {
				return Subject == Other.Subject && Predicate == Other.Predicate &&
				kadircet: nit: could you use `std::tie`?
				Object == Other.Object;
				}
				// SPO order
				bool operator<(const Relation &Other) const {
				kadircet: again, `std::tie`
				if (Subject < Other.Subject) {
				return true;
				}
				if (Subject == Other.Subject) {
				if (Predicate < Other.Predicate) {
				return true;
				}
				gribozavr: Three slashes for doc comments, please.
				gribozavr: Not done?

				if (Predicate == Other.Predicate) {
				return Object < Other.Object;
				}
				gribozavr: Lift it up into the `clang::clangd` namespace? (like `Symbol` and `Ref`)
				nridge (author): This comment made me realize that I haven't addressed your previous comment properly: I haven't changed `RelationSlab::value_type` from `std::pair<RelationKey, llvm::ArrayRef<SymbolID>>` to `Relation`. I tried to make that change this time, and ran into a problem. In the rest of the subtypes patch (D58880), one of the things I do is extend the `MemIndex` constructor so that, in addition to taking a symbol range and a ref range, it takes a relation range. That constructor assumes that the elements of that range have members with particular names: either `first` and `second` (as currently in D58880), or `Key` and `Value`. However, that constructor has two call sites. In `MemIndex::build()`, we pass in the slabs themselves as the ranges, so if we make this change, the field names for that call site will be `Key` and `Value`. For the second call site, in `FileSymbols::buildIndex()`, the ranges that are passed in are `DenseMap`s, so their elements' field names are necessarily `first` and `second`. The same constructor therefore cannot accept both ranges. How do you propose we address this? Scrap `struct Relation`, and keep `value_type` as `std::pair<RelationKey, llvm::ArrayRef<SymbolID>>`? Keep `struct Relation`, but name its fields `first` and `second`? Split the `MemIndex` constructor into two constructors to accommodate both sets of field names? Something else?
				gribozavr: I guess we should scrap it then. Kadir, WDYT?
				}
				return false;
				}
				};

				class RelationSlab {
				kadircet: I believe `RelationSlab` should be immutable, but even after the `build` operation below, `RelationSlab` is still mutable. Let's introduce a builder class and move the mutating operations and `build` into it, so that it is more symmetric with the other slabs we have. Also, are there any use cases requiring RelationSlab to be mutable?
				nridge (author): I'm not aware of any use cases requiring it to be mutable. I'm making it immutable as suggested.
				public:
				using value_type = Relation;
				using const_iterator = std::vector<value_type>::const_iterator;
				using iterator = const_iterator;

				// We don't need a separate builder type for this, but reserve
				// that possibility for the future. Having a build() method is
				// also useful so when know when to sort the relations.
				kadircet: there seem to be some typos in the comment
				using Builder = RelationSlab;

				RelationSlab() = default;
				RelationSlab(RelationSlab &&Slab) = default;
				RelationSlab &operator=(RelationSlab &&RHS) = default;

				const_iterator begin() const { return Relations.begin(); }
				const_iterator end() const { return Relations.end(); }
				size_t size() const { return Relations.size(); }
				gribozavr: No need to repeat the type name being documented. "A mutable container that can ..."
				gribozavr: Not done?
				bool empty() const { return Relations.empty(); }

				void push_back(const Relation &R) { Relations.push_back(R); }
				kadircet: maybe rename to `insert` to keep symmetry with other slabs? Also, `push_back` suggests semantics for the operation that we don't care about.

				size_t bytes() const {
				return sizeof(*this) + sizeof(value_type) * Relations.capacity();
				}
				kadircet: maybe use a set so that we can be sure there won't be any duplicates. Sorry for missing that in the previous iteration.
				sammccall: (you're sorting the array at the end anyway, so sort + unique + erase seems neater)

				/// Sort relations in SPO order.
				void sort();
				kadircet: maybe even get rid of sort by inlining it into build?

				/// Lookup all relations matching the given subject and predicate.
				/// Assumes the slab is sorted in SPO order.
				kadircet: again, we could get rid of this comment if the slab was immutable
				llvm::iterator_range<iterator> lookup(const SymbolID &Subject,
				index::SymbolRole Predicate) const;

				RelationSlab build() &&;

				private:
				RelationSlab(std::vector<Relation> Relations)
				kadircet: Do we really need this?
				nridge (author): I think so. The builder needs a way to construct the slab. Aggregate initialization doesn't work because the class already has declared default and copy constructors.
				: Relations(std::move(Relations)) {}

				std::vector<Relation> Relations;
				};

				} // namespace clangd
				} // namespace clang

				#endif // LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_RELATION_H
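Several inline comments ask for `std::tie` in the comparison operators. A minimal sketch of what that could look like, assuming stand-in types so it compiles on its own: `SymbolID` is a plain `std::string` and `SymbolRole` is an illustrative enum, neither being the real clangd/clang type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

// Stand-ins (assumptions) for clangd's SymbolID and clang::index::SymbolRole.
using SymbolID = std::string;
enum class SymbolRole : std::uint32_t {
  RelationBaseOf = 1u << 0,  // illustrative values, not clang's
  RelationChildOf = 1u << 1,
};

struct Relation {
  SymbolID Subject;
  SymbolRole Predicate;
  SymbolID Object;

  bool operator==(const Relation &Other) const {
    return std::tie(Subject, Predicate, Object) ==
           std::tie(Other.Subject, Other.Predicate, Other.Object);
  }
  bool operator!=(const Relation &Other) const { return !(*this == Other); }
  // Lexicographic SPO order: Subject, then Predicate, then Object.
  bool operator<(const Relation &Other) const {
    return std::tie(Subject, Predicate, Object) <
           std::tie(Other.Subject, Other.Predicate, Other.Object);
  }
};
```

`std::tuple`'s `<` compares element by element, so this yields the same SPO order as the hand-written `if` chain in `operator<`.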

clang-tools-extra/clangd/index/Relation.cpp

This file was added.

				//===--- Relation.cpp --------------------------------------------- C++--===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "Relation.h"

				#include <algorithm>

				namespace clang {
				namespace clangd {

				void RelationSlab::sort() { std::sort(Relations.begin(), Relations.end()); }

				llvm::iterator_range<RelationSlab::iterator>
				RelationSlab::lookup(const SymbolID &Subject,
				kadircet: I would suggest adding another version which returns all the relations a given `Subject` participates in, but I suppose we currently don't have any use cases. Feel free to add or skip; it can be added when necessary.
				nridge (author): I'd rather add this when a use case arises.
				index::SymbolRole Predicate) const {
				auto IterPair = std::equal_range(Relations.begin(), Relations.end(),
				Relation{Subject, Predicate, SymbolID{}},
				[](const Relation &A, const Relation &B) {
				return (A.Subject < B.Subject) ||
				kadircet: `std::tie`
				(A.Subject == B.Subject &&
				A.Predicate < B.Predicate);
				});
				return {IterPair.first, IterPair.second};
				}

				RelationSlab RelationSlab::build() && {
				sort();
				return std::move(*this);
				}

				} // namespace clangd
				} // namespace clang
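The review suggestions on this file (a separate builder so the slab itself is immutable, `insert` instead of `push_back`, sort + unique + erase to drop duplicates, sorting inside `build()`, and `std::tie` in the `lookup()` comparator) could combine roughly as follows. This is a sketch under stated assumptions, not the code that landed: `SymbolID` and `SymbolRole` are stand-ins, and `lookup()` returns a `std::pair` of iterators instead of `llvm::iterator_range` to avoid an LLVM dependency.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Stand-ins (assumptions) for clangd's SymbolID and clang::index::SymbolRole.
using SymbolID = std::string;
enum class SymbolRole { RelationBaseOf, RelationChildOf };

struct Relation {
  SymbolID Subject;
  SymbolRole Predicate;
  SymbolID Object;
  bool operator==(const Relation &O) const {
    return std::tie(Subject, Predicate, Object) ==
           std::tie(O.Subject, O.Predicate, O.Object);
  }
  bool operator<(const Relation &O) const {
    return std::tie(Subject, Predicate, Object) <
           std::tie(O.Subject, O.Predicate, O.Object);
  }
};

// Immutable once built: relations can only be added through Builder.
class RelationSlab {
public:
  using const_iterator = std::vector<Relation>::const_iterator;

  class Builder {
  public:
    void insert(const Relation &R) { Relations.push_back(R); }
    // Sorts in SPO order, then drops duplicates (sort + unique + erase).
    RelationSlab build() && {
      std::sort(Relations.begin(), Relations.end());
      Relations.erase(std::unique(Relations.begin(), Relations.end()),
                      Relations.end());
      return RelationSlab(std::move(Relations));
    }

  private:
    std::vector<Relation> Relations;
  };

  const_iterator begin() const { return Relations.begin(); }
  const_iterator end() const { return Relations.end(); }
  std::size_t size() const { return Relations.size(); }

  // All relations with the given Subject and Predicate. Valid because
  // build() always leaves the vector sorted in SPO order.
  std::pair<const_iterator, const_iterator>
  lookup(const SymbolID &Subject, SymbolRole Predicate) const {
    return std::equal_range(Relations.begin(), Relations.end(),
                            Relation{Subject, Predicate, SymbolID{}},
                            [](const Relation &A, const Relation &B) {
                              return std::tie(A.Subject, A.Predicate) <
                                     std::tie(B.Subject, B.Predicate);
                            });
  }

private:
  explicit RelationSlab(std::vector<Relation> Relations)
      : Relations(std::move(Relations)) {}
  std::vector<Relation> Relations;
};
```

Because `build()` deduplicates, inserting the same relation twice yields a single entry, which is the behavior the requested duplicate-relation test would check.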

clang-tools-extra/clangd/unittests/IndexTests.cpp

TEST(SymbolSlab, FindAndIterate) {

SymbolSlab S = std::move(B).build();
EXPECT_THAT(S, UnorderedElementsAre(Named("X"), Named("Y"), Named("Z")));
EXPECT_EQ(S.end(), S.find(SymbolID("W")));
for (const char *Sym : {"X", "Y", "Z"})
EXPECT_THAT(*S.find(SymbolID(Sym)), Named(Sym));
}

		TEST(RelationSlab, Lookup) {
		kadircet: could you also add a test case that inserts duplicate relations?
		SymbolID A{"A"};
		SymbolID B{"B"};
		SymbolID C{"C"};
		SymbolID D{"D"};

		RelationSlab::Builder Builder;
		Builder.push_back(Relation{A, index::SymbolRole::RelationBaseOf, B});
		Builder.push_back(Relation{A, index::SymbolRole::RelationBaseOf, C});
		Builder.push_back(Relation{B, index::SymbolRole::RelationBaseOf, D});
		Builder.push_back(Relation{C, index::SymbolRole::RelationBaseOf, D});
		Builder.push_back(Relation{B, index::SymbolRole::RelationChildOf, A});
		Builder.push_back(Relation{C, index::SymbolRole::RelationChildOf, A});
		Builder.push_back(Relation{D, index::SymbolRole::RelationChildOf, B});
		Builder.push_back(Relation{D, index::SymbolRole::RelationChildOf, C});

		RelationSlab Slab = std::move(Builder).build();
		EXPECT_THAT(
		Slab.lookup(A, index::SymbolRole::RelationBaseOf),
		UnorderedElementsAre(Relation{A, index::SymbolRole::RelationBaseOf, B},
		Relation{A, index::SymbolRole::RelationBaseOf, C}));
		}

TEST(SwapIndexTest, OldIndexRecycled) {
auto Token = std::make_shared<int>();
std::weak_ptr<int> WeakToken = Token;

SwapIndex S(llvm::make_unique<MemIndex>(
SymbolSlab(), RefSlab(), std::move(Token), /*BackingDataSize=*/0));
EXPECT_FALSE(WeakToken.expired()); // Current MemIndex keeps it alive.
S.reset(llvm::make_unique<MemIndex>()); // Now the MemIndex is destroyed.


[clangd] Add RelationSlab (Closed, Public)
