This is an archive of the discontinued LLVM Phabricator instance.

Differential D46751

[clangd] Filter out private proto symbols in SymbolCollector.
ClosedPublic

Authored by ioeric on May 11 2018, 5:46 AM.

Details

Reviewers

ilya-biryukov
malaperle

Commits

rGd67ec24f3efd: [clangd] Filter out private proto symbols in SymbolCollector.
rCTE332456: [clangd] Filter out private proto symbols in SymbolCollector.
rL332456: [clangd] Filter out private proto symbols in SymbolCollector.

Summary

This uses heuristics to identify private proto symbols. For example,
top-level symbols whose name contains "_" are considered private. These symbols
are not expected to be used by users.
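
The naming rule in the summary can be sketched without any Clang APIs. This is only an illustration, not the patch's code: `looksLikePrivateProtoName` and its `IsTopLevel` parameter are hypothetical names, and the function encodes nothing beyond the rule stated above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not clangd code: encodes only the rule stated in the
// summary. A top-level symbol whose name contains '_' is treated as private.
bool looksLikePrivateProtoName(const std::string &Name, bool IsTopLevel) {
  if (!IsTopLevel)
    return false; // The summary only states a rule for top-level symbols.
  return Name.find('_') != std::string::npos;
}

// Usage: the symbol names are made up for illustration.
void privateNameExample() {
  assert(looksLikePrivateProtoName("Outer_Inner", /*IsTopLevel=*/true));
  assert(!looksLikePrivateProtoName("Outer", /*IsTopLevel=*/true));
}
```

Applied to ordinary headers, a rule like this would hide any top-level snake_case name, which is why the discussion below concentrates on how to recognise generated proto headers reliably.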

Diff Detail

Repository

rCTE Clang Tools Extra

Build Status

Buildable 18174
Build 18174: arc lint + arc unit

Event Timeline

ioeric created this revision. May 11 2018, 5:46 AM

Herald added subscribers: cfe-commits, jkorous, MaskRay, klimek. May 11 2018, 5:46 AM

ilya-biryukov added inline comments. May 11 2018, 6:24 AM

clangd/index/SymbolCollector.cpp
95	NIT: reduce nesting by inverting if condition (see LLVM style guide)
171	Maybe add a comment mentioning that we remove internal symbols from protobuf code generator here? I can easily decipher what happens here, because I know of protos, but I can imagine the comment might be very helpful to someone reading the code and being unfamiliar with protos.
172	We should also run the same code to filter out those decls coming from sema completions. Otherwise, they might still pop up in completion results if internal proto symbols were deserialized from preamble by clang before running the code completion.
unittests/clangd/SymbolCollectorTests.cpp
713	Maybe also test that the same symbol is not excluded in the file that does not end with `.proto.h`?

Can there be an option for this? This seems very library specific and could break other code bases. Ideally, there would be a generic mechanism for this kind of filtering, i.e. specify a pattern of excluded files or symbol names. But I understand this would be cumbersome because you want to filter only *some* symbol names in *some* files, so it would be difficult for users to specify this intersection of conditions on command-line arguments, for example. I think this needs to be discussed a bit more or have this turned off by default (with an option to turn on!) until there is a more general solution for this kind of filtering.

In D46751#1095894, @malaperle wrote:

Can there be an option for this? This seems very library specific and could break other code bases. Ideally, there would be a generic mechanism for this kind of filtering, i.e. specify a pattern of excluded files or symbol names. But I understand this would be cumbersome because you want to filter only *some* symbol names in *some* files, so it would be difficult for users to specify this intersection of conditions on command-line arguments, for example. I think this needs to be discussed a bit more or have this turned off by default (with an option to turn on!) until there is a more general solution for this kind of filtering.

Having user-configurable filtering may certainly be useful, but requires some design to get right.
And even if we have it, I think there's value in automatically handling popular frameworks, unless we know it might break other code in practice.

E.g., for protobuf, we know that generated headers always end with .proto.h. We could also check for comments that proto compiler tends to put into the generated files to be sure. If we do that, I feel it's better to have the filtering for protos enabled by default, since there's almost zero chance that people had a file that ends with .proto.h and put a proto compiler comment into it.
But even the .proto.h ending seems like a good enough indication.

ioeric added a reviewer: malaperle. May 11 2018, 7:04 AM

ioeric removed a subscriber: malaperle.

So, the first line of the file generated by proto compiler seems to be something like this:

// Generated by the protocol buffer compiler.  DO NOT EDIT!

If we check the symbol comes from a file with this comment, there will be zero chance that we guess it wrong.
And we can always filter, in addition to the user-provided filters that @malaperle proposed (which also sound like a very useful feature to me!)

@ioeric, @malaperle, @sammccall, WDYT?
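
A minimal sketch of the check proposed here, combining the file-name suffix with the generated-file comment. It works on plain strings rather than on Clang's SourceManager; `isGeneratedProtoHeader`, `startsWith` and `endsWith` are hypothetical names, and the file names in the usage function are made up.

```cpp
#include <cassert>
#include <string>

// Returns true if Str starts with Prefix.
static bool startsWith(const std::string &Str, const std::string &Prefix) {
  return Str.size() >= Prefix.size() &&
         Str.compare(0, Prefix.size(), Prefix) == 0;
}

// Returns true if Str ends with Suffix.
static bool endsWith(const std::string &Str, const std::string &Suffix) {
  return Str.size() >= Suffix.size() &&
         Str.compare(Str.size() - Suffix.size(), Suffix.size(), Suffix) == 0;
}

// Hypothetical helper: a file counts as a generated proto header only if the
// name has a proto-like suffix AND the contents start with the comment that
// the protocol buffer compiler emits on the first line.
bool isGeneratedProtoHeader(const std::string &FileName,
                            const std::string &Contents) {
  static const std::string Marker =
      "// Generated by the protocol buffer compiler.  DO NOT EDIT!";
  if (!endsWith(FileName, ".proto.h") && !endsWith(FileName, ".pb.h"))
    return false;
  return startsWith(Contents, Marker);
}

// Usage: a hand-written header that merely has a ".proto.h" name is rejected.
void protoHeaderExample() {
  const std::string Generated =
      "// Generated by the protocol buffer compiler.  DO NOT EDIT!\n"
      "// source: foo.proto\n";
  assert(isGeneratedProtoHeader("foo.pb.h", Generated));
  assert(!isGeneratedProtoHeader("handwritten.proto.h", "#pragma once\n"));
}
```

The suffix test is cheap and runs first; the content test is what guards against files that only happen to be named like generated proto headers.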

In D46751#1095923, @ilya-biryukov wrote:

In D46751#1095894, @malaperle wrote:

Can there be an option for this? This seems very library specific and could break other code bases. Ideally, there would be a generic mechanism for this kind of filtering, i.e. specify a pattern of excluded files or symbol names. But I understand this would be cumbersome because you want to filter only *some* symbol names in *some* files, so it would be difficult for users to specify this intersection of conditions on command-line arguments, for example. I think this needs to be discussed a bit more or have this turned off by default (with an option to turn on!) until there is a more general solution for this kind of filtering.

Having user-configurable filtering may certainly be useful, but requires some design to get right.
And even if we have it, I think there's value in automatically handling popular frameworks, unless we know it might break other code in practice.

Here :)
http://www.sidefx.com/docs/hdk/_s_o_p___bone_capture_lines_8proto_8h_source.html

I feel it's better to have the filtering for protos enabled by default

I like the idea that things work "out of the box", we have to make sure that it doesn't make it buggy for certain code bases and impossible to work around.

In D46751#1095926, @ilya-biryukov wrote:
So, the first line of the file generated by proto compiler seems to be something like this:
// Generated by the protocol buffer compiler.  DO NOT EDIT!
If we check the symbol comes from a file with this comment, there will be zero chance that we guess it wrong.
And we can always filter, in addition to the user-provided filters that @malaperle proposed (which also sound like a very useful feature to me!)

@ioeric, @malaperle, @sammccall, WDYT?

I think this is good if that's true that the comment is always there. I think it would be OK for this to be enabled by default, with a general option to turn heuristics off. Not sure what to call it... -use-symbol-filtering-heuristics :)

If handling for other libraries is added later it would be good to split out this code a bit. A collection of "filters" could be passed to symbol collector. Each filter/framework-handling could be in its own source file. Later...
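
One way the "collection of filters" idea could look, as a rough sketch only. Nothing here exists in clangd: `SymbolInfo`, `SymbolFilter` and `shouldIndex` are hypothetical names, and the single filter shown is a simplified stand-in for framework-specific logic.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for what a collector knows about a symbol.
struct SymbolInfo {
  std::string Name;
  std::string FileName;
};

// A filter returns true if the symbol should be dropped from the index.
using SymbolFilter = std::function<bool(const SymbolInfo &)>;

// Returns true if no filter rejects the symbol.
bool shouldIndex(const SymbolInfo &Sym,
                 const std::vector<SymbolFilter> &Filters) {
  for (const auto &Filter : Filters)
    if (Filter(Sym))
      return false;
  return true;
}

// Usage: one framework-specific filter; others could be appended independently.
void filterCollectionExample() {
  std::vector<SymbolFilter> Filters;
  Filters.push_back([](const SymbolInfo &S) {
    const std::string Suffix = ".pb.h";
    bool InProtoHeader =
        S.FileName.size() >= Suffix.size() &&
        S.FileName.compare(S.FileName.size() - Suffix.size(), Suffix.size(),
                           Suffix) == 0;
    return InProtoHeader && S.Name.find('_') != std::string::npos;
  });
  assert(!shouldIndex({"Outer_Inner", "foo.pb.h"}, Filters));
  assert(shouldIndex({"Outer_Inner", "foo.h"}, Filters));
}
```

Keeping each filter as an independent callable is what would let every framework's handling live in its own source file.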

Addressed a few comments.

Harbormaster completed remote builds in B17997: Diff 146338. May 11 2018, 9:07 AM

Thanks for sharing the example Marc! It's a bit surprising to see files that are not protobuf-generated named proto.h.

I'm not a big fan of parsing file comment in proto. It seems a bit cumbersome and we might not be able (or too expensive) to do so for completion results from sema (if we do want to filter at completion time).

Pattern-based filtering could be an option as it wouldn't require code modification and could potentially support more filters, although I'm a bit worried about rules getting too complicated (e.g. filters on symbol kinds etc.) or running into limitations.

But for now, it seems to me that the easiest approach is putting an option around proto heuristic for the symbol collector, until we need more filters.

clangd/index/SymbolCollector.cpp
172	Just want to clarify before going further. IIUC, in index-based completion, the preamble could still contain symbols from headers such that sema completion could still give you symbols from headers. If we do need to build the filter into code completion, we would need more careful design as code completion code path is more latency sensitive.
unittests/clangd/SymbolCollectorTests.cpp
713	Sounds good.

Here :)
http://www.sidefx.com/docs/hdk/_s_o_p___bone_capture_lines_8proto_8h_source.html

Didn't take long to find an example. Thanks for digging this up. That looks like a good enough reason to not apply proto-specific filtering based solely on the filename...

I like the idea that things work "out of the box", we have to make sure that it doesn't make it buggy for certain code bases and impossible to work around.

Totally agree, we should only enable something that we all agree is good enough at detecting the frameworks that it never guesses wrong on real code.

If handling for other libraries is added later it would be good to split out this code a bit. A collection of "filters" could be passed to symbol collector. Each filter/framework-handling could be in its own source file. Later...

That LG, I guess we can iterate on the design in a CL, design doc or an email thread. However, it's outside the scope of this patch probably.

I'm not a big fan of parsing file comment in proto. It seems a bit cumbersome and we might not be able (or too expensive) to do so for completion results from sema (if we do want to filter at completion time).

Why do you feel it's cumbersome? Getting the first line of the file from SourceManager and looking at it seems easy.
I certainly don't see why this should be expensive if we do it at the right time (perhaps it means doing that when building the preamble and stashing the results alongside it, but that's also easy in our current setup).

clangd/index/SymbolCollector.cpp
172	For reference. As discussed outside this thread, we might have decls from headers in the sema completion items. It does not seem too hard to add filtering for those as well: essentially, we just need to call the same function at code completion time.

Add heuristic to reduce false positives when identifying proto headers

I think this is good if that's true that the comment is always there. I think it would be OK for this to be enabled by default, with a general option to turn heuristics off. Not sure what to call it... -use-symbol-filtering-heuristics :)

@malaperle Having an option for filtering heuristics seems a bit confusing. We have other filters in the symbol collector that we think could improve user experience, and we don't provide options for those. Similarly, for proto headers, I think we could also get away without such an option if we strive for low/no false positives (i.e. correctly identify proto headers). If folks run into problems with the filter, we would like to understand the use cases and improve the filters. In general, we think the proto filter, when it works, would improve user experience.

I'm not a big fan of parsing file comment in proto. It seems a bit cumbersome and we might not be able (or too expensive) to do so for completion results from sema (if we do want to filter at completion time).

Why do you feel it's cumbersome? Getting the first line of the file from SourceManager and looking at it seems easy.
I certainly don't see why this should be expensive if we do it at the right time (perhaps it means doing that when building the preamble and stashing the results alongside it, but that's also easy in our current setup).

@ilya-biryukov You are right, getting a working solution seems easy enough. I was more concerned about finding a design that doesn't intrude completion workflow with library specific logic. But this makes more sense now that we are not doing the filtering on code completion workflow.

clangd/index/SymbolCollector.cpp
172	After further discussion, we agreed that filtering on code completion results may not be the right approach. For example, it would break assumptions in the code completion workflow, e.g. the limit on completion results from the index. Alternatively, we could push filtering to the symbol source level. Currently, we have two sources of symbols: sema and index, and both of them can gather symbols from headers, which are then merged in code completion. With this setup, we would need to apply filters on both sources if we want to do any filtering on header symbols. One solution (for index-based completion) is to make sema only collect symbols in the main file (just for code completion) and make the indexer index headers (current behavior), where we would only need to filter on the index. This doesn't address the problem for sema-only code completion, but we think it's not a priority to strive for feature parity between sema-based completion and index-based completion, which we don't really have at this point. So to proceed: (1) I'll go ahead with the current approach (filter index only) with a stricter check for proto headers. (2) Make sema only collect symbols in main files. (3) Potentially also apply the filter on sema completion results.

@malaperle to expand on what Eric said, adding proto hacks with false positives and no way to turn them off is indeed not the way to go here!
There's probably going to be other places we want to filter symbols too, and it should probably be extensible/customizable in some way.
We don't yet have enough examples to know what the structure should be (something regex based, a code-plugin system based on Registry, or something in between), thus the simplest/least invasive option for now (it's important for our internal rollout to have *some* mitigation in place).

@ioeric can you add a comment near the proto-filtering stuff indicating we should work out how to make this extensible?

In D46751#1099097, @ioeric wrote:

I think this is good if that's true that the comment is always there. I think it would be OK for this to be enabled by default, with a general option to turn heuristics off. Not sure what to call it... -use-symbol-filtering-heuristics :)

We have other filters in the symbol collector that we think could improve user experience, and we don't provide options for those.

What other filters do you mean? If you mean skipping "members", symbols in main files, etc, I am working on making them not skipped, see D44954.

In D46751#1099479, @malaperle wrote:

In D46751#1099097, @ioeric wrote:

I think this is good if that's true that the comment is always there. I think it would be OK for this to be enabled by default, with a general option to turn heuristics off. Not sure what to call it... -use-symbol-filtering-heuristics :)

We have other filters in the symbol collector that we think could improve user experience, and we don't provide options for those.

What other filters do you mean? If you mean skipping "members", symbols in main files, etc, I am working on making them not skipped, see D44954.

I meant the filters in https://github.com/llvm-mirror/clang-tools-extra/blob/master/clangd/index/SymbolCollector.cpp#L93 e.g. filtering symbols in anonymous namespace, which we think should never appear in the index.

I think members are more interesting than the private proto symbols. We want to un-filter members because there are features that would use them, so indexing them makes sense. But private proto symbols should never be shown to users (e.g. in code completion or workspaceSymbols).

I also think adding an option for indexing members would actually make more sense because they might significantly increase the index size, and it would be good to have options to disable it if users don't use members (e.g. including members can increase size of our internal global index service, and we are not sure if we are ready for that).

In D46751#1099235, @sammccall wrote:

@malaperle to expand on what Eric said, adding proto hacks with false positives and no way to turn them off is indeed not the way to go here!
There's probably going to be other places we want to filter symbols too, and it should probably be extensible/customizable in some way.
We don't yet have enough examples to know what the structure should be (something regex based, a code-plugin system based on Registry, or something in between), thus the simplest/least invasive option for now (it's important for our internal rollout to have *some* mitigation in place).
@ioeric can you add a comment near the proto-filtering stuff indicating we should work out how to make this extensible?

I agree with all of that. What I don't quite understand is why a flag is not ok? Just a fail-safe switch in the mean time? You can even leave it on by default so your internal service is not affected. We know for a fact that some code bases like Houdini won't work with this, at least there will be an option to make it work.

In D46751#1099537, @ioeric wrote:

In D46751#1099479, @malaperle wrote:

In D46751#1099097, @ioeric wrote:

I think this is good if that's true that the comment is always there. I think it would be OK for this to be enabled by default, with a general option to turn heuristics off. Not sure what to call it... -use-symbol-filtering-heuristics :)

We have other filters in the symbol collector that we think could improve user experience, and we don't provide options for those.

What other filters do you mean? If you mean skipping "members", symbols in main files, etc, I am working on making them not skipped, see D44954.

I meant the filters in https://github.com/llvm-mirror/clang-tools-extra/blob/master/clangd/index/SymbolCollector.cpp#L93 e.g. filtering symbols in anonymous namespace, which we think should never appear in the index.

I'll be looking at adding them too. For workspaceSymbols it's useful to be able to find them and matches what we had before. But completion will not pull them from the index.

I think members are more interesting than the private proto symbols. We want to un-filter members because there are features that would use them, so indexing them makes sense. But private proto symbols should never be shown to users (e.g. in code completion or workspaceSymbols).

I also think adding an option for indexing members would actually make more sense because they might significantly increase the index size, and it would be good to have options to disable it if users don't use members (e.g. including members can increase size of our internal global index service, and we are not sure if we are ready for that).

It sounds like we'll need both flags. We should discuss that because I'm planning to add even more (almost all?) symbols. I don't think it's common that users won't want members for workspaceSymbols though, but I see how this is not good for the internal indexing service.

In D46751#1099633, @malaperle wrote:

In D46751#1099235, @sammccall wrote:

@malaperle to expand on what Eric said, adding proto hacks with false positives and no way to turn them off is indeed not the way to go here!
There's probably going to be other places we want to filter symbols too, and it should probably be extensible/customizable in some way.
We don't yet have enough examples to know what the structure should be (something regex based, a code-plugin system based on Registry, or something in between), thus the simplest/least invasive option for now (it's important for our internal rollout to have *some* mitigation in place).
@ioeric can you add a comment near the proto-filtering stuff indicating we should work out how to make this extensible?

I agree with all of that. What I don't quite understand is why a flag is not ok? Just a fail-safe switch in the mean time? You can even leave it on by default so your internal service is not affected.

I think a flag doesn't solve much of the problem, and adds new ones:

users have to find the flag, and work out how to turn it on in their editor, and (other than embedders) few will bother. And each flag hurts the usability of all the other flags.
if this heuristic is usable only sometimes, that's at codebase granularity, not user granularity. Flags don't work that way. (Static index currently has this problem...)
these flags end up in config files, so if we later remove the flag we'll *completely* break clangd for such users

We know for a fact that some code bases like Houdini won't work with this, at least there will be an option to make it work.

Is this still the case after the last revision (with the comment check?)
Agree we should only hardwire this on if we are confident that false positives are vanishingly small.

clangd/index/SymbolCollector.cpp
101	We're going to end up calling this code on every decl/def we see. Am I being paranoid by thinking we should check whether the file is a proto once, rather than doing a bunch of string matching every time?
112	this asserts if the name is not a simple identifier (Maybe operators or something will trigger this?).

In D46751#1099786, @sammccall wrote:

In D46751#1099633, @malaperle wrote:

In D46751#1099235, @sammccall wrote:

@malaperle to expand on what Eric said, adding proto hacks with false positives and no way to turn them off is indeed not the way to go here!
There's probably going to be other places we want to filter symbols too, and it should probably be extensible/customizable in some way.
We don't yet have enough examples to know what the structure should be (something regex based, a code-plugin system based on Registry, or something in between), thus the simplest/least invasive option for now (it's important for our internal rollout to have *some* mitigation in place).
@ioeric can you add a comment near the proto-filtering stuff indicating we should work out how to make this extensible?

I agree with all of that. What I don't quite understand is why a flag is not ok? Just a fail-safe switch in the mean time? You can even leave it on by default so your internal service is not affected.

I think a flag doesn't solve much of the problem, and adds new ones:

users have to find the flag, and work out how to turn it on in their editor, and (other than embedders) few will bother. And each flag hurts the usability of all the other flags.

if this heuristic is usable only sometimes, that's at codebase granularity, not user granularity. Flags don't work that way. (Static index currently has this problem...)

these flags end up in config files, so if we later remove the flag we'll *completely* break clangd for such users

I don't really agree with those points, but...

We know for a fact that some code bases like Houdini won't work with this, at least there will be an option to make it work.

Is this still the case after the last revision (with the comment check?)
Agree we should only hardwire this on if we are confident that false positives are vanishingly small.

...I hadn't noticed the latest version. I think it's safe enough in the new version that we don't need to discuss this much further until it becomes a bigger problem (more libraries, etc).

Address review comments.

clangd/index/SymbolCollector.cpp
101	`s/Symbol/Decl/` We could store a cache in the symbol collector (just need to add another state in the class, remember to invalidate for a new ASTContext, make this a member etc), but I think the matching is cheap enough?
112	Good catch!
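
The cache discussed in the comments on line 101 could be sketched as follows. This is an assumption-laden illustration, not what the patch does (the reply above argues the matching is cheap enough): `ProtoHeaderCache` and its members are hypothetical, and the expensive check is injected as a callback so the sketch stays independent of Clang.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache: remembers, per file name, whether the file was judged to
// be a generated proto header, so the string matching runs once per file.
class ProtoHeaderCache {
public:
  using Checker = std::function<bool(const std::string &FileName)>;

  explicit ProtoHeaderCache(Checker CheckFn) : Check(std::move(CheckFn)) {}

  bool isProtoHeader(const std::string &FileName) {
    auto It = Results.find(FileName);
    if (It != Results.end())
      return It->second; // Served from the cache.
    bool Result = Check(FileName);
    Results.emplace(FileName, Result);
    return Result;
  }

  // Drops all cached answers, e.g. when a new ASTContext is set up.
  void clear() { Results.clear(); }

private:
  Checker Check;
  std::unordered_map<std::string, bool> Results;
};

// Usage: the underlying check runs once even though the file is queried twice.
void cacheExample() {
  int Calls = 0;
  ProtoHeaderCache Cache([&Calls](const std::string &FileName) {
    ++Calls;
    return FileName == "foo.pb.h"; // Stand-in for the real check.
  });
  bool First = Cache.isProtoHeader("foo.pb.h");
  bool Second = Cache.isProtoHeader("foo.pb.h");
  assert(First && Second);
  assert(Calls == 1);
}
```

The cost is the extra state mentioned in the reply: the cache has to live in the collector and be invalidated at the right time.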

...I hadn't noticed the latest version. I think it's safe enough in the new version that we don't need to discuss this much further until it becomes a bigger problem (more libraries, etc).

Sounds good, thanks!

@ilya-biryukov Could you take another look at the patch?

ilya-biryukov added inline comments. May 16 2018, 5:04 AM

clangd/ClangdLSPServer.cpp
274	NIT: not related to this change, but maybe use `std::move(Edits)` to avoid extra copies.
clangd/index/SymbolCollector.cpp
127	The heuristics that rely on naming style seem too fragile. More thorough heuristics should do better, but are more complicated. Maybe we could leave a fixme saying that these can be improved in case we'll run into problems. WDYT?
unittests/clangd/SymbolCollectorTests.cpp
711	Could you give an intuition on why is this considered private? We don't filter out those operators from other headers, right? Why are proto headers special?
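
The `std::move(Edits)` NIT on ClangdLSPServer.cpp can be illustrated with a type that counts its copies. This is a sketch under stated assumptions: `CountedEdits` is a hypothetical stand-in for the edit list, and the example inserts with `std::map::emplace` rather than the braced assignment used in the patch, because that makes the saved copy easy to observe.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a list of edits that counts how often it is copied.
struct CountedEdits {
  static int Copies;
  std::vector<std::string> Items;

  CountedEdits() = default;
  CountedEdits(const CountedEdits &Other) : Items(Other.Items) { ++Copies; }
  CountedEdits(CountedEdits &&Other) noexcept
      : Items(std::move(Other.Items)) {}
};
int CountedEdits::Copies = 0;

// Usage: inserting an lvalue copies it, inserting via std::move does not.
void moveExample() {
  int Before = CountedEdits::Copies;
  std::map<std::string, CountedEdits> Changes;
  CountedEdits A, B;
  A.Items = {"edit"};
  B.Items = {"edit"};
  Changes.emplace("file:///a.cpp", A);            // Copies A into the map.
  Changes.emplace("file:///b.cpp", std::move(B)); // Moves B, no copy.
  assert(CountedEdits::Copies == Before + 1);
}
```

After the move, `B` is left in a valid but unspecified state, so this only pays off when the source object is not used again, as is the case for a local list that is built and then handed over.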

Merge remote-tracking branch 'origin/master' into proto
Addressed review comments.

Harbormaster completed remote builds in B18178: Diff 147058. May 16 2018, 5:12 AM

LGTM

This revision is now accepted and ready to land. May 16 2018, 5:14 AM

Closed by commit rL332456: [clangd] Filter out private proto symbols in SymbolCollector. (authored by ioeric). May 16 2018, 5:16 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. May 16 2018, 5:16 AM

malaperle mentioned this in D44954: [clangd] Add "member" symbols to the index. May 16 2018, 1:07 PM

Revision Contents

Path                                         Size
clangd/ClangdLSPServer.cpp                   32 lines
clangd/SourceCode.h                          6 lines
clangd/SourceCode.cpp                        15 lines
clangd/index/SymbolCollector.cpp             41 lines
unittests/clangd/SymbolCollectorTests.cpp    34 lines

Diff 147032

clangd/ClangdLSPServer.cpp

[... 54 lines not shown; context: #endif ...]
   uriFromAbsolutePath(llvm::StringRef AbsolutePath) const override {
     llvm_unreachable("Clangd must never create a test URI.");
   }
 };

 static URISchemeRegistry::Add<TestScheme>
     X("test", "Test scheme for clangd lit tests.");

-TextEdit replacementToEdit(StringRef Code, const tooling::Replacement &R) {
-  Range ReplacementRange = {
-      offsetToPosition(Code, R.getOffset()),
-      offsetToPosition(Code, R.getOffset() + R.getLength())};
-  return {ReplacementRange, R.getReplacementText()};
-}
-
-std::vector<TextEdit>
-replacementsToEdits(StringRef Code,
-                    const std::vector<tooling::Replacement> &Replacements) {
-  // Turn the replacements into the format specified by the Language Server
-  // Protocol. Fuse them into one big JSON array.
-  std::vector<TextEdit> Edits;
-  for (const auto &R : Replacements)
-    Edits.push_back(replacementToEdit(Code, R));
-  return Edits;
-}
-
-std::vector<TextEdit> replacementsToEdits(StringRef Code,
-                                          const tooling::Replacements &Repls) {
-  std::vector<TextEdit> Edits;
-  for (const auto &R : Repls)
-    Edits.push_back(replacementToEdit(Code, R));
-  return Edits;
-}
-
 SymbolKindBitset defaultSymbolKinds() {
   SymbolKindBitset Defaults;
   for (size_t I = SymbolKindMin; I <= static_cast<size_t>(SymbolKind::Array);
        ++I)
     Defaults.set(I);
   return Defaults;
 }

[... 189 lines not shown; context: void ClangdLSPServer::onRename(RenameParams &Params) { ...]
   Server.rename(
       File, Params.position, Params.newName,
       [File, Code,
        Params](llvm::Expected<std::vector<tooling::Replacement>> Replacements) {
         if (!Replacements)
           return replyError(ErrorCode::InternalError,
                             llvm::toString(Replacements.takeError()));

-        std::vector<TextEdit> Edits = replacementsToEdits(Code, Replacements);
+        // Turn the replacements into the format specified by the Language
+        // Server Protocol. Fuse them into one big JSON array.
+        std::vector<TextEdit> Edits;
+        for (const auto &R : *Replacements)
+          Edits.push_back(replacementToEdit(*Code, R));
         WorkspaceEdit WE;
         WE.changes = {{Params.textDocument.uri.uri(), Edits}};
    [Inline comment, Done] ilya-biryukov: NIT: not related to this change, but maybe use `std::move(Edits)` to avoid extra copies.
         reply(WE);
       });
 }

 void ClangdLSPServer::onDocumentDidClose(DidCloseTextDocumentParams &Params) {
   PathRef File = Params.textDocument.uri.file();
   DraftMgr.removeDraft(File);
   Server.removeDocument(File);
[... 232 lines not shown ...]

clangd/SourceCode.h

[... 9 lines not shown ...]
 // Various code that examines C++ source code without using heavy AST machinery
 // (and often not even the lexer). To be used sparingly!
 //
 //===----------------------------------------------------------------------===//
 #ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_SOURCECODE_H
 #define LLVM_CLANG_TOOLS_EXTRA_CLANGD_SOURCECODE_H
 #include "Protocol.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Tooling/Core/Replacement.h"

 namespace clang {
 class SourceManager;

 namespace clangd {

 /// Turn a [line, column] pair into an offset in Code.
 ///
[... 24 lines not shown ...]
 std::pair<size_t, size_t> offsetToClangLineColumn(llvm::StringRef Code,
                                                   size_t Offset);

 /// From "a::b::c", return {"a::b::", "c"}. Scope is empty if there's no
 /// qualifier.
 std::pair<llvm::StringRef, llvm::StringRef>
 splitQualifiedName(llvm::StringRef QName);

+TextEdit replacementToEdit(StringRef Code, const tooling::Replacement &R);
+
+std::vector<TextEdit> replacementsToEdits(StringRef Code,
+                                          const tooling::Replacements &Repls);
+
 } // namespace clangd
 } // namespace clang
 #endif

clangd/SourceCode.cpp

[... 160 lines not shown ...]
 std::pair<llvm::StringRef, llvm::StringRef>
 splitQualifiedName(llvm::StringRef QName) {
   size_t Pos = QName.rfind("::");
   if (Pos == llvm::StringRef::npos)
     return {StringRef(), QName};
   return {QName.substr(0, Pos + 2), QName.substr(Pos + 2)};
 }

+TextEdit replacementToEdit(StringRef Code, const tooling::Replacement &R) {
+  Range ReplacementRange = {
+      offsetToPosition(Code, R.getOffset()),
+      offsetToPosition(Code, R.getOffset() + R.getLength())};
+  return {ReplacementRange, R.getReplacementText()};
+}
+
+std::vector<TextEdit> replacementsToEdits(StringRef Code,
+                                          const tooling::Replacements &Repls) {
+  std::vector<TextEdit> Edits;
+  for (const auto &R : Repls)
+    Edits.push_back(replacementToEdit(Code, R));
+  return Edits;
+}
+
 } // namespace clangd
 } // namespace clang

clangd/index/SymbolCollector.cpp

Show First 20 Lines • Show All 84 Lines • ▼ Show 20 Lines	if (U)
return U->toString();		return U->toString();
ErrMsg += llvm::toString(U.takeError()) + "\n";		ErrMsg += llvm::toString(U.takeError()) + "\n";
}		}
log(llvm::Twine("Failed to create an URI for file ") + AbsolutePath + ": " +		log(llvm::Twine("Failed to create an URI for file ") + AbsolutePath + ": " +
ErrMsg);		ErrMsg);
return llvm::None;		return llvm::None;
}		}

		// All proto generated headers should start with this line.
		static const char *PROTO_HEADER_COMMENT =
		"// Generated by the protocol buffer compiler. DO NOT EDIT!";
		ilya-biryukov [Done]: NIT: reduce nesting by inverting if condition (see LLVM style guide)

		// Checks whether the decl is a private symbol in a header generated by
		// protobuf compiler.
		// To identify whether a proto header is actually generated by proto compiler,
		// we check whether it starts with PROTO_HEADER_COMMENT.
		// FIXME: make filtering extensible when there are more use cases for symbol
		// filters.
		sammccall [Not Done]: We're going to end up calling this code on every decl/def we see. Am I being paranoid by thinking we should check whether the file is a proto once, rather than doing a bunch of string matching every time?
		ioeric [Not Done]: `s/Symbol/Decl/` We could store a cache in the symbol collector (just need to add another state in the class, remember to invalidate for a new ASTContext, make this a member etc), but I think the matching is cheap enough?
		bool isPrivateProtoDecl(const NamedDecl &ND) {
		const auto &SM = ND.getASTContext().getSourceManager();
		auto Loc = findNameLoc(&ND);
		auto FileName = SM.getFilename(Loc);
		if (!FileName.endswith(".proto.h") && !FileName.endswith(".pb.h"))
		return false;
		auto FID = SM.getFileID(Loc);
		// Double check that this is an actual protobuf header.
		if (!SM.getBufferData(FID).startswith(PROTO_HEADER_COMMENT))
		return false;
		sammccall [Done]: This asserts if the name is not a simple identifier (maybe operators or something will trigger this?).
		ioeric [Not Done]: Good catch!

		// If ND does not have an identifier/name, it must be private.
		if (ND.getIdentifier() == nullptr)
		return true;
		auto Name = ND.getIdentifier()->getName();
		if (!Name.contains('_'))
		return false;
		// Nested proto entities (e.g. Message::Nested) have top-level decls
		// that shouldn't be used (Message_Nested). Ignore them completely.
		// The nested entities are dangling type aliases, we may want to reconsider
		// including them in the future.
		// For enum constants, SOME_ENUM_CONSTANT is not private and should be
		// indexed. Outer_INNER is private. This heuristic relies on naming style, it
		// will include OUTER_INNER and exclude some_enum_constant.
		return (ND.getKind() != Decl::EnumConstant) ||
		std::any_of(Name.begin(), Name.end(), islower);
		ilya-biryukov [Done]: The heuristics that rely on naming style seem too fragile. More thorough heuristics should do better, but are more complicated. Maybe we could leave a FIXME saying that these can be improved in case we run into problems. WDYT?
		}
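
The name-based part of the heuristic above can be tried in isolation. The following is a minimal sketch, not the code under review: `DeclKindSketch`, `looksPrivateInProtoHeader` and `checkNamesFromReviewTest` are hypothetical names, the decl is assumed to already be known to come from a generated proto header, and an empty name stands in for a decl without an identifier (such as an operator). It assumes C++17 for `std::string_view`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical stand-in for the two decl kinds the heuristic distinguishes.
enum class DeclKindSketch { EnumConstant, Other };

// Mirrors the naming rules of isPrivateProtoDecl for a decl that is assumed
// to live in a generated proto header. An empty Name models "no identifier".
bool looksPrivateInProtoHeader(std::string_view Name, DeclKindSketch Kind) {
  // No identifier (e.g. operators): treated as private.
  if (Name.empty())
    return true;
  // Names without '_' are never considered private.
  if (Name.find('_') == std::string_view::npos)
    return false;
  // Any non-enum-constant with '_' (e.g. Message_Nested) is private.
  if (Kind != DeclKindSketch::EnumConstant)
    return true;
  // Enum constants: all-caps names are public, mixed-case ones are private.
  return std::any_of(Name.begin(), Name.end(),
                     [](unsigned char C) { return std::islower(C) != 0; });
}

// Usage: the same names as in the FilterPrivateProtoSymbols test of this diff.
void checkNamesFromReviewTest() {
  assert(looksPrivateInProtoHeader("Top_Level", DeclKindSketch::Other));
  assert(!looksPrivateInProtoHeader("TopLevel", DeclKindSketch::Other));
  assert(!looksPrivateInProtoHeader("KIND_OK", DeclKindSketch::EnumConstant));
  assert(looksPrivateInProtoHeader("Kind_Not_Ok", DeclKindSketch::EnumConstant));
}
```

As the code comment in the diff notes, the rule depends on naming style: an enum constant written in lower case with underscores would be filtered out even if it is public.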

bool shouldFilterDecl(const NamedDecl *ND, ASTContext *ASTCtx,		bool shouldFilterDecl(const NamedDecl *ND, ASTContext *ASTCtx,
const SymbolCollector::Options &Opts) {		const SymbolCollector::Options &Opts) {
using namespace clang::ast_matchers;		using namespace clang::ast_matchers;
if (ND->isImplicit())		if (ND->isImplicit())
return true;		return true;
// Skip anonymous declarations, e.g (anonymous enum/class/struct).		// Skip anonymous declarations, e.g (anonymous enum/class/struct).
if (ND->getDeclName().isEmpty())		if (ND->getDeclName().isEmpty())
return true;		return true;
[... 24 lines not shown ...]
if (match(decl(allOf(unless(isExpansionInMainFile()),		if (match(decl(allOf(unless(isExpansionInMainFile()),
anyOf(InTopLevelScope,		anyOf(InTopLevelScope,
hasDeclContext(enumDecl(InTopLevelScope,		hasDeclContext(enumDecl(InTopLevelScope,
unless(isScoped())))),		unless(isScoped())))),
unless(IsSpecialization))),		unless(IsSpecialization))),
*ND, *ASTCtx)		*ND, *ASTCtx)
.empty())		.empty())
return true;		return true;

		// Avoid indexing internal symbols in protobuf generated headers.
		if (isPrivateProtoDecl(*ND))
		return true;
		ilya-biryukov [Done]: Maybe add a comment mentioning that we remove internal symbols from protobuf code generator here? I can easily decipher what happens here, because I know of protos, but I can imagine the comment might be very helpful to someone reading the code and being unfamiliar with protos.
		ilya-biryukov [Not Done]: We should also run the same code to filter out those decls coming from sema completions. Otherwise, they might still pop up in completion results if internal proto symbols were deserialized from preamble by clang before running the code completion.
		ioeric [Not Done]: Just want to clarify before going further. IIUC, in index-based completion, the preamble could still contain symbols from headers such that sema completion could still give you symbols from headers. If we do need to build the filter into code completion, we would need more careful design as the code completion code path is more latency sensitive.
		ilya-biryukov [Not Done]: For reference. As discussed outside this thread, we might have decls from headers in the sema completion items. It does not seem too hard to add filtering for those as well: essentially, we just need to call the same function at code completion time.
		ioeric [Not Done]: After further discussion, we agreed that filtering on code completion results may not be the right approach. For example, it would break assumptions in the code completion workflow, e.g. the limit of completion results from index.
		Alternatively, we could push filtering to the symbol source level. Currently, we have two sources of symbols: sema and index, and both of them can gather symbols from headers, which are then merged in code completion. With this setup, we would need to apply filters on both sources if we want to do any filtering on header symbols. One solution (for index-based completion) is to make sema only collect symbols in the main file (just for code completion) and make the indexer index headers (current behavior), where we would only need to filter on index. This doesn't address the problem for sema-only code completion, but we think it's not a priority to strive for feature parity between sema-based completion and index-based completion, which we don't really have at this point.
		So to proceed:
		1. I'll go ahead with the current approach (filter index only) with a stricter check for proto headers.
		2. Make sema only collect symbols in main files.
		3. Potentially also apply the filter on sema completion results.
return false;		return false;
}		}
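
The inline discussion on isPrivateProtoDecl raises the option of deciding once per file whether it is a generated proto header, instead of repeating the string matching for every decl. The patch as shown does not do this (the author considered the matching cheap enough), so the following is only a sketch of that idea under stated assumptions: `ProtoHeaderCacheSketch`, its loader callback and `demoSingleLoadPerFile` are hypothetical, files are keyed by path string rather than by a clang FileID, and `clear()` stands in for the invalidation that would be needed for a new ASTContext. It assumes C++17.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Hypothetical per-file cache for the "is this a generated proto header?"
// check. Not part of the patch under review.
class ProtoHeaderCacheSketch {
public:
  // Returns the contents of the file at the given path (assumed callback).
  using ContentLoader = std::function<std::string(const std::string &)>;

  explicit ProtoHeaderCacheSketch(ContentLoader LoadContent)
      : Loader(std::move(LoadContent)) {}

  // Computes the verdict on first use and reuses it for later decls.
  bool isGeneratedProtoHeader(const std::string &Path) {
    auto It = Verdicts.find(Path);
    if (It != Verdicts.end())
      return It->second;
    // The suffix check runs first, so other files are never loaded.
    bool Verdict = (endsWith(Path, ".proto.h") || endsWith(Path, ".pb.h")) &&
                   startsWith(Loader(Path), Banner);
    Verdicts.emplace(Path, Verdict);
    return Verdict;
  }

  // To be called when the underlying files may have changed.
  void clear() { Verdicts.clear(); }

private:
  static constexpr std::string_view Banner =
      "// Generated by the protocol buffer compiler. DO NOT EDIT!";

  static bool startsWith(std::string_view S, std::string_view Prefix) {
    return S.substr(0, Prefix.size()) == Prefix;
  }
  static bool endsWith(std::string_view S, std::string_view Suffix) {
    return S.size() >= Suffix.size() &&
           S.substr(S.size() - Suffix.size()) == Suffix;
  }

  ContentLoader Loader;
  std::unordered_map<std::string, bool> Verdicts;
};

// Usage: the file contents are inspected once, however many decls ask.
void demoSingleLoadPerFile() {
  int Loads = 0;
  ProtoHeaderCacheSketch Cache([&Loads](const std::string &) {
    ++Loads;
    return std::string(
        "// Generated by the protocol buffer compiler. DO NOT EDIT!\n");
  });
  assert(Cache.isGeneratedProtoHeader("x.proto.h"));
  assert(Cache.isGeneratedProtoHeader("x.proto.h"));
  assert(Loads == 1);
}
```

The trade-off named in the thread still applies: a cache adds state to the collector and must be invalidated correctly, which may not be worth it if the per-decl matching is cheap.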

// We only collect #include paths for symbols that are suitable for global code		// We only collect #include paths for symbols that are suitable for global code
// completion, except for namespaces since #include path for a namespace is hard		// completion, except for namespaces since #include path for a namespace is hard
// to define.		// to define.
bool shouldCollectIncludePath(index::SymbolKind Kind) {		bool shouldCollectIncludePath(index::SymbolKind Kind) {
using SK = index::SymbolKind;		using SK = index::SymbolKind;
[... 241 lines not shown ...]

unittests/clangd/SymbolCollectorTests.cpp

	[... 691 lines not shown ...]
	TEST_F(SymbolCollectorTest, UTF16Character) {			TEST_F(SymbolCollectorTest, UTF16Character) {
	// ö is 2-bytes.			// ö is 2-bytes.
	Annotations Header(/*Header=*/"class [[pörk]] {};");			Annotations Header(/*Header=*/"class [[pörk]] {};");
	runSymbolCollector(Header.code(), /*Main=*/"");			runSymbolCollector(Header.code(), /*Main=*/"");
	EXPECT_THAT(Symbols, UnorderedElementsAre(			EXPECT_THAT(Symbols, UnorderedElementsAre(
	AllOf(QName("pörk"), DeclRange(Header.range()))));			AllOf(QName("pörk"), DeclRange(Header.range()))));
	}			}

				TEST_F(SymbolCollectorTest, FilterPrivateProtoSymbols) {
				TestHeaderName = testPath("x.proto.h");
				const std::string Header =
				R"(// Generated by the protocol buffer compiler. DO NOT EDIT!
				namespace nx {
				class Top_Level {};
				class TopLevel {};
				enum Kind {
				KIND_OK,
				Kind_Not_Ok,
				};
				bool operator<(int x, int y);
				})";
				runSymbolCollector(Header, /*Main=*/"");
				ilya-biryukov [Done], on the `operator<` declaration: Could you give an intuition on why this is considered private? We don't filter out those operators from other headers, right? Why are proto headers special?
				ilya-biryukov [Done]: Maybe also test that the same symbol is not excluded in the file that does not end with `.proto.h`?
				ioeric [Not Done]: Sounds good.
				EXPECT_THAT(Symbols, UnorderedElementsAre(QName("nx"), QName("nx::TopLevel"),
				QName("nx::Kind"),
				QName("nx::KIND_OK")));
				}

				TEST_F(SymbolCollectorTest, DoubleCheckProtoHeaderComment) {
				TestHeaderName = testPath("x.proto.h");
				const std::string Header = R"(
				namespace nx {
				class Top_Level {};
				enum Kind {
				Kind_Fine
				};
				}
				)";
				runSymbolCollector(Header, /*Main=*/"");
				EXPECT_THAT(Symbols,
				UnorderedElementsAre(QName("nx"), QName("nx::Top_Level"),
				QName("nx::Kind"), QName("nx::Kind_Fine")));
				}

	} // namespace			} // namespace
	} // namespace clangd			} // namespace clangd
	} // namespace clang			} // namespace clang