This is an archive of the discontinued LLVM Phabricator instance.

Differential D71598

[clangd] Filter implicit references from index while renaming
AbandonedPublic

Authored by kbobyrev on Dec 17 2019, 4:03 AM.

Details

Reviewers

ilya-biryukov
kadircet

Summary

When asked for references during cross-file rename, the index might return implicit references to the renamed symbol (such as those in macro expansions). To fix the incorrect behavior, this patch introduces basic filtering machinery which ensures that every range where the rename is about to be applied actually contains the identifier the user asked to rename.
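The filtering idea in the summary can be illustrated with a minimal sketch. It is not the patch's code: `OffsetRange` and `keepSpelledOccurrences` are hypothetical names, and plain byte offsets stand in for clangd's `Range` type. It only shows the check "does this range actually spell the identifier?".

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical half-open byte range [Begin, End) into a file's contents.
struct OffsetRange {
  std::size_t Begin;
  std::size_t End;
};

// Keeps only the candidate ranges whose text is exactly `Identifier`.
// Ranges that fall outside the buffer (e.g. from a stale index) are dropped.
std::vector<OffsetRange>
keepSpelledOccurrences(std::string_view Code, std::string_view Identifier,
                       const std::vector<OffsetRange> &Candidates) {
  std::vector<OffsetRange> Result;
  for (const OffsetRange &R : Candidates) {
    if (R.Begin > R.End || R.End > Code.size())
      continue; // Malformed or out-of-bounds range.
    if (Code.substr(R.Begin, R.End - R.Begin) == Identifier)
      Result.push_back(R);
  }
  return Result;
}
```

With the contents `Foo x; FOO y;` and candidates covering both `Foo` and the macro use `FOO`, only the first range survives, which is the behavior the summary asks for.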

Diff Detail

Event Timeline

kbobyrev created this revision. Dec 17 2019, 4:03 AM

Herald added subscribers: usaxena95, arphaman, mgrang and 2 others. Dec 17 2019, 4:03 AM

(apologies, the FIXME may imply this approach...)

this approach is based on an assumption: the index results match the latest file content. This is not always true in practice: our index may be stale (index results came from an old snapshot of the file), and then this approach will fail.

I think we should do it in another direction:

add a new RefKind (something like implicit references, or named references) to clangd::Ref
when querying the index for rename, we set a corresponding Filter in the query request (or filter out non-interesting references based on the RefKind afterwards)
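A minimal sketch of the alternative proposed above, assuming references carry a bit-flag kind. `RefKindSketch`, its `Implicit` flag, `RefSketch` and `dropImplicit` are all hypothetical names for illustration; they are not the existing `clangd::Ref` or `RefKind` definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference kinds stored as bit flags. `Implicit` is the new
// flag suggested in the comment above, not an existing clangd enumerator.
enum class RefKindSketch : std::uint8_t {
  Declaration = 1 << 0,
  Definition = 1 << 1,
  Reference = 1 << 2,
  Implicit = 1 << 3,
};

constexpr std::uint8_t bits(RefKindSketch K) {
  return static_cast<std::uint8_t>(K);
}

// Hypothetical index entry: where the reference is and what kind it has.
struct RefSketch {
  unsigned Line;
  std::uint8_t Kind; // Bitwise OR of RefKindSketch values.
};

// Post-filters index results: drops every reference flagged as implicit.
std::vector<RefSketch> dropImplicit(const std::vector<RefSketch> &Refs) {
  std::vector<RefSketch> Result;
  for (const RefSketch &R : Refs)
    if ((R.Kind & bits(RefKindSketch::Implicit)) == 0)
      Result.push_back(R);
  return Result;
}
```

The same mask could instead be sent as a filter in the query request, so that the index never returns the implicit references in the first place.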

In D71598#1787806, @hokein wrote:

(apologies, the FIXME may imply this approach...)

this approach is based on an assumption: the index results match the latest file content. This is not always true in practice: our index may be stale (index results came from an old snapshot of the file), and then this approach will fail.

I think we should do it in another direction:

add a new RefKind (something like implicit references, or named references) to clangd::Ref

when querying the index for rename, we set a corresponding Filter in the query request (or filter out non-interesting references based on the RefKind afterwards)

I think this approach would also fail for stale index, wouldn't it?

I can totally understand why that might be slightly better for performance, but if we have no guarantees that our index is aware of implicit references and would not be able to mark those in the first place, this implementation would shield us from that.

Anyway, I am happy to learn more about why the proposed approach might be better, but I do not fully understand the concern here.

(also, this should eliminate unwanted changes caused by the index being stale, wouldn't it?)

If we go with the solution proposed by @hokein, it looks like using the current patch is an improvement over what we have now.
One big issue with adding a new ref kind/ref modifier is that it requires modifications to the Kythe-based index implementation, something that cannot be done as easily as landing this patch.

WDYT about landing something similar to this patch for now and discussing the possibilities of fixing it by storing enough information in the references later?

kadircet added inline comments. Dec 18 2019, 8:53 AM

clang-tools-extra/clangd/SourceCode.cpp
217 ↗	(On Diff #234265)	this one isn't used anywhere?
1136 ↗	(On Diff #234265)	SM doesn't seem to be necessary, as `lex` already provides that in the callback.
clang-tools-extra/clangd/SourceCode.h
301 ↗	(On Diff #234265)	i don't think it is necessary for this function to be made public, it should be OK for it to live in Rename.cpp as a helper.
clang-tools-extra/clangd/refactor/Rename.cpp
367	this one should go after `adjustRenameRanges`
367	both this and `adjustRenameRanges` seem to be lexing the source code to get identifier locations. can we lex the file only a single time instead and make use of the result in both of the functions? I would suggest moving `collectIdentifierRanges` into here and passing the result as a parameter to both of the functions. as for the implementation of `filterRenameRanges`, you might wanna return the intersection of `RenameRanges` and the result of `collectIdentifierRanges`
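The "lex once, then intersect" suggestion can be sketched as follows. This is an illustration under stated assumptions, not the patch: a hand-rolled identifier scanner stands in for clang's lexer (it does not skip comments or string literals), occurrences are plain byte offsets, and `scanIdentifierOffsets` and `intersectWithLexed` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string_view>
#include <vector>

// True for characters that may appear inside an identifier.
static bool isIdentifierChar(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
}

// Scans the code a single time and returns the start offsets (in ascending
// order) of every whole identifier token spelled exactly `Identifier`.
std::vector<std::size_t> scanIdentifierOffsets(std::string_view Code,
                                               std::string_view Identifier) {
  std::vector<std::size_t> Offsets;
  std::size_t I = 0;
  while (I < Code.size()) {
    if (!isIdentifierChar(Code[I])) {
      ++I;
      continue;
    }
    std::size_t Start = I;
    while (I < Code.size() && isIdentifierChar(Code[I]))
      ++I;
    // Tokens starting with a digit are numbers, not identifiers.
    bool StartsWithDigit =
        std::isdigit(static_cast<unsigned char>(Code[Start])) != 0;
    if (!StartsWithDigit && Code.substr(Start, I - Start) == Identifier)
      Offsets.push_back(Start);
  }
  return Offsets;
}

// Keeps the indexed offsets that also appear among the lexed offsets.
// Both inputs must be sorted in ascending order.
std::vector<std::size_t>
intersectWithLexed(const std::vector<std::size_t> &Indexed,
                   const std::vector<std::size_t> &Lexed) {
  std::vector<std::size_t> Result;
  std::set_intersection(Indexed.begin(), Indexed.end(), Lexed.begin(),
                        Lexed.end(), std::back_inserter(Result));
  return Result;
}
```

The scan result can be computed once per file and handed to both the range-adjustment step and the filtering step, so the file is not lexed twice.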

Sorry for a delay: I was trying to work with range patching heuristics and get it to work in generic case of "stale index returns more results than lexer", but in the end I converged to the simplest possible version of the intended change.

Move helper function back to the anonymous namespace.

kadircet added inline comments. Dec 19 2019, 7:46 AM

clang-tools-extra/clangd/refactor/Rename.cpp
431	duplication
590–599	i believe lexing might yield a superset even if index is up-to-date, e.g. `void foo() { int bar; } void baz() { int ba^r; }`
594	exactly for the same reason above, it might not be a subset even in those circumstances.
596	i believe we should do that even if it is not a subset.
627	`+`
clang-tools-extra/clangd/refactor/Rename.h
80	assertion below says `assert(Indexed.size() <= Lexed.size());`

Addressed a bunch of comments to cleanup the patch and replied to ask for clarification of several unresolved comments.

clang-tools-extra/clangd/refactor/Rename.cpp
590–599	Good point. Any local variable (IIRC we don't store local variables in index, so it's not there) having the same identifier as the renamed symbol might cause the indexed ranges to be a subset of lexed ranges. However, regarding "i believe lexing might yield a superset even if index is up-to-date" and "exactly for the same reason above, it might not be a subset even in those circumstances": I tried to describe the case when the lexer returns a _subset_ of indexed results, which happens in practice. Is the suggestion to change the comments you think are misleading?
596	Maybe my understanding of `getMappedRanges` is incorrect, but I suppose this will happen in case indexed ranges are subset of lexed ranges in some form (i.e. no need to find exact matches explicitly). Am I missing something?
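The subset/superset cases debated in this thread can be made concrete with `std::includes`. The sketch below is an illustration only: `RangeRelation` and `classify` are hypothetical names, occurrences are reduced to sorted, duplicate-free line numbers, and nothing here is taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Possible relations between the indexed and the lexed occurrence sets.
enum class RangeRelation {
  Equal,
  IndexedSubsetOfLexed, // e.g. a local variable shares the name.
  LexedSubsetOfIndexed, // e.g. the index returned implicit references.
  Neither,              // e.g. the index is stale and lines have moved.
};

// Both inputs must be sorted in ascending order and free of duplicates.
RangeRelation classify(const std::vector<unsigned> &Indexed,
                       const std::vector<unsigned> &Lexed) {
  // std::includes(A, B) is true when every element of B also occurs in A.
  bool LexedCoversIndexed = std::includes(Lexed.begin(), Lexed.end(),
                                          Indexed.begin(), Indexed.end());
  bool IndexedCoversLexed = std::includes(Indexed.begin(), Indexed.end(),
                                          Lexed.begin(), Lexed.end());
  if (LexedCoversIndexed && IndexedCoversLexed)
    return RangeRelation::Equal;
  if (LexedCoversIndexed)
    return RangeRelation::IndexedSubsetOfLexed;
  if (IndexedCoversLexed)
    return RangeRelation::LexedSubsetOfIndexed;
  return RangeRelation::Neither;
}
```

Note the argument order: the first range passed to `std::includes` is the candidate superset. Comparing sizes alone, as in `Indexed.size() <= Lexed.size()`, cannot distinguish these four cases.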

kbobyrev planned changes to this revision. Dec 19 2019, 1:07 PM

kbobyrev mentioned this in D72071: [clangd] Add correctness checks for index-based rename. Jan 2 2020, 2:40 AM

kbobyrev abandoned this revision. Apr 20 2020, 6:45 AM

Revision Contents

Path                                                  Size
clang-tools-extra/clangd/refactor/Rename.h            1 line
clang-tools-extra/clangd/refactor/Rename.cpp          36 lines
clang-tools-extra/clangd/unittests/RenameTests.cpp    21 lines

Diff 234727

clang-tools-extra/clangd/refactor/Rename.h

@@ ... @@ adjustRenameRanges(llvm::StringRef DraftCode, llvm::StringRef Identifier,
                    std::vector<Range> Indexed, const LangOptions &LangOpts);

 /// Calculates the lexed occurrences that the given indexed occurrences map to.
 /// Returns None if we don't find a mapping.
 ///
 /// Exposed for testing only.
 ///
 /// REQUIRED: Indexed and Lexed are sorted.
+/// REQUIRED: Indexed.size() <= Lexed.size().
     [kadircet, Done] assertion below says `assert(Indexed.size() <= Lexed.size());`
 llvm::Optional<std::vector<Range>> getMappedRanges(ArrayRef<Range> Indexed,
                                                    ArrayRef<Range> Lexed);
 /// Evaluates how good the mapped result is. 0 indicates a perfect match.
 ///
 /// Exposed for testing only.
 ///
 /// REQUIRED: Indexed and Lexed are sorted, Indexed and MappedIndex have the
 /// same size.
 size_t renameRangeAdjustmentCost(ArrayRef<Range> Indexed, ArrayRef<Range> Lexed,
                                  ArrayRef<size_t> MappedIndex);

 } // namespace clangd
 } // namespace clang

 #endif // LLVM_CLANG_TOOLS_EXTRA_CLANGD_REFACTOR_RENAME_H

clang-tools-extra/clangd/refactor/Rename.cpp

@@ ... @@ if (!AffectedFileCode) {
       elog("Fail to read file content: {0}", AffectedFileCode.takeError());
       continue;
     }
     auto RenameRanges =
         adjustRenameRanges(*AffectedFileCode, RenameDecl.getNameAsString(),
                            std::move(FileAndOccurrences.second),
                            RenameDecl.getASTContext().getLangOpts());
     if (!RenameRanges) {
       // Our heuristics fails to adjust rename ranges to the current state of
         [kadircet, Done] this one should go after `adjustRenameRanges`
         [kadircet, Done] both this and `adjustRenameRanges` seems to be lexing the source code to get identifier locations. can we lex the file only a single time instead and make use of the result in both of the functions? I would suggest moving `collectIdentifierRanges` into here and passing the result as a parameter to both of the functions. as for implementation of `filterRenameRanges` you might wanna return intersection of `RenameRanges` and result of `collectIdentifierRanges`
       // the file, it is most likely the index is stale, so we give up the
       // entire rename.
       return llvm::make_error<llvm::StringError>(
           llvm::formatv("Index results don't match the content of file {0} "
                         "(the index may be stale)",
                         FilePath),
           llvm::inconvertibleErrorCode());
     }
@@ ... @@ if (impliesSimpleEdit(IndexedRest.front().start, LexedRest.front().start)) {
     findNearMiss(PartialMatch, IndexedRest.drop_front(), LexedRest.drop_front(),
                  LexedIndex + 1, Fuel, MatchedCB);
     PartialMatch.pop_back();
   }
   findNearMiss(PartialMatch, IndexedRest, LexedRest.drop_front(),
                LexedIndex + 1, Fuel, MatchedCB);
 }

+// Assume that index is stale/has returned invalid results that can be filtered
+// by finding the intersection between both sets.
+// REQUIRES: Indexed and Lexed are sorted.
+llvm::Optional<std::vector<Range>> filterIndexResults(ArrayRef<Range> Indexed,
     [kadircet, Done] duplication
+                                                      ArrayRef<Range> Lexed) {
+  assert(std::is_sorted(Indexed.begin(), Indexed.end()));
+  assert(std::is_sorted(Lexed.begin(), Lexed.end()));
+  assert(Indexed.size() > Lexed.size());
+  std::vector<Range> Result;
+  std::set_intersection(Lexed.begin(), Lexed.end(), Indexed.begin(),
+                        Indexed.end(), std::back_inserter(Result));
+  if (Result.empty())
+    return llvm::None;
+  return Result;
+}
+
 } // namespace

 llvm::Expected<FileEdits> rename(const RenameInputs &RInputs) {
   ParsedAST &AST = RInputs.AST;
   const SourceManager &SM = AST.getSourceManager();
   llvm::StringRef MainFileCode = SM.getBufferData(SM.getMainFileID());
   auto GetFileContent = [&RInputs,
                          &SM](PathRef AbsPath) -> llvm::Expected<std::string> {
@@ ... @@ for (const auto &R : OccurrencesOffsets) {
     if (auto Err = RenameEdit.add(
             tooling::Replacement(AbsFilePath, R.first, ByteLength, NewName)))
       return std::move(Err);
   }
   return Edit(InitialCode, std::move(RenameEdit));
 }

 // Details:
-//  - lex the draft code to get all rename candidates, this yields a superset
-//    of candidates.
+//  - lex the draft code to get all rename candidates, this yields a set of
+//    candidates. It may be a superset of candidates returned from the index
+//    in case index is stale and there are new references to the renamed entity.
+//    It may also be a subset of candidates from the index in case when index
+//    returns some incorrect results (such as implicit references), when some
     [kadircet, Not Done] exactly for the same reason above, it might not be a subset even in those circumstances.
+//    references to the renamed entity have been removed or simply when
+//    local variables (references to which are not stored in index) have the
     [kadircet, Not Done] i believe we should do that even if it is not a subset.
     [kbobyrev (author), Done] Maybe my understanding of `getMappedRanges` is incorrect, but I suppose this will happen in case indexed ranges are subset of lexed ranges in some form (i.e. no need to find exact matches explicitly). Am I missing something?
+//    same identifier name.
+//  - if the lexed ranges are subset of index candidates, try to filter the
+//    results from index by exactly matching existing ranges from lexer.
     [kadircet, Not Done] i believe lexing might yield a superset even if index is up-to-date, e.g. `void foo() { int bar; } void baz() { int ba^r; }`
     [kbobyrev (author), Done] Good point. Any local variable (IIRC we don't store local variables in index, so it's not there) having the same identifier as the renamed symbol might cause the indexed ranges to be a subset of lexed ranges. However, regarding "i believe lexing might yield a superset even if index is up-to-date" and "exactly for the same reason above, it might not be a subset even in those circumstances": I tried to describe the case when the lexer returns a _subset_ of indexed results, which happens in practice. Is the suggestion to change the comments you think are misleading?
 //  - apply range patching heuristics to generate "authoritative" occurrences,
 //    cases we consider:
 //      (a) index returns a subset of candidates, we use the indexed results.
 //        - fully equal, we are sure the index is up-to-date
 //        - proper subset, index is correct in most cases? there may be false
 //          positives (e.g. candidates got appended), but rename is still safe
 //      (b) index returns non-candidate results, we attempt to map the indexed
 //          ranges onto candidates in a plausible way (e.g. guess that lines
 //          were inserted). If such a "near miss" is found, the rename is still
 //          possible
 llvm::Optional<std::vector<Range>>
 adjustRenameRanges(llvm::StringRef DraftCode, llvm::StringRef Identifier,
                    std::vector<Range> Indexed, const LangOptions &LangOpts) {
   assert(!Indexed.empty());
   assert(std::is_sorted(Indexed.begin(), Indexed.end()));
   std::vector<Range> Lexed =
       collectIdentifierRanges(Identifier, DraftCode, LangOpts);
   llvm::sort(Lexed);
-  return getMappedRanges(Indexed, Lexed);
+  return Indexed.size() <= Lexed.size() ? getMappedRanges(Indexed, Lexed)
+                                        : filterIndexResults(Indexed, Lexed);
 }

 llvm::Optional<std::vector<Range>> getMappedRanges(ArrayRef<Range> Indexed,
                                                    ArrayRef<Range> Lexed) {
   assert(!Indexed.empty());
   assert(std::is_sorted(Indexed.begin(), Indexed.end()));
   assert(std::is_sorted(Lexed.begin(), Lexed.end()));
+  assert(Indexed.size() <= Lexed.size());
     [kadircet, Done] `+`

-  if (Indexed.size() > Lexed.size()) {
-    vlog("The number of lexed occurrences is less than indexed occurrences");
-    return llvm::None;
-  }
   // Fast check for the special subset case.
   if (std::includes(Indexed.begin(), Indexed.end(), Lexed.begin(), Lexed.end()))
     return Indexed.vec();

   std::vector<size_t> Best;
   size_t BestCost = std::numeric_limits<size_t>::max();
   bool HasMultiple = 0;
   std::vector<size_t> ResultStorage;
@@ ... @@

clang-tools-extra/clangd/unittests/RenameTests.cpp

@@ ... @@ llvm::StringRef FooCC;
       )cpp",
           R"cpp(
         #include "foo.h"
         Kind ff() {
           return Kind::[[ABC]];
         }
       )cpp",
       },
+      {
+          // Macros and implicit references.
+          R"cpp(
+        class [[Fo^o]] {};
+        #define FooFoo Foo
+        #define FOO Foo
+      )cpp",
+          R"cpp(
+        #include "foo.h"
+        void bar() {
+          [[Foo]] x;
+          FOO y;
+          FooFoo z;
+        }
+      )cpp",
+      },
   };

   for (const auto& T : Cases) {
     Annotations FooH(T.FooH);
     Annotations FooCC(T.FooCC);
     std::string FooHPath = testPath("foo.h");
     std::string FooCCPath = testPath("foo.cc");

@@ ... @@
 TEST(RangePatchingHeuristic, GetMappedRanges) {
   // ^ in LexedCode marks the ranges we expect to be mapped; no ^ indicates
   // there are no mapped ranges.
   struct {
     llvm::StringRef IndexedCode;
     llvm::StringRef LexedCode;
   } Tests[] = {
     {
-      // no lexed ranges.
-      "[[]]",
-      "",
-    },
-    {
       // both line and column are changed, not a near miss.
       R"([[]])",
       R"(
         [[]]
       )",
     },
     {
       // subset.
@@ ... @@