This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Filter implicit references from index while renaming
AbandonedPublic

Authored by kbobyrev on Dec 17 2019, 4:03 AM.

Details

Summary

When asked for references during cross-file rename, index might return implicit references to the renamed symbol (such as those in macro expansions). To fix the incorrect behavior, this patch introduces basic filtering machinery which ensures that all ranges where renaming is about to be applied actually contains the identifier the user asked to rename.

Diff Detail

Event Timeline

kbobyrev created this revision.Dec 17 2019, 4:03 AM
hokein added a subscriber: hokein.Dec 17 2019, 6:12 AM

(apologies, the FIXME may imply this approach...)

this approach is based on an assumption: the index results are matched to the latest file content, but this is not always true in practice, our index maybe stale (index results came from an old snapshot of the file), then this approach will fail.

I think we should do it in another direction:

  • add a new RefKind (something like implicit references, or named references) to clangd::Ref
  • when querying the index for rename, we set a corresponding Filter in the query request (or filter out non-interesting references based on the RefKind afterwards)

(apologies, the FIXME may imply this approach...)

this approach is based on an assumption: the index results are matched to the latest file content, but this is not always true in practice, our index maybe stale (index results came from an old snapshot of the file), then this approach will fail.

I think we should do it in another direction:

  • add a new RefKind (something like implicit references, or named references) to clangd::Ref
  • when querying the index for rename, we set a corresponding Filter in the query request (or filter out non-interesting references based on the RefKind aftjerwards)

I think this approach would also fail for stale index, wouldn't it?

I can totally understand why that might be slightly better for performance, but if we have no guarantees that our index is aware of implicit references and would not be able to mark those in the first place, this implementation would shield us from that.

Anyway, I am happy to learn more about why the proposed approach might be better, but I do not fully understand the concern here.

(also, this should eliminate unwanted changes caused by the index being stale, wouldn't it?)

If we go with the solution proposed by @hokein, it looks like using the current patch is an improvement of what we have now.
One big issue with the adding a new ref kind/ref modifier is that it requires modifications to Kythe-based index implementation, something that cannot be done as easily as landing this patch.

WDYT about landing something similar to this patch for now and discussing the possibilities of fixing it by storing enough information in the references later?

kadircet added inline comments.Dec 18 2019, 8:53 AM
clang-tools-extra/clangd/SourceCode.cpp
217 ↗(On Diff #234265)

this one isn't used anywhere?

1136 ↗(On Diff #234265)

SM doesn't seem to be necessary, as lex already provides that in the callback.

clang-tools-extra/clangd/SourceCode.h
301 ↗(On Diff #234265)

i don't think it is necessary for this function to be made public, it should be OK for it to leave in rename.cpp as a helper.

clang-tools-extra/clangd/refactor/Rename.cpp
367

this one should go after adjustRenameRanges

367

both this and adjustRenameRanges seems to be lexing the source code to get identifier locations.

can we lex the file only a single time instead and make use of the result in both of the functions?
I would suggest moving collectIdentifierRanges into here and passing the result as a parameter to both of the functions.

as for implementation of filterRenameRanges you might wanna return intersection of RenameRanges and result of collectIdentifierRanges

kbobyrev updated this revision to Diff 234719.Dec 19 2019, 7:15 AM
kbobyrev marked 7 inline comments as done.

Sorry for a delay: I was trying to work with range patching heuristics and get it to work in generic case of "stale index returns more results than lexer", but in the end I converged to the simplest possible version of the intended change.

kbobyrev updated this revision to Diff 234720.Dec 19 2019, 7:16 AM

Move helper function back to the anonymous namespace.

kadircet added inline comments.Dec 19 2019, 7:46 AM
clang-tools-extra/clangd/refactor/Rename.cpp
431

duplication

590–599

i believe lexing might yield a superset even if index is up-to-date,

void foo() { int bar; }
void baz() { int ba^r; }
594

exactly for the same reason above, it might not be a subset even in those circumstances.

596

i believe we should do that even if it is not a subset.

627

+

clang-tools-extra/clangd/refactor/Rename.h
80

assertion below says
assert(Indexed.size() <= Lexed.size());

kbobyrev updated this revision to Diff 234727.Dec 19 2019, 8:16 AM
kbobyrev marked 5 inline comments as done.

Addressed a bunch of comments to cleanup the patch and replied to ask for clarification of several unresolved comments.

clang-tools-extra/clangd/refactor/Rename.cpp
590–599

Good point. Any local variable (IIRC we don't store local variables in index, so it's not there) having the same identifier as the renamed symbol might cause the indexed ranges to be a subset of lexed ranges.

However,

i believe lexing might yield a superset even if index is up-to-date
exactly for the same reason above, it might not be a subset even in those circumstances.

I tried to describe the case when lexer will return a _subset_ of indexed results, which happens in practice. Is the suggestion to change the comments you think are misleading?

596

Maybe my understanding of getMappedRanges is incorrect, but I suppose this will happen in case indexed ranges are subset of lexed ranges in some form (i.e. no need to find exact matches explicitly). Am I missing something?

kbobyrev planned changes to this revision.Dec 19 2019, 1:07 PM
kbobyrev abandoned this revision.Apr 20 2020, 6:45 AM