This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Implement range patching heuristics for cross-file rename.
ClosedPublic

Authored by hokein on Nov 22 2019, 3:56 AM.

Details

Summary

The heuristic is simple, only for cases we are confident.

Diff Detail

Event Timeline

hokein created this revision.Nov 22 2019, 3:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 22 2019, 3:56 AM

Build result: FAILURE - Could not check out parent git hash "6171238cb7e9e8616d21ce09d463e945ce0a9fb8". It was not found in the repository. Did you configure the "Parent Revision" in Phabricator properly? Trying to apply the patch to the master branch instead...

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt

High level comments based on offline discussion:

I think we want to define/formalize the concept of a near miss, to make precise the tradeoffs between false positives, false negatives, and implementability.
Not just at the individual level (an index occurrence vs a lexed occurrence) but at a whole-document level.
A possible definition, intuitively finding the locations where the indexed occurrences are now spelled:

  • a near miss maps all of the name occurrences in the index onto a subset of the lexed occurrences. (names may refer to more than one thing)
  • indexed occurrences must all be mapped. Result must be distinct, and preserve order. (Support only simple edits to ensure our mapping is robust)
  • each indexed->lexed correspondence may change row or column but not both (increases chance our mapping is robust)

Then we can use a greedy algorithm to find a match (or memoized DFS to enumerate/compare all matches)

A good metric for comparing competing mappings might be the sum of the implied edit sizes between successive entries.
i.e. if the first three mappings are offset by 1 column down, and the last is offset by 2 columns down, then this implies an insertion of 1 line at the top of the file and 1 line between edits 3 and 4, for a total of 2.

subset is fine/good to handle as a special case - if the subset is small we don't want to find it by search.

clang-tools-extra/clangd/refactor/Rename.cpp
388

I think this is just
return std::includes(Superset.begin, Superset.end(), Subset.begin(), Subset.end());
(probably clear enough to just inline to the callsite)

567

That doesn't mean you need std::move to get a move, it just means you can't avoid calling the move constructor.
https://godbolt.org/z/Rh-CvT

sammccall added inline comments.Dec 2 2019, 7:14 AM
clang-tools-extra/clangd/refactor/Rename.h
96

Oops, forgot this...
I think the public API isn't quite right here - exposing parts for testing is fine, but the classification itself isn't fine grained enough I think. (Too easy to write a test that "passes" but the actual mapping found isn't the right one).

And the class structure wrapping a LangOpts ref seems like a detail that can be hidden.

I'd like to see:

  • a function that returns the lexed ranges from a StringRef/LangOpts
  • a function that constructs the mapping given two sequences of ranges (like getMappedRanges(ArrayRef<Range>, ArrayRef<Range>) -> vector<Range>
  • a function that ties these together to the data structures we care about (i.e. taking Code + identifier + LangOpts + ArrayRef<Ref> or so)

then you can unit test the first and second and smoke test the third.

Tests like

Indexed = "int [[x]] = 0; void foo(int x);";
Draft = "double [[x]] = 0; void foo(double x);";
verifyRenameMatches(Indexed, Draft);
hokein updated this revision to Diff 232101.Dec 4 2019, 5:52 AM
hokein marked 3 inline comments as done.

address comments

  • re-define the concept of a near miss
  • add metric for evaluate how good a near miss is
hokein added inline comments.Dec 4 2019, 5:53 AM
clang-tools-extra/clangd/refactor/Rename.h
96

a function that returns the lexed ranges from a StringRef/LangOpts

There is an existing function collectIdentifierRanges in SourceCode.cpp, and it has been unittested.

a function that constructs the mapping given two sequences of ranges (like getMappedRanges(ArrayRef<Range>, ArrayRef<Range>) -> vector<Range>
a function that ties these together to the data structures we care about (i.e. taking Code + identifier + LangOpts + ArrayRef<Ref> or so)

sure, I think it is sufficient to test the second one, since the second one is a simple wrapper of the getMappedRanges.

sammccall added inline comments.Dec 4 2019, 6:02 AM
clang-tools-extra/clangd/refactor/Rename.h
96

sure, I think it is sufficient to test the second one, since the second one is a simple wrapper of the getMappedRanges.

Did you mean "sufficient to test the first one"?
Testing the second one is certainly sufficient, but tests more than it needs to (particularly the lexing bits again).

Build result: pass - 60453 tests passed, 0 failed and 726 were skipped.

Log files: console-log.txt, CMakeCache.txt

sammccall added inline comments.Dec 5 2019, 10:45 AM
clang-tools-extra/clangd/refactor/Rename.cpp
384

the name here isn't much simpler/clearer than the code.
impliesSimpleEdit?

442

it looks like itercount at the moment is counting depth and ignoring breadth.

I think you want to pass by reference and increment on each call, instead.

442

FWIW, this seems like a class that could be a function:
params are:

vector<int> &PartialMatch
ArrayRef<Range> IndexedRest // just those not covered by PartialMatch
ArrayRef<Range> MatchedRest // those still to consider
int &Fuel // set to 10000, decrement, bail out once it goes negative
Callback // as now, though no need for the parameter
442

you can bail out as soon as there are not enough lexed tokens remaining to match

452

if we're actually evaluating all ranges, can we pass the index array (by reference), use it to evaluate scores, and only copy ranges for the winner?

456

This works but maybe more common/obvious is to use recursion for both cases instead of the loop:

if (isLineOrColumnEqual(..., Lexed[NextLexed])){
  // match
  ...
  enumerate(NextMatched+1, NextLexed+1);
}
// don't match
enumerate(NextMatched, NextLexed+1);
457

Visited seems redundant: it always contains [0, NextLexed)

574

Not clear to me why we're building these structures.
The cost is a sum of implied edits, implied edits are a function of (last displacement, current displacement, are they on the same line)

so what about something like:

Cost = 0;
LastLine = -1;
LastDX = 0, LastDY = 0;
for (I) {
  DX = Mapped[I].begin.character - Indexed[I].begin.character;
  DY = Mapped[I].begin.line - Indexed[I].begin.line;
  Line = Mapped[I].begin.line;
  if (!(Line == LastLine && DX == LastDX))
    LastDX = 0; // horizontal offsets don't carry across lines
  Cost += abs(DX - LastDX) + abs(DY - LastDY);
  LastDX, LastDY, LastLine = DX, DY, Line;
}
clang-tools-extra/clangd/refactor/Rename.h
59

I'm not sure if authoritative is accurate here - authoritative would be from an AST or fresh index.

I'd suggest "Adjusts indexed occurrences to match the current state of the file"

61

nit: The index is not always up to date.

62

"bad rename result" is vague.
Maybe "Blindly editing at the locations reported by the index may mangle the code in such cases".

62

"This API helps" doesn't add much here - maybe merge with the next sentence?
"This function determines whether the indexed occurrences can be applied to this file, and heuristically..."

66

The details comment belongs in the implementation rather than the header, I think.

75

"we use the best near miss" - it would help to describe approximately what this means. It's very hard to understand that this can fail because no candidate is "near".

e.g. "we attempt to map the indexed locations onto candidates in a plausible way (e.g. guess that lines were inserted). If such a "near miss" is found, the rename is still possible"

77

nit: occcurrences -> occurences
nit: break before (

79

I think this use of "patch" is a little confusing, and the name is a bit vague for the clangd namespace.
adjustRenameRanges?

84

"the mapping result" doesn't make sense in this context (the header), as you haven't exposed/described the function that computes mappings or otherwise defined the concept.

(I think exposing the function would be best both for clarity and for unit-tests)

86

I'd suggest moving most of this comment (at least the examples) to the cpp file.

103

editCost is again not a specific enough name for this operation.

The cost we're describing is specific to *adjusting* the indexed ranges to correspond to the mapped ones. Maybe renameRangeAdjustmentCost()?

hokein updated this revision to Diff 232743.Dec 8 2019, 12:52 PM
hokein marked 21 inline comments as done.

address reveiw comments.

Build result: pass - 60605 tests passed, 0 failed and 726 were skipped.

Log files: console-log.txt, CMakeCache.txt

hokein updated this revision to Diff 232744.Dec 8 2019, 1:09 PM

some minor fixes.

clang-tools-extra/clangd/refactor/Rename.cpp
452

we could use the index array to evaluate the scores, but it would make the cost API signature a bit weird, like size_t renameRangeAdjustmentCost(ArrayRef<Range> Indexed, ArrayRef<Range> Lexed, ArrayRef<size_t> MatchedLexedIndex);

clang-tools-extra/clangd/refactor/Rename.h
96

I got the point here, exposed getMappedRanges and added unit tests for it.

Build result: pass - 60605 tests passed, 0 failed and 726 were skipped.

Log files: console-log.txt, CMakeCache.txt

sammccall accepted this revision.Dec 9 2019, 4:53 AM
sammccall added inline comments.
clang-tools-extra/clangd/refactor/Rename.cpp
359–371

nit: these are not candidates, they are RenameRanges or similar

364

This is worth a comment (because the error message returned describes a condition that we usually recover from)

452

It's not really an API right, just a helper function exposed for testing? I don't think this is a a problem.

569

why returning Expected rather than Optional here - what do we want to do with the message?

589

This message isn't meaningful outside this TU - it should be a vlog, or easier to understand

593

return Indexed.vec()

618

distinct -> unique

622

found -> find

622

(again, these error messages are not useful to the user, as-is)

clang-tools-extra/clangd/refactor/Rename.h
82

also exposed for testing only?

clang-tools-extra/clangd/unittests/RenameTests.cpp
866

can you make the new line non-empty (add a comment) and change int->double instead of adding whitespace? Was hard to see what's going on here

This revision is now accepted and ready to land.Dec 9 2019, 4:53 AM
hokein updated this revision to Diff 232844.Dec 9 2019, 6:59 AM
hokein marked 12 inline comments as done.

address comments:

  • don't emit the internal messages to users, llvm::Expected => llvm::Optional
  • use the index of lexed array to calculate the adjustment cost.
hokein added inline comments.Dec 9 2019, 7:01 AM
clang-tools-extra/clangd/refactor/Rename.cpp
569

I found these error messages are useful for debugging. But you are right, end users are not interested in them, changed to vlog and Optional here.

Build result: pass - 60627 tests passed, 0 failed and 726 were skipped.

Log files: console-log.txt, CMakeCache.txt

This revision was automatically updated to reflect the committed changes.