Page MenuHomePhabricator

Add addOrMerge interface to tooling::Replacements.
AbandonedPublic

Authored by ioeric on Sep 9 2016, 3:34 AM.

Details

Reviewers
klimek
djasper

Diff Detail

Event Timeline

ioeric updated this revision to Diff 70810.Sep 9 2016, 3:34 AM
ioeric retitled this revision from to Add addOrMerge interface to tooling::Replacements..
ioeric updated this object.
ioeric added reviewers: klimek, djasper.
ioeric added subscribers: alexshap, hokein, omtcyfz, cfe-commits.
djasper edited edge metadata.Sep 10 2016, 1:24 AM

This seems like a pretty weird function to have in the interface. Could you explain what use case(s) you are envisioning?

disclaimer: i don't know if this method is the right thing to add (to be honest i would prefer a better interface but don't have any good suggestions on my mind at the moment), i see several issues (IMO, i might be mistaken, apologize in advance) with the current interface of Replacements. I will not list all of them, but want to add some context:

A. right now i see the same code (with minor changes) in several places:

llvm/tools/clang/tools/extra/include-fixer/IncludeFixer.cpp +382

auto R = tooling::Replacement(
      {FilePath, Info.Range.getOffset(), Info.Range.getLength(),
      Context.getHeaderInfos().front().QualifiedName});
 auto Err = Replaces.add(R);
 if (Err) {
   llvm::consumeError(std::move(Err));
   R = tooling::Replacement(
       R.getFilePath(), Replaces.getShiftedCodePosition(R.getOffset()),
       R.getLength(), R.getReplacementText());
   Replaces = Replaces.merge(tooling::Replacements(R));
 }

llvm/tools/clang/tools/extra/change-namespace/ChangeNamespace.cpp +126 (see https://reviews.llvm.org/D24183)

 
void addOrMergeReplacement(const tooling::Replacement &R,
                         tooling::Replacements *Replaces) {
 auto Err = Replaces->add(R);
 if (Err) {
    llvm::consumeError(std::move(Err));
    auto Replace = getReplacementInChangedCode(*Replaces, R);
    *Replaces = Replaces->merge(tooling::Replacements(Replace));
 }
}

B. For replacements in headers we can get into if(Err) branch quite often because the same replacement can be generated multiple times (if that header is included into several *.cpp files).

@djasper Thanks for the insight into the problem.

For the cases you listed, I have added some test cases. Case 1-3 can be handled as expected. And you are right, the behavior for case 4 is interesting.

But I still agree that addOrMerge is confusing. As you said, the intricate behavior here is adding benign conflicting replacement in case 1-3. This is actually like add with assumption that benign conflicts will be resolved by combining/merging with the adjacent replacements. Maybe we can add an optional parameter to add interface (e.g. mergeAdjacent) to resolve conflicts for case 1-3 and still return error for case 4. And this might also answer @alexshap's question 2 and 3.

Copying discussion from email thread:

@ioeric:

    1. regarding B: i apologize in advance, i am not sure if it's directly related to your patch, but i'm kinda afraid that the current approach will shadow the issue (if it's considered to be an issue). For simplicity let's take a look at the following example:

      Point.h: struct Point {};

      a.cpp: include "Point.h"

      b.cpp: include "Point.h"

      clang-rename rename-all -i -old-name Point -new-name Point2 a.cpp b.cpp Renaming failed in /Users/Alexshap/PlayRename/srcs/./include/Point.h! New replacement: /Users/Alexshap/PlayRename/srcs/./include/Point.h: 7:+5:"Point2" conflicts with existing replacement: /Users/Alexshap/PlayRename/srcs/./include/Point.h: 7:+5:"Point2"

      the thing is that in the past Replacements was a typedef on std::set<...> (if i am not mistaken) and insert() ignored duplicates. Now Replacements.add(...) will return an error (the same replacement has already been added).
  1. (Separate observation, I'd like to take advantage of this discussion and ask about it)

    Replacements Replacements::merge(const Replacements &ReplacesToMerge)

    The question is the following: with the current implementation we copy all the replacements even if the merge is "trivial" or if there are only a few conflicts. This looks suboptimal, please, correct me if I'm wrong.
  2. Having said that - just curious - is the fallback on merge (in addOrMergeReplacement ) really necessary in those examples ? (I mean in change-namespace and include-fixer)

Thanks,
Alex

On Sat, Sep 10, 2016 at 9:43 PM, Daniel Jasper <djasper@google.com> wrote:
There are several things I find particularly confusing about this. Fundamentally, Replacements refer to a specific version of the code. Say you have a Replacements object R and a Replacement r that you want to integrate into it. Further assume, that C0 is the initial version of the code whereas C1 is the version after applying R. If you *add* r to R, that means that r (specifically the offset and length in it) refer to C0. If you *merge* it, that means that r refers to C1. Correct replacements can always be *merged*, while *adding* might cause conflicts if they overlap with existing ones. Thus, a function "add or merge" fundamentally doesn't make sense to me. These are fundamentally different concepts.

What this function (and the different implementations we already have) is fundamentally trying to do is to have a version of *add* that resolves (some) conflicts. It does that by assuming that r is currently referring to C0, so it first shifts it and then calls *merge*. The fact that you can shortcut if there isn't actually a conflict (i.e. the "addOr" part) is just an optimization. I am fine with implementing that optimization, but it shouldn't be part of the name.

Now, I don't (yet) have a good name as this has very intricate behavior. And I am not sure it is really a good idea to have this magically "resolve" all conflicts, because I think there are different classes. I think what we want here is to have a way to combine replacements that require a defined order, but that don't actually conflict. Lets look at a few cases. Assume R contains a single Replacement A and you trying to "addOrMerge" another Replacement B.

  1. A and B are both inserts at the same code location. This seems relatively benign and we just want a defined order
  2. A inserts at offset X and B replaces (X, X+N). Still quite benign, but we don't want B to replace anything that A inserted, it should replace the original text
  3. B inserts at offset X and A replaces (X, X+N). Same thing, though interesting as we now actually have to switch the order
  4. A and B actually replace overlapping code ranges. This is really bad and I think we should continue to report a conflict instead of doing something weird

    I'd think that your existing implementation gets #1 right but possibly only one of #2/#3. It will also do something very interesting for #4 and I really think what we want is to report an error.

    On Sat, Sep 10, 2016 at 5:53 PM, Eric Liu <ioeric@google.com> wrote: This function has been implemented in several places, so I figure it might be useful to have this function in the library. Although I'm not sure whether this should be a class interface or just a standalone helper function.

    @alexshap: thanks for the explanation. For your point B, I'm not sure about the example, could you elaborate a bit? E.g. which use case to be specific? What are the headers?

    Generally, the way users of Replacements class adding new replacements also matters. For example, if you have multiple insertions at the same offset, you could concatenate them and insert as one single replacement, if you care about performance.

    For most use cases I've seen when converting from old std::set implementation to the current implementation, performance can be improved by changing the way replacements are added. And the most natural way is often the most efficient one.
ioeric updated this revision to Diff 71006.Sep 12 2016, 7:37 AM
ioeric edited edge metadata.
  • Also cleanup comments around redundant colons/commas in format::cleanup.

Ok.. Thought some more and discussed with Manuel.

I think we should do a partial solution for now. I still think addOrMerge is harmful and it is always never the right thing to use. The codepaths that currently implement some version of it, should likely reconsider.

What I think we should implement is that some combinations of inserts and replacements are accepted by "add". Specifically, imagine you have an original text "aa" and a replacement that replaces it to "bb". Now, imagine you have an additional insert of "c" at offsets 0, 1 or 2:

  • Offset 0: Not ambiguous, result should be "cbb".
  • Offset 1: Conflict, report error.
  • Offset 2: Not ambiguous, result should be "bbc"

So basically, inserts adjacent to replacements (including deletions) have a well defined order and should be supported.

Two replacements still need a defined order and we should probably error for now. I have some ideas on how to solve this. The most intuitive one might be to have a pair functions "StringRef getReplacement(unsigned offset)"/"void setReplacement(unsigned offset, StringRef text)", that control the replacement at a given offset. A replacement object shouldn't have more than one insert at any given offset. But again, I think that's a problem we can get to later. I think most tools that currently would insert stuff at the same location are actually wrong and we should continue to error.

Similarly, we never want to merge overlapping replacements, but we should ensure that directly adjacent ones work, e.g. "aa" can be convert into "bc" with two replacements (one for the first character, one for the second) that are combined with add().

What do you think?

Ok.. Thought some more and discussed with Manuel.

I think we should do a partial solution for now. I still think addOrMerge is harmful and it is always never the right thing to use. The codepaths that currently implement some version of it, should likely reconsider.

What I think we should implement is that some combinations of inserts and replacements are accepted by "add". Specifically, imagine you have an original text "aa" and a replacement that replaces it to "bb". Now, imagine you have an additional insert of "c" at offsets 0, 1 or 2:

  • Offset 0: Not ambiguous, result should be "cbb".
  • Offset 1: Conflict, report error.
  • Offset 2: Not ambiguous, result should be "bbc"

    So basically, inserts adjacent to replacements (including deletions) have a well defined order and should be supported.

    Two replacements still need a defined order and we should probably error for now. I have some ideas on how to solve this. The most intuitive one might be to have a pair functions "StringRef getReplacement(unsigned offset)"/"void setReplacement(unsigned offset, StringRef text)", that control the replacement at a given offset. A replacement object shouldn't have more than one insert at any given offset. But again, I think that's a problem we can get to later. I think most tools that currently would insert stuff at the same location are actually wrong and we should continue to error.

If I understand correctly, add should support:

  • adding non-conflict replacement as before.
  • adding insertion that is adjacent to but not contained in an existing replacement with length>0 (deletion, actual replacement), and vice versa. The replacement should always change the original code instead of inserted text.

But why don't we want to support inserting twice at the same offset? It seems reasonable to me to assume that insertions can be ordered by the time they are added. And there are also reasonable use cases to insert at the same offset. For example, a fix or tool that generates parentheses for expressions can easily generate two '(' at the same offset, e.g. A && B || C && D -> ((A && B) || (C && D)). Of course, this makes it impossible to order the later insertion before existing insertions with add, but I don't think we want to support that since it doesn't seem to be a reasonable way to add replacements. I've seen tools that relied on the previous set implementation to achieve this (i.e. insert before existing insertions), but the code is very weird and can be easily refactored to work without this feature. But if we must support it, we might want to an insertBefore option (default=false) to add interface that makes add new insertion before the existing/conflicting insertion.

Similarly, we never want to merge overlapping replacements, but we should ensure that directly adjacent ones work, e.g. "aa" can be convert into "bc" with two replacements (one for the first character, one for the second) that are combined with add().

I think this is already true with the current implementation of add.

The problem is that it is implicit behavior, that's not easy to understand. What's worse is that the behavior of add and merge would fundamentally be reverse. If you add two inserts at the same location, the first one would come first. If you merge them, the second one would come first.

And you parentheses example is actually a good one. The tool you would write for that would insert the parentheses pair-wise, e.g. in an ASTMatchFinder callback. Without extra effort, the opening parentheses and the closing parentheses would be inserted in the same order, which is wrong. Doesn't matter as they are identical, but assume you'd want to add () and [].. You'd likely end up with "([...)]".

We should definitely provide ways to support such use cases, but I think the default behavior should for now be to report an error.

The problem is that it is implicit behavior, that's not easy to understand. What's worse is that the behavior of add and merge would fundamentally be reverse. If you add two inserts at the same location, the first one would come first. If you merge them, the second one would come first.

And you parentheses example is actually a good one. The tool you would write for that would insert the parentheses pair-wise, e.g. in an ASTMatchFinder callback. Without extra effort, the opening parentheses and the closing parentheses would be inserted in the same order, which is wrong. Doesn't matter as they are identical, but assume you'd want to add () and [].. You'd likely end up with "([...)]".

We should definitely provide ways to support such use cases, but I think the default behavior should for now be to report an error.

Ahh, make sense. I'll start implementing this then.

ioeric abandoned this revision.Sep 14 2016, 1:11 AM

Abandon in favor of D24515