This is an archive of the discontinued LLVM Phabricator instance.

Merge deletions that are contained in a larger deletion.
AbandonedPublic

Authored by ioeric on Sep 19 2016, 2:39 AM.

Details

Reviewers
djasper
Summary

If a new deletion replacement is contained in an existing deletion or contains one or more existing replacements, then all deletions are merged into the largest deletion.

Diff Detail

Event Timeline

ioeric updated this revision to Diff 71784.Sep 19 2016, 2:39 AM
ioeric retitled this revision from to Merge deletions that are contained in a larger deletion..
ioeric updated this object.
ioeric updated this object.Sep 19 2016, 2:41 AM
ioeric added a reviewer: djasper.
ioeric added a subscriber: cfe-commits.
djasper edited edge metadata.Sep 19 2016, 3:10 AM

Thinking about this some more, starting to merge deletions now, but only some of them is a bit suspect. I think we either want to allow even more or continue to be restrictive for now.

I think fundamentally, there are two questions that we need to answer:

  1. Is this something that the user/tool author would likely want to do?
  2. Is add the replacement order-dependent in any way?

I have no clue about #1, I'd have to see use cases. E.g. what use case are you trying to solve here?

But lets look at #2: I think I have come up with an easy definition of what makes something order-dependent. Lets assume we have two replacements A and B that both refer to the same original code (I am using A and B as single replacements or as sets of a single replacement for simplicity). The question is whether A.add(B) is order-dependent. I think we should define this as (assuming we have a function that shifts a replacement by another replacement like your getReplacementInChangedCode from https://reviews.llvm.org/D24383):

A.add(B) is order-dependent (and thus should conflict, if A.merge(getReplacementInChangedCode(B)) != B.merge(getReplacementInChangedCode(A)).

I think, this enables exactly the kinds of additions that we have so far enabled, which seems good. It also enables overlapping deletions, e.g. deleting range [0-2] and [1-3] will result in deleting [0-3], not matter in which order.

Thinking about this some more, starting to merge deletions now, but only some of them is a bit suspect. I think we either want to allow even more or continue to be restrictive for now.

I think fundamentally, there are two questions that we need to answer:

  1. Is this something that the user/tool author would likely want to do?
  2. Is add the replacement order-dependent in any way?

I have no clue about #1, I'd have to see use cases. E.g. what use case are you trying to solve here?

Cong has this problem with dead code deletion where one dead code block is contained in another dead code block. Removing both dead entities will cause conflict now, so I figure maybe this is something we can also support because they are also order-independent and safe to deduplicate.

But lets look at #2: I think I have come up with an easy definition of what makes something order-dependent. Lets assume we have two replacements A and B that both refer to the same original code (I am using A and B as single replacements or as sets of a single replacement for simplicity). The question is whether A.add(B) is order-dependent. I think we should define this as (assuming we have a function that shifts a replacement by another replacement like your getReplacementInChangedCode from https://reviews.llvm.org/D24383):

A.add(B) is order-dependent (and thus should conflict, if A.merge(getReplacementInChangedCode(B)) != B.merge(getReplacementInChangedCode(A)).

I think, this enables exactly the kinds of additions that we have so far enabled, which seems good. It also enables overlapping deletions, e.g. deleting range [0-2] and [1-3] will result in deleting [0-3], not matter in which order.

This seems to be a nice definition for order-dependent. Just one caveat: with this condition, A=(0,0,"a") and B=(0,0,"a") are now also order-independent. Although the result for applying A and B in either order would be the same, I feel this is somehow less safe than merging deletions. And I guess the question here is whether users want to deduplicate. But for deletions, duplication doesn't matter.

I actually think this is a good example. So lets assume we'd write a tool to fully quote binary expressions, e.g. that turns

if (a * b + c * d == 10) ...

into

if (((a * b) + (c * d)) == 10) ...

So, here, we would be inserting two "(" and two ")" at the same locations. And, as you correctly mention, the order doesn't matter because we are inserting the same string twice. I think this is actually good behavior.

Deduplication is an interesting concern, but I think we probably want to handle that at a different layer. E.g. in the use case above, deduplicating would be quite fatal :).

I actually think this is a good example. So lets assume we'd write a tool to fully quote binary expressions, e.g. that turns

if (a * b + c * d == 10) ...

into

if (((a * b) + (c * d)) == 10) ...

So, here, we would be inserting two "(" and two ")" at the same locations. And, as you correctly mention, the order doesn't matter because we are inserting the same string twice. I think this is actually good behavior.

I agree that this is good behavior.

Deduplication is an interesting concern, but I think we probably want to handle that at a different layer. E.g. in the use case above, deduplicating would be quite fatal :).

Okay, it does make more sense to handle deduplication in a different layer.

So, with this assumption, the implementation should be much easier now: when there is conflict found in add, check this condition. If A and B are order-dependent as defined above, we then merge(getReplacementInChangedCode(B)) into the set.

ioeric abandoned this revision.Sep 27 2016, 7:45 AM

Abandon in favor of D24800