This is an archive of the discontinued LLVM Phabricator instance.

[clang-apply-replacements] No longer deduplucates replacements from the same TU
ClosedPublic

Authored by njames93 on Mar 12 2020, 3:14 AM.

Details

Summary

clang-apply-replacements currently deduplicates all diagnostic replacements. However if you get a duplicated replacement from one TU then its expected that it should not be deduplicated. This goes some way to solving export-fixes to yaml adds extra newlines and breaks offsets.

Take this example yaml.

---
MainSourceFile:  '/home/nathan/test/test.cpp'
Diagnostics:
  - DiagnosticName:  readability-braces-around-statements
    DiagnosticMessage:
      Message:         statement should be inside braces
      FilePath:        '/home/nathan/test/test.cpp'
      FileOffset:      14
      Replacements:
        - FilePath:        '/home/nathan/test/test.cpp'
          Offset:          14
          Length:          0
          ReplacementText: ' {'
        - FilePath:        '/home/nathan/test/test.cpp'
          Offset:          28
          Length:          0
          ReplacementText: '

}'
  - DiagnosticName:  readability-braces-around-statements
    DiagnosticMessage:
      Message:         statement should be inside braces
      FilePath:        '/home/nathan/test/test.cpp'
      FileOffset:      20
      Replacements:
        - FilePath:        '/home/nathan/test/test.cpp'
          Offset:          20
          Length:          0
          ReplacementText: ' {'
        - FilePath:        '/home/nathan/test/test.cpp'
          Offset:          28
          Length:          0
          ReplacementText: '

}'
...

The current behaviour is to deduplicate the text insertions at Offset 28 and only apply one of the replacements.
However as both of these replacements came from the same translation unit we can be confident they were both meant to be applied together
The new behaviour won't deduplicate the text insertion and instead insert both of the replacements.
If the duplicate replacement is found inside different translation units (from a header file change perhaps) then they will still be deduplicated as before.

Diff Detail

Event Timeline

njames93 created this revision.Mar 12 2020, 3:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2020, 3:14 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Test cases will follow, just time constrained for now.

njames93 updated this revision to Diff 250054.Mar 12 2020, 2:22 PM
  • Added test cases

Unfortunately I cannot say whether the code is good, but the fix works and certainly helps readability-braces-around-statements which can sometimes add multiple } at the same offset to close several scopes.

Can you please expand on the description? Specifically, can you clarify what you've *changed* in this patch? I gather that previously, it distinguished between two different sources of replacements: diagnostics and otherwise, and only deduplicated the ones from diagnostics. What is the new behavior?

Also, perhaps give a simple example in the description?

njames93 edited the summary of this revision. (Show Details)Mar 18 2020, 2:54 PM

Thanks for expanding the description, including the helpful example. I'm not sure, though, that this is the "right" behavior, at least not always. Worse, I'm not sure there is a single "right" behavior. I can easily imagine a tidy that matches multiple times in the same TU and tries, therefore, to apply the same fix multiple times, even though it wants at most one (and possibly an error). The major problem is that character-level (versus AST level) edits simply don't compose. So, multiple edits on a file are always suspect. Could you also give an example (in terms of code changes, not YAML) of why this comes up? As in, are we sure that the problem lies in the algorithm here, rather than in the phrasing of the transformation itself?

That said, based on some of your linked bugs, it sounds like your change would make the behavior here consistent with the behavior in another situation (when clang-tidy applies the edits directly rather than outputting to YAML?). Consistency could be a strong argument in favor of this change.

Thanks for expanding the description, including the helpful example. I'm not sure, though, that this is the "right" behavior, at least not always. Worse, I'm not sure there is a single "right" behavior. I can easily imagine a tidy that matches multiple times in the same TU and tries, therefore, to apply the same fix multiple times, even though it wants at most one (and possibly an error). The major problem is that character-level (versus AST level) edits simply don't compose. So, multiple edits on a file are always suspect. Could you also give an example (in terms of code changes, not YAML) of why this comes up? As in, are we sure that the problem lies in the algorithm here, rather than in the phrasing of the transformation itself?

That said, based on some of your linked bugs, it sounds like your change would make the behavior here consistent with the behavior in another situation (when clang-tidy applies the edits directly rather than outputting to YAML?). Consistency could be a strong argument in favor of this change.

clang-tidy does a good job of filtering out conflicting fix-its so duplicated insertions are going to be intentional.

The example yaml is generated from running is running the readability-braces-around-statements check on this code(offsets dont match due to formatting)

void f() {
  if (1)
    if (2) return;
}

Both if statements have the same end location which is where the check will insert a }.
This leads to 2 insertion fix its that are the same location and same text.
When clang-tidy applies the fix-it there is no issue, but currently clang-apply-replacements thinks they should be deduplicated.
This ultimately leads to malformed code as 2 { are inserted after the if conditions, but only one } is inserted at the end.

ymandel accepted this revision.Mar 23 2020, 6:08 AM

Thanks for the example and, generally, explanation. Just a few small comments.

clang-tools-extra/clang-apply-replacements/lib/Tooling/ApplyReplacements.cpp
148–150

Please update the comments to explain the new structure.

154

nit: I'd rename to SourceTU. Before, FromDiag was used as a proposition, as in "from diag?". Here, it's an (optional) value, so I think it would be better to name it that way.

160–162

This results in a double lookup. Instead, maybe:

auto It = Replaces.find(R);
if (It == Replaces.end())
  Replaces.emplace(R, FromTU);
else if (*It != FromTU)
  // This replacement is a duplicate of one suggested by another TU.
  return;
clang-tools-extra/test/clang-apply-replacements/identical2.cpp
3 ↗(On Diff #250054)

Why is this %/T rather than %T? I realize it is the same in identical.cpp, but just want to understand what I'm reading...

8 ↗(On Diff #250054)

Consider differentiating the name of this test more and making it more descriptive, e.g. identical_in_TU.

This revision is now accepted and ready to land.Mar 23 2020, 6:08 AM
njames93 updated this revision to Diff 252017.Mar 23 2020, 6:42 AM
njames93 marked 5 inline comments as done.
  • Address reviewer comments
njames93 added inline comments.Mar 23 2020, 6:44 AM
clang-tools-extra/test/clang-apply-replacements/identical2.cpp
3 ↗(On Diff #250054)

It's like that in most of the checks, not sure why but wanted to keep it consistent.

This revision was automatically updated to reflect the committed changes.