This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] insert new #includes into correct blocks when cleaning up Replacement with cleanupAroundReplacements().
ClosedPublic

Authored by ioeric on May 27 2016, 9:24 AM.

Details

Summary

When a replacement's offset is set to UINT_MAX or -1U, it is treated as
a header insertion replacement by cleanupAroundReplacements(). The new #include
directive is then inserted into the correct block.

Diff Detail

Repository
rL LLVM

Event Timeline

ioeric updated this revision to Diff 58799.May 27 2016, 9:24 AM
ioeric retitled this revision from to [clang-format] insert new #includes into correct blocks when cleaning up Replacement with cleanupAroundReplacements()..
ioeric updated this object.
ioeric added reviewers: djasper, klimek.
ioeric added subscribers: bkramer, cfe-commits.
djasper added inline comments.May 29 2016, 7:31 AM
lib/Format/Format.cpp
1411 ↗(On Diff #58799)

But you should pass it as an ArrayRef or a SmallVectorImpl.

1413 ↗(On Diff #58799)

nit: no braces

1418 ↗(On Diff #58799)

From the docs in Format.h: "If none of the regular expressions match, INT_MAX is assigned". If that is true for an existing #include as well as the inserted one, I don't see a reason, why those should belong to the same block. So, I think you should return INT_MAX here and remove all the special-casing below.

1431 ↗(On Diff #58799)

We already have an include regex in this file, I don't think we should have two. Can you pull out the string into a constant?

1435 ↗(On Diff #58799)

There is a lot of duplicated code between this and sortCppIncludes. Pull common code out into functions.

1454 ↗(On Diff #58799)

There is no guarantee that people assign categories sequentially, I can easily see people assigning category "1000" or "10000" to be able to add more categories later. I think this should be a map and I think you then also don't really need the vector. Also note that categories can be negative, which I think would break this.

1458 ↗(On Diff #58799)

I am guessing that AfterFirstDefine is used so that we insert #includes after header guards. I think we should name it AfterHeaderGuard than and either make the behavior more strict or at least add a FIXME for it (only in header files, only if the #define is after an #ifndef, ...)

1482 ↗(On Diff #58799)

I think if you assign the EndOffset of all lower categories that aren't set yet, that will simplify the logic below.

1509 ↗(On Diff #58799)

What happens if there is no category with higher priority, but there is a category with lower priority. Shouldn't you then insert the #include right before that?

1548 ↗(On Diff #58799)

I think you can remove this FIXME.. Seems obvious.

1551 ↗(On Diff #58799)

Why do you split out all the header insertion replacements here instead of just ignoring the ones that aren't header insertions in fixCppIncludeInsertions?

1554 ↗(On Diff #58799)

nit: No braces.

ioeric updated this revision to Diff 58952.May 30 2016, 6:24 AM
ioeric marked 9 inline comments as done.
  • Addressed reviewer's comments.
ioeric added inline comments.May 30 2016, 6:26 AM
lib/Format/Format.cpp
1551 ↗(On Diff #58799)

I think this enables us to exit early in fixCppIncludeInsertions if there is no header insertion replacement at all. Also, it is easier to split out header insertion replacements and then merge fixed replacements with other replacements instead of doing this in place, which IMO might not be more efficient since there are still set deletion and insertion for each header insertion replacement.

bkramer added inline comments.May 30 2016, 9:54 AM
lib/Format/Format.cpp
1283 ↗(On Diff #58952)

Typo: IncldueName

1503 ↗(On Diff #58952)

Typo: intialized

djasper added inline comments.May 30 2016, 10:44 AM
include/clang/Format/Format.h
778 ↗(On Diff #58952)

Why don't we just stick with UINT_MAX everywhere?

lib/Format/Format.cpp
1457 ↗(On Diff #58952)

Unused

1466 ↗(On Diff #58952)

Why not just a set?

1488 ↗(On Diff #58952)

nit: s/more strict/stricter/

1494 ↗(On Diff #58952)

Are you sure that this is what's implemented? The way I see it is:

  • Ensure that EndOffset[Highest] is always populated.
  • If EndOffset[Priority] isn't set, use the next higher value that is set, up to EndOffset[Highest].
1513 ↗(On Diff #58952)

I'd leave all of this logic out. This only affects headers with no current #includes, which is rare to begin with and on top of that, this is more to do with whether or not we want to automatically add new lines between the different blocks. As it currently is, this would do both:

#define HEADER_GUARD

#include <a> // inserted

and

#define HEADER_GUARD
#include <a> // inserted
#include <b>

Which might be the right behavior, but I think this is the wrong place to worry about it. In the long run, we should probably make clang-format inserted empty lines around the #include blocks if desired.

1525 ↗(On Diff #58952)

Add a comment:

// By this point, CategoryEndOffset[Highest] is always set appropriately:
//  - to an appropriate location before/after existing #includes, or
//  - to right after the header guard, or
//  - to the beginning of the file.
1531 ↗(On Diff #58952)

This should be done together with the check for the offset being -1u. Eventually we are going to have something else that we are going to insert (e.g. using declarations) and we don't want to confuse those. And even more so, if all replacements at offset -1u are not #includes, we don't need to execute most of this function.

1538 ↗(On Diff #58952)

Why is CheckMainHeader false?

1541 ↗(On Diff #58952)

No parentheses.

1550 ↗(On Diff #58952)

Hm.. Not happy with the naming. There is no real difference between fixHeaderInsertions and fixCppIncludeInsertions. You could argue that the "Cpp" part is the difference, but the actual difference is that one iterates over an entire set of replacements whereas the other only modifies header insertions.

I think I'd just merge the two functions, but that probably also ties back to my wish not to separate out the two replacement sets ;-).

ioeric updated this revision to Diff 58982.May 30 2016, 11:46 AM
ioeric marked 16 inline comments as done.
  • Addressed reviewers' comments.
lib/Format/Format.cpp
1550 ↗(On Diff #58952)

How about fixInsertionReplacements?

djasper added inline comments.May 30 2016, 12:12 PM
lib/Format/Format.cpp
1284 ↗(On Diff #58982)

Nit: Otherwise, returns the priority of the matching category or INT_MAX.

1503 ↗(On Diff #58982)

nit: I think you should be using "I" and "E".

1508 ↗(On Diff #58982)

Can you just use "std::prev(I)" instead of Prev?

1550 ↗(On Diff #58982)

So, not knowing anything of it, what's the difference between fixCppIncludeInsertions and fixInsertionReplacements and when do you need to call what. To me the fact that it's that difficult to find a name for it is telling..

Also, I still don't understand why you think it is better to separate the two functions. In that way you are currently implementing it, it is certainly less efficient. You are always copying all the replacements, which (in the way you are doing it) is O(N lg N), even if there isn't a single header insertion. Maybe at least use set_difference (similar to what I am showing below).

So, how I would structure it:

  • Pull out a function isHeaderInsertion(const Replacement &). Easy naming, good to pull this functionality out.
  • At the start of fixCppIncludeInsertions, write:
tooling::Replacements HeaderInsertions;
for (const auto &R : Replaces)
  if (isHeaderInsertion(R))
    HeaderInsertions.insert(R);
if (HeaderInsertions.empty())
  return Replaces;

tooling::Replacements Result;
std::set_difference(Replaces.begin(), Replaces.end(),
                    HeaderInsertions.begin(), HeaderInsertions.end(),
                    Result.begin());
  • Do what the function currently does with HeaderInsertions and add the results back to Result.
ioeric updated this revision to Diff 58992.May 30 2016, 1:23 PM
ioeric marked 3 inline comments as done.
  • Addressed reviewer's comments.
lib/Format/Format.cpp
1550 ↗(On Diff #58982)

I really appreciate the way you structure it, which does make the code clearer.

Just one coveat: std::set_difference() is not applicable on Replacements since it calls std::copy() on Replacement, which Replacement does not support.

I think another way to approach it would be to delete header insertion replacement and insert new replacements back if we can assume that there are only few header insertion replacements.

ioeric updated this revision to Diff 58993.May 30 2016, 1:26 PM
  • Removed inline from isHeaderInsertion.
djasper added inline comments.May 31 2016, 12:26 AM
lib/Format/Format.cpp
1549 ↗(On Diff #58993)

So, add a copy constructor to Replacement?

ioeric updated this revision to Diff 59017.May 31 2016, 12:48 AM
  • Use std::set_difference in fixCppIncludeInsertions()
djasper added inline comments.May 31 2016, 2:18 AM
lib/Format/Format.cpp
1444 ↗(On Diff #59017)

This error output doesn't belong here. I think we can just remove it. If you'd prefer to keep it, add it to the loop in fixCppIncludeInsertions():

for (const auto &R : Replaces) {
  if (isHeaderInsertion(R))
    HeaderInsertions.insert(R);
  else if (R.getOffset() == UINT_MAX)
    log::errs() << ...
}
1556 ↗(On Diff #59017)

I'd move this check into fixCppIncludeInsertions.

unittests/Format/CleanupTest.cpp
310 ↗(On Diff #59017)

I'd pull out a lot of these environment setup things into abstractions in the test fixture. Maybe all you need is two functions like:

insert(Code, Replaces);

and

insertAndFormat(Code, Replaces);

?

311 ↗(On Diff #59017)

Can you just make this:

tooling::Replacements Replaces = {
    tooling::Replacement("fix.cpp", UINT_MAX, 0, "#include \"b.h\"")};

?

510 ↗(On Diff #59017)

I'd create a function:

tooling::Replacement createInsertion(StringRef HeaderName) {
  return tooling::Replacement("fix.cpp", UINT_MAX, 0, HeaderName);
}
ioeric updated this revision to Diff 59035.May 31 2016, 3:33 AM
ioeric marked 7 inline comments as done.
  • Addressed commments.
unittests/Format/CleanupTest.cpp
310 ↗(On Diff #59017)

I had tried to pull out the environment setup...but it didn't seem to save much code without sacrificing flexibility.

In some test cases, we also need to add replacements other than header insertions, and those replacements construction uses FileID, so createInMemoryFile needs to stay in test functions.

FormatStyle differs among test cases, so it has to stay.

All we save might just be the two lines which could be either cleanup or cleanup+format.\

Any better ideas?

djasper added inline comments.May 31 2016, 3:52 AM
unittests/Format/CleanupTest.cpp
310 ↗(On Diff #59035)

Well, the only reason you seem to be using the FileID or "Context" for that matter is to translate between line/col and offset. I'd just pull that functionality out into a separate function (which does it based on the "Code" input) or not at all (hard-coding the offset doesn't seem *that* bad).

510 ↗(On Diff #59035)

Have you seen this comment?

ioeric updated this revision to Diff 59043.May 31 2016, 5:02 AM
ioeric marked 3 inline comments as done.
  • Addressed reviewer's comments.
unittests/Format/CleanupTest.cpp
310 ↗(On Diff #59035)

Ohh, I see! Thanks!

510 ↗(On Diff #59035)

Sorry that I missed this one...

djasper accepted this revision.May 31 2016, 5:17 AM
djasper edited edge metadata.

A few nitpicks, but otherwise looks good.

lib/Format/Format.cpp
1287 ↗(On Diff #59043)

I'd consistently try to use i and e for integers vs. I and E for iterators.

unittests/Format/CleanupTest.cpp
256 ↗(On Diff #59043)

I'd probably make "Style" a class member, defaulting to LLVM Style that you can then overwrite for other styles.

270 ↗(On Diff #59043)

I'd consider making "Code" a member variable. But it has pros and cons.

313 ↗(On Diff #59043)

Why are you starting and ending the header guards with a double underscore? I think this is actually reserved for built-in identifiers. Maybe just stick with the LLVM naming scheme?

This revision is now accepted and ready to land.May 31 2016, 5:17 AM
ioeric updated this revision to Diff 59053.May 31 2016, 6:34 AM
ioeric marked 3 inline comments as done.
ioeric edited edge metadata.
  • Nits fixed.
This revision was automatically updated to reflect the committed changes.