This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] insert new #includes into correct blocks when cleaning up Replacement with cleanupAroundReplacements().
ClosedPublic

Authored by ioeric on May 27 2016, 9:24 AM.

Details

Summary

When a replacement's offset is set to UINT_MAX or -1U, it is treated as
a header insertion replacement by cleanupAroundReplacements(). The new #include
directive is then inserted into the correct block.

Diff Detail

Event Timeline

ioeric updated this revision to Diff 58799.May 27 2016, 9:24 AM
ioeric retitled this revision from to [clang-format] insert new #includes into correct blocks when cleaning up Replacement with cleanupAroundReplacements()..
ioeric updated this object.
ioeric added reviewers: djasper, klimek.
ioeric added subscribers: bkramer, cfe-commits.
djasper added inline comments.May 29 2016, 7:31 AM
lib/Format/Format.cpp
1444

But you should pass it as an ArrayRef or a SmallVectorImpl.

1446

nit: no braces

1451

From the docs in Format.h: "If none of the regular expressions match, INT_MAX is assigned". If that is true for an existing #include as well as the inserted one, I don't see a reason, why those should belong to the same block. So, I think you should return INT_MAX here and remove all the special-casing below.

1464

We already have an include regex in this file, I don't think we should have two. Can you pull out the string into a constant?

1468

There is a lot of duplicated code between this and sortCppIncludes. Pull common code out into functions.

1487

There is no guarantee that people assign categories sequentially, I can easily see people assigning category "1000" or "10000" to be able to add more categories later. I think this should be a map and I think you then also don't really need the vector. Also note that categories can be negative, which I think would break this.

1491

I am guessing that AfterFirstDefine is used so that we insert #includes after header guards. I think we should name it AfterHeaderGuard than and either make the behavior more strict or at least add a FIXME for it (only in header files, only if the #define is after an #ifndef, ...)

1515

I think if you assign the EndOffset of all lower categories that aren't set yet, that will simplify the logic below.

1542

What happens if there is no category with higher priority, but there is a category with lower priority. Shouldn't you then insert the #include right before that?

1581

I think you can remove this FIXME.. Seems obvious.

1584

Why do you split out all the header insertion replacements here instead of just ignoring the ones that aren't header insertions in fixCppIncludeInsertions?

1587

nit: No braces.

ioeric updated this revision to Diff 58952.May 30 2016, 6:24 AM
ioeric marked 9 inline comments as done.
  • Addressed reviewer's comments.
ioeric added inline comments.May 30 2016, 6:26 AM
lib/Format/Format.cpp
1584

I think this enables us to exit early in fixCppIncludeInsertions if there is no header insertion replacement at all. Also, it is easier to split out header insertion replacements and then merge fixed replacements with other replacements instead of doing this in place, which IMO might not be more efficient since there are still set deletion and insertion for each header insertion replacement.

bkramer added inline comments.May 30 2016, 9:54 AM
lib/Format/Format.cpp
1283

Typo: IncldueName

1502

Typo: intialized

djasper added inline comments.May 30 2016, 10:44 AM
include/clang/Format/Format.h
778

Why don't we just stick with UINT_MAX everywhere?

lib/Format/Format.cpp
1456

Unused

1465

Why not just a set?

1487

nit: s/more strict/stricter/

1493

Are you sure that this is what's implemented? The way I see it is:

  • Ensure that EndOffset[Highest] is always populated.
  • If EndOffset[Priority] isn't set, use the next higher value that is set, up to EndOffset[Highest].
1512

I'd leave all of this logic out. This only affects headers with no current #includes, which is rare to begin with and on top of that, this is more to do with whether or not we want to automatically add new lines between the different blocks. As it currently is, this would do both:

#define HEADER_GUARD

#include <a> // inserted

and

#define HEADER_GUARD
#include <a> // inserted
#include <b>

Which might be the right behavior, but I think this is the wrong place to worry about it. In the long run, we should probably make clang-format inserted empty lines around the #include blocks if desired.

1524

Add a comment:

// By this point, CategoryEndOffset[Highest] is always set appropriately:
//  - to an appropriate location before/after existing #includes, or
//  - to right after the header guard, or
//  - to the beginning of the file.
1530

This should be done together with the check for the offset being -1u. Eventually we are going to have something else that we are going to insert (e.g. using declarations) and we don't want to confuse those. And even more so, if all replacements at offset -1u are not #includes, we don't need to execute most of this function.

1537

Why is CheckMainHeader false?

1540

No parentheses.

1549

Hm.. Not happy with the naming. There is no real difference between fixHeaderInsertions and fixCppIncludeInsertions. You could argue that the "Cpp" part is the difference, but the actual difference is that one iterates over an entire set of replacements whereas the other only modifies header insertions.

I think I'd just merge the two functions, but that probably also ties back to my wish not to separate out the two replacement sets ;-).

ioeric updated this revision to Diff 58982.May 30 2016, 11:46 AM
ioeric marked 16 inline comments as done.
  • Addressed reviewers' comments.
lib/Format/Format.cpp
1549

How about fixInsertionReplacements?

djasper added inline comments.May 30 2016, 12:12 PM
lib/Format/Format.cpp
1284

Nit: Otherwise, returns the priority of the matching category or INT_MAX.

1502

nit: I think you should be using "I" and "E".

1507

Can you just use "std::prev(I)" instead of Prev?

1549

So, not knowing anything of it, what's the difference between fixCppIncludeInsertions and fixInsertionReplacements and when do you need to call what. To me the fact that it's that difficult to find a name for it is telling..

Also, I still don't understand why you think it is better to separate the two functions. In that way you are currently implementing it, it is certainly less efficient. You are always copying all the replacements, which (in the way you are doing it) is O(N lg N), even if there isn't a single header insertion. Maybe at least use set_difference (similar to what I am showing below).

So, how I would structure it:

  • Pull out a function isHeaderInsertion(const Replacement &). Easy naming, good to pull this functionality out.
  • At the start of fixCppIncludeInsertions, write:
tooling::Replacements HeaderInsertions;
for (const auto &R : Replaces)
  if (isHeaderInsertion(R))
    HeaderInsertions.insert(R);
if (HeaderInsertions.empty())
  return Replaces;

tooling::Replacements Result;
std::set_difference(Replaces.begin(), Replaces.end(),
                    HeaderInsertions.begin(), HeaderInsertions.end(),
                    Result.begin());
  • Do what the function currently does with HeaderInsertions and add the results back to Result.
ioeric updated this revision to Diff 58992.May 30 2016, 1:23 PM
ioeric marked 3 inline comments as done.
  • Addressed reviewer's comments.
lib/Format/Format.cpp
1549

I really appreciate the way you structure it, which does make the code clearer.

Just one coveat: std::set_difference() is not applicable on Replacements since it calls std::copy() on Replacement, which Replacement does not support.

I think another way to approach it would be to delete header insertion replacement and insert new replacements back if we can assume that there are only few header insertion replacements.

ioeric updated this revision to Diff 58993.May 30 2016, 1:26 PM
  • Removed inline from isHeaderInsertion.
djasper added inline comments.May 31 2016, 12:26 AM
lib/Format/Format.cpp
1548

So, add a copy constructor to Replacement?

ioeric updated this revision to Diff 59017.May 31 2016, 12:48 AM
  • Use std::set_difference in fixCppIncludeInsertions()
djasper added inline comments.May 31 2016, 2:18 AM
lib/Format/Format.cpp
1444

This error output doesn't belong here. I think we can just remove it. If you'd prefer to keep it, add it to the loop in fixCppIncludeInsertions():

for (const auto &R : Replaces) {
  if (isHeaderInsertion(R))
    HeaderInsertions.insert(R);
  else if (R.getOffset() == UINT_MAX)
    log::errs() << ...
}
1556

I'd move this check into fixCppIncludeInsertions.

unittests/Format/CleanupTest.cpp
310

I'd pull out a lot of these environment setup things into abstractions in the test fixture. Maybe all you need is two functions like:

insert(Code, Replaces);

and

insertAndFormat(Code, Replaces);

?

311

Can you just make this:

tooling::Replacements Replaces = {
    tooling::Replacement("fix.cpp", UINT_MAX, 0, "#include \"b.h\"")};

?

510

I'd create a function:

tooling::Replacement createInsertion(StringRef HeaderName) {
  return tooling::Replacement("fix.cpp", UINT_MAX, 0, HeaderName);
}
ioeric updated this revision to Diff 59035.May 31 2016, 3:33 AM
ioeric marked 7 inline comments as done.
  • Addressed commments.
unittests/Format/CleanupTest.cpp
310

I had tried to pull out the environment setup...but it didn't seem to save much code without sacrificing flexibility.

In some test cases, we also need to add replacements other than header insertions, and those replacements construction uses FileID, so createInMemoryFile needs to stay in test functions.

FormatStyle differs among test cases, so it has to stay.

All we save might just be the two lines which could be either cleanup or cleanup+format.\

Any better ideas?

djasper added inline comments.May 31 2016, 3:52 AM
unittests/Format/CleanupTest.cpp
310

Well, the only reason you seem to be using the FileID or "Context" for that matter is to translate between line/col and offset. I'd just pull that functionality out into a separate function (which does it based on the "Code" input) or not at all (hard-coding the offset doesn't seem *that* bad).

510

Have you seen this comment?

ioeric updated this revision to Diff 59043.May 31 2016, 5:02 AM
ioeric marked 3 inline comments as done.
  • Addressed reviewer's comments.
unittests/Format/CleanupTest.cpp
310

Ohh, I see! Thanks!

510

Sorry that I missed this one...

djasper accepted this revision.May 31 2016, 5:17 AM
djasper edited edge metadata.

A few nitpicks, but otherwise looks good.

lib/Format/Format.cpp
1287

I'd consistently try to use i and e for integers vs. I and E for iterators.

unittests/Format/CleanupTest.cpp
257

I'd probably make "Style" a class member, defaulting to LLVM Style that you can then overwrite for other styles.

271

I'd consider making "Code" a member variable. But it has pros and cons.

297

Why are you starting and ending the header guards with a double underscore? I think this is actually reserved for built-in identifiers. Maybe just stick with the LLVM naming scheme?

This revision is now accepted and ready to land.May 31 2016, 5:17 AM
ioeric updated this revision to Diff 59053.May 31 2016, 6:34 AM
ioeric marked 3 inline comments as done.
ioeric edited edge metadata.
  • Nits fixed.
This revision was automatically updated to reflect the committed changes.