This is an archive of the discontinued LLVM Phabricator instance.

Add basic #include sorting functionality to clang-format
ClosedPublic

Authored by djasper on Jul 15 2015, 1:57 PM.
Tokens
"Love" token, awarded by zauguin.

Details

Diff Detail

Event Timeline

djasper updated this revision to Diff 29826.Jul 15 2015, 1:57 PM
djasper retitled this revision from to Add basic #include sorting functionality to clang-format.
djasper updated this object.
djasper added reviewers: cfe-commits, klimek.
klimek added inline comments.Jul 20 2015, 9:30 AM
lib/Format/Format.cpp
1594–1598

Perhaps document that Includes needs to be in strict source order.

1624

You'll probably want to allow more spaces if this is run before formatting:
\s*#\s*include\s+
Also, we'll probably want .*? in the middle (or [^">]) if there is a > or " in a comment afterwards.

1632–1634

Having the ! case first without early exit confused me.

1642–1644

Why does this work if Pos is npos here? Or can that not happen?

1647

Here I can see that npos doens't matter, because we'll not re-use Prev.

tools/clang-format/ClangFormat.cpp
218

Call this outputReplacementsXML (so it's consistent with outputReplacementXML), or rename all.

244–249

Aren't now the ranges incorrect?

279

This is not atomic, which has been a problem in the past (and which is why the rewriter's overwriteChangedFiles is carefully crafted).
Any reason to not use a second rewriter? Or even better, can we:

  1. Create a rewriter
  2. Get affected ranges
  3. Iff include sorting, sort includes
  4. Recompute affected ranges in new file
  5. clang-format
  6. Rewrite

Alternatively, do not support include sorting and clang-format in the same run and have editor integrations do 2 calls (?)

unittests/Format/SortIncludesTest.cpp
32–37

Nit: I find it slightly distracting that we're not calling them a.h, b.h, c.h (and perhaps have one test that checks that they support longer filenames).

51–60

Do we want to sort C vs C++ system headers?

djasper marked 7 inline comments as done.Jul 20 2015, 11:21 AM
djasper added inline comments.
lib/Format/Format.cpp
1642–1644

Simplified so that this isn't an issue any more.

1647

Ack.

tools/clang-format/ClangFormat.cpp
244–249

So, there are two cases:

  1. Ranges outside of #include blocks. These should not change as the size of the include block doesn't change.
  2. Ranges inside of a #include block. Yes, here we kind of format a #include statement that isn't necessarily the one that was included in the original range. I think the right fix here is to format the entire #include block if something is moved around (to re-align comments if necessary). What do you think? Is that ok as a follow-up patch?
279

Yes, using a second rewriter or recomputing the affected ranges is a significant calculation for the sole purpose of un-doing the offsets calculated in the rewriter. Furthermore, we already have the new code after #include sorting, which we simply calculate again, which seems like wasted efforts.

I don't know how the non-atomic-write has been a problem, but if it is, I'd prefer to pull out a function or the AtomicallyMovedFile.

I am strongly against letting this be handled by editor integrations so that the integrations exhibit consistent behavior and that we don't need to implement this N times.

unittests/Format/SortIncludesTest.cpp
51–60

Yes, but I want to do refinements to the actual sorting in a follow up. I have a bunch of ideas.

djasper updated this revision to Diff 30176.Jul 20 2015, 11:22 AM
djasper marked 2 inline comments as done.

Addressed review comments.

klimek added inline comments.Jul 21 2015, 1:10 AM
tools/clang-format/ClangFormat.cpp
279

I don't understand yet why we can't use 1 rewriter. The rewriter should automatically use the updated ranges after we did the fixes, shouldn't it?

I definitely think we'll need to have 2 different runs anyway for all integrations that run on replacements, or otherwise we'd need to somehow group replacements in a way that it's clear in which order they need to be applied, and when the basis for the calculation of the ranges changes.

unittests/Format/SortIncludesTest.cpp
51–60

SG

djasper added inline comments.Jul 21 2015, 1:21 AM
tools/clang-format/ClangFormat.cpp
279

That's exactly what the rewriter does. It simply doesn't help us at all in this case. Other than writing the file atomically, it only makes our life harder.

I think clang-format outputting the replacements in the order in which they need to applied makes sense and is significantly easier to integrate with than a two-phase run.

klimek added inline comments.Jul 21 2015, 2:34 AM
tools/clang-format/ClangFormat.cpp
279

I'll need to tinker with that myself, but that's not a blocker. I'm still unhappy with the interface though. I think having a list of replacements where you can't select a single one and apply it is a very subtle interface. Have you tested this with the eclipse / visual studio integration?

djasper added inline comments.Jul 21 2015, 6:48 AM
tools/clang-format/ClangFormat.cpp
279

Oooo... I have a good idea to fix both of these things. Let me rewrite some stuff.

djasper updated this revision to Diff 31670.Aug 10 2015, 8:00 AM

Added an alternative way to handle the two sets of Replacements: Merge them properly.

I haven't completely finished implementing / testing this. Wanted to get feedback first before going further down this road.

djasper updated this revision to Diff 34137.Sep 7 2015, 3:15 AM
djasper updated this object.

Now properly implemented merging of two replacement sets that are meant to be applied in sequence. I think this functionality will also be highly useful, e.g. when formatting the result of a clang-tidy fix or many other such things.

klimek edited edge metadata.Sep 14 2015, 2:20 PM

First round of comments; some things are still a bit confusing, so I hope another round will help to weed them out.

include/clang/Tooling/Core/Replacement.h
223–224

s/to/two/

lib/Format/Format.cpp
1612–1620

I think this wants a comment along the lines of:
We create one large replacement for the block of includes.
If the order is already correct, we do not want to create that replacement.

1644

Do we want to use \s instead of "\ " so this also works for tab-using codebases?

1651

I'd have expected early exit, but I can see it'd introduce a one-line duplication. Not sure, just wanted to bring it up.

1660–1663

Nit: I think this would be a bit more readable if we only call sortInlucdes if !IncludesInBlock.empty() and add that as condition to the else.

lib/Tooling/Core/Replacement.cpp
310–313

Nit: remove the first ','.
s/shoud/should/
Also: 'non-overlapping in each set'? (or is that incorrect? the function comment in the header doesn't currently tell, I think we might want to change that)

314

Would be good to learn up here why SecondI == Second.end() implies "an element from 'Second' needs to be merged".
Ok, it looks like this implies that the other one is the one that is the basis for merging. Perhaps MergeIntoFirst?

332–336

This structure looks like we'd want 2 functions. I assume you had reasons not to pull out two local functions? What are they? Too many parameters? (I'd probably pull out a class to be able to put this into smaller methods, but I'm probably weird).

341

If SecondI == Second.end(), isn't this an undefined access?

354

I'm surprised the explicit Twine() is needed here...

djasper marked 8 inline comments as done.Sep 15 2015, 5:35 AM
djasper added inline comments.
lib/Format/Format.cpp
1651

It's actually three lines I think. Also, I think we'll significantly extend the logic here shortly (e.g. to properly support Google and LLVM styles), so I don't think it is worth thinking about too much.

1660–1663

Done and removed the check inside sortIncludes.

lib/Tooling/Core/Replacement.cpp
332–336

Maybe a class would be the way to go here. I'll try in a subsequent patch (don't think it should be a priority at this point).

341

Cannot happen. The while starts with MergeSecond && SecondI != Second.end().

354

How else would this know to use a Twine for efficient concatenation?

djasper updated this revision to Diff 34800.Sep 15 2015, 5:35 AM
djasper edited edge metadata.
djasper marked 4 inline comments as done.

Addressed (most) comments

djasper updated this revision to Diff 35375.Sep 22 2015, 8:34 AM

Restructured a bit. Not as many comments just yet. Do you think this is an improvement?

Sending another batch of comments.

lib/Tooling/Core/Replacement.cpp
305–307

Merge the two sets which are each sorted by offset:
Look at the the next replacement from both sets and check whether they overlap:

  1. as long as they do not overlap, we add the replacement with the lower offset to the result set
  2. when they do overlap, merge into the replacement with the lower offset: repeatedly take the next replacement from the other queue and merge it, until the end of the next replacement is past the end of the currently merged into replacement; in that case, the next replacement becomes the currently merged into replacement;

Invariants:

  • we always merge replacements from the first set into a replacement from the second set or vice versa
  • an invariant while merging is that the offset of the next replacement to merge is always >= the offset of the currently merged-into replacement
  • more?

Position projections:
When comparing positions, we calculate the projection of positions in the second (later) set into where they would happen in the original string.
In the above algorithm, we can always project the start position of a replacement from the second set correctly: only a replacement with a smaller offset can influence the start position of a replacement from the second set; due to the invariant, we never try to merge a replacement from the second set before we have merged all replacements with a smaller offset.
We cannot generally project the end position of a replacement from the second set correctly, but:

  • we can correctly project it correctly if it is smaller or equal to the end position of the replacement from the first set we currently compare to
  • that means if we cannot correctly project it, we still know it is larger than the end position of the replacement from the first set we currently compare against, so the replacement from the second set will stay the element to merge into, and we will continuously adapt the end position's projection when we merge further replacements from the first set.
308–309

Comment:
We get here on the first run or when we ended a merge sequence (that is, we found that
we are at the start of a new replacement in the result set).

(not sure whether that comment carries it's weight; I started it because it took me a moment to figure out why we don't compute which one to take next during merging, and it's a bit more complex, but not sure where to put the comment yet; basically, the problem is that if the next replacement at the end doesn't overlap, it might also be after a non-overlapping replacement from the other set, which I think is not entirely obvious).

310–314

Ok, I think I now understand what confuses me here:
When we do *not* merge (the replacements don't overlap), we take the other element next into the result set. A less specific name like 'FirstNext', or 'NextFromFirst', or even 'MergedIsFirst' (which would fit the 'Merged' prefix of the attributes later) would probably fit my mental model better.
Note that for me there's also ambiguity (at least on the first read) between 'merge into the result set' and 'merge (subsume) one replacement into the current one'.

For me calling this:
NextIsFirst, NextOffset, NextLength, NextReplacement
would help.

326

This I'm less sure about, but I think 'Text' is somewhat ambiguous here - perhaps 'MergedReplaceWith'?

354

There are various overloads of + that yield Twine?

djasper updated this revision to Diff 35395.Sep 22 2015, 10:57 AM

Added comments and did some renaming.

klimek accepted this revision.Sep 22 2015, 11:06 AM
klimek edited edge metadata.

Ok, I think this is now understandable enough for me to go in.

This revision is now accepted and ready to land.Sep 22 2015, 11:06 AM
djasper closed this revision.Sep 28 2015, 7:28 AM