This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Implement comment reflowing (v3)
ClosedPublic

Authored by krasimir on Jan 16 2017, 2:31 AM.

Details

Summary

This presents a version of the comment reflowing with less mutable state inside
the comment breakable token subclasses. The state has been pushed into the
driving breakProtrudingToken method. For this, the API of BreakableToken is enriched
by the methods getSplitBefore and getLineLengthAfterSplitBefore.

Event Timeline

krasimir updated this revision to Diff 84530.Jan 16 2017, 2:31 AM
krasimir retitled this revision from to [clang-format] Implement comment reflowing (v3).
krasimir updated this object.
krasimir added a reviewer: klimek.
krasimir added subscribers: ioeric, cfe-commits, mgorny and 2 others.
krasimir updated this revision to Diff 85339.Jan 23 2017, 3:19 AM
  • [clang-format] Improve the interface of BreakableToken and add comments.
krasimir added inline comments.Jan 23 2017, 3:22 AM
lib/Format/BreakableToken.h
87

By the way, I got confused, this stays because the new triplet of methods replaces the old replaceWhitespaceBefore; this method serves a different purpose during breaking, that is tries to shrink whitespace instead of inserting a break.

klimek added inline comments.Jan 23 2017, 3:40 AM
lib/Format/BreakableToken.h
40

Do we want to describe how replaceWhitespace fits in here, then?

krasimir updated this revision to Diff 85376.Jan 23 2017, 6:42 AM
  • Add a note about replaceWhitespace in the comments of BreakableToken.
krasimir marked an inline comment as done.Jan 23 2017, 6:43 AM
krasimir updated this revision to Diff 85381.Jan 23 2017, 6:53 AM
  • Add back a test case that I had previously removed for no good reason.
klimek added inline comments.Jan 23 2017, 8:53 AM
lib/Format/BreakableToken.h
55–56

Shouldn't that be called insertBreakBefore for consistency then?

Also, perhaps we want to rename replaceWhitespace to cleanupWhitespace or compressWhitespace or something...

lib/Format/ContinuationIndenter.cpp
1158–1159

Generally, we shouldn't need those here, as those should be part of the Token.Finalized state.

1220

Explain this part in a comment.

krasimir marked 3 inline comments as done.Jan 24 2017, 3:24 AM
krasimir added inline comments.
lib/Format/BreakableToken.h
55–56

insertBreakBefore will not be a good name for this, because it doesn't insert breaks.
There are two use-cases for this:

  1. The old one: adjusting indent in comments, for example:
// before
   // after
  1. The new one: reflowing, for example
before:
  // line1
  // line2
after:
  // line1 line2

The break between line1 and line2 gets removed in replaceWhitespaceBefore(2).

I'll rename replaceWhitespace to compressWhitespace.

lib/Format/ContinuationIndenter.cpp
1158–1159

Here the problem is that the comments /* clang-format on */ and /* clang-format off */ control the formatting of the tokens in between, but do not control their own formatting, that is they are not finalized themselves.
What happens is that if we rely solely on Token.Finalized and the ColumnLimit is small, say 20, then:

/* clang-format off */

gets broken into:

/* clang-format off
  */

Add comments about this.

1220

Realized that the place of this is not here; introduced a proper method BreakableToken::getLineLengthAfterCompres instead.

krasimir updated this revision to Diff 85557.Jan 24 2017, 3:24 AM
krasimir marked 3 inline comments as done.
  • Address review comments.
klimek added inline comments.Jan 24 2017, 4:17 AM
lib/Format/ContinuationIndenter.cpp
1158–1159

Isn't the right fix to change the tokens of the comment to be finalized?

krasimir added inline comments.Jan 24 2017, 4:32 AM
lib/Format/ContinuationIndenter.cpp
1158–1159

That's arguable. Consider FormatTest.cpp:10916: in this case we still want to re-indent the /* clang-format on/off */ comments, just not break them. If we finalize the token, we can't re-indent.

klimek edited edge metadata.Jan 24 2017, 8:14 AM

This is starting to be pretty awesome. The one thing that would help me review the gist of the implementation a bit more is if that had a bit more comments. Perhaps go over the math in the code and put some comments in why we're doing what at various steps.

lib/Format/BreakableToken.h
42–48

I assume the reason to not fold this into the next getSplitBefore is that sometimes we need to do the trimming even if we're the last line? Would it otherwise make sense?

85–88

I would cut "previously returned by getSplit", as I think this is irrelevant to this function.
Also, given that we hand in RemainingTokenColumn, why would LineIndex and TailOffset ever be relevant?
Lastly, this is not virtual, in which case it seems to not make sense to pass in any information that's not relevant for the current implementation?

105–108

Instead of passing in ReflowInProgress, I'd not call it if !ReflowInProgress and make the comment more reflow-examplish.

It basically returns the split of the current line, if any parts of the current line fit the the line above it.

131–133

s/makes sense when/can be used when/

lib/Format/ContinuationIndenter.cpp
1158–1159

I see two options:

  • mark the comment with a special token type for clang-format on/off
  • pull out a function - I see you already have one called mayReflowContent; can we use that?
1194–1205

(optional) I'd probably write this:

ReflowInProgress = SplitBefore.first != StringRef::npos;
if (ReflowInProgress) {
  TailOffset = SplitBefore.first + SplitBefore.second;
}
1222–1224

I think AfterCompression reads more naturally. Perhaps we should use "trim" instead of compress, now that I think about it.
So I'd probably go for getTrimmedLineLength or something.

lib/Format/WhitespaceManager.cpp
131

Why was this needed? (what breaks without this change)

unittests/Format/FormatTest.cpp
2169

But we want to reflow lines starting with one punctuation character?

krasimir marked an inline comment as done.Jan 24 2017, 8:35 AM
krasimir added inline comments.
lib/Format/BreakableToken.h
42–48

The reason why we can't fold is that the compressing and the reflow of the next might be independent and operate on totally different whitespace ranges.
Consider the following comments with line length 20:

// Column limit 20 |
// long           long
// long

Then after compressWhitespace(line 2) we get:

// Column limit 20 |
// long long
// long

And now replaceWhitespaceBefore(line 3) will reflow:

// Column limit 20 |
// long long long

Will add a note about this.

klimek added inline comments.Jan 24 2017, 8:41 AM
lib/Format/BreakableToken.h
42–48

Ah, wow, nice!

krasimir updated this revision to Diff 85725.Jan 25 2017, 3:25 AM
krasimir marked 8 inline comments as done.
  • Address review comments.
lib/Format/ContinuationIndenter.cpp
1158–1159

mayReflowContent is too specific for content during region. I'll add a delimitsRegion method.

1194–1205

How about this?

1222–1224

I don't agree with "trimmig" because i have a connotation of "trimming" around the ends of strings like in ltrim/rtrim and this fundamentally operates somewhere inside. +1 for AfterCompression, though.

lib/Format/WhitespaceManager.cpp
131

This is a dirty hack. The problem is that WhitespaceManager does its own formatting for trailing comments and sometimes it re-formats a piece of line even after it's been reflown with the previous line. Consider if we reflow the second line up:

// line 1
// line 2

That amounts to 2 changes, first (delimited by ()) for the whitespace between the two tokens, and second (delimited by []) for the beginning of the next line:

// line 1(
)[// ]line 2

So in the end we have two changes like this:

// line 1()[ ]line 2

Note that the OriginalWhitespaceStart of the second change is the same as the PreviousOriginalWhitespaceEnd of the first change.
In this case, the above code marks the second line as trailing comment and afterwards applies its own trailing-comment-alignment logic to it, which might introduce extra whitespace before "line 2".

For a proper solution we need a mechanism to say that a particular change emitted through the whitespace manager breaks the sequence of trailing comments.

unittests/Format/FormatTest.cpp
2169

I guess I'm looking at lines that start like this: "ala", 'ala'. So it may be better to special-case these two instead.
The heuristic for now, in mayReflowContent is: the content of the line needs to have at least two characters and either the first or the second character must be non-punctuation.

Another approach would be to not do these funny things and to see how users will complain about mis-re-flows and implement blacklist/whitelist content regex prefixes in the future.

Add a test showing the current positive behavior.

klimek added inline comments.Jan 25 2017, 3:38 AM
lib/Format/BreakableToken.cpp
295–296

Can we now use delimitsRegion here?

lib/Format/ContinuationIndenter.cpp
1090–1092

delimitsRegion seems overly abstract. Perhaps switchesFormatting?

1194–1205

That works.

1222–1224

Yea, correct, I completely missed that this might happen in the middle of the string if we happen to be the first token that runs over, which is a rather interesting case :)

lib/Format/WhitespaceManager.cpp
131

Ok. In this case this needs a largish comment with a FIXME, like you just described ;)

krasimir updated this revision to Diff 85737.Jan 25 2017, 5:30 AM
krasimir marked 3 inline comments as done.
  • Address comments. Add a bunch of comments about various range computations.

Tried to add some comments around range computations. Most of the time it's about converting range offsets from local line-based values to start-of-token-based offsets.

lib/Format/BreakableToken.cpp
295–296

Yep! Actually came up with even better alternative: we use switchesFormatting() below:

lib/Format/ContinuationIndenter.cpp
1090–1092

Yep. Also moved it to BreakableToken.h/.cpp so it can be used from both ContinuationIndenter::breakProtrudingToken and BreakableComment::mayReflow!

klimek added inline comments.Jan 25 2017, 5:45 AM
lib/Format/BreakableToken.cpp
697

What do you mean by "first line of a token"?

lib/Format/BreakableToken.h
93–94

I'd keep \p and LineIndex in the same line...

krasimir updated this revision to Diff 85739.Jan 25 2017, 5:56 AM
krasimir marked 2 inline comments as done.
  • Address review comments.
krasimir added inline comments.Jan 25 2017, 5:56 AM
lib/Format/BreakableToken.cpp
697

Sadly, there are multi-line line comment tokens, as in:

// line 1 \
// line 2

Add a note about it.

lib/Format/BreakableToken.h
93–94

Sadly, that's one thing that the reflowing doesn't support yet...

This revision is now accepted and ready to land.Jan 25 2017, 6:00 AM
This revision was automatically updated to reflect the committed changes.
lib/Format/CMakeLists.txt