This presents a version of the comment reflowing with less mutable state inside
the comment breakable token subclasses. The state has been pushed into the
driving breakProtrudingToken method. For this, the API of BreakableToken is enriched
by the methods getSplitBefore and getLineLengthAfterSplitBefore.
Details
Diff Detail
- Build Status
Buildable 3170 Build 3170: arc lint + arc unit
Event Timeline
lib/Format/BreakableToken.h | ||
---|---|---|
87 | By the way, I got confused, this stays because the new triplet of methods replaces the old replaceWhitespaceBefore; this method serves a different purpose during breaking, that is tries to shrink whitespace instead of inserting a break. |
lib/Format/BreakableToken.h | ||
---|---|---|
40 | Do we want to describe how replaceWhitespace fits in here, then? |
lib/Format/BreakableToken.h | ||
---|---|---|
55–56 | Shouldn't that be called insertBreakBefore for consistency then? Also, perhaps we want to rename replaceWhitespace to cleanupWhitespace or compressWhitespace or something... | |
lib/Format/ContinuationIndenter.cpp | ||
1158–1159 | Generally, we shouldn't need those here, as those should be part of the Token.Finalized state. | |
1220 | Explain this part in a comment. |
lib/Format/BreakableToken.h | ||
---|---|---|
55–56 | insertBreakBefore will not be a good name for this, because it doesn't insert breaks.
// before // after
before: // line1 // line2 after: // line1 line2 The break between line1 and line2 gets removed in replaceWhitespaceBefore(2). I'll rename replaceWhitespace to compressWhitespace. | |
lib/Format/ContinuationIndenter.cpp | ||
1158–1159 | Here the problem is that the comments /* clang-format on */ and /* clang-format off */ control the formatting of the tokens in between, but do not control their own formatting, that is they are not finalized themselves. /* clang-format off */ gets broken into: /* clang-format off */ Add comments about this. | |
1220 | Realized that the place of this is not here; introduced a proper method BreakableToken::getLineLengthAfterCompres instead. |
lib/Format/ContinuationIndenter.cpp | ||
---|---|---|
1158–1159 | Isn't the right fix to change the tokens of the comment to be finalized? |
lib/Format/ContinuationIndenter.cpp | ||
---|---|---|
1158–1159 | That's arguable. Consider FormatTest.cpp:10916: in this case we still want to re-indent the /* clang-format on/off */ comments, just not break them. If we finalize the token, we can't re-indent. |
This is starting to be pretty awesome. The one thing that would help me review the gist of the implementation a bit more is if that had a bit more comments. Perhaps go over the math in the code and put some comments in why we're doing what at various steps.
lib/Format/BreakableToken.h | ||
---|---|---|
42–48 | I assume the reason to not fold this into the next getSplitBefore is that sometimes we need to do the trimming even if we're the last line? Would it otherwise make sense? | |
85–88 | I would cut "previously returned by getSplit", as I think this is irrelevant to this function. | |
105–108 | Instead of passing in ReflowInProgress, I'd not call it if !ReflowInProgress and make the comment more reflow-examplish. It basically returns the split of the current line, if any parts of the current line fit the the line above it. | |
131–133 | s/makes sense when/can be used when/ | |
lib/Format/ContinuationIndenter.cpp | ||
1158–1159 | I see two options:
| |
1194–1205 | (optional) I'd probably write this: ReflowInProgress = SplitBefore.first != StringRef::npos; if (ReflowInProgress) { TailOffset = SplitBefore.first + SplitBefore.second; } | |
1222–1224 | I think AfterCompression reads more naturally. Perhaps we should use "trim" instead of compress, now that I think about it. | |
lib/Format/WhitespaceManager.cpp | ||
131 | Why was this needed? (what breaks without this change) | |
unittests/Format/FormatTest.cpp | ||
2169 | But we want to reflow lines starting with one punctuation character? |
lib/Format/BreakableToken.h | ||
---|---|---|
42–48 | The reason why we can't fold is that the compressing and the reflow of the next might be independent and operate on totally different whitespace ranges. // Column limit 20 | // long long // long Then after compressWhitespace(line 2) we get: // Column limit 20 | // long long // long And now replaceWhitespaceBefore(line 3) will reflow: // Column limit 20 | // long long long Will add a note about this. |
lib/Format/BreakableToken.h | ||
---|---|---|
42–48 | Ah, wow, nice! |
- Address review comments.
lib/Format/ContinuationIndenter.cpp | ||
---|---|---|
1158–1159 | mayReflowContent is too specific for content during region. I'll add a delimitsRegion method. | |
1194–1205 | How about this? | |
1222–1224 | I don't agree with "trimmig" because i have a connotation of "trimming" around the ends of strings like in ltrim/rtrim and this fundamentally operates somewhere inside. +1 for AfterCompression, though. | |
lib/Format/WhitespaceManager.cpp | ||
131 | This is a dirty hack. The problem is that WhitespaceManager does its own formatting for trailing comments and sometimes it re-formats a piece of line even after it's been reflown with the previous line. Consider if we reflow the second line up: // line 1 // line 2 That amounts to 2 changes, first (delimited by ()) for the whitespace between the two tokens, and second (delimited by []) for the beginning of the next line: // line 1( )[// ]line 2 So in the end we have two changes like this: // line 1()[ ]line 2 Note that the OriginalWhitespaceStart of the second change is the same as the PreviousOriginalWhitespaceEnd of the first change. For a proper solution we need a mechanism to say that a particular change emitted through the whitespace manager breaks the sequence of trailing comments. | |
unittests/Format/FormatTest.cpp | ||
2169 | I guess I'm looking at lines that start like this: "ala", 'ala'. So it may be better to special-case these two instead. Another approach would be to not do these funny things and to see how users will complain about mis-re-flows and implement blacklist/whitelist content regex prefixes in the future. Add a test showing the current positive behavior. |
lib/Format/BreakableToken.cpp | ||
---|---|---|
295–296 | Can we now use delimitsRegion here? | |
lib/Format/ContinuationIndenter.cpp | ||
1090–1092 | delimitsRegion seems overly abstract. Perhaps switchesFormatting? | |
1194–1205 | That works. | |
1222–1224 | Yea, correct, I completely missed that this might happen in the middle of the string if we happen to be the first token that runs over, which is a rather interesting case :) | |
lib/Format/WhitespaceManager.cpp | ||
131 | Ok. In this case this needs a largish comment with a FIXME, like you just described ;) |
Tried to add some comments around range computations. Most of the time it's about converting range offsets from local line-based values to start-of-token-based offsets.
lib/Format/BreakableToken.cpp | ||
---|---|---|
295–296 | Yep! Actually came up with even better alternative: we use switchesFormatting() below: | |
lib/Format/ContinuationIndenter.cpp | ||
1090–1092 | Yep. Also moved it to BreakableToken.h/.cpp so it can be used from both ContinuationIndenter::breakProtrudingToken and BreakableComment::mayReflow! |
Do we want to describe how replaceWhitespace fits in here, then?