This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Format/
-
Format/
-
ContinuationIndenter.h
4/4
ContinuationIndenter.cpp
-
FormatToken.h
-
FormatToken.cpp
2/5
UnwrappedLineFormatter.cpp
-
unittests/Format/
-
Format/
2/2
FormatTest.cpp

Differential D33589

clang-format: consider not splitting tokens in optimization
AbandonedPublic

Authored by Typz on May 26 2017, 3:11 AM.

Download Raw Diff

Details

Reviewers

krasimir
djasper

Summary

This patch tries to improve the optimizer a bit, to avoid splitting
tokens (e.g. comments/strings) if only there are only few characters
beyond the ColumnLimit.

Previously, comments/strings would be split whenever they went beyond
the ColumnLimit, without consideration for the PenaltyBreakComment/
String and PenaltyExcessCharacter. With this patch, the
OptimizingLineFormatter will also consider not splitting each token,
so that these 'small' offenders get a chance:

// With ColumnLimit=20
int a; // the
       // comment

// With ColumnLimit=20 and PenaltyExcessCharacter = 10
int a; // the comment

This patch does not fully optimize the reflowing, as it will only try
reflowing the whole comment or not reflowing it at all (instead of
trying each split, to allow some lines of overflow ColumnLimit even
when reflowing).

Diff Detail

Build Status

Buildable 6795
Build 6795: arc lint + arc unit

Event Timeline

Typz created this revision.May 26 2017, 3:11 AM

Herald added a subscriber: klimek. · View Herald TranscriptMay 26 2017, 3:11 AM

fix indentation issues

This seems to work fine, but may not find the 'optimal' reflow, thus I would need some feedback:

Is this approach desirable, as a relatively easy fix?
Or should this be fixed with a complete refactoring of the way the strings/comments are split, making multiple tokens out of them to let them be reflown 'directly' by the optimizing line formatter? (keep the optimizer, but changes the whole way comments are handled, and need to move all the information about the comment block and its relfowing into LineState)
Or should the breakProtrudingToken() actually perform an optimization of the split of this token? (keeps the same overall structure, but requires writing another optimizer)

unittests/Format/FormatTest.cpp
8571	This is not working as expected, format return: int a; /* first line * second * line third * line */ Any clue how things could go so wrong?

djasper edited reviewers, added: alexfh; removed: djasper.May 26 2017, 4:46 AM

Typz marked an inline comment as done.May 30 2017, 5:25 AM

Typz added inline comments.

unittests/Format/FormatTest.cpp
8571	This was a bug in tests: should not use test::messUp() with multi-line comments.

fix code & tests

I think that what you're trying to solve is not practically that important, is unlikely to improve the handling of comments, and will add a lot of complexity.

From a usability perspective, I think that people are happy enough when their comments don't exceed the line limit. I personally wouldn't want the opposite to happen. I've even seen style guides that have 80 columns limit for comments and 100 for code.

Comments are treated in a somewhat ad-hoc style in clang-format, which is because they are inherently free text and don't have a specific format. People just expect comments to be handled quite differently than source code. There are at least three separate parts in clang-format that make up the formatting of comments: the normal parsing and optimization pipeline, the BreakableLineCommentSection / BreakableBlockComment classes that are responsible for breaking and reflowing inside comment sections, and the consecutive trailing comment alignment in the WhitespaceManager. This patch takes into account the first aspect but not the consequences for the other aspects: for example it allows for the first line comment token in a multiline line comment section to get out of the column limit, but will reflow the rest of the lines. A WhitespaceManager-related issue is that because a trailing line comment for some declaration might not get split, it might not be aligned with the surrounding trailing line comments, which I think is a less desirable effect.

Optimizing the layout in comment sections by using the optimizing formatter sounds good, but because comments mostly contain free text that can be structured in unexpected ways, I think that the ad-hoc approach works better. Think of not destroying ASCII art and supporting bulleted and numbered lists for example. We don't really want to leak lots of comment-related formatting tweaks into the general pipeline itself.

I see you point that PenaltyBreakComment and PenaltyExcessCharacter are not taken into account while comment placement, but they are taken into account at a higher level by treating the whole comment section as a unit. For example, if you have a long declaration that has multiple potential split points followed by a long trailing comment section, the penalty induced by the number of lines that are needed and the number of unbroken characters for the formatting of the comment section is taken into account while optimizing the layout of the whole code fragment.

The formatted currently supports somewhat limited version of allowing comment lines exceeding the column limit, like long hyperlinks for example. I think that if there are some other examples which are both widespread and super annoying, we may consider them separately.

In D33589#769893, @krasimir wrote:

I think that what you're trying to solve is not practically that important, is unlikely to improve the handling of comments, and will add a lot of complexity.

Not sure the 'approach' I have in this patch is nice, but it does solve my problem: it is quite simple, and avoids reflowing (and actually, sometimes breaking, since comments are not so structured...) the comments.
e.g. if I have a line followed by a relatively long comment, with my patch it will really consider the penaltys properly, and not split a line which is slightly longer [with ColumnLimit=30] :

int a; //< a very long comment

int a; //< a very long
       //< comment

From a usability perspective, I think that people are happy enough when their comments don't exceed the line limit. I personally wouldn't want the opposite to happen. I've even seen style guides that have 80 columns limit for comments and 100 for code.

That is a matter of taste and coding style: I prefer overflowing by a few characters instead of introducing an extra line... I see no reason to allow customizing PenaltyExcessCharacter yet not use this value to make the decision.

Comments are treated in a somewhat ad-hoc style in clang-format, which is because they are inherently free text and don't have a specific format. People just expect comments to be handled quite differently than source code. There are at least three separate parts in clang-format that make up the formatting of comments: the normal parsing and optimization pipeline, the BreakableLineCommentSection / BreakableBlockComment classes that are responsible for breaking and reflowing inside comment sections, and the consecutive trailing comment alignment in the WhitespaceManager. This patch takes into account the first aspect but not the consequences for the other aspects: for example it allows for the first line comment token in a multiline line comment section to get out of the column limit, but will reflow the rest of the lines. A WhitespaceManager-related issue is that because a trailing line comment for some declaration might not get split, it might not be aligned with the surrounding trailing line comments, which I think is a less desirable effect.

Not sure I understand the case you are speaking about, can you please provide an exemple?

(on a separate topic, I could indeed not find a penalty for breaking alignment; I think the optimizer should account for that as well... any hint on where to look for adding this?)

Optimizing the layout in comment sections by using the optimizing formatter sounds good, but because comments mostly contain free text that can be structured in unexpected ways, I think that the ad-hoc approach works better. Think of not destroying ASCII art and supporting bulleted and numbered lists for example. We don't really want to leak lots of comment-related formatting tweaks into the general pipeline itself.

Good, one less choice :-)

And I indeed agree optimizing lines 'individually' may be much worse than the current ad-hoc approach, unless we could depend on extra syntax in comments (like full understanding of markdown and doxygen syntax), but that is out of topic here.

But this patch does try to move away from the adhoc approach: i am just trying to allow *not* wrapping the comments when it gives better score, based on the PenaltyBreakComment and PenaltyExcessCharacter values.

I see you point that PenaltyBreakComment and PenaltyExcessCharacter are not taken into account while comment placement, but they are taken into account at a higher level by treating the whole comment section as a unit. For example, if you have a long declaration that has multiple potential split points followed by a long trailing comment section, the penalty induced by the number of lines that are needed and the number of unbroken characters for the formatting of the comment section is taken into account while optimizing the layout of the whole code fragment.

If you reduce PenaltyExcessCharacter you see that comments can never overflow: the optimizer will consider splitting the comments or splitting the remaining of the code, but will (currently) not allow the comment to overflow just a bit.

The formatted currently supports somewhat limited version of allowing comment lines exceeding the column limit, like long hyperlinks for example. I think that if there are some other examples which are both widespread and super annoying, we may consider them separately.

This patch does not try to address these special cases, and should not change the behavior in this case.

ping?

@krasimir : ping

@krasimir, @alexfh : can I get some feedback?

This patch solves a practical problem, i.e. allowing the comment to overflow a bit without triggering a reflow [according to the priorities which are in config, obviously]. It may not always provide the "best" wrapping, but as you pointed out comments are unstructured, so processing must indeed stay quite conservative: which is how this patch tries to behave.

Is there a better way to do this? or do you see something missing?

There are at least three separate parts in clang-format that make up the formatting of comments: the normal parsing and optimization pipeline, the BreakableLineCommentSection / BreakableBlockComment classes that are responsible for breaking and reflowing inside comment sections, and the consecutive trailing comment alignment in the WhitespaceManager. This patch takes into account the first aspect but not the consequences for the other aspects

You suggested there is an issue, but I could not isolate it. Can you provide an exemple?

alexfh edited reviewers, added: djasper; removed: alexfh.Jun 23 2017, 7:35 AM

Daniel is a better reviewer here than myself. A few cents from me though: why do we want to make an exception for comments and not for regular code?

In D33589#789002, @alexfh wrote:

why do we want to make an exception for comments and not for regular code?

This is not an exception for comments: the PenaltyExcessCharacter is used whenever the code is longer than the ColumnLimit, and used to compute the best solution.
The default value is extremely high, so there is no practical difference: code does not go beyond the column, except for specific exceptions.

However, the difference becomes apparent when reducing this penalty: in that case, code can break the column limit instead of wrapping, but it does not happen with comments. This patch tries to restore the parity, and let comments break the comment limit as well.

ping?

djasper added inline comments.Jul 4 2017, 1:51 AM

lib/Format/UnwrappedLineFormatter.cpp
723	This is almost certainly a no-go. It turns the algorithm from exploring a state space with a branching factor of 2 into something with a branching factor of 4. Even assuming we can perfectly reuse results from computations (or in other words hit the same state through different path using Dijkstra's algorithm), the additional bool in StateNode doubles the number of states we are going to visit. You don't even seem to make a distinction of whether the token under investigation can possibly be split. You do look at NoLineBreak(InOperand), but even if those are false, the vast majority of tokens will never be split. However, even if you narrow that down, I am not convinced that fixing this inconsistency around the formatting penalties is worth the runtime performance hit. I am generally fine with changing the behavior in the way you are proposing, but only if it has zero (negative) effect on performance.
764	It's not clear to me why you'd return here.

Typz added inline comments.Jul 13 2017, 5:17 AM

lib/Format/UnwrappedLineFormatter.cpp
723	Making the distinction to skip some path is done at the beginning of addNextStateToQueue(), though indeed not very precisely at the moment. I can improve the check (i.e. by copying all the early return conditions from the beginning of `ContinuationIndenter::breakProtrudingToken()`, which will greatly reduce the number of possible state, but stilll have a small impact. The alternative would be modify ContinuationIndenter::breakProtrudingToken(), so that it computes the penalty for reflowing as well as not-reflowing, and updates the LineState with the best solution. It would certainly have an impact on performance, but would not increase the size of the state space. Another issue with that approach is that the information is not passed from the DRYRUN phase to the actual rewriting phase. I could store this information in the LineState, to re-use it in the reconstruction phase, but this seems really wrong (and would work only in the exact case of the optimizing line formatter). What do you think? Should I keep the same structure but reduce the number of explored state; or move the decision into ContinuationIndenter, possibly storing the result in LineState?
764	This is needed to avoid trying to break when this is not supported: no point doing twice the same thing.

Typz marked 2 inline comments as done.Jul 13 2017, 6:30 AM

Typz added inline comments.

lib/Format/UnwrappedLineFormatter.cpp
723	Actually, it seems it cannot be done inside ContinuationIndenter::breakProtrudingToken(), since this causes the whitespace manager to be called twice for the same token.

Move code out of optimizer, directly into ContinuationIndenter::breakProtrudingToken(), to minimize impact on performance.
Comment reflowing of breakable items which break the limit will be slightly slower, since we now consider the two options; however this change has no impact on the performance of processing non-breakable items, and may actually increase the operformance of processing breakable items which do not need to be reflowed.

Harbormaster completed remote builds in B8190: Diff 106442.Jul 13 2017, 9:30 AM

Rebase

Harbormaster completed remote builds in B8691: Diff 108651.Jul 28 2017, 7:16 AM

ping?

I have a slightly hard time grasping what this patch now actually does? Doesn't it simply try to decide whether or not to make a split locally be comparing the PenaltyBreakComment against the penalty for the access characters? If so, couldn't we simply do that as an implementation detail of breakProtrudingToken() without needing to let anything outside of it now and without introducing State.Reflow?

lib/Format/ContinuationIndenter.cpp
1330	Can you create a patch that doesn't move the code around so much? Seems unnecessary and hard to review.
1437	I believe that this is incorrect. reflowProtrudingToken counts the length of the unbreakable tail and here you just remove the penalty of the token itself. E.g. in: string s = f("aaa"); the ");" is the unbreakable tail of the stringl

Typz added inline comments.Sep 13 2017, 7:50 AM

lib/Format/ContinuationIndenter.cpp
1330	Moving the code around is an unfortunate requirement for this patch: we must actually do the "reflowing" twice, to find out which solution (actually reflowing or not) gives the least penalty. Therefore the function that reflowing must be moved out of `breakProtrudingToken()`... All I can do is try moving the new function after `breakProtrudingToken()`, maybe Phabricator will better show the change.
1437	This behavior has not changed : before the commit, the last token was not included in the penalty [c.f. `if` at line 1338 in original code]. To make the comparison significative, the last token's penalty is included in the penalty returned by `reflowProtrudingToken()` (hence the removal of the test at line 1260); and it is removed before returning from this function, to keep the same behavior as before.

Reorder the functions to minimize diff.

Harbormaster completed remote builds in B10186: Diff 115051.Sep 13 2017, 8:28 AM

I still don't understand yet. breakProtrudingToken has basically two options:

Don't wrap/reflow: In this case the penalty is determined by the number of excess characters.
Wrap/reflow: I this case the penalty is determined by PenaltySplitComments plus the remaining excess characters.

My question is, why do you need to put anything into the state and do this outside of breakProtrudingToken. It seems to me that breakProtrudingToken can do this locally without putting anything into the State.

For one thing, we need to update the state to store the "decision" of the reflowing mode, which is performed only in DryRun=true mode, to avoid doing the computation multiple times.

Apart from this, the decision is conceptually internal to breakProtrudingToken(). But the 'computation' of either mode is very similar, and updates the state variable anyway (State.Column, Stack.back().LastSpace, State.Stack[i].BreakBeforeParameter, Token->updateNextToken(State)) : so it is simpler (code-wise) to split the function and call it twice, then decide which 'State' variable to use.

I think doing the computation twice is fine. Or at least, I'd need a test case where it actually shows substantial overhead before doing what you are doing here. Understand that creating more States and making the State object itself larger also has cost and that cost occurs in the combinatorial exploration of the solution space. Doing an additional computation at the end should be comparatively cheap. Look at it this way: During the exploration of the solution space, we might enter breakProtrudingToken many times for the same comment. One more time during reconstruction of the solution is not that harmful.

In D33589#875039, @djasper wrote:

I think doing the computation twice is fine. Or at least, I'd need a test case where it actually shows substantial overhead before doing what you are doing here. Understand that creating more States and making the State object itself larger also has cost and that cost occurs in the combinatorial exploration of the solution space. Doing an additional computation at the end should be comparatively cheap. Look at it this way: During the exploration of the solution space, we might enter breakProtrudingToken many times for the same comment. One more time during reconstruction of the solution is not that harmful.

I 've just tried to implement this (e.g. make the 2 calls also when DryRun=false), and I am running into an assertion in WhitespaceManager :

[ RUN      ] FormatTest.ConfigurableUseOfTab
FormatTests: /workspace/llvm/tools/clang/lib/Format/WhitespaceManager.cpp:112: void clang::format::WhitespaceManager::calculateLineBreakInformation(): Assertion `PreviousOriginalWhitespaceEndOffset <= OriginalWhitespaceStartOffset' failed.

This remind there was another "reason" for limiting this to DryRun (not sure it is a good or bad reason): it happens only in the optimizer, all other cases where the indenter is used are not affected.

I am still trying to get to the bottom of this assertion, any hint where to look for?

I am still trying to get to the bottom of this assertion, any hint where to look for?

OK, got it. The issue is that we will actually need to run the wrapping 3 times when DryRun = false : call reflowProtrudingToken() twice with DryRun=true to find out the better option, then call it once more with DryRun=false.

Remove Reflow from LineState, and perform the decision again during reconstruction phase.

Harbormaster completed remote builds in B10411: Diff 115830.Sep 19 2017, 6:06 AM

Typz marked 5 inline comments as done.Sep 19 2017, 6:08 AM

I find the current semantics of the functions a bit surprising, specifically:
... reflowProtrudingToken(..., bool Reflow)
is really confusing me :)

I'd have expected something like this where we currently call breakProtrudingToken():

if (CanBreak) {
  ReflowPenalty = breakProtrudingToken(Current, State, /*DryRun=*/true);
  FixedPenalty = getExcessPenalty(Current, State);
  Penalty = ReflowPenalty < FixedPenalty ? ReflowPenalty : FixedPenalty;
  if (!DryRun && ReflowPenalty < FixedPenalty
      breakProtrudingToken(Current, State, /*DryRun=*/false);
}

I haven't looked how we calculate the penalty currently if we can't break, as if we don't, that also seems ... wrong.
getExcessPenalty would be a new function that just focuses on getting the excess penalty for a breakable token.

This cannot be implemented where we currently call breakProtrudingToken(), since this function starts by 'creating' the BreakableToken and dealing with meany corner cases: so this should be done before in any case.
But the code at the end of breakProtrudingToken() is actually very similar to what you propose.

I can refactor the code to have two functions reflowProtrudingToken() and getExcessPenalty(), though that will add some significant redundancy: the problem is that even if we don't actually reflow, we still need (I think?) to handle whitespace replacement and update the state (to set Column, Stack.back().LastSpace, and Stack[i].BreakBeforeParameter, and call updateNextToken() ...). And therefore getExcessPenalty() should probably not just compute the penalty, it should also update the state and handle whitespace replacement...

In D33589#876196, @Typz wrote:

This cannot be implemented where we currently call breakProtrudingToken(), since this function starts by 'creating' the BreakableToken and dealing with meany corner cases: so this should be done before in any case.
But the code at the end of breakProtrudingToken() is actually very similar to what you propose.

I can refactor the code to have two functions reflowProtrudingToken() and getExcessPenalty(), though that will add some significant redundancy: the problem is that even if we don't actually reflow, we still need (I think?) to handle whitespace replacement and update the state (to set Column, Stack.back().LastSpace, and Stack[i].BreakBeforeParameter, and call updateNextToken() ...). And therefore getExcessPenalty() should probably not just compute the penalty, it should also update the state and handle whitespace replacement...

My question is: if CanBreak is false, we currently don't call breakProtrudingToken. So either we do something very wrong in that case (which might be true, but I'd like to understand why) or we should be able to just calculate the penalty by not breaking anything and go on.

My question is: if CanBreak is false, we currently don't call breakProtrudingToken. So either we do something very wrong in that case (which might be true, but I'd like to understand why) or we should be able to just calculate the penalty by not breaking anything and go on.

CanBreak is currently very short, it only verifies some very broad conditions. I initiallly tried patching at this level, but it really does not work.

Most conditions are actually tested at the beginning of breakProtrudingToken. That part of the code actually takes different branches for different kind of tokens (strings, block commands, line comments...), and handles many different cases for each. This goal is to create the actual BreakableToken object, but it has a few other side effects: it returns immediately in various corner cases (javascript, preprocessor, formatting macros....), and it tweaks some variables, e.g. ColumnLimit.

In addition, replacement is non trivial even in case we choose not to reflow: it is required in order to properly manage whitespaces. For this reason we really must pass through the loop in most cases.

ping ?

klimek mentioned this in D39900: Refactor ContinuationIndenter's breakProtrudingToken logic into slightly more orthogonal pieces..Nov 10 2017, 6:56 AM

klimek mentioned this in rL318141: Refactor ContinuationIndenter's breakProtrudingToken logic..Nov 14 2017, 1:20 AM

In D33589#920160, @Typz wrote:

ping ?

I'm working on understanding this better :) I've refactored the code a bit so I could fully understand the problem, which I now do (sorry for this taking a while, but it took me multiple hours to work through this). I'm now convinced that you're right that we need the reflow/breaking logic to make the decision about the trade-offs, as we need the penalty for the potentially reflown code, as opposed to the penalty for the code if we do nothing.
Given that, my current gut feeling is that we'll want to go a bit further in that direction and make the change more local in breakProtrudingToken. I'll give that a spin and report back with what I find.

One interesting trade-off I'm running into:
My gut feeling is that we really want to make local decisions about whether we want to break/reflow - this makes the code significantly simpler (IMO), and handles all tests in this patch correctly, but is fundamentally limiting the global optimizations we can do. Specifically, we would not correctly reflow this:

//       |< limit
// foo bar
// baz
// x

// foo
// bar
// baz x

when the excess character limit is low.

That would be a point for global optimization, but I find it really hard to understand exactly why it's ok to do it. Won't we get things like this wrong:

Limit: 13
// foo  bar baz
// bab      bob

as we'll not compress whitespace?

klimek mentioned this in D40068: Implement more accurate penalty & trade-offs while breaking protruding tokens..Nov 15 2017, 2:15 AM

I think this patch doesn't handle a couple of cases that I'd like to see handled. A counter-proposal with different trade-offs is in D40068.

In D33589#924716, @klimek wrote:
One interesting trade-off I'm running into:
My gut feeling is that we really want to make local decisions about whether we want to break/reflow - this makes the code significantly simpler (IMO), and handles all tests in this patch correctly, but is fundamentally limiting the global optimizations we can do. Specifically, we would not correctly reflow this:
//       |< limit
// foo bar
// baz
// x
to
// foo
// bar
// baz x
when the excess character limit is low.

As I can see with your patch, local decision does not account for accumulated penalty on multi-line comment, and will thus give unexpected (e.g. no change) result when each line overlaps by a few characters, but not enough to trigger a break at this line.

That would be a point for global optimization, but I find it really hard to understand exactly why it's ok to do it. Won't we get things like this wrong:
Limit: 13
// foo  bar baz
// bab      bob
as we'll not compress whitespace?

Indeed, this patch would not trigger whitespace compression when not reflowing; it would compare "not doing anything" (no reflow, no whitespace compression) with the complete reflowing (including whitespace compression). I don't think that would break anything, but indeed we could possibly get even better result by trying to apply whitespace compression in the no-reflow case [which should be simple, just a bit more code at line 1376 in the version of the patch].

In D33589#925903, @klimek wrote:

I think this patch doesn't handle a couple of cases that I'd like to see handled. A counter-proposal with different trade-offs is in D40068.

Can you provide more info on these cases? Are they all added to the tests of D40068?

It may be simpler (though not to my eyes, I am not knowledgeable enough to really understand how you go this fixed...), and works fine for "almost correct" comments: e.g. when there are indeed just a few extra characters overall. But it still procudes strange result when each line of the (long) comment is too long, but not enough to trigger a line-wrap by itself.

Since that version has landed already, not sure how to improve on this. I could probably rewrite my patch on master, but it seems a bit redundant. As a simpler fix, I could imagine adding a "total" overflow counter, to allow detecting the situation; but when this is detected (e.g. on subsequent lines) we would need to "backtrack" and revisit the initial decision...

Btw, another issue I am having is that reflowing does not respect the alignment. For exemple:

enum {
   Foo,    ///< This is a very long comment
   Bar,    ///< This is shorter  
   BarBar, ///< This is shorter
} Stuff;

will be reflown to :

enum {
   Foo, ///< This is a very long
        ///< comment
   Bar, ///< This is shorter  
   BarBar, ///< This is shorter
} Stuff;

when I would expect:

enum {
   Foo,    ///< This is a very
           ///< long comment
   Bar,    ///< This is shorter  
   BarBar, ///< This is shorter
} Stuff;

I see there is no penalty for breaking alignment, which may be accounted for when compressing the whitespace 'before' the comment... Or maybe simply the breaking should be smarter, to try to keep the alignment; but I am not sure it can be done without another kind of 'global' optimization of the comment reflow... Any idea/hint?

In D33589#931723, @Typz wrote:

In D33589#925903, @klimek wrote:

I think this patch doesn't handle a couple of cases that I'd like to see handled. A counter-proposal with different trade-offs is in D40068.

Can you provide more info on these cases? Are they all added to the tests of D40068?

It may be simpler (though not to my eyes, I am not knowledgeable enough to really understand how you go this fixed...), and works fine for "almost correct" comments: e.g. when there are indeed just a few extra characters overall. But it still procudes strange result when each line of the (long) comment is too long, but not enough to trigger a line-wrap by itself.

Since that version has landed already, not sure how to improve on this. I could probably rewrite my patch on master, but it seems a bit redundant. As a simpler fix, I could imagine adding a "total" overflow counter, to allow detecting the situation; but when this is detected (e.g. on subsequent lines) we would need to "backtrack" and revisit the initial decision...

Yes, I added all tests I was thinking of to the patch. Note that we can always go back on submitted patches / do things differently. As my patch fulfilled all your tests, I (perhaps incorrectly?) assumed it solved your use cases - can you give me a test case of what you would want to happen that doesn't happen with my patch?

Are you, for example, saying that in the case

Limit; 13
// foo bar baz foo bar baz foo bar baz
you'd not want
// foo bar baz
// foo bar baz
// foo bar baz
If the overall penalty of the protruding tokens is - what? More than 1 break penalty? If you care about more than 3x break penalty (which I would think is correct), the local algorithm will work, as local decisions will make sure the overall penalty cannot be exceeded.

In D33589#931802, @Typz wrote:
Btw, another issue I am having is that reflowing does not respect the alignment. For exemple:
enum {
   Foo,    ///< This is a very long comment
   Bar,    ///< This is shorter  
   BarBar, ///< This is shorter
} Stuff;
will be reflown to :
enum {
   Foo, ///< This is a very long
        ///< comment
   Bar, ///< This is shorter  
   BarBar, ///< This is shorter
} Stuff;
when I would expect:
enum {
   Foo,    ///< This is a very
           ///< long comment
   Bar,    ///< This is shorter  
   BarBar, ///< This is shorter
} Stuff;
I see there is no penalty for breaking alignment, which may be accounted for when compressing the whitespace 'before' the comment... Or maybe simply the breaking should be smarter, to try to keep the alignment; but I am not sure it can be done without another kind of 'global' optimization of the comment reflow... Any idea/hint?

I think that's just a bug in comment alignment, which is done at a different stage.

In D33589#933568, @klimek wrote:

In D33589#931723, @Typz wrote:

In D33589#925903, @klimek wrote:

I think this patch doesn't handle a couple of cases that I'd like to see handled. A counter-proposal with different trade-offs is in D40068.

Can you provide more info on these cases? Are they all added to the tests of D40068?

It may be simpler (though not to my eyes, I am not knowledgeable enough to really understand how you go this fixed...), and works fine for "almost correct" comments: e.g. when there are indeed just a few extra characters overall. But it still procudes strange result when each line of the (long) comment is too long, but not enough to trigger a line-wrap by itself.

Since that version has landed already, not sure how to improve on this. I could probably rewrite my patch on master, but it seems a bit redundant. As a simpler fix, I could imagine adding a "total" overflow counter, to allow detecting the situation; but when this is detected (e.g. on subsequent lines) we would need to "backtrack" and revisit the initial decision...

Yes, I added all tests I was thinking of to the patch. Note that we can always go back on submitted patches / do things differently. As my patch fulfilled all your tests, I (perhaps incorrectly?) assumed it solved your use cases - can you give me a test case of what you would want to happen that doesn't happen with my patch?

Are you, for example, saying that in the case
Limit; 13
// foo bar baz foo bar baz foo bar baz
you'd not want
// foo bar baz
// foo bar baz
// foo bar baz
If the overall penalty of the protruding tokens is - what? More than 1 break penalty? If you care about more than 3x break penalty (which I would think is correct), the local algorithm will work, as local decisions will make sure the overall penalty cannot be exceeded.

Ok, sorry, after reading the comment on the other patch I get it :)
In the above example, we add 3 line breaks, and we'd add 1 (or more) additional line breaks when reflowing below the column limit.
I agree that that can lead to different overall outcomes, but I don't see how the approach of this patch really fixes it - it will only ever reflow below the column limit, so it'll also lead to states for long lines where reflowing and leaving chars over the line limit might be the overall best choice (unless I'm missing something)?

@klimek wrote:
In the above example, we add 3 line breaks, and we'd add 1 (or more) additional line breaks when reflowing below the column limit.
I agree that that can lead to different overall outcomes, but I don't see how the approach of this patch really fixes it - it will only ever reflow below the column limit, so it'll also lead to states for long lines where reflowing and leaving chars over the line limit might be the overall best choice (unless I'm missing something)?

Definitely, this patch may not find the 'optimal' solution. What I mean is that we reduce the PenaltyExcessCharacter value to allow "occasionally" breaking the limit: instead of a hard limit, we want to allow lines to sometimes break the limit, but definitely not *all* the time. Both patch work fine when the code is "correct", i.e. there is indeed only a few lines which break the limit.

But the local decision approach behaves really wrong IMHO when the code is formatted beyond the column: it can very easily reformat in such a way that the comment is reflown to what looks like a longer column limit. I currently have a ratio of 10 between PenaltyBreakComment and PenaltyExcessCharacter (which empirically seemed to give a decent compromise, and match how our code is formatted; I need to try more with your patch, to see if we can get better values...): with this setting, a "non wrapped" comment will actually be reflown to ColumnLimit+10...

When we do indeed reflow, I think we may be stricter than this, to get something that really looks like it obeys the column limit. If this is 'optimal' in the sense that we may have some overflow still, that is fine, but really not the primary requirement IMHO.

In D33589#933746, @Typz wrote:

@klimek wrote:
In the above example, we add 3 line breaks, and we'd add 1 (or more) additional line breaks when reflowing below the column limit.
I agree that that can lead to different overall outcomes, but I don't see how the approach of this patch really fixes it - it will only ever reflow below the column limit, so it'll also lead to states for long lines where reflowing and leaving chars over the line limit might be the overall best choice (unless I'm missing something)?

Definitely, this patch may not find the 'optimal' solution. What I mean is that we reduce the PenaltyExcessCharacter value to allow "occasionally" breaking the limit: instead of a hard limit, we want to allow lines to sometimes break the limit, but definitely not *all* the time. Both patch work fine when the code is "correct", i.e. there is indeed only a few lines which break the limit.

But the local decision approach behaves really wrong IMHO when the code is formatted beyond the column: it can very easily reformat in such a way that the comment is reflown to what looks like a longer column limit. I currently have a ratio of 10 between PenaltyBreakComment and PenaltyExcessCharacter (which empirically seemed to give a decent compromise, and match how our code is formatted; I need to try more with your patch, to see if we can get better values...): with this setting, a "non wrapped" comment will actually be reflown to ColumnLimit+10...

To me the reverse intuitively makes sense:
When I see that
first line
second line
third line
stays the same, but
first line second line third line
gets reflown into something different, I get confused :)

The only way I see to consistently solve this is to optimize the reflow breaks like we do the main algorithm.

When we do indeed reflow, I think we may be stricter than this, to get something that really looks like it obeys the column limit. If this is 'optimal' in the sense that we may have some overflow still, that is fine, but really not the primary requirement IMHO.

In D33589#933746, @Typz wrote:

with this setting, a "non wrapped" comment will actually be reflown to ColumnLimit+10...

Isn't the same true for code then, though? Generally, code will protrude by 10 columns before being broken?

I think the difference between code and comments is that code "words" are easily 10 characters or more, whereas actual words (in comments) are very often less than 10 characters: so code overflowing by 10 characters is not very frequent. whereas small words in comment will often get closer to the "extra" limit.

That said, I tried with your latest change ("Restructure how we break tokens", sha1:64d42a2fb85ece5987111ffb908c6bc7f7431dd4). and it's working about fine now. For the most part it seems to wrap when I would expect it, great work!
I have seen 2 "issues" though:

Often I see that the last word before reflowing is not wrapped (eventhough it overlaps the line length); I did not count penalties so I cannot confirm this is really an issue or just a borderline scenario.
Alignment seems better than before, but since there is no penalty for breaking alignment it will always try to unindent to compensate for overflowing characters...

Seeing this, I guess this patch does not make much sense anymore, I'll see if I make some improvements for these two issues, in separate patches.

Typz abandoned this revision.Dec 1 2017, 5:50 AM

In D33589#941979, @Typz wrote:

I think the difference between code and comments is that code "words" are easily 10 characters or more, whereas actual words (in comments) are very often less than 10 characters: so code overflowing by 10 characters is not very frequent. whereas small words in comment will often get closer to the "extra" limit.

That said, I tried with your latest change ("Restructure how we break tokens", sha1:64d42a2fb85ece5987111ffb908c6bc7f7431dd4). and it's working about fine now. For the most part it seems to wrap when I would expect it, great work!
I have seen 2 "issues" though:

Often I see that the last word before reflowing is not wrapped (eventhough it overlaps the line length); I did not count penalties so I cannot confirm this is really an issue or just a borderline scenario.

Alignment seems better than before, but since there is no penalty for breaking alignment it will always try to unindent to compensate for overflowing characters...

Seeing this, I guess this patch does not make much sense anymore, I'll see if I make some improvements for these two issues, in separate patches.

Note that I just 10 mins ago landed another change (r319541) that should fix the main issue you raised, and which looks a lot like I wanted this change to look back when I first saw it (but of course all the underlying code made that impossible). Please give it a try and let me know what you think.

Indeed, looks good, thanks!

Though that exacerbates the alignment issue, I now get things like this:

enum {
  SomeLongerEnum, // comment
  SomeThing,      // comment
  Foo, // something
} 
                            ^ (column limit)

The comment on 'Foo' would overflow a bit, but it gets unindented during "alingment" stage, which kind of 'breaks' the decision that was made earlier on *not* to break the comment...

In D33589#942128, @Typz wrote:
Indeed, looks good, thanks!

Though that exacerbates the alignment issue, I now get things like this:
enum {
  SomeLongerEnum, // comment
  SomeThing,      // comment
  Foo, // something
} 
                            ^ (column limit)
The comment on 'Foo' would overflow a bit, but it gets unindented during "alingment" stage, which kind of 'breaks' the decision that was made earlier on *not* to break the comment...

Ok, that seems like a different bug that we can fix in alignment, though :)

Revision Contents

Path

Size

lib/

Format/

ContinuationIndenter.h

10 lines

ContinuationIndenter.cpp

29 lines

FormatToken.h

4 lines

FormatToken.cpp

5 lines

UnwrappedLineFormatter.cpp

35 lines

unittests/

Format/

FormatTest.cpp

29 lines

Diff 100379

lib/Format/ContinuationIndenter.h

Show First 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	public:
// FIXME: canBreak and mustBreak aren't strictly indentation-related. Find a		// FIXME: canBreak and mustBreak aren't strictly indentation-related. Find a
// better home.		// better home.
/// \brief Returns \c true, if a line break after \p State is allowed.		/// \brief Returns \c true, if a line break after \p State is allowed.
bool canBreak(const LineState &State);		bool canBreak(const LineState &State);

/// \brief Returns \c true, if a line break after \p State is mandatory.		/// \brief Returns \c true, if a line break after \p State is mandatory.
bool mustBreak(const LineState &State);		bool mustBreak(const LineState &State);

		/// \brief Returns \c true, if the first token after \p State can be split.
		bool canSplit(LineState &State);

/// \brief Appends the next token to \p State and updates information		/// \brief Appends the next token to \p State and updates information
/// necessary for indentation.		/// necessary for indentation.
///		///
/// Puts the token on the current line if \p Newline is \c false and adds a		/// Puts the token on the current line if \p Newline is \c false and adds a
/// line break and necessary indentation otherwise.		/// line break and necessary indentation otherwise.
///		///
/// If \p DryRun is \c false, also creates and stores the required		/// If \p DryRun is \c false, also creates and stores the required
/// \c Replacement.		/// \c Replacement.
unsigned addTokenToState(LineState &State, bool Newline, bool DryRun,		unsigned addTokenToState(LineState &State, bool Newline, bool BreakToken, bool DryRun,
unsigned ExtraSpaces = 0);		unsigned ExtraSpaces = 0);

/// \brief Get the column limit for this line. This is the style's column		/// \brief Get the column limit for this line. This is the style's column
/// limit, potentially reduced for preprocessor definitions.		/// limit, potentially reduced for preprocessor definitions.
unsigned getColumnLimit(const LineState &State) const;		unsigned getColumnLimit(const LineState &State) const;

private:		private:
/// \brief Mark the next token as consumed in \p State and modify its stacks		/// \brief Mark the next token as consumed in \p State and modify its stacks
/// accordingly.		/// accordingly.
unsigned moveStateToNextToken(LineState &State, bool DryRun, bool Newline);		unsigned moveStateToNextToken(LineState &State, bool DryRun, bool Newline,
		bool BreakToken);

/// \brief Update 'State' according to the next token's fake left parentheses.		/// \brief Update 'State' according to the next token's fake left parentheses.
void moveStatePastFakeLParens(LineState &State, bool Newline);		void moveStatePastFakeLParens(LineState &State, bool Newline);
/// \brief Update 'State' according to the next token's fake r_parens.		/// \brief Update 'State' according to the next token's fake r_parens.
void moveStatePastFakeRParens(LineState &State);		void moveStatePastFakeRParens(LineState &State);

/// \brief Update 'State' according to the next token being one of "(<{[".		/// \brief Update 'State' according to the next token being one of "(<{[".
void moveStatePastScopeOpener(LineState &State, bool Newline);		void moveStatePastScopeOpener(LineState &State, bool Newline);
/// \brief Update 'State' according to the next token being one of ")>}]".		/// \brief Update 'State' according to the next token being one of ")>}]".
void moveStatePastScopeCloser(LineState &State);		void moveStatePastScopeCloser(LineState &State);
/// \brief Update 'State' with the next token opening a nested block.		/// \brief Update 'State' with the next token opening a nested block.
void moveStateToNewBlock(LineState &State);		void moveStateToNewBlock(LineState &State);

/// \brief If the current token sticks out over the end of the line, break		/// \brief If the current token sticks out over the end of the line, break
/// it if possible.		/// it if possible.
///		///
/// \returns An extra penalty if a token was broken, otherwise 0.		/// \returns An extra penalty if a token was broken, otherwise 0.
///		///
/// The returned penalty will cover the cost of the additional line breaks and		/// The returned penalty will cover the cost of the additional line breaks and
/// column limit violation in all lines except for the last one. The penalty		/// column limit violation in all lines except for the last one. The penalty
/// for the column limit violation in the last line (and in single line		/// for the column limit violation in the last line (and in single line
/// tokens) is handled in \c addNextStateToQueue.		/// tokens) is handled in \c addNextStateToQueue.
unsigned breakProtrudingToken(const FormatToken &Current, LineState &State,		unsigned breakProtrudingToken(const FormatToken &Current, LineState &State, bool BreakToken,
bool DryRun);		bool DryRun);

/// \brief Appends the next token to \p State and updates information		/// \brief Appends the next token to \p State and updates information
/// necessary for indentation.		/// necessary for indentation.
///		///
/// Puts the token on the current line.		/// Puts the token on the current line.
///		///
/// If \p DryRun is \c false, also creates and stores the required		/// If \p DryRun is \c false, also creates and stores the required
▲ Show 20 Lines • Show All 274 Lines • Show Last 20 Lines

lib/Format/ContinuationIndenter.cpp

Show First 20 Lines • Show All 86 Lines • ▼ Show 20 Lines	State.Stack.push_back(ParenState(FirstIndent, FirstIndent,
/NoLineBreak=/false));		/NoLineBreak=/false));
State.LineContainsContinuedForLoopSection = false;		State.LineContainsContinuedForLoopSection = false;
State.StartOfStringLiteral = 0;		State.StartOfStringLiteral = 0;
State.StartOfLineLevel = 0;		State.StartOfLineLevel = 0;
State.LowestLevelOnLine = 0;		State.LowestLevelOnLine = 0;
State.IgnoreStackForComparison = false;		State.IgnoreStackForComparison = false;

// The first token has already been indented and thus consumed.		// The first token has already been indented and thus consumed.
moveStateToNextToken(State, DryRun, /Newline=/false);		moveStateToNextToken(State, DryRun, /Newline=/false, /BreakToken/true);
return State;		return State;
}		}

bool ContinuationIndenter::canBreak(const LineState &State) {		bool ContinuationIndenter::canBreak(const LineState &State) {
const FormatToken &Current = *State.NextToken;		const FormatToken &Current = *State.NextToken;
const FormatToken &Previous = *Current.Previous;		const FormatToken &Previous = *Current.Previous;
assert(&Previous == Current.Previous);		assert(&Previous == Current.Previous);
if (!Current.CanBreakBefore &&		if (!Current.CanBreakBefore &&
▲ Show 20 Lines • Show All 182 Lines • ▼ Show 20 Lines	if (Current.is(tok::lessless) &&
((Previous.is(tok::identifier) && Previous.TokenText == "endl") \|\|		((Previous.is(tok::identifier) && Previous.TokenText == "endl") \|\|
(Previous.Tok.isLiteral() && (Previous.TokenText.endswith("\\n\"") \|\|		(Previous.Tok.isLiteral() && (Previous.TokenText.endswith("\\n\"") \|\|
Previous.TokenText == "\'\\n\'"))))		Previous.TokenText == "\'\\n\'"))))
return true;		return true;

return false;		return false;
}		}

		bool ContinuationIndenter::canSplit(LineState & State)
		{
		return !State.Stack.back().NoLineBreak &&
		!State.Stack.back().NoLineBreakInOperand;
		}

unsigned ContinuationIndenter::addTokenToState(LineState &State, bool Newline,		unsigned ContinuationIndenter::addTokenToState(LineState &State, bool Newline,
bool DryRun,		bool BreakToken, bool DryRun,
unsigned ExtraSpaces) {		unsigned ExtraSpaces) {
const FormatToken &Current = *State.NextToken;		const FormatToken &Current = *State.NextToken;

assert(!State.Stack.empty());		assert(!State.Stack.empty());
if ((Current.is(TT_ImplicitStringLiteral) &&		if ((Current.is(TT_ImplicitStringLiteral) &&
(Current.Previous->Tok.getIdentifierInfo() == nullptr \|\|		(Current.Previous->Tok.getIdentifierInfo() == nullptr \|\|
Current.Previous->Tok.getIdentifierInfo()->getPPKeywordID() ==		Current.Previous->Tok.getIdentifierInfo()->getPPKeywordID() ==
tok::pp_not_keyword))) {		tok::pp_not_keyword))) {
unsigned EndColumn =		unsigned EndColumn =
SourceMgr.getSpellingColumnNumber(Current.WhitespaceRange.getEnd());		SourceMgr.getSpellingColumnNumber(Current.WhitespaceRange.getEnd());
if (Current.LastNewlineOffset != 0) {		if (Current.LastNewlineOffset != 0) {
// If there is a newline within this token, the final column will solely		// If there is a newline within this token, the final column will solely
// determined by the current end column.		// determined by the current end column.
State.Column = EndColumn;		State.Column = EndColumn;
} else {		} else {
unsigned StartColumn =		unsigned StartColumn =
SourceMgr.getSpellingColumnNumber(Current.WhitespaceRange.getBegin());		SourceMgr.getSpellingColumnNumber(Current.WhitespaceRange.getBegin());
assert(EndColumn >= StartColumn);		assert(EndColumn >= StartColumn);
State.Column += EndColumn - StartColumn;		State.Column += EndColumn - StartColumn;
}		}
moveStateToNextToken(State, DryRun, /Newline=/false);		moveStateToNextToken(State, DryRun, /Newline=/false, /BreakToken=/true);
return 0;		return 0;
}		}

unsigned Penalty = 0;		unsigned Penalty = 0;
if (Newline)		if (Newline)
Penalty = addTokenOnNewLine(State, DryRun);		Penalty = addTokenOnNewLine(State, DryRun);
else		else
addTokenOnCurrentLine(State, DryRun, ExtraSpaces);		addTokenOnCurrentLine(State, DryRun, ExtraSpaces);

return moveStateToNextToken(State, DryRun, Newline) + Penalty;		return moveStateToNextToken(State, DryRun, Newline, BreakToken) + Penalty;
}		}

void ContinuationIndenter::addTokenOnCurrentLine(LineState &State, bool DryRun,		void ContinuationIndenter::addTokenOnCurrentLine(LineState &State, bool DryRun,
unsigned ExtraSpaces) {		unsigned ExtraSpaces) {
FormatToken &Current = *State.NextToken;		FormatToken &Current = *State.NextToken;
const FormatToken &Previous = *State.NextToken->Previous;		const FormatToken &Previous = *State.NextToken->Previous;
if (Current.is(tok::equal) &&		if (Current.is(tok::equal) &&
(State.Line->First->is(tok::kw_for) \|\| Current.NestingLevel == 0) &&		(State.Line->First->is(tok::kw_for) \|\| Current.NestingLevel == 0) &&
▲ Show 20 Lines • Show All 421 Lines • ▼ Show 20 Lines	if (State.Stack.back().Indent == State.FirstIndent && PreviousNonComment &&
PreviousNonComment->isNot(tok::r_brace))		PreviousNonComment->isNot(tok::r_brace))
// Ensure that we fall back to the continuation indent width instead of		// Ensure that we fall back to the continuation indent width instead of
// just flushing continuations left.		// just flushing continuations left.
return State.Stack.back().Indent + Style.ContinuationIndentWidth;		return State.Stack.back().Indent + Style.ContinuationIndentWidth;
return State.Stack.back().Indent;		return State.Stack.back().Indent;
}		}

unsigned ContinuationIndenter::moveStateToNextToken(LineState &State,		unsigned ContinuationIndenter::moveStateToNextToken(LineState &State,
bool DryRun, bool Newline) {		bool DryRun, bool Newline,
		bool BreakToken) {
assert(State.Stack.size());		assert(State.Stack.size());
const FormatToken &Current = *State.NextToken;		const FormatToken &Current = *State.NextToken;

if (Current.isOneOf(tok::comma, TT_BinaryOperator))		if (Current.isOneOf(tok::comma, TT_BinaryOperator))
State.Stack.back().NoLineBreakInOperand = false;		State.Stack.back().NoLineBreakInOperand = false;
if (Current.is(TT_InheritanceColon))		if (Current.is(TT_InheritanceColon))
State.Stack.back().AvoidBinPacking = true;		State.Stack.back().AvoidBinPacking = true;
if (Current.is(tok::lessless) && Current.isNot(TT_OverloadedOperator)) {		if (Current.is(tok::lessless) && Current.isNot(TT_OverloadedOperator)) {
▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines	unsigned ContinuationIndenter::moveStateToNextToken(LineState &State,
else if (!Current.isOneOf(tok::comment, tok::identifier, tok::hash) &&		else if (!Current.isOneOf(tok::comment, tok::identifier, tok::hash) &&
!Current.isStringLiteral())		!Current.isStringLiteral())
State.StartOfStringLiteral = 0;		State.StartOfStringLiteral = 0;

State.Column += Current.ColumnWidth;		State.Column += Current.ColumnWidth;
State.NextToken = State.NextToken->Next;		State.NextToken = State.NextToken->Next;
unsigned Penalty = 0;		unsigned Penalty = 0;
if (CanBreakProtrudingToken)		if (CanBreakProtrudingToken)
Penalty = breakProtrudingToken(Current, State, DryRun);		Penalty = breakProtrudingToken(Current, State, BreakToken, DryRun);
if (State.Column > getColumnLimit(State)) {		if (State.Column > getColumnLimit(State)) {
unsigned ExcessCharacters = State.Column - getColumnLimit(State);		unsigned ExcessCharacters = State.Column - getColumnLimit(State);
Penalty += Style.PenaltyExcessCharacter * ExcessCharacters;		Penalty += Style.PenaltyExcessCharacter * ExcessCharacters;
}		}

if (Current.Role)		if (Current.Role)
Current.Role->formatFromToken(State, this, DryRun);		Current.Role->formatFromToken(State, this, DryRun);
// If the previous has a special role, let it consume tokens as appropriate.		// If the previous has a special role, let it consume tokens as appropriate.
// It is necessary to start at the previous token for the only implemented		// It is necessary to start at the previous token for the only implemented
// role (comma separated list). That way, the decision whether or not to break		// role (comma separated list). That way, the decision whether or not to break
// after the "{" is already done and both options are tried and evaluated.		// after the "{" is already done and both options are tried and evaluated.
// FIXME: This is ugly, find a better way.		// FIXME: This is ugly, find a better way.
if (Previous && Previous->Role)		if (Previous && Previous->Role)
Penalty += Previous->Role->formatAfterToken(State, this, DryRun);		Penalty += Previous->Role->formatAfterToken(State, this, BreakToken, DryRun);

return Penalty;		return Penalty;
}		}

void ContinuationIndenter::moveStatePastFakeLParens(LineState &State,		void ContinuationIndenter::moveStatePastFakeLParens(LineState &State,
bool Newline) {		bool Newline) {
const FormatToken &Current = *State.NextToken;		const FormatToken &Current = *State.NextToken;
const FormatToken *Previous = Current.getPreviousNonComment();		const FormatToken *Previous = Current.getPreviousNonComment();
▲ Show 20 Lines • Show All 227 Lines • ▼ Show 20 Lines	unsigned ContinuationIndenter::addMultilineToken(const FormatToken &Current,

if (ColumnsUsed > getColumnLimit(State))		if (ColumnsUsed > getColumnLimit(State))
return Style.PenaltyExcessCharacter * (ColumnsUsed - getColumnLimit(State));		return Style.PenaltyExcessCharacter * (ColumnsUsed - getColumnLimit(State));
return 0;		return 0;
}		}

unsigned ContinuationIndenter::breakProtrudingToken(const FormatToken &Current,		unsigned ContinuationIndenter::breakProtrudingToken(const FormatToken &Current,
LineState &State,		LineState &State,
bool DryRun) {		bool BreakToken, bool DryRun) {
// Don't break multi-line tokens other than block comments. Instead, just		// Don't break multi-line tokens other than block comments. Instead, just
// update the state.		// update the state.
if (Current.isNot(TT_BlockComment) && Current.IsMultiline)		if (Current.isNot(TT_BlockComment) && Current.IsMultiline)
return addMultilineToken(Current, State);		return addMultilineToken(Current, State);

// Don't break implicit string literals or import statements.		// Don't break implicit string literals or import statements.
if (Current.is(TT_ImplicitStringLiteral) \|\|		if (Current.is(TT_ImplicitStringLiteral) \|\|
State.Line->Type == LT_ImportStatement)		State.Line->Type == LT_ImportStatement)
▲ Show 20 Lines • Show All 116 Lines • ▼ Show 20 Lines	while (RemainingTokenColumns > RemainingSpace) {
if (RemainingTokenColumnsAfterCompression <= RemainingSpace) {		if (RemainingTokenColumnsAfterCompression <= RemainingSpace) {
RemainingTokenColumns = RemainingTokenColumnsAfterCompression;		RemainingTokenColumns = RemainingTokenColumnsAfterCompression;
ReflowInProgress = true;		ReflowInProgress = true;
if (!DryRun)		if (!DryRun)
Token->compressWhitespace(LineIndex, TailOffset, Split, Whitespaces);		Token->compressWhitespace(LineIndex, TailOffset, Split, Whitespaces);
break;		break;
}		}

		if (!BreakToken) {
		Penalty += Style.PenaltyExcessCharacter *
		(RemainingTokenColumns - RemainingSpace);
		break;
		}

unsigned NewRemainingTokenColumns = Token->getLineLengthAfterSplit(		unsigned NewRemainingTokenColumns = Token->getLineLengthAfterSplit(
LineIndex, TailOffset + Split.first + Split.second, StringRef::npos);		LineIndex, TailOffset + Split.first + Split.second, StringRef::npos);

// When breaking before a tab character, it may be moved by a few columns,		// When breaking before a tab character, it may be moved by a few columns,
// but will still be expanded to the next tab stop, so we don't save any		// but will still be expanded to the next tab stop, so we don't save any
// columns.		// columns.
if (NewRemainingTokenColumns == RemainingTokenColumns)		if (NewRemainingTokenColumns == RemainingTokenColumns)
break;		break;
Show All 31 Lines	if (BreakInserted) {
State.Stack.back().LastSpace = StartColumn;		State.Stack.back().LastSpace = StartColumn;
}		}

Token->updateNextToken(State);		Token->updateNextToken(State);

return Penalty;		return Penalty;
}		}

unsigned ContinuationIndenter::getColumnLimit(const LineState &State) const {		unsigned ContinuationIndenter::getColumnLimit(const LineState &State) const {
		djasperUnsubmitted Done Reply Inline Actions Can you create a patch that doesn't move the code around so much? Seems unnecessary and hard to review. djasper: Can you create a patch that doesn't move the code around so much? Seems unnecessary and hard to…
		TypzAuthorUnsubmitted Done Reply Inline Actions Moving the code around is an unfortunate requirement for this patch: we must actually do the "reflowing" twice, to find out which solution (actually reflowing or not) gives the least penalty. Therefore the function that reflowing must be moved out of `breakProtrudingToken()`... All I can do is try moving the new function after `breakProtrudingToken()`, maybe Phabricator will better show the change. Typz: Moving the code around is an unfortunate requirement for this patch: we must actually do the…
// In preprocessor directives reserve two chars for trailing " \"		// In preprocessor directives reserve two chars for trailing " \"
return Style.ColumnLimit - (State.Line->InPPDirective ? 2 : 0);		return Style.ColumnLimit - (State.Line->InPPDirective ? 2 : 0);
}		}

bool ContinuationIndenter::nextIsMultilineString(const LineState &State) {		bool ContinuationIndenter::nextIsMultilineString(const LineState &State) {
const FormatToken &Current = *State.NextToken;		const FormatToken &Current = *State.NextToken;
if (!Current.isStringLiteral() \|\| Current.is(TT_ImplicitStringLiteral))		if (!Current.isStringLiteral() \|\| Current.is(TT_ImplicitStringLiteral))
return false;		return false;
Show All 10 Lines	bool ContinuationIndenter::nextIsMultilineString(const LineState &State) {
if (Style.ColumnLimit != 0 &&		if (Style.ColumnLimit != 0 &&
State.Column + Current.ColumnWidth + Current.UnbreakableTailLength >		State.Column + Current.ColumnWidth + Current.UnbreakableTailLength >
Style.ColumnLimit)		Style.ColumnLimit)
return true; // String will be split.		return true; // String will be split.
return false;		return false;
}		}

} // namespace format		} // namespace format
} // namespace clang		} // namespace clang
		djasperUnsubmitted Done Reply Inline Actions I believe that this is incorrect. reflowProtrudingToken counts the length of the unbreakable tail and here you just remove the penalty of the token itself. E.g. in: string s = f("aaa"); the ");" is the unbreakable tail of the stringl djasper: I believe that this is incorrect. reflowProtrudingToken counts the length of the unbreakable…
		TypzAuthorUnsubmitted Done Reply Inline Actions This behavior has not changed : before the commit, the last token was not included in the penalty [c.f. `if` at line 1338 in original code]. To make the comparison significative, the last token's penalty is included in the penalty returned by `reflowProtrudingToken()` (hence the removal of the test at line 1260); and it is removed before returning from this function, to keep the same behavior as before. Typz: This behavior has not changed : before the commit, the last token was not included in the…

lib/Format/FormatToken.h

Show First 20 Lines • Show All 528 Lines • ▼ Show 20 Lines	virtual unsigned formatFromToken(LineState &State,
bool DryRun) {		bool DryRun) {
return 0;		return 0;
}		}

/// \brief Same as \c formatFromToken, but assumes that the first token has		/// \brief Same as \c formatFromToken, but assumes that the first token has
/// already been set thereby deciding on the first line break.		/// already been set thereby deciding on the first line break.
virtual unsigned formatAfterToken(LineState &State,		virtual unsigned formatAfterToken(LineState &State,
ContinuationIndenter *Indenter,		ContinuationIndenter *Indenter,
bool DryRun) {		bool BreakToken, bool DryRun) {
return 0;		return 0;
}		}

/// \brief Notifies the \c Role that a comma was found.		/// \brief Notifies the \c Role that a comma was found.
virtual void CommaFound(const FormatToken *Token) {}		virtual void CommaFound(const FormatToken *Token) {}

protected:		protected:
const FormatStyle &Style;		const FormatStyle &Style;
};		};

class CommaSeparatedList : public TokenRole {		class CommaSeparatedList : public TokenRole {
public:		public:
CommaSeparatedList(const FormatStyle &Style)		CommaSeparatedList(const FormatStyle &Style)
: TokenRole(Style), HasNestedBracedList(false) {}		: TokenRole(Style), HasNestedBracedList(false) {}

void precomputeFormattingInfos(const FormatToken *Token) override;		void precomputeFormattingInfos(const FormatToken *Token) override;

unsigned formatAfterToken(LineState &State, ContinuationIndenter *Indenter,		unsigned formatAfterToken(LineState &State, ContinuationIndenter *Indenter,
bool DryRun) override;		bool BreakToken, bool DryRun) override;

unsigned formatFromToken(LineState &State, ContinuationIndenter *Indenter,		unsigned formatFromToken(LineState &State, ContinuationIndenter *Indenter,
bool DryRun) override;		bool DryRun) override;

/// \brief Adds \p Token as the next comma to the \c CommaSeparated list.		/// \brief Adds \p Token as the next comma to the \c CommaSeparated list.
void CommaFound(const FormatToken *Token) override {		void CommaFound(const FormatToken *Token) override {
Commas.push_back(Token);		Commas.push_back(Token);
}		}
▲ Show 20 Lines • Show All 155 Lines • Show Last 20 Lines

lib/Format/FormatToken.cpp

Show First 20 Lines • Show All 67 Lines • ▼ Show 20 Lines
}		}

TokenRole::~TokenRole() {}		TokenRole::~TokenRole() {}

void TokenRole::precomputeFormattingInfos(const FormatToken *Token) {}		void TokenRole::precomputeFormattingInfos(const FormatToken *Token) {}

unsigned CommaSeparatedList::formatAfterToken(LineState &State,		unsigned CommaSeparatedList::formatAfterToken(LineState &State,
ContinuationIndenter *Indenter,		ContinuationIndenter *Indenter,
bool DryRun) {		bool BreakToken, bool DryRun) {
if (State.NextToken == nullptr \|\| !State.NextToken->Previous)		if (State.NextToken == nullptr \|\| !State.NextToken->Previous)
return 0;		return 0;

if (Formats.size() == 1)		if (Formats.size() == 1)
return 0; // Handled by formatFromToken		return 0; // Handled by formatFromToken

// Ensure that we start on the opening brace.		// Ensure that we start on the opening brace.
const FormatToken *LBrace =		const FormatToken *LBrace =
Show All 35 Lines	while (State.NextToken != LBrace->MatchingParen) {
}		}

if (Column == Format->Columns \|\| State.NextToken->MustBreakBefore) {		if (Column == Format->Columns \|\| State.NextToken->MustBreakBefore) {
Column = 0;		Column = 0;
NewLine = true;		NewLine = true;
}		}

// Place token using the continuation indenter and store the penalty.		// Place token using the continuation indenter and store the penalty.
Penalty += Indenter->addTokenToState(State, NewLine, DryRun, ExtraSpaces);		Penalty += Indenter->addTokenToState(State, NewLine, BreakToken, DryRun,
		ExtraSpaces);
}		}
return Penalty;		return Penalty;
}		}

unsigned CommaSeparatedList::formatFromToken(LineState &State,		unsigned CommaSeparatedList::formatFromToken(LineState &State,
ContinuationIndenter *Indenter,		ContinuationIndenter *Indenter,
bool DryRun) {		bool DryRun) {
// Formatting with 1 Column isn't really a column layout, so we don't need the		// Formatting with 1 Column isn't really a column layout, so we don't need the
▲ Show 20 Lines • Show All 170 Lines • Show Last 20 Lines

lib/Format/UnwrappedLineFormatter.cpp

Show First 20 Lines • Show All 584 Lines • ▼ Show 20 Lines	unsigned formatLine(const AnnotatedLine &Line, unsigned FirstIndent,
LineState State =		LineState State =
Indenter->getInitialState(FirstIndent, &Line, /DryRun=/false);		Indenter->getInitialState(FirstIndent, &Line, /DryRun=/false);
while (State.NextToken) {		while (State.NextToken) {
bool Newline =		bool Newline =
Indenter->mustBreak(State) \|\|		Indenter->mustBreak(State) \|\|
(Indenter->canBreak(State) && State.NextToken->NewlinesBefore > 0);		(Indenter->canBreak(State) && State.NextToken->NewlinesBefore > 0);
unsigned Penalty = 0;		unsigned Penalty = 0;
formatChildren(State, Newline, /DryRun=/false, Penalty);		formatChildren(State, Newline, /DryRun=/false, Penalty);
Indenter->addTokenToState(State, Newline, /DryRun=/false);		Indenter->addTokenToState(State, Newline, /BreakToken=/true,
		/DryRun=/false);
}		}
return 0;		return 0;
}		}
};		};

/// \brief Formatter that puts all tokens into a single line without breaks.		/// \brief Formatter that puts all tokens into a single line without breaks.
class NoLineBreakFormatter : public LineFormatter {		class NoLineBreakFormatter : public LineFormatter {
public:		public:
NoLineBreakFormatter(ContinuationIndenter *Indenter,		NoLineBreakFormatter(ContinuationIndenter *Indenter,
WhitespaceManager *Whitespaces, const FormatStyle &Style,		WhitespaceManager *Whitespaces, const FormatStyle &Style,
UnwrappedLineFormatter *BlockFormatter)		UnwrappedLineFormatter *BlockFormatter)
: LineFormatter(Indenter, Whitespaces, Style, BlockFormatter) {}		: LineFormatter(Indenter, Whitespaces, Style, BlockFormatter) {}

/// \brief Puts all tokens into a single line.		/// \brief Puts all tokens into a single line.
unsigned formatLine(const AnnotatedLine &Line, unsigned FirstIndent,		unsigned formatLine(const AnnotatedLine &Line, unsigned FirstIndent,
bool DryRun) override {		bool DryRun) override {
unsigned Penalty = 0;		unsigned Penalty = 0;
LineState State = Indenter->getInitialState(FirstIndent, &Line, DryRun);		LineState State = Indenter->getInitialState(FirstIndent, &Line, DryRun);
while (State.NextToken) {		while (State.NextToken) {
formatChildren(State, /Newline=/false, DryRun, Penalty);		formatChildren(State, /Newline=/false, DryRun, Penalty);
Indenter->addTokenToState(State, /Newline=/false, DryRun);		Indenter->addTokenToState(State, /Newline=/false, /BreakToken=/true,
		DryRun);
}		}
return Penalty;		return Penalty;
}		}
};		};

/// \brief Finds the best way to break lines.		/// \brief Finds the best way to break lines.
class OptimizingLineFormatter : public LineFormatter {		class OptimizingLineFormatter : public LineFormatter {
public:		public:
Show All 30 Lines	private:
/// In case of equal penalties, we want to prefer states that were inserted		/// In case of equal penalties, we want to prefer states that were inserted
/// first. During state generation we make sure that we insert states first		/// first. During state generation we make sure that we insert states first
/// that break the line as late as possible.		/// that break the line as late as possible.
typedef std::pair<unsigned, unsigned> OrderedPenalty;		typedef std::pair<unsigned, unsigned> OrderedPenalty;

/// \brief An edge in the solution space from \c Previous->State to \c State,		/// \brief An edge in the solution space from \c Previous->State to \c State,
/// inserting a newline dependent on the \c NewLine.		/// inserting a newline dependent on the \c NewLine.
struct StateNode {		struct StateNode {
StateNode(const LineState &State, bool NewLine, StateNode *Previous)		StateNode(const LineState &State, bool NewLine, bool BreakToken, StateNode *Previous)
: State(State), NewLine(NewLine), Previous(Previous) {}		: State(State), NewLine(NewLine), BreakToken(BreakToken), Previous(Previous) {}
LineState State;		LineState State;
bool NewLine;		bool NewLine;
		bool BreakToken;
StateNode *Previous;		StateNode *Previous;
};		};

/// \brief An item in the prioritized BFS search queue. The \c StateNode's		/// \brief An item in the prioritized BFS search queue. The \c StateNode's
/// \c State has the given \c OrderedPenalty.		/// \c State has the given \c OrderedPenalty.
typedef std::pair<OrderedPenalty, StateNode *> QueueItem;		typedef std::pair<OrderedPenalty, StateNode *> QueueItem;

/// \brief The BFS queue type.		/// \brief The BFS queue type.
Show All 13 Lines	unsigned analyzeSolutionSpace(LineState &InitialState, bool DryRun) {

// Increasing count of \c StateNode items we have created. This is used to		// Increasing count of \c StateNode items we have created. This is used to
// create a deterministic order independent of the container.		// create a deterministic order independent of the container.
unsigned Count = 0;		unsigned Count = 0;
QueueType Queue;		QueueType Queue;

// Insert start element into queue.		// Insert start element into queue.
StateNode *Node =		StateNode *Node =
new (Allocator.Allocate()) StateNode(InitialState, false, nullptr);		new (Allocator.Allocate()) StateNode(InitialState, false, true, nullptr);
Queue.push(QueueItem(OrderedPenalty(0, Count), Node));		Queue.push(QueueItem(OrderedPenalty(0, Count), Node));
++Count;		++Count;

unsigned Penalty = 0;		unsigned Penalty = 0;

// While not empty, take first element and follow edges.		// While not empty, take first element and follow edges.
while (!Queue.empty()) {		while (!Queue.empty()) {
Penalty = Queue.top().first.first;		Penalty = Queue.top().first.first;
Show All 9 Lines	while (!Queue.empty()) {
if (Count > 50000)		if (Count > 50000)
Node->State.IgnoreStackForComparison = true;		Node->State.IgnoreStackForComparison = true;

if (!Seen.insert(&Node->State).second)		if (!Seen.insert(&Node->State).second)
// State already examined with lower penalty.		// State already examined with lower penalty.
continue;		continue;

FormatDecision LastFormat = Node->State.NextToken->Decision;		FormatDecision LastFormat = Node->State.NextToken->Decision;
if (LastFormat == FD_Unformatted \|\| LastFormat == FD_Continue)		if (LastFormat == FD_Unformatted \|\| LastFormat == FD_Continue)
		djasperUnsubmitted Not Done Reply Inline Actions This is almost certainly a no-go. It turns the algorithm from exploring a state space with a branching factor of 2 into something with a branching factor of 4. Even assuming we can perfectly reuse results from computations (or in other words hit the same state through different path using Dijkstra's algorithm), the additional bool in StateNode doubles the number of states we are going to visit. You don't even seem to make a distinction of whether the token under investigation can possibly be split. You do look at NoLineBreak(InOperand), but even if those are false, the vast majority of tokens will never be split. However, even if you narrow that down, I am not convinced that fixing this inconsistency around the formatting penalties is worth the runtime performance hit. I am generally fine with changing the behavior in the way you are proposing, but only if it has zero (negative) effect on performance. djasper: This is almost certainly a no-go. It turns the algorithm from exploring a state space with a…
		TypzAuthorUnsubmitted Not Done Reply Inline Actions Making the distinction to skip some path is done at the beginning of addNextStateToQueue(), though indeed not very precisely at the moment. I can improve the check (i.e. by copying all the early return conditions from the beginning of `ContinuationIndenter::breakProtrudingToken()`, which will greatly reduce the number of possible state, but stilll have a small impact. The alternative would be modify ContinuationIndenter::breakProtrudingToken(), so that it computes the penalty for reflowing as well as not-reflowing, and updates the LineState with the best solution. It would certainly have an impact on performance, but would not increase the size of the state space. Another issue with that approach is that the information is not passed from the DRYRUN phase to the actual rewriting phase. I could store this information in the LineState, to re-use it in the reconstruction phase, but this seems really wrong (and would work only in the exact case of the optimizing line formatter). What do you think? Should I keep the same structure but reduce the number of explored state; or move the decision into ContinuationIndenter, possibly storing the result in LineState? Typz: Making the distinction to skip some path is done at the beginning of addNextStateToQueue()…
		TypzAuthorUnsubmitted Not Done Reply Inline Actions Actually, it seems it cannot be done inside ContinuationIndenter::breakProtrudingToken(), since this causes the whitespace manager to be called twice for the same token. Typz: Actually, it seems it cannot be done inside ContinuationIndenter::breakProtrudingToken(), since…
addNextStateToQueue(Penalty, Node, /NewLine=/false, &Count, &Queue);		addNextStateToQueue(Penalty, Node, /NewLine=/false,
		/BreakToken=/true, &Count, &Queue);
if (LastFormat == FD_Unformatted \|\| LastFormat == FD_Break)		if (LastFormat == FD_Unformatted \|\| LastFormat == FD_Break)
addNextStateToQueue(Penalty, Node, /NewLine=/true, &Count, &Queue);		addNextStateToQueue(Penalty, Node, /NewLine=/true,
		/BreakToken=/true, &Count, &Queue);
		if (LastFormat == FD_Unformatted \|\| LastFormat == FD_Continue)
		addNextStateToQueue(Penalty, Node, /NewLine=/false,
		/BreakToken=/false, &Count, &Queue);
		if (LastFormat == FD_Unformatted \|\| LastFormat == FD_Break)
		addNextStateToQueue(Penalty, Node, /NewLine=/true,
		/BreakToken=/false, &Count, &Queue);
}		}

if (Queue.empty()) {		if (Queue.empty()) {
// We were unable to find a solution, do nothing.		// We were unable to find a solution, do nothing.
// FIXME: Add diagnostic?		// FIXME: Add diagnostic?
DEBUG(llvm::dbgs() << "Could not find a solution.\n");		DEBUG(llvm::dbgs() << "Could not find a solution.\n");
return 0;		return 0;
}		}

// Reconstruct the solution.		// Reconstruct the solution.
if (!DryRun)		if (!DryRun)
reconstructPath(InitialState, Queue.top().second);		reconstructPath(InitialState, Queue.top().second);

DEBUG(llvm::dbgs() << "Total number of analyzed states: " << Count << "\n");		DEBUG(llvm::dbgs() << "Total number of analyzed states: " << Count << "\n");
DEBUG(llvm::dbgs() << "---\n");		DEBUG(llvm::dbgs() << "---\n");

return Penalty;		return Penalty;
}		}

/// \brief Add the following state to the analysis queue \c Queue.		/// \brief Add the following state to the analysis queue \c Queue.
///		///
/// Assume the current state is \p PreviousNode and has been reached with a		/// Assume the current state is \p PreviousNode and has been reached with a
/// penalty of \p Penalty. Insert a line break if \p NewLine is \c true.		/// penalty of \p Penalty. Insert a line break if \p NewLine is \c true.
void addNextStateToQueue(unsigned Penalty, StateNode *PreviousNode,		void addNextStateToQueue(unsigned Penalty, StateNode *PreviousNode,
bool NewLine, unsigned Count, QueueType Queue) {		bool NewLine, bool BreakToken, unsigned Count, QueueType Queue) {
if (NewLine && !Indenter->canBreak(PreviousNode->State))		if (NewLine && !Indenter->canBreak(PreviousNode->State))
return;		return;
if (!NewLine && Indenter->mustBreak(PreviousNode->State))		if (!NewLine && Indenter->mustBreak(PreviousNode->State))
return;		return;
		if (!BreakToken && !Indenter->canSplit(PreviousNode->State))
		djasperUnsubmitted Done Reply Inline Actions It's not clear to me why you'd return here. djasper: It's not clear to me why you'd return here.
		TypzAuthorUnsubmitted Done Reply Inline Actions This is needed to avoid trying to break when this is not supported: no point doing twice the same thing. Typz: This is needed to avoid trying to break when this is not supported: no point doing twice the…
		return;

StateNode *Node = new (Allocator.Allocate())		StateNode *Node = new (Allocator.Allocate())
StateNode(PreviousNode->State, NewLine, PreviousNode);		StateNode(PreviousNode->State, NewLine, BreakToken, PreviousNode);
if (!formatChildren(Node->State, NewLine, /DryRun=/true, Penalty))		if (!formatChildren(Node->State, NewLine, /DryRun=/true, Penalty))
return;		return;

Penalty += Indenter->addTokenToState(Node->State, NewLine, true);		Penalty += Indenter->addTokenToState(Node->State, NewLine, BreakToken, true);

Queue->push(QueueItem(OrderedPenalty(Penalty, *Count), Node));		Queue->push(QueueItem(OrderedPenalty(Penalty, *Count), Node));
++(*Count);		++(*Count);
}		}

/// \brief Applies the best formatting by reconstructing the path in the		/// \brief Applies the best formatting by reconstructing the path in the
/// solution space that leads to \c Best.		/// solution space that leads to \c Best.
void reconstructPath(LineState &State, StateNode *Best) {		void reconstructPath(LineState &State, StateNode *Best) {
std::deque<StateNode *> Path;		std::deque<StateNode *> Path;
// We do not need a break before the initial token.		// We do not need a break before the initial token.
while (Best->Previous) {		while (Best->Previous) {
Path.push_front(Best);		Path.push_front(Best);
Best = Best->Previous;		Best = Best->Previous;
}		}
for (std::deque<StateNode *>::iterator I = Path.begin(), E = Path.end();		for (std::deque<StateNode *>::iterator I = Path.begin(), E = Path.end();
I != E; ++I) {		I != E; ++I) {
unsigned Penalty = 0;		unsigned Penalty = 0;
formatChildren(State, (I)->NewLine, /DryRun=*/false, Penalty);		formatChildren(State, (I)->NewLine, /DryRun=*/false, Penalty);
Penalty += Indenter->addTokenToState(State, (*I)->NewLine, false);		Penalty += Indenter->addTokenToState(State, (I)->NewLine, (I)->BreakToken, false);

DEBUG({		DEBUG({
printLineState((*I)->Previous->State);		printLineState((*I)->Previous->State);
if ((*I)->NewLine) {		if ((*I)->NewLine) {
llvm::dbgs() << "Penalty for placing "		llvm::dbgs() << "Penalty for placing "
<< (*I)->Previous->State.NextToken->Tok.getName() << ": "		<< (*I)->Previous->State.NextToken->Tok.getName() << ": "
<< Penalty << "\n";		<< Penalty << "\n";
}		}
▲ Show 20 Lines • Show All 180 Lines • Show Last 20 Lines

unittests/Format/FormatTest.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

	Show First 20 Lines • Show All 8,536 Lines • ▼ Show 20 Lines
	}			}

	TEST_F(FormatTest, UnderstandPragmaOption) {			TEST_F(FormatTest, UnderstandPragmaOption) {
	verifyFormat("#pragma option -C -A");			verifyFormat("#pragma option -C -A");

	EXPECT_EQ("#pragma option -C -A", format("#pragma option -C -A"));			EXPECT_EQ("#pragma option -C -A", format("#pragma option -C -A"));
	}			}

				TEST_F(FormatTest, OptimizeBreakPenaltyVsExcess) {
				FormatStyle Style = getLLVMStyle();
				Style.ColumnLimit = 20;

				verifyFormat("int a; // the\n"
				" // comment", Style);
				verifyFormat("int a; /* first line\n"
				" * second\n"
				" * line third\n"
				" * line\n"
				" */", Style);

				Style.PenaltyExcessCharacter = 30;
				verifyFormat("int a; // the comment", Style);
				EXPECT_EQ("int a; // the\n"
				" // comment aa",
				format("int a; // the comment aa", Style));
				verifyFormat("int a; /* first line\n"
				" * second line\n"
				" * third line\n"
				" */", Style);
				EXPECT_EQ("int a; /* first line\n"
				" * second\n"
				" * line third\n"
				" * line\n"
				" */",
				format("int a; /* first line second line third line */", Style));
				TypzAuthorUnsubmitted Done Reply Inline Actions This is not working as expected, format return: int a; /* first line * second * line third * line / Any clue how things could go so wrong? Typz:* This is not working as expected, format return: int a; /* first line * second *…
				TypzAuthorUnsubmitted Done Reply Inline Actions This was a bug in tests: should not use test::messUp() with multi-line comments. Typz: This was a bug in tests: should not use test::messUp() with multi-line comments.
				}

	#define EXPECT_ALL_STYLES_EQUAL(Styles) \			#define EXPECT_ALL_STYLES_EQUAL(Styles) \
	for (size_t i = 1; i < Styles.size(); ++i) \			for (size_t i = 1; i < Styles.size(); ++i) \
	EXPECT_EQ(Styles[0], Styles[i]) << "Style #" << i << " of " << Styles.size() \			EXPECT_EQ(Styles[0], Styles[i]) << "Style #" << i << " of " << Styles.size() \
	<< " differs from Style #0"			<< " differs from Style #0"

	TEST_F(FormatTest, GetsPredefinedStyleByName) {			TEST_F(FormatTest, GetsPredefinedStyleByName) {
	SmallVector<FormatStyle, 3> Styles;			SmallVector<FormatStyle, 3> Styles;
	Styles.resize(3);			Styles.resize(3);
	▲ Show 20 Lines • Show All 1,669 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

clang-format: consider not splitting tokens in optimizationAbandonedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 100379

lib/Format/ContinuationIndenter.h

lib/Format/ContinuationIndenter.cpp

lib/Format/FormatToken.h

lib/Format/FormatToken.cpp

lib/Format/UnwrappedLineFormatter.cpp

unittests/Format/FormatTest.cpp

clang-format: consider not splitting tokens in optimization
AbandonedPublic