This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Ignore UnbreakableTailLength sometimes during breaking
ClosedPublic

Authored by krasimir on Jan 22 2018, 7:45 AM.

Details

Summary

This patch fixes an issue where the UnbreakableTailLength would be counted towards
the length of a token during breaking, even though we can break after the token.

For example, this proto text with column limit 20

# ColumnLimit: 20  V
foo: {
  bar: {
    bazoo: "aaaaaaa"
  }
}

was broken:

# ColumnLimit: 20  V
foo: {
  bar: {
    bazoo:
        "aaaaaaa"
  }
}

because the 2 closing } were counted towards the string literal's UnbreakableTailLength.

Diff Detail

Event Timeline

krasimir created this revision.Jan 22 2018, 7:45 AM
krasimir edited the summary of this revision. (Show Details)Jan 22 2018, 7:47 AM
krasimir added a reviewer: djasper.
djasper added inline comments.Jan 22 2018, 3:48 PM
lib/Format/ContinuationIndenter.cpp
1579

I think a comment might help here. Specifically, it should mention that this is required for the special case where there is an unbreakable tail only if certain other formatting decisions have been taken. The UnbreakableTailLength is an overapproximation in that case and we need to be correct here.

Thinking about this some more, there might actually be cases where this is still not correct as the unbreakable tail is neither 0 nor the precomputed value.

E.g. can we construct a case where there is a trailing comma in a braced list? Maybe this:

vector<string> x = {"aaaaaa",};

In this case, I think the comma will always be part of the string literal, but the "};" only get part of the tail if we don't wrap after the "{".

krasimir updated this revision to Diff 131012.Jan 23 2018, 2:19 AM
  • Address review comments
krasimir marked an inline comment as done.Jan 23 2018, 2:19 AM
krasimir added inline comments.
lib/Format/ContinuationIndenter.cpp
1579

In the vector<string> x = {"aaaaa",}; only the comma is part of the UnbreakableTailLength of the string:

M=1 C=1 T=Unknown S=0 B=0 BK=0 P=59 Name=string_literal L=100 PPK=2 FakeLParens=1/ FakeRParens=0 Text='"aaaaa"'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=41 Name=comma L=101 PPK=2 FakeLParens= FakeRParens=1 Text=','
M=1 C=1 T=Unknown S=1 B=0 BK=0 P=41 Name=r_brace L=181 PPK=2 FakeLParens= FakeRParens=1 Text='}'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=23 Name=semi L=182 PPK=2 FakeLParens= FakeRParens=0 Text=';'

breakProtrudingToken: CanBreak=0, UnbreakableTailLength=1
krasimir marked an inline comment as done.Jan 23 2018, 2:53 AM
djasper added inline comments.Jan 23 2018, 2:58 AM
lib/Format/ContinuationIndenter.cpp
1579

Ok. The comma works around this by forcing a break before the "}". But what about:

vector<string> x = {"abc" /*comment*/};

The comment should be an unbreakable tail.

krasimir added inline comments.Jan 23 2018, 3:08 AM
lib/Format/ContinuationIndenter.cpp
1579

As you discovered, the comment-in-tail case is special-cased in the unbreakable tail computation, so this is not an issue.

djasper accepted this revision.Jan 23 2018, 3:10 AM

Happy to go forward with this. I think we might also wanna investigate whether entirely removing UnbreakableTailLength would be beneficial. I think we implemented it as an optimization, but I can actually imagine it saving much. Plus the code would be simpler and we would conserve some memory.

This revision is now accepted and ready to land.Jan 23 2018, 3:10 AM
krasimir updated this revision to Diff 131023.Jan 23 2018, 3:10 AM
krasimir marked 3 inline comments as done.
  • Update comment
This revision was automatically updated to reflect the committed changes.