This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Do not merge very long C# automatic properties
AbandonedPublic

Authored by jbcoe on Feb 26 2020, 9:29 AM.

Details

Summary

Wrap default value assignment onto the next line in cases where the line limit would be exceeded.

MyVeryLongTypeName MyVeryLongVariableName { internal get; internal set; }
  = new MyVeryLongTypeName();

This is a best-effort heuristic and covers some but not all cases where line length is exceeded.

Diff Detail

Event Timeline

jbcoe created this revision.Feb 26 2020, 9:29 AM
jbcoe retitled this revision from clang-format] Do not merge very long C# autoamtic properties to clang-format] Do not merge very long C# automatic properties.
jbcoe edited the summary of this revision. (Show Details)
jbcoe edited the summary of this revision. (Show Details)
jbcoe edited the summary of this revision. (Show Details)
Eugene.Zelenko set the repository for this revision to rG LLVM Github Monorepo.
Eugene.Zelenko edited subscribers, added: cfe-commits; removed: MyDeveloperDay.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 26 2020, 9:57 AM
Eugene.Zelenko retitled this revision from clang-format] Do not merge very long C# automatic properties to [clang-format] Do not merge very long C# automatic properties.Feb 26 2020, 9:58 AM
krasimir added inline comments.Feb 28 2020, 4:32 AM
clang/lib/Format/UnwrappedLineFormatter.cpp
489

Why not always (not just when Limit is broken) merge the whole automatic property declaration into 1 line?
Syntactically, this whole thing is a unit:
MyVeryLongTypeName Name { get; set } = new MyVeryLongTypeName();
My C# is rusty, what are the things after the = that are syntactically allowed in general?

jbcoe marked an inline comment as done.Feb 28 2020, 4:53 AM
jbcoe added inline comments.
clang/lib/Format/UnwrappedLineFormatter.cpp
489

I think any expression is valid.

jbcoe planned changes to this revision.Mar 2 2020, 11:10 AM

I will try to handle this in unwrapped line parser.

Here's some empirical ideas about the approach: In clang-format, braces can be handled in two quite distinct ways, controlled by BraceBlockKind: BK_Block and BK_BracedInit.
BK_Block is for braces that open blocks that usually are at one level deeper and consist of a sequence of statements and other constructs.
BK_BracedInit is for initializer lists, dictionaries and other similar syntactics that are somewhat more appropriate to put together into a line.
The level of granularity of detailed formatting in clang-format is an unwrapped line, which is conceptually a sequence of tokens that "make sense together as a single line" (without going much into style details and ignoring column limits). Separate statements are for example put on separate unwrapped lines. The formatting information flowing between unwrapped lines includes just higher-level data such as the current nesting depth.
The brace detection is handled primarily in UnwrappedLineParser::calculateBraceTypes, which happens quite early during the parsing of the input sequence.

If an opening brace is marked BK_Block there, later the stuff between it and the matching closing brace is parsed as a block: multiple "statements" are put on their own unrwapped lines, inside the block.
If the brace is marked BK_BracedInit, the staff following it is parsed more like dictionary-struct-array-literal stuff, and importantly is kept on the same unwrapped line as the surrounding code (as a braced list may occur as a subexpression of a larger expression, and the formatting of the larger expression may depend non-trivially by the formatting of the braced list).

For example, consider the following pseudo-C-family fragment:

% cat test.cc                                          

int f() {
  block_example {
    get;
    set;
  };

  int init_list_example({1, 2, {more}}, other);
}

If we examine the parsed unwrapped lines (clang-format -debug), they look like:

Line(0, FSC=0): int[T=81, OC=0] identifier[T=81, OC=4] l_paren[T=81, OC=5] r_paren[T=81, OC=6] l_brace[T=21, OC=8] 
Line(1, FSC=0): identifier[T=81, OC=2] l_brace[T=21, OC=16] 
Line(2, FSC=0): identifier[T=81, OC=4] semi[T=81, OC=7] 
Line(2, FSC=0): identifier[T=81, OC=4] semi[T=81, OC=7] 
Line(1, FSC=0): r_brace[T=81, OC=2] semi[T=81, OC=3] 
Line(1, FSC=0): int[T=81, OC=2] identifier[T=81, OC=6] l_paren[T=81, OC=23] l_brace[T=81, OC=24] numeric_constant[T=81, OC=25] comma[T=81, OC=26] numeric_constant[T=81, OC=28] comma[T=81, OC=29] l_brace[T=81, OC=31] identifier[T=81, OC=32] r_brace[T=81, OC=36] r_brace[T=81, OC=37] comma[T=81, OC=38] identifier[T=81, OC=40] r_paren[T=81, OC=45] semi[T=81, OC=46] 
Line(0, FSC=0): r_brace[T=81, OC=0] 
Line(0, FSC=0): eof[T=81, OC=0]

The block-like thing is put on separate lines at level 2; the braced-list-like thing is kept on the same unwrapped line.

Currently, C# auto property get/set "blocks" are parsed as BK_Block, hence the need to later merge lines using a heuristic.
If you could instead mark those as BK_BracedInit in calculateBraceTypes, that will have the effect of keeping them inline with the surrounding code and might produce good formatting of the whole line with a few tweaks.
There's another complication: syntactically in C++, a semicolon is very special. For example, C/C++ requires a semicolon after class declarations. There's a bunch of places in clang-formatthat use the presence of semicolons to determine "end of statement/end of line". So the semicolons in { get; set; } might cause a bit of trouble and throw the parser off a bit. Fortunately, clang-format already has some code that deals with similar complications for javascript, where stuff like X<{a: string; b: number}> is correctly handled as a braced list, even in the presence of semicolons inside. You can look at how this is handled for inspiration (I'm not 100% sure these are the only places in code that contribute to the formatting of these constructs):

I hope this is helpful, but please take it with a grain of salt as I'm not very familiar with those parts of clang-format and am mostly poking around it and looking at how similar constructs for other languages are handled.

MyDeveloperDay added a comment.EditedMar 3 2020, 1:22 PM

@krasimir, well I learned something new, so thanks for taking the time to explain that to help improve our understanding.

jbcoe abandoned this revision.May 4 2020, 3:00 AM