This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Add BeforeStructInitialization option in BraceWrapping configuration
AbandonedPublic

Authored by HazardyKnusperkeks on Nov 23 2020, 2:02 AM.

Details

Summary

If `true`, struct left brace will be placed after line breaks.
true:
struct new_struct struct_name =

{...};

false:
struct new_struct struct_name = {
...};

Diff Detail

Event Timeline

anastasiia_lukianenko requested review of this revision.Nov 23 2020, 2:02 AM
Eugene.Zelenko edited reviewers, added: MyDeveloperDay; removed: Restricted Project.Nov 23 2020, 6:56 AM
Eugene.Zelenko set the repository for this revision to rG LLVM Github Monorepo.
Eugene.Zelenko added a subscriber: cfe-commits.
Herald added a project: Restricted Project. · View Herald TranscriptNov 23 2020, 6:56 AM

Could you make you patch with a full context diff, plus we really want unit tests for all changes

if you change Format.h you need to regenerate the ClangFormatStyleOptions.rst by running dump_style.py in clang/docs/tools

MyDeveloperDay requested changes to this revision.Nov 23 2020, 7:05 AM
This revision now requires changes to proceed.Nov 23 2020, 7:05 AM
anastasiia_lukianenko updated this revision to Diff 307797.EditedNov 26 2020, 2:12 AM
  1. Diff was made by git diff with full context
  2. Added description in documentation
  3. Added unit test

Ok, thank you for making the changes so its easier to review, Do you think this style should be part of the "BraceMapping" style?

Firstly thank you for the patch, and believe me I don't want to discourage you (as I remember my first patches), I think the rational is ok, but I think you might be laser focused on your use case and not looking at the impact on the change of the wider clang-format.

To be honest I can't really tell without checking it out and building it but certainly I think its missing some initialization which perhaps might mean its not actually doing anything, (perhaps in debug mode it is) but in release mode it could end up making random changes if it gets initialized incorrectly.

I think it "does what it says on the tin", but I think it could have some other consequences and side effects which we need to explore first, these are best exposed by adding some more tests to show other areas where the same pattern of tokens occurs do not change by the setting of this configuration option

clang/lib/Format/ContinuationIndenter.cpp
957 ↗(On Diff #307797)

gosh! this seems like it could throw a spanner in the works if we turn this on..

clang/lib/Format/TokenAnnotator.cpp
3492 ↗(On Diff #307797)

This feels quite general and as its high up in this function I'm thinking its going to get hit alot...

To me it feels like any construct like this could potentially fall into this trap correct

= {

what would happen to

std::vector<int> v = {2, 1};

did the FormatTests run cleanly, (assuming they did wow!)

clang/unittests/Format/FormatTest.cpp
5072

you are missing a PARSE check test for this

5075

This is never initialized in the main code? so what value will it have by default?

MyDeveloperDay requested changes to this revision.Nov 26 2020, 4:56 AM
This revision now requires changes to proceed.Nov 26 2020, 4:56 AM

Thanks for the detailed review. After more detailed testing, I will upload a new version of the patches, which will fix configuration influence on other formatting options.

clang/lib/Format/TokenAnnotator.cpp
3492 ↗(On Diff #307797)

Thank you for paying attention to this problem. I tested these configurations mostly on C code and now I see that it has undefined behavior in other languages... I think this condition must have more specific parameters that will format the code only for structures.

clang/unittests/Format/FormatTest.cpp
5075

I'll improve tests and upload new version.

anastasiia_lukianenko retitled this revision from [clang-format] Add BreakBeforeStructInitialization configuration to [clang-format] Add BeforeStructInitialization option in BraceWrapping configuration.
  1. Configuration became BraceWrapping option
  2. Added CHECK_PARSE_NESTED_BOOL test
  3. Fixed bugs with unwanted formatting
anastasiia_lukianenko marked 4 inline comments as done.Dec 3 2020, 12:48 PM
MyDeveloperDay requested changes to this revision.Dec 4 2020, 2:01 AM

Its close

clang/lib/Format/Format.cpp
828

I think you may need to rebase, this code looks to be from before a relatively recent change

974

I believe there are 3 places where BraceWrapping is initialized you seem to only be doing 2 of them

clang/unittests/Format/FormatTest.cpp
5086

could you add an additional test without the trailing ,

This revision now requires changes to proceed.Dec 4 2020, 2:01 AM
clang/lib/Format/Format.cpp
974

Sorry, I didn't find one more place. Can you please say where it is?

clang/unittests/Format/FormatTest.cpp
5086

Yes, I can add the test, but before I want to discuss the expected result.
Now with my patch and BeforeStructInitialization = false the behavior is the following:

struct new_struct struct_name = {a = 1};
struct new_struct struct_name = {a = 1, b = 2};

And with BeforeStructInitialization = true:

struct new_struct struct_name =
{a = 1};
struct new_struct struct_name =
{a = 1, b = 2};

Is it okay?

MyDeveloperDay added inline comments.Dec 7 2020, 4:29 AM
clang/lib/Format/Format.cpp
974

expandPresets() - twice
getLLVMStyle() - once

clang/unittests/Format/FormatTest.cpp
5086

that is kind of what I expected, and adding the , is a known trick to cause it to expand the struct out.

I think its worth putting a test in, but I'll ask the same question what do you think it should look like?

clang/lib/Format/Format.cpp
974

Ok, thank you.

clang/unittests/Format/FormatTest.cpp
5086

I agree with you. And I'll add these tests to patch.

MyDeveloperDay added inline comments.Dec 7 2020, 10:14 AM
clang/lib/Format/ContinuationIndenter.cpp
979 ↗(On Diff #309939)

I personally can't tell if this is the correct place to do this, was there a reason why you couldn't add it to mustBreakBefore()?

clang/unittests/Format/FormatTest.cpp
5098

I wonder if there should be other options

i.e.

struct new_struct struct_name = {
    a = 1
};
struct new_struct struct_name = { a = 1 };
struct new_struct struct_name = 
{ 
    a = 1,
};

Again I don't want to discourage you, but this is very much related to one of the key problem areas of clang-format that it really doesn't do well, so its one of the places users ALWAY have to add //clang-format off

e.g.

Status GradForBinaryCwise(FunctionDef* g, std::vector<FDH::Node> body) {
  // clang-format off
  std::vector<FDH::Node> nodes = {
    {{"sx"}, "Shape", {"x"}},
    {{"sy"}, "Shape", {"y"}},
  };
  // clang-format on

If we are going to fix this to some extent we should fix it for all those cases too. I understand that that might be beyond what you are trying to achieve, but I think its important that we explore what might be needed to satisfy being able to get rid of all these //clang-format off cases as some point and you've woken the beast in this area ;-)

To be honest fixing this is one of biggest things (and very commendable if you want to focus on it) and whilst your fix is along the right lines, it only addresses the struct cases (and not the class case example (as above) and it perhaps doesn't cover where there might be something before the struct as in

static struct X Y = {
}

If your interested in pursuing this still, I'm happy to try and help you walk through it, where I can. However you'll have to bare with me as it might be beyond my skill set.

MyDeveloperDay added a comment.EditedDec 7 2020, 10:16 AM

Also how does this relate to the AfterStruct setting?

bool AfterStruct Wrap struct definitions.

true:
struct foo
{
  int x;
};

false:
struct foo {
  int x;
};

I presume this only applied to definition and not instantiation

AfterStruct setting doesn't affect struct initialization. Your presumption is right.
About these examples... I think that second and third variants are the right ones.

struct new_struct struct_name = {
    a = 1
};
struct new_struct struct_name = { a = 1 };
struct new_struct struct_name = 
{ 
    a = 1,
};

There is a test which shows that first example is not expected in case without break after '='

TEST_F(FormatTest, UnderstandContextOfRecordTypeKeywords) {
  // Elaborate type variable declarations.
  verifyFormat("struct foo a = {bar};\nint n;");
MyDeveloperDay requested changes to this revision.Jan 17 2021, 3:22 AM

I think the revision whilst it does what is needed to structs doesn't address the many other times this forms appear. I think we need something a little more extensive. It can't just be when a line starts with struct

This revision now requires changes to proceed.Jan 17 2021, 3:22 AM

I think the revision whilst it does what is needed to structs doesn't address the many other times this forms appear. I think we need something a little more extensive. It can't just be when a line starts with struct

I did it in the same way as existing options made in MustBreakBefore function (here we can see that we check only Line.startsWith(tok::kw_struct), no more extensive conditions):

if (isAllmanBrace(Left) || isAllmanBrace(Right))
  return (Line.startsWith(tok::kw_enum) && Style.BraceWrapping.AfterEnum) ||
         (Line.startsWith(tok::kw_typedef, tok::kw_enum) &&
          Style.BraceWrapping.AfterEnum) ||
         (Line.startsWith(tok::kw_class) && Style.BraceWrapping.AfterClass) ||
         (Line.startsWith(tok::kw_struct) && Style.BraceWrapping.AfterStruct);
clang/unittests/Format/FormatTest.cpp
5098

I understand what you mean, but at the moment options such as "AfterStruct" are also implemented with checking the first word in a line (not considering cases with "static" word). It seems to me that the option that I add should only concern structures, and new configurations should be created for the class, etc. (BreakBeforeClassInit, BreakBeforeEnumInit), so that the formatting would be more flexible.

A quick search of github shows over 1 billion hits to the word struct, in a variety of flavours, I'm just not convinced we want a separate option for each and every case,

Could we not look for a sequence of "tok::kw_struct,tok::identifier,tok::lbrace" or kw_strcut,tok::lbrace? rather than using the rather fragile Line->StartsWith which is only going to capture some of the cases.

typedef struct  Foo{
   ..
}

static struct Foo {
   ..
}

static constexpr struct Foo {
   ..
}

template <> struct  Foo{
  ...
}

struct {
}
anastasiia_lukianenko marked an inline comment as not done.
HazardyKnusperkeks requested changes to this revision.Apr 23 2021, 8:35 AM

Looks okay for me, but please fix the formatting notes.

This revision now requires changes to proceed.Apr 23 2021, 8:35 AM
clang/lib/Format/UnwrappedLineParser.cpp
2705–2709

shouldn't this be

FormatTok->Tok.is(tok::equal) && Style.BraceWrapping.BeforeStructInitialization

Otherwise you won't go into the else if (FormatTok->Tok.is(tok::l_brace)) { if its not turned on?

clang/lib/Format/UnwrappedLineParser.cpp
2705–2709

Is there a test case which would fail without that change? (I think no.) Should there be one? (I think yes!)

Looks good, but please wait for others to state their opinion.

MyDeveloperDay added a comment.EditedMay 8 2021, 2:59 AM

I'd like @curdeius and @krasimir input (as they tend to have more insight than me! ;-) ), my concern is its too simplistic a rule and won't cover all the uses cases, it may cover yours but does it cover others that are unwanted?

if (FormatTok->Tok.is(tok::equal) &&
      Style.BraceWrapping.BeforeStructInitialization) {
    nextToken();
    addUnwrappedLine();
  } else if (FormatTok->Tok.is(tok::l_brace)) {

I don't see how this change won't try and change other code (not just structs, but maybe it would impact classes and braced initialization too? (not sure about that), but perhaps I'm missing something. It tough for us sometimes to put our names to this as reviewers without understanding the implications to other code, but in principle, I'm not opposed to this idea just nervous we'll break other things

As its off by default I guess its not terrible. If the others are ok with this, I'd be happy to mark it good.

Do we have some widely used code style that requires the new option (https://clang.llvm.org/docs/ClangFormatStyleOptions.html#adding-additional-style-options)?
I agree with @MyDeveloperDay that this option as-is is likely not addressing all the cases where we would want this to apply.

I don't think we should later be adding something like BreakBeforeClassInit -- IMO whichever mechanism is used to control struct cases should also apply to class cases (modulo commonly used style guide rules).

Note that there is a big difference in context between this and, e.g., AfterStruct: when you're defining a struct, generally the struct keyword is required, and that's why the check Line.startsWith(tok::kw_struct) in that context is appropriate. In the context of initialization this is not the case, so, as a user I'd expect BreakBeforeStructInit to work in cases like this where the struct keyword is not present:

struct point {
  int x, y;
};

int f() {
  // here:
  point p =
  {
    x = 1,
    y = 2,
  };
}

Since clang-format doesn't really do any semantic analysis, it might be more appropriate to generalize this concept in a way that more closely matches the C++ syntax structures, e.g., to something that controls the breaking behavior before { in appropriate cases of some copy-list-initialization forms (https://en.cppreference.com/w/cpp/language/list_initialization), but without some style guide to drive the design it's hard to tell what would make sense.

clang/unittests/Format/FormatTest.cpp
5086

I'd expect the lines starting with { to be indented in those cases, probably by +4 spaces in LLVM-like style (ContinuationIndent), consistently with other places where the RHS of a binary expression gets wrapped on a new line:
Not:

struct new_struct struct_name =
{a = 1};
struct new_struct struct_name =
{a = 1, b = 2};

But:

struct new_struct struct_name =
    {a = 1};
struct new_struct struct_name =
    {a = 1, b = 2};

@krasimir could you please check new diff?

This patch still only considers structs? @krasimir and I are I think saying that really this should it support class in the same way? I mean isn't a struct and a class almost identical in this context why would we want treat them differently.

class new_struct struct_name =
    {a = 1};

struct new_struct struct_name =
    {a = 1};

Sorry to push you on the patch quality (this and your other patch), I'm personally always a little extra cautious of peoples first patches, especially where they are adding new functionality.

This is because we don't need drive by patches where people come in drop just the functionality they require then disappear, I'm not saying you'll do that but we have to think about the longer term maintainability of any patch, we have to understand it and it has to give us the capability that we think the whole community requires. Your use case is valid, but such a change like this will live forever so ideally we want it to address everything in a consistent way. (I'm not even convinced by the Struct in BeforeStructInitialization name, just saying)

As these are your first patches I'm personally likely to have extra levels of scrutiny. I'm sorry for that, (its self preservation), I want to know you are willing to be a contributor who comes back to help and maintain your own patch as well as others.

This is why we always advise people to start by looking at a few bugs from the bug tracker, nothing says I'm here to stay like fixing other peoples issues and not your own requirements. Plus you don't have to justify the change when its fixing an issue or a regression. Once people have established a "reputation" for being a consistent/semi consistent contributor then getting new functionality in is always a little easier. (that may not be fair, but its just advice on how to get stuff in!)

As I look back at my own first patches they took months to get in (and lots of diffs), I made the the mistake of wanting to add a new feature ("modernize-use-nodiscard" clang-tidy check), I felt frustration at my patch constantly coming back, but I stuck at it.

Bug fixes can often be rubber stamped in days and that can be a good learning curve for those wanting to get started with clang, as you can make 2 or 3 commits quite rapidly, which makes you feel good about contributing. It also builds trust and confidence so when you want to add something more ambitious, reviewers have more faith and likely cut you a little more slack. (just saying, some advice on starting)

anastasiia_lukianenko edited the summary of this revision. (Show Details)

Are we happy with the way this behaves? I realise some of this needs semantic information but I think the fact it only handles 1 of these 4 cases leaves me feeling a little cold. (I just don't want to be having to defend this as to why it doesn't work in all cases.)

---
Language: Cpp
BasedOnStyle: LLVM
BreakBeforeBraces: Custom
BraceWrapping:
    BeforeClassStructInit: true
struct S2 {
  int x, y;
};

void foo() {
  struct S2 a1 =
      {1, 2};
  struct S2 a2[3] = {{1, 2}, {3, 4}, 5, 6};

  S2 a3 = {1, 2};
  S2 a4[3] = {{1, 2}, {3, 4}, 5, 6};
}

int main(int argc, char **argv) { return 0; }
MyDeveloperDay requested changes to this revision.Nov 4 2021, 2:42 AM
This revision now requires changes to proceed.Nov 4 2021, 2:42 AM

Do we have some widely used code style that requires the new option (https://clang.llvm.org/docs/ClangFormatStyleOptions.html#adding-additional-style-options)?

I have the same question.

HazardyKnusperkeks commandeered this revision.Nov 8 2022, 12:54 PM
HazardyKnusperkeks edited reviewers, added: anastasiia_lukianenko; removed: HazardyKnusperkeks.
Herald added a project: Restricted Project. · View Herald TranscriptNov 8 2022, 12:54 PM

More than one year silence.