This is an archive of the discontinued LLVM Phabricator instance.

[clang-format]: Add NonEmptyParentheses spacing option
ClosedPublic

Authored by reuk on Dec 1 2018, 3:25 PM.

Details

Summary

This patch aims to add support for the following rules from the JUCE coding standards:

  • Always put a space before an open parenthesis that contains text - e.g. foo (123);
  • Never put a space before an empty pair of open/close parenthesis - e.g. foo();

The entire set of JUCE coding guidelines can be found here. Unfortunately, clang-format can't enforce all of these style rules at the moment, so I'm trying to add support for them.

Patch by Reuben Thomas

Diff Detail

Repository
rL LLVM

Event Timeline

reuk created this revision.Dec 1 2018, 3:25 PM
reuk added a subscriber: klimek.Dec 10 2018, 3:32 AM

Would someone review this please? I'm not sure who to add for review (sorry), maybe one of the following?
@klimek @Typz @krasimir

MyDeveloperDay added a project: Restricted Project.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 28 2019, 2:23 PM

Hi @reuk,

I'm trying to go over old reviews and see if they are still desired, I stumbled on your review and want to see if its still important. Here is some initial feedback.

I know its ironic that the clang-format code isn't clang-formatted itself, but it might help if your changes didn't also include additional unrelated clang-formatting changes because its a little harder to see what actually changed, and your review would be much smaller.

perhaps the next steps could be:

  1. you rebase to current head (since they've left you hanging her for 3 months!)
  2. you use something like git difftool to interactively put back the "formatting" of code which isn't part of this change (yes we should really clang-format all of clang-format, because I know I've hit this myself recently)
  3. The one thing I struggle with when looking at the clang-format code and this review (and its probably part of why the owners seem a bit reluctant to let anything else in), is encapsulated by this part of the change
return Line.Type == LT_ObjCDecl || Left.is(tok::semi) ||
       (Style.SpaceBeforeParens != FormatStyle::SBPO_Never &&
        (Left.isOneOf(tok::kw_if, tok::pp_elif, tok::kw_for, tok::kw_while,
                      tok::kw_switch, tok::kw_case, TT_ForEachMacro,
                      TT_ObjCForIn) ||
         Left.endsSequence(tok::kw_constexpr, tok::kw_if) ||
         (Left.isOneOf(tok::kw_try, Keywords.kw___except, tok::kw_catch,
                       tok::kw_new, tok::kw_delete) &&
          (!Left.Previous || Left.Previous->isNot(tok::period))))) ||
       (Style.SpaceBeforeParens == FormatStyle::SBPO_Always &&
        (Left.is(tok::identifier) || Left.isFunctionLikeKeyword() ||
         Left.is(tok::r_paren) ||
         (Left.Tok.getIdentifierInfo() &&
          Left.Tok.getIdentifierInfo()->isKeyword(
              getFormattingLangOpts(Style)))) &&
        Line.Type != LT_PreprocessorDirective) ||
       (Style.SpaceBeforeParens == FormatStyle::SBPO_NonEmptyParentheses &&
        Right.ParameterCount >= 1 &&
        (Left.is(tok::identifier) || Left.isFunctionLikeKeyword() ||
         Left.is(tok::r_square) || Left.is(tok::r_paren) ||
         (Left.Tok.getIdentifierInfo() &&
          Left.Tok.getIdentifierInfo()->isKeyword(
              getFormattingLangOpts(Style)))) &&
        Line.Type != LT_PreprocessorDirective);

There is nothing wrong with it per say I'm sure it works beautifully but its a little hard to reason about.

I know you only adding to the end of it, but I think if this was in clang-tidy we'd be asked to write it as a function/matcher to encapsulate this kind of functionality. Its also an indictment of clang format that this isn't aligning in such a way that it can be more easily read, I mean I know its done its job, but its quite a challenge to look at! ;-)

To me clang-format needs more functions like

Left.isIdentifier()
Left.isParen()
Left.isLSquare() 
Right.hasParameters()

in order to help break complex conditional like this down into a readable form which can be rationalised about.

This would help hide some of the tok::XXX complexity which can make my eyes at least go funny.. and code like this (which I think you added)

Left.Tok.getIdentifierInfo() && Left.Tok.getIdentifierInfo()->isKeyword(getFormattingLangOpts(Style)))   (especially as its repeated in the clause)

It looks like it means some means something functional which perhaps could be combined:

(sorry I don't know what its doing but I think you do)

isLanguageKeyWord(Left,Style)

or

Left.isLanguageKeyWord(Style);

If you could add the part you added, is a more succinct way,

|| isFunctionWithNoArguments(Left,Right)

rather than

||
             (Left.Tok.getIdentifierInfo() &&
              Left.Tok.getIdentifierInfo()->isKeyword(
                  getFormattingLangOpts(Style)))) &&
            Line.Type != LT_PreprocessorDirective) ||
           (Style.SpaceBeforeParens == FormatStyle::SBPO_NonEmptyParentheses &&
            Right.ParameterCount >= 1 &&
            (Left.is(tok::identifier) || Left.isFunctionLikeKeyword() ||
             Left.is(tok::r_square) || Left.is(tok::r_paren) ||
             (Left.Tok.getIdentifierInfo() &&
              Left.Tok.getIdentifierInfo()->isKeyword(
                  getFormattingLangOpts(Style)))) &&
            Line.Type != LT_PreprocessorDirective);

I think there would be a stronger argument that it should go in, given that you have a publicly available style guide etc... (owners may not agree!)

Perhaps too much of my own opinion here... but maybe the "development costs" would go down if thing like

Left.endsSequence(tok::kw_constexpr, tok::kw_if)

where written as

Left.isConstExprIf()

if that is what its doing?

sorry to be critical, just trying to help get traction on your review...

lib/Format/TokenAnnotator.cpp
65 ↗(On Diff #176272)

unrelated change, consider backing out, perhaps we could do a clang-format of the file as a separate pass

526 ↗(On Diff #176272)

unrelated change, consider backing out, perhaps we could do a clang-format of the file as a separate pass

1392 ↗(On Diff #176272)

unrelated change, consider backing out, perhaps we could do a clang-format of the file as a separate pass

2420 ↗(On Diff #176272)

unrelated change, consider backing out, perhaps we could do a clang-format of the file as a separate pass

2545 ↗(On Diff #176272)

would be nice to pull the 3 items into a function because you repeat it on line 2551

2548 ↗(On Diff #176272)

something similar here when combined with line 2540, looks a lot like the 2 patterns below?

almost like its:

bool allowSpaceBeforeIfFunctionHasParameters(Style,Right);
2617 ↗(On Diff #176272)

unrelated change, consider backing out, perhaps we could do a clang-format of the file as a separate pass

2833 ↗(On Diff #176272)

can this and the above section on line 2745 be combined? I'm not sure if the order of the "if" indicates precedence

3138 ↗(On Diff #176272)

unrelated change, consider backing out, perhaps we could do a clang-format of the file as a separate pass

reuk updated this revision to Diff 189942.Mar 8 2019, 2:46 PM

I've rebased onto master, and removed unrelated formatting changes. I've also tried to remove some of the duplicate parens-related expressions.

I agree that the heavy nested boolean expressions are a bit painful to read, but I'm not sure whether a more general tidy-up falls within the scope of this PR.

MyDeveloperDay accepted this revision.Mar 9 2019, 2:44 AM

I'm not the code owner but this LGTM,

Assuming the unit tests remain passing, I'd like to see more items like this in clang-format addressed, at present we seem to lack getting even bug fixes reviewed!

Thank you for breaking out the common code, I do think more functions like this in clang-format would help make the code more readable. I also think the various if clauses having comment helps break up these huge monolithic if statements

Thanks for the patch.

This revision is now accepted and ready to land.Mar 9 2019, 2:44 AM
reuk added a comment.Mar 9 2019, 12:34 PM

Thanks for the review, and for approving this PR. It's very much appreciated!

I don't have merge rights - could someone commit this for me please?

Thanks for the review, and for approving this PR. It's very much appreciated!

I don't have merge rights - could someone commit this for me please?

I'd actively encourage you to go get commit access, I think its much better when the commit comes from the author.

There just isn't enough people actively involved in clang-format (and other tools) for us to get reviews even looked at (even for defects), we need more people involved fixing defects and doing reviews, getting commit access shows an intent to fix anything in that comes from your own review which is one of the things the code owners push as being an objection to adding more code.

Thanks for the review, and for approving this PR. It's very much appreciated!

I don't have merge rights - could someone commit this for me please?

I'd actively encourage you to go get commit access, I think its much better when the commit comes from the author.

There just isn't enough people actively involved in clang-format (and other tools) for us to get reviews even looked at (even for defects), we need more people involved fixing defects and doing reviews, getting commit access shows an intent to fix anything in that comes from your own review which is one of the things the code owners push as being an objection to adding more code.

I'm sorry for not having a positive answer here, but I'm not in favor of this change. The style rule looks like it introduces arbitrary complexity at a point where I don't understand at all how it matters. We cannot possibly support all style guides that come up with random variance. I do not think this option pulls its weight.

MyDeveloperDay requested changes to this revision.Mar 14 2019, 12:32 PM

Pulling this revision down to build it caused build issues with missing SBBLO_Always, could you investigate before committing?

lib/Format/TokenAnnotator.cpp
2607 ↗(On Diff #189942)

Where are these SBBLO defined? I cannot find them in Format.h from what I can SpaceBeforeCpp11BracedList is a bool

This revision now requires changes to proceed.Mar 14 2019, 12:32 PM
reuk added a comment.Mar 15 2019, 3:17 PM

@MyDeveloperDay I'm not sure you built this branch. Perhaps you applied this patch to an older version of the repo. For me, SpaceBeforeCpp11BracedListOptions is defined at Format.h:1570.

This builds fine for me, on macOS 10.14 with Xcode 10.1's clang. I just rebased onto the most recent clang master and updated llvm. I've been testing/building with this command:

cmake --build build --target FormatTests && ./build/tools/clang/unittests/Format/FormatTests && cmake --build build --target clang-format

@klimek That's disappointing, I thought other JUCE users might benefit from this patch (it's quite an established library), and the amount of code that I'm a adding is not that high.

I'm happy to be wrong, but In current master, SpaceBeforeCpp11BracedList is a boolean not an enum (but I think I've seen a review changing it somewhere which maybe isn't landed)

https://reviews.llvm.org/source/llvm-github/browse/master/clang/include/clang/Format/Format.h$1578

I'm sorry for not having a positive answer here, but I'm not in favor of this change. The style rule looks like it introduces arbitrary complexity at a point where I don't understand at all how it matters. We cannot possibly support all style guides that come up with random variance. I do not think this option pulls its weight.

So is this a personal opinion or based on a technical reason (other than the normal mantra about risk & cost), Could we give @reuk some positive feedback about what he needs to change in this review to get this accepted? He has the other boxes checked about public style etc..

It may not be your preference but its not without some merit for those of us who like to fine tune our style, if that can be done with minimal impact to other styles and is well covered by tests, then can we at least have some discussion first

@reuk please lets continue to fix this revision so it builds as I want to pull it into

https://github.com/mydeveloperday/clang-experimental

As an aside, Wouldn't it be great if JUCE was a public style so you could just say

BasedOnStyle: JUCE

Where that style defines what JUCE uses? I've always found this odd, I've never understood why we only have Google,WebKit,Mozilla,LLVM, I would have thought it would have been nice to have other big guns Microsoft,Qt generally its only going to be a single function setting the defaults.

I actually think a change like that would really help people so they don't have to work out all the individual styles, it would help keep .clang-format files small and neat and not full of a bunch of different options.

Even if JUCE was based on another style say "Google" the code should really be saying

if (Style.isGoogleDerivedStyle())

......

and not

Style == FormatStyle::Google

When its specific to JUCE it would say

if (Style.isJUCEDerivedStyle())

......

I know when using LLVM, that being able to have a .clang-format file that just says

BasedOnStyle: LLVM

is really nice, I don't have to go work out what the LLVM style is, I just get what the project defines, I think it would be great that anything that passes the public style test for submissions should really be able to add that kind of configuration.

reuk updated this revision to Diff 191015.Mar 17 2019, 5:56 AM

@MyDeveloperDay I'm sorry, you're absolutely correct. I'd got some other JUCE-related changes mixed up in this PR. Should be fixed now.

MyDeveloperDay added inline comments.Mar 17 2019, 3:20 PM
include/clang/Format/Format.h
1578 ↗(On Diff #191015)

boolean here not enum see comment below

lib/Format/TokenAnnotator.cpp
2608 ↗(On Diff #191015)

Did the patch upload correctly? this is still not showing a bool for SpaceBeforeCpp11BrackedList

reuk updated this revision to Diff 191050.Mar 17 2019, 3:24 PM

I'm sorry for not having a positive answer here, but I'm not in favor of this change. The style rule looks like it introduces arbitrary complexity at a point where I don't understand at all how it matters. We cannot possibly support all style guides that come up with random variance. I do not think this option pulls its weight.

So is this a personal opinion or based on a technical reason (other than the normal mantra about risk & cost),

I don't think risk is what matters - in the end it's about cost, and cost is a very technical reason.

Could we give @reuk some positive feedback about what he needs to change in this review to get this accepted? He has the other boxes checked about public style etc..

It may not be your preference but its not without some merit for those of us who like to fine tune our style, if that can be done with minimal impact to other styles and is well covered by tests, then can we at least have some discussion first

I'm fine having a discussion - my main question is why this matters. This seems a lot more arbitrary than other things we have.

@reuk please lets continue to fix this revision so it builds as I want to pull it into

https://github.com/mydeveloperday/clang-experimental

As an aside, Wouldn't it be great if JUCE was a public style so you could just say

BasedOnStyle: JUCE

Where that style defines what JUCE uses? I've always found this odd, I've never understood why we only have Google,WebKit,Mozilla,LLVM, I would have thought it would have been nice to have other big guns Microsoft,Qt generally its only going to be a single function setting the defaults.

I actually think a change like that would really help people so they don't have to work out all the individual styles, it would help keep .clang-format files small and neat and not full of a bunch of different options.

Even if JUCE was based on another style say "Google" the code should really be saying

if (Style.isGoogleDerivedStyle())

......

and not

Style == FormatStyle::Google

When its specific to JUCE it would say

if (Style.isJUCEDerivedStyle())

......

I know when using LLVM, that being able to have a .clang-format file that just says

BasedOnStyle: LLVM

is really nice, I don't have to go work out what the LLVM style is, I just get what the project defines, I think it would be great that anything that passes the public style test for submissions should really be able to add that kind of configuration.

reuk added a comment.Mar 18 2019, 2:14 AM

@klimek I agree that the rule is somewhat arbitrary. However, it's the style rule of an established codebase with many users (I don't have a concrete number, but the project has 1400 stars on github). I've found this patch useful when contributing to JUCE and I thought others might too.

I don't think risk is what matters - in the end it's about cost, and cost is a very technical reason

I'm really sorry I'm not trying to be difficult its just over the years I keep seeing this being said in reviews? but what is the cost? I assume you don't mean a financial cost? could you define cost as a problem so we can address it? what would it take to make cost not an issue?

This revision is now accepted and ready to land.Mar 19 2019, 5:24 AM

@klimek I agree that the rule is somewhat arbitrary. However, it's the style rule of an established codebase with many users (I don't have a concrete number, but the project has 1400 stars on github). I've found this patch useful when contributing to JUCE and I thought others might too.

Oh wow, wasn't aware how popular this is. Sigh. I'd like us to push them to change their style guide, but I guess that's not going to fly.

I don't think risk is what matters - in the end it's about cost, and cost is a very technical reason

I'm really sorry I'm not trying to be difficult its just over the years I keep seeing this being said in reviews? but what is the cost? I assume you don't mean a financial cost? could you define cost as a problem so we can address it? what would it take to make cost not an issue?

The cost is financial, as it's developer time, which costs real money to companies. In the end, to support this, people like myself who are doing this as part of their job spend time that they'd otherwise spend to make other things better that might be more important. To make cost not an issue we'd need to change clang-format to make changes like this significantly simpler to review & maintain. I'll add a code comment in a moment, given that this is probably something that we'll allow given it fulfills the criteria we've set ourselves.

klimek added inline comments.Mar 20 2019, 3:14 AM
lib/Format/TokenAnnotator.cpp
2546–2560 ↗(On Diff #191050)

I'm really confused about the changes os parens. It seems like this should just change the spaceRequiredBeforeParens(...), but instead seems to change parens in a way that completely changes the logic? (or alternatively is phabricator showing this wrong?)

Oh wow, wasn't aware how popular this is. Sigh. I'd like us to push them to change their style guide, but I guess that's not going to fly.

IDK. If the criteria is "public coding standard for widely used project", supporting all such things may be unsustainable. Some projects may need to fork or find another formatter. I speak as someone that maintains an internal fork with features that I do not deem worthy of foisting on trunk.
(And BTW @MyDeveloperDay, whoever you are, Thanks for all of your recent activity!)

reuk marked an inline comment as done.Mar 20 2019, 6:47 AM
reuk added inline comments.
lib/Format/TokenAnnotator.cpp
2546–2560 ↗(On Diff #191050)

I think this looks like a bigger change than it really is because I added an open-parens on line 2548, which then affected the formatting of the following lines. I also added the isSimpleTypeSpecifier check, so that functional-style casts (int (foo)) are given the correct spacing.

klimek added inline comments.Mar 20 2019, 7:08 AM
lib/Format/TokenAnnotator.cpp
2546–2560 ↗(On Diff #191050)

a) why did you add the one in 2548? you didn't change any of the logic, right?
b) there are 3 extra closing parens in 2560, I see one more new opening in 2556, but that also seems superfluous?
One question is whether we should pull this apart into bools with names that make sense :)

The cost is financial, as it's developer time, which costs real money to companies. In the end, to support this, people like myself who are doing this as part of their job spend time that they'd otherwise spend to make other things better that might be more important.

Don't get me wrong I totally appreciate what you do,

But what you mean is it costs your employer. That I understand, but not all of us are doing this on behalf of a company (more specially not yours), so you could also say that your employer benefits the other way from those contributors who give their time without them having to spend a dime.

I guess my question would be, should the cost to your employer really come into the decision about whether something goes in or not? Other than of course we are totally grateful to them for giving us so much of your time, but that shouldn't have impact on what is worthy to go in should it? or am I wrong?

lib/Format/TokenAnnotator.cpp
2546–2560 ↗(On Diff #191050)

Isn't this just a classic example of where rewriting this as a series of static functions e.g.

static bool someBehavior(Line, Left)
{
     if  (Line.Type == LT_ObjCDecl)
          return true;
     if  (Left.is(tok::semi)
          return true;
    ......

    return false;         
}

would be so much easier to understand?

reuk updated this revision to Diff 191560.Mar 20 2019, 12:37 PM

Removed unnecessary parens.

The cost is financial, as it's developer time, which costs real money to companies. In the end, to support this, people like myself who are doing this as part of their job spend time that they'd otherwise spend to make other things better that might be more important.

Don't get me wrong I totally appreciate what you do,

But what you mean is it costs your employer. That I understand, but not all of us are doing this on behalf of a company (more specially not yours), so you could also say that your employer benefits the other way from those contributors who give their time without them having to spend a dime.

This is not a good argument.
It only considers the one-time cost of adding one particular thing.
It does not consider the cost it incurs on the codebase.

There are likely always ways to implement the same external side-effects
in a way that has near-zero new cost, and ways to implement it in a way
that completely shatters any hope for any further changes.
It is vitally important to avoid the latter.

I guess my question would be, should the cost to your employer really come into the decision about whether something goes in or not? Other than of course we are totally grateful to them for giving us so much of your time, but that shouldn't have impact on what is worthy to go in should it? or am I wrong?

@lebedev.ri Are we talking about a general ideology of the long term cost to allow any new things in? or to not allow things in this specific case?

because in this specific case all the changes are based on what is really a single clause that was already there before, namely

Style.SpaceBeforeParens == FormatStyle::SBPO_Always

Now that clause is abstracted away to be (the abstraction in my view being a nice part of this change, becuase it avoided duplication and made it easier to reason about!)

Style.SpaceBeforeParens == FormatStyle::SBPO_Always  ||
(Style.SpaceBeforeParens == FormatStyle::SBPO_NonEmptyParentheses && Right.ParameterCount > 0);

i.e. always add a space, or add a space only when the number of parameters is > 0 and I don't want a space on empty parameters. (simple enough!)

So whilst I understand the arguments of code complexity, I feel it is sometimes used to discourage, what is in at least my view seemingly reasonable requests. (not my choice but reasonable all the same)

This was why I asked what was meant by "cost", but you raise the other cost, and that is one that is more legitimate... or is it?. have we seen it here as a problem or do we just fear it?

Having come from almost nothing to understanding a certain amount of of the code recently in a relatively short space of time (enough to at least contribute a few bug fixes), I'm confident that the "up to speed" costs are actually currently relatively low, and as the Format Library is only 20,000 lines and Format.cpp being only ~2500, I don't consider that to be large enough for the code to have come intractable already.

Of course I understand when someone starts asking for an Option e.g.

AllowSpaceAfterSecondCommaOfArgumentListButNotBeforeAnyOfTheOthers: true

We may have gone too far ;-), but I almost think that perhaps those cases might be more limited than we think in the end.

From my recent experiences one nod to complexity was from the C++ language itself.. when trying to fix a bug in isFunctionDeclarationName() e,g,

A foo(abc);

does the mean a function foo() taking an argument abc and returning A or an instance of class A called foo being constructed with an argument abc.. This can have all the difference if you want to put a BraceAfterReturnType

The "cost" as you describe it here, over our ability to reason about the code in my view is more likely to come from having to analyze snippets of languages without the full parser to reason with.

@lebedev.ri Are we talking about a general ideology of the long term cost to allow any new things in? or to not allow things in this specific case?

because in this specific case all the changes are based on what is really a single clause that was already there before, namely

Style.SpaceBeforeParens == FormatStyle::SBPO_Always

Now that clause is abstracted away to be (the abstraction in my view being a nice part of this change, becuase it avoided duplication and made it easier to reason about!)

Style.SpaceBeforeParens == FormatStyle::SBPO_Always  ||
(Style.SpaceBeforeParens == FormatStyle::SBPO_NonEmptyParentheses && Right.ParameterCount > 0);

i.e. always add a space, or add a space only when the number of parameters is > 0 and I don't want a space on empty parameters. (simple enough!)

So whilst I understand the arguments of code complexity, I feel it is sometimes used to discourage, what is in at least my view seemingly reasonable requests. (not my choice but reasonable all the same)

This was why I asked what was meant by "cost", but you raise the other cost, and that is one that is more legitimate... or is it?. have we seen it here as a problem or do we just fear it?

This adds another condition to an already complex "if" that I think literally nobody fully understands, and I think it does so for very questionable value (why does it matter whether there is a space diff between the arg and no-arg case?).
Each change has a cost:

  1. cost of implementing
  2. cost of review & rework in review
  3. cost of making the code harder to understand for further changes
  4. cost of maintenance if the code is reworked, this feature needs to be kept working, thus slowing down new features

We *have* to offset this against the value of a feature, especially in a bikeshed coloring tool like clang-format.

Having come from almost nothing to understanding a certain amount of of the code recently in a relatively short space of time (enough to at least contribute a few bug fixes), I'm confident that the "up to speed" costs are actually currently relatively low, and as the Format Library is only 20,000 lines and Format.cpp being only ~2500, I don't consider that to be large enough for the code to have come intractable already.

I am working on a patch (for over a year now :( to generalize macro configuration instead of having special cases that don't really work well, and it's much harder than it should be, as the architecture is getting in my way.

Of course I understand when someone starts asking for an Option e.g.

AllowSpaceAfterSecondCommaOfArgumentListButNotBeforeAnyOfTheOthers: true

We may have gone too far ;-), but I almost think that perhaps those cases might be more limited than we think in the end.

From my recent experiences one nod to complexity was from the C++ language itself.. when trying to fix a bug in isFunctionDeclarationName() e,g,

A foo(abc);

does the mean a function foo() taking an argument abc and returning A or an instance of class A called foo being constructed with an argument abc.. This can have all the difference if you want to put a BraceAfterReturnType

The "cost" as you describe it here, over our ability to reason about the code in my view is more likely to come from having to analyze snippets of languages without the full parser to reason with.

Sure, that's where large part of the system complexity comes from, as tackling this is not easy.

lib/Format/TokenAnnotator.cpp
2546–2560 ↗(On Diff #191050)

Agreed, but I think even a:
bool someBehavior = (A || B) && (C || D);
would help :)

That said, parens look good now.

klimek accepted this revision.Mar 25 2019, 3:41 AM

Oh, and LG :)

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2019, 5:31 AM