Page MenuHomePhabricator

[CodingStandards] Add style guide rule about "if" statements and loops.
AbandonedPublic

Authored by jlebar on Nov 21 2016, 3:17 PM.

Details

Reviewers
MatzeB
rnk
Summary

It seems that most LLVM code omits braces on, and only on, single-line
"if" and loop bodies.

Event Timeline

jlebar updated this revision to Diff 78795.Nov 21 2016, 3:17 PM
jlebar retitled this revision from to [CodingStandards] Add style guide rule about "if" statements and loops..
jlebar updated this object.
jlebar added reviewers: rnk, MatzeB.
jlebar added subscribers: llvm-commits, arsenm.
rnk edited edge metadata.Nov 21 2016, 3:23 PM

I feel like this should live down near "Don’t use else after a return". Seems kind of weird to put this before an admonition to, you know, indent code at all. We should probably delete that too to make the document more concise.

MatzeB accepted this revision.Nov 21 2016, 3:23 PM
MatzeB edited edge metadata.

LGTM (though I would leave this open for a while until after the thanksgiving break to hear different opinions).

I would add an example. Maybe:

if (SomethingIsOdd)
  return;

if (CoolInstruction *CI = dyn_cast<CoolInstruction>(I)) {
  if (CI.IsBaz())
    ++BazInstructions;
}
This revision is now accepted and ready to land.Nov 21 2016, 3:23 PM

And this example:

if (I.IsEven()) {
  HandleEven(I);
} else {
  HandleOdd(I):
}

to make it clear we want braces for if else.

And this example:

if (I.IsEven()) {
  HandleEven(I);
} else {
  HandleOdd(I):
}

to make it clear we want braces for if else.

Or rather:

if (I.isEven()) {
  handleEven(I);
} else {
  handleOdd(I):
}

to be in line with function name conventions.

I feel like this should live down near "Don’t use else after a return"

I put it here because the overall section header is "Mechanical Source Issues". But now I see that e.g. "Spaces Before Parentheses" is down at the bottom of the doc, so...I now have no idea how this is structured.

to make it clear we want braces for if else.

We do? Elsewhere in the style guide they have unbraced if/else, e.g. following "or better yet (in this case)".

rnk added a comment.Nov 21 2016, 3:31 PM

I feel like this should live down near "Don’t use else after a return"

I put it here because the overall section header is "Mechanical Source Issues". But now I see that e.g. "Spaces Before Parentheses" is down at the bottom of the doc, so...I now have no idea how this is structured.

to make it clear we want braces for if else.

We do? Elsewhere in the style guide they have unbraced if/else, e.g. following "or better yet (in this case)".

There's definitely prior art in LLVM for unbraced else if chains.

Personally, I think I would prefer that we add braces only if one of the bodies of the code path requires it:

OK:

if (first)
  if (second)
    third();

!OK:

if (first)
  if (second) {
    third();
    fourth();
  }

OK:

if (first) {
  if (second) {
    third();
    fourth();
  }
}

I feel like this should live down near "Don’t use else after a return"

I put it here because the overall section header is "Mechanical Source Issues". But now I see that e.g. "Spaces Before Parentheses" is down at the bottom of the doc, so...I now have no idea how this is structured.

to make it clear we want braces for if else.

We do? Elsewhere in the style guide they have unbraced if/else, e.g. following "or better yet (in this case)".

Ok, I grepped around llvm source and the majority of cases is indeed without braces. Then add this to the examples:

if (I.isEven())
  handleEven(I);
else
  handleOdd(I);

Personally, I think I would prefer that we add braces only if one of the bodies of the code path requires it:

Just for the sake of discussion (I personally would prefer braces). This would result in this in the extreme case:

if (foo)
  for (Baz b : bazes)
    if (b)
      ++BCount;
    else
      ++NotBCount;
else
  foobar();

Personally, I think I would prefer that we add braces only if one of the bodies of the code path requires it:

Just for the sake of discussion (I personally would prefer braces). This would result in this in the extreme case:

if (foo)
  for (Baz b : bazes)
    if (b)
      ++BCount;
    else
      ++NotBCount;
else
  foobar();

+1 for braces

We do? Elsewhere in the style guide they have unbraced if/else, e.g. following "or better yet (in this case)".

Ok, I grepped around llvm source and the majority of cases is indeed without braces. Then add this to the examples:

if (I.isEven())
  handleEven(I);
else
  handleOdd(I);

I'm OK with all this. But can you clarify for these:

if (I.isEven()) {
  handleEven(I);
  handleEven(I);
} else    // Should we open a brace here?
  handleOdd(I);
if (I.isEven())     // Should we open a brace here?
  handleEven(I); 
else {
  handleOdd(I);
  handleOdd(I);
}

I remember @majnemer asking for bracing both path when one is braced.

rnk added a comment.Nov 23 2016, 9:00 AM

Although, for really simple cases it's probably fine to omit them:

if (auto Foo = bar())
  for (auto Baz : Foo.stuff())
    bazIt(Baz);

These types of constructs are one of my favorite parts of idiomatic LLVM code. I'd hate to lose them.

I'm sympathetic to this idiom, but I can't come up with an easy way to write a rule to allow it:

for (Foo *f : foos)
  if (f->isSomething())
    return true;
return false;

I also don't like it when it gets to three blocks.

for (Foo *f : foos)
  if (auto *b = dyn_cast<Bar>(f->getBar()))
    if (b->isBaz())
      return true;
return false;

It's too clever.

I'd rather have an easy rule of thumb that requires this:

for (Foo *f : foos) {
  if (f->isSomething())
    return true;
}
return false;

I've also seen this before, and I want to make sure I don't see it again:

for (Foo *f : foos)
  if (f->isSomething()) {
    stmt1();
    stmt2();
  }
In D26943#604404, @rnk wrote:

I'm sympathetic to this idiom, but I can't come up with an easy way to write a rule to allow it:

Does it need to be codified in full? I think that putting more braces should be allowed and the coding guidelines should dictate where they can be omitted.

(1) If a nested statement has braces, the enclosing statement should have braces.
(2) If the immediate nesting has only one indented line, braces can be omitted, unless (1) applies.

The intent of (2) is to require braces in cases like this:

for (auto a : Items)
  // Calculate foo for each item.
  a.foo();

Does it need to be codified in full?

I think we stand to benefit greatly from having an unambiguous statement of the rules, yes.

The cost of not having such a statement is that each reviewer has their own set of rules in their head, and so we waste human time during reviews with reviewers explaining their mental model and asking reviewees to change their code. And that's in the best case, where the reviewees don't argue back.

The cost of not having such a statement is that each reviewer has their own set of rules in their head,

That's a problem with reviewers.

The guidelines should be pretty clear: readability, and consistency. Regulating every little thing is unnecessary.

The cost of not having such a statement is that each reviewer has their own set of rules in their head,

That's a problem with reviewers.

Not only, this is part of the "consistency" you mentioned in your second sentence.

The guidelines should be pretty clear: readability, and consistency. Regulating every little thing is unnecessary.

As a few others have indicated, I really don't like the proposed rule as written. I very much want to continue to write:

if (...)
  for (...)
    ...;

and

if (...)
  for (...) {
    ...;
    ...;
  }

I don't think there is any interesting ambiguity in these cases, and the braces seem to add clutter with little value.

I'd like a rule that strikes at the heart of the issue: use braces when they make the structure more clear. But I understand that this is subjective and we really should have a rule that is relatively mechanical for consistency's sake and gets close enough to the spirit of this to be a good fit.

Here is my attempt:


Omit braces when there is no ambiguity due to nesting and else statements.

Common cases of this are flagged with compiler warnings:

if (...)
  if (...)
    ...;
  else // warning: ambiguous else!
    ...;

Even if the ambiguity happens to dodge the compiler warnings or other checking mechanism, you should still use braces:

if (...)
  for (...)
    if (...)
      ...;
    else // Still bad!
      ...;

I like this (or something like this) formulation because:

  1. It doesn't talk about number of lines. Do comments count as lines? what if the *condition* is multiple lines but the statement isn't? I don't like using line count here.
  1. It precisely matches what compilers are trying to warn on already.
  1. It gives grounds for a reviewer to argue that a highly confusing pattern, even if it doesn't literally result in ambiguity, is still sufficiently opaque as to require curlies. We could add some examples here if this actually comes up.

The primary alternative I see is tweaking the above to also require braces on all surrounding constructs once they are required for one. From discussions with Richard Smith, I think this is more common in Clang's code whereas the formulation I suggested is more common in LLVM code.

The reason I (quite strongly) prefer the formulation I proposed is in no small part because clang-format has completely removed the failure mode where I mis-indent things. As a consequence, I prefer the more minimal & brief approach to braces. I feel like it balances clarity and brevity well.

If we can't agree on a single consistent style between these two alternatives, then perhaps we end up saying both are acceptable and letting code reviewers choose, but I would generally prefer having clear guidance here.

My two cents.
-Chandler

Not only, this is part of the "consistency" you mentioned in your second sentence.

I meant consistency with the surrounding code---that is much less subjective.

If we can't agree on a single consistent style between these two alternatives, then perhaps we end up saying both are acceptable and letting code reviewers choose, but I would generally prefer having clear guidance here.

I agree with this, except that I'd say "letting the authors choose". There is more than one way to make code fit in with the rest and be readable.

Not only, this is part of the "consistency" you mentioned in your second sentence.

I meant consistency with the surrounding code---that is much less subjective.

Then I have a strong disagreement on this:

  1. clang-format totally changed the way I deal with formatting, and having rules that can be taught to an automatic tool removes any subjectivity and discussion about the formatting.
  2. What makes the code "clear" and "readable" beyond the various tastes of contributors, *is* consistency across the codebase. Less brain power used to adapt to different styles is a non-negligible productivity gain to me.

Then I have a strong disagreement on this:

  1. clang-format totally changed the way I deal with formatting, and having rules that can be taught to an automatic tool removes any subjectivity and discussion about the formatting.
  2. What makes the code "clear" and "readable" beyond the various tastes of contributors, *is* consistency across the codebase. Less brain power used to adapt to different styles is a non-negligible productivity gain to me.

My concern about automatic formatting is that it may damage context-specific formatting. Consider this example:

unsigned FlagAB = FlagAlpha | FlagBravo;
unsigned FlagAC = FlagAlpha             | FlagCharlie;
unsigned FlagBC =             FlagBravo | FlagCharlie;

Here the spacing is intended to separate the individual flags in columns, and doing do may improve readability of this piece of code, which could otherwise appear cluttered. Automatic formatting could remove all of it, since it is not aware of the author's intent. I know this is a trivial example, but I hope that it illustrates the concept.

In my experience, readability is affected to different degrees by different formatting elements. Some factors are highly important (like spacing, e.g (((x-1)|(x+1)*(x-2))+x)+(x-1)*(x+2) vs (((x-1) | (x+1)*(x-2)) + x) + (x-1)*(x+2), which makes a big difference in visual parsing), others are less so. While I agree with the insistence on the important ones, I prefer to be lax about the others. Small local differences are a natural consequence of having a large number of contributors and the evolution of the formatting guidelines themselves. If at one point we decided that single-line if/for/while statements should also be bracketed, would we require that in a new code added to a file that consistently didn't have them? If so, it would be inconsistent with the existing code, if not, it would contravene the guidelines. Sweeping changes throughout the code base are a real pain for out-of-tree developers, so hopefully that wouldn't be considered. IMO, it's best to allow a degree of flexibility and just live with it.

My concern about automatic formatting is that it may damage context-specific formatting. Consider this example:
[...]

For specific cases (formatting table, etc.), you can disable clang-format by marking the begin/end of the section with a comment, see for example https://github.com/llvm-mirror/lldb/blob/master/source/Commands/CommandObjectType.cpp#L101

If at one point we decided that single-line if/for/while statements should also be bracketed, would we require that in a new code added to a file that consistently didn't have them? If so, it would be inconsistent with the existing code, if not, it would contravene the guidelines.

I *always* clang-format the statement I change in my patch, irrespectively of the rest of the file. Hopefully with time we'll touch enough lines that non-conformant code will slowly disappear.
I've also seen people clang-formatting a file as a whole as a pre-step before performing large changes to it.

Sweeping changes throughout the code base are a real pain for out-of-tree developers, so hopefully that wouldn't be considered.

We are quite consistent about not limiting ourselves upstream to "please" out-of-tree developers, I wouldn't take this into consideration here either. (See also for example the recent massive clang-formatting change in LLDB, where they reformatted the full codebase in a single commit.)

jlebar abandoned this revision.Feb 11 2020, 7:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2020, 7:31 AM