This is an archive of the discontinued LLVM Phabricator instance.

Clarify a bit the guideline on omitting braces, including more examples (NFC)
ClosedPublic

Authored by mehdi_amini on Jun 25 2020, 1:11 PM.

Details

Summary

Like most readability rules, it isn't absolute and there is a matter of taste
to it. I think more recent part of the project may be more consistent in the
current application of the guideline. I suspect sources like
mlir/lib/Dialect/StandardOps/IR/Ops.cpp may be examples of this at the moment.

Diff Detail

Event Timeline

mehdi_amini created this revision.Jun 25 2020, 1:11 PM

A few small nits, but I don't sufficiently care enough about the rule, so long as there IS one (hence my reason for opening this can of worms). I added @lattner since he requested that he be added to changes to the coding standards in the past.

llvm/docs/CodingStandards.rst
1582

used when _a_ single-statement body
?

1611

end in a period?

1624

I'll note that this example has been VERY inconsistently enforced. Typically I've seen the rule in if/else chains be; "Once you START using braces, use them for the rest".

1628

Same here, end in a full stop?

mehdi_amini marked 2 inline comments as done.

Address minor comments

llvm/docs/CodingStandards.rst
1624

I don't mind much either way on this, I'm trying to make it simpler here. Also there is rarely vertical noise going from:

if (...)
  ...
else {
  ...
}

to

if (...) {
  ...
} else {
  ...
}

(only when adding the { after the if goes over 80 chars (same for } in case of else if)

MaskRay added inline comments.
llvm/docs/CodingStandards.rst
1617

Add a space after for. ditto below for at least two other instances.

1657

Add a space after if

hubert.reinterpretcast added inline comments.
llvm/docs/CodingStandards.rst
1575–1586

To reduce ambiguity as to whether "everywhere" refers to using braces whenever possible within the bodies or just around the bodies themselves, I suggest:
[ ... ] chain does not consistently use braced bodies for either all or none of its members, [ ... ]

1576

s/Some examples/The examples/;

1579

s/presence/the presence/;

1580

s/when/that is/;

1607

s/move/moved/;

1609

The comment still refers to itself as being for an "else case". It would help if the comment better matches an example of a comment that cannot be moved above the if. A plausible scenario would be one where there is a long comment above the if that explains the need to check for a condition in order to do something special and a long comment within the if that explains the specific implementation.

1648

s/more/there are more/;

1658

Should there be an else here? Otherwise, I think this example is very similar to the one that demonstrates a "potential dangling else situation".

This is looking directionally great, please ping me after hubert.reinterpretcast's comments are reviewed and resolved. Thanks!

llvm/docs/CodingStandards.rst
1657

and after the for. Also, I'd recommend using size_t instead of 'int', and != instead of <.

mehdi_amini marked 12 inline comments as done.Jul 1 2020, 8:33 PM
mehdi_amini added inline comments.
llvm/docs/CodingStandards.rst
1657

Why the size_t here? I thought best practice was rather to write for loops with int?

Address formatting and other minor comments (no major change)

slight tweak on the for loop

llvm/docs/CodingStandards.rst
1581

Typo fix: s/the is/that is/;

lattner added inline comments.Jul 1 2020, 10:02 PM
llvm/docs/CodingStandards.rst
1657

size_t is technically correct and the best type to use. LLVM historically has used 'unsigned' for most counted loops (this was my fault) which isn't great for induction variable optimizations on 64-bit machines. "int" is better for 64-bit machines because UB-on-overflow permits the important transformations, but this is technically incorrect and there is no obvious reason to prefer it.

MaskRay added inline comments.Jul 1 2020, 10:57 PM
llvm/docs/CodingStandards.rst
1657

"int" is better for 64-bit machines because UB-on-overflow permits the important transformations, but this is technically incorrect and there is no obvious reason to prefer it.

An unsigned index might be relevant for 16-bit computers. When we come to 32-bit and 64-bit era, ability to index more than half of the memory seems pretty irrelevant... Defined modulo 2 behavior impedes optimization and breeds bugs like for (size_t i = 0; i < a.size() - 1; ++i). Using int is fine, though I acknowledge that unsigned has been used a lot in LLVM's code base. C++20 even introduced std::ssize to make signed indexes more of "first-class" (I believe it is http://wg21.link/p1227r2 )

mehdi_amini marked an inline comment as done.Jul 2 2020, 9:20 AM
mehdi_amini added inline comments.
llvm/docs/CodingStandards.rst
1657

size_t is technically correct and the best type to use.

How so? This contradicts most of what I read in "recent" C++ talk / publications, I summarized it here: http://lists.llvm.org/pipermail/llvm-dev/2019-June/132890.html

lattner added inline comments.Jul 2 2020, 9:21 PM
llvm/docs/CodingStandards.rst
1657

size_t is technically the correct type to use in all ways for an induction variable that starts at 0 and counts up. If you'd like to discuss this in depth, please (re) raise the discussion on llvm-dev. I can see some theoretical arguments for ssize_t, but int is inferior in every way I can see.

Thanks!

mehdi_amini marked an inline comment as done.Jul 2 2020, 10:28 PM
mehdi_amini added inline comments.
llvm/docs/CodingStandards.rst
1657

size_t is technically the correct type to use in all ways for an induction variable that starts at 0 and counts up.

It seems to me that you're repeating yourself here, I'd be happy to be convinced but right now it does not seem that you're really providing me with actual information?
I actually *already* brought up the topic on llvm-dev, as I linked in my previous comment, did you miss it?
My llvm-dev email is sourced and includes lengthy explanations, and Chandler also motivated it here: http://lists.llvm.org/pipermail/llvm-dev/2019-June/133023.html

The thread didn't build consensus one way or another, but most of the contributors voting for "unsigned" in the thread seemed to me to express a "personal preference" and I have yet to see a good technical explanation why size_t would be the obvious correct choice?

s/the is/that is/

mehdi_amini marked an inline comment as done.Jul 2 2020, 10:30 PM
MaskRay added inline comments.Jul 3 2020, 10:44 AM
llvm/docs/CodingStandards.rst
1657

The thread didn't build consensus one way or another, but most of the contributors voting for "unsigned" in the thread seemed to me to express a "personal preference" and I have yet to see a good technical explanation why size_t would be the obvious correct choice?

Since I work on binary utilities/ELF a lot with James & Jake (who favored "unsigned"), I can probably comment on their preference. I agree that in ELF it is common to use an unsigned type: it is mostly because the binary formats use unsigned types a lot: Elf32_Section. In that area, an unsigned index does make sense. For many other loops in other areas where there is no strong preference for an unsigned index, to break the tie, a signed type does work better ((1) optimization (2) gotcha of algebra near zero). Zachary's comment seems to be more about concern about using an unsigned index in an area where it does not fit in. However, as a "breaking the tie" rule, I have a strong +1 to preferring unsigned.

Anyhow, the updated revision sidesteps the potentially controversial topic by using for (int i : llvm::seq<int>(count))

MaskRay added inline comments.Jul 3 2020, 10:54 AM
llvm/docs/CodingStandards.rst
1657

However, as a "breaking the tie" rule, I have a strong +1 to preferring unsigned.

Sorry, typo, I of course mean a strong +1 to preferring a signed type :) (unsigned is problematic and many C++ standard committee members consider it "wrong". Many other languages (e.g. Swift) use Int as well :) )

I'm not sure what the contention here is. int is a clearly inferior choice and is just as unprincipled as unsigned is, we should use the correct type.

I get the sense that you're arguing for ssize_t or one of its analogues instead of size_t? Are you arguing for general principles or this specific case? In this specific case, using size_t seems completely reasonable, but I agree that ssize_t would also be fine.

I'm ok with either size_t or ssize_t here, but if you want to turn this into a general principle that we apply to the code base, then it should be discussed on llvm-dev first.

I'm not sure what the contention here is. int is a clearly inferior choice and is just as unprincipled as unsigned is, we should use the correct type.

I get the sense that you're arguing for ssize_t or one of its analogues instead of size_t? Are you arguing for general principles or this specific case? In this specific case, using size_t seems completely reasonable, but I agree that ssize_t would also be fine.

I'm ok with either size_t or ssize_t here, but if you want to turn this into a general principle that we apply to the code base, then it should be discussed on llvm-dev first.

ssize_t is inferior. POSIX.1-2017 says "The type ssize_t shall be capable of storing values at least in the range [-1, {SSIZE_MAX}]." The implementation may not even support -2.

We could also use intptr_t to avoid sign extension on a 64-bit machine.

mehdi_amini added a comment.EditedJul 5 2020, 3:09 PM

int is a clearly inferior choice

It isn't clear to me, as I mention in my previous comment, I'd be happy to get more educated on this if you have resources you could point me to that would explain your point?

So far I based my reasoning on http://lists.llvm.org/pipermail/llvm-dev/2019-June/132890.html
(and Chandler here: http://lists.llvm.org/pipermail/llvm-dev/2019-June/133023.html )

I'm not sure what the contention here is. int is a clearly inferior choice and is just as unprincipled as unsigned is, we should use the correct type.

I get the sense that you're arguing for ssize_t or one of its analogues instead of size_t? Are you arguing for general principles or this specific case? In this specific case, using size_t seems completely reasonable, but I agree that ssize_t would also be fine.

I'm ok with either size_t or ssize_t here, but if you want to turn this into a general principle that we apply to the code base, then it should be discussed on llvm-dev first.

ssize_t is inferior. POSIX.1-2017 says "The type ssize_t shall be capable of storing values at least in the range [-1, {SSIZE_MAX}]." The implementation may not even support -2.

We could also use intptr_t to avoid sign extension on a 64-bit machine.

Well sure, but C and POSIX are designed to support non-2's complement and other weird machines. LLVM wouldn't run in general on a machine that weird.

int is a clearly inferior choice

It isn't clear to me, as I mention in my previous comment, I'd be happy to get more educated on this if you have resources you could point me to that would explain your point?

I'm sorry I didn't unpack that. int is inferior on standard LP64 machines because it doesn't express anywhere near the range of subscripts required by some large loops / array subscripts. While it is surely fine in some cases, it is not a 'safe default' that we could advice be used everywhere.

-Chris

Use ssize_t instead of int in the example

Add an else in the last example

mehdi_amini marked 3 inline comments as done.Jul 14 2020, 2:01 PM

int is a clearly inferior choice

It isn't clear to me, as I mention in my previous comment, I'd be happy to get more educated on this if you have resources you could point me to that would explain your point?

I'm sorry I didn't unpack that. int is inferior on standard LP64 machines because it doesn't express anywhere near the range of subscripts required by some large loops / array subscripts. While it is surely fine in some cases, it is not a 'safe default' that we could advice be used everywhere.

Thanks for clarifying! (and sorry for the delay, too many other things in flight)

lattner accepted this revision.Jul 14 2020, 4:03 PM

This looks great to me, thanks!

This revision is now accepted and ready to land.Jul 14 2020, 4:03 PM
This revision was automatically updated to reflect the committed changes.