Like most readability rules, it isn't absolute and there is a matter of taste
to it. I think more recent part of the project may be more consistent in the
current application of the guideline. I suspect sources like
mlir/lib/Dialect/StandardOps/IR/Ops.cpp may be examples of this at the moment.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
A few small nits, but I don't sufficiently care enough about the rule, so long as there IS one (hence my reason for opening this can of worms). I added @lattner since he requested that he be added to changes to the coding standards in the past.
llvm/docs/CodingStandards.rst | ||
---|---|---|
1582 | used when _a_ single-statement body | |
1611 | end in a period? | |
1624 | I'll note that this example has been VERY inconsistently enforced. Typically I've seen the rule in if/else chains be; "Once you START using braces, use them for the rest". | |
1628 | Same here, end in a full stop? |
Address minor comments
llvm/docs/CodingStandards.rst | ||
---|---|---|
1624 | I don't mind much either way on this, I'm trying to make it simpler here. Also there is rarely vertical noise going from: if (...) ... else { ... } to if (...) { ... } else { ... } (only when adding the { after the if goes over 80 chars (same for } in case of else if) |
llvm/docs/CodingStandards.rst | ||
---|---|---|
1575–1586 | To reduce ambiguity as to whether "everywhere" refers to using braces whenever possible within the bodies or just around the bodies themselves, I suggest: | |
1576 | s/Some examples/The examples/; | |
1579 | s/presence/the presence/; | |
1580 | s/when/that is/; | |
1607 | s/move/moved/; | |
1609 | The comment still refers to itself as being for an "else case". It would help if the comment better matches an example of a comment that cannot be moved above the if. A plausible scenario would be one where there is a long comment above the if that explains the need to check for a condition in order to do something special and a long comment within the if that explains the specific implementation. | |
1648 | s/more/there are more/; | |
1658 | Should there be an else here? Otherwise, I think this example is very similar to the one that demonstrates a "potential dangling else situation". |
This is looking directionally great, please ping me after hubert.reinterpretcast's comments are reviewed and resolved. Thanks!
llvm/docs/CodingStandards.rst | ||
---|---|---|
1657 | and after the for. Also, I'd recommend using size_t instead of 'int', and != instead of <. |
llvm/docs/CodingStandards.rst | ||
---|---|---|
1657 | Why the size_t here? I thought best practice was rather to write for loops with int? |
llvm/docs/CodingStandards.rst | ||
---|---|---|
1581 | Typo fix: s/the is/that is/; |
llvm/docs/CodingStandards.rst | ||
---|---|---|
1657 | size_t is technically correct and the best type to use. LLVM historically has used 'unsigned' for most counted loops (this was my fault) which isn't great for induction variable optimizations on 64-bit machines. "int" is better for 64-bit machines because UB-on-overflow permits the important transformations, but this is technically incorrect and there is no obvious reason to prefer it. |
llvm/docs/CodingStandards.rst | ||
---|---|---|
1657 |
An unsigned index might be relevant for 16-bit computers. When we come to 32-bit and 64-bit era, ability to index more than half of the memory seems pretty irrelevant... Defined modulo 2 behavior impedes optimization and breeds bugs like for (size_t i = 0; i < a.size() - 1; ++i). Using int is fine, though I acknowledge that unsigned has been used a lot in LLVM's code base. C++20 even introduced std::ssize to make signed indexes more of "first-class" (I believe it is http://wg21.link/p1227r2 ) |
llvm/docs/CodingStandards.rst | ||
---|---|---|
1657 |
How so? This contradicts most of what I read in "recent" C++ talk / publications, I summarized it here: http://lists.llvm.org/pipermail/llvm-dev/2019-June/132890.html |
llvm/docs/CodingStandards.rst | ||
---|---|---|
1657 | size_t is technically the correct type to use in all ways for an induction variable that starts at 0 and counts up. If you'd like to discuss this in depth, please (re) raise the discussion on llvm-dev. I can see some theoretical arguments for ssize_t, but int is inferior in every way I can see. Thanks! |
llvm/docs/CodingStandards.rst | ||
---|---|---|
1657 |
It seems to me that you're repeating yourself here, I'd be happy to be convinced but right now it does not seem that you're really providing me with actual information? The thread didn't build consensus one way or another, but most of the contributors voting for "unsigned" in the thread seemed to me to express a "personal preference" and I have yet to see a good technical explanation why size_t would be the obvious correct choice? |
llvm/docs/CodingStandards.rst | ||
---|---|---|
1657 |
Since I work on binary utilities/ELF a lot with James & Jake (who favored "unsigned"), I can probably comment on their preference. I agree that in ELF it is common to use an unsigned type: it is mostly because the binary formats use unsigned types a lot: Elf32_Section. In that area, an unsigned index does make sense. For many other loops in other areas where there is no strong preference for an unsigned index, to break the tie, a signed type does work better ((1) optimization (2) gotcha of algebra near zero). Zachary's comment seems to be more about concern about using an unsigned index in an area where it does not fit in. However, as a "breaking the tie" rule, I have a strong +1 to preferring unsigned. Anyhow, the updated revision sidesteps the potentially controversial topic by using for (int i : llvm::seq<int>(count)) |
llvm/docs/CodingStandards.rst | ||
---|---|---|
1657 |
Sorry, typo, I of course mean a strong +1 to preferring a signed type :) (unsigned is problematic and many C++ standard committee members consider it "wrong". Many other languages (e.g. Swift) use Int as well :) ) |
I'm not sure what the contention here is. int is a clearly inferior choice and is just as unprincipled as unsigned is, we should use the correct type.
I get the sense that you're arguing for ssize_t or one of its analogues instead of size_t? Are you arguing for general principles or this specific case? In this specific case, using size_t seems completely reasonable, but I agree that ssize_t would also be fine.
I'm ok with either size_t or ssize_t here, but if you want to turn this into a general principle that we apply to the code base, then it should be discussed on llvm-dev first.
ssize_t is inferior. POSIX.1-2017 says "The type ssize_t shall be capable of storing values at least in the range [-1, {SSIZE_MAX}]." The implementation may not even support -2.
We could also use intptr_t to avoid sign extension on a 64-bit machine.
It isn't clear to me, as I mention in my previous comment, I'd be happy to get more educated on this if you have resources you could point me to that would explain your point?
So far I based my reasoning on http://lists.llvm.org/pipermail/llvm-dev/2019-June/132890.html
(and Chandler here: http://lists.llvm.org/pipermail/llvm-dev/2019-June/133023.html )
Well sure, but C and POSIX are designed to support non-2's complement and other weird machines. LLVM wouldn't run in general on a machine that weird.
I'm sorry I didn't unpack that. int is inferior on standard LP64 machines because it doesn't express anywhere near the range of subscripts required by some large loops / array subscripts. While it is surely fine in some cases, it is not a 'safe default' that we could advice be used everywhere.
-Chris
s/Some examples/The examples/;