This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] adds warning to suggest users replace symbols with words
AbandonedPublic

Authored by PiotrZSL on Aug 2 2021, 10:14 AM.

Details

Summary

C++'s alternative tokens are a good way to improve readability and
minimise bugs by expressing intention using words instead of symbols
(which can both be more easily misspelt _and_ misread).

This warning analyses the binary logical and bitwise operators and warns
if they aren't spelt using the alternative tokens. Bitwise compound
assignment and != operators are not in scope since there's nothing to
misspel or misread.

Future work:

  • add a way for the compiler to detect when the pipe operator (|) is not being used for a bitwise-or operation
  • add preprocessor support
  • add config options

Diff Detail

Event Timeline

cjdb requested review of this revision.Aug 2 2021, 10:14 AM
cjdb created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2021, 10:14 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

The usual community viewpoint on off-by-default diagnostics is that they're generally low value and not something that users will manually enable. Because I don't think this will ever be reasonable to enable by default due to it inherently being a matter of style, I think this functionality is better handled by clang-tidy as a check in the readability module. I would note that it should also be supported in C (C99 and later) by also inserting #include <iso646.h> using the usual fix-it utilities.

cjdb added a comment.Aug 2 2021, 11:11 AM

The usual community viewpoint on off-by-default diagnostics is that they're generally low value and not something that users will manually enable.

I was under the impression many warnings weren't enabled by default?

I think this functionality is better handled by clang-tidy

Ack. I've not touched clang-tidy yet; how different do you expect the code to look to what I've got right now?

The usual community viewpoint on off-by-default diagnostics is that they're generally low value and not something that users will manually enable.

I was under the impression many warnings weren't enabled by default?

We certainly have some! My understanding is that many of those were added before we realized most people don't actually enable off-by-default warnings, so it's only "recently" (within the past 5-10 years) that we've pushed back more consistently on off-by-default diagnostics (and even then, we still allow them sometimes; -Wfallthrough is a good example of that).

I think this functionality is better handled by clang-tidy

Ack. I've not touched clang-tidy yet; how different do you expect the code to look to what I've got right now?

Because you're doing this at the AST node level, I would expect it to be pretty easy. If you want to do it within preprocessor expressions (or other places where the information isn't in the AST node), it's still doable but might be a bit more complex.

cjdb updated this revision to Diff 363638.Aug 3 2021, 12:18 AM
cjdb retitled this revision from [clang] adds warning to suggest users replace symbols with alt tokens to [clang-tidy] adds warning to suggest users replace symbols with words.
cjdb edited the summary of this revision. (Show Details)

moves patch from Clang to clang-tidy

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 3 2021, 12:18 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/readability/UseAlternativeTokensCheck.h
5–6

This seems to be very old code, there was a licence change approx. 2 years ago.

20–22

But also there is a convention for the documentation comment for the main class of checks, and the code here should adhere to that.

clang-tools-extra/docs/clang-tidy/checks/readability-use-alternative-tokens.rst
14

What does this refer to? Perhaps a code example and a clearly indicated Limitations heading would be more clear to indicate whether someone would want to enable this check.

21–27

Either use plural xor singular for all of these printouts, to keep consistency.

cjdb added inline comments.Aug 3 2021, 11:36 AM
clang-tools-extra/clang-tidy/readability/UseAlternativeTokensCheck.h
5–6

Yikes, that's what I get for not reading what I'm copying. Should there be an NFC to go and update the other checks' licences too?

20–22

Not sure I'm following you here: are you suggesting I put the contents of my rst file in a comment here?

clang-tools-extra/docs/clang-tidy/checks/readability-use-alternative-tokens.rst
21–27

negation is the plural in this case.

cjdb updated this revision to Diff 363831.Aug 3 2021, 11:57 AM
cjdb edited the summary of this revision. (Show Details)
  • updates licences
  • updates documentation
cjdb updated this revision to Diff 363850.Aug 3 2021, 1:01 PM

removes obsolete code

aaron.ballman added inline comments.Aug 4 2021, 5:57 AM
clang-tools-extra/clang-tidy/readability/ReadabilityTidyModule.cpp
133–134

I think this might be a case where we want the check to either recommend using alternative tokens or recommend against using alternative tokens via a configuration option (so users can control readability in either direction). If you agree that's a reasonable design, then I'd recommend we name this readability-alternative-tokens to be a bit more generic. (Note, we can always do the "don't use alternative tokens" implementation in a follow-up patch if you don't want to do that work immediately.)

clang-tools-extra/clang-tidy/readability/UseAlternativeTokensCheck.cpp
54–55
clang-tools-extra/clang-tidy/readability/UseAlternativeTokensCheck.h
5–6

Should there be an NFC to go and update the other checks' licences too?

I think that'd be useful!

26

We can support C for this as well, can't we? iso646.h has been around since C99.

Quuxplusone added inline comments.
clang-tools-extra/clang-tidy/readability/UseAlternativeTokensCheck.cpp
42–43

~ and ! are not binary operations, right?

clang-tools-extra/docs/clang-tidy/checks/readability-use-alternative-tokens.rst
7–8

I strongly recommend splitting this check in two parts:

  • Use and, or, not for logical operations.
  • Use bitand, bitor, xor, compl for non-logical (bitwise) operations.

The reason I recommend this is that I have seen people (never in a real codebase, mind you, but at least bloggers and such) recommend and/or/not for logical operations; this is a long-standing convention in other languages such as Python and Perl. Whereas I have never seen anyone recommend e.g. (x bitand compl mask) over (x & ~mask). So coupling the two ideas together seems counterproductive, because even if someone wanted to use the Python style in their codebase, they wouldn't be able to enforce it using this check, because this check would be full of false positives related to the bitwise operators.
The current check's recommendation to replace (a || b) => (a or b), (a ^ b) => (a xor b) is particularly pernicious to non-language-lawyers, who might assume that or and xor represented the same "part of speech" — either both bitwise or both logical. I think this is a major motivation for the Pythonic style: use English for logical and or not, and use symbols for mathematical & | ^ ~ << >> &= |= ~= <<= >>=.

50–53

"Program composition" is not a term of art (except in the extremely general sense of composing programs, i.e. programming). I recommend retitling this section "Use of | as a pipe" or "Use of | with C++20 Ranges".

It might be worth adding a brief mention that the same is true of any minigame/DSL embedded within C++; for example, this clang-tidy check will also recommend changes like

qi::rule<std::string::iterator, boost::variant<int, bool>(),
    ascii::space_type> value = qi::int_ | qi::bool_;
                                        ^
warning: use 'bitor' for bitwise disjunctions

and

std::fstream file("hello.txt", std::ios::in | std::ios::out);
                                            ^
warning: use 'bitor' for bitwise disjunctions

which are probably not desirable for the user-programmer.

59

s/logical/bitwise/

cjdb updated this revision to Diff 364307.Aug 4 2021, 6:10 PM
cjdb marked 5 inline comments as done.
cjdb edited the summary of this revision. (Show Details)
  • renames check to 'readability-alternative-tokens'
  • adds section in documentation about configurability
  • adds support for C source
  • corrects whitespace handling in fixits
cjdb added inline comments.Aug 4 2021, 6:12 PM
clang-tools-extra/clang-tidy/readability/ReadabilityTidyModule.cpp
133–134

Hrmm.... I'll do the rename now, but this might be better off as a later patch. I'd rather focus on getting what I have right (along with my teased extensions) and then work on the opposite direction. That will (probably) be an easy slip-in.

clang-tools-extra/clang-tidy/readability/UseAlternativeTokensCheck.cpp
42–43

This confused me too. binaryOperation also takes into account all overloaded ops.

54–55

Thanks! This was one of the blockers for getting me C support.

clang-tools-extra/clang-tidy/readability/UseAlternativeTokensCheck.h
26

I was having difficulty in my C test for this, wimped out, and put it into the "I'll do it later" basket. Lemme double check that's the case?

clang-tools-extra/docs/clang-tidy/checks/readability-use-alternative-tokens.rst
7–8

As agreed with Aaron above, I'll be making this configurable at a later date.

50–53

I think "program composition" is used enough in contemporary programming for readers to understand the text as it is.

whisperity added inline comments.Aug 5 2021, 6:29 AM
clang-tools-extra/clang-tidy/readability/AlternativeTokensCheck.cpp
61 ↗(On Diff #364307)

What's N? It's not immediately apparent for me from the callsites of this function.

70–71 ↗(On Diff #364307)

Are we sure these will never return an invalid Loc, or dereference a null? Maybe it'd be worth to investigate, add an assertion (to help the people who might run static analysis on LLVM, if for nothing else), or an early return.

90–91 ↗(On Diff #364307)

Same comment as in the UnaryOperator case.

124–125 ↗(On Diff #364307)

Ditto. (Usually there might be issues if the location is coming from generated code or templates or macros or something... I fear this could be mostly prevalent in the user-defined operator case, i.e. this method.)

clang-tools-extra/clang-tidy/readability/AlternativeTokensCheck.h
20–21 ↗(On Diff #364307)

Following from gone thread due to file rename.

Not sure I'm following you here: are you suggesting I put the contents of my rst file in a comment here?

Not the entire RST, but the one-sentence or first-paragraph "pitch". For example, let's see bugprone-branch-clone's class's doc-comment:

/// A check for detecting if/else if/else chains where two or more branches are
/// Type I clones of each other (that is, they contain identical code), for
/// detecting switch statements where two or more consecutive branches are
/// Type I clones of each other, and for detecting conditional operators where
/// the true and false expressions are Type I clones of each other.
///
/// For the user-facing documentation see:
/// http://clang.llvm.org/extra/clang-tidy/checks/bugprone-branch-clone.html
class BranchCloneCheck : public ClangTidyCheck {

Or another one selected randomly, performance-no-automatic-move:

/// Finds local variables that cannot be automatically moved due to constness.
///
/// For the user-facing documentation see:
/// http://clang.llvm.org/extra/clang-tidy/checks/performance-no-automatic-move.html
class NoAutomaticMoveCheck : public ClangTidyCheck {

So there is a one-paragraph summary of the check itself (it could be shorter than here...), and there is a text that's generated from a template (I think add-new-check.py sets the new check's files as such when you run it), which basically just links the upstream official website render of your check's documentation HTML.

clang-tools-extra/clang-tidy/readability/ReadabilityTidyModule.cpp
133–134

(As someone who's had checkers on review for multiple years I've no hard feelings about the scheduling.)

But please do add a few FIXME: or TODO: or IDEA: or some similar comments to the code somewhere about the suggested follow-ups. (Just so they don't go away when this review is closed.)

clang-tools-extra/docs/clang-tidy/checks/readability-alternative-tokens.rst
6–7 ↗(On Diff #364307)

I think (for now, until the check is improved/extended) this first paragraph / oneliner added to the doccomment in the header shall suffice.

It's only to ensure that the automatically generated documentation for the check's class has a comment from which documentation readers can look into what the check actually does (from a user's standpoint), by reaching this file.

aaron.ballman added inline comments.Aug 5 2021, 8:19 AM
clang-tools-extra/clang-tidy/readability/ReadabilityTidyModule.cpp
133–134

Yeah, I wasn't trying to sign you up for the implementation effort, just making sure the name is somewhat more future-proofed. Adding a FIXME comment to the code to explain the potential next steps would be a good thing.

cjdb updated this revision to Diff 364555.Aug 5 2021, 11:05 AM
cjdb marked 10 inline comments as done.
  • adds TODOs
  • adds asserts
  • removes "imported" SourceManager (apparently there was already a pointer in Result)
  • fixes C++ documentation to match opening line of RST doc
cjdb marked an inline comment as done.Aug 5 2021, 11:07 AM
cjdb added inline comments.
clang-tools-extra/clang-tidy/readability/AlternativeTokensCheck.h
20–21 ↗(On Diff #364307)

Thanks for the comprehensive examples!

aaron.ballman added inline comments.Aug 11 2021, 8:50 AM
clang-tools-extra/clang-tidy/readability/AlternativeTokensCheck.cpp
51–54 ↗(On Diff #364555)
69 ↗(On Diff #364555)

Hrmm, I feel like std::isspace() is not going to properly handle all the Unicode whitespace, but I don't think we handle them all anyway as I notice the lexer is using isHorizontalWhitespace(), isVerticalWhitespace(), and isWhitespace() from CharInfo.h. Maybe we should use the same utility here just to match the lexer? (I don't feel strongly, just tickled my spidey senses.)

78–79 ↗(On Diff #364555)

Are you planning to do this as part of this patch (if not, should this switch to FIXME)?

85 ↗(On Diff #364555)

isLetter()? (or potentially isIdentifierHead() if we need to care about dollar-sign prefixed identifiers)

104 ↗(On Diff #364555)

Same here as above.

111 ↗(On Diff #364555)

Same here as above.

117–120 ↗(On Diff #364555)

I think these can be replaced with a helper function along these lines:

void emitDiag(const SourceManager &SM, SourceLocation Loc, StringRef Op, StringRef OpDesc, int Lookahead) {
  diag(Loc, ("use '" + Op + "' for " + OpDesc).str()) <<
    createReplacement(SM, Loc, Op, Lookahead) << includeIso646(SM, Loc);
}

WDYT?

144–148 ↗(On Diff #364555)

FIXMEs?

158–164 ↗(On Diff #364555)

Nit: given that these are only used once, I'd rather just spell out the case labels longhand.

clang-tools-extra/clang-tidy/readability/AlternativeTokensCheck.h
35 ↗(On Diff #364555)
clang-tools-extra/clang-tidy/readability/StrictConstCorrectness.h
9 ↗(On Diff #364555)

Did you mean to include this file in this review?

clang-tools-extra/clang-tidy/readability/StrictConstCorrectness.kpp
1 ↗(On Diff #364555)

Same for this one?

clang-tools-extra/docs/clang-tidy/checks/readability-alternative-tokens.rst
4 ↗(On Diff #364555)
clang/test/Analysis/diagnostics/readability-strict-const-correctness.cpp
1 ↗(On Diff #364555)

No idea how this managed to trigger a CI failure of line 1: fg: no job control :-D

cor3ntin added inline comments.Aug 12 2021, 11:09 AM
clang-tools-extra/clang-tidy/readability/AlternativeTokensCheck.cpp
69 ↗(On Diff #364555)

I just saw this comment randomly in my mail.
I agree we should not use isspace - it's also probably less efficient because of locale. And we don't want lexing to depend on the host locale.
And makes it easier to extend the set of supported whitespace if we ever do thatt

cjdb added inline comments.Aug 12 2021, 10:31 PM
clang-tools-extra/clang-tidy/readability/AlternativeTokensCheck.cpp
85 ↗(On Diff #364555)

Is this necessary in the given context? The reason I was confident in using isalpha is because we know this is an operator and the only values that this could possibly be are one of a, b, c, n, o, or x.

(cc @cor3ntin)

clang-tools-extra/clang-tidy/readability/StrictConstCorrectness.h
9 ↗(On Diff #364555)

Lol nope. My bad.

clang/test/Analysis/diagnostics/readability-strict-const-correctness.cpp
1 ↗(On Diff #364555)

How did these files even get into this branch!? Sorry for the noise.

cor3ntin added inline comments.Aug 12 2021, 11:40 PM
clang-tools-extra/clang-tidy/readability/AlternativeTokensCheck.cpp
85 ↗(On Diff #364555)

Definitively.
Both because trying to make assumption based on the current set of leading symbols in alternative token seems brittle to me.
But mostly because isalpha answers "is this byte interpreted in the encoding associated to the current locale a letter". in some scenario (clang running on zOS for example) this will never give the right answer,

I agree with Aaron: use isIdentifierHead()

But maybe I'd go further: Have you considered first checking for the opcode first, then the spelling?

MTC added a subscriber: MTC.Aug 16 2021, 6:40 PM
PiotrZSL commandeered this revision.Mar 31 2023, 9:09 AM
PiotrZSL added a reviewer: cjdb.
Herald added a project: Restricted Project. · View Herald Transcript
PiotrZSL abandoned this revision.Mar 31 2023, 9:09 AM

Obsolete by D144522

The for Pipe/BitwiseOr issue. One heuristic could be:

  • If it has any kind of template dependence don't diagnost
  • If its a BinaryOperator, its safe to diagnose.
  • If its a CxxOperatorCallExpr, the simplest case I can think of is to check the spelling of the operator (operator| vs operator bitor)
clang-tools-extra/clang-tidy/readability/AlternativeTokensCheck.cpp
69 ↗(On Diff #364555)

Is it not just wiser to always add the spaces and just let clang-format do its thing.

117–120 ↗(On Diff #364555)
diag("use '%0' for %1") << Op << OpDesc << Fixes;