This is an archive of the discontinued LLVM Phabricator instance.

[clang] adds warning to alert user when they use alternative tokens in declarations
AbandonedPublic

Authored by cjdb on Aug 2 2021, 10:03 AM.

Details

Reviewers
aaron.ballman
Summary

Although the alternative tokens can be used in place of symbols outside
of bitwise/logical expressions, this is arguably feature misuse in
contemporary programming (e.g. int bitand does not communicate that
the type is a reference-to-int). As of this commit, Clang will warn—by
default—when a programmer attempts to declare something using
alternative tokens in place of symbols.

Diff Detail

Unit TestsFailed

Event Timeline

cjdb requested review of this revision.Aug 2 2021, 10:03 AM
cjdb created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2021, 10:03 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
cjdb added inline comments.Aug 2 2021, 10:16 AM
clang/test/Parser/warn-declare-references-with-symbols.cpp
3

I'm pretty confident there's a lit way for me to test specific lines with specific C++ modes, but I can't find it :(

aaron.ballman added inline comments.Aug 2 2021, 10:45 AM
clang/include/clang/Basic/DiagnosticParseKinds.td
1552

Pretty sure you don't need DefaultWarn here (I think you'd use that for warnings that are off by default, like ones spelled with Extension instead of Warning or ExtWarn).

clang/lib/Parse/ParseDecl.cpp
5862–5864

For consistency with the Lvalue case above.

clang/test/Parser/warn-declare-references-with-symbols.cpp
3

Are you thinking of -verify=prefix and // prefix-warning {{whatever}} combinations?

cjdb added inline comments.Aug 2 2021, 11:18 AM
clang/lib/Parse/ParseDecl.cpp
5862–5864

I originally had Rvalue, but this is technically diagnosing both rvalue refs and forwarding refs (hence the awkward name). I guess it doesn't really matter at the parsing stage though, so this is just commentary.

clang/test/Parser/warn-declare-references-with-symbols.cpp
3

I thought it was possible to do have a RUN line where C++98 mode would only check lines 4-17, for example.

aaron.ballman added inline comments.Aug 2 2021, 11:26 AM
clang/lib/Parse/ParseDecl.cpp
5862–5864

Oh, good point on forwarding refs. Maybe change the previous one to Amp in that case?

clang/test/Parser/warn-declare-references-with-symbols.cpp
3

Ah, I usually use #if __cplusplus < NNNNNN in the code for that sort of thing.

cjdb updated this revision to Diff 363546.Aug 2 2021, 12:31 PM
cjdb marked 5 inline comments as done.
cjdb edited the summary of this revision. (Show Details)

applies feedback

clang/test/Parser/warn-declare-references-with-symbols.cpp
3

Thanks!

Not a huge objection - but minor quandry: What's the motivation for this patch? I don't know of any codebase that encourages/uses the alternative tokens and I wonder if adding more usability to them is a worthwhile investment in clang's codebase complexity, etc.

cjdb added a comment.EditedAug 2 2021, 1:25 PM

Not a huge objection - but minor quandry: What's the motivation for this patch? I don't know of any codebase that encourages/uses the alternative tokens and I wonder if adding more usability to them is a worthwhile investment in clang's codebase complexity, etc.

There are codebases that use them (all of my non-Google, non-LLVM code does, for example, and I'm not the sole user: just a loud one who's also in a position to patch tooling). The motivation for this warning is essentially to catch someone who misunderstands/misreads the diagnostic of what's currently in D107294.

Not a huge objection - but minor quandry: What's the motivation for this patch? I don't know of any codebase that encourages/uses the alternative tokens and I wonder if adding more usability to them is a worthwhile investment in clang's codebase complexity, etc.

There are codebases that use them (all of my non-Google, non-LLVM code does, for example, and I'm not the sole user: just a loud one who's also in a position to patch tooling).

Ah, any pointers to large open source projects that use this?

The motivation for this warning is essentially to catch someone who misunderstands/misreads the diagnostic of what's currently in D107294.

Ah, OK - I see some of the usual objections are being voiced over there so I'll leave it over there/subscribe there to follow that discussion. Thanks!

Not a huge objection - but minor quandry: What's the motivation for this patch? I don't know of any codebase that encourages/uses the alternative tokens and I wonder if adding more usability to them is a worthwhile investment in clang's codebase complexity, etc.

There are codebases that use them (all of my non-Google, non-LLVM code does, for example, and I'm not the sole user: just a loud one who's also in a position to patch tooling).

Ah, any pointers to large open source projects that use this?

Large OSS? No, sorry. range-v3 used to use them, but it was having issues integrating with Folly (I think) and @eric_niebler had to abandon using them.

@dblaikie wrote:

Not a huge objection - but minor quandry: What's the motivation for this patch?

AIUI, this diagnostic is a PR stunt, similar to Clang's existing -Wunicode-homoglyph:

test.cpp:3:13: warning: treating Unicode character <U+037E> as identifier character rather than as ';' symbol [-Wunicode-homoglyph]
    return 0;
            ^

It's not that anyone would ever intentionally write code like void print(const std::string bitand s); it's that if someone copy-pastes such code from Reddit into Clang, Clang will helpfully explain the trick to them. I'm ambivalent as to whether this is worth it; but the patch is certainly tiny enough that it's not obviously not worth it...

clang/include/clang/Basic/DiagnosticParseKinds.td
1552

I suggest:
"use '%select{&|&&}0' when declaring an %select{lvalue|rvalue or forwarding}0 reference"
Besides the minor grammar tweak, this should avoid the need to pass the same argument twice (as both argument 0 and argument 1). It's possible that this ends up not working for some technical reason, but I'd be surprised if it didn't Just Work.

clang/lib/Parse/ParseDecl.cpp
5862–5864

In both cases, I'd write

Diag(Loc, diag::warn_declare_references_with_symbols)
            << (Kind == tok::ampamp);

or

<< static_cast<int>(Kind == tok::ampamp);

only if absolutely necessary (e.g. because the former doesn't build, or if it runtime-errors).

Not a huge objection - but minor quandry: What's the motivation for this patch? I don't know of any codebase that encourages/uses the alternative tokens and I wonder if adding more usability to them is a worthwhile investment in clang's codebase complexity, etc.

To me, the motivation comes from a signature like: void foo(int and); because it looks reasonable until you realize in C++ it means void foo(int &&); rather than declaring a parameter name of type int. What's worse, it can come up naturally in header files because it's a valid declaration in C if iso646.h is not included. So it's plausible to write the signature in a library exposing a C interface that changes the function signature when compiled in C++ -- this warning helps alert the programmer to such a situation.

I'm not aware of any popular coding guidelines that suggest using alternative tokens, but their use does show up in the wild.

Not a huge objection - but minor quandry: What's the motivation for this patch? I don't know of any codebase that encourages/uses the alternative tokens and I wonder if adding more usability to them is a worthwhile investment in clang's codebase complexity, etc.

There are codebases that use them (all of my non-Google, non-LLVM code does, for example, and I'm not the sole user: just a loud one who's also in a position to patch tooling).

Ah, any pointers to large open source projects that use this?

https://codesearch.isocpp.org/cgi-bin/cgi_ppsearch?q=bitand&search=Search

(Searching for 'and' is a bit less useful because of how much it shows up in assembly, comments, etc.)

Not a huge objection - but minor quandry: What's the motivation for this patch? I don't know of any codebase that encourages/uses the alternative tokens and I wonder if adding more usability to them is a worthwhile investment in clang's codebase complexity, etc.

There are codebases that use them (all of my non-Google, non-LLVM code does, for example, and I'm not the sole user: just a loud one who's also in a position to patch tooling).

Ah, any pointers to large open source projects that use this?

https://codesearch.isocpp.org/cgi-bin/cgi_ppsearch?q=bitand&search=Search

(Searching for 'and' is a bit less useful because of how much it shows up in assembly, comments, etc.)

Ah, cool. Only case I could find there (that wasn't C code or compiler test cases) was something called FuzzyLite (which looks like it hasn't been touched in 4 years or so).

I don't fundamentally object to this now it's being proposed as a clang-tidy thing - bar should be low/easy for experiments, getting user experience, adoption, etc.

Not a huge objection - but minor quandry: What's the motivation for this patch? I don't know of any codebase that encourages/uses the alternative tokens and I wonder if adding more usability to them is a worthwhile investment in clang's codebase complexity, etc.

There are codebases that use them (all of my non-Google, non-LLVM code does, for example, and I'm not the sole user: just a loud one who's also in a position to patch tooling).

Ah, any pointers to large open source projects that use this?

https://codesearch.isocpp.org/cgi-bin/cgi_ppsearch?q=bitand&search=Search

(Searching for 'and' is a bit less useful because of how much it shows up in assembly, comments, etc.)

Ah, cool. Only case I could find there (that wasn't C code or compiler test cases) was something called FuzzyLite (which looks like it hasn't been touched in 4 years or so).

I suspect that bitand is used less than and because I believe bitwise operations are less common than logical ones.

I don't fundamentally object to this now it's being proposed as a clang-tidy thing - bar should be low/easy for experiments, getting user experience, adoption, etc.

FWIW, my feeling is that *this* proposed patch is reasonable for Clang but the proposed patch for suggesting use of alternative tokens (or not) belongs in clang-tidy. This patch proposes a diagnostic that catches bugs and can be on by default without false positives, the other patch proposed diagnostics that were more about coding style.

Not a huge objection - but minor quandry: What's the motivation for this patch? I don't know of any codebase that encourages/uses the alternative tokens and I wonder if adding more usability to them is a worthwhile investment in clang's codebase complexity, etc.

There are codebases that use them (all of my non-Google, non-LLVM code does, for example, and I'm not the sole user: just a loud one who's also in a position to patch tooling).

Ah, any pointers to large open source projects that use this?

https://codesearch.isocpp.org/cgi-bin/cgi_ppsearch?q=bitand&search=Search

(Searching for 'and' is a bit less useful because of how much it shows up in assembly, comments, etc.)

Ah, cool. Only case I could find there (that wasn't C code or compiler test cases) was something called FuzzyLite (which looks like it hasn't been touched in 4 years or so).

I suspect that bitand is used less than and because I believe bitwise operations are less common than logical ones.

I don't fundamentally object to this now it's being proposed as a clang-tidy thing - bar should be low/easy for experiments, getting user experience, adoption, etc.

FWIW, my feeling is that *this* proposed patch is reasonable for Clang but the proposed patch for suggesting use of alternative tokens (or not) belongs in clang-tidy. This patch proposes a diagnostic that catches bugs and can be on by default without false positives, the other patch proposed diagnostics that were more about coding style.

Oh, sorry, got the two patches jumbled up. Right right.

Any data on a large codebase about whether this finds enough bugs* to be interesting? But yeah, I don't mind it terribly - seems low cost enough that I won't stand in the way of it if you're in favor.

  • for some value of bug - that's been a point of debate for a while, turning on warnings like parentheses by default is an example. If we have this on by default, for codebases where it fires at all, are we creating busywork for people to go and cleanup code that's weird, but wasn't actively harmful/likely to be causing real problems for them? If we are, then that feels like it's more a stylistic thing than a bug-finding warning.
cjdb added a comment.Aug 3 2021, 9:49 AM

<truncating previous conversation>

Yes, the intention is for this patch to remain a Clang warning since it's cheap and can be on by default.

rsmith added a subscriber: rsmith.Aug 3 2021, 10:58 AM

We should also warn on:

  • and or bitand used as a ref-qualifier in a member function declaration
  • xor used to declare a block pointer type or block pointer literal
  • bitand used as address-of and and used to form a GNU address-of-label
  • compl used to name a destructor

Of these, I think it's somewhat important to handle ref-qualifiers and probably block pointer types at around the same time that this patch lands (though it doesn't need to be part of this patch) for consistency among "type operators".

clang/include/clang/Basic/DiagnosticParseKinds.td
1551

Drop the "and forwarding" part; a forwarding reference is a kind of rvalue reference, and distinguishing them here will lead to confusion.

cjdb added a comment.EditedAug 3 2021, 12:41 PM

We should also warn on:

  • and or bitand used as a ref-qualifier in a member function declaration
  • xor used to declare a block pointer type or block pointer literal
  • bitand used as address-of and and used to form a GNU address-of-label
  • compl used to name a destructor

Of these, I think it's somewhat important to handle ref-qualifiers and probably block pointer types at around the same time that this patch lands (though it doesn't need to be part of this patch) for consistency among "type operators".

I'm happy to work on these patches too. I think this patch should concern itself with declarators, and a second patch should be for expressions. This does drastically expand the scope of the warning, so we should probably come up with a new name for it.

Edit:

To make it clearer, I think the breakdown of the two patches I have in mind reads as

# Patch 1: declarators

T bitand
T and
T xor
void C::f() bitand
void C::f() and
C::compl C

# Patch 2: expressions

xor {}
bitand x // warning will suggest std::addressof in C++ land
and label
cjdb updated this revision to Diff 363927.Aug 3 2021, 6:18 PM
cjdb retitled this revision from [clang] adds warning to alert user when they use alternative tokens as references to [clang] adds warning to alert user when they use alternative tokens in declarations.
cjdb edited the summary of this revision. (Show Details)

extends functionality to account for destructors, member functions, and block pointers

Patch 2: expressions

xor {}
bitand x // warning will suggest std::addressof in C++ land
and label

An additional expression to cover, not that I think anyone would be this awful by accident, is: foo->compl Foo(); // Pseudo-destructor call

One question I have about both declarations and expressions are whether we have an appetite to diagnose overloaded operators or not. Personally, I think it'd be reasonable to diagnose something like foo->operator bitand(); or operator not_eq(A, B); as expressions, but not reasonable to diagnose the declaration of the overloaded operators using alternative tokens.

clang/lib/Parse/ParseDecl.cpp
5830–5832

We typically either use comments for one-off uses, or we hoist this sort of thing to an enumeration associated with the diagnostic if we think the diagnostic will be used in enough places to warrant it. I think we should probably stick to comments for this one (e.g., Diag(Loc, diag::whatever) << /*frobble*/1;, but if you think a hoisted enum is a better approach I won't argue it.

(Same comment applies elsewhere in the patch.)

cjdb added a comment.Aug 12 2021, 10:39 PM

Patch 2: expressions

xor {}
bitand x // warning will suggest std::addressof in C++ land
and label

An additional expression to cover, not that I think anyone would be this awful by accident, is: foo->compl Foo(); // Pseudo-destructor call

Nice catch!

One question I have about both declarations and expressions are whether we have an appetite to diagnose overloaded operators or not. Personally, I think it'd be reasonable to diagnose something like foo->operator bitand(); or operator not_eq(A, B); as expressions, but not reasonable to diagnose the declaration of the overloaded operators using alternative tokens.

I agree that bool operator and(T, T); shouldn't be diagnosed on (and this patch's clang-tidy sibling will one day also diagnose that, but it's way off).

I think that foo->operator bitand() and operator not_eq(expr1, expr2) should only diagnose if foo->operator&() and operator!=(expr1, expr2) are diagnosed, and I think that should be a separate warning (I'm not saying that's a good or bad thing to do yet: let me sleep on that). I might be misunderstanding your intention though.

One question I have about both declarations and expressions are whether we have an appetite to diagnose overloaded operators or not. Personally, I think it'd be reasonable to diagnose something like foo->operator bitand(); or operator not_eq(A, B); as expressions, but not reasonable to diagnose the declaration of the overloaded operators using alternative tokens.

I agree that bool operator and(T, T); shouldn't be diagnosed on (and this patch's clang-tidy sibling will one day also diagnose that, but it's way off).

I think that foo->operator bitand() and operator not_eq(expr1, expr2) should only diagnose if foo->operator&() and operator!=(expr1, expr2) are diagnosed, and I think that should be a separate warning (I'm not saying that's a good or bad thing to do yet: let me sleep on that). I might be misunderstanding your intention though.

The situation I was thinking of was where the the declaration is for an operator&() function but the expression is calling operator bitand() (or vice versa), but 1) I don't feel strongly it needs to be diagnosed, mostly just exploring the shape of the diagnostic, and 2) I'd be fine if it was handled under a separate flag at some later date.

Quuxplusone added inline comments.Sep 4 2021, 7:27 AM
clang/include/clang/Basic/DiagnosticParseKinds.td
1553

@cjdb: I suggest splitting this diagnostic up, mainly for sanity's sake (it currently has 4x2x4 = 32 possible outputs, which is Too Much) but also to improve greppability (it took me five tries to find the source of the message "when declaring destructors" in this PR). I would do one diagnostic "use '%{&|&&}0' when declaring %{lvalue|rvalue}1 %{references|ref-qualified member functions}"; another diagnostic "use '~' when declaring destructors"; and a third "use '^' when declaring block pointers".

@aaron.ballman wrote:

One question I have about both declarations and expressions are whether we have an appetite to diagnose overloaded operators or not. Personally, I think it'd be reasonable to diagnose something like foo->operator bitand(); or operator not_eq(A, B); as expressions, but not reasonable to diagnose the declaration of the overloaded operators using alternative tokens.

AIUI, @cjdb is deferring all diagnostics related to expressions into a later PR, and I think that's reasonable. There's a subtlety to your example I can't tell if you intended: foo->operator bitand() is clearly wrong because the unary operator & being invoked there is not bitand; it's address-of! foo->operator bitand() will presumably fall into the same category as foo->compl Foo(). Whereas there's nothing so wrong with foo->operator not_eq(bar) or operator not_eq(foo, bar).

However, all that has at least two adjacent issues that I know are on @cjdb's radar: (1) auto transformed = foo bitor std::views::transform(bar) is "wrong", and in fact (2) any use of alternative tokens other than and, or, not is unstylish. If you diagnose all usage of bitand, bitor, compl, etc., then you don't get any of these subtle problems, because &&/||/! only have one meaning each (except for rvalue references, which is handled in this PR; and GCC-extension &&label, which should be easy to catch).

cjdb abandoned this revision.Apr 20 2023, 3:00 PM

I think there was a clang-tidy patch that handled this.

Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2023, 3:00 PM