This is an archive of the discontinued LLVM Phabricator instance.

[Sema] Extend -Wself-assign and -Wself-assign-field to warn on overloaded self-assignment (classes)
ClosedPublic

Authored by lebedev.ri on Mar 25 2018, 3:33 PM.

Details

Summary

This has just bit me, so i though it would be nice to avoid that next time :)
Motivational case:

https://godbolt.org/g/cq9UNk

Basically, it's likely to happen if you don't like shadowing issues,
and use -Wshadow and friends. And it won't be diagnosed by clang.

The reason is, these self-assign diagnostics only work for builtin assignment
operators. Which makes sense, one could have a very special operator=,
that does something unusual in case of self-assignment,
so it may make sense to not warn on that.

But while it may be intentional in some cases, it may be a bug in other cases,
so it would be really great to have some diagnostic about it...

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Mar 25 2018, 3:33 PM
Quuxplusone added inline comments.
lib/Sema/SemaExpr.cpp
12087 ↗(On Diff #139750)

I understand why x &= x and x |= x are mathematically special for the built-in types, but surely x -= x and x ^= x and x /= x are just as likely to indicate programmer error. I would be happy if Clang either took the philosophical stance "We will diagnose x = x but uniformly never x op= x," or else took the pragmatic stance "We will diagnose any x op= x or x op x that seems likely to be a programming bug." This "middle way" of warning only for &= and |= is bothersome to me.

I'm not sure you really need to put these in their own warning sub-group just because they're user-defined operators. That's especially true because it appears we already have divisions in the warning group based on the form of the l-value; we don't want this to go combinatorial.

lib/Sema/SemaExpr.cpp
12087 ↗(On Diff #139750)

I think "we want to diagnose anything that seems likely to be a programming bug" is already our policy here. It's inevitable that we'll overlook examples of that. I agree that we should apply this warning to at least -=, ^=, and /=.

I'm not sure you really need to put these in their own warning sub-group just because they're user-defined operators. That's especially true because it appears we already have divisions in the warning group based on the form of the l-value; we don't want this to go combinatorial.

Several reasons:

  • The initial -Wself-assign was intentionally implemented not to warn on overloaded operators. https://github.com/llvm-mirror/clang/commit/9f7a6eeee441bcbb1b17208cb3abd65a0017525a#diff-e0deb7b32f28507a3044a6bf9a63b515R31 (rL122804)
  • While it is an obvious bug when self-operation happens with builtin operators, i'm less certain of that with overloaded operators. If you happen to be routinely using self-assignment via oh-so-very-special overloaded operator=, and you don't like to have this diagnostic, you could just disable it, and not loose the coverage of the -Wself-assign-builtin. If it is all in one group, you can't do that...
  • Based on previous expirience, separate diag groups are good, see e.g D37620, D37629
  • I'm failing to find the original quote, but i think @rsmith said something along the "diag groups are cheap, use them". But i may as well be mis-remembering/having false memories here, sorry.

TLDR: if you insist, sure, i can just cram it into the already-existing -Wself-assign,
but i believe that is the opposite of what should be done, and is against the way it was done previously.

lib/Sema/SemaExpr.cpp
12087 ↗(On Diff #139750)

Ok, will extend.

lebedev.ri marked 3 inline comments as done.
  • Also warn on BO_DivAssign, BO_SubAssign, BO_XorAssign.
  • Add releasenotes entry.

I'm not sure you really need to put these in their own warning sub-group just because they're user-defined operators. That's especially true because it appears we already have divisions in the warning group based on the form of the l-value; we don't want this to go combinatorial.

Several reasons:

That's interesting. Maybe Chandler remembers the rationale for that?

  • While it is an obvious bug when self-operation happens with builtin operators, i'm less certain of that with overloaded operators.

It's certainly more plausible that the programmer has a good reason to self-assign with a user-defined operator, yeah.

If you happen to be routinely using self-assignment via oh-so-very-special overloaded operator=, and you don't like to have this
diagnostic, you could just disable it, and not loose the coverage of the `-Wself-assign-builtin`.
If it is all in one group, you can't do that...

True.

  • Based on previous expirience, separate diag groups are good, see e.g D37620, D37629
  • I'm failing to find the original quote, but i think @rsmith said something along the "diag groups are cheap, use them". But i may as well be mis-remembering/having false memories here, sorry.

Yeah, like I said, I'm just worried about the usability of having multiple dimensions of warning group here. Like, do we need both -Wself-assign-field-builtin and -Wself-assign-field-overloaded? Because I do think we should warn on field self-assignments for user-defined operators.

I think the term ought to be "user-defined" rather than "overloaded", by the way. In a sense, these operators aren't really any more overloaded than the builtin operators (at least, not necessarily so) — C++ even formalizes the builtin operators as being a large family of overloads. And the reason we want to treat them specially is that they have user-provided definitions that might be doing special things, not because of anything to do with name resolution.

TLDR: if you insist, sure, i can just cram it into the already-existing -Wself-assign,
but i believe that is the opposite of what should be done, and is against the way it was done previously.

I'm not going to insist. But I would like to hear if Chandler remembers why his patch didn't warn about user-defined operators.

Two more high-level comments from me, and then I'll try to butt out.

Q1. I don't understand what field- is doing in the name of some diagnostics. Is there a situation where x = x; is "less/more of an error" just because x is a {data member, lambda capture, structured binding} versus not? Why should the "plain old variable" case be separately toggleable? (This question pre-dates Roman's proposed patch AFAICT. I think the answer is that self-assignment specifically of a data member specifically in a constructor is considered "more of an error" because it's likely to be a very particular kind of typo-bug. I doubt this answer is quite strong enough to justify multiple warning options, though.)

Q2. I don't understand why x &= x (and now x /= x, thanks) should be treated as "more of an error" than x & x (and now x / x), or for that matter x == x. The sin here is not "self-assignment" per se; it's "over-complicating a tautological mathematical operation." I admit I haven't thought about it as much as you folks, but the more I think about it, the more I think the only consistent thing for Clang to do would be to back off from warning about anything beyond plain old x = x, and leave *all* the mathematical-tautology cases to checkers like clang-tidy and PVS-Studio.

Two more high-level comments from me, and then I'll try to butt out.

Q1. I don't understand what field- is doing in the name of some diagnostics. Is there a situation where x = x; is "less/more of an error" just because x is a {data member, lambda capture, structured binding} versus not? Why should the "plain old variable" case be separately toggleable? (This question pre-dates Roman's proposed patch AFAICT. I think the answer is that self-assignment specifically of a data member specifically in a constructor is considered "more of an error" because it's likely to be a very particular kind of typo-bug. I doubt this answer is quite strong enough to justify multiple warning options, though.)

I think we should ask @thakis that - rL159394

Q2. I don't understand why x &= x (and now x /= x, thanks) should be treated as "more of an error" than x & x (and now x / x), or for that matter x == x. The sin here is not "self-assignment" per se; it's "over-complicating a tautological mathematical operation." I admit I haven't thought about it as much as you folks, but the more I think about it, the more I think the only consistent thing for Clang to do would be to back off from warning about anything beyond plain old x = x, and leave *all* the mathematical-tautology cases to checkers like clang-tidy and PVS-Studio.

(oh please no pvs-studio stuff here too, it's getting a 'bit' spammy lately)

Yeah, like I said, I'm just worried about the usability of having multiple dimensions of warning group here. Like, do we need both -Wself-assign-field-builtin and -Wself-assign-field-overloaded?

I'm not saying that is not a valid concern. I'm simply following the pre-existing practice, which is, as far i'm aware, to split the diag groups if it makes sense.

Because I do think we should warn on field self-assignments for user-defined operators.

I agree, as i said, i did not want to put everything into one bit differential.

I think the term ought to be "user-defined" rather than "overloaded", by the way. In a sense, these operators aren't really any more overloaded than the builtin operators (at least, not necessarily so) — C++ even formalizes the builtin operators as being a large family of overloads. And the reason we want to treat them specially is that they have user-provided definitions that might be doing special things, not because of anything to do with name resolution.

Can you elaborate a bit more, given this struct, which of these variants should be diagnosed as user-provided, and which as builtin?

struct S {}; // certainly builtin?
struct S {
  S() = default; // builtin?
  S &operator=(const S &) = default; // builtin?
  S &operator=(S &) = default;  // builtin?

  S &operator=(const S &); // certainly user-provided?
  S &operator=(S &); // certainly user-provided?
};

TLDR: if you insist, sure, i can just cram it into the already-existing -Wself-assign,
but i believe that is the opposite of what should be done, and is against the way it was done previously.

I'm not going to insist. But I would like to hear if Chandler remembers why his patch didn't warn about user-defined operators.

+1

Two more high-level comments from me, and then I'll try to butt out.

Q1. I don't understand what field- is doing in the name of some diagnostics. Is there a situation where x = x; is "less/more of an error" just because x is a {data member, lambda capture, structured binding} versus not? Why should the "plain old variable" case be separately toggleable? (This question pre-dates Roman's proposed patch AFAICT. I think the answer is that self-assignment specifically of a data member specifically in a constructor is considered "more of an error" because it's likely to be a very particular kind of typo-bug. I doubt this answer is quite strong enough to justify multiple warning options, though.)

I think we should ask @thakis that - rL159394

Q2. I don't understand why x &= x (and now x /= x, thanks) should be treated as "more of an error" than x & x (and now x / x), or for that matter x == x. The sin here is not "self-assignment" per se; it's "over-complicating a tautological mathematical operation." I admit I haven't thought about it as much as you folks, but the more I think about it, the more I think the only consistent thing for Clang to do would be to back off from warning about anything beyond plain old x = x, and leave *all* the mathematical-tautology cases to checkers like clang-tidy and PVS-Studio.

It's fine to warn about really obvious tautologies in the compiler. We don't need to go much further than this, though.

Yeah, like I said, I'm just worried about the usability of having multiple dimensions of warning group here. Like, do we need both -Wself-assign-field-builtin and -Wself-assign-field-overloaded?

I'm not saying that is not a valid concern. I'm simply following the pre-existing practice, which is, as far i'm aware, to split the diag groups if it makes sense.

I agree. In general, I think this would be fine; my only concern is because we do already have some splitting along a different dimension, so we do need to stop and think about it. Maybe the answer is that

Because I do think we should warn on field self-assignments for user-defined operators.

I agree, as i said, i did not want to put everything into one bit differential.

I don't think it would over-complicate this patch to handle all the cases at once.

I think the term ought to be "user-defined" rather than "overloaded", by the way. In a sense, these operators aren't really any more overloaded than the builtin operators (at least, not necessarily so) — C++ even formalizes the builtin operators as being a large family of overloads. And the reason we want to treat them specially is that they have user-provided definitions that might be doing special things, not because of anything to do with name resolution.

Can you elaborate a bit more, given this struct, which of these variants should be diagnosed as user-provided, and which as builtin?

That's an interesting question! You're absolutely right, I do think we should warn about trivial C++ assignments as part of the builtin group. If a C program includes a self-assignment that happens to be of struct type, we'll warn about that, and it seems to be that we shouldn't lose the warning just because that code is compiled as C++ instead.

As for your example, I think it should be based on whether the assignment operator selected is trivial. If it's non-trivial, even if it's defaulted, we should warn about it as a "user-defined" operator, since ultimately it does involve calling such an operator.

Yeah, like I said, I'm just worried about the usability of having multiple dimensions of warning group here. Like, do we need both -Wself-assign-field-builtin and -Wself-assign-field-overloaded?

I'm not saying that is not a valid concern. I'm simply following the pre-existing practice, which is, as far i'm aware, to split the diag groups if it makes sense.

I agree. In general, I think this would be fine; my only concern is because we do already have some splitting along a different dimension, so we do need to stop and think about it. Maybe the answer is that

Sorry, I didn't finish this thought. Maybe we should just treat -Wself-assign-field as an accident of history; we can roll it generally into -Wself-assign and just allow that specific warning to be separately enabled if they want. Or if it isn't a very old flag, maybe we can just remove it entirely.

Ok, once we understand why "field" diag is separate, and why since initial implementation it only warned for "builtin" operators, i'll work on the code further.

That's an interesting question! You're absolutely right, I do think we should warn about trivial C++ assignments as part of the builtin group. If a C program includes a self-assignment that happens to be of struct type, we'll warn about that, and it seems to be that we shouldn't lose the warning just because that code is compiled as C++ instead.

As for your example, I think it should be based on whether the assignment operator selected is trivial. If it's non-trivial, even if it's defaulted, we should warn about it as a "user-defined" operator, since ultimately it does involve calling such an operator.

To summarize:
https://godbolt.org/g/gwDASe - A, B and C case should be treated as built-in, and D, E and F as user-defined.

Is D not considered trivial there? Ugh, C++.

Well, we certainly could justify treating D as a builtin operator as well, since it doesn't involve running any user code, but I don't think you're obligated to write a whole bunch of analysis just to make that work if the information isn't already easily accessible.

I tracked Chandler down, and he doesn't remember why user-defined operators are excluded.

I tracked Chandler down, and he doesn't remember why user-defined operators are excluded.

Ok then. Unless reviewers think it would be best to have separate diag groups for "builtin" and "user-provided" binary operators,
i'll just extend the -Wself-assign / -Wself-assign-field / -Wself-move (i wonder, does it work both the fields and usual variables, or not).

(That means, stage2 self-hosting testing will be needed...)

Historically Clang's policy on warnings was, I think, much more
conservative than it seems to be today. There was a strong desire not to
implement off-by-default warnings, and to have warnings with an
exceptionally low false-positive rate - maybe the user-defined operator
detection was either assumed to, or demonstrated to, have a sufficiently
high false positive rate to not meet that high bar.

(as for the flag splitting - that was sometimes done if the new variant of
a flag had enough bug-finding power that an existing codebase using the
existing flag behavior would need significant cleanup - by having the new
functionality under another flag name, existing codebases upgrading to a
newer compiler wouldn't be forced to either do all that cleanup up-front or
disable the flag & risk regressions... )

nikola resigned from this revision.Mar 26 2018, 3:36 PM
lebedev.ri retitled this revision from [Sema] Extend -Wself-assign with -Wself-assign-overloaded to warn on overloaded self-assignment (classes) to [Sema] Extend -Wself-assign and -Wself-assign-field to warn on overloaded self-assignment (classes).
lebedev.ri edited the summary of this revision. (Show Details)
  • Dropped separate -Wself-assign-builtin and -Wself-assign-overloaded groups
  • Simply updated the existing -Wself-assign and -Wself-assign-field.
  • Significantly improved test coverage.
  • ninja check-clang passes. Did not do stage2 build yet.
  • -Wself-assign-field is rather neglected - unlike -Wself-assign, it does incorrectly warn on self-operations with volatile stores, and macros.
rjmccall added inline comments.Mar 27 2018, 11:04 AM
lib/Sema/SemaExpr.cpp
12093 ↗(On Diff #139948)

I think doing this here can result in double-warning if the overload resolves to a builtin operator. Now, it might not actually be possible for that to combine with the requirements for self-assignment, but still, I think the right place to diagnose this for C++ is the same place we call DiagnoseSelfMove in CreateOverloadedBinOp.

Can CheckIdentityFieldAssignment just be integrated with DiagnoseSelfAssignment so that callers don't need to do call both?

lebedev.ri added inline comments.Mar 28 2018, 2:24 AM
lib/Sema/SemaExpr.cpp
12093 ↗(On Diff #139948)

I think the right place to diagnose this for C++ is the same place we call DiagnoseSelfMove in CreateOverloadedBinOp.

switch (Opc) {
case BO_Assign:
case BO_DivAssign:
case BO_SubAssign:
case BO_AndAssign:
case BO_OrAssign:
case BO_XorAssign:
  DiagnoseSelfAssignment(Args[0], Args[1], OpLoc);
  CheckIdentityFieldAssignment(Args[0], Args[1], OpLoc);
  break;
default:
  break;
}

// Check for a self move.
if (Op == OO_Equal)
  DiagnoseSelfMove(Args[0], Args[1], OpLoc);

^ That does not appear to work. Pretty much all these tests start to fail.

rjmccall added inline comments.Mar 28 2018, 11:44 AM
lib/Sema/SemaExpr.cpp
12093 ↗(On Diff #139948)

Okay. It's possible that my suggestion is wrong. Can you explain more how they fail?

lebedev.ri added inline comments.Mar 28 2018, 1:55 PM
lib/Sema/SemaExpr.cpp
12093 ↗(On Diff #139948)

Right, i should have been verbose :)
There are no test changes as compared to the current diff.
Here is the output of $ ninja check-clang-sema check-clang-semacxx


It is also totally possible that i'm missing something obvious on my end...

rjmccall added inline comments.Mar 28 2018, 3:04 PM
lib/Sema/SemaExpr.cpp
12093 ↗(On Diff #139948)

Oh, DiagnoseSelfAssignment disables itself during template instantiation, presumably because it's an easy syntactic check that will always warn on the parsed code and so doesn't need to warn again during instantiation.

In that case, I think the place you had the check is fine.

Will do stage2 testing next..

lib/Sema/SemaExpr.cpp
12093 ↗(On Diff #139948)

Am i correctly reading that as "ok, keep it as it is, in BuildOverloadedBinOp()" ?

12087 ↗(On Diff #139750)

Would it make sense to also check %=?
It's a direct counterpart to /=, so i guess it would?

rjmccall added inline comments.Mar 28 2018, 3:31 PM
lib/Sema/SemaExpr.cpp
12093 ↗(On Diff #139948)

Yes, I think that's the right place for it, given that it's basically designed to only fire during parsing.

We could also just move the check (in all cases) to ActOnBinOp, which is not called by template instantiation.

Ok, tried llvm stage2 build, and it failed as expected, in code that intentionally does self-assignment:
https://github.com/llvm-mirror/llvm/blob/1aa8147054e7ba8f2b7ea25daaed804662b4c6b2/unittests/ADT/DenseMapTest.cpp#L249-L250

[2/311 1.1/sec] Building CXX object unittests/ADT/CMakeFiles/ADTTests.dir/DenseMapTest.cpp.o
FAILED: unittests/ADT/CMakeFiles/ADTTests.dir/DenseMapTest.cpp.o 
/build/llvm-build-Clang-release/bin/clang++  -DGTEST_HAS_RTTI=0 -DGTEST_HAS_TR1_TUPLE=0 -DGTEST_LANG_CXX11=1 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iunittests/ADT -I/build/llvm/unittests/ADT -I/usr/include/libxml2 -Iinclude -I/build/llvm/include -I/build/llvm/utils/unittest/googletest/include -I/build/llvm/utils/unittest/googlemock/include -g0 -fPIC -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fcolor-diagnostics -ffunction-sections -fdata-sections -g0   -UNDEBUG  -Wno-variadic-macros -Wno-gnu-zero-variadic-macro-arguments -fno-exceptions -fno-rtti -MD -MT unittests/ADT/CMakeFiles/ADTTests.dir/DenseMapTest.cpp.o -MF unittests/ADT/CMakeFiles/ADTTests.dir/DenseMapTest.cpp.o.d -o unittests/ADT/CMakeFiles/ADTTests.dir/DenseMapTest.cpp.o -c /build/llvm/unittests/ADT/DenseMapTest.cpp
/build/llvm/unittests/ADT/DenseMapTest.cpp:250:11: error: explicitly assigning value of variable of type '(anonymous namespace)::DenseMapTest_AssignmentTest_Test::TypeParam' (aka 'gtest_TypeParam_') to itself [-Werror,-Wself-assign]
  copyMap = copyMap;
  ~~~~~~~ ^ ~~~~~~~
/build/llvm/unittests/ADT/DenseMapTest.cpp:265:11: error: explicitly assigning value of variable of type '(anonymous namespace)::DenseMapTest_AssignmentTestNotSmall_Test::TypeParam' (aka 'gtest_TypeParam_') to itself [-Werror,-Wself-assign]
  copyMap = copyMap;
  ~~~~~~~ ^ ~~~~~~~

The operator= for DenseMap/SmallDenseMap is user-provided, non-trivial:
https://github.com/llvm-mirror/llvm/blob/1aa8147054e7ba8f2b7ea25daaed804662b4c6b2/include/llvm/ADT/DenseMap.h#L693-L705
https://github.com/llvm-mirror/llvm/blob/1aa8147054e7ba8f2b7ea25daaed804662b4c6b2/include/llvm/ADT/DenseMap.h#L926-L938

So i guess this will have to be split after all.
I'll try what we have discussed (-Wself-assign, and -Wself-assign-nontrivial for non-trivial methods.)

Ok, tried llvm stage2 build, and it failed as expected, in code that intentionally does self-assignment:

Actually, this is not the sort of failure that I think should worry you. Any new warning has the ability to break -Werror builds; if I saw this failure in a build, I would totally consider it a project bug, not a false-positive.

In this case, you should just make sure there's a reasonable way to suppress the warning and then fix the code in LLVM by doing that. Explicitly casting the RHS to the right reference type should shut the compiler up.

  • Rebased
  • Created D45082 with llvm diff to prevent stage2 failure, and to disscuss the options. Similar diff will be needed at least for libc++ tests.
  • Rebased
  • Also test that *& contraption silences the warning.
rjmccall added inline comments.Mar 31 2018, 9:58 AM
docs/ReleaseNotes.rst
63 ↗(On Diff #140547)

Missing a final period. Also, these release notes serve as a kind of user guide, so I would recommend mentioning the suppression contraption here. Maybe something like "If you are doing such an assignment intentionally, e.g. in a unit test for a data structure, the warning can be suppressed by adding *& to the right-hand side or casting it to the appropriate reference type."

  • Revised release notes entry
  • Created D45128 with libcxx patch from stage-2 testing.
rjmccall accepted this revision.Mar 31 2018, 4:24 PM

Thanks. LGTM whenever you've cleared up the self-hosting problems.

This revision is now accepted and ready to land.Mar 31 2018, 4:24 PM
thakis added a comment.Apr 2 2018, 7:53 AM

Historically Clang's policy on warnings was, I think, much more
conservative than it seems to be today. There was a strong desire not to
implement off-by-default warnings, and to have warnings with an
exceptionally low false-positive rate - maybe the user-defined operator
detection was either assumed to, or demonstrated to, have a sufficiently
high false positive rate to not meet that high bar.

This is still the case. For a new warning, you should evaluate some large open-source codebase and measure true positive and false positive rate and post the numbers here.

(as for the flag splitting - that was sometimes done if the new variant of
a flag had enough bug-finding power that an existing codebase using the
existing flag behavior would need significant cleanup - by having the new
functionality under another flag name, existing codebases upgrading to a
newer compiler wouldn't be forced to either do all that cleanup up-front or
disable the flag & risk regressions... )

Historically Clang's policy on warnings was, I think, much more
conservative than it seems to be today. There was a strong desire not to
implement off-by-default warnings, and to have warnings with an
exceptionally low false-positive rate - maybe the user-defined operator
detection was either assumed to, or demonstrated to, have a sufficiently
high false positive rate to not meet that high bar.

This is still the case. For a new warning, you should evaluate some large open-source codebase and measure true positive and false positive rate and post the numbers here.

I did run it on stage-2 of LLVM itself. The only 'false-positives' so far are already present in trunk:
https://godbolt.org/g/SXc4Wd

void test_int () {
    int b;
    static_assert(noexcept(b &= b), "" );
}
void test_byte () {
    std::byte b;
    static_assert(noexcept(b &= (std::byte &)b), "" );
}
<source>:4:30: warning: explicitly assigning value of variable of type 'int' to itself [-Wself-assign]

    static_assert(noexcept(b &= b), "" );

                           ~ ^  ~

Should it really warn when in unevaluated context?
Dunno, but trunk already does that. Should that be changed?

Which other large codebase do you want me to evaluate, so we can talk facts?

(as for the flag splitting - that was sometimes done if the new variant of
a flag had enough bug-finding power that an existing codebase using the
existing flag behavior would need significant cleanup - by having the new
functionality under another flag name, existing codebases upgrading to a
newer compiler wouldn't be forced to either do all that cleanup up-front or
disable the flag & risk regressions... )

Right. I think it's fair to acknowledge that many data structure unit tests will contain a legitimate use of a user-defined self-assignment without feeling that that disqualifies the warning.

Note that the purpose of this kind of breadth testing is just to look for false positives. It shouldn't be discouraging if the warning doesn't find any real bugs in existing code.

EricWF added a subscriber: EricWF.Apr 2 2018, 3:28 PM

One thing I would like to see in this patch is ensuring the warning doesn't get generated when the expression appears in unevaluated contexts, such as decltype and noexcept. The warning is fundamentally about dataflow, but this doesn't apply to unevaluated expressions.
There are plenty of cases where a user might want to ask if assignment is well formed on noexcept using only one variable. For example:

template <class T> void foo(T& value) noexcept(noexcept(value = value)) { /* perform an assignment somewhere */ }

The warning is fundamentally about dataflow, but this doesn't apply to unevaluated expressions. There are plenty of cases where a user might want to ask if assignment is well formed on noexcept using only one variable. For example:

template <class T> void foo(T& value) noexcept(noexcept(value = value)) { /* perform an assignment somewhere */ }

My impression is that this case will not trigger the warning because the variable value is dependently typed.
I agree there should be a regression test for this exact case just to make sure of that.

@EricWF, is it important IYO that this warning not trigger in unevaluated contexts even for non-dependently-typed variables?
This is the case that seems to be coming up in practice in libc++ tests, but is hard to reason about because it's "only" deliberately contrived test code.

auto foo(std::exception& value)
    noexcept(noexcept(value = value)) // ok to diagnose?
    -> decltype(value = value)  // ok to diagnose?
{
    static_assert(noexcept(value = value));  // ok to diagnose?
}

No, the analysis is intentionally syntactic; it should apply even on dependently-typed arguments (but not get re-checked on instantiation). But I agree that the warning ought to be disabled in unevaluated contexts.

EricWF added a comment.Apr 2 2018, 4:40 PM

@EricWF, is it important IYO that this warning not trigger in unevaluated contexts even for non-dependently-typed variables?
This is the case that seems to be coming up in practice in libc++ tests, but is hard to reason about because it's "only" deliberately contrived test code.

auto foo(std::exception& value)
    noexcept(noexcept(value = value)) // ok to diagnose?
    -> decltype(value = value)  // ok to diagnose?
{
    static_assert(noexcept(value = value));  // ok to diagnose?
}

I don't think any of those cases should produce a warning. Self assignment doesn't actually take place, so it shouldn't produce a warning about a potential self-assignment.

lebedev.ri updated this revision to Diff 140787.Apr 3 2018, 7:47 AM
  • Rebased
  • Silence both of the diagnostics in an unevaluated context.
  • Fixed -Wself-assign-field preexisting problems:
    • False-positives on volatile stores.
    • Do not warn in template instantiations.
    • Handle all the macro cases (do not warn)
  • +Tests

I'm going to run it on some project next.

Historically Clang's policy on warnings was, I think, much more
conservative than it seems to be today. There was a strong desire not to
implement off-by-default warnings, and to have warnings with an
exceptionally low false-positive rate - maybe the user-defined operator
detection was either assumed to, or demonstrated to, have a sufficiently
high false positive rate to not meet that high bar.

This is still the case. For a new warning, you should evaluate some large open-source codebase and measure true positive and false positive rate and post the numbers here.

Just finished running it on chrome: (wow, that took a while!)

$ cat /tmp/test.cpp 
struct S {};

void test (S a) {
  a = a;
}
$ /build/llvm-build-Clang-release/bin/clang++ -c /tmp/test.cpp -Wall
/tmp/test.cpp:4:5: warning: explicitly assigning value of variable of type 'S' to itself [-Wself-assign]
  a = a;
  ~ ^ ~
1 warning generated.
$ ninja -C out/ClangStage2 chrome
ninja: Entering directory `out/ClangStage2'
[31309/31309] LINK ./chrome

Config:

So unless the config ^ is wrong, there have been no occurrences, no false-positives.

Down to one [trivial] prerequisite - D45082 - would be super nice if someone could review/accept it :)

This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.

This landing made our clang trunk bots do an evaluation of this warning :-P It fired 8 times, all false positives, and all from unit tests testing that operator= works for self-assignment. (https://chromium-review.googlesource.com/c/chromium/src/+/1000856 has the exact details) It looks like the same issue exists in LLVM itself too, https://reviews.llvm.org/D45082

Now tests often need warning suppressions for things like this, and this in itself doesn't seem horrible. However, this change takes a warning that was previously 100% noise-free in practice and makes it pretty noisy – without a big benefit in practice. I get that it's beneficial in theory, but that's true of many warnings.

Based on how this warning does in practice, I think it might be better for the static analyzer, which has a lower bar for false positives.

This landing made our clang trunk bots do an evaluation of this warning :-P It fired 8 times, all false positives, and all from unit tests testing that operator= works for self-assignment. (https://chromium-review.googlesource.com/c/chromium/src/+/1000856 has the exact details) It looks like the same issue exists in LLVM itself too, https://reviews.llvm.org/D45082

Right, i guess i only built the chrome binary itself, not the tests, good to know...

Now tests often need warning suppressions for things like this, and this in itself doesn't seem horrible. However, this change takes a warning that was previously 100% noise-free in practice and makes it pretty noisy – without a big benefit in practice. I get that it's beneficial in theory, but that's true of many warnings.

Based on how this warning does in practice, I think it might be better for the static analyzer, which has a lower bar for false positives.

Noisy in the sense that it correctly diagnoses a self-assignment where one intended to have self-assignment.
And unsurprisingly, it happened in the unit-tests, as was expected ^ in previous comments.
So far there are no truly false-positives noise (at least no reports of it).

We could help workaround that the way i initially suggested, by keeping this new part of the diag under it's own sub-flag,
and suggest to disable it for tests. But yes, that

"False positive" means "warning fires but didn't find anything
interesting", not "warning fires while being technically correct". So all
these instances do count as false positives.

clang tries super hard to make sure that every time a warning fires, it is
useful for a dev to fix it. If you build with warnings enabled, that should
be a rewarding experience. Often, this means dialing back a warning to not
warn in cases where it would make sense in theory when in practice the
warning doesn't find much compared to the amount of noise it generates.
This is why for example clang's -Woverloaded-virtual is usable while gcc's
isn't (or wasn't last I checked a while ago) – gcc fires always when it's
technically correct to do so, clang only when it actually matters in
practice.

brooksmoses added a subscriber: brooksmoses.EditedApr 15 2018, 3:07 PM

I have noticed two things when attempting to release LLVM with this revision internally at Google:

  1. It's catching real bugs, all in constructors where someone wrote "member_ = member_" when they meant "member_ = member".
  1. It's catching at least as many cases of tests where people are intentionally testing that self-assignment doesn't corrupt the data values.

Thus, this seems valuable but problematic, and the problems mean that initially we're facing turning off -Wself-assign completely until this is resolved. That's definitely an issue, and at least means that this needs to be placed under its own -W option. And the real bugs it's finding seem to be very specific, and could be found by a more-focused warning.

EDIT: Looks like "var = *&var" is supposed to avoid the warning, so my (now-deleted) complaints about this not having a source workaround are erroneous. I still find the fact that "var = (var)" doesn't suppress this to be surprising, but it's not critical.

Further note: I'm noticing that nearly all the signal is from -Wself-assign-field and all the noise is from -Wself-assign. So that would be one obvious way to make this higher-signal in what's enabled in -Wall, if that were a desire.

I have noticed two things when attempting to release LLVM with this revision internally at Google:

  1. It's catching real bugs, all in constructors where someone wrote "member_ = member_" when they meant "member_ = member".

Nice, just like the one that caused me to write this :)

  1. It's catching at least as many cases of tests where people are intentionally testing that self-assignment doesn't corrupt the data values.

Thus, this seems valuable but problematic, and the problems mean that initially we're facing turning off -Wself-assign completely until this is resolved. That's definitely an issue, and at least means that this needs to be placed under its own -W option. And the real bugs it's finding seem to be very specific, and could be found by a more-focused warning.

See all the previous disscussion about that earlier in the differential / mail thread..
I guess some consensus needs to be found, and then i'll implement it.

EDIT: Looks like "var = *&var" is supposed to avoid the warning, so my (now-deleted) complaints about this not having a source workaround are erroneous.

See release notes.

I still find the fact that "var = (var)" doesn't suppress this to be surprising, but it's not critical.

Some further statistics, now that I've done a full cleanup on our code:

8 actual bugs newly found by -Wself-assign-field. (Thank you!)
2 actual bugs newly found by -Wself-assign
6 instances of redundant code newly found by -Wself-assign
90 (approximately) instances of intentional self assignments in tests that now need "*&" annotations.

That seems like an awfully large amount of noise for the -Wself-assign part of this, and I continue feeling rather dubious about whether it should be part of -Wall. By contrast, the -Wself-assign-field part has been entirely true positives.

A further concern about this in the general case from the reviewer of one of my test-cleanup changes: The "var = *&var" idiom is not necessarily equivalent to "var = var" in cases of user-defined types, because operator& may be overloaded.

Re false-positives - at least two [post-]reviewers need to agree on the way forward (see previous comments, mail thread), and then i will implement it.

A further concern about this in the general case from the reviewer of one of my test-cleanup changes: The "var = *&var" idiom is not necessarily equivalent to "var = var" in cases of user-defined types, because operator& may be overloaded.

Release notes state:

If you are doing such an assignment intentionally, e.g. in a unit test for
a data structure, the warning can be suppressed by adding ``*&`` to the
right-hand side or casting it to the appropriate reference type.

So it could also be var = static_cast<decltype(var) &>(var);

These kinds of improvements to warnings are awesome, but the way we are deploying them presents serious challenges to adoption which I think we need to address.

I think significant new warning functionality should as a matter of course go behind some warning group at least initially. I'd be fine with this being a generic "-beta" group or some such, and I would be very happy with a clear timeline for merging some of these flags away. But I think we need to give users of Clang the ability to stage the deployment of a new compiler and the cleanup of their code more effectively.

Secondly, there seems to be some interest in splitting off OP= combined assignement-and-functionality warnings, but delays around getting consensus. While we wait, users (including us) are forced to work around a problematic warning. This is pretty disruptive. I think we need to be much more rapid it moving this into a separate flag while it is under discussion because we don't have anything like the above policy in place and well followed, we are currently facing a really bad choice: disable the *entire* warning and potentially regress our codebase, or commit really ugly hacks to work around a bad warning. We can probably make it through this for this warning, but we need to prioritize cleaning this up because it didn't go behind any kind of staging, and going forward i strongly think we need a better way of staging these kinds of improvements.

Note that this isn't novel -- the thread-safety warnings created exactly such a staging arrangement to help address these kinds of rollout issues.

There are several options:

  1. @rjmccall's idea: -wtest (lowercase), which in this case will disable that new code in BuildOverloadedBinOp(). i quite like it actually.
  2. split it up like i had in the first revision - `-Wself-assign-builtin, -Wself-assign-field-builtin; -Wself-assign-overloaded, -Wself-assign-field-overloaded`
    • we could just assume that BuildOverloadedBinOp() implies overloaded,
    • or check that the particular operator is non-trivial
  3. ???

@rjmccall, @thakis, @dblaikie, @aaron.ballman, @brooksmoses, @chandlerc

I'm going to go ahead and look into 1., since it does not seem there will be any consensus in a timely manner.

There are several options:

  1. @rjmccall's idea: -wtest (lowercase), which in this case will disable that new code in BuildOverloadedBinOp(). i quite like it actually.
  2. split it up like i had in the first revision - `-Wself-assign-builtin, -Wself-assign-field-builtin; -Wself-assign-overloaded, -Wself-assign-field-overloaded`
    • we could just assume that BuildOverloadedBinOp() implies overloaded,
    • or check that the particular operator is non-trivial
  3. ???

@rjmccall, @thakis, @dblaikie, @aaron.ballman, @brooksmoses, @chandlerc

I'm going to go ahead and look into 1., since it does not seem there will be any consensus in a timely manner.

... and here it is https://reviews.llvm.org/D45685, please take a look

cfe/trunk/test/SemaCXX/warn-self-assign-overloaded.cpp