This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Self Assignment Checker
ClosedPublic

Authored by baloghadamsoftware on Apr 20 2016, 1:15 AM.

Details

Summary

This checker checks copy and move assignment operators whether they are protected against self-assignment. Since C++ core guidelines discourages explicit checking for `&rhs==this` in general we take a different approach: in top-frame analysis we branch the exploded graph for two cases, where &rhs==this and &rhs!=this and let existing checkers (e.g. unix.Malloc) do the rest of the work. It is important that we check all copy and move assignment operator in top frame even if we checked them already since self-assignments may happen undetected even in the same translation unit (e.g. using random indices for an array what may or may not be the same).

Diff Detail

Event Timeline

baloghadamsoftware retitled this revision from to Self Assignment Checker.
baloghadamsoftware updated this object.
baloghadamsoftware added a reviewer: dcoughlin.
xazax.hun retitled this revision from Self Assignment Checker to [analyzer] Self Assignment Checker.Apr 20 2016, 3:20 AM

Initial comments added to the checker and tests are converted from <CR><LF> (DOS) to <LF> (Unix) format.

dcoughlin edited edge metadata.May 17 2016, 9:09 AM

Thanks for the patch! This looks like it will catch some nasty bugs.

My comments (inline) are mostly nits. But two are more substantial: (1) you should add a path note where the self-assignment assumption is made explaining the assumption to the user and (2) the potential cost of the eager case split could be high -- can it be mitigated by changing the inlining mode?

lib/StaticAnalyzer/Checkers/SelfAssignmentChecker.cpp
13 ↗(On Diff #54968)

An interesting part of this checker is that it doesn't actually perform any checks itself. Instead, it helps other checkers finds bugs by causing the analyzer to explore copy and move assignment operators twice: once for when 'this' aliases with the parameter and once for when it may not.

I think it is worth mentioning this explicitly in the comment because it unusual. Most checkers don't work this way.

26 ↗(On Diff #54968)

What do you think about renaming this class to something like "CXXSelfAssignmentChecker"? I think name is a bit confusing because in Objective-C there is a notion of assigning to self (which is the ObjC equivalent of this).

lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
434

It would be good to add a comment explaining why we re-analyze these methods at the top level.

435

The LLVM style is to put spaces after if and around ||. http://llvm.org/docs/CodingStandards.html#spaces-before-parentheses

452

One thing I am concerned about is that this will introduce an eager case split, which can be quite expensive. We could potentially mitigate this cost by changing getInliningModeForFunction() to return ExprEngine::Inline_Minimal for the copy and move assignment operators when analyzing them at the top level after they have already been inlined elsewhere. Inline_Minimal would prevent most inlining in these methods and thus reduce cost, but would also reduce coverage (leading to false negatives). Do you have a sense of how often self-assignment issues arise in inlined calls vs. in the operators themselves?

(Also the spacing around && doesn't match LLVM style here.)

test/Analysis/self-assign-unused.cpp
21 ↗(On Diff #54968)

I think it is important to add a path note saying something like "Assuming '*this' is equal to 'rhs'".

To do this, you can add a new BugReporterVisitor subclass to BugReporterVisitors.h that walks a error path backwards to add the note. There is an example of how to do this in NonLocalizedStringBRVisitor in LocalizationChecker.cpp.

Unlike the localization checker, however, the path note for the self assignment checker will be added to errors generated by *other* checkers, so you should register the new bug report visitor for all reports in GRBugReporter::generatePathDiagnostic() (similar to NilReceiverBRVisitor).

22 ↗(On Diff #54968)

I think it would be good to add;

clang_analyzer_eval(*this == rhs); // expected-warning{{TRUE}} expected-warning{{UNKNOWN}}

here to explicitly test for the case split. You'll need to add a prototype void clang_analyzer_eval(int); to use this.

26 ↗(On Diff #54968)

It would be good to also test the move assignment operator.

test/Analysis/self-assign-used.cpp
1 ↗(On Diff #54968)

Can you combine these into one test file? This way someone updating the tests will know there is only one place to look.

baloghadamsoftware edited edge metadata.

Issues fixed.

baloghadamsoftware marked 3 inline comments as done.Jul 8 2016, 5:35 AM

Debug line removed.

lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
452

I made a quick measurement on our build server using clang as target code. Real/user/system times were 75/575/18 for minimal and 76/587/18 for regular. This does not seem too much. However, since the semantics of a copy assignment operator is often composed of the semantics of a destructor and the semantics of a copy constructor implementations often put these two functionalities into separate functions and call them from the destructor, copy constructor and copy assignment operator. Such implementations requires regular inlining to be checked.

I added tests for move assignment operators, but I could not find out any other simple test case than memory leak. However memory leaks are currently only detected by Unix.malloc for malloc. So I tried to replace strdup with malloc, strlen and strcpy, but even in this case leak bug is only reported for string used but not for string unused. That would require a more complex leak checker which we currently do not have.

NoQ added a subscriber: NoQ.Jul 11 2016, 8:53 AM

Other than adding expected notes for the path notes to the test, this looks good to me. Thanks Ádám!

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1698

This is great.

lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
452

Ok, makes sense to me.

test/Analysis/self-assign.cpp
1

I think it would be good to add -analyzer-output=text to this test line and check the expected output of that path notes.

baloghadamsoftware marked an inline comment as done.Jul 15 2016, 6:13 AM
dcoughlin accepted this revision.Jul 18 2016, 10:26 AM
dcoughlin edited edge metadata.
This revision is now accepted and ready to land.Jul 18 2016, 10:26 AM
This revision was automatically updated to reflect the committed changes.
dcoughlin reopened this revision.Jul 18 2016, 12:08 PM

Re-opening. Reverted in r275880. It was causing a failure on the bots: http://bb.pgr.jp/builders/cmake-clang-x86_64-linux/builds/51780

This revision is now accepted and ready to land.Jul 18 2016, 12:08 PM
baloghadamsoftware added a comment.EditedJul 19 2016, 1:15 AM

Do I use a non-portable way to concatenate strings? "Assuming rhs == *this" becomes "0*this" for some strange reason. I tested it again with the latest master branch and all tests are passing like earlier.

Here is the failing test output in case the bot log goes away:

  • TEST 'Clang :: Analysis/self-assign.cpp' FAILED ****

Script:

/home/bb/cmake-clang-x86_64-linux/build/./bin/clang -cc1 -internal-isystem /home/bb/cmake-clang-x86_64-linux/build/bin/../lib/clang/3.9.0/include -nostdsysteminc -std=c++11 -analyze -analyzer-checker=core,cplusplus,unix.Malloc,debug.ExprInspection /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp -verify -analyzer-output=text

Exit Code: 1

Command Output (stderr):

error: 'note' diagnostics expected but not seen:

File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 31: Assuming rhs == *this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 31: Assuming rhs == *this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 31: Assuming rhs != *this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 38: Assuming rhs == *this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 38: Assuming rhs != *this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 65: Assuming rhs == *this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 65: Assuming rhs == *this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 65: Assuming rhs != *this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 72: Assuming rhs == *this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 72: Assuming rhs != *this

error: 'note' diagnostics seen but not expected:

File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 31: 0*this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 31: 0*this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 31: 0*this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 38: 0*this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 38: 0*this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 65: 0*this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 65: 0*this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 65: 0*this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 72: 0*this
File /home/bb/cmake-clang-x86_64-linux/llvm-project/clang/test/Analysis/self-assign.cpp Line 72: 0*this

20 errors generated.

Thx, I checked the output, but I do not understand why a simple string concatenation fails in your test environment. It works on our build server (Linux) with the latest master trunk.

xazax.hun added inline comments.Jul 21 2016, 12:22 AM
cfe/trunk/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1738 ↗(On Diff #64346)

getName will return a StringRef here. Contatenating const char * and StringRef will give you a Twine. So Msg will be a twine which refers to temporary objects. This will result in a use after free. You shoud convert the result of the concatenation (the Twine) to a std::string, to copy the data and avoid use after free.

baloghadamsoftware edited edge metadata.
baloghadamsoftware removed rL LLVM as the repository for this revision.

Bug path string fixed.

This revision was automatically updated to reflect the committed changes.