StaticAnalyzer - MisusedMovedObjectChecker
ClosedPublic

Authored by szepet on Sep 6 2016, 3:07 AM.

Details

Summary

A checker implementation which checks for various cases when a moved-from object is potentially misused.
That means That means method calls on the object or copying it in moved-from state.

Diff Detail

Repository
rL LLVM
szepet retitled this revision from to StaticAnalyzer - MisusedMovedObjectChecker.Sep 6 2016, 3:07 AM
szepet updated this object.
szepet added reviewers: xazax.hun, zaks.anna, NoQ.
szepet added a subscriber: o.gyorgy.
szepet updated this revision to Diff 70389.Sep 6 2016, 5:12 AM

removed empty file which was added by mistake

The clang-tidy check that Gabor has pointed to has a lot of great test cases!

The main differences I see are:

  • It would be much easier to write a static analyzer check for this since the infrastructure for reasoning about ordered events is already in place.
  • I suspect one of the examples the clang-tidy check will not handle well is something like this:

    A a(42, 42.0); A b; if (cond) b = std::move(a); if (!cond) a.foo();

This is a classical example where path sensitivity is needed not to report a false positive.

  • I am guessing the check is intra-procedural as well.

(It would be great if someone could confirm these are actually true:))

lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp
32 ↗(On Diff #70389)

nit: Please define MoveStmt before the constructor.

119 ↗(On Diff #70389)

Please, add text diagnostic tests for the bug visitor.

139 ↗(On Diff #70389)

When is this condition hit? Please, add a test case.

149 ↗(On Diff #70389)

nit: Would be good to special case and capitalize the case where the ObjectName is not known.

179 ↗(On Diff #70389)

"Unspecified usage of a 'moved-from' object" -> "Usage of a 'moved-from' object"

180 ↗(On Diff #70389)

For consistency with other category names, how about "C++ move semantics"?

257 ↗(On Diff #70389)

Why is this necessary? Why using checkPointerEscape is not enough?

test/Analysis/MisusedMovedObject.cpp
49 ↗(On Diff #70389)

Adding "//no-warning" on lines where you explicitly test that a warning is not produced will simplify with readability and maintainability.

108 ↗(On Diff #70389)

How about improving the wording a bit: "Method call on a 'moved-from' object 'a'". Note, all diagnostic messages should be capitalized.

szepet updated this revision to Diff 73216.EditedOct 2 2016, 6:03 AM
szepet marked 8 inline comments as done.
  • Some refactor based on comments.
  • Updated testfiles.

I checked the tidy checker and I can confirm Anna's statements. To get in details:

  • tidy checker is intraprocedural, so doesn't find the interprocedural case in this testfile (and false positives on right reference calls)
  • mutually exclusive conditions in if statements results false positive
  • In addition, because of no semantic check it does not find the pointer and array examples in this testcase - false negativ
  • However it can find the misuse if it comes from the parameter evaluation order, for example:
foobar(std::move(a), a.getI()); //both checker find the problem
foobar(a.getI(), std::move(a)); //only tidy report a warning

That is the case where we have a false negative.

szepet added inline comments.Oct 2 2016, 6:12 AM
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp
257 ↗(On Diff #70389)

Based on my tests,checkPointerEscape works on only SymRegion but we needed some more generic approach. (In connection with MemRegions.)
It seems kind of hacking it into this checker and it's probably exactly that but I had no other ideas to implement it.

NoQ added inline comments.Oct 3 2016, 12:21 AM
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp
257 ↗(On Diff #70389)

As far as i understand, at a glance, this code invalidates the program state trait for objects that have escaped into a function.

This makes me think you should be taking check::RegionChanges here. See CStringChecker as an example. This callback would save you a large amount of code by presenting a list of superregions that cover all invalidated regions (well, you can't list all of them). It automatically handles regions pointed to by contents of invalidated regions etc., which you'd still want to handle.

check::PointerEscape is truly not enough because it handles only symbolic pointers, while you are also interested in concrete pointers (such as VarRegion).

There's a bit of a mess with these callbacks which we're trying to sort out (http://lists.llvm.org/pipermail/cfe-dev/2016-September/050792.html), but it shouldn't affect you.

szepet added inline comments.Oct 6 2016, 8:09 AM
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp
257 ↗(On Diff #70389)

I have tried this callback on some trivial examples.
It seems it solves this problem (function calls with no body), however, in cases when a constructor /destructor called, the callback provides only the members regions in the ExplicitRegions. In case of destructor the Regions contains the element's region, but unfortunately it is not always right to iterate over the whole Regions array.
For example: Destructor calls on an array element would cause invalidating the whole array.

So this callback can help on this one, and it would be structured invalidating instead of this hacking. However, the invalidating in the checkPreCall still needed. (constructorcall, destructorcall...)

Am I right or did I miss/misunderstand something?

NoQ added inline comments.Oct 7 2016, 3:47 AM
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp
262 ↗(On Diff #73216)

By the way, we have C.wasInlined, would it be more useful/accurate here?

257 ↗(On Diff #70389)

Hmm, i think i want to know more about that. Could you post a version of the patch with checkRegionChanges and the failing test, which you are trying, in case you already have it?

Destructor calls on an array element would cause invalidating the whole array.

Yyeeeah, that's right, we're always invalidating the whole base region. Essentially, you can get from one array element to another array element with a reliable pointer arithmetic (eg. "this - 1" in your case). And i agree that for your case this ultraconservative invalidation is definitely worth lifting.

in cases when a constructor /destructor called, the callback provides only the members regions in the ExplicitRegions

That's probably because the constructor/destructor was inlined, otherwise how would he know what members are there? In this case, there is actually no invalidation, just separate callbacks for every store bind of every member. And in this case...

However, the invalidating in the checkPreCall still needed. (constructorcall, destructorcall...)

... in this case, yeah, this is definitely needed. This is private logic of your checker, the analyzer couldn't have guessed that a perfectly inlined call, without any invalidations, may have such special meaning for your checker.

However, if the constructor/destructor wasn't inlined, then i expect the object to be present in explicit regions. If it isn't there, it's probably worth adding on the core side rather than hacking in checkers.

So in my opinion this leaves us with standard checkRegionChanges and special code for inlined constructors/destructors.

Please, add the tests from test/clang-tidy/misc-use-after-move.cpp so that we have better test coverage.

Peter, Do you plan on addressing the review comments or can we commander the revision?

I wasn't able to find time for this checker the last few weeks, but my schedule became much lighter and I plan to commit the required changes (and a bugfix) next week. Starting from next week I can work on the checker on a daily basis. Hopefully it can make it into 4.0.

szepet updated this revision to Diff 82469.Dec 25 2016, 5:58 AM
szepet marked 2 inline comments as done.

Test cases updated.
Some invalidation moved to checkRegionChanges callback.
Missing feature added: do not report an SubRegion if any of its SuperRegion is already reported

NoQ added a comment.Jan 19 2017, 4:03 AM

Thanks for looking into the region-changes approach, and sorry for the delay in replies.
I think we're close to seeing this checker merged, as it looks great.
A few smaller comments.

lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp
154 ↗(On Diff #82469)

Could we use the static PathDiagnosticLocation::getStmt(N) function for obtaining S? That'd be the statement of the program point of the node. This way we don't need to store the statement in the program state. Or are you looking for a different statement?

230 ↗(On Diff #82469)

Hmm. Is this not caught by checkDeadSymbols? I expect parameter regions to die eventually. It should not only catch regions themselves, but also regions to which they contain pointers.

478 ↗(On Diff #82469)

I definitely admire this trick. However, it only works when the object from which we move is "opaque" enough: for example, if at least one body of its setter methods is available in the current translation unit, we could easily see an arbitrary symbol inside the object (not necessarily having this object as its origin region, probably reassigned from other regions or even conjured), and the code will have no effect.

szepet added inline comments.Feb 6 2017, 3:22 AM
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp
230 ↗(On Diff #82469)

Lets consider the next case:

struct C {};

void f(C c)
{
  C b = std::move(c);
 //...
}

void g()
{
  C c;
  for(int i = 0;i<2;i++)
  {
    f(c);
  }
}

During the analysis of g() the checker will report that an already moved object will be copied. (Since it is passed by value).
So the main problem was these for loops and resulted false positives as well when i ran the checker on the llvm codebase.
The problem can be come from that tha analyzer doesnt see any difference between the call since not just the parameter but their location are the same and that cause the problem. This is reason why came up with that not so elegant solution.

Please correct me if I'm wrong or let me know in case you know a solution for that but I was not able to write checkDeadSymbols callback to handle that.

478 ↗(On Diff #82469)

Well I planned to remove this callback and only remained in the code by mistake. As I understand it is unnecessary since checkRegionChanges will handle these cases, I needed it just before it was implemented. So yes, you are right but this callback can (and should) be removed.

Do you agree?

szepet updated this revision to Diff 87218.Feb 6 2017, 6:25 AM
szepet marked an inline comment as done.

CheckPointerEscape callback removed since checkRegionChanges handle all the necessary cases.
Member MoveStmt removed from RegionState.
+Some typo fixes in comments.

lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp
154 ↗(On Diff #82469)

No, you are right. First I tried to compare the RegionStates so I the statements for operator== but then the code changed and I havent refactored this part.

NoQ added a comment.Feb 9 2017, 6:59 AM

This checker would need a bit of rebase before landing, but i think it's almost good to land. I'd even consider enabling by default after some evaluation, unless there are many known false positives - with an awesome test suite and very detailed bugreports, i think it's quite worth it.

lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp
230 ↗(On Diff #82469)

Hmm. Debugged this a bit.

Well, they are different objects. c from the old call and c from the new call are different objects, they may even have different addresses on the stack.

The analyzer should normally be noticing the difference between these two objects and represent them with different memory regions. Even though they're represented by VarRegions with same VarDecls, these regions should differ by having different super-regions, namely different StackArgumentsSpaceRegion instances, which, in turn, are different because they are built over different StackFrameContext instances.

But unfortunately in this case the StackFrameContext is the same, because it doesn't include the call site block count! So as long as the call site's statement is the same on subsequent calls, such as in the loop, we run into this problem.

This should be fixed on StackFrameContext side. D19979 should have addressed that, but it ran into problems and doesn't seem to be landing soon. For now, i guess i'd try to add a block count directly to StackFrameContext when i have time, it should be trivial. Then your checkEndFunction callback will be unnecessary.

I'm still curious why checkDeadSymbols doesn't kill these map items, but i guess we already have a bigger problem to tackle.

Could you add this test to your test set for now?

1 ↗(On Diff #87218)

This line needs some care (alignment, maybe a short description).

szepet updated this revision to Diff 88200.Feb 13 2017, 7:23 AM
szepet marked 2 inline comments as done.

Small infotext added and test cases updated. (More loop test to make sure the checkEndFunction workaround works fine.)

There is a CERT C++ Secure Coding Standard rule that is very similar to this check. I think you should mention this rule somewhere in the documentation or comments (but do not imply that this check conforms to that rule since it is not guaranteed that it finds all those issues).

The description of the rule also contains several useful exceptions that you can check whether your checker already has implemented: https://www.securecoding.cert.org/confluence/display/cplusplus/EXP63-CPP.+Do+not+rely+on+the+value+of+a+moved-from+object

NoQ accepted this revision.Feb 16 2017, 8:11 AM

I believe this checker is great and should be merged. Do you have commit access? If not, could you rebase the checker to the top of trunk? because the patch doesn't apply cleanly anymore.

This revision is now accepted and ready to land.Feb 16 2017, 8:11 AM

Great to see this converging? What is left before you would recommend turning this on by default?
You mention evaluating this on LLVM codebase. What were the results?

lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp
230 ↗(On Diff #82469)

But unfortunately in this case the StackFrameContext is the same, because it doesn't include the call site block count! So as long as the call site's statement is the same on subsequent calls, such as in the loop, we run into this problem.

This sounds scary! Can you file a bug/PR about it with a reduced test case.
Can we refer to the PR here if we think this code should change once the PR is fixed?

478 ↗(On Diff #82469)

Has this been resolved?

test/Analysis/MisusedMovedObject.cpp
230 ↗(On Diff #88200)

Can you please split these notes so that each one is on a separate line? You can use // expected-note@+1 to show that there is 1 line difference between the locations.

NoQ added a comment.Mar 7 2017, 2:24 AM

Hello,

I tested this checker on WebKit code. It seems to work as expected, and i liked the support for WTFMove(). However i suddenly realized that it is not always unsafe to use a moved-from object, as discussed in detail at, for example, http://stackoverflow.com/a/7028318 . The checker already includes a state-reset method list, however there are also methods that are generally safe to call and possibly useful (such as .size()) or methods that don't entirely reset the state but affect it (.resize()?). People may use these tricks to improve performance. We shouldn't flag these tricks in a checker that is enabled by default, because it'd make people change and regress working code.

I guess the solution is to come up with more detailed heuristics to figure out which methods are safe and which aren't - either larger lists of methods, or regex-based analysis over method names. Would you be willing to come up with those in a follow-up patch? We may also have a look eventually, because we believe this checker is useful :)

lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp
230 ↗(On Diff #82469)

Filed: http://bugs.llvm.org/show_bug.cgi?id=32163

Also recalled that fixing StackFrameContext is not enough to get rid of this problem entirely, because the inner function call is not bound to exist.

478 ↗(On Diff #82469)

Yep.

szepet added a subscriber: dkrupp.Mar 7 2017, 8:32 AM
szepet updated this revision to Diff 90978.Mar 7 2017, 6:58 PM
szepet marked an inline comment as done.

Done in this diff:

  • branch rebase
  • check comments in the test file separated to lines (only notes with the same messages are in the same line)

I have runned the checker on the LLVM source, it can be viewed here: http://cc.elte.hu:15001/#run=1 .
This checker reported 44 bugs, but let me summarize them:

  • 22 bug from the same location (/mnt/storage/szepet/dev/CodeChecker/llvm/include/llvm/Support/Error.h:841) which is a false positive and it was resulted by not tracking the bitfields (bool Unchecked : 1; ).
  • 6 from the toy.cpp-s and they are a true positve in my opinion (http://cc.elte.hu:15001/#run=1&report=351)
  • 5 from the test files where some specific class's function was tested in moved state (e.g. http://cc.elte.hu:15001/#run=1&report=48)
  • 11 various report but the checker worked as expected (it could be 4 less with better heuristics but probably it would be better to commit the heuristics in a follow up patch)
szepet added a comment.EditedMar 7 2017, 7:10 PM

I guess the solution is to come up with more detailed heuristics to figure out which methods are safe and which aren't - either larger lists of methods, or regex-based analysis over method names. Would you be willing to come up with those in a follow-up patch? We may also have a look eventually, because we believe this checker is useful :)

Yeah, for sure I would really like to work on that! Once this is accepted I will start to figure out some heuristics and experiment with them.

xazax.hun accepted this revision.Mar 13 2017, 7:25 AM

I only found two nits. In case you think this is ready to commit, let me know.

lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp
2 ↗(On Diff #90978)

Nit, it looks like the description is too long and the end of the first line is moved to the second.

test/Analysis/MisusedMovedObject.cpp
1 ↗(On Diff #90978)

Nit: due to the upcomming Z3 support (as a constraint solver), you should use %clang_analyze_cc1 instead of %clang_cc1

szepet updated this revision to Diff 92399.Mar 20 2017, 4:27 PM
szepet marked 5 inline comments as done.

Nit resolved.

This revision was automatically updated to reflect the committed changes.
NoQ added a comment.Mar 24 2017, 3:18 AM

I committed the checker. Thanks / congrats / all the stuff!

I've also been thinking that when doing move-after-move, we should probably say "Moving a 'moved-from' object..." rather than "Copying...".