A checker implementation which checks for various cases when a moved-from object is potentially misused.
That means That means method calls on the object or copying it in moved-from state.
Details
- Reviewers
dcoughlin zaks.anna NoQ xazax.hun - Commits
- rG356151ff5fd2: [analyzer] Add MisusedMovedObjectChecker for detecting use-after-move errors.
rC298698: [analyzer] Add MisusedMovedObjectChecker for detecting use-after-move errors.
rL298698: [analyzer] Add MisusedMovedObjectChecker for detecting use-after-move errors.
Diff Detail
Event Timeline
We should compare this patch with the just released tidy checker: https://github.com/llvm-mirror/clang-tools-extra/commit/5434827564c9c3741c63b4dc368b37a630857e29
The clang-tidy check that Gabor has pointed to has a lot of great test cases!
The main differences I see are:
- It would be much easier to write a static analyzer check for this since the infrastructure for reasoning about ordered events is already in place.
- I suspect one of the examples the clang-tidy check will not handle well is something like this:
A a(42, 42.0); A b; if (cond) b = std::move(a); if (!cond) a.foo();
This is a classical example where path sensitivity is needed not to report a false positive.
- I am guessing the check is intra-procedural as well.
(It would be great if someone could confirm these are actually true:))
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp | ||
---|---|---|
32 | nit: Please define MoveStmt before the constructor. | |
119 | Please, add text diagnostic tests for the bug visitor. | |
139 | When is this condition hit? Please, add a test case. | |
149 | nit: Would be good to special case and capitalize the case where the ObjectName is not known. | |
179 | "Unspecified usage of a 'moved-from' object" -> "Usage of a 'moved-from' object" | |
180 | For consistency with other category names, how about "C++ move semantics"? | |
257 | Why is this necessary? Why using checkPointerEscape is not enough? | |
test/Analysis/MisusedMovedObject.cpp | ||
49 | Adding "//no-warning" on lines where you explicitly test that a warning is not produced will simplify with readability and maintainability. | |
108 | How about improving the wording a bit: "Method call on a 'moved-from' object 'a'". Note, all diagnostic messages should be capitalized. |
- Some refactor based on comments.
- Updated testfiles.
I checked the tidy checker and I can confirm Anna's statements. To get in details:
- tidy checker is intraprocedural, so doesn't find the interprocedural case in this testfile (and false positives on right reference calls)
- mutually exclusive conditions in if statements results false positive
- In addition, because of no semantic check it does not find the pointer and array examples in this testcase - false negativ
- However it can find the misuse if it comes from the parameter evaluation order, for example:
foobar(std::move(a), a.getI()); //both checker find the problem foobar(a.getI(), std::move(a)); //only tidy report a warning
That is the case where we have a false negative.
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp | ||
---|---|---|
257 | Based on my tests,checkPointerEscape works on only SymRegion but we needed some more generic approach. (In connection with MemRegions.) |
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp | ||
---|---|---|
257 | As far as i understand, at a glance, this code invalidates the program state trait for objects that have escaped into a function. This makes me think you should be taking check::RegionChanges here. See CStringChecker as an example. This callback would save you a large amount of code by presenting a list of superregions that cover all invalidated regions (well, you can't list all of them). It automatically handles regions pointed to by contents of invalidated regions etc., which you'd still want to handle. check::PointerEscape is truly not enough because it handles only symbolic pointers, while you are also interested in concrete pointers (such as VarRegion). There's a bit of a mess with these callbacks which we're trying to sort out (http://lists.llvm.org/pipermail/cfe-dev/2016-September/050792.html), but it shouldn't affect you. |
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp | ||
---|---|---|
257 | I have tried this callback on some trivial examples. So this callback can help on this one, and it would be structured invalidating instead of this hacking. However, the invalidating in the checkPreCall still needed. (constructorcall, destructorcall...) Am I right or did I miss/misunderstand something? |
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp | ||
---|---|---|
257 | Hmm, i think i want to know more about that. Could you post a version of the patch with checkRegionChanges and the failing test, which you are trying, in case you already have it?
Yyeeeah, that's right, we're always invalidating the whole base region. Essentially, you can get from one array element to another array element with a reliable pointer arithmetic (eg. "this - 1" in your case). And i agree that for your case this ultraconservative invalidation is definitely worth lifting.
That's probably because the constructor/destructor was inlined, otherwise how would he know what members are there? In this case, there is actually no invalidation, just separate callbacks for every store bind of every member. And in this case...
... in this case, yeah, this is definitely needed. This is private logic of your checker, the analyzer couldn't have guessed that a perfectly inlined call, without any invalidations, may have such special meaning for your checker. However, if the constructor/destructor wasn't inlined, then i expect the object to be present in explicit regions. If it isn't there, it's probably worth adding on the core side rather than hacking in checkers. So in my opinion this leaves us with standard checkRegionChanges and special code for inlined constructors/destructors. | |
263 | By the way, we have C.wasInlined, would it be more useful/accurate here? |
Please, add the tests from test/clang-tidy/misc-use-after-move.cpp so that we have better test coverage.
Peter, Do you plan on addressing the review comments or can we commander the revision?
I wasn't able to find time for this checker the last few weeks, but my schedule became much lighter and I plan to commit the required changes (and a bugfix) next week. Starting from next week I can work on the checker on a daily basis. Hopefully it can make it into 4.0.
Test cases updated.
Some invalidation moved to checkRegionChanges callback.
Missing feature added: do not report an SubRegion if any of its SuperRegion is already reported
Thanks for looking into the region-changes approach, and sorry for the delay in replies.
I think we're close to seeing this checker merged, as it looks great.
A few smaller comments.
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp | ||
---|---|---|
155 | Could we use the static PathDiagnosticLocation::getStmt(N) function for obtaining S? That'd be the statement of the program point of the node. This way we don't need to store the statement in the program state. Or are you looking for a different statement? | |
231 | Hmm. Is this not caught by checkDeadSymbols? I expect parameter regions to die eventually. It should not only catch regions themselves, but also regions to which they contain pointers. | |
479 | I definitely admire this trick. However, it only works when the object from which we move is "opaque" enough: for example, if at least one body of its setter methods is available in the current translation unit, we could easily see an arbitrary symbol inside the object (not necessarily having this object as its origin region, probably reassigned from other regions or even conjured), and the code will have no effect. |
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp | ||
---|---|---|
231 | Lets consider the next case: struct C {}; void f(C c) { C b = std::move(c); //... } void g() { C c; for(int i = 0;i<2;i++) { f(c); } } During the analysis of g() the checker will report that an already moved object will be copied. (Since it is passed by value). Please correct me if I'm wrong or let me know in case you know a solution for that but I was not able to write checkDeadSymbols callback to handle that. | |
479 | Well I planned to remove this callback and only remained in the code by mistake. As I understand it is unnecessary since checkRegionChanges will handle these cases, I needed it just before it was implemented. So yes, you are right but this callback can (and should) be removed. Do you agree? |
CheckPointerEscape callback removed since checkRegionChanges handle all the necessary cases.
Member MoveStmt removed from RegionState.
+Some typo fixes in comments.
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp | ||
---|---|---|
155 | No, you are right. First I tried to compare the RegionStates so I the statements for operator== but then the code changed and I havent refactored this part. |
This checker would need a bit of rebase before landing, but i think it's almost good to land. I'd even consider enabling by default after some evaluation, unless there are many known false positives - with an awesome test suite and very detailed bugreports, i think it's quite worth it.
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp | ||
---|---|---|
2 | This line needs some care (alignment, maybe a short description). | |
231 | Hmm. Debugged this a bit. Well, they are different objects. c from the old call and c from the new call are different objects, they may even have different addresses on the stack. The analyzer should normally be noticing the difference between these two objects and represent them with different memory regions. Even though they're represented by VarRegions with same VarDecls, these regions should differ by having different super-regions, namely different StackArgumentsSpaceRegion instances, which, in turn, are different because they are built over different StackFrameContext instances. But unfortunately in this case the StackFrameContext is the same, because it doesn't include the call site block count! So as long as the call site's statement is the same on subsequent calls, such as in the loop, we run into this problem. This should be fixed on StackFrameContext side. D19979 should have addressed that, but it ran into problems and doesn't seem to be landing soon. For now, i guess i'd try to add a block count directly to StackFrameContext when i have time, it should be trivial. Then your checkEndFunction callback will be unnecessary. I'm still curious why checkDeadSymbols doesn't kill these map items, but i guess we already have a bigger problem to tackle. Could you add this test to your test set for now? |
Small infotext added and test cases updated. (More loop test to make sure the checkEndFunction workaround works fine.)
There is a CERT C++ Secure Coding Standard rule that is very similar to this check. I think you should mention this rule somewhere in the documentation or comments (but do not imply that this check conforms to that rule since it is not guaranteed that it finds all those issues).
The description of the rule also contains several useful exceptions that you can check whether your checker already has implemented: https://www.securecoding.cert.org/confluence/display/cplusplus/EXP63-CPP.+Do+not+rely+on+the+value+of+a+moved-from+object
I believe this checker is great and should be merged. Do you have commit access? If not, could you rebase the checker to the top of trunk? because the patch doesn't apply cleanly anymore.
Great to see this converging? What is left before you would recommend turning this on by default?
You mention evaluating this on LLVM codebase. What were the results?
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp | ||
---|---|---|
231 |
This sounds scary! Can you file a bug/PR about it with a reduced test case. | |
479 | Has this been resolved? | |
test/Analysis/MisusedMovedObject.cpp | ||
231 | Can you please split these notes so that each one is on a separate line? You can use // expected-note@+1 to show that there is 1 line difference between the locations. |
Hello,
I tested this checker on WebKit code. It seems to work as expected, and i liked the support for WTFMove(). However i suddenly realized that it is not always unsafe to use a moved-from object, as discussed in detail at, for example, http://stackoverflow.com/a/7028318 . The checker already includes a state-reset method list, however there are also methods that are generally safe to call and possibly useful (such as .size()) or methods that don't entirely reset the state but affect it (.resize()?). People may use these tricks to improve performance. We shouldn't flag these tricks in a checker that is enabled by default, because it'd make people change and regress working code.
I guess the solution is to come up with more detailed heuristics to figure out which methods are safe and which aren't - either larger lists of methods, or regex-based analysis over method names. Would you be willing to come up with those in a follow-up patch? We may also have a look eventually, because we believe this checker is useful :)
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp | ||
---|---|---|
231 | Filed: http://bugs.llvm.org/show_bug.cgi?id=32163 Also recalled that fixing StackFrameContext is not enough to get rid of this problem entirely, because the inner function call is not bound to exist. | |
479 | Yep. |
Done in this diff:
- branch rebase
- check comments in the test file separated to lines (only notes with the same messages are in the same line)
I have runned the checker on the LLVM source, it can be viewed here: http://cc.elte.hu:15001/#run=1 .
This checker reported 44 bugs, but let me summarize them:
- 22 bug from the same location (/mnt/storage/szepet/dev/CodeChecker/llvm/include/llvm/Support/Error.h:841) which is a false positive and it was resulted by not tracking the bitfields (bool Unchecked : 1; ).
- 6 from the toy.cpp-s and they are a true positve in my opinion (http://cc.elte.hu:15001/#run=1&report=351)
- 5 from the test files where some specific class's function was tested in moved state (e.g. http://cc.elte.hu:15001/#run=1&report=48)
- 11 various report but the checker worked as expected (it could be 4 less with better heuristics but probably it would be better to commit the heuristics in a follow up patch)
I guess the solution is to come up with more detailed heuristics to figure out which methods are safe and which aren't - either larger lists of methods, or regex-based analysis over method names. Would you be willing to come up with those in a follow-up patch? We may also have a look eventually, because we believe this checker is useful :)
Yeah, for sure I would really like to work on that! Once this is accepted I will start to figure out some heuristics and experiment with them.
I only found two nits. In case you think this is ready to commit, let me know.
lib/StaticAnalyzer/Checkers/MisusedMovedObjectChecker.cpp | ||
---|---|---|
3 | Nit, it looks like the description is too long and the end of the first line is moved to the second. | |
test/Analysis/MisusedMovedObject.cpp | ||
2 | Nit: due to the upcomming Z3 support (as a constraint solver), you should use %clang_analyze_cc1 instead of %clang_cc1 |
I committed the checker. Thanks / congrats / all the stuff!
I've also been thinking that when doing move-after-move, we should probably say "Moving a 'moved-from' object..." rather than "Copying...".
This line needs some care (alignment, maybe a short description).