Add a new type of NonLoc SVal for pointer-to-member operations. This SVal supports both pointers to member functions and pointers to member data.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Yay, thanks for posting this! :)
I've got a bit of concern for some assert-suppressions.
lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2314 ↗ | (On Diff #74249) | Hmm. Why would anybody try to load anything from a plain pointer-to-member, rather than from a pointer-to-member-applied-to-an-object (which would no longer be represented by a PointerToMember value)? I suspect there's something wrong above the stack (or one of the sub-expression SVals is incorrect), because otherwise i think that making PointerToMember a NonLoc is correct - we cannot store things in it or load things from it. |
lib/StaticAnalyzer/Core/ExprEngineC.cpp | ||
465 ↗ | (On Diff #74249) | cast<>()? It seems that all dynamic casts here are asserted to succeed. |
lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
69 ↗ | (On Diff #74249) | Same concern: Why are we copying a NonLoc? |
test/Analysis/pointer-to-member.cpp | ||
79 ↗ | (On Diff #74249) | In fact, maybe dereferencing a null pointer-to-member should produce an UndefinedVal, which could be later caught by core.uninitialized.UndefReturn. I wonder why doesn't this happen. |
lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2314 ↗ | (On Diff #74249) | Brief analysis shows that we call evalLoad of pointer-to-member SVal after ExprEngine::VisitCast call when cast kind is CK_LValueToRValue. Skipping this check leads to assertion fail in pointer-to-member.cpp test. I will investigate the other ways to supress this assertion. |
test/Analysis/pointer-to-member.cpp | ||
79 ↗ | (On Diff #74249) | In fact, I plan to caught dereferencing of null pointer-to-members by the checker which I intend to commit after this patch :) So, do you think that the check for dereferencing a null pointer-to-member should be a part of an analyzer core? |
test/Analysis/pointer-to-member.cpp | ||
---|---|---|
79 ↗ | (On Diff #74249) | That'd be great! However, we're doing a lot of such double-work in case a checker to find the defect is not enabled. For example, as seen by ExprEngine, result of division by zero, or of reading past an array bound, or of reading an unitialized variable, is UndefinedVal. For division by zero and for reading past an array bound, there's a checker that immediately terminates the analysis when the error occurs, which makes it completely irrelevant how exactly ExprEngine models the erroneous operation. In case of array bound, however, this checker is alpha and disabled by default. For reading an unitialized variable, no checker immediately warns, and UndefinedVal proceeds to exist until it is actually used anywhere. So i think this tradition is worth keeping, and the result of null pointer-to-member dereference should first be modeled as UndefinedVal. More importantly, however, it is worth investigating why doesn't it already magically happen. I suspect it might be related to the other problem of loading from a non-loc. |
Null pointer-to-member dereference is now modeled as UndefinedVal. Fixing this also releases us from loading from nonloc SVal.
I delete an assertion suppression in ExprEngine::performTrivialCopy because I can't reproduce its violation.
lib/StaticAnalyzer/Core/ExprEngineC.cpp | ||
---|---|---|
462 ↗ | (On Diff #75020) | I don't think pattern matching on the sub expression to find the referred-to declaration is the right thing to do here. It isn't always the case that the casted expression will be a unary pointer to member operation. For example, this is perfectly fine and triggers an assertion failure on your patch: struct B { int f; }; struct D : public B { int g; }; void foo() { D d; d.f = 7; int B::* pfb = &B::f; int D::* pfd = pfb; int v = d.*pfd; } Note that you can't just propagate the value already computed for the subexpression. Here is a particularly annoying example from the C++ spec: struct B { int f; }; struct L : public B { }; struct R : public B { }; struct D : public L, R { }; void foo() { D d; int B::* pb = &B::f; int L::* pl = pb; int R::* pr = pb; int D::* pdl = pl; int D::* pdr = pr; clang_analyzer_eval(pdl == pdr); // FALSE clang_analyzer_eval(pb == pl); // TRUE } My guess is this will require accumulating CXXBasePath s or something similar for each cast. I don't know how to do this efficiently. |
lib/StaticAnalyzer/Core/ExprEngineC.cpp | ||
---|---|---|
462 ↗ | (On Diff #75020) |
Jordan suggested storing this in a bump-pointer allocated object with a lifetime of the AnalysisDeclContext. The common case is no multiple inheritance, so that should be the fast case. Maybe the Data could be a pointer union between a DeclaratorDecl and an immutable linked list (with sharing) of CXXBaseSpecifiers from the CastExprs in the AST. The storage for this could be managed with a new manager in SValBuilder. |
lib/StaticAnalyzer/Core/ExprEngineC.cpp | ||
---|---|---|
462 ↗ | (On Diff #75020) | (The bump pointer-allocated thing would have to have the Decl as well.) This could also probably live in BasicValueFactory. The extra data would be similar to LazyCompoundValData. |
lib/StaticAnalyzer/Core/ExprEngineC.cpp | ||
---|---|---|
462 ↗ | (On Diff #75020) | My understanding is that PointerToMember SVal should be represented similar to LazyCompoundVal SVal. If I got you right, PointerToMember SVal should be constructed in VisitUnaryOperator and VisitCast (CK_BaseToDerivedMemberPointer and so on) just add CXXBaseSpecifiers to this SVal? |
lib/StaticAnalyzer/Core/ExprEngineC.cpp | ||
---|---|---|
462 ↗ | (On Diff #75020) | SVal has very strict size limitations -- you can only store a single pointer for its Data. Since you'll need to have multiple CXXBaseSpecifiers in a single SVal (for example, consider a multiple inheritance hierarchy with a double diamond), the storage for this container will have to live elsewhere. One way to do this is to bump-pointer allocate linked list nodes for the "path" of casts that the value has taken. If you do this then you can share the memory for the nodes for a shared path prefix for two paths with a common prefix. |
According to dcoughlin suggestion PointerToMember SVal is now modeled as a PointerUnion of DeclaratorDecl and bump pointer-allocated object stored DeclaratorDecl and immutable linked list of CXXBaseSpecifiers.
This looks good to me (bikeshedded a bit), but i think Devin should have another look, because his comments were way deeper than mine.
include/clang/StaticAnalyzer/Core/PathSensitive/BasicValueFactory.h | ||
---|---|---|
217 ↗ | (On Diff #77041) | Hmm, is it "construct"? Or "constrain"? I think a verbose name wouldn't hurt. |
lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp | ||
882 ↗ | (On Diff #75020) | Hmm, do we need to cover CXXMethodDecl here? Or is it modeled as a function pointer anyway, so no action is needed? Worth commenting, i think. |
Thanks for your comments, Artem! Make function name less ambiguous. Also I take liberty to update analogical function name.
PointerToMemberData looks like it is on the right track! One thing that is still missing is using the base specifier list to figure out the correct subobject field to read and write from. This is important when there is non-virtual multiple inheritance, as there will be multiple copies of the same field in the same object.
Here are is an example where the current patch is not quite right; both of these should evaluate to true:
void clang_analyzer_eval(int); struct B { int f; }; struct L1 : public B { }; struct R1 : public B { }; struct M : public L1, R1 { }; struct L2 : public M { }; struct R2 : public M { }; struct D2 : public L2, R2 { }; void diamond() { M m; static_cast<L1 *>(&m)->f = 7; static_cast<R1 *>(&m)->f = 16; int L1::* pl1 = &B::f; int M::* pm_via_l1 = pl1; int R1::* pr1 = &B::f; int M::* pm_via_r1 = pr1; clang_analyzer_eval(m.*(pm_via_l1) == 7); // expected-warning {{TRUE}} clang_analyzer_eval(m.*(pm_via_r1) == 16); // expected-warning {{TRUE}} }
I suspect you'll need to do something similar to what StoreManager::evalDerivedToBase() does (or maybe just use that function) to figure out the correct location to read and write from when accessing via a pointer to member.
It would also be good to add some tests for the double diamond scenario to ensure the list of path specifiers is constructed in the right order. For example:
void double_diamond() { D2 d2; static_cast<L1 *>(static_cast<L2 *>(&d2))->f = 1; static_cast<L1 *>(static_cast<R2 *>(&d2))->f = 2; static_cast<R1 *>(static_cast<L2 *>(&d2))->f = 3; static_cast<R1 *>(static_cast<R2 *>(&d2))->f = 4; clang_analyzer_eval(d2.*(static_cast<int D2::*>(static_cast<int L2::*>(static_cast<int L1::*>(&B::f)))) == 1); // expected-warning {{TRUE}} clang_analyzer_eval(d2.*(static_cast<int D2::*>(static_cast<int R2::*>(static_cast<int L1::*>(&B::f)))) == 2); // expected-warning {{TRUE}} clang_analyzer_eval(d2.*(static_cast<int D2::*>(static_cast<int L2::*>(static_cast<int R1::*>(&B::f)))) == 3); // expected-warning {{TRUE}} clang_analyzer_eval(d2.*(static_cast<int D2::*>(static_cast<int R2::*>(static_cast<int R1::*>(&B::f)))) == 4); // expected-warning {{TRUE}} }
include/clang/StaticAnalyzer/Core/PathSensitive/BasicValueFactory.h | ||
---|---|---|
217 ↗ | (On Diff #77041) | In functional paradigms, 'cons' is used to mean prepending an item the the beginning of a linked list. https://en.wikipedia.org/wiki/Cons In my opinion, 'prepend' is better than 'cons', which I find super-confusing. I don't think 'concatenate' is quite right, since typically that operation combines two (or more) lists. |
lib/StaticAnalyzer/Core/ExprEngineC.cpp | ||
899 ↗ | (On Diff #79560) | Just sticking this in the middle of a fallthrough cascade seems quite brittle. For example, it relies on the sub expression of a unary deref never being a DeclRefExpr to a field. This may be guaranteed by Sema (?) but if so it is quite a non-obvious invariant to rely upon. I think it would be better the restructure this so that the AddrOf case doesn't fall in the the middle of a fallthrough cascade. You could either factor out the default behavior into a function or use a goto. |
322 ↗ | (On Diff #77041) | These conditional fallthroughs make me very, very uncomfortable because they will broken if any of the intervening cases get special handling in the future. I think it would safer to factor out code in the "destination" case (here 'CK_LValueBitCast') into a function, call it directly, and then continue regardless of the branch. Another possibility is to use gotos to directly jump to the default 'destination' case. |
472 ↗ | (On Diff #77041) | I think it would be good to be explicit about this fallthrough behavior, as well. |
lib/StaticAnalyzer/Core/SValBuilder.cpp | ||
298 ↗ | (On Diff #79560) | Can this be removed? There are no tests for it. |
lib/StaticAnalyzer/Core/ExprEngineC.cpp | ||
---|---|---|
899 ↗ | (On Diff #79560) | It seems that UO_Extension case is the default behavior for this case cascade (just below UO_AddrOf). AFAIU it would be better to explicitly call the default behavior function following by break for each case preceding UO_Extension (UO_Plus,UO_Deref and UO_AddrOf). Or i missed something? |
lib/StaticAnalyzer/Core/ExprEngineC.cpp | ||
---|---|---|
899 ↗ | (On Diff #79560) | Or maybe just move UO_AddrOf to the beginning of the case cascade and proceed with the default behavior if it falls (no need to change UO_Plus and UO_Deref in that case)? |
Thanks for your comments, Devin! You were right about the list of path specifiers construction order, so i fix it. Now the base specifier list is being used for figuring out the correct subobject field. Also this diff changes fallthrough behavior in some cases and renames functions with 'cons' prefix.
Looks good to me, other than some super tiny nits! Thanks for iterating on this -- it is awesome that the analyzer will be able to model this now!!
include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h | ||
---|---|---|
463 ↗ | (On Diff #81352) | I think this deserves a comment about why it is a NonLoc and why it needs to keep the PointerToMemberData with the list of CXXBaseSpecifiers. |
lib/StaticAnalyzer/Core/ExprEngineC.cpp | ||
269 ↗ | (On Diff #81352) | While we're here we might as well correct the misspelling --> "path sensitivity". |
test/Analysis/pointer-to-member.cpp | ||
74 ↗ | (On Diff #81352) | Can you include the warning text in the expected-warning directive? This will ensure we keep emitting the correct warning in the future. |