Page MenuHomePhabricator

[analyzer] Add a new SVal to support pointer-to-member operations.

Authored by kromanenkov on Oct 11 2016, 7:19 AM.



Add a new type of NonLoc SVal for pointer-to-member operations. This SVal supports both pointers to member functions and pointers to member data.

Diff Detail


Event Timeline

kromanenkov retitled this revision from to [analyzer] Add a new SVal to support pointer-to-member operations..
kromanenkov updated this object.
kromanenkov added reviewers: zaks.anna, NoQ, dcoughlin.
kromanenkov added subscribers: a.sidorin, cfe-commits.
NoQ edited edge metadata.Oct 12 2016, 7:09 AM

Yay, thanks for posting this! :)

I've got a bit of concern for some assert-suppressions.

2314 ↗(On Diff #74249)

Hmm. Why would anybody try to load anything from a plain pointer-to-member, rather than from a pointer-to-member-applied-to-an-object (which would no longer be represented by a PointerToMember value)? I suspect there's something wrong above the stack (or one of the sub-expression SVals is incorrect), because otherwise i think that making PointerToMember a NonLoc is correct - we cannot store things in it or load things from it.

465 ↗(On Diff #74249)

cast<>()? It seems that all dynamic casts here are asserted to succeed.

69 ↗(On Diff #74249)

Same concern: Why are we copying a NonLoc?

79 ↗(On Diff #74249)

In fact, maybe dereferencing a null pointer-to-member should produce an UndefinedVal, which could be later caught by core.uninitialized.UndefReturn. I wonder why doesn't this happen.

kromanenkov added inline comments.Oct 14 2016, 12:40 PM
2314 ↗(On Diff #74249)

Brief analysis shows that we call evalLoad of pointer-to-member SVal after ExprEngine::VisitCast call when cast kind is CK_LValueToRValue. Skipping this check leads to assertion fail in pointer-to-member.cpp test. I will investigate the other ways to supress this assertion.

79 ↗(On Diff #74249)

In fact, I plan to caught dereferencing of null pointer-to-members by the checker which I intend to commit after this patch :) So, do you think that the check for dereferencing a null pointer-to-member should be a part of an analyzer core?

NoQ added inline comments.Oct 18 2016, 1:12 AM
79 ↗(On Diff #74249)

That'd be great! However, we're doing a lot of such double-work in case a checker to find the defect is not enabled.

For example, as seen by ExprEngine, result of division by zero, or of reading past an array bound, or of reading an unitialized variable, is UndefinedVal. For division by zero and for reading past an array bound, there's a checker that immediately terminates the analysis when the error occurs, which makes it completely irrelevant how exactly ExprEngine models the erroneous operation. In case of array bound, however, this checker is alpha and disabled by default. For reading an unitialized variable, no checker immediately warns, and UndefinedVal proceeds to exist until it is actually used anywhere.

So i think this tradition is worth keeping, and the result of null pointer-to-member dereference should first be modeled as UndefinedVal.

More importantly, however, it is worth investigating why doesn't it already magically happen. I suspect it might be related to the other problem of loading from a non-loc.

kromanenkov edited edge metadata.

Null pointer-to-member dereference is now modeled as UndefinedVal. Fixing this also releases us from loading from nonloc SVal.
I delete an assertion suppression in ExprEngine::performTrivialCopy because I can't reproduce its violation.

dcoughlin added inline comments.Oct 18 2016, 6:30 PM
462 ↗(On Diff #75020)

I don't think pattern matching on the sub expression to find the referred-to declaration is the right thing to do here. It isn't always the case that the casted expression will be a unary pointer to member operation. For example, this is perfectly fine and triggers an assertion failure on your patch:

struct B {
  int f;

struct D : public B {
  int g;

void foo() {
  D d;
  d.f = 7;

  int B::* pfb = &B::f;
  int D::* pfd = pfb;
  int v = d.*pfd;

Note that you can't just propagate the value already computed for the subexpression. Here is a particularly annoying example from the C++ spec:

struct B {
  int f;
struct L : public B { };
struct R : public B { };
struct D : public L, R { };

void foo() {
  D d;

  int B::* pb = &B::f;
  int L::* pl = pb;
  int R::* pr = pb;

  int D::* pdl = pl;
  int D::* pdr = pr;

  clang_analyzer_eval(pdl == pdr); // FALSE
  clang_analyzer_eval(pb == pl); // TRUE

My guess is this will require accumulating CXXBasePath s or something similar for each cast. I don't know how to do this efficiently.

dcoughlin added inline comments.Oct 18 2016, 10:20 PM
462 ↗(On Diff #75020)

I don't know how to do this efficiently.

Jordan suggested storing this in a bump-pointer allocated object with a lifetime of the AnalysisDeclContext. The common case is no multiple inheritance, so that should be the fast case.

Maybe the Data could be a pointer union between a DeclaratorDecl and an immutable linked list (with sharing) of CXXBaseSpecifiers from the CastExprs in the AST. The storage for this could be managed with a new manager in SValBuilder.

dcoughlin added inline comments.Oct 18 2016, 10:35 PM
462 ↗(On Diff #75020)

(The bump pointer-allocated thing would have to have the Decl as well.)

This could also probably live in BasicValueFactory. The extra data would be similar to LazyCompoundValData.

kromanenkov marked 4 inline comments as done.Oct 24 2016, 5:44 AM
kromanenkov added inline comments.Oct 24 2016, 6:24 AM
462 ↗(On Diff #75020)

My understanding is that PointerToMember SVal should be represented similar to LazyCompoundVal SVal. If I got you right, PointerToMember SVal should be constructed in VisitUnaryOperator and VisitCast (CK_BaseToDerivedMemberPointer and so on) just add CXXBaseSpecifiers to this SVal?
What do you mean by sharing of immutable linked list?

dcoughlin added inline comments.Oct 24 2016, 9:29 AM
462 ↗(On Diff #75020)

SVal has very strict size limitations -- you can only store a single pointer for its Data. Since you'll need to have multiple CXXBaseSpecifiers in a single SVal (for example, consider a multiple inheritance hierarchy with a double diamond), the storage for this container will have to live elsewhere.

One way to do this is to bump-pointer allocate linked list nodes for the "path" of casts that the value has taken. If you do this then you can share the memory for the nodes for a shared path prefix for two paths with a common prefix.

kromanenkov updated this revision to Diff 77041.Nov 7 2016, 8:48 AM

According to dcoughlin suggestion PointerToMember SVal is now modeled as a PointerUnion of DeclaratorDecl and bump pointer-allocated object stored DeclaratorDecl and immutable linked list of CXXBaseSpecifiers.

kromanenkov marked an inline comment as done.Nov 7 2016, 8:50 AM
NoQ added a comment.Nov 28 2016, 5:40 AM

This looks good to me (bikeshedded a bit), but i think Devin should have another look, because his comments were way deeper than mine.

217 ↗(On Diff #77041)

Hmm, is it "construct"? Or "constrain"? I think a verbose name wouldn't hurt.
In fact, i suspect that consVals are rather conSVals, where con stands for "concatenate".

882 ↗(On Diff #75020)

Hmm, do we need to cover CXXMethodDecl here? Or is it modeled as a function pointer anyway, so no action is needed? Worth commenting, i think.

Thanks for your comments, Artem! Make function name less ambiguous. Also I take liberty to update analogical function name.

kromanenkov marked an inline comment as done.Nov 29 2016, 6:45 AM
kromanenkov added inline comments.
217 ↗(On Diff #77041)

In fact I thought that this entails "consume" :)

882 ↗(On Diff #75020)

AFAIU CXXMethodDecl is processed in SVal::getAsFunctionDecl() so i'm for no action.

kromanenkov marked an inline comment as done.Nov 29 2016, 6:46 AM
dcoughlin edited edge metadata.Nov 30 2016, 2:08 PM

PointerToMemberData looks like it is on the right track! One thing that is still missing is using the base specifier list to figure out the correct subobject field to read and write from. This is important when there is non-virtual multiple inheritance, as there will be multiple copies of the same field in the same object.

Here are is an example where the current patch is not quite right; both of these should evaluate to true:

void clang_analyzer_eval(int);

struct B {
  int f;
struct L1 : public B { };
struct R1 : public B { };
struct M : public L1, R1 { };
struct L2 : public M { };
struct R2 : public M { };
struct D2 : public L2, R2 { };

void diamond() {
  M m;

  static_cast<L1 *>(&m)->f = 7;
  static_cast<R1 *>(&m)->f = 16;

  int L1::* pl1 = &B::f;
  int M::* pm_via_l1 = pl1;

  int R1::* pr1 = &B::f;
  int M::* pm_via_r1 = pr1;

  clang_analyzer_eval(m.*(pm_via_l1) == 7); // expected-warning {{TRUE}}
  clang_analyzer_eval(m.*(pm_via_r1) == 16); // expected-warning {{TRUE}}

I suspect you'll need to do something similar to what StoreManager::evalDerivedToBase() does (or maybe just use that function) to figure out the correct location to read and write from when accessing via a pointer to member.

It would also be good to add some tests for the double diamond scenario to ensure the list of path specifiers is constructed in the right order. For example:

void double_diamond() {
  D2 d2;

  static_cast<L1 *>(static_cast<L2 *>(&d2))->f = 1;
  static_cast<L1 *>(static_cast<R2 *>(&d2))->f = 2;
  static_cast<R1 *>(static_cast<L2 *>(&d2))->f = 3;
  static_cast<R1 *>(static_cast<R2 *>(&d2))->f = 4;

  clang_analyzer_eval(d2.*(static_cast<int D2::*>(static_cast<int L2::*>(static_cast<int L1::*>(&B::f)))) == 1); // expected-warning {{TRUE}}
  clang_analyzer_eval(d2.*(static_cast<int D2::*>(static_cast<int R2::*>(static_cast<int L1::*>(&B::f)))) == 2); // expected-warning {{TRUE}}
  clang_analyzer_eval(d2.*(static_cast<int D2::*>(static_cast<int L2::*>(static_cast<int R1::*>(&B::f)))) == 3); // expected-warning {{TRUE}}
  clang_analyzer_eval(d2.*(static_cast<int D2::*>(static_cast<int R2::*>(static_cast<int R1::*>(&B::f)))) == 4); // expected-warning {{TRUE}}
217 ↗(On Diff #77041)

In functional paradigms, 'cons' is used to mean prepending an item the the beginning of a linked list.

In my opinion, 'prepend' is better than 'cons', which I find super-confusing. I don't think 'concatenate' is quite right, since typically that operation combines two (or more) lists.

322 ↗(On Diff #77041)

These conditional fallthroughs make me very, very uncomfortable because they will broken if any of the intervening cases get special handling in the future.

I think it would safer to factor out code in the "destination" case (here 'CK_LValueBitCast') into a function, call it directly, and then continue regardless of the branch.

Another possibility is to use gotos to directly jump to the default 'destination' case.

472 ↗(On Diff #77041)

I think it would be good to be explicit about this fallthrough behavior, as well.

899 ↗(On Diff #79560)

Just sticking this in the middle of a fallthrough cascade seems quite brittle. For example, it relies on the sub expression of a unary deref never being a DeclRefExpr to a field. This may be guaranteed by Sema (?) but if so it is quite a non-obvious invariant to rely upon.

I think it would be better the restructure this so that the AddrOf case doesn't fall in the the middle of a fallthrough cascade. You could either factor out the default behavior into a function or use a goto.

298 ↗(On Diff #79560)

Can this be removed? There are no tests for it.

kromanenkov added inline comments.Dec 13 2016, 4:19 AM
899 ↗(On Diff #79560)

It seems that UO_Extension case is the default behavior for this case cascade (just below UO_AddrOf). AFAIU it would be better to explicitly call the default behavior function following by break for each case preceding UO_Extension (UO_Plus,UO_Deref and UO_AddrOf). Or i missed something?

kromanenkov added inline comments.Dec 13 2016, 5:05 AM
899 ↗(On Diff #79560)

Or maybe just move UO_AddrOf to the beginning of the case cascade and proceed with the default behavior if it falls (no need to change UO_Plus and UO_Deref in that case)?

kromanenkov edited edge metadata.

Thanks for your comments, Devin! You were right about the list of path specifiers construction order, so i fix it. Now the base specifier list is being used for figuring out the correct subobject field. Also this diff changes fallthrough behavior in some cases and renames functions with 'cons' prefix.

kromanenkov marked 4 inline comments as done.Dec 14 2016, 2:14 AM
dcoughlin accepted this revision.Dec 14 2016, 2:42 PM
dcoughlin edited edge metadata.

Looks good to me, other than some super tiny nits! Thanks for iterating on this -- it is awesome that the analyzer will be able to model this now!!

463 ↗(On Diff #81352)

I think this deserves a comment about why it is a NonLoc and why it needs to keep the PointerToMemberData with the list of CXXBaseSpecifiers.

269 ↗(On Diff #81352)

While we're here we might as well correct the misspelling --> "path sensitivity".

74 ↗(On Diff #81352)

Can you include the warning text in the expected-warning directive? This will ensure we keep emitting the correct warning in the future.

This revision is now accepted and ready to land.Dec 14 2016, 2:42 PM
kromanenkov edited edge metadata.

Fix issues pointed by @dcoughlin and rebase patch on master.

This revision was automatically updated to reflect the committed changes.