Page MenuHomePhabricator

[analyzer] Fix reinterpret_cast handling for pointer-to-member
Needs RevisionPublic

Authored by RedDocMD on Feb 18 2021, 10:29 AM.

Details

Summary

If the cast is to a class such that it is possible to reach the correct
sub-object containing the pointed member, a path to that class is retrieved from the CXXRecordDecl.
This path is used to construct the new
pointer-to-member data. If there is no such path, the resulting node is
not analyzed further.

Diff Detail

Unit TestsFailed

TimeTest
50 msx64 debian > Flang.Semantics::resolve102.f90
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/flang/test/Semantics/test_errors.sh /mnt/disks/ssd0/agent/llvm-project/flang/test/Semantics/resolve102.f90 /mnt/disks/ssd0/agent/llvm-project/build/tools/flang/test/Semantics/Output/resolve102.f90.tmp /mnt/disks/ssd0/agent/llvm-project/build/bin/f18 -intrinsic-module-directory /mnt/disks/ssd0/agent/llvm-project/build/tools/flang/include/flang

Event Timeline

RedDocMD created this revision.Feb 18 2021, 10:29 AM
RedDocMD requested review of this revision.Feb 18 2021, 10:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 18 2021, 10:29 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

@vsavchenko, does it look okay?

@vsavchenko, could you please review this?

@steakhal, could you please review this?

@steakhal, could you please review this?

The longer I stare at it the more questions I have. However, I don't know how member pointers work in the analyzer and I'm not really in the mood to dive deep into that territory.

Could you elaborate on what approach did you choose and more importantly, why that approach?
Why do we need a graph search here?
What cases do you try to resolve? I mean, there is a single test-case, which has reinterpret casts back and forth (literally), and it's not immediately clear to me why are you doing those gymnastics.
Maybe worth creating other test-cases as well. Probably covering multiple inheritance or virtual inheritance?

Your code has shared pointers, which is interesting, but I would favor unique pointers unless you can defend this use-case.
Also, in general, we prefer algorithms to raw for or while loops if they are semantically equivalent.
Besides all of these, I can see a few copy-pasted lines here and there. I'm not saying that it's bad, I'm just highlighting this fact.

I can not confirm the implementation and accept the patch.
Improving the summary, helping reviewers with helpful notes here and there could give the boost you need to get this reviewed.
BTW, the review process takes ages in the analyzer community. We should also improve on that ofc, but the first step is always done by you (and here I mean not specifically you but we all).

clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
554–555

I know that you just copy-pasted this, but what is this xD

clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp
4–5

I assume this is resolved by now.

RedDocMD marked 2 inline comments as done.Feb 25 2021, 6:07 AM

Many thanks for you comments, @steakhal!
I will address the issues you have pointed out in this comment. To clean things up I should perhaps add some more clarification to the summary.

Could you elaborate on what approach did you choose and more importantly, why that approach?
Why do we need a graph search here?

Pointer-to-members contain two things - a pointer to a NamedDecl to store the field/method being pointed to and a list of CXXBaseSpeciifier. This list is used to determine which sub-object the member lies in. This path needs to be determined and unfortunately with reinterpret_cast, the AST provides no structural assistance (unlike static_cast). Hence, it needs to be searched by searching through the BaseSpecifiers of the CXXRecordDecl.
As for the need of graph search, it is due to multiple-inheritance. For a given class, it may have two or more bases and we need to follow into both of them to find the path to the required sub-object.
Why do I use BFS? Because one branch of the (inverted) inheritance tree may be very deep yet not contain the required class. But DFS will first exhaust it and then go the other branch. BFS on the other hand will find the shortest path to the correct base optimally, and this shortest path is all we need.

What cases do you try to resolve? I mean, there is a single test-case, which has reinterpret casts back and forth (literally), and it's not immediately clear to me why are you doing those gymnastics.
Maybe worth creating other test-cases as well. Probably covering multiple inheritance or virtual inheritance?

I will add a test case for multiple inheritance and perhaps replace existing tests with clearer tests. Virtual inheritance will not help since pointer-to-member to virtual base field/method is not allowed.
There are three cases which I have handled:

  1. The required class is the class we are casting to
  2. The required class is higher in the class hierarchy
  3. The required class is lower in the class hierarchy
  4. The required class is not related to the class being cast to.

For Case 1 and 2, a valid path can be calculated and so a pointer-to-member is created.
For Case 3 and 4, no valid path can be calculated and so it doesn't make sense to further pursue static analysis down this path.
Note that the required class is determined by the actual field/member we are pointing to and is not explicit in the reinterpret_cast.

Your code has shared pointers, which is interesting, but I would favor unique pointers unless you can defend this use-case.

The reason for not using unique_ptr is as follows:- each BFSNode holds a pointer to its parent. This cannot be a unique_ptr since there may be multiple children to a BFSNode. This must be a shared_ptr as a consequence.

Also, in general, we prefer algorithms to raw for or while loops if they are semantically equivalent.

I am not aware of any std/LLVM algorithm that perform BFS. If there are any, I would be glad to learn more about them.
I am going to replace the for loops with a range-for loop. Sorry about that one!

Besides all of these, I can see a few copy-pasted lines here and there. I'm not saying that it's bad, I'm just highlighting this fact.

Yes this was a conscious choice. The other alternative perhaps is to use fall-through, which I think would be confusing in this case since it will be obscured by the intermediate code.

clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
554–555

I know that you just copy-pasted this, but what is this xD

Not entirely sure, but I think it performs tasks required to handle the pointer-to-member created or do the requisite default in the absence of a pointer-to-member.

clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp
4–5

I assume this is resolved by now.

Sorry, that was a lame mistake on my part.

RedDocMD updated this revision to Diff 326393.Feb 25 2021, 7:58 AM
RedDocMD marked 2 inline comments as done.

Removed explicit for-loop with range-for loop

RedDocMD updated this revision to Diff 326410.Feb 25 2021, 8:48 AM

Cleaned up tests, added test above class hierarchy

RedDocMD updated this revision to Diff 326413.Feb 25 2021, 8:56 AM

Added test for multiple inheritance

RedDocMD updated this revision to Diff 326414.Feb 25 2021, 9:02 AM

Added some more comments to increase clarity

Many thanks for you comments, @steakhal!
I will address the issues you have pointed out in this comment. To clean things up I should perhaps add some more clarification to the summary.

Could you elaborate on what approach did you choose and more importantly, why that approach?
Why do we need a graph search here?

Pointer-to-members contain two things - a pointer to a NamedDecl to store the field/method being pointed to and a list of CXXBaseSpeciifier. This list is used to determine which sub-object the member lies in. This path needs to be determined and unfortunately with reinterpret_cast, the AST provides no structural assistance (unlike static_cast). Hence, it needs to be searched by searching through the BaseSpecifiers of the CXXRecordDecl.

So here you are implying that for reinterpret casts, the CastE->path_begin()-end() range is empty? And that is why you need to manually come up with a possible CXXBaseSpecifier list?

I strongly encourage you to find a better alternative to this manual graph search.
This information must be available somewhere.

Have you looked at CXXRecordDecl::isDerivedFrom()? Or any other member functions of that class?

So here you are implying that for reinterpret casts, the CastE->path_begin()-end() range is empty? And that is why you need to manually come up with a possible CXXBaseSpecifier list?

@steakhal, Yes. For illustration, the following code:

struct Base {
  int field;
};

struct Derived : public Base {};
struct DoubleDerived : public Derived {};

int f() {
  int DoubleDerived::*ddf = &Derived::field;
  int Base::*bf1 = static_cast<int Base::*>(ddf);
  int Base::*bf2 = reinterpret_cast<int Base::*>(ddf);
}

produces the following (truncated) AST dump

-FunctionDecl 0x12f3890 <line:8:1, line:12:1> line:8:5 f 'int ()'
  `-CompoundStmt 0x12f3ec0 <col:9, line:12:1>
    |-DeclStmt 0x12f3b70 <line:9:3, col:44>
    | `-VarDecl 0x12f3a18 <col:3, col:39> col:23 used ddf 'int DoubleDerived::*' cinit
    |   `-ImplicitCastExpr 0x12f3b48 <col:29, col:39> 'int DoubleDerived::*' <BaseToDerivedMemberPointer (Derived -> Base)>
    |     `-UnaryOperator 0x12f3b30 <col:29, col:39> 'int Base::*' prefix '&' cannot overflow
    |       `-DeclRefExpr 0x12f3ad0 <col:30, col:39> 'int' lvalue Field 0x12f33d0 'field' 'int'
    |-DeclStmt 0x12f3d20 <line:10:3, col:49>
    | `-VarDecl 0x12f3bf0 <col:3, col:48> col:14 bf1 'int Base::*' cinit
    |   `-CXXStaticCastExpr 0x12f3ce0 <col:20, col:48> 'int Base::*' static_cast<int struct Base::*> <DerivedToBaseMemberPointer (Derived -> Base)>
    |     `-ImplicitCastExpr 0x12f3cc8 <col:45> 'int DoubleDerived::*' <LValueToRValue> part_of_explicit_cast
    |       `-DeclRefExpr 0x12f3c60 <col:45> 'int DoubleDerived::*' lvalue Var 0x12f3a18 'ddf' 'int DoubleDerived::*'
    `-DeclStmt 0x12f3ea8 <line:11:3, col:54>
      `-VarDecl 0x12f3d88 <col:3, col:53> col:14 bf2 'int Base::*' cinit
        `-CXXReinterpretCastExpr 0x12f3e78 <col:20, col:53> 'int Base::*' reinterpret_cast<int struct Base::*> <ReinterpretMemberPointer>
          `-ImplicitCastExpr 0x12f3e60 <col:50> 'int DoubleDerived::*' <LValueToRValue> part_of_explicit_cast
            `-DeclRefExpr 0x12f3df8 <col:50> 'int DoubleDerived::*' lvalue Var 0x12f3a18 'ddf' 'int DoubleDerived::*'

If you compare the three cases:

  1. The first statement has an implicit cast inserted during parsing, and so we need not calculate the path manually
  2. The second statement has a static_cast and it also supplies a path, except that we need to remove these bases instead of adding. (Ref to D95877).
  3. The third statement has no path provided, because it is a reinterpret_cast and the parser cannot produce a path in all cases (if you cast to an integer for an instance) - so it doesn't produce any path

With the parser not providing us with info, we have to figure it out in a different way.
Also, it is not one possible path but the only possible path. If there are multiple acceptable paths then it is illegal to define such a member-pointer and the frontend will reject it as such.

I strongly encourage you to find a better alternative to this manual graph search.
This information must be available somewhere.

Have you looked at CXXRecordDecl::isDerivedFrom()? Or any other member functions of that class?

Unfortunately, all the methods on CXXRecordDecl, like the one you mentioned, are for querying and thus returns a bool, while I need the entire path.

Unfortunately, all the methods on CXXRecordDecl, like the one you mentioned, are for querying and thus returns a bool, while I need the entire path.

AFAIK the second overload accepts an out parameter serving exactly our needs. Could you check if my assumption is correct?

RedDocMD updated this revision to Diff 326731.Feb 26 2021, 10:12 AM

Replaced BFS with existing CXXRecordDeclMethod

RedDocMD edited the summary of this revision. (Show Details)Feb 26 2021, 10:12 AM

@steakhal, you are absolutely right! It works. Thank you for pointing it out, not sure how I missed this earlier this evening.

RedDocMD updated this revision to Diff 326733.Feb 26 2021, 10:15 AM

Removed unnecessary includes

steakhal requested changes to this revision.Mar 1 2021, 3:08 AM

I have serious concerns inline.

clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp
20–21

Please have a note describing why you are doing this roundtrip.

29–31

The assignment is actually UB.
TBH I don't know how to test such behavior xD
Same for the next example.

Shouldn't it return undef for reading via an invalid member pointer?

43–50

An ASCII art would help so much:

A   C(field)
|   |
B   D
 \ /
  E
  |
  F

However, I'm still missing a diamond-shaped inheritance.

52–62

Wait a minute. It's not how it works.
How I imagine member pointers, they are just offsets.
&F::field is notionally equivalent with offsetof(F, field). That being said, You can not apply this member pointer to any object besides F.
Imagine if the classes of the inheritance tree would have other fields as well.
Then the offsetof(T, field) would be different for F, and C.

This example demonstrates that both of these member pointer dereferences are UB.
https://godbolt.org/z/15sMEP
It returns different values depending on the optimization level, which is a clear sign of UB.
BTW this issue is closely related to strict aliasing.

This revision now requires changes to proceed.Mar 1 2021, 3:08 AM
RedDocMD marked an inline comment as done.Mar 1 2021, 3:28 AM
RedDocMD added inline comments.
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp
43–50

An ASCII art would help so much:

A   C(field)
|   |
B   D
 \ /
  E
  |
  F

However, I'm still missing a diamond-shaped inheritance.

Thanks for the ASCII art!
The diamond-shaped inheritance as far as I understood will cause illegal code.
Eg:

   A
   |
   B
 /   \
C     D
 \   /
   E

According to my understanding, if I have a field in A or B, and try to define a member pointer like int E::* ef = &A::field, it is not allowed.
On GCC, this is the error message I get:

struct A {
  int field;
};

struct B : public A {};
struct C : public virtual B {};
struct D : public virtual B {};
struct E : public C, public D {};

int main() {
  int E::* ef1 = &A::field;
}

diamond-member-pointer.cpp: In function ‘int main()’:
diamond-member-pointer.cpp:11:22: error: pointer to member conversion via virtual base ‘B’

11 |   int E::* ef1 = &A::field;
   |                      ^~~~~
RedDocMD marked an inline comment as done.Mar 2 2021, 5:18 AM
RedDocMD added inline comments.
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp
20–21

You mean why a simpler test wouldn't suffice? Well I think it would, but this was a common corner identified by @vsavchenko and I have put it in as a result.

29–31

I don't quite know what is undef. Is it a special macro or something?
Using an invalid member pointer is definitely UB, but I also need to show that casting to an invalid pointer is properly handled because that is not UB. I guess I will remove the member access and claim that since there was no crash, it is okay and has been handled appropriately.

52–62

The member access on A is definitely UB, I guess I will do what I proposed in the Some case.
I don't think the other one is. Consider the following:

struct A {};
struct B : public A {};
struct C {
  int field;
};
struct D : public C {};
struct E : public B, public D {};
struct F : public E {};

int main() {
  int F::* ff = &F::field;
  int C::* cf1 = static_cast<int C::*>(ff);
  int C::* cf2 = reinterpret_cast<int C::*>(ff);
  C c;
  c.*cf1 = 10;
  c.*cf2 = 10;
  return 0;
}

cf1 and cf2 are the same thing, except that they are declared differently (one via static_cast, other via reinterpret_cast). If we look at the AST (truncated to the first three lines of main):

CompoundStmt 0x1a4fe18 <col:12, line:18:1>
    |-DeclStmt 0x1a4f3a8 <line:11:3, col:26>
    | `-VarDecl 0x1a21078 <col:3, col:21> col:12 used ff 'int F::*' cinit
    |   `-ImplicitCastExpr 0x1a4f378 <col:17, col:21> 'int F::*' <BaseToDerivedMemberPointer (E -> D -> C)>
    |     `-UnaryOperator 0x1a4f360 <col:17, col:21> 'int C::*' prefix '&' cannot overflow
    |       `-DeclRefExpr 0x1a4f2f8 <col:18, col:21> 'int' lvalue Field 0x1a207d0 'field' 'int'
    |-DeclStmt 0x1a4f560 <line:12:3, col:43>
    | `-VarDecl 0x1a4f428 <col:3, col:42> col:12 used cf1 'int C::*' cinit
    |   `-CXXStaticCastExpr 0x1a4f518 <col:18, col:42> 'int C::*' static_cast<int struct C::*> <DerivedToBaseMemberPointer (E -> D -> C)>
    |     `-ImplicitCastExpr 0x1a4f500 <col:40> 'int F::*' <LValueToRValue> part_of_explicit_cast
    |       `-DeclRefExpr 0x1a4f498 <col:40> 'int F::*' lvalue Var 0x1a21078 'ff' 'int F::*'
    |-DeclStmt 0x1a4f6e8 <line:13:3, col:48>
    | `-VarDecl 0x1a4f5c8 <col:3, col:47> col:12 used cf2 'int C::*' cinit
    |   `-CXXReinterpretCastExpr 0x1a4f6b8 <col:18, col:47> 'int C::*' reinterpret_cast<int struct C::*> <ReinterpretMemberPointer>
    |     `-ImplicitCastExpr 0x1a4f6a0 <col:45> 'int F::*' <LValueToRValue> part_of_explicit_cast
    |       `-DeclRefExpr 0x1a4f638 <col:45> 'int F::*' lvalue Var 0x1a21078 'ff' 'int F::*'

Notice how the static_cast figures out the path to the correct subobject. This is how member-pointers are handled as far as I can tell.
Unfortunately, Stroustrup is surprisingly scant when it comes to this topic. I am trying to dig through the Standard.

steakhal added inline comments.Mar 2 2021, 6:47 AM
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp
29–31

By undef, I was referring to the clang::ento::UndefinedVal - which usually represents values that shouldn't be used, not even read. For example, an uninitialized variable has undefined value. Or an out-of-bound memory access also produces undefined value. There are a few checkers that are hunting for such undefined values like CallAndMessageChecker, UndefBranchChecker, UndefinedArraySubscriptChecker, UndefinedAssignmentChecker, UndefResultChecker.

Probably removing the member accesses is the best you can do.

but I also need to show that casting to an invalid pointer is properly handled because that is not UB

In this test, I can't see where you cast the member pointer back to make it valid again.

43–50

Oh yea, now I get it.
When it inherits virtually, you can not take the address of it, and if you don't inherit virtually then it's ambiguous. Either way, it won't compile.
My bad. However, my other comments still apply.

52–62

I tried to highlight that the classes could have non-static data members which could cause your assumption about cf1 == cf2 not to hold anymore. See my previously attached godbolt link.

static_cast<T>(p) makes sure that the offset that the member pointer p represents is updated accordingly to match the offset of the field member within the new type T. Usually, it involves some addition/subtraction to accommodate this.
While static_cast<T>(p) does not update the value, just reinterprets it in a different way - which usually results in an invalid pointer though and a bunch of subtle rules kick in such as the type similarity, pointer interchangeability, which in general known as strict-aliasing.

RedDocMD marked an inline comment as done.Mar 2 2021, 7:18 AM
RedDocMD added inline comments.
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp
29–31

In this test, I can't see where you cast the member pointer back to make it valid again.

I meant that sf when used will lead to undefined behaviour. But the statement int Some::*sf = reinterpret_cast<int Some::*>(ddf); isn't undefined, is it? AFAIK, reinterpret_cast is the C++ equivalent of the C unchecked cast - You can cast pointers as you wish but don't bet on the result.
On the other hand, pointer-to-members are unlike regular pointers, so there may be a catch. But all I really require is that the statement int Some::*sf = reinterpret_cast<int Some::*>(ddf); to not drive the Static Analyzer crazy - it simply gives up on trying to reason about sf. The test would be to confirm that. So removing the member access should solve it

52–62

While static_cast<T>(p) does not update the value, just reinterprets it in a different way - which usually results in an invalid pointer though and a bunch of subtle rules kick in such as the type similarity, pointer interchangeability, which in general known as strict-aliasing.

Presumably you mean reinterpret_cast.

Ah, I see. This makes it glaringly obvious: Another Godbolt link
So I guess handling reinterpret_cast for pointer-to-members pointless. I guess the idea will be to simply not analyze pointer-to-members which have been obtained through reinterpret_cast.

steakhal added inline comments.Mar 2 2021, 9:13 AM
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp
29–31

I meant that sf when used will lead to undefined behaviour

It depends on how you define use.
It's fine to make copy of the pointer sf, but dereferencing sf is not. My problem, as I carefully highlighted, is about the dereference.

You can do exactly 2 things with the invalid sf pointer:

  • cast it back to the original type, or
  • simply not use (not dereference) :D

The result of the dereference of an invalid pointer could be modelled as either UnknownVal or UndefinedVal. The latter would be more accurate, but the former is also a valid approximation.

52–62

Presumably you mean reinterpret_cast.

Yes, sorry for copy-pasting too much xD

So I guess handling reinterpret_cast for pointer-to-members pointless.

I don't know if it's really pointless, but hard to model precisely for sure.

I guess the idea will be to simply not analyze pointer-to-members which have been obtained through reinterpret_cast.

I guess, you could just return unknown? I don't know. This area is not my expertise, I just wanted to get you going with my observations.

RedDocMD added inline comments.Mar 3 2021, 1:44 AM
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp
52–62

I guess, you could just return unknown?

If I return nullptr as PointerToMemberData, then the analyzer will mark the pointer-to-member as unknown [this is done by handleLVectorSplat]

I don't know if it's really pointless, but hard to model precisely for sure.

Really hard to model and has too many cases and ifs and buts. Like although the member-pointer obtained by reinterpret_cast probably has the wrong offset, we still don't get seg-fault. Curious, but seems to happen for both GCC and Clang.

Talking on the mailing list, I got a link on reinterpret_casting for pointer-to-member: https://timsong-cpp.github.io/cppwp/n4861/expr.reinterpret.cast#10
The gist is that this sort of cast is only valid if we cast back to the original type (that too with a caveat). Now currently there are three things that can be done:

  • Leave things the way they are - reinterpret_cast will cause an assertion to fail
  • Never analyze reinterpret_cast - this is easy, just return nullptr for PointerToMemberData
  • Properly model this - this would require possibly adding more information in PointerToMember and a significantly complex logic to handle it.

The first two are trivially done. I would like to know if it is worth following the third - especially since it is a really obscure feature.
@vsavchenko, @NoQ, @steakhal what do you think should be done?

Talking on the mailing list, I got a link on reinterpret_casting for pointer-to-member: https://timsong-cpp.github.io/cppwp/n4861/expr.reinterpret.cast#10
The gist is that this sort of cast is only valid if we cast back to the original type (that too with a caveat).

What is that caveat?

Now currently there are three things that can be done:

  • Leave things the way they are - reinterpret_cast will cause an assertion to fail
  • Never analyze reinterpret_cast - this is easy, just return nullptr for PointerToMemberData
  • Properly model this - this would require possibly adding more information in PointerToMember and a significantly complex logic to handle it.

Do you have anything specific in mind about implementing the 3rd option?
Why do you think it's significantly complex?

Leaving assertion failures is not an option.
IMO if this feature is important to you, then you should pursue the 3rd.

What is that caveat?

The second point in the link I gave above.

Do you have anything specific in mind about implementing the 3rd option?
Why do you think it's significantly complex?

The modelling is not so complex. We need to store in the pointer-to-member a bool to remember that it was reinterpret_casted, and a CXXRecordDecl pointer to store where it was reinterpret_casted from. If it is reinterpret_casted back properly, then these are reset. If we do further reinterpret_casting or static_cast on this, then we can store in a bool that the pointer-to-member is unsafe to de-reference.
The problem is utilizing this fact (that pointer-to-member is safe or not to de-reference). This, I reckon, is a big refactor.
@steakhal, @vsavchenko, @NoQ what do you think?

RedDocMD updated this revision to Diff 328931.Mar 7 2021, 9:31 PM

Replaced smart-ptr dereference with direct access

@steakhal, your views?

I'm somewhat busy. If it's not urgent, I would postpone this.
Ping me in a few weeks.

I'm somewhat busy. If it's not urgent, I would postpone this.
Ping me in a few weeks.

Sure ok. Is it okay if I tell someone else to take a look at this in the meanwhile?

I'm really sorry about being sooo picky about this patch.
It's not my expertise and the change seems to address a corner-case, so we have to especially careful not introducing bugs.

My concern is that I still don't understand why do we want to do anything with reinterpret casts, besides remembering what the original type was.
AFAIK the resulting pointer can not be safely dereferenced unless it's reinterpreted back to the original type.
But for example, you are expecting this:

void testMultiple() {
  int F::*f = &F::field;
  int A::*a = reinterpret_cast<int A::*>(f); // Dereferencing 'a' is UB, AFAIK
  int C::*c = reinterpret_cast<int C::*>(f); // Same here!
  A aobj;
  C cobj;
  aobj.*a = 13; // Now an airplane suddenly crashes.
  cobj.field = 29;
  clang_analyzer_eval(aobj.*a == 13); // expected-warning{{UNKNOWN}} // Same here!
  clang_analyzer_eval(cobj.*c == 29); // expected-warning{{TRUE}} // Same here!
}

IMO if an expression results in UB, the symbolic value associated with that expression should be Undef.
Unknown would be also fine, but nothing else.

I could spam a couple of nits, but I think it's better to sort this question out.

@steakhal, being picky with code from beginners is a good way to train them to write code I think, and so I must thank you for you scrutiny!
As for the test you pointed out, it is a wrong test. It is wrong now that I have a better understanding of the problem.
There are two options that can be taken I reckon:

  • Simply make the stored SVal Unknown when we have a reinterpret-cast on pointer-to-member
  • Model it properly to take care of all the semantics.

Both of them remove the bug. I would prefer the first one and leave a TODO for better modelling. Better modelling necessarily makes interpretation of pointer-to-member clumsy. Because previously, a pointer-to-member was always valid. If we include modelling for reinterpret-cast, then as far as I can see, the pointer-to-member may either be in a valid or invalid state. This would lead to a chain of other changes. Possible most certainly but I think a tad bit unnecessary, especially because this corner case makes the normal case handling more complex

steakhal requested changes to this revision.Mar 24 2021, 10:25 AM

This patch does not model faithfully how reinterpretcasting member pointers should work.

The base specifiers represent the history for the member pointer (aka. mptr), describing how it was transformed to point wherever it currently points to.
If and only if the PointerToMemberData is null, then the represented member pointer is null.

class PointerToMemberData could have a bool IsValid field, representing whether or not this mptr is safely dereferancable or not.
Depending on this, loading a value via this mptr should result in Undefined or the associated value from the store.
Whenever you encounter a reinterpret_cast expression casting a valid mptr to a different type, the resulting PointerToMemberData should be the same as before BUT the IsValid field set to false!
Later, if it's cast back to the original type (which is probably the NamedDecl inside the PointerToMemberData, I don't know).

You should also change the way how the abstract machine loads a value from the store using an invalid mptr - as that should return Undefined instead of the actual associated value.

This revision now requires changes to proceed.Mar 24 2021, 10:25 AM