If the cast is to a class such that it is possible to reach the correct
sub-object containing the pointed member, a path to that class is retrieved from the CXXRecordDecl.
This path is used to construct the new
pointer-to-member data. If there is no such path, the resulting node is
not analyzed further.
Details
- Reviewers
vsavchenko NoQ dcoughlin steakhal
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
50 ms | x64 debian > Flang.Semantics::resolve102.f90 |
Event Timeline
The longer I stare at it the more questions I have. However, I don't know how member pointers work in the analyzer and I'm not really in the mood to dive deep into that territory.
Could you elaborate on what approach did you choose and more importantly, why that approach?
Why do we need a graph search here?
What cases do you try to resolve? I mean, there is a single test-case, which has reinterpret casts back and forth (literally), and it's not immediately clear to me why are you doing those gymnastics.
Maybe worth creating other test-cases as well. Probably covering multiple inheritance or virtual inheritance?
Your code has shared pointers, which is interesting, but I would favor unique pointers unless you can defend this use-case.
Also, in general, we prefer algorithms to raw for or while loops if they are semantically equivalent.
Besides all of these, I can see a few copy-pasted lines here and there. I'm not saying that it's bad, I'm just highlighting this fact.
I can not confirm the implementation and accept the patch.
Improving the summary, helping reviewers with helpful notes here and there could give the boost you need to get this reviewed.
BTW, the review process takes ages in the analyzer community. We should also improve on that ofc, but the first step is always done by you (and here I mean not specifically you but we all).
clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp | ||
---|---|---|
554–555 | I know that you just copy-pasted this, but what is this xD | |
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp | ||
4–5 | I assume this is resolved by now. |
Many thanks for you comments, @steakhal!
I will address the issues you have pointed out in this comment. To clean things up I should perhaps add some more clarification to the summary.
Could you elaborate on what approach did you choose and more importantly, why that approach?
Why do we need a graph search here?
Pointer-to-members contain two things - a pointer to a NamedDecl to store the field/method being pointed to and a list of CXXBaseSpeciifier. This list is used to determine which sub-object the member lies in. This path needs to be determined and unfortunately with reinterpret_cast, the AST provides no structural assistance (unlike static_cast). Hence, it needs to be searched by searching through the BaseSpecifiers of the CXXRecordDecl.
As for the need of graph search, it is due to multiple-inheritance. For a given class, it may have two or more bases and we need to follow into both of them to find the path to the required sub-object.
Why do I use BFS? Because one branch of the (inverted) inheritance tree may be very deep yet not contain the required class. But DFS will first exhaust it and then go the other branch. BFS on the other hand will find the shortest path to the correct base optimally, and this shortest path is all we need.
What cases do you try to resolve? I mean, there is a single test-case, which has reinterpret casts back and forth (literally), and it's not immediately clear to me why are you doing those gymnastics.
Maybe worth creating other test-cases as well. Probably covering multiple inheritance or virtual inheritance?
I will add a test case for multiple inheritance and perhaps replace existing tests with clearer tests. Virtual inheritance will not help since pointer-to-member to virtual base field/method is not allowed.
There are three cases which I have handled:
- The required class is the class we are casting to
- The required class is higher in the class hierarchy
- The required class is lower in the class hierarchy
- The required class is not related to the class being cast to.
For Case 1 and 2, a valid path can be calculated and so a pointer-to-member is created.
For Case 3 and 4, no valid path can be calculated and so it doesn't make sense to further pursue static analysis down this path.
Note that the required class is determined by the actual field/member we are pointing to and is not explicit in the reinterpret_cast.
Your code has shared pointers, which is interesting, but I would favor unique pointers unless you can defend this use-case.
The reason for not using unique_ptr is as follows:- each BFSNode holds a pointer to its parent. This cannot be a unique_ptr since there may be multiple children to a BFSNode. This must be a shared_ptr as a consequence.
Also, in general, we prefer algorithms to raw for or while loops if they are semantically equivalent.
I am not aware of any std/LLVM algorithm that perform BFS. If there are any, I would be glad to learn more about them.
I am going to replace the for loops with a range-for loop. Sorry about that one!
Besides all of these, I can see a few copy-pasted lines here and there. I'm not saying that it's bad, I'm just highlighting this fact.
Yes this was a conscious choice. The other alternative perhaps is to use fall-through, which I think would be confusing in this case since it will be obscured by the intermediate code.
clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp | ||
---|---|---|
554–555 |
Not entirely sure, but I think it performs tasks required to handle the pointer-to-member created or do the requisite default in the absence of a pointer-to-member. | |
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp | ||
4–5 |
Sorry, that was a lame mistake on my part. |
So here you are implying that for reinterpret casts, the CastE->path_begin()-end() range is empty? And that is why you need to manually come up with a possible CXXBaseSpecifier list?
I strongly encourage you to find a better alternative to this manual graph search.
This information must be available somewhere.
Have you looked at CXXRecordDecl::isDerivedFrom()? Or any other member functions of that class?
So here you are implying that for reinterpret casts, the CastE->path_begin()-end() range is empty? And that is why you need to manually come up with a possible CXXBaseSpecifier list?
@steakhal, Yes. For illustration, the following code:
struct Base { int field; }; struct Derived : public Base {}; struct DoubleDerived : public Derived {}; int f() { int DoubleDerived::*ddf = &Derived::field; int Base::*bf1 = static_cast<int Base::*>(ddf); int Base::*bf2 = reinterpret_cast<int Base::*>(ddf); }
produces the following (truncated) AST dump
-FunctionDecl 0x12f3890 <line:8:1, line:12:1> line:8:5 f 'int ()' `-CompoundStmt 0x12f3ec0 <col:9, line:12:1> |-DeclStmt 0x12f3b70 <line:9:3, col:44> | `-VarDecl 0x12f3a18 <col:3, col:39> col:23 used ddf 'int DoubleDerived::*' cinit | `-ImplicitCastExpr 0x12f3b48 <col:29, col:39> 'int DoubleDerived::*' <BaseToDerivedMemberPointer (Derived -> Base)> | `-UnaryOperator 0x12f3b30 <col:29, col:39> 'int Base::*' prefix '&' cannot overflow | `-DeclRefExpr 0x12f3ad0 <col:30, col:39> 'int' lvalue Field 0x12f33d0 'field' 'int' |-DeclStmt 0x12f3d20 <line:10:3, col:49> | `-VarDecl 0x12f3bf0 <col:3, col:48> col:14 bf1 'int Base::*' cinit | `-CXXStaticCastExpr 0x12f3ce0 <col:20, col:48> 'int Base::*' static_cast<int struct Base::*> <DerivedToBaseMemberPointer (Derived -> Base)> | `-ImplicitCastExpr 0x12f3cc8 <col:45> 'int DoubleDerived::*' <LValueToRValue> part_of_explicit_cast | `-DeclRefExpr 0x12f3c60 <col:45> 'int DoubleDerived::*' lvalue Var 0x12f3a18 'ddf' 'int DoubleDerived::*' `-DeclStmt 0x12f3ea8 <line:11:3, col:54> `-VarDecl 0x12f3d88 <col:3, col:53> col:14 bf2 'int Base::*' cinit `-CXXReinterpretCastExpr 0x12f3e78 <col:20, col:53> 'int Base::*' reinterpret_cast<int struct Base::*> <ReinterpretMemberPointer> `-ImplicitCastExpr 0x12f3e60 <col:50> 'int DoubleDerived::*' <LValueToRValue> part_of_explicit_cast `-DeclRefExpr 0x12f3df8 <col:50> 'int DoubleDerived::*' lvalue Var 0x12f3a18 'ddf' 'int DoubleDerived::*'
If you compare the three cases:
- The first statement has an implicit cast inserted during parsing, and so we need not calculate the path manually
- The second statement has a static_cast and it also supplies a path, except that we need to remove these bases instead of adding. (Ref to D95877).
- The third statement has no path provided, because it is a reinterpret_cast and the parser cannot produce a path in all cases (if you cast to an integer for an instance) - so it doesn't produce any path
With the parser not providing us with info, we have to figure it out in a different way.
Also, it is not one possible path but the only possible path. If there are multiple acceptable paths then it is illegal to define such a member-pointer and the frontend will reject it as such.
I strongly encourage you to find a better alternative to this manual graph search.
This information must be available somewhere.Have you looked at CXXRecordDecl::isDerivedFrom()? Or any other member functions of that class?
Unfortunately, all the methods on CXXRecordDecl, like the one you mentioned, are for querying and thus returns a bool, while I need the entire path.
AFAIK the second overload accepts an out parameter serving exactly our needs. Could you check if my assumption is correct?
@steakhal, you are absolutely right! It works. Thank you for pointing it out, not sure how I missed this earlier this evening.
I have serious concerns inline.
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp | ||
---|---|---|
20–21 | Please have a note describing why you are doing this roundtrip. | |
29–31 | The assignment is actually UB. Shouldn't it return undef for reading via an invalid member pointer? | |
43–50 | An ASCII art would help so much: A C(field) | | B D \ / E | F However, I'm still missing a diamond-shaped inheritance. | |
52–62 | Wait a minute. It's not how it works. This example demonstrates that both of these member pointer dereferences are UB. |
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp | ||
---|---|---|
43–50 |
Thanks for the ASCII art! A | B / \ C D \ / E According to my understanding, if I have a field in A or B, and try to define a member pointer like int E::* ef = &A::field, it is not allowed. struct A { int field; }; struct B : public A {}; struct C : public virtual B {}; struct D : public virtual B {}; struct E : public C, public D {}; int main() { int E::* ef1 = &A::field; } diamond-member-pointer.cpp: In function ‘int main()’: 11 | int E::* ef1 = &A::field; | ^~~~~ |
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp | ||
---|---|---|
20–21 | You mean why a simpler test wouldn't suffice? Well I think it would, but this was a common corner identified by @vsavchenko and I have put it in as a result. | |
29–31 | I don't quite know what is undef. Is it a special macro or something? | |
52–62 | The member access on A is definitely UB, I guess I will do what I proposed in the Some case. struct A {}; struct B : public A {}; struct C { int field; }; struct D : public C {}; struct E : public B, public D {}; struct F : public E {}; int main() { int F::* ff = &F::field; int C::* cf1 = static_cast<int C::*>(ff); int C::* cf2 = reinterpret_cast<int C::*>(ff); C c; c.*cf1 = 10; c.*cf2 = 10; return 0; } cf1 and cf2 are the same thing, except that they are declared differently (one via static_cast, other via reinterpret_cast). If we look at the AST (truncated to the first three lines of main): CompoundStmt 0x1a4fe18 <col:12, line:18:1> |-DeclStmt 0x1a4f3a8 <line:11:3, col:26> | `-VarDecl 0x1a21078 <col:3, col:21> col:12 used ff 'int F::*' cinit | `-ImplicitCastExpr 0x1a4f378 <col:17, col:21> 'int F::*' <BaseToDerivedMemberPointer (E -> D -> C)> | `-UnaryOperator 0x1a4f360 <col:17, col:21> 'int C::*' prefix '&' cannot overflow | `-DeclRefExpr 0x1a4f2f8 <col:18, col:21> 'int' lvalue Field 0x1a207d0 'field' 'int' |-DeclStmt 0x1a4f560 <line:12:3, col:43> | `-VarDecl 0x1a4f428 <col:3, col:42> col:12 used cf1 'int C::*' cinit | `-CXXStaticCastExpr 0x1a4f518 <col:18, col:42> 'int C::*' static_cast<int struct C::*> <DerivedToBaseMemberPointer (E -> D -> C)> | `-ImplicitCastExpr 0x1a4f500 <col:40> 'int F::*' <LValueToRValue> part_of_explicit_cast | `-DeclRefExpr 0x1a4f498 <col:40> 'int F::*' lvalue Var 0x1a21078 'ff' 'int F::*' |-DeclStmt 0x1a4f6e8 <line:13:3, col:48> | `-VarDecl 0x1a4f5c8 <col:3, col:47> col:12 used cf2 'int C::*' cinit | `-CXXReinterpretCastExpr 0x1a4f6b8 <col:18, col:47> 'int C::*' reinterpret_cast<int struct C::*> <ReinterpretMemberPointer> | `-ImplicitCastExpr 0x1a4f6a0 <col:45> 'int F::*' <LValueToRValue> part_of_explicit_cast | `-DeclRefExpr 0x1a4f638 <col:45> 'int F::*' lvalue Var 0x1a21078 'ff' 'int F::*' Notice how the static_cast figures out the path to the correct subobject. This is how member-pointers are handled as far as I can tell. |
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp | ||
---|---|---|
29–31 | By undef, I was referring to the clang::ento::UndefinedVal - which usually represents values that shouldn't be used, not even read. For example, an uninitialized variable has undefined value. Or an out-of-bound memory access also produces undefined value. There are a few checkers that are hunting for such undefined values like CallAndMessageChecker, UndefBranchChecker, UndefinedArraySubscriptChecker, UndefinedAssignmentChecker, UndefResultChecker. Probably removing the member accesses is the best you can do.
In this test, I can't see where you cast the member pointer back to make it valid again. | |
43–50 | Oh yea, now I get it. | |
52–62 | I tried to highlight that the classes could have non-static data members which could cause your assumption about cf1 == cf2 not to hold anymore. See my previously attached godbolt link. static_cast<T>(p) makes sure that the offset that the member pointer p represents is updated accordingly to match the offset of the field member within the new type T. Usually, it involves some addition/subtraction to accommodate this. |
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp | ||
---|---|---|
29–31 |
I meant that sf when used will lead to undefined behaviour. But the statement int Some::*sf = reinterpret_cast<int Some::*>(ddf); isn't undefined, is it? AFAIK, reinterpret_cast is the C++ equivalent of the C unchecked cast - You can cast pointers as you wish but don't bet on the result. | |
52–62 |
Presumably you mean reinterpret_cast. Ah, I see. This makes it glaringly obvious: Another Godbolt link |
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp | ||
---|---|---|
29–31 |
It depends on how you define use. You can do exactly 2 things with the invalid sf pointer:
The result of the dereference of an invalid pointer could be modelled as either UnknownVal or UndefinedVal. The latter would be more accurate, but the former is also a valid approximation. | |
52–62 |
Yes, sorry for copy-pasting too much xD
I don't know if it's really pointless, but hard to model precisely for sure.
I guess, you could just return unknown? I don't know. This area is not my expertise, I just wanted to get you going with my observations. |
clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp | ||
---|---|---|
52–62 |
If I return nullptr as PointerToMemberData, then the analyzer will mark the pointer-to-member as unknown [this is done by handleLVectorSplat]
Really hard to model and has too many cases and ifs and buts. Like although the member-pointer obtained by reinterpret_cast probably has the wrong offset, we still don't get seg-fault. Curious, but seems to happen for both GCC and Clang. |
Talking on the mailing list, I got a link on reinterpret_casting for pointer-to-member: https://timsong-cpp.github.io/cppwp/n4861/expr.reinterpret.cast#10
The gist is that this sort of cast is only valid if we cast back to the original type (that too with a caveat). Now currently there are three things that can be done:
- Leave things the way they are - reinterpret_cast will cause an assertion to fail
- Never analyze reinterpret_cast - this is easy, just return nullptr for PointerToMemberData
- Properly model this - this would require possibly adding more information in PointerToMember and a significantly complex logic to handle it.
The first two are trivially done. I would like to know if it is worth following the third - especially since it is a really obscure feature.
@vsavchenko, @NoQ, @steakhal what do you think should be done?
What is that caveat?
Now currently there are three things that can be done:
- Leave things the way they are - reinterpret_cast will cause an assertion to fail
- Never analyze reinterpret_cast - this is easy, just return nullptr for PointerToMemberData
- Properly model this - this would require possibly adding more information in PointerToMember and a significantly complex logic to handle it.
Do you have anything specific in mind about implementing the 3rd option?
Why do you think it's significantly complex?
Leaving assertion failures is not an option.
IMO if this feature is important to you, then you should pursue the 3rd.
The second point in the link I gave above.
Do you have anything specific in mind about implementing the 3rd option?
Why do you think it's significantly complex?
The modelling is not so complex. We need to store in the pointer-to-member a bool to remember that it was reinterpret_casted, and a CXXRecordDecl pointer to store where it was reinterpret_casted from. If it is reinterpret_casted back properly, then these are reset. If we do further reinterpret_casting or static_cast on this, then we can store in a bool that the pointer-to-member is unsafe to de-reference.
The problem is utilizing this fact (that pointer-to-member is safe or not to de-reference). This, I reckon, is a big refactor.
@steakhal, @vsavchenko, @NoQ what do you think?
I'm somewhat busy. If it's not urgent, I would postpone this.
Ping me in a few weeks.
I'm really sorry about being sooo picky about this patch.
It's not my expertise and the change seems to address a corner-case, so we have to especially careful not introducing bugs.
My concern is that I still don't understand why do we want to do anything with reinterpret casts, besides remembering what the original type was.
AFAIK the resulting pointer can not be safely dereferenced unless it's reinterpreted back to the original type.
But for example, you are expecting this:
void testMultiple() { int F::*f = &F::field; int A::*a = reinterpret_cast<int A::*>(f); // Dereferencing 'a' is UB, AFAIK int C::*c = reinterpret_cast<int C::*>(f); // Same here! A aobj; C cobj; aobj.*a = 13; // Now an airplane suddenly crashes. cobj.field = 29; clang_analyzer_eval(aobj.*a == 13); // expected-warning{{UNKNOWN}} // Same here! clang_analyzer_eval(cobj.*c == 29); // expected-warning{{TRUE}} // Same here! }
IMO if an expression results in UB, the symbolic value associated with that expression should be Undef.
Unknown would be also fine, but nothing else.
I could spam a couple of nits, but I think it's better to sort this question out.
@steakhal, being picky with code from beginners is a good way to train them to write code I think, and so I must thank you for you scrutiny!
As for the test you pointed out, it is a wrong test. It is wrong now that I have a better understanding of the problem.
There are two options that can be taken I reckon:
- Simply make the stored SVal Unknown when we have a reinterpret-cast on pointer-to-member
- Model it properly to take care of all the semantics.
Both of them remove the bug. I would prefer the first one and leave a TODO for better modelling. Better modelling necessarily makes interpretation of pointer-to-member clumsy. Because previously, a pointer-to-member was always valid. If we include modelling for reinterpret-cast, then as far as I can see, the pointer-to-member may either be in a valid or invalid state. This would lead to a chain of other changes. Possible most certainly but I think a tad bit unnecessary, especially because this corner case makes the normal case handling more complex
This patch does not model faithfully how reinterpretcasting member pointers should work.
The base specifiers represent the history for the member pointer (aka. mptr), describing how it was transformed to point wherever it currently points to.
If and only if the PointerToMemberData is null, then the represented member pointer is null.
class PointerToMemberData could have a bool IsValid field, representing whether or not this mptr is safely dereferancable or not.
Depending on this, loading a value via this mptr should result in Undefined or the associated value from the store.
Whenever you encounter a reinterpret_cast expression casting a valid mptr to a different type, the resulting PointerToMemberData should be the same as before BUT the IsValid field set to false!
Later, if it's cast back to the original type (which is probably the NamedDecl inside the PointerToMemberData, I don't know).
You should also change the way how the abstract machine loads a value from the store using an invalid mptr - as that should return Undefined instead of the actual associated value.
I'm sorry but is this some sort of joke? I don't think we should be calling ChickenFajitas straightaway like this, debounce it, recalculate, repent let's go
I know that you just copy-pasted this, but what is this xD