This patch models the std::make_unique and std::make_unique_for_overwrite methods.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The drawback of the current approach is that we are not using the following piece of information:
std::unique_ptr created from std::make_unique is not null (to begin with)
I am not being able to use this info since I don't have access to the raw pointer, so cannot create a SVal and then constrain the SVal to non-null.
Any suggestions @NoQ, @vsavchenko , @xazax.hun, @teemperor?
I am not being able to use this info since I don't have access to the raw pointer, so cannot create a SVal and then constrain the SVal to non-null.
Any suggestions @NoQ, @vsavchenko , @xazax.hun, @teemperor?
You can always create a new symbol to represent the inner pointer. Something like this already happens, when you have a unique_ptr formal parameter and call get on it.
The way handleGet obtains a new symbol cannot really be used here since we do not have an Expr for the inner pointer at hand. handleGet does the following:
C.getSValBuilder().conjureSymbolVal( CallExpr, C.getLocationContext(), Call.getResultType(), C.blockCount());
Since we have CallExpr, we can easily conjure up an SVal. But I don't see how I can do it similarly in this patch.
Since we have CallExpr, we can easily conjure up an SVal. But I don't see how I can do it similarly in this patch.
You should have a CallExpr for std::make_unique too. I believe that expression is used to determine how the conjured symbol was created (to give it an identity). So probably it should be ok to use the make_unique call to create this symbol (but be careful to create the symbol with the right type).
Tests pls!
Yes I think you should totally do an evalCall() here. The function has no other side effects apart from making a pointer so it's valuable to fully model it so that to avoid unnecessary store invalidations.
That's right, a conjured symbol isn't necessarily the value of its respective expression. So you can conjure it and assume that it's non-null. Also consider using a heap conjured symbol (a-la getConjuredHeapSymbolVal()). That said, getConjuredHeapSymbolVal() doesn't provide an overload for overriding the type; there's no reason it shouldn't though, please feel free to extend it.
There's a separate story about SymbolConjured being the right class for the problem. I'm still in favor of having SymbolMetadata instead, so that to properly deduplicate symbols across different branches and merge more paths.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
---|---|---|
190 | Please use CallDescription for most of this stuff. | |
352 | Please merge with isStdSmartPtrCall. | |
353–356 | getTypePtr() is almost never necessary because QualType has an overloaded operator->(). |
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
---|---|---|
352 | I don't think that can be done because isStdSmartPtrCall checks calls like a.foo(), checking if a is a smart-ptr. On the other hand I need to check for statements like a = foo() and ensure that a is a smart-ptr. |
Ugh, this entire checkBind hack was so unexpected that I didn't even recognize evalCall. What prevents you from doing everything in evalCall? No state traits, no nothing, just directly take the this-region and attach the value to it?
Because in evalCall, I don't have access to the ThisRegion. At least I think I don't.
So we have a statement like auto a = std::make_unique<int>(100). In evalCall I get a CallEvent corresponding to std::make_unique<int>(100). Using Call.getOriginExpr() I can get the LHS. But that's only the Expr, not the MemRegion. Unlike in a regular constructor call or method call, where ThisRegion can easily be obtained from the Call.
Did you consider trying to get it from CXXInstanceCall::getCXXThisVal? Usually, if you cannot access something you need, that is a bug in the analyzer. But I found that this is very rarely the case. Most of the callbacks are very well thought out and already gives access to most of the information you might want to access.
You do have access to it. That's how C++ works: the call constructs the value directly into the target this-region. In code
auto a = std::make_unique<int>(100)
it is known even before std::make_unique is invoked (i.e., even in checkPreCall) that the this-region for the call is going to be a. Because it's up to the call to invoke the constructor of the external object, and you can't invoke a constructor if you don't have a this to pass into it. You cannot ever have an object and not have this.
Even if it wasn't for RVO, you would have known the temporary region in which the smart pointer would originally reside.
This was the whole point of the CFG work described in https://www.youtube.com/watch?v=4n3l-ZcDJNY
You're looking in the wrong place though, as std::make_unique returns the structure by value (aka prvalue), so the value of the expression, even if available before the call, was never going to be this (which would have been the corresponding glvalue). What you're looking for is CallEvent::getReturnValueUnderConstruction().
Whoops, I totally missed your point with my suggestion (should not read it while sitting at meetings). But Artem has a great answer already.
Also congrats, sounds like you are to become the first actual user of this API! Hope it actually works 🤞🤞🤞
Nice. Looks like I have something to read up on. Turns out, I end up learning something new in C++ every now and then. 😃
Yeah, it's also pretty interesting ABI-wise. As you know, C++ doesn't have stable ABI. That said, one popular calling convention for functions that return C++ objects by value consists in passing a hidden argument to the function that contains the address at which to construct the return value. Like "this" is an implicit argument to methods, the return address is yet another implicit argument. So you can think of getReturnValueUnderConstruction() as of an accessor for that implicit argument value similarly to how getCXXThisVal() is the accessor for the implicit this-argument value.
This is completely different from C where you can return the structure on stack or even in registers (which can't be pointed to at all); this becomes possible in C because structures can be copied arbitrarily without invoking any sort of copy constructor.
Random general advice! When writing patches, you typically explore the design space of potential solutions - which you did great in this case. When you do so, it's a great idea to include the results of such exploration in the commit message (and/or in later comments). Especially if the first / most obvious thing to try didn't work. If you publish your second-choice solution, people will most likely ask you why you chose it instead of the obvious first-choice solution so it's an act of friendliness to anticipate this question. Similarly, if you have concerns about your solution, please communicate them openly; they'll either be discovered anyway but a lot more effort will be spent on it, or they won't be discovered during review which could result in future problems; both variants are worse than communicating openly. Ideally you carry more responsibility for proving that your approach is right than the reviewer has a responsibility to find problems in it. The reviewer is the last line of defense when it comes to applying critical thinking but the process really starts with you as you challenge your own assumptions and approaches by questioning both their correctness and their architectural harmony. Trying to find at least two solutions to each problem and compare them against each other is such a good exercise in critical thinking that almost every patch deserves such discussion.
If I understand your comment correctly, then see my inline comment.
clang/test/Analysis/smart-ptr-text-output.cpp | ||
---|---|---|
4 | You set the standard here via the -std arg. If you want different standards you need to have multiple RUN: lines that all pass a different standard. |
clang/test/Analysis/smart-ptr-text-output.cpp | ||
---|---|---|
4 | Yup, thanks! |
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
---|---|---|
164 | getParent() will assert when the MethodDecl is not a in a Record (which is the only way this can return null) so you can remove that check. |
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
---|---|---|
201–202 | Untyped region isn't a region without a type; everything has a type. Untyped region is when we don't know the type. A typical situation that produces untyped region is when the region comes in through a void pointer. I vaguely remember that one way to trick your specific code may be to do std::unique_ptr<int> foo() { return make_unique<int>(123); } which will RVO into an unknown region. I also wouldn't rely on it being typed in all other cases. A much safer way to access the inner pointer type would be to query the function's template parameter. | |
214–217 | I suggest a TODO: ExprEngine should do this for us.. | |
219 | Do we need a note here as well? I guess we don't because we'll never emit null dereference reports against a non-null pointer. But if we later emit more sophisticated bug reports, we might need one. Maybe leave a comment? |
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
---|---|---|
199 | Can you do a getConjuredHeapSymbolVal() instead? That'd give us the right memory space as well as the extra bit of information that the new symbol doesn't alias with any previous symbols. I get it that it doesn't accept the type but it's perfectly ok for you to teach it how to accept the type. | |
clang/test/Analysis/smart-ptr-text-output.cpp | ||
334 | Mmm wait a sec, that doesn't look like what the spec says. https://en.cppreference.com/w/cpp/memory/unique_ptr/make_unique:
It zero-initializes the pointee, not the pointer. The difference between std::make_unique<A>() and std::make_unique_for_overwrite<A>() is the difference between value-initialization (invoking the default constructor) and zero-initialization (simply filling the buffer with 0s). |
clang/test/Analysis/smart-ptr-text-output.cpp | ||
---|---|---|
334 | Oops! Look like I completely misunderstood the spec. |
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
---|---|---|
208–212 | I think it's fine to have it here in this commit and change it in one of the future commits. But I think we should add a bit more description of 1) what's the problem, 2) how is this a solution, 3) how should it look when moved to the engine. |
This looks great to me, I think the patch is good to go as long as Valeriy's comment is addressed :)
Speaking of specs, hold up, we're forgetting something very important.
make_unique() also has to call the constructor of the underlying value. We're supposed to model that.
The constructor may be inlined or evaluated conservatively.
~unique_ptr() is also supposed to call the destructor of the underlying value. We're supposed to model that.
I.e., you could imagine a bunch of tests like this
struct S { int *p; S(int *p): p(p) {} ~S() { *p = 0; } }; void foo() { int x = 1; { auto P = std::make_unique<S>(&x); clang_analyzer_eval(*P->p == 1); // TRUE } clang_analyzer_eval(x == 0); // TRUE }
We could stick to conservative evaluation of such constructors and destructors; at the very least, we should actively invalidate Store at the destructor. This patch can remain unchanged because the default values inside freshly conjured regions are unknown. But that'd still be a regression because the above test (presumably) passes with inlining.
Is now a good time to legalize modeling function calls inside checker callbacks?
clang/test/Analysis/smart-ptr-text-output.cpp | ||
---|---|---|
27 | Could you double-check that this flag correctly disables all the newly introduced modeling so that it wasn't accidentally enabled for all users before it's ready? |
clang/test/Analysis/smart-ptr-text-output.cpp | ||
---|---|---|
27 | Yup. There are 133 notes and warnings (just search for "// expected-" and see the count). And on setting the flag to false, there are 133 errors. |
I would suppose that constructor calls are properly handled. (ie, member constructors are called properly).
As for modelling destructors, there is a separate problem - since we don't have a Stmt, the postCall handler keeps on crashing.
clang/test/Analysis/smart-ptr-text-output.cpp | ||
---|---|---|
27 | I mean, specifically the modeling added by this patch? I see you wrote an entire checker callback that doesn't seem to have an analyzer-config option check anywhere in it. Simply having different results isn't sufficient to verify this. |
I believe there are a couple of comments that are done but not marked accordingly.
I agree with Artem, if we could craft code that fails due to not calling ctors, we should probably include them in the tests, even if they don't reflect the desired behavior they are a great source of documentation what is missing.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
---|---|---|
151 | Nit: consider TemplateArgs.empty(). |
clang/test/Analysis/smart-ptr-text-output.cpp | ||
---|---|---|
27 | Oh I see. |
The analyzer doesn't seem to be able to make up its mind.
member-constructor.cpp:15:5: warning: FALSE [debug.ExprInspection] clang_analyzer_eval(*P->p == 0); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ member-constructor.cpp:15:25: note: Assuming the condition is false clang_analyzer_eval(*P->p == 0); ^~~~~~~~~~ member-constructor.cpp:15:5: note: FALSE clang_analyzer_eval(*P->p == 0); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ member-constructor.cpp:15:5: warning: TRUE [debug.ExprInspection] clang_analyzer_eval(*P->p == 0); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ member-constructor.cpp:15:25: note: Assuming the condition is true clang_analyzer_eval(*P->p == 0); ^~~~~~~~~~ member-constructor.cpp:15:5: note: TRUE clang_analyzer_eval(*P->p == 0); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2 warnings generated.
I usually just add the wrong expectation to make the test pass and add a TODO comment that explains why is this wrong and we should fix it in the future.
The line of thinking here is that tests are just something that gives us a signal when the behavior changes. They don't necessarily demonstrate the expected behavior. If they don't there's still value in knowing that the observed behavior on them has changed.
For example, if you believe that your commit shouldn't change the observed behavior ("NFC commit", eg. refactoring), any signal that the observed behavior has changed should raise concerns and probably cause you to rethink the commit, even if the behavior has in fact improved.
Similarly, if the functional change is intended, you may still find it suspicious if it suddenly fixes a test that it wasn't supposed to fix. This may help you discover a bug in your commit that would cause the behavior to regress in other related cases.
You don't want the buggy test to be silenced, you want it to keep actively verifying that the bug is not yet fixed.
Then, separately, for every test there's a requirement to keep it understandable, so that the reader would understand what exactly is tested and what are the consequences of breaking the test. It's often valuable to provide comments indicating your level of confidence - "this test is extremely important to pass", "we don't really care about our behavior on this test but it's still nice to know when that behavior changes", etc. - and "we definitely don't want this test to pass but for now it does" is a possible answer as well and it doesn't make the test any less valuable.
So basically
void foo() { // FIXME: Should be FALSE. clang_analyzer_eval(1 + 1 == 3); // expected-warning{{TRUE}} // FIXME: Should be TRUE. clang_analyzer_eval(1 + 1 == 2); // expected-warning{{FALSE}} }
is a great test to have.
I believe that you might see analyzer-eagerly-assume in action. Basically, instead of keeping a constraint unknown, the analyzer can split the path eagerly. So you have one path where the constraint is assumed to be true and one where it is assumed to be false. I think the rationale was that the user is likely to branch on certain conditions and doing it eagerly can make the analysis more precise in some cases but I do not really remember the details. Hopefully Artem has some more contexts.
clang/test/Analysis/smart-ptr-text-output.cpp | ||
---|---|---|
340 | Nit: we usually add new lines to the ends of the files. |
Yes, @xazax.hun is correct.
It's incorrect to say that the static analyzer "doesn't seem to be able to make up its mind". The analyzer gives perfectly clear and consistent answers for each execution path it explores and it's not surprising that the results are different on different execution paths. The presence of the FALSE path indicates indicates that the test indeed doesn't pass: an impossible execution path is being explored.
Eagerly-assume is a thing because it produces easier constraints for the solver to solve. Also the state split is pretty justified on any boolean expression with no side effects.
Is the syntax of specifying expected notes and warnings documented somewhere? I could not find the note-specific syntax.
Nit: consider TemplateArgs.empty().