This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Handle std::make_unique for SmartPtrModeling
ClosedPublic

Authored by RedDocMD on Jun 5 2021, 5:46 AM.

Details

Summary

This patch models the std::make_unique and std::make_unique_for_overwrite methods.

Diff Detail

Event Timeline

RedDocMD created this revision.Jun 5 2021, 5:46 AM
RedDocMD requested review of this revision.Jun 5 2021, 5:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 5 2021, 5:46 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
RedDocMD retitled this revision from [analyzer] Handle std::make_unique for SmartPtrModeling to [analyzer][WIP] Handle std::make_unique for SmartPtrModeling.Jun 5 2021, 11:20 AM
RedDocMD updated this revision to Diff 350067.Jun 5 2021, 11:23 AM

Accounting for std::make_unique_for_overwrite

RedDocMD edited the summary of this revision. (Show Details)Jun 5 2021, 11:25 AM
RedDocMD updated this revision to Diff 350103.Jun 6 2021, 7:15 AM

Fixed binding of SVal to variable

The drawback of the current approach is that we are not using the following piece of information:

std::unique_ptr created from std::make_unique is not null (to begin with)

I am not being able to use this info since I don't have access to the raw pointer, so cannot create a SVal and then constrain the SVal to non-null.
Any suggestions @NoQ, @vsavchenko , @xazax.hun, @teemperor?

I am not being able to use this info since I don't have access to the raw pointer, so cannot create a SVal and then constrain the SVal to non-null.
Any suggestions @NoQ, @vsavchenko , @xazax.hun, @teemperor?

You can always create a new symbol to represent the inner pointer. Something like this already happens, when you have a unique_ptr formal parameter and call get on it.

You can always create a new symbol to represent the inner pointer. Something like this already happens, when you have a unique_ptr formal parameter and call get on it.

The way handleGet obtains a new symbol cannot really be used here since we do not have an Expr for the inner pointer at hand. handleGet does the following:

C.getSValBuilder().conjureSymbolVal(
        CallExpr, C.getLocationContext(), Call.getResultType(), C.blockCount());

Since we have CallExpr, we can easily conjure up an SVal. But I don't see how I can do it similarly in this patch.

RedDocMD updated this revision to Diff 350109.Jun 6 2021, 9:10 AM

Reformatted code

RedDocMD updated this revision to Diff 350110.Jun 6 2021, 9:14 AM

Fixed git history

Since we have CallExpr, we can easily conjure up an SVal. But I don't see how I can do it similarly in this patch.

You should have a CallExpr for std::make_unique too. I believe that expression is used to determine how the conjured symbol was created (to give it an identity). So probably it should be ok to use the make_unique call to create this symbol (but be careful to create the symbol with the right type).

NoQ added a comment.Jun 6 2021, 5:33 PM

Tests pls!

Yes I think you should totally do an evalCall() here. The function has no other side effects apart from making a pointer so it's valuable to fully model it so that to avoid unnecessary store invalidations.

Since we have CallExpr, we can easily conjure up an SVal. But I don't see how I can do it similarly in this patch.

You should have a CallExpr for std::make_unique too. I believe that expression is used to determine how the conjured symbol was created (to give it an identity). So probably it should be ok to use the make_unique call to create this symbol (but be careful to create the symbol with the right type).

That's right, a conjured symbol isn't necessarily the value of its respective expression. So you can conjure it and assume that it's non-null. Also consider using a heap conjured symbol (a-la getConjuredHeapSymbolVal()). That said, getConjuredHeapSymbolVal() doesn't provide an overload for overriding the type; there's no reason it shouldn't though, please feel free to extend it.

There's a separate story about SymbolConjured being the right class for the problem. I'm still in favor of having SymbolMetadata instead, so that to properly deduplicate symbols across different branches and merge more paths.

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
245

Please use CallDescription for most of this stuff.

442

Please merge with isStdSmartPtrCall.

443–446

getTypePtr() is almost never necessary because QualType has an overloaded operator->().

RedDocMD updated this revision to Diff 350362.Jun 7 2021, 11:00 AM

Made stylistic refactors

RedDocMD marked 3 inline comments as done.Jun 7 2021, 11:04 AM
RedDocMD added inline comments.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
442

I don't think that can be done because isStdSmartPtrCall checks calls like a.foo(), checking if a is a smart-ptr. On the other hand I need to check for statements like a = foo() and ensure that a is a smart-ptr.

RedDocMD marked an inline comment as done.Jun 7 2021, 11:05 AM

Yes I think you should totally do an evalCall() here. The function has no other side effects apart from making a pointer so it's valuable to fully model it so that to avoid unnecessary store invalidations.

Am I not doing that now? I didn't get this point.

NoQ added a comment.Jun 7 2021, 11:42 AM

Ugh, this entire checkBind hack was so unexpected that I didn't even recognize evalCall. What prevents you from doing everything in evalCall? No state traits, no nothing, just directly take the this-region and attach the value to it?

Ugh, this entire checkBind hack was so unexpected that I didn't even recognize evalCall. What prevents you from doing everything in evalCall? No state traits, no nothing, just directly take the this-region and attach the value to it?

Because in evalCall, I don't have access to the ThisRegion. At least I think I don't.
So we have a statement like auto a = std::make_unique<int>(100). In evalCall I get a CallEvent corresponding to std::make_unique<int>(100). Using Call.getOriginExpr() I can get the LHS. But that's only the Expr, not the MemRegion. Unlike in a regular constructor call or method call, where ThisRegion can easily be obtained from the Call.

Ugh, this entire checkBind hack was so unexpected that I didn't even recognize evalCall. What prevents you from doing everything in evalCall? No state traits, no nothing, just directly take the this-region and attach the value to it?

Because in evalCall, I don't have access to the ThisRegion. At least I think I don't.

Did you consider trying to get it from CXXInstanceCall::getCXXThisVal? Usually, if you cannot access something you need, that is a bug in the analyzer. But I found that this is very rarely the case. Most of the callbacks are very well thought out and already gives access to most of the information you might want to access.

NoQ added a comment.Jun 7 2021, 12:44 PM

Ugh, this entire checkBind hack was so unexpected that I didn't even recognize evalCall. What prevents you from doing everything in evalCall? No state traits, no nothing, just directly take the this-region and attach the value to it?

Because in evalCall, I don't have access to the ThisRegion.

You do have access to it. That's how C++ works: the call constructs the value directly into the target this-region. In code

auto a = std::make_unique<int>(100)

it is known even before std::make_unique is invoked (i.e., even in checkPreCall) that the this-region for the call is going to be a. Because it's up to the call to invoke the constructor of the external object, and you can't invoke a constructor if you don't have a this to pass into it. You cannot ever have an object and not have this.

Even if it wasn't for RVO, you would have known the temporary region in which the smart pointer would originally reside.

This was the whole point of the CFG work described in https://www.youtube.com/watch?v=4n3l-ZcDJNY

You're looking in the wrong place though, as std::make_unique returns the structure by value (aka prvalue), so the value of the expression, even if available before the call, was never going to be this (which would have been the corresponding glvalue). What you're looking for is CallEvent::getReturnValueUnderConstruction().

Whoops, I totally missed your point with my suggestion (should not read it while sitting at meetings). But Artem has a great answer already.

NoQ added a comment.Jun 7 2021, 12:54 PM

Also congrats, sounds like you are to become the first actual user of this API! Hope it actually works 🤞🤞🤞

Nice. Looks like I have something to read up on. Turns out, I end up learning something new in C++ every now and then. 😃

NoQ added a comment.Jun 7 2021, 8:57 PM

Nice. Looks like I have something to read up on. Turns out, I end up learning something new in C++ every now and then. 😃

Yeah, it's also pretty interesting ABI-wise. As you know, C++ doesn't have stable ABI. That said, one popular calling convention for functions that return C++ objects by value consists in passing a hidden argument to the function that contains the address at which to construct the return value. Like "this" is an implicit argument to methods, the return address is yet another implicit argument. So you can think of getReturnValueUnderConstruction() as of an accessor for that implicit argument value similarly to how getCXXThisVal() is the accessor for the implicit this-argument value.

This is completely different from C where you can return the structure on stack or even in registers (which can't be pointed to at all); this becomes possible in C because structures can be copied arbitrarily without invoking any sort of copy constructor.


Random general advice! When writing patches, you typically explore the design space of potential solutions - which you did great in this case. When you do so, it's a great idea to include the results of such exploration in the commit message (and/or in later comments). Especially if the first / most obvious thing to try didn't work. If you publish your second-choice solution, people will most likely ask you why you chose it instead of the obvious first-choice solution so it's an act of friendliness to anticipate this question. Similarly, if you have concerns about your solution, please communicate them openly; they'll either be discovered anyway but a lot more effort will be spent on it, or they won't be discovered during review which could result in future problems; both variants are worse than communicating openly. Ideally you carry more responsibility for proving that your approach is right than the reviewer has a responsibility to find problems in it. The reviewer is the last line of defense when it comes to applying critical thinking but the process really starts with you as you challenge your own assumptions and approaches by questioning both their correctness and their architectural harmony. Trying to find at least two solutions to each problem and compare them against each other is such a good exercise in critical thinking that almost every patch deserves such discussion.

RedDocMD updated this revision to Diff 351349.Jun 11 2021, 12:13 AM

Put changes discussed in the meeting, tests to come in next revision

RedDocMD updated this revision to Diff 351356.Jun 11 2021, 1:07 AM

Added tests

How do I set the C++ standard while running a test?

teemperor added a comment.EditedJun 11 2021, 1:23 AM

How do I set the C++ standard while running a test?

If I understand your comment correctly, then see my inline comment.

clang/test/Analysis/smart-ptr-text-output.cpp
9

You set the standard here via the -std arg. If you want different standards you need to have multiple RUN: lines that all pass a different standard.

RedDocMD updated this revision to Diff 351376.Jun 11 2021, 2:50 AM

Fixed up tests, now also runnning on C++20

RedDocMD marked an inline comment as done.Jun 11 2021, 2:50 AM
RedDocMD added inline comments.
clang/test/Analysis/smart-ptr-text-output.cpp
9

Yup, thanks!

RedDocMD marked an inline comment as done.Jun 11 2021, 2:50 AM
RedDocMD retitled this revision from [analyzer][WIP] Handle std::make_unique for SmartPtrModeling to [analyzer] Handle std::make_unique for SmartPtrModeling.
teemperor added inline comments.Jun 11 2021, 5:00 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
204

getParent() will assert when the MethodDecl is not a in a Record (which is the only way this can return null) so you can remove that check.

RedDocMD marked an inline comment as done.Jun 11 2021, 9:49 AM
RedDocMD updated this revision to Diff 351485.Jun 11 2021, 9:51 AM

Removed un-necessary check

NoQ added inline comments.Jun 11 2021, 11:26 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
280–281

Untyped region isn't a region without a type; everything has a type. Untyped region is when we don't know the type. A typical situation that produces untyped region is when the region comes in through a void pointer.

I vaguely remember that one way to trick your specific code may be to do

std::unique_ptr<int> foo() {
  return make_unique<int>(123);
}

which will RVO into an unknown region. I also wouldn't rely on it being typed in all other cases.

A much safer way to access the inner pointer type would be to query the function's template parameter.

293–296

I suggest a TODO: ExprEngine should do this for us..

298

Do we need a note here as well? I guess we don't because we'll never emit null dereference reports against a non-null pointer. But if we later emit more sophisticated bug reports, we might need one. Maybe leave a comment?

RedDocMD marked 3 inline comments as done.Jun 11 2021, 9:15 PM
RedDocMD updated this revision to Diff 351628.Jun 11 2021, 9:21 PM

Fixed up technique used to find inner pointer type, put TODO's

NoQ added inline comments.Jun 14 2021, 8:44 PM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
278

Can you do a getConjuredHeapSymbolVal() instead? That'd give us the right memory space as well as the extra bit of information that the new symbol doesn't alias with any previous symbols.

I get it that it doesn't accept the type but it's perfectly ok for you to teach it how to accept the type.

clang/test/Analysis/smart-ptr-text-output.cpp
338

Mmm wait a sec, that doesn't look like what the spec says.

https://en.cppreference.com/w/cpp/memory/unique_ptr/make_unique:

Same as (1), except that the object is default-initialized. This overload participates in overload resolution only if T is not an array type. The function is equivalent to unique_ptr<T>(new T)

It zero-initializes the pointee, not the pointer.

The difference between std::make_unique<A>() and std::make_unique_for_overwrite<A>() is the difference between value-initialization (invoking the default constructor) and zero-initialization (simply filling the buffer with 0s).

RedDocMD marked an inline comment as done.Jun 15 2021, 6:49 AM
RedDocMD added inline comments.
clang/test/Analysis/smart-ptr-text-output.cpp
338

Oops! Look like I completely misunderstood the spec.
So basically, the inner pointer is not null, and we have the same assumptions that apply for std::make_unique. Only problem is that, for primitive types, the value obtained on de-referencing is undefined.

RedDocMD updated this revision to Diff 352127.Jun 15 2021, 6:53 AM

Fixed up meaning of make_unique_for_overwrite, use getConjuredHeapSymbolVal.

vsavchenko added inline comments.Jun 15 2021, 7:20 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
287–291

I think it's fine to have it here in this commit and change it in one of the future commits. But I think we should add a bit more description of 1) what's the problem, 2) how is this a solution, 3) how should it look when moved to the engine.

NoQ accepted this revision.Jun 16 2021, 12:26 AM

This looks great to me, I think the patch is good to go as long as Valeriy's comment is addressed :)

Speaking of specs, hold up, we're forgetting something very important.

make_unique() also has to call the constructor of the underlying value. We're supposed to model that.

The constructor may be inlined or evaluated conservatively.

~unique_ptr() is also supposed to call the destructor of the underlying value. We're supposed to model that.

I.e., you could imagine a bunch of tests like this

struct S {
  int *p;
  S(int *p): p(p) {}
  ~S() { *p = 0; }
};

void foo() {
  int x = 1;
  {
    auto P = std::make_unique<S>(&x);
    clang_analyzer_eval(*P->p == 1); // TRUE
  }
  clang_analyzer_eval(x == 0); // TRUE
}

We could stick to conservative evaluation of such constructors and destructors; at the very least, we should actively invalidate Store at the destructor. This patch can remain unchanged because the default values inside freshly conjured regions are unknown. But that'd still be a regression because the above test (presumably) passes with inlining.

Is now a good time to legalize modeling function calls inside checker callbacks?

clang/test/Analysis/smart-ptr-text-output.cpp
65

Could you double-check that this flag correctly disables all the newly introduced modeling so that it wasn't accidentally enabled for all users before it's ready?

This revision is now accepted and ready to land.Jun 16 2021, 12:26 AM
RedDocMD updated this revision to Diff 352621.Jun 16 2021, 8:57 PM

Added more info to the TODO

RedDocMD marked an inline comment as done.Jun 16 2021, 9:57 PM
RedDocMD added inline comments.
clang/test/Analysis/smart-ptr-text-output.cpp
65

Yup. There are 133 notes and warnings (just search for "// expected-" and see the count). And on setting the flag to false, there are 133 errors.
So it works.

RedDocMD marked an inline comment as done.Jun 16 2021, 9:58 PM

I would suppose that constructor calls are properly handled. (ie, member constructors are called properly).
As for modelling destructors, there is a separate problem - since we don't have a Stmt, the postCall handler keeps on crashing.

NoQ added inline comments.Jun 17 2021, 11:16 AM
clang/test/Analysis/smart-ptr-text-output.cpp
65

I mean, specifically the modeling added by this patch? I see you wrote an entire checker callback that doesn't seem to have an analyzer-config option check anywhere in it. Simply having different results isn't sufficient to verify this.

NoQ added a comment.Jun 17 2021, 11:41 AM

I would suppose that constructor calls are properly handled. (ie, member constructors are called properly).

Do the above tests pass when your new evalCall modeling is enabled?

I believe there are a couple of comments that are done but not marked accordingly.
I agree with Artem, if we could craft code that fails due to not calling ctors, we should probably include them in the tests, even if they don't reflect the desired behavior they are a great source of documentation what is missing.

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
194

Nit: consider TemplateArgs.empty().

RedDocMD marked an inline comment as done.Jun 18 2021, 10:34 AM
RedDocMD added inline comments.
clang/test/Analysis/smart-ptr-text-output.cpp
65

Oh I see.

RedDocMD marked an inline comment as done.Jun 18 2021, 10:35 AM

I would suppose that constructor calls are properly handled. (ie, member constructors are called properly).

Do the above tests pass when your new evalCall modeling is enabled?

The analyzer doesn't seem to be able to make up its mind.

member-constructor.cpp:15:5: warning: FALSE [debug.ExprInspection]
    clang_analyzer_eval(*P->p == 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
member-constructor.cpp:15:25: note: Assuming the condition is false
    clang_analyzer_eval(*P->p == 0);
                        ^~~~~~~~~~
member-constructor.cpp:15:5: note: FALSE
    clang_analyzer_eval(*P->p == 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
member-constructor.cpp:15:5: warning: TRUE [debug.ExprInspection]
    clang_analyzer_eval(*P->p == 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
member-constructor.cpp:15:25: note: Assuming the condition is true
    clang_analyzer_eval(*P->p == 0);
                        ^~~~~~~~~~
member-constructor.cpp:15:5: note: TRUE
    clang_analyzer_eval(*P->p == 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated.
RedDocMD marked an inline comment as done.Jun 18 2021, 10:49 AM
RedDocMD updated this revision to Diff 353048.Jun 18 2021, 10:51 AM

Little changes, a failing test

I believe there are a couple of comments that are done but not marked accordingly.
I agree with Artem, if we could craft code that fails due to not calling ctors, we should probably include them in the tests, even if they don't reflect the desired behavior they are a great source of documentation what is missing.

Do you want the new failing test to be marked expected to fail?

Do you want the new failing test to be marked expected to fail?

I usually just add the wrong expectation to make the test pass and add a TODO comment that explains why is this wrong and we should fix it in the future.

NoQ added a comment.Jun 18 2021, 7:16 PM

Do you want the new failing test to be marked expected to fail?

The line of thinking here is that tests are just something that gives us a signal when the behavior changes. They don't necessarily demonstrate the expected behavior. If they don't there's still value in knowing that the observed behavior on them has changed.

For example, if you believe that your commit shouldn't change the observed behavior ("NFC commit", eg. refactoring), any signal that the observed behavior has changed should raise concerns and probably cause you to rethink the commit, even if the behavior has in fact improved.

Similarly, if the functional change is intended, you may still find it suspicious if it suddenly fixes a test that it wasn't supposed to fix. This may help you discover a bug in your commit that would cause the behavior to regress in other related cases.

You don't want the buggy test to be silenced, you want it to keep actively verifying that the bug is not yet fixed.

Then, separately, for every test there's a requirement to keep it understandable, so that the reader would understand what exactly is tested and what are the consequences of breaking the test. It's often valuable to provide comments indicating your level of confidence - "this test is extremely important to pass", "we don't really care about our behavior on this test but it's still nice to know when that behavior changes", etc. - and "we definitely don't want this test to pass but for now it does" is a possible answer as well and it doesn't make the test any less valuable.

So basically

void foo() {
  // FIXME: Should be FALSE.
  clang_analyzer_eval(1 + 1 == 3); // expected-warning{{TRUE}}
  // FIXME: Should be TRUE.
  clang_analyzer_eval(1 + 1 == 2); // expected-warning{{FALSE}}
}

is a great test to have.

Do the above tests pass when your new evalCall modeling is enabled?

The analyzer doesn't seem to be able to make up its mind.

member-constructor.cpp:15:5: warning: FALSE [debug.ExprInspection]
    clang_analyzer_eval(*P->p == 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
member-constructor.cpp:15:25: note: Assuming the condition is false
    clang_analyzer_eval(*P->p == 0);
                        ^~~~~~~~~~
member-constructor.cpp:15:5: note: FALSE
    clang_analyzer_eval(*P->p == 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
member-constructor.cpp:15:5: warning: TRUE [debug.ExprInspection]
    clang_analyzer_eval(*P->p == 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
member-constructor.cpp:15:25: note: Assuming the condition is true
    clang_analyzer_eval(*P->p == 0);
                        ^~~~~~~~~~
member-constructor.cpp:15:5: note: TRUE
    clang_analyzer_eval(*P->p == 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated.

What about this?

What about this?

I believe that you might see analyzer-eagerly-assume in action. Basically, instead of keeping a constraint unknown, the analyzer can split the path eagerly. So you have one path where the constraint is assumed to be true and one where it is assumed to be false. I think the rationale was that the user is likely to branch on certain conditions and doing it eagerly can make the analysis more precise in some cases but I do not really remember the details. Hopefully Artem has some more contexts.

clang/test/Analysis/smart-ptr-text-output.cpp
378

Nit: we usually add new lines to the ends of the files.

NoQ added a comment.Jun 20 2021, 7:00 PM

Yes, @xazax.hun is correct.

It's incorrect to say that the static analyzer "doesn't seem to be able to make up its mind". The analyzer gives perfectly clear and consistent answers for each execution path it explores and it's not surprising that the results are different on different execution paths. The presence of the FALSE path indicates indicates that the test indeed doesn't pass: an impossible execution path is being explored.

Eagerly-assume is a thing because it produces easier constraints for the solver to solve. Also the state split is pretty justified on any boolean expression with no side effects.

RedDocMD updated this revision to Diff 358267.Jul 13 2021, 7:58 AM

Fixing up tests

Is the syntax of specifying expected notes and warnings documented somewhere? I could not find the note-specific syntax.

RedDocMD updated this revision to Diff 359605.Jul 18 2021, 2:23 AM

Post-rebase cleanup

RedDocMD updated this revision to Diff 359625.Jul 18 2021, 7:16 AM

Fixed up tests

RedDocMD updated this revision to Diff 359627.Jul 18 2021, 7:21 AM

Marked test with FIXME notes

This revision was landed with ongoing or failed builds.Jul 18 2021, 7:25 AM
This revision was automatically updated to reflect the committed changes.