This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Model comparision methods of std::unique_ptr
ClosedPublic

Authored by RedDocMD on Jun 20 2021, 11:10 PM.

Details

Summary

This patch handles all the comparision methods (defined via overloaded
operators) on std::unique_ptr. These operators compare the underlying
pointers, which makes it difficult to model the result.

Diff Detail

Event Timeline

RedDocMD created this revision.Jun 20 2021, 11:10 PM
RedDocMD requested review of this revision.Jun 20 2021, 11:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 20 2021, 11:10 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

The only method that I think can be realistically modelled is == (and thus !=). If both the operands refer to the same unique_ptr, we know == returns true. If they are not the same, the only way == can return true if the two smart pointers were initialized from the same raw pointer. This is of course a fatal bug in itself. So perhaps we can ignore this case and only consider the first case.
The ordering operators I guess can't be handled because there is no way to statically tell in general the address of some value. We have the following deductions, nevertheless, mathematically:
Let ptr1 and ptr2 be two std::unique_ptr objects.
If (ptr1 == ptr2) is true:

  • ptr1 < ptr2 is false
  • ptr1 > ptr2 is false
  • ptr1 <= ptr2 is true
  • ptr1 >= ptr2 is true

If (ptr1 == ptr2) is false, we can't say anything really.

The only method that I think can be realistically modelled is == (and thus !=). If both the operands refer to the same unique_ptr, we know == returns true. If they are not the same, the only way == can return true if the two smart pointers were initialized from the same raw pointer. This is of course a fatal bug in itself. So perhaps we can ignore this case and only consider the first case.
The ordering operators I guess can't be handled because there is no way to statically tell in general the address of some value. We have the following deductions, nevertheless, mathematically:
Let ptr1 and ptr2 be two std::unique_ptr objects.
If (ptr1 == ptr2) is true:

  • ptr1 < ptr2 is false
  • ptr1 > ptr2 is false
  • ptr1 <= ptr2 is true
  • ptr1 >= ptr2 is true

If (ptr1 == ptr2) is false, we can't say anything really.

Sounds good to me!

If (ptr1 == ptr2) is false, we can't say anything really.

Well, I think it depends. If one of the pointers is null, for some platforms, we can. E.g. null < non-null is probably true on most architectures, and similarly non-null < null is false. Also null <= anyptr is probably true on the platforms we care about.

If they are not the same, the only way == can return true if the two smart pointers were initialized from the same raw pointer. This is of course a fatal bug in itself.

Is it? E.g. two default constructed unique_ptrs both should have null as the underlying pointer, they should be considered equal, are not the same object, and this is not a fatal bug.

RedDocMD updated this revision to Diff 353709.Jun 22 2021, 10:50 AM

Logic for handling special cases, when both are unique_ptr

Looks like a solid start!

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
323

nit: variable names should be capitalized

356

nit: llvm_unreachable instead of assert(false)

395–413

This switch exactly repeats the previous switch.

vsavchenko added inline comments.Jun 22 2021, 11:16 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
343–345

Another idea here.
Instead of repeating:

State = State->assume(llvm::dyn_cast<DefinedOrUnknownSVal>(RetVal), VALUE);
addTransition = true;

and having boolean addTransition, you can have Optional<bool> CompResult and do:

CompResult = VALUE;

And do the actual assumption at the bottom section when CompResult.hasValue()

367–386

These are also symmetrical.

416–458

These are symmetrical, is there a way to implement it without this boilerplate?

NoQ added a comment.Jun 22 2021, 3:31 PM

If (ptr1 == ptr2) is false, we can't say anything really.

Well, I think it depends. If one of the pointers is null, for some platforms, we can. E.g. null < non-null is probably true on most architectures, and similarly non-null < null is false. Also null <= anyptr is probably true on the platforms we care about.

If they are not the same, the only way == can return true if the two smart pointers were initialized from the same raw pointer. This is of course a fatal bug in itself.

Is it? E.g. two default constructed unique_ptrs both should have null as the underlying pointer, they should be considered equal, are not the same object, and this is not a fatal bug.

Why not simply delegate this job to assume(evalBinOp(...)) over raw pointer values, which already has all this logic written down nicely?

Why not simply delegate this job to assume(evalBinOp(...)) over raw pointer values, which already has all this logic written down nicely?

This is what I had in mind, I just did not want to spoil it :)

Why not simply delegate this job to assume(evalBinOp(...)) over raw pointer values, which already has all this logic written down nicely?

This is what I had in mind, I just did not want to spoil it :)

Looks like I have wasted a good deal of effort. :(

Looks like I have wasted a good deal of effort. :(

Sorry about that! :( If we learned anything new in the process it was not wasted effort though.

Looks like I have wasted a good deal of effort. :(

Sorry about that! :( If we learned anything new in the process it was not wasted effort though.

I know what I learnt - if I am writing too much code, there is probably a function to that. :)

RedDocMD updated this revision to Diff 354323.Jun 24 2021, 11:43 AM

Removed re-invention, added tests

We have a failing test here (test at line 473).
Which makes me wonder if the handleComparision function is at all called. This is something I need to check.

xazax.hun added inline comments.Jun 24 2021, 12:13 PM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
312

So it looks like operator<< is the only operator we are not supporting. I think it might be good to include some test code to ensure that its use does not suppress warnings. (Also OK, if you decide to deal with this in a follow-up PR.)

345

I think in cases where we know that the result is true or false, the RetVal should probably be a constant instead of a conjured symbol with an assumption.

352

Instead of marking this unreachable, I'd suggest to just return a conjured symbol for now. Usually, we should not use asserts to test user input.

365

I am not sure if we actually need all this special casing. You could create an SVal that represents a nullpointer constant and use evalBinOp with that SVal when there is no symbol available.

405

cannot reach here is redundant information. That is already encoded in llvm_unreachable. I suggest including a message that tells the reader why is it unreachable. In this case it could be "unexpected overloaded operator kind".

clang/test/Analysis/smart-ptr.cpp
466

Putting tests like this on the same path can be risky. Tests might split the execution path (maybe not now, but in the future). I think it might be more future proof to have a large switch statement that switches on an unknown value and put the tests in separate cases.

RedDocMD marked 4 inline comments as done.Jun 25 2021, 2:42 AM
RedDocMD added inline comments.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
312

Yes that's the next patch. (and one more for std::hash).

352

Ah yes that's what is happening now. RetVal is initialized to a conjured value. If we can conclude no further, then that is what is returned - which is what happens here. In other cases, constraints are applied or it is replaced by the output from evalBinOp.

365

Actually the naming is a bit misleading here, perhaps. FirstPtr is not the inner pointer itself but a pointer to the SVal. So the test FirstPtr && !SecondPtr checks that we know the SVal for the inner pointer of the first arg but we do not know that of the second arg. Using a null-pointer constant would not help.
However, we could use a conjured value to simplify some work.

RedDocMD marked 3 inline comments as done.Jun 25 2021, 2:42 AM
RedDocMD updated this revision to Diff 354459.Jun 25 2021, 3:16 AM

Refactored code, removed duplications, fixed tests, added some more

RedDocMD added inline comments.Jun 25 2021, 3:19 AM
clang/test/Analysis/smart-ptr.cpp
466

I didn't quite get you.

RedDocMD updated this revision to Diff 354460.Jun 25 2021, 3:19 AM

Removed dump statement

xazax.hun added inline comments.Jun 25 2021, 7:47 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
336

In case we conjure a new symbol, we want this stored in the TrackedRegionMap. So the analyzer can correctly record the constraints between the two symbols.

clang/test/Analysis/smart-ptr.cpp
466

You remember this in the other patch:

member-constructor.cpp:15:5: warning: FALSE [debug.ExprInspection]
    clang_analyzer_eval(*P->p == 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
member-constructor.cpp:15:25: note: Assuming the condition is false
    clang_analyzer_eval(*P->p == 0);
                        ^~~~~~~~~~
member-constructor.cpp:15:5: note: FALSE
    clang_analyzer_eval(*P->p == 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
member-constructor.cpp:15:5: warning: TRUE [debug.ExprInspection]
    clang_analyzer_eval(*P->p == 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
member-constructor.cpp:15:25: note: Assuming the condition is true
    clang_analyzer_eval(*P->p == 0);
                        ^~~~~~~~~~
member-constructor.cpp:15:5: note: TRUE
    clang_analyzer_eval(*P->p == 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated.

It looks like this does not happen for overloaded comparison operators at the moment. But we might want to do that in the future (@NoW what do you think). I was wondering, if we want to future proof these test cases for that behavior. But looking at the test cases again, you only have two, where the expected result is unknown, and they are at the very end. So feel free to ignore this and leave the code as is.

xazax.hun added inline comments.Jun 25 2021, 7:59 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
343

Btw, if we do not have a helper yet to translate between these enums in the analyer, we should create one. Could you look for some other places in the analyzer code base to double check?

RedDocMD added inline comments.Jun 26 2021, 2:27 AM
clang/test/Analysis/smart-ptr.cpp
466

Then perhaps I should leave a comment to indicate this possibility.

RedDocMD updated this revision to Diff 354695.Jun 26 2021, 10:49 AM

First try at implementing conversion function from OverloadedOperatorKind to BinaryOperatorKind

NoQ added inline comments.Jun 27 2021, 11:33 PM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
355–356

Optimization: if one of the lookups succeeds, the other is unnecessary. A bit premature but I think we shouldn't lose too much elegance over this.

392–393

This sounds like a super common functionality. Doesn't .get() do the same as well? I think it should be de-duplicated into a top-level method.

447–450

Because these operators are pure boolean functions, a state split would be justified. It's pointless to call the operator unless both return values are feasible. I think you should do an assume() and transition into both states.

It might also make sense to consult -analyzer-config eagerly-assume= before doing it but it sounds like this option will stays true forever and we might as well remove it.

The spaceship operator is obviously an exceptional case. Invocation of the spaceship operator isn't a good indication that all three return values are feasible, might be only two.

RedDocMD retitled this revision from [analyzer][WIP] Model comparision methods of std::unique_ptr to [analyzer] Model comparision methods of std::unique_ptr.Jun 29 2021, 10:28 AM

Sorry for not updating. Was down with fever.
This patch does *not* work now. operationKindFromOverloadedOperator is broken because the maps don't get populated. I am not entirely sure why this is happening.
Will try to fix tomorrow. @NoQ, @vsavchenko, @xazax.hun, @teemperor do you have a hunch as to why this may be happening?

NoQ added a comment.Jun 29 2021, 1:48 PM

I suspect that you might want to include OperationKinds.def instead of OperationKinds.h.

RedDocMD updated this revision to Diff 355445.Jun 29 2021, 10:16 PM

Fixed bug in enum conversion

RedDocMD updated this revision to Diff 355458.Jun 30 2021, 12:24 AM

Refactored out common block

RedDocMD added inline comments.Jun 30 2021, 12:32 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
447–450

I think this comment has got displaced from its original location from subsequent updates. Could you please clarify?

NoQ added inline comments.Jun 30 2021, 7:31 PM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
447–450

You can always go back in time by clicking the |<< button in the top-left corner of the comment ;)

I'm trying to say that you should introduce a state split while evaluating the operators, not wait for the control flow operators to do that for you, exactly like the static analyzer currently does for the raw comparison operators.

RedDocMD updated this revision to Diff 356128.Jul 2 2021, 1:29 AM

Performing state split on normal comparision ops

xazax.hun added inline comments.Jul 2 2021, 8:26 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
409

Do we want to create a new SVal in this case? Why returning S is insufficient?

417

While this is smart (overwriting the state here), I think this is hard to follow. I'd prefer this lambda returning a pair.

436

assume might not return a state (when the analyzer is smart enough to figure out one path is infeasible). While your code would probably work as is, as far as I remember we usually try to not call addTransition with null states.

clang/test/Analysis/smart-ptr.cpp
484

I do not see test cases where the path is actually split. Would you add some?

NoQ added inline comments.Jul 2 2021, 10:54 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
432

Now that we've made a state split, it makes sense to bind a concrete true value or a concrete false value (depending on the state) instead of a symbolic value. This produces simpler symbolic equations for our constraint solver to handle.

I.e., something like

std::tie(TrueState, FalseState) = State->assume(...);
if (TrueState)
  C.addTransition(TrueState->BindExpr(E, LCtx, SVB.makeTruthVal(true));
if (FalseState)
  C.addTransition(FalseState->BindExpr(E, LCtx, SVB.makeTruthVal(false));
436

Yeah I have a lot of anxiety about passing nulls there as well, have to look up the source code of these functions every time :/

RedDocMD marked 3 inline comments as done.Jul 2 2021, 11:08 AM
RedDocMD added inline comments.
clang/test/Analysis/smart-ptr.cpp
484

Oops! I removed two tests emitting UNKNOWN and forgot to put them back in.

RedDocMD marked 2 inline comments as done.Jul 2 2021, 11:09 AM
RedDocMD updated this revision to Diff 356226.Jul 2 2021, 11:12 AM

Simplify SVal on state split, other refactors

NoQ accepted this revision.Jul 3 2021, 12:57 PM

I only have a few nits left.

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
36

I think you're not using this anymore.

317

One good place to put this may be CheckerHelpers.h. This is where we dump all the stuff that's probably useful in multiple checkers but not in other places.

I also wonder if you plan to support unary operators. The interesting part about them is that they are sometimes ambiguous to binary operators in their string representation, eg. -.

436

C.getSValBuilder() is called three times now, you might want to stash it in a reference. There's probably not much performance benefit but the code should be easier to read this way.

clang/test/Analysis/Inputs/system-header-simulator-cxx.h
983

I think it's a good idea to test these cross-type comparison operators as well. IIUC they're handled exactly the same as their same-type counterparts but it may uncover occasional assertion failures.

This revision is now accepted and ready to land.Jul 3 2021, 12:57 PM
RedDocMD marked 4 inline comments as done.Jul 4 2021, 1:32 AM
RedDocMD added inline comments.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
317

I guess passing a parameter to chose operator arity should work.

RedDocMD marked an inline comment as done.Jul 4 2021, 1:32 AM
RedDocMD updated this revision to Diff 356371.Jul 4 2021, 1:35 AM

Little refactors

NoQ added a comment.Jul 4 2021, 11:40 PM

If everybody's happy let's commit :)

clang/test/Analysis/smart-ptr.cpp
467
487
494
xazax.hun requested changes to this revision.Jul 5 2021, 8:45 AM
xazax.hun added inline comments.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
351

Nit: couldn't you just return retrieveOrConjureInnerPtrVal(Reg, E, Type, C); instead of all the ceremony with std::tie?

366

I think we might end up losing some information here. Imagine both makeSValFor conjuring a new symbol. Only one of the symbols will be stored in the state since capturing happend once, when you created the lambda. Maybe it is better to ask for a state in makeSValFor as a parameter instead of doing a lambda capture.

Moreover, retrieveOrConjureInnerPtrVal will not even look at the state captured by makeSValFor. It will ask the CheckerContext to get the state, but that state is the state from the beginning of the function that does not have any of the modifications you made to it.

This revision now requires changes to proceed.Jul 5 2021, 8:45 AM
RedDocMD marked 2 inline comments as done.Jul 5 2021, 10:21 AM
RedDocMD updated this revision to Diff 356526.Jul 5 2021, 10:24 AM

Major bug fix

xazax.hun accepted this revision.Jul 5 2021, 10:48 AM

The latest version looks good to me, let's land this!

This revision is now accepted and ready to land.Jul 5 2021, 10:48 AM
This revision was automatically updated to reflect the committed changes.