This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer][NFC] Add methods `getReturnObject()` and `getArgObject()` to `CallEvent`
AbandonedPublic

Authored by baloghadamsoftware on Jun 12 2020, 1:29 AM.

Details

Reviewers
NoQ
Szelethus
Summary

There are cases when a checker cannot know in advance the nature of the return value they try to retrieve from a CallEvent. If it is a class instance then it must use getReturnValueUnderConstruction(). If it is not, then it must use getReturnValue(), because the former function returns None since only class instances have construction context.

Simlarly, arguments which are class instances passed by value are copy-constructed into the parameter. To retrieve them the checker needs to invoke getParameterLocation() instead of getArgSVal(). However, for basic types and class instances passed by pointer of reference only the latter can be used to track the value of the argument.

To save checkers from doing these branching all over the time this patch introduces two methods to CallEvent: getReturnObject() to get the correct return value and getArgObject() to get the correct argument.

Diff Detail

Event Timeline

NoQ added a comment.EditedJun 12 2020, 2:42 AM

No, please don't do this.

The checker should *always* know the nature of the object in advance. If you look at the SVal in order to figure out the nature of the object, you are unable to discriminate between objects of completely different nature that are intentionally represented by the same SVal. For example, if you receive a SymbolicRegion value as a return value from your getReturnObject(), this can be for two reasons: (1) you're returning a pointer, (2) you're returning an object into that region. And you can't discriminate between those cases by looking at the SVal.

Both arbitrary glvalues and pointer prvalues are represented by Loc values but any code that fails to discriminate between the value of expression E and the value of ImplicitCastExpr(E, CK_LValueToRValue) is bonkers.

See also D77062#inline-744287

Thank you for taking a look on this!

Let me explain! An iterator may be implemented by multiple ways. It can be a simple pointer, which means that it is a basic type passed by value both as argument and return value. Thus here getArgSVal() and getReturnValue() are the correct methods to invoke. It can be something small (e.g. a pointer) wrapped into a class. It is also passed by value both as argument (maybe also for comparison operators ) and return value, but since it is a class instance, both require copy construction. This means that here getParameterLocation() and getReturnValueUnderConstruction() are the right methods. Furthermore, it can be a somewhat larger class that is passed by constant reference as argument, but of course by value as return value. In this case getArgSVal() and getReturnValueUnderConstruction() are the winners. Also the same class can be a value parameter in one function and a reference parameter in another one. The checker does not know this in advance, but must work correctly for all these cases.

In this patch getReturnObject() uses a "trial and failure" approach, thus it tries to assume that we are handling a class instance, which is copy constructed, so it tries to retrieve it from the construction context of the call. If no such context exists then it means that the value is not copy constructed. I think this approach is correct.

On the other hand, I agree with you that in getArgObject() checking the SVal is insufficient/incorrect. (I also planned to make a self-comment there, but I forgot.) Is it OK, if I look at the callee instead, and examine the type of its parameter? If it is passed by value, and is a C++ class instance, then I use getParameterLocation(), otherwise getArgSVal(). Of course, I can do the same for getReturnObject(), but the result is exactly the same as the current "trial and failure" approach.

It is also possible to put these functions into Iterator.h, but I think that other checkers may also benefit from such functionality. One thing is sure: I need such wrapper functions somewhere, without them all the iterator-related checkers will be unreadable because of the code multiplication. There are lots of places where we retrieve one of the arguments or the return value, and checking the nature of the value everywhere almost duplicates the size of the code and makes it unreadable.

Updated to check the type of the argument expression instead of the SVal.

NoQ added a comment.Jun 16 2020, 2:11 AM

Let me explain! An iterator may be implemented by multiple ways. It can be a simple pointer...

Let's cover this with tests then, for once.

Hint: they don't work, for the exact reason i described above, and your last update to this patch is insufficient to fix that.

// system-header-simulator-cxx.h:

...
222   class vector {
223     T *_start;
224     T *_finish;
225     T *_end_of_storage;
226
227   public:
228     typedef T value_type;
229     typedef size_t size_type;
230     typedef T* iterator;              // <===
231     typedef const T * const_iterator; // <===
...

// iterator-range.cpp:

...
11 void deref_end(const std::vector<int> &V) {
12   auto i = V.end();
13   *i; // expected-warning{{Past-the-end iterator dereferenced}}
14       // expected-note@-1{{Past-the-end iterator dereferenced}}
15 }
...

And indeed it doesn't emit the warning.

So it still thinks that the iterator is an object, whereas it should be tracking the iterator as symbol instead. As expected, because your code fails to discriminate between these two cases. You can't reorder checks in {set,get}IteratorPosition to fix that (first check for symbol, then check for region) because that'd break object iterators that reside in symbolic regions; you need to repeat the check on the AST instead. Which makes the reuse of that check proposed in this patch worthless.

In D81718#2095043, @NoQ wrote:

So it still thinks that the iterator is an object, whereas it should be tracking the iterator as symbol instead. As expected, because your code fails to discriminate between these two cases. You can't reorder checks in {set,get}IteratorPosition to fix that (first check for symbol, then check for region) because that'd break object iterators that reside in symbolic regions; you need to repeat the check on the AST instead. Which makes the reuse of that check proposed in this patch worthless.

This one is easy to overcome, because in case of pointers we need the pointer itself. The checker knows in advance, whether it is a pointer or not. However it does not know whether it is a value or a (constant) reference. However, with such iterator-specific behavior this function must go into the iterator library. (Pointers are symbols, references are objects.)

Do you have some alternative suggestions?

NoQ added a comment.Jun 16 2020, 8:58 AM

If the checker does not know something, then neither does a method on CallEvent. I suggest you don't introduce methods on CallEvent the entire purpose of which is to support an incorrect solution in your checker.

Your test case unfortunately does not test what you want, because raw pointers are not supported yet: all the checkers look for overloaded operators of classes, basic operator expressions are not handled yet. It is planned, of course, in the future, once we can go ahead.

In D81718#2095957, @NoQ wrote:

If the checker does not know something, then neither does a method on CallEvent. I suggest you don't introduce methods on CallEvent the entire purpose of which is to support an incorrect solution in your checker.

What is the correct solution then? To put branches all over the code of the checkers? Surely not. The checker must first recognize the nature of the iterator parameters and return values for every call. This recognition and then the decision should be wrapped into a function. If not in CallEvent then in the iterator-related checker's common library. There is no point to repeat the same code all over the checkers.

NoQ requested changes to this revision.Jun 16 2020, 12:44 PM

Your test case unfortunately does not test what you want, because raw pointers are not supported yet

It tests exactly what i want: the correctness of your code in this very patch that was written to handle this very case for which you never even bothered figuring out the correct solution but you already wrote a large amount of code (including some code in this patch) with zero test coverage to demonstrate correctness of your solution.

As of now you wrote your new two functions to return some value in every case. However, only some of these cases are covered with tests; in other cases the value was basically chosen randomly. The way you added unittests in order to make sure this completely random value stays random rather than correct doesn't count as testing.

What is the correct solution then? To put branches all over the code of the checkers? Surely not.

That's literally how test-driven development works. First you write tests, then you write any code that covers them, and only then you figure out how to reuse code. This is because correctness is completely orthogonal to prettiness; arguing that my solution is incorrect "because it's ugly" is ridiculous. This is why test-driven development recommends starting with a correct-but-ugly solution, not with an incorrect-but-pretty solution: first you figure out how to solve the problem at all, then refactor.

I will not discuss this further unless all the dead code that you're refactoring is either covered with LIT tests or removed.

This revision now requires changes to proceed.Jun 16 2020, 12:44 PM
baloghadamsoftware added a comment.EditedJun 16 2020, 11:23 PM
In D81718#2096540, @NoQ wrote:

Your test case unfortunately does not test what you want, because raw pointers are not supported yet

It tests exactly what i want: the correctness of your code in this very patch that was written to handle this very case for which you never even bothered figuring out the correct solution but you already wrote a large amount of code (including some code in this patch) with zero test coverage to demonstrate correctness of your solution.

Did you read my comment? These functions are not called at all by your tests!

It tests exactly what i want: the correctness of your code in this very patch that was written to handle this very case for which you never even bothered figuring out the correct solution but you already wrote a large amount of code (including some code in this patch) with zero test coverage to demonstrate correctness of your solution.

Zero test coverage? Then what are the unit tests?

I am really sorry to tell that, but now I began adding support for raw pointers as iterators (I will upload them in a separate patch when fully ready) and then tried your test. It passes, the error is found using this particular patch. So please tell me what is wrong with it, because the test you suggested does not prove it incorrect once support for pointers as iterators are there. If you feel it too iterator-specific, then I can abandon this patch and move this functions into the iterator library (Iterator.h and Iterator.cpp) in D77229.

NoQ added a comment.Jun 17 2020, 4:49 AM

These functions are not called at all by your tests!

Of course they aren't. Because they're dead code. You just introduced them and haven't called them yet.

But that code is taken and re-used from the checker. And the code in the checker has the problem. And after you tweaked the code a bit they still have that problem. And i'm pointing out the problem. And you don't want me to stop pointing out problems.

zero test coverage to demonstrate correctness of your solution.

Zero test coverage? Then what are the unit tests?

Unittests do not demonstrate correctness of the overall solution; that's what integration tests do. Unittests only demonstrate that the specific API behaves as expected. I'm questioning the expected behavior.

I am really sorry to tell that, but now I began adding support for raw pointers as iterators (I will upload them in a separate patch when fully ready) and then tried your test. It passes, the error is found using this particular patch.

This particular patch is NFC. It could not have helped. Sigh.

The new functions are now considered as iterator-specific thus moved into D77229.