This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Add `getReturnValueUnderConstruction()` to `CallEvent`
ClosedPublic

Authored by baloghadamsoftware on May 21 2020, 12:55 AM.

Details

Summary

Checkers should be able to get the return value under construction for a CallEvent. This patch adds a function to achieve this which retrieves the return value from the construction context of the call.

Diff Detail

Event Timeline

I think adding a better getter to StackFrameContext would make the patch a tad nicer, but other than that, I don't have much to add to this patch, unfortunately :) The code looks nice and we definitely need something like this.

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
551

This mustn't be serious. StackFrameContext::getIndex() has no comments at all! So it is an index to the element within a CFGBlock to which it also stores a field for??? Ugh. Shouldn't we just have a getCallSiteCFGElement or something? Especially since we have CFGElementRefs now.

Would you be so nice to hunt this down please? :)

baloghadamsoftware marked an inline comment as done.May 21 2020, 10:05 PM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
551

Do you mean a new StackFrameContext::getCFGElement() member function? I agree. I can do it, however this technology where we get the block and the index separately is used at many places in the code, then it would be appropriate to refactor all these places. Even wrose is the backward direction, where we look for the CFG element of a statement: we find the block from CFGStmtMap and then we search for the index liearly, instrad of storing it in the map as well!!!

balazske added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
438

A / is missing (or too much of / above?).
It looks like that the comment tells something about how the function works, not what it does. Or not? (The function works by retrieving the construction context and something from it, but that seems to be the return value, not a region. This is why this comment can be confusing.) This belongs better to the implementation, not to the documentation part. The documentation could contain something about how and when the function can be used to get a non-null value.

NoQ added inline comments.May 22 2020, 4:35 AM
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

Please instead re-use the code that computes the object under construction. That'll save you ~50 lines of code and will be more future-proof (eg., standalone temporaries without destructor technically have a construction context with 0 items so when we implement them correctly your procedure will stop working).

Szelethus added inline comments.May 22 2020, 5:47 AM
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
551

Nasty. Yeah, I mean not to burden you work refactoring any more then you feel like doing it, but simply adding a StackFrameContext::getCFGElement() method and using it in this patch would be nice enough, and some comments to getIndex(). But this obviously isn't a high prio issue.

Merged retrieveFromConstructionContext() to handleConstructionContext().

baloghadamsoftware marked 2 inline comments as done.May 24 2020, 7:14 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
575

This is extremely ugly and was one of the reasons I originally did not reuse handleConstructionContext(). What should I do here? Create a pure virtual handleConstructionContext() into SubEngine? Or somehow make handleConstructionContext static? Is there maybe a more proper way to access ExprEngine from CallEvent?

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

That was so my first thought. However, handleConstructionContext() is private and non-static. Now I tried to merge the two methods: if the value is already in the construction context, we return it, if not then we add it. Is this what you suggest? Or did I misunderstand you? At the very beginning I tried to simply use handleConstructionContext(), but it asserted because the value was already in the map.

martong edited the summary of this revision. (Show Details)May 25 2020, 7:33 AM
martong added inline comments.May 25 2020, 9:24 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
438

+1

I'd expect a description about what it does and only then the details.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
719

Why was it necessary to move this prototype? Is this hunk related at all?

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
575

Seems like SubEngine has only one descendant: ExprEngine. Consequently SubEngine is a false interface that is never ever used polymorphically, perhaps we could just get rid of that? I mean what if we used ExprEngine everywhere and scraped the SubEngine class? SubEngine seems to be a superfluous legacy to me (I can be wrong of course).

clang/unittests/StaticAnalyzer/TestReturnValueUnderConstruction.cpp
30

I think it would make the tests cleaner if we had some member variables set here and then in the test function (addTestReturnValueUnderConstructionChecker) we had the expectations on those variables.
Right now, if we face an assertion it kills the whole unit test suite and it is not obvious why that happened.

48

Please clang-format test code as well! You may also use raw string literals. Please be precise with test code too.

NoQ added inline comments.May 26 2020, 8:56 AM
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

I'd like to preserve the assertion that objects-under-construction are never filled twice; it's a very useful sanity check. What you need in your checker is a function that computes object-under-construction but doesn't put it into the objects-under-construction map. So you have to separate the computation from filling in the state.

baloghadamsoftware marked an inline comment as done.May 26 2020, 10:02 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

OK, so I (fortunately) misundertood you. Thus I should refactor this function to a calculation and a storing part?

baloghadamsoftware marked an inline comment as done.May 27 2020, 1:17 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

OK, I see what you are speaking about, but I have no idea how to do it properly. The problem is that the control logic of filling in the state also depends on the kind of the construction context. For some kinds we do not fill at all. Every way I try it becomes more complex and less correct:

  1. NewAllocatedObjectKind: we do not add this to the state, we only retrieve the original.
  2. SimpleReturnedValueKind and CXX17ElidedCopyReturnedValueKind: depending on whether we are in top frame we handle this case recursively or we do not fill at all, just return the value. What is the construction context item here? Maybe the ReturnStmt?
  3. ElidedTemporaryObjectKind: this is the most problematic: we first handle it recursively for the construction context after elision, then we also fill it for the elided temporary object construction context as well.

The only thing I can (maybe) do is to retrieve the construction context item. But then the switch is still duplicated for filling, because of the different control logic for different construction context kinds.

baloghadamsoftware marked an inline comment as done.May 27 2020, 3:09 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

The only thing I can (maybe) do is to retrieve the construction context item.

This does not help even for retrieving the value, because its control logic also depends on the construction context kind. If I do it, it will be code triplication instead of duplication and results in a code that is worse to understand than the current one.

baloghadamsoftware marked an inline comment as done.May 27 2020, 3:35 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

Please instead re-use the code that computes the object under construction. That'll save you ~50 lines of code and will be more future-proof (eg., standalone temporaries without destructor technically have a construction context with 0 items so when we implement them correctly your procedure will stop working).

Any solution I can come up with rather adds 100 to 150 lines to the code instead of saving 50. For retrieving we only have to determine the construction context item (the key). For storing we also have to calculate the value. It sounds good to calculate these things in separate functions and then have a simple retriever and store function but there are lots of recursions, double strores, non-stores, retrieving in the store function that make it too complicated.

Also retrieving is more complex than just determining the item and getting the value from the context: if you look at SimpleReturnedValueKind and CXX17ElidedCopyReturnedValueKind you can see that we do not use the construction context we got in the parameter (the construction context of the return value) but the construction context of the call instead. For ElidedTemporaryObjectKind we take the construction context before the elusion.

Future proofness: I agree, this is a problem to solve. In cases where the construction context is empty maybe we can move the calculation of the values to a separate function that is called by both the "handler" and the "retriever".

martong added inline comments.May 28 2020, 2:42 AM
clang/unittests/StaticAnalyzer/TestReturnValueUnderConstruction.cpp
25

I assume this tests this call expression: returnC(1) . But this is absolutely not trivial from the test code, could you please make this cleaner?

balazske added inline comments.May 28 2020, 3:41 AM
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

Do I think correctly that retrieveFromConstructionContext (in the previous version) should return the same value as second part of handleConstructionContext? It does not look obvious at first. Or do we need a function with that purpose? If yes it looks a difficult task because the similar "logic" to get the value and update the state. Probably it is enough to add a flag to handleConstructionContext to not make new state. The current code looks bad because calling a "handle" type of function (that was private before) only to compute a value.

martong added inline comments.May 28 2020, 4:21 AM
clang/unittests/StaticAnalyzer/TestReturnValueUnderConstruction.cpp
29

How do we know that 0 is the proper block count?

In this version of the patch you are using handleConstructionContext to "retrieve" or read an SVal for the construction. However, it seems like that function was designed to store values in the GDM for individual construction context items. To mix a read-only functionality into a setter seems to be error-prone to me.

Please instead re-use the code that computes the object under construction. That'll save you ~50 lines of code and will be more future-proof

I am not sure that this is the best option here. In my viewpoint, ConstructionContext and ConstructionContextItem are both "variants" (i.e. tagged unions) and we perform operations on these. Each operation is actually a visitation on the different kind of values (i.e. a switch). One operation is transforming a ConstructionContext to a ConstructionContextItem. This is what is needed to be able to call the getObjectUnderConstruction function. And this operation - the reading - is way different than the other - the store -, see Adam's concerns. This yields to different functions for each op. But this is perfectly natural once we work with "variants". I mean we can easily extend the operations on a variant with a new operation as a form of a visitation of the kinds (contrary to traditional class hierarchies where adding a new op is hard, but adding a new type is easy).

In this version of the patch you are using handleConstructionContext to "retrieve" or read an SVal for the construction. However, it seems like that function was designed to store values in the GDM for individual construction context items. To mix a read-only functionality into a setter seems to be error-prone to me.

Please instead re-use the code that computes the object under construction. That'll save you ~50 lines of code and will be more future-proof

I am not sure that this is the best option here. In my viewpoint, ConstructionContext and ConstructionContextItem are both "variants" (i.e. tagged unions) and we perform operations on these. Each operation is actually a visitation on the different kind of values (i.e. a switch). One operation is transforming a ConstructionContext to a ConstructionContextItem. This is what is needed to be able to call the getObjectUnderConstruction function. And this operation - the reading - is way different than the other - the store -, see Adam's concerns. This yields to different functions for each op. But this is perfectly natural once we work with "variants". I mean we can easily extend the operations on a variant with a new operation as a form of a visitation of the kinds (contrary to traditional class hierarchies where adding a new op is hard, but adding a new type is easy).

Just one more thing. It may turn out that the visitation should be done the same way in both cases, then we have to have a Visitor factored out. But, now it seems that even the visitation strategy is different for the individual operations.

NoQ added inline comments.May 28 2020, 7:20 AM
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

Any solution I can come up with rather adds 100 to 150 lines to the code instead of saving 50.

I think the following is a fairly literal implementation of my suggestion:

It's 26 lines shorter than your retrieveFromConstructionContext() (ok, not 50, duplicating the switch was more annoying than i expected), it adds zero new logic (the only new piece of logic is to track constructors that were modeled correctly but their elision wasn't; previously we could fit it into control flow), it no longer tries to reverse-engineer the original behavior by duplicating its logic, and it's more future-proof for the reasons explained above.

baloghadamsoftware marked an inline comment as done.May 28 2020, 7:48 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

Thank you for working on this. The point what I did not see (and I still no not see it) is that in your solution (and your proposal) instead of retrieving the value from the construction context we actually recalculate it, thus we do not use the construction context at all (except for NewAllocatedObjectKind) to retrieve the location of the return value. Can we guarantee that we always recalculate exactly the same symbolic value as the symbolic values we store in the map? For example, for SimpleReturnedValueKind et. al. we conjure a new symbolic value. Is it really the same value than that the one in the map, thus the one which is the real location where the object is constructed? If not, then the checkers might not work correctly because they store into the GDM using a different region as key than they use for retrieving the value.

martong added inline comments.May 28 2020, 8:55 AM
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

I think the following is a fairly literal implementation of my suggestion: ... It's 26 lines shorter than your retrieveFromConstructionContext() (ok, not 50, duplicating the switch was more annoying than i expected), it adds zero new logic (the only new piece of logic is to track constructors that were modeled correctly but their elision wasn't; previously we could fit it into control flow), it no longer tries to reverse-engineer the original behavior by duplicating its logic, and it's more future-proof for the reasons explained above.

+1 for this version. Even though Adam has valid concerns, this version contains functions with clear responsibilities.

Can we guarantee that we always recalculate exactly the same symbolic value as the symbolic values we store in the map? For example, for SimpleReturnedValueKind et. al. we conjure a new symbolic value. Is it really the same value than that the one in the map, thus the one which is the real location where the object is constructed?

I think the guarantee is that in Artem's implementation the switch is duplicated. That would be okay for now, but I still consider this error-prone, exactly because of the duplication. So, maybe this yields for a common Visitor.

Guys, would it be possible to have a common Visitor for these? Could be somewhat similar to DeclVisitor with return type as type parameter. And we could pass a lambda that is called in every case? Or would that be too overkill

baloghadamsoftware marked an inline comment as done.May 28 2020, 10:07 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

I think the guarantee is that in Artem's implementation the switch is duplicated.

That is not my point. My point is that handleConstructionContext() is called, a value is calculated and stored in the map. This calculation may be the creation of a new conjured symbol. Thereafter, whenever a checker tries to retrieve the location of the return value from the construction context in Artem's solution we do not retrieve the value previously stored in the map, but calculate a new one, which is not necessarily the same as the one we stored in the map, because a in some cases a new symbol is conjured. Then the checker uses this new symbol as the index in GDM, but the object is constructed to the location stored in the map. Therefore, later when the checker tries to retrieve data from the GDM using the location of the existing object as a key it cannot find the data because it was stored using the wrong key.

getReturnValueUnderConstruction() must always return the location where the object is to be constructed. Therefore I think it must not recalculate the location which may involve creation of new conjured symbols but first it must look into the map of objects under construction. This guarantees that the returned location is really the location where the object is constructed.

NoQ added inline comments.May 29 2020, 5:43 AM
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

Yes it should always return the same value, because there is exactly one correct solution to the problem that it's solving. If it may return different values for the same (E, State, LCtx, CC) tuple, it's a bug. This method should ultimately be turned into a static method.

Loading a stored value if it exists will only sweep the problem under the rug. It'd be much better for us to assert that the freshly recomputed value is equal to the stored value if the latter is present.

The code that constructs symbolic region for the return value of a top frame is indeed suspicious. I'm not sure it can actually return different regions twice but this code is clearly incorrect because SymbolConjured is never the correct solution anyway. The only reason SymbolConjured is used anywhere is because the author (in this case, i) was too lazy to define a new symbol class that has the right identity. SymbolConjured is like UnknownVal that's a bit less unknown but still screams "not implemented yet". I think this definitely deserves a fixme.

I think the guarantee is that in Artem's implementation the switch is duplicated. That would be okay for now, but I still consider this error-prone, exactly because of the duplication. So, maybe this yields for a common Visitor.
Guys, would it be possible to have a common Visitor for these?

Mmm, i don't think i understand this argument. A visitor is a switch with a fancy interface and wildcards and lacking checks for unhandled cases.

Generally though i wouldn't mind having a ConstructionContextVisitor. For instance, that'll allow us to replace code like

case ConstructionContext::CXX17ElidedCopyVariableKind:
case ConstructionContext::SimpleVariableKind: {
}

with a single

VisitVariable(VariableConstructionContext *CC) {
}

I wouldn't like a visitor for ConstructionContextItems because they are fairly meaningless when taken out of, well, context.

+GSoC gang because it'll ultimately help Nithin's future checker track smart pointers returned from functions.

vsavchenko added inline comments.Jun 2 2020, 4:03 AM
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
551

+1
I do believe that clearer interface functions and better segregation of different functionalities will make it much easier for us in the future

553

nit: space after if

And I would also prefer putting const auto *Ctor

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
225–227

Maybe here (and further) we can use a widespread pattern of bool-castable things declared in the if condition (like you do with dyn_casts):

if (Optional<SVal> ExistingVal = getObjectUnderConstruction(State, CE, LCtx))
  return std::make_pair(State, *ExistingVal);
284–299

It seems like a code duplication to me.

First of all we can first compose a ConstructionContextItem from either CE, CCE, or ME and then call for getObjectUnderConstruction.
Additionally, I think we can hide even that by introducing an overload for getObjectUnderConstruction that takes an expression and an index.

clang/unittests/StaticAnalyzer/TestReturnValueUnderConstruction.cpp
25

I think that this function, the meat and bones of the test, should be properly commented and state explicitly what are the assumption and what is the expected outcome.

30

Why not gtests assertions? Those will also interrupt the execution, but will be much clearer in the output.

Incorporated @NoQ's solution.

Updated according to the comments.

baloghadamsoftware marked 9 inline comments as done.Jun 3 2020, 1:31 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
551

I fully agree, but StackFrameContext is not in the StaticAnalyzer part, but the Analysis part. I think that should be a separate patch. Who is the code owner for that part?

553

Ctor is not a pointer, but an llvm::Optional.

NoQ added inline comments.Jun 3 2020, 3:37 AM
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
543

It is clear that the final return value of getConstructionContext() does not depend on the current block count; in fact the information you're trying to obtain is purely syntactic. You probably might as well pass 0 here. As i mentioned before, the ideal solution would be to make CallEvent carry a CFGElementRef to begin with (as a replacement for the origin expr), so that it didn't need to be recovered like this.

We should absolutely eliminate the BlockCount parameter of getReturnValueUnderConstruction() as well. Checker developers shouldn't understand what it is, especially given that it doesn't matter at all which value do you pass.

551

It's us anyway. StackFrameContext is essentially analyzer-exclusive but even for things like CFG and analysis-based warnings in Clang, there's nobody else who maintains them except for the analyzer gang.

553

Ah, classic.

baloghadamsoftware marked an inline comment as done.

Updated according to the comments.

baloghadamsoftware marked 4 inline comments as done.Jun 3 2020, 6:46 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
543

OK, I eliminated it. Do we need BlockCount for getParameterLocation() (the other patch)?

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112

A Visitor is a good idea. However, is it used anywhere else? If it is, then it is worth to do it, but I think it can go into a new patch.

NoQ accepted this revision.Jun 8 2020, 5:12 AM

Great, thank you!

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
539

This looks useful. Let's turn it into a CallEvent method as well.

543

We absolutely need BlockCount in the other patch because it's part of the identity of the parameter region.

That said, the API that requires the user to provide BlockCount manually is still terrible and extremely error-prone. Namely, the contract of getParameterLocation() would be "The behavior is undefined unless you provide the same block count that is there when call happens". Which is a good indication that we should have stored BlockCount in CallEvent from the start.

This revision is now accepted and ready to land.Jun 8 2020, 5:12 AM
This revision was automatically updated to reflect the committed changes.
baloghadamsoftware marked an inline comment as done.
thakis added a subscriber: thakis.Jun 9 2020, 5:17 AM

Hello, this breaks check-clang everywhere, e.g. http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/69062/steps/test-check-all/logs/FAIL%3A%20Clang-Unit%3A%3ATestReturnValueUnderConstructionChecker.ReturnValueUnderConstructionChecker

Please take a look, and revert for now if the fix isn't obvious.

Please watch the tree a bit after landing; it's been red for 4 hours now.

Hello, this breaks check-clang everywhere, e.g. http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/69062/steps/test-check-all/logs/FAIL%3A%20Clang-Unit%3A%3ATestReturnValueUnderConstructionChecker.ReturnValueUnderConstructionChecker

Please take a look, and revert for now if the fix isn't obvious.

Please watch the tree a bit after landing; it's been red for 4 hours now.

Hello,

Thank you, I think I fixed it now. I worked on it for hours, because it involved lots of git branch changes and the the rebuild was slow.

thakis added a comment.Jun 9 2020, 6:33 AM

Thank you, I think I fixed it now. I worked on it for hours, because it involved lots of git branch changes and the the rebuild was slow.

Thanks for the fix! If you notice that fixing a bot will take more than a few minutes, please revert the commit while you work on the fix :)

riccibruno added a subscriber: riccibruno.EditedJun 9 2020, 6:50 AM

Warning (from eg: http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/32043):

/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h:1102:23: warning: 'getDecl' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]

Edit: From D80522