Checkers should be able to get the return value under construction for a CallEvent. This patch adds a function to achieve this which retrieves the return value from the construction context of the call.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I think adding a better getter to StackFrameContext would make the patch a tad nicer, but other than that, I don't have much to add to this patch, unfortunately :) The code looks nice and we definitely need something like this.
clang/lib/StaticAnalyzer/Core/CallEvent.cpp | ||
---|---|---|
551 | This mustn't be serious. StackFrameContext::getIndex() has no comments at all! So it is an index to the element within a CFGBlock to which it also stores a field for??? Ugh. Shouldn't we just have a getCallSiteCFGElement or something? Especially since we have CFGElementRefs now. Would you be so nice to hunt this down please? :) |
clang/lib/StaticAnalyzer/Core/CallEvent.cpp | ||
---|---|---|
551 | Do you mean a new StackFrameContext::getCFGElement() member function? I agree. I can do it, however this technology where we get the block and the index separately is used at many places in the code, then it would be appropriate to refactor all these places. Even wrose is the backward direction, where we look for the CFG element of a statement: we find the block from CFGStmtMap and then we search for the index liearly, instrad of storing it in the map as well!!! |
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h | ||
---|---|---|
438 | A / is missing (or too much of / above?). |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
112 | Please instead re-use the code that computes the object under construction. That'll save you ~50 lines of code and will be more future-proof (eg., standalone temporaries without destructor technically have a construction context with 0 items so when we implement them correctly your procedure will stop working). |
clang/lib/StaticAnalyzer/Core/CallEvent.cpp | ||
---|---|---|
551 | Nasty. Yeah, I mean not to burden you work refactoring any more then you feel like doing it, but simply adding a StackFrameContext::getCFGElement() method and using it in this patch would be nice enough, and some comments to getIndex(). But this obviously isn't a high prio issue. |
clang/lib/StaticAnalyzer/Core/CallEvent.cpp | ||
---|---|---|
575 | This is extremely ugly and was one of the reasons I originally did not reuse handleConstructionContext(). What should I do here? Create a pure virtual handleConstructionContext() into SubEngine? Or somehow make handleConstructionContext static? Is there maybe a more proper way to access ExprEngine from CallEvent? | |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
112 | That was so my first thought. However, handleConstructionContext() is private and non-static. Now I tried to merge the two methods: if the value is already in the construction context, we return it, if not then we add it. Is this what you suggest? Or did I misunderstand you? At the very beginning I tried to simply use handleConstructionContext(), but it asserted because the value was already in the map. |
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h | ||
---|---|---|
438 | +1 I'd expect a description about what it does and only then the details. | |
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h | ||
719 | Why was it necessary to move this prototype? Is this hunk related at all? | |
clang/lib/StaticAnalyzer/Core/CallEvent.cpp | ||
575 | Seems like SubEngine has only one descendant: ExprEngine. Consequently SubEngine is a false interface that is never ever used polymorphically, perhaps we could just get rid of that? I mean what if we used ExprEngine everywhere and scraped the SubEngine class? SubEngine seems to be a superfluous legacy to me (I can be wrong of course). | |
clang/unittests/StaticAnalyzer/TestReturnValueUnderConstruction.cpp | ||
30 | I think it would make the tests cleaner if we had some member variables set here and then in the test function (addTestReturnValueUnderConstructionChecker) we had the expectations on those variables. | |
48 | Please clang-format test code as well! You may also use raw string literals. Please be precise with test code too. |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
112 | I'd like to preserve the assertion that objects-under-construction are never filled twice; it's a very useful sanity check. What you need in your checker is a function that computes object-under-construction but doesn't put it into the objects-under-construction map. So you have to separate the computation from filling in the state. |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
112 | OK, so I (fortunately) misundertood you. Thus I should refactor this function to a calculation and a storing part? |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
112 | OK, I see what you are speaking about, but I have no idea how to do it properly. The problem is that the control logic of filling in the state also depends on the kind of the construction context. For some kinds we do not fill at all. Every way I try it becomes more complex and less correct:
The only thing I can (maybe) do is to retrieve the construction context item. But then the switch is still duplicated for filling, because of the different control logic for different construction context kinds. |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
112 |
This does not help even for retrieving the value, because its control logic also depends on the construction context kind. If I do it, it will be code triplication instead of duplication and results in a code that is worse to understand than the current one. |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
112 |
Any solution I can come up with rather adds 100 to 150 lines to the code instead of saving 50. For retrieving we only have to determine the construction context item (the key). For storing we also have to calculate the value. It sounds good to calculate these things in separate functions and then have a simple retriever and store function but there are lots of recursions, double strores, non-stores, retrieving in the store function that make it too complicated. Also retrieving is more complex than just determining the item and getting the value from the context: if you look at SimpleReturnedValueKind and CXX17ElidedCopyReturnedValueKind you can see that we do not use the construction context we got in the parameter (the construction context of the return value) but the construction context of the call instead. For ElidedTemporaryObjectKind we take the construction context before the elusion. Future proofness: I agree, this is a problem to solve. In cases where the construction context is empty maybe we can move the calculation of the values to a separate function that is called by both the "handler" and the "retriever". |
clang/unittests/StaticAnalyzer/TestReturnValueUnderConstruction.cpp | ||
---|---|---|
25 | I assume this tests this call expression: returnC(1) . But this is absolutely not trivial from the test code, could you please make this cleaner? |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
112 | Do I think correctly that retrieveFromConstructionContext (in the previous version) should return the same value as second part of handleConstructionContext? It does not look obvious at first. Or do we need a function with that purpose? If yes it looks a difficult task because the similar "logic" to get the value and update the state. Probably it is enough to add a flag to handleConstructionContext to not make new state. The current code looks bad because calling a "handle" type of function (that was private before) only to compute a value. |
clang/unittests/StaticAnalyzer/TestReturnValueUnderConstruction.cpp | ||
---|---|---|
29 | How do we know that 0 is the proper block count? |
In this version of the patch you are using handleConstructionContext to "retrieve" or read an SVal for the construction. However, it seems like that function was designed to store values in the GDM for individual construction context items. To mix a read-only functionality into a setter seems to be error-prone to me.
Please instead re-use the code that computes the object under construction. That'll save you ~50 lines of code and will be more future-proof
I am not sure that this is the best option here. In my viewpoint, ConstructionContext and ConstructionContextItem are both "variants" (i.e. tagged unions) and we perform operations on these. Each operation is actually a visitation on the different kind of values (i.e. a switch). One operation is transforming a ConstructionContext to a ConstructionContextItem. This is what is needed to be able to call the getObjectUnderConstruction function. And this operation - the reading - is way different than the other - the store -, see Adam's concerns. This yields to different functions for each op. But this is perfectly natural once we work with "variants". I mean we can easily extend the operations on a variant with a new operation as a form of a visitation of the kinds (contrary to traditional class hierarchies where adding a new op is hard, but adding a new type is easy).
Just one more thing. It may turn out that the visitation should be done the same way in both cases, then we have to have a Visitor factored out. But, now it seems that even the visitation strategy is different for the individual operations.
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
112 |
I think the following is a fairly literal implementation of my suggestion: It's 26 lines shorter than your retrieveFromConstructionContext() (ok, not 50, duplicating the switch was more annoying than i expected), it adds zero new logic (the only new piece of logic is to track constructors that were modeled correctly but their elision wasn't; previously we could fit it into control flow), it no longer tries to reverse-engineer the original behavior by duplicating its logic, and it's more future-proof for the reasons explained above. |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
112 | Thank you for working on this. The point what I did not see (and I still no not see it) is that in your solution (and your proposal) instead of retrieving the value from the construction context we actually recalculate it, thus we do not use the construction context at all (except for NewAllocatedObjectKind) to retrieve the location of the return value. Can we guarantee that we always recalculate exactly the same symbolic value as the symbolic values we store in the map? For example, for SimpleReturnedValueKind et. al. we conjure a new symbolic value. Is it really the same value than that the one in the map, thus the one which is the real location where the object is constructed? If not, then the checkers might not work correctly because they store into the GDM using a different region as key than they use for retrieving the value. |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
112 |
+1 for this version. Even though Adam has valid concerns, this version contains functions with clear responsibilities.
I think the guarantee is that in Artem's implementation the switch is duplicated. That would be okay for now, but I still consider this error-prone, exactly because of the duplication. So, maybe this yields for a common Visitor. Guys, would it be possible to have a common Visitor for these? Could be somewhat similar to DeclVisitor with return type as type parameter. And we could pass a lambda that is called in every case? Or would that be too overkill |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
112 |
That is not my point. My point is that handleConstructionContext() is called, a value is calculated and stored in the map. This calculation may be the creation of a new conjured symbol. Thereafter, whenever a checker tries to retrieve the location of the return value from the construction context in Artem's solution we do not retrieve the value previously stored in the map, but calculate a new one, which is not necessarily the same as the one we stored in the map, because a in some cases a new symbol is conjured. Then the checker uses this new symbol as the index in GDM, but the object is constructed to the location stored in the map. Therefore, later when the checker tries to retrieve data from the GDM using the location of the existing object as a key it cannot find the data because it was stored using the wrong key. getReturnValueUnderConstruction() must always return the location where the object is to be constructed. Therefore I think it must not recalculate the location which may involve creation of new conjured symbols but first it must look into the map of objects under construction. This guarantees that the returned location is really the location where the object is constructed. |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
112 | Yes it should always return the same value, because there is exactly one correct solution to the problem that it's solving. If it may return different values for the same (E, State, LCtx, CC) tuple, it's a bug. This method should ultimately be turned into a static method. Loading a stored value if it exists will only sweep the problem under the rug. It'd be much better for us to assert that the freshly recomputed value is equal to the stored value if the latter is present. The code that constructs symbolic region for the return value of a top frame is indeed suspicious. I'm not sure it can actually return different regions twice but this code is clearly incorrect because SymbolConjured is never the correct solution anyway. The only reason SymbolConjured is used anywhere is because the author (in this case, i) was too lazy to define a new symbol class that has the right identity. SymbolConjured is like UnknownVal that's a bit less unknown but still screams "not implemented yet". I think this definitely deserves a fixme.
Mmm, i don't think i understand this argument. A visitor is a switch with a fancy interface and wildcards and lacking checks for unhandled cases. Generally though i wouldn't mind having a ConstructionContextVisitor. For instance, that'll allow us to replace code like case ConstructionContext::CXX17ElidedCopyVariableKind: case ConstructionContext::SimpleVariableKind: { } with a single VisitVariable(VariableConstructionContext *CC) { } I wouldn't like a visitor for ConstructionContextItems because they are fairly meaningless when taken out of, well, context. |
+GSoC gang because it'll ultimately help Nithin's future checker track smart pointers returned from functions.
clang/lib/StaticAnalyzer/Core/CallEvent.cpp | ||
---|---|---|
551 | +1 | |
553 | nit: space after if And I would also prefer putting const auto *Ctor | |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
225–227 | Maybe here (and further) we can use a widespread pattern of bool-castable things declared in the if condition (like you do with dyn_casts): if (Optional<SVal> ExistingVal = getObjectUnderConstruction(State, CE, LCtx)) return std::make_pair(State, *ExistingVal); | |
284–299 | It seems like a code duplication to me. First of all we can first compose a ConstructionContextItem from either CE, CCE, or ME and then call for getObjectUnderConstruction. | |
clang/unittests/StaticAnalyzer/TestReturnValueUnderConstruction.cpp | ||
25 | I think that this function, the meat and bones of the test, should be properly commented and state explicitly what are the assumption and what is the expected outcome. | |
30 | Why not gtests assertions? Those will also interrupt the execution, but will be much clearer in the output. |
clang/lib/StaticAnalyzer/Core/CallEvent.cpp | ||
---|---|---|
543 | It is clear that the final return value of getConstructionContext() does not depend on the current block count; in fact the information you're trying to obtain is purely syntactic. You probably might as well pass 0 here. As i mentioned before, the ideal solution would be to make CallEvent carry a CFGElementRef to begin with (as a replacement for the origin expr), so that it didn't need to be recovered like this. We should absolutely eliminate the BlockCount parameter of getReturnValueUnderConstruction() as well. Checker developers shouldn't understand what it is, especially given that it doesn't matter at all which value do you pass. | |
551 | It's us anyway. StackFrameContext is essentially analyzer-exclusive but even for things like CFG and analysis-based warnings in Clang, there's nobody else who maintains them except for the analyzer gang. | |
553 | Ah, classic. |
clang/lib/StaticAnalyzer/Core/CallEvent.cpp | ||
---|---|---|
543 | OK, I eliminated it. Do we need BlockCount for getParameterLocation() (the other patch)? | |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
112 | A Visitor is a good idea. However, is it used anywhere else? If it is, then it is worth to do it, but I think it can go into a new patch. |
Great, thank you!
clang/lib/StaticAnalyzer/Core/CallEvent.cpp | ||
---|---|---|
539 | This looks useful. Let's turn it into a CallEvent method as well. | |
543 | We absolutely need BlockCount in the other patch because it's part of the identity of the parameter region. That said, the API that requires the user to provide BlockCount manually is still terrible and extremely error-prone. Namely, the contract of getParameterLocation() would be "The behavior is undefined unless you provide the same block count that is there when call happens". Which is a good indication that we should have stored BlockCount in CallEvent from the start. |
Hello, this breaks check-clang everywhere, e.g. http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/69062/steps/test-check-all/logs/FAIL%3A%20Clang-Unit%3A%3ATestReturnValueUnderConstructionChecker.ReturnValueUnderConstructionChecker
Please take a look, and revert for now if the fix isn't obvious.
Please watch the tree a bit after landing; it's been red for 4 hours now.
Hello,
Thank you, I think I fixed it now. I worked on it for hours, because it involved lots of git branch changes and the the rebuild was slow.
Thank you, I think I fixed it now. I worked on it for hours, because it involved lots of git branch changes and the the rebuild was slow.
Thanks for the fix! If you notice that fixing a bot will take more than a few minutes, please revert the commit while you work on the fix :)
Warning (from eg: http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/32043):
/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h:1102:23: warning: 'getDecl' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]
Edit: From D80522
A / is missing (or too much of / above?).
It looks like that the comment tells something about how the function works, not what it does. Or not? (The function works by retrieving the construction context and something from it, but that seems to be the return value, not a region. This is why this comment can be confusing.) This belongs better to the implementation, not to the documentation part. The documentation could contain something about how and when the function can be used to get a non-null value.