To tell the static analyzer how to handle a structured binding to an array, we first have to evaluate ArrayInitLoopExpr, which is used to initialize the new array being created.
The analyzer must also be taught how to handle a BindingDecl when it is an array member.
Details
Diff Detail
Unit Tests
Event Timeline
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h | ||
---|---|---|
441 | It is not related to your code, but I wonder whether these comments are actually adding any value. @NoQ how would you feel about removing these one liners and have a single comment like // Transfer functions. Although that would mean that this text would not be visible when looking at the individual methods in doxygen's output. | |
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
1380 | Hmm, looking at this file almost every transfer function is structured differently. Some transfer functions are calling runCheckersForPreStmt and runCheckersForPostStmt. At other places, like CXXBindTemporaryExprClass above, these calls are outside of the transfer function in the switch statements itself. Maybe we should standardize on one way of doing this. Feel free to ignore this comment for this PR, it was just a small rant on our inconsistent coding style :) | |
2545 | Not your code, but this might be redundant due to LCtx above. Might be nice to clean it up. | |
2614 | Are there cases where BD->getDecomposedDecl() returns something that is not a DecompositionDecl? I almost want to change this dyn_cast to a cast to learn how could that happen. | |
2618 | Do I remember well that Base will end up being an UnknownVal? If that is the case, I wonder if the problem is we are trying to look this value up too late. The analyzer tries to keep its program state tidy. Whenever it determines that a value will never be used, it does some sort of garbage collection and removes the value from the state. My theory is the following:
If this is the case, the source of the problem is that creating the binding here, where a DeclRefExpr is visited might be too late. We either need create bindings earlier, shortly after step 1. Or alternatively, we need to make extra effort to tell the analyzer to keep certain part of the state. I'd prefer to start with the first option and see where we end up. | |
2638 | Use getLimitedValue if you want a regular integer. |
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2618 | What I found is that Base ended up being an UnknownVal because VisitArrayInitLoopExpr wasn't implemeneted. Since then I haven't seen Base being UnknownVal but that can also mean, I just can't reproduce this issue. If you have a snippet thought, which is able to reproduce this behaviour please share it with me, and I'll look into it. |
Now binding to arrays is handled in case of binding by reference (both L and RValue).
I'm also using getLimitedValue() as suggested.
The code still crashes in case of some testcases. Also BindingDecl handling is not tested.
In my next update I'm going to address these issues and the remaining comments.
- VisitArrayInitLoopExpr no longer crashes clang
- removed unnecessary binding to environment in VisitArrayInitLoopExpr
- added testcases for binding to arrays
Updated ExprEngine::ArrayInitLoopExpr() to behave similar to object copy construction.
If the array, that's being copied hasn't been initialized, a new cluster is created with
uninitialized values, otherwise a lazyCompoundValue is used.
Your code looks very reasonable to me! I think it could use a lot more tests though. In particular, it sounds like you're the first person to ever handle ArrayInitLoopExpr *outside* of structured bindings, so we might want to have a few test cases about that specifically, that don't feature structured bindings.
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2625 | The expression is always a compile-time constant right? Let's comment that and maybe assert that, because typically getSVal() doesn't do much on expressions that aren't currently active. | |
2649–2652 | Ok so a lot of this code seems to be pattern-matching over possible AST patterns. This is not a typical thing to happen because typically sub-statements are handled by their own separate transfer functions. In situations like this I recommend commenting with examples of code and AST snippets corresponding to it, it's much easier to understand the code this way. | |
2666 | Hmm lambda capture? It probably deserves a test case! | |
2682 | SVal{} translates to UndefinedVal so I recommend spelling it out explicitly. It also doesn't seem right to me that all values except the first are always undefined. |
At the moment I can't really test this because of some weird behaviour. For example take a look at the following snippets.
struct s { int x; }; void foo() { s one; s two = one; int a = two.x; <- error reported here }
struct s { int x; int arr[2]; }; void foo() { s one; s two = one; int a = two.x; <- error NOT reported here }
In both cases I invoked the latest release (14.0.0) version of clang. Based on this, the issue has nothing to do with my code.
I suspect something is not handled in handleConstructor as after x is bound to the environment by VisitMemberExpr,
the struct itself is removed as if it was marked as a dead symbols. This is the reason "VisitMemberExpr is not able to
bind the array to the environment", it gets an Unknown value for the base of the expression.
I'm thinking about removing this part of the code entirely, and just leave a FIXME there as it used to be.
A similar behaviour is true for lambdas. I also tested this one with the latest release (14.0.0) of clang.
int x; [=]{ int a = x; <-- error reported here }();
int x; int arr[2]; [=]{ int a = x; <-- error NOT reported here int b = arr[0]; }();
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2682 | In this case the first is also undefined, I added first to the list to avoid invoking another SVal constructor. If the first value is known, then we don't enter this branch. Ideally a default binding to the region would be the best solution, but at this point the region is not created, the creation only happens after this transfer function. |
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2712–2713 | @NoQ Here I assume that if the first element is initialized, then so are all the other elements, as if any of the array elements are explicitly initialized, all the other elements are empty-initialized. As a result if the first element is uninitialized, then all the other elements are also uninitialized. However this proved to be incorrect. Consider this snippet: int init[2] = {0}; int uninit[2]; init[0] = uninit[0]; <-- error reported here, the analysis stops (?) auto [x, y] = init; int a = y; Though it seems that the analysis stops before ExprEngine::VisitArrayInitLoopExpr() by default, I wonder if we want to handle this scenario or not. |
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2712–2713 | The other way around there also is a problem. int uninit[2]; int init[2] = {0}; uninit[0] = init[0]; auto [x, y] = uninit; int a = y; <-- not reported auto [i, j] = init; int b = i; <-- reported as uninitialized assign |
Aha great thanks that's *MUCH* easier to read!
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2712–2713 | It looks to me like Base is the *location* of the array, and Fst is the *location* of the first element of that array; neither of them is the *value* of any element of the array. So if one of them is unknown or undefined, it doesn't tell us anything about whether the array is initialized or not, but it tells us that we don't know what the array even is or where it is. Which is most likely an indication that we're doing something wrong. |
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2712–2713 | No, never mind, you do a getSVal() later. I'll keep thinking. |
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2712–2713 | So, anyway, yes, the analysis stops when it reports a bug, namely, when CheckerContext.generateErrorNode() is invoked. The checkers are allowed to not interrupt the analysis, in this case they invoke CheckerContext.generateNonFatalErrorNode() instead. But uninitialized read is always fatal so we don't need to handle this situation. But there's always a way to partially initialize the array: int uninit[10]; uninit[5] = 0; This would result in the first element being uninitialized but some elements can still be initialized. So I think the right solution is to drop the isUnknownOrUndef() branch entirely. The lazy binding produced by the other branch will simply always do the right thing by snapshotting the entire store, with all its partial initializations. If it doesn't, let's look into that deeper. |
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2712–2713 | I think this issue is not related to this snippet. I looked at the state at this point, while I was going through the snippet above, and it The state *when* the first ArrayInitLoopExpr is visited, i.e. auto [x, y] = uninit;: "program_state": { "store": { "pointer": "0x555562d83d00", "items": [ { "cluster": "uninit", "pointer": "0x555562d84638", "items": [ { "kind": "Direct", "offset": 0, "value": "0 S32b" } ]} ]}, "environment": { "pointer": "0x555562d7f160", "items": [ { "lctx_id": 1, "location_context": "#0 Call", "calling": "main", "location": null, "items": [ { "stmt_id": 684, "pretty": "uninit[0]", "value": "&Element{uninit,0 S64b,int}" }, { "stmt_id": 703, "pretty": "init[0]", "value": "0 S32b" }, { "stmt_id": 706, "pretty": "uninit[0] = init[0]", "value": "&Element{uninit,0 S64b,int}" } ]} ]}, In this case { "stmt_id": 776, "pretty": "{uninit[*]}", "value": "lazyCompoundVal{0x555562d83d00,uninit}" } The state when the same expression is visted the *second* time, i.e.: auto [i, j] = init;: "program_state": { "store": { "pointer": "0x555562d89c88", "items": [ { "cluster": "a", "pointer": "0x555562d89c48", "items": [ { "kind": "Direct", "offset": 0, "value": "conj_$0{int, LC1, S833, #1}" } ]} ]}, "environment": null, In this case { "stmt_id": 899, "pretty": "{init[*]}", "value": "compoundVal{ Undefined, Undefined}" } It seems that in both cases, there are not enough information in the state, which allows the visitor to |
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2712–2713 | Initially I was only creating a lazy binding, however if the array was uninitialized, the resulting value in case of the assignment was not read properly. int uninit[2]; auto [a, b] = uninit; // lazy binding created here lazyCompoundVal{0x0,uninit} int x = a; <-- x is `conj_$0{int, LC1, S742, #1}` and no error is reported. The store and the state are `null`s after the `ImplicitCastExpr` Object construction works the same way, like it also created a lazyCompoundValue, however when the object being lazy bound is uninitialized, a new region with Undefined˛ values is being created, that's why I'm doing the same here. |
Added a hack to live variable analysis, which helps modelling DecompositionDecls, and prevents some false positives in uninitialized value analysis.
- Handling the "small array" case in VisitArrayInitLoopExpr. This can be extracted to affect all the other expressions if we want to.
- Added more testcases.
clang/lib/Analysis/LiveVariables.cpp | ||
---|---|---|
326 ↗ | (On Diff #436353) | Oh ok looks like I misread the situation, so you're saying that DeclRefExpr for init is not added to the CFG at all? Yeah this seems to be the case. In this case the proper solution is to simply put it into the CFG. It should be straightforward since we aren't actually adding any control flow, probably the CFGBuilder visitor simply forgets to visit it. |
Removed the hack and put the reference to the CFG, see https://reviews.llvm.org/D127993.
Extraced and reworked the small array case in a different patch, see https://reviews.llvm.org/D128064.
This looks excellent.
clang/test/Analysis/array-init-loop.cpp | ||
---|---|---|
47 ↗ | (On Diff #438368) | Ideally these should warn on garbage return value! Or, at least, on passing garbage value to another function. Basically clang_analyzer_eval() never says UNDEFINED when core checkers are enabled. |
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2715 | I explicitly initialized Base to Unknown, but I didn't delete the else branch. If Base contains some value, we cannot get as a region, then instead of Unknown, the value contained in Base will get bound to the expression. I'm not sure whether Base will always have a value at this point that we can get as a region (e.g.: we fail to model something before). |
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
---|---|---|
2715 | Sounds good, thanks! |
It is not related to your code, but I wonder whether these comments are actually adding any value. @NoQ how would you feel about removing these one liners and have a single comment like // Transfer functions. Although that would mean that this text would not be visible when looking at the individual methods in doxygen's output.