This is an archive of the discontinued LLVM Phabricator instance.

[Static Analyzer] Structured binding to arrays
ClosedPublic

Authored by isuckatcs on May 29 2022, 7:15 AM.

Details

Summary

To tell the static analyzer how to handle a structured binding to an array, we first have to evaluate ArrayInitLoopExpr, which is used to initialize the new array being created.
The analyzer must also be taught how to handle a BindingDecl when it is an array member.

Diff Detail

Event Timeline

isuckatcs created this revision.May 29 2022, 7:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 29 2022, 7:15 AM
isuckatcs requested review of this revision.May 29 2022, 7:15 AM
isuckatcs retitled this revision from [Static Analyzer][Draft] Implemented ArrayInitLoopExpr evaluation to [Static Analyzer][Draft] Structured binding to arrays.May 29 2022, 9:50 AM
isuckatcs edited the summary of this revision. (Show Details)
isuckatcs updated this revision to Diff 432796.May 29 2022, 9:57 AM

Added draft implementation for BindingDecl in VisitCommonDeclRefExpr

xazax.hun added inline comments.May 29 2022, 8:09 PM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
441

It is not related to your code, but I wonder whether these comments are actually adding any value. @NoQ how would you feel about removing these one liners and have a single comment like // Transfer functions. Although that would mean that this text would not be visible when looking at the individual methods in doxygen's output.

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
1380

Hmm, looking at this file almost every transfer function is structured differently. Some transfer functions are calling runCheckersForPreStmt and runCheckersForPostStmt. At other places, like CXXBindTemporaryExprClass above, these calls are outside of the transfer function in the switch statements itself. Maybe we should standardize on one way of doing this. Feel free to ignore this comment for this PR, it was just a small rant on our inconsistent coding style :)

2545

Not your code, but this might be redundant due to LCtx above. Might be nice to clean it up.

2612

Are there cases where BD->getDecomposedDecl() returns something that is not a DecompositionDecl? I almost want to change this dyn_cast to a cast to learn how could that happen.

2616

Do I remember well that Base will end up being an UnknownVal? If that is the case, I wonder if the problem is we are trying to look this value up too late. The analyzer tries to keep its program state tidy. Whenever it determines that a value will never be used, it does some sort of garbage collection and removes the value from the state. My theory is the following:

  1. The analyzer visit the DecompositionDecl, stores its value in the program state.
  2. The analyzer continues with the analysis and realizes that no one references this part of the program state so removes it from the state.
  3. When you encounter a DeclRefExpr (that refers to a BindingDecl) you can no longer look the values up from the state, because it was already cleaned up.

If this is the case, the source of the problem is that creating the binding here, where a DeclRefExpr is visited might be too late. We either need create bindings earlier, shortly after step 1. Or alternatively, we need to make extra effort to tell the analyzer to keep certain part of the state. I'd prefer to start with the first option and see where we end up.

2649

Use getLimitedValue if you want a regular integer.

isuckatcs added inline comments.May 30 2022, 1:19 AM
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2616

What I found is that Base ended up being an UnknownVal because VisitArrayInitLoopExpr wasn't implemeneted. Since then I haven't seen Base being UnknownVal but that can also mean, I just can't reproduce this issue. If you have a snippet thought, which is able to reproduce this behaviour please share it with me, and I'll look into it.

isuckatcs updated this revision to Diff 432980.May 30 2022, 1:46 PM

Now binding to arrays is handled in case of binding by reference (both L and RValue).
I'm also using getLimitedValue() as suggested.

The code still crashes in case of some testcases. Also BindingDecl handling is not tested.
In my next update I'm going to address these issues and the remaining comments.

isuckatcs marked an inline comment as done.May 30 2022, 1:47 PM
isuckatcs updated this revision to Diff 433105.May 31 2022, 9:01 AM
  • VisitArrayInitLoopExpr no longer crashes clang
  • removed unnecessary binding to environment in VisitArrayInitLoopExpr
  • added testcases for binding to arrays
  • applied clang format to the diff
isuckatcs updated this revision to Diff 434771.Jun 7 2022, 4:30 AM

Updated ExprEngine::ArrayInitLoopExpr() to behave similar to object copy construction.

If the array, that's being copied hasn't been initialized, a new cluster is created with
uninitialized values, otherwise a lazyCompoundValue is used.

isuckatcs updated this revision to Diff 435335.Jun 8 2022, 2:28 PM

Evaluating the case when ArrayInitLoopExpr is called in an implicit copy/move ctor

NoQ added a comment.Jun 8 2022, 6:52 PM

Your code looks very reasonable to me! I think it could use a lot more tests though. In particular, it sounds like you're the first person to ever handle ArrayInitLoopExpr *outside* of structured bindings, so we might want to have a few test cases about that specifically, that don't feature structured bindings.

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2623

The expression is always a compile-time constant right? Let's comment that and maybe assert that, because typically getSVal() doesn't do much on expressions that aren't currently active.

2660–2663

Ok so a lot of this code seems to be pattern-matching over possible AST patterns. This is not a typical thing to happen because typically sub-statements are handled by their own separate transfer functions. In situations like this I recommend commenting with examples of code and AST snippets corresponding to it, it's much easier to understand the code this way.

2677

Hmm lambda capture? It probably deserves a test case!

2693

SVal{} translates to UndefinedVal so I recommend spelling it out explicitly. It also doesn't seem right to me that all values except the first are always undefined.

isuckatcs added a comment.EditedJun 9 2022, 4:40 AM

Your code looks very reasonable to me! I think it could use a lot more tests though. In particular, it sounds like you're the first person to ever handle ArrayInitLoopExpr *outside* of structured bindings, so we might want to have a few test cases about that specifically, that don't feature structured bindings.

At the moment I can't really test this because of some weird behaviour. For example take a look at the following snippets.

struct s {
  int x;
};

void foo() {
  s one;
  s two = one;

  int a = two.x; <- error reported here
}
struct s {
  int x;
  int arr[2];
};

void foo() {
  s one;
  s two = one;

  int a = two.x; <- error NOT reported here
}

In both cases I invoked the latest release (14.0.0) version of clang. Based on this, the issue has nothing to do with my code.
I suspect something is not handled in handleConstructor as after x is bound to the environment by VisitMemberExpr,
the struct itself is removed as if it was marked as a dead symbols. This is the reason "VisitMemberExpr is not able to
bind the array to the environment", it gets an Unknown value for the base of the expression.

I'm thinking about removing this part of the code entirely, and just leave a FIXME there as it used to be.

A similar behaviour is true for lambdas. I also tested this one with the latest release (14.0.0) of clang.

int x;

[=]{
  int a = x; <-- error reported here
}();
int x;
int arr[2];

[=]{
  int a = x; <-- error NOT reported here
  int b = arr[0];
}();
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2693

In this case the first is also undefined, I added first to the list to avoid invoking another SVal constructor. If the first value is known, then we don't enter this branch.

Ideally a default binding to the region would be the best solution, but at this point the region is not created, the creation only happens after this transfer function.

isuckatcs updated this revision to Diff 435577.Jun 9 2022, 8:57 AM
isuckatcs marked 4 inline comments as done.

Addressed the comments on the code

isuckatcs added inline comments.Jun 9 2022, 9:12 AM
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2723–2724

@NoQ Here I assume that if the first element is initialized, then so are all the other elements, as if any of the array elements are explicitly initialized, all the other elements are empty-initialized. As a result if the first element is uninitialized, then all the other elements are also uninitialized.

However this proved to be incorrect. Consider this snippet:

int init[2] = {0};
int uninit[2];

init[0] = uninit[0];  <-- error reported here, the analysis stops (?)

auto [x, y] = init;
int a = y;

Though it seems that the analysis stops before ExprEngine::VisitArrayInitLoopExpr() by default, I wonder if we want to handle this scenario or not.

isuckatcs added inline comments.Jun 9 2022, 9:33 AM
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2723–2724

The other way around there also is a problem.

int uninit[2];
int init[2] = {0};

uninit[0] = init[0];

auto [x, y] = uninit;
int a = y; <-- not reported

auto [i, j] = init;
int b = i; <-- reported as uninitialized assign
NoQ added a comment.Jun 9 2022, 4:03 PM

Aha great thanks that's *MUCH* easier to read!

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2723–2724

It looks to me like Base is the *location* of the array, and Fst is the *location* of the first element of that array; neither of them is the *value* of any element of the array. So if one of them is unknown or undefined, it doesn't tell us anything about whether the array is initialized or not, but it tells us that we don't know what the array even is or where it is. Which is most likely an indication that we're doing something wrong.

NoQ added inline comments.Jun 9 2022, 4:05 PM
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2723–2724

No, never mind, you do a getSVal() later. I'll keep thinking.

NoQ added inline comments.Jun 9 2022, 4:20 PM
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2723–2724

So, anyway, yes, the analysis stops when it reports a bug, namely, when CheckerContext.generateErrorNode() is invoked. The checkers are allowed to not interrupt the analysis, in this case they invoke CheckerContext.generateNonFatalErrorNode() instead. But uninitialized read is always fatal so we don't need to handle this situation.

But there's always a way to partially initialize the array:

int uninit[10];
uninit[5] = 0;

This would result in the first element being uninitialized but some elements can still be initialized.

So I think the right solution is to drop the isUnknownOrUndef() branch entirely. The lazy binding produced by the other branch will simply always do the right thing by snapshotting the entire store, with all its partial initializations. If it doesn't, let's look into that deeper.

isuckatcs added inline comments.Jun 9 2022, 4:22 PM
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2723–2724

I think this issue is not related to this snippet. I looked at the state at this point, while I was going through the snippet above, and it
seems init has already been removed by this point.

The state *when* the first ArrayInitLoopExpr is visited, i.e. auto [x, y] = uninit;:

"program_state": {
  "store": { "pointer": "0x555562d83d00", "items": [
    { "cluster": "uninit", "pointer": "0x555562d84638", "items": [
      { "kind": "Direct", "offset": 0, "value": "0 S32b" }
    ]}
  ]},
  "environment": { "pointer": "0x555562d7f160", "items": [
    { "lctx_id": 1, "location_context": "#0 Call", "calling": "main", "location": null, "items": [
      { "stmt_id": 684, "pretty": "uninit[0]", "value": "&Element{uninit,0 S64b,int}" },
      { "stmt_id": 703, "pretty": "init[0]", "value": "0 S32b" },
      { "stmt_id": 706, "pretty": "uninit[0] = init[0]", "value": "&Element{uninit,0 S64b,int}" }
    ]}
  ]},

In this case { "stmt_id": 776, "pretty": "{uninit[*]}", "value": "lazyCompoundVal{0x555562d83d00,uninit}" }
is bound to the environment.

The state when the same expression is visted the *second* time, i.e.: auto [i, j] = init;:

"program_state": {
  "store": { "pointer": "0x555562d89c88", "items": [
    { "cluster": "a", "pointer": "0x555562d89c48", "items": [
      { "kind": "Direct", "offset": 0, "value": "conj_$0{int, LC1, S833, #1}" }
    ]}
  ]},
  "environment": null,

In this case { "stmt_id": 899, "pretty": "{init[*]}", "value": "compoundVal{ Undefined, Undefined}" }
is bound to the environment.

It seems that in both cases, there are not enough information in the state, which allows the visitor to
evaluate everything properly. The other properties of the program state, that I have not listed above
are all nulls.

isuckatcs added inline comments.Jun 9 2022, 4:34 PM
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2723–2724

Initially I was only creating a lazy binding, however if the array was uninitialized, the resulting value in case of the assignment was not read properly.

int uninit[2];

auto [a, b] = uninit; // lazy binding created here lazyCompoundVal{0x0,uninit}
int x = a; <-- x is `conj_$0{int, LC1, S742, #1}` and no error is reported. The store and the state are `null`s after the `ImplicitCastExpr`

Object construction works the same way, like it also created a lazyCompoundValue, however when the object being lazy bound is uninitialized, a new region with Undefined˛ values is being created, that's why I'm doing the same here.

isuckatcs updated this revision to Diff 436136.Jun 11 2022, 6:44 AM

Added a hack to live variable analysis, which helps modelling DecompositionDecls, and prevents some false positives in uninitialized value analysis.

isuckatcs updated this revision to Diff 436138.Jun 11 2022, 6:50 AM

Fixed the comment in LiveVariables.cpp, where a wrong AST was displayed.

isuckatcs updated this revision to Diff 436353.Jun 13 2022, 5:22 AM
  • Handling the "small array" case in VisitArrayInitLoopExpr. This can be extracted to affect all the other expressions if we want to.
  • Added more testcases.
NoQ added inline comments.Jun 15 2022, 11:32 PM
clang/lib/Analysis/LiveVariables.cpp
326

Oh ok looks like I misread the situation, so you're saying that DeclRefExpr for init is not added to the CFG at all? Yeah this seems to be the case.

In this case the proper solution is to simply put it into the CFG. It should be straightforward since we aren't actually adding any control flow, probably the CFGBuilder visitor simply forgets to visit it.

Removed the hack and put the reference to the CFG, see https://reviews.llvm.org/D127993.

isuckatcs updated this revision to Diff 437919.Jun 17 2022, 8:34 AM

Extraced and reworked the small array case in a different patch, see https://reviews.llvm.org/D128064.

isuckatcs retitled this revision from [Static Analyzer][Draft] Structured binding to arrays to [Static Analyzer] Structured binding to arrays.
isuckatcs edited the summary of this revision. (Show Details)

Updated the diff to be compatible with the latest changes.

isuckatcs updated this revision to Diff 438368.Jun 20 2022, 6:18 AM

Added more testcases

isuckatcs marked an inline comment as done.Jun 20 2022, 6:19 AM
NoQ accepted this revision.Jun 20 2022, 9:34 PM

This looks excellent.

clang/test/Analysis/array-init-loop.cpp
47 ↗(On Diff #438368)

Ideally these should warn on garbage return value! Or, at least, on passing garbage value to another function. Basically clang_analyzer_eval() never says UNDEFINED when core checkers are enabled.

This revision is now accepted and ready to land.Jun 20 2022, 9:34 PM
xazax.hun accepted this revision.Jun 21 2022, 8:47 AM

Looks great to me, thanks!

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2610

Nit: probably the binding cannot be a MemberExpr and a ArraySubscriptExpr at the same time. Use else if.

2726

Nit: Instead of this else, we could initialize Base with Unknown above where it is declared.

isuckatcs updated this revision to Diff 438971.Jun 22 2022, 4:58 AM
This comment was removed by isuckatcs.
isuckatcs updated this revision to Diff 438979.Jun 22 2022, 5:00 AM

Updated the wrong revision before

isuckatcs updated this revision to Diff 438986.Jun 22 2022, 5:21 AM

Addressed comments

isuckatcs marked 2 inline comments as done.Jun 22 2022, 5:33 AM
isuckatcs added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2726

I explicitly initialized Base to Unknown, but I didn't delete the else branch. If Base contains some value, we cannot get as a region, then instead of Unknown, the value contained in Base will get bound to the expression.

I'm not sure whether Base will always have a value at this point that we can get as a region (e.g.: we fail to model something before).

xazax.hun accepted this revision.Jun 22 2022, 1:48 PM
xazax.hun added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2726

Sounds good, thanks!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2022, 2:38 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript