This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][NFC] Don't bind values to ObjCForCollectionStmt, replace it with a GDM trait
ClosedPublic

Authored by Szelethus on Aug 27 2020, 1:25 PM.

Details

Summary

Based on the discussion in D82598#2171312. Thanks @NoQ!

Let's talk about how we got here. D82598 is titled "Get rid of statement liveness, because such a thing doesn't exist", and indeed, expressions express a value, non-expression statements don't.

if (a && get() || []{ return true; }())
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ has a value
    ~ has a value
    ~~~~~~~~~~ has a value
                  ~~~~~~~~~~~~~~~~~~~~ has a value
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ doesn't have a value

That is simple enough, so it would only make sense if we only assigned symbolic values to expressions in the static analyzer. Yet the interface checkers can access presents, among other strange things, the following two methods:

ProgramState::BindExpr(const Stmt *S, const LocationContext *LCtx, SVal V, bool Invalidate=true)
ProgramState::getSVal(const Stmt *S, const LocationContext *LCtx)

So, what gives? Turns out, we make an exception for ReturnStmt (which we'll leave for another time) and ObjCForCollectionStmt. For any other loops, in order to know whether we should analyze another iteration, among other things, we evaluate it's condition. Which is a problem for ObjCForCollectionStmt, because it simply doesn't have one (CXXForRangeStmt has an implicit one!). In its absence, we assigned the actual statement with a concrete 1 or 0 to indicate whether there are any more iterations left. However, this is wildly incorrect, its just simply not true that the for statement has a value of 1 or 0, we can't calculate its liveness because that doesn't make any sense either, so this patch turns it into a GDM trait.

This patch isn't quite done (its nowhere near up to my standards in terms of documentation), but I wanted to publish it to avoid sinking too much work into a doomed patch.

Diff Detail

Event Timeline

Szelethus created this revision.Aug 27 2020, 1:25 PM
Szelethus requested review of this revision.Aug 27 2020, 1:25 PM

For any other loops, in order to know whether we should analyze another iteration, among other things, we evaluate it's condition. Which is a problem for ObjCForCollectionStmt, because it simply doesn't have one

Shouldn't we try to fix the ObjCForCollectionStmt in the AST rather? By adding the condition to that as a (sub)expression? There are already many discrepancies and inconsistencies between supposedly similar AST nodes, and people do fix them occasionally (e.g. D81787).
To be honest, it seems a bit off to store some parts of the liveness info in the GDM while other parts not in the GDM ...

For any other loops, in order to know whether we should analyze another iteration, among other things, we evaluate it's condition. Which is a problem for ObjCForCollectionStmt, because it simply doesn't have one

Shouldn't we try to fix the ObjCForCollectionStmt in the AST rather? By adding the condition to that as a (sub)expression? There are already many discrepancies and inconsistencies between supposedly similar AST nodes, and people do fix them occasionally (e.g. D81787).

To be fair, before I say anything, I don't know much about Objective C, but I don't think this is a bug. This is just how for each loops are in it. The C++ standard specifies that ranged based for loops must be equivalent to a pre-C++11 loop, hence the implicit condition in the AST. I think this is the same kind of annoyance we chatted about regarding virtuality in the AST -- even if a method is virtual, it may not show up as such in the AST if the keyword itself is absent.

To be honest, it seems a bit off to store some parts of the liveness info in the GDM while other parts not in the GDM ...

There is no liveness to talk about here, it just simply doesn't make sense. This is just a property: "Does this loop have any more iterations left?". While the earlier hack looks convincing that this has something more to it, it really doesn't. In fact, this patch demonstrates it quite well: we only need to keep track of this information until we make a state split, so theoretically, we could just pass this along as a parameter. The only thing holding me back from that solution is that it would be a lot more disruptive change, and would require a change to the checkBranchCondition callback.

For any other loops, in order to know whether we should analyze another iteration, among other things, we evaluate it's condition. Which is a problem for ObjCForCollectionStmt, because it simply doesn't have one

Shouldn't we try to fix the ObjCForCollectionStmt in the AST rather? By adding the condition to that as a (sub)expression? There are already many discrepancies and inconsistencies between supposedly similar AST nodes, and people do fix them occasionally (e.g. D81787).

To be fair, before I say anything, I don't know much about Objective C, but I don't think this is a bug. This is just how for each loops are in it. The C++ standard specifies that ranged based for loops must be equivalent to a pre-C++11 loop, hence the implicit condition in the AST. I think this is the same kind of annoyance we chatted about regarding virtuality in the AST -- even if a method is virtual, it may not show up as such in the AST if the keyword itself is absent.

To be honest, it seems a bit off to store some parts of the liveness info in the GDM while other parts not in the GDM ...

There is no liveness to talk about here, it just simply doesn't make sense. This is just a property: "Does this loop have any more iterations left?". While the earlier hack looks convincing that this has something more to it, it really doesn't. In fact, this patch demonstrates it quite well: we only need to keep track of this information until we make a state split, so theoretically, we could just pass this along as a parameter. The only thing holding me back from that solution is that it would be a lot more disruptive change, and would require a change to the checkBranchCondition callback.

Ok, fair enough.

clang/lib/StaticAnalyzer/Core/ProgramState.cpp
327 ↗(On Diff #288434)

Why it is not enough to simply have ObjCForCollectionStmt* as a key?

steakhal resigned from this revision.Aug 31 2020, 5:39 AM

I don't have much to say, but to keep up the good work xD
I will follow this and the rest of your patches @Szelethus.

Szelethus added inline comments.Sep 1 2020, 5:01 AM
clang/lib/StaticAnalyzer/Core/ProgramState.cpp
327 ↗(On Diff #288434)

Environment is the data structure we use to bind values to statements (which would ideally be expressions only). The problem is that a statement on its own isn't enough to identify where we are in the analysis:

void traverse(Graph &G) {
 // SubG may need to be modeled in different stackframes, but has the same Stmt*
 for (Graph &SubG : G->successors()) {
    traverse(SubG);
  }
}

LocationContext can be thought of as a stack function calls (or anything with a distinct stack frame). Indeed, if two (Stmt *, LocationContext *) pairs are equal, we're rebinding the same expression (another loop iteration, goto), if only the statements are the same, we're not in the same function call (stack frame).

Of course ObjCForCollectionStmt can not have a value, that is the whole point of the patch, but we still need a LocationContext to correctly identify precisely which loop we're talking about.

I don't have anymore immediate concerns, but I will need more time to comb through the rest of the patch in more details... hopefully I can do that in the following days.

clang/lib/StaticAnalyzer/Core/ProgramState.cpp
327 ↗(On Diff #288434)

Yeah, I did not think about stack frames, it makes perfect sense what you write, thanks for the explanation!

Szelethus added a comment.EditedSep 1 2020, 10:03 AM

I don't have anymore immediate concerns, but I will need more time to comb through the rest of the patch in more details... hopefully I can do that in the following days.

Thank you so much! I fear my project might take a turn, so this definitely isn't a high priority patch.

NoQ accepted this revision.Sep 5 2020, 5:38 PM

This looks correct to me. Thanks a lot. This hack was surprisingly crude and i'm very glad that we now consistently use expressions in our Environment.

clang/lib/StaticAnalyzer/Core/ProgramState.cpp
329 ↗(On Diff #288434)

I'd really prefer this to live in ExprEngine.cpp, like C++ constructor traits, and not be exposed in the ProgramState API, because as of now this is an implementation hack that other clients of ProgramState don't beed to know about.

The actual non-placeholder implementation would i guess be interesting. The loop boils down to repeatedly sending certain messages to the collection. So i guess we could evalCall them in the checker that'll deal with modeling these collections.

clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
498–499

IIRC "everything is" is the valid English.

This revision is now accepted and ready to land.Sep 5 2020, 5:38 PM
In D86736#2257880, @NoQ wrote:

i'm very glad that we now consistently use expressions in our Environment.

*except for ReturnStmt, but I haven't dug myself into that pit just yet :) Thanks!

Szelethus updated this revision to Diff 291199.EditedSep 11 2020, 6:43 AM
Szelethus marked 4 inline comments as done.

Reinstate the assert removed in rG032b78a0762bee129f33e4255ada6d374aa70c71, add a test. Enforce that Environment only makes an exception for ReturnStmt in terms of non-expression statements (and hope that isn't going to stay that way for long).

edit: And most importantly, move the new interface to ExprEngine.