The summary and very short discussion in D82122 summarizes whats happening here.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I didn't run this on big projects just yet, and judging from the fact that Environment only contained ObjetiveC examples of non-expression statements, I might need a tip on how to test on ObjC code :)
I always wondered if we actually need to track the liveness of exprs at all. We could just kill all subexpr at the end of the full expression without precomputing anything. But I might miss something.
We could just kill all subexpr at the end of the full expression
I suspect that it would be pretty bad if you, say, kill the condition of the if-statement before picking the branch. Or kill the initializer in the DeclStmt before putting it into the variable (same for CXXCtorInitializer which isn't even a Stmt!).
I might need a tip on how to test on ObjC code :)
Unfortunately i don't have any helpful tips for you :( Such code is not cross-compiled very often. I vaguely remember seeing some Objective-C code in linux software but it's most likely not very modern/representative. So i guess it's on us to inform you of the potential problems.
I would argue that the end of the full expression is AFTER the if was evaluated in this case. But I do see what you mean, thanks :)
Well, formally it isn't. The notion of full-expression is defined pretty strictly because that's the moment of time when temporary destructors are invoked. So the language pays a lot of attention to what exactly constitutes an expression, and as of now if-statements don't. In particular, temporary destructors will run before the branch is chosen.
I guess your point is that live expressions analysis could be replaced with imperative cleanup requests from, say, ExprEngine to the Environment. I.e., "We've just finished evaluating the if-statement, now we should actively tell the Environment to remove the condition expression". I guess it could totally work that way, but it'd be pretty hard to not forget such cleanups. Given that we'd also barely ever notice that we forgot one of those, i'm very much in favor of having liveness analysis instead, that would declaratively describe which expressions are live when, so that to automatically guarantee that we always only track what's necessary and never snowball our state with dead expressions.
@Szelethus
The patch looks good to me and I find it a welcome change that should make things easier to understand and maybe even a bit more efficient, I hope this won't break ObjC :) My discussion with Artem is orthogonal to this change.
Right. Sorry for being unclear there.
I guess your point is that live expressions analysis could be replaced with imperative cleanup requests from, say, ExprEngine to the Environment. I.e., "We've just finished evaluating the if-statement, now we should actively tell the Environment to remove the condition expression".
Yes. This is what I am proposing and this is what we did with the lifetime analysis. Unfortunately, I did not have time yet to identify all the cleanup points and I do agree that this is not a trivial task but I felt that this is much more lightweight in terms of memory and computation.
I guess it could totally work that way, but it'd be pretty hard to not forget such cleanups.
I agree.
Given that we'd also barely ever notice that we forgot one of those, i'm very much in favor of having liveness analysis instead, that would declaratively describe which expressions are live when, so that to automatically guarantee that we always only track what's necessary and never snowball our state with dead expressions.
Fair point. I do not remember any problem with the current liveness analysis so I guess if something is not broken do not fix it. We also don't have any evidence of this being a bottleneck.
Note to self: rename debug.DumpLiveStmts to debug.DumpLiveExprs. Thanks for the review btw! I don't immediately have something to add to your discussion though :)
I chased my own tail for weeks before realizing that there is indeed another instance when a live statement is stored, other then ObjCForCollectionStmt...
void clang_analyzer_eval(bool); void test_lambda_refcapture() { int a = 6; [&](int &a) { a = 42; }(a); clang_analyzer_eval(a == 42); // expected-warning{{TRUE}} } // CHECK: [ B0 (live statements at block exit) ] // CHECK-EMPTY: // CHECK-EMPTY: // CHECK-NEXT: [ B1 (live statements at block exit) ] // CHECK-EMPTY: // CHECK-EMPTY: // CHECK-NEXT: [ B2 (live statements at block exit) ] // CHECK-EMPTY: // CHECK-NEXT: CompoundStmt {{.*}} // CHECK-NEXT: `-BinaryOperator {{.*}} 'int' lvalue '=' // CHECK-NEXT: |-DeclRefExpr {{.*}} 'int' lvalue ParmVar {{.*}} 'a' 'int &' // CHECK-NEXT: `-IntegerLiteral {{.*}} 'int' 42
Hmm, interesting. I don't really understand why do we need to keep that block live, as we definitely won't use any of the value it provides (since it does not provide a value at all).
I am only doing guessing here, but maybe the side effect of the statement is pruned from the store as soon as it is no longer live?
I might be wrong here, but maybe solving this problem would not be the job of the liveness analysis but we have a bug elsewhere, as this binding should be pruned when a is a dead variable.
Actually, what I said initially is true:
[...] the only non-expression statements it queried are ObjCForCollectionStmts [...]
so I think it'd be okay to simply drop this. Also, its easy to see why this popped up, because... (cont. in inline)
clang/lib/Analysis/LiveVariables.cpp | ||
---|---|---|
316–319 | ..this part of the code caused the issue. Looking at the related CFG, void test_lambda_refcapture() [B2 (ENTRY)] Succs (1): B1 [B1] 1: 6 2: int a = 6; 3: operator() 4: [B1.3] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int &) const) 5: [&](int &a) { a = 42; } 6: [B1.5] 7: [B1.6] (ImplicitCastExpr, NoOp, const class (lambda at /home/szelethus/Documents/llvm-project/clang/test/Analysis/live-stmts.cpp:183:3)) 8: a 9: [B1.7]([B1.8]) (OperatorCall) 10: clang_analyzer_eval 11: [B1.10] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(_Bool)) 12: a 13: [B1.12] (ImplicitCastExpr, LValueToRValue, int) 14: 42 15: [B1.13] == [B1.14] 16: [B1.11]([B1.15]) Preds (1): B2 Succs (1): B0 [B0 (EXIT)] Preds (1): B1 its clear that element 5 added the live statement, and I think that that this entire CFG just simply isn't right. Shouldn't we have a distinct element for the assignment? |
clang/lib/Analysis/LiveVariables.cpp | ||
---|---|---|
316–319 |
Strictly speaking, we have CFGs for a function. The assignment is not in this function, it is in the operator() of the class representing this lambda expression. So basically, we do have a LambdaExpr to represent the expression, but the body of the lambda is in a separate entity. |
clang/lib/Analysis/LiveVariables.cpp | ||
---|---|---|
316–319 | Well, debug.DumpCFG definitely doesn't indulge me with a separate lambda CFG, so I figured this is a (rightful) optimization or compression. My point is, this entire code snippet is a seriously error prone, best-effort heuristic. Switch casing every small little corner case might be tedious, troublesome in terms of scaling, and I for sure don't want to maintain it 'til eternity, but we have to acknowledge that this isn't a perfect solution either. |
Yeah, that sounds about right; you observed the current behavior to be a counterexample but found no evidence that the current behavior makes any sense.
P.S. We've found an example where ObjCForCollectionStmts break (i.e., your recently added assertion gets hit), i'll reduce and think of a patch. Still shocked this wasn't covered with tests.
clang/lib/Analysis/LiveVariables.cpp | ||
---|---|---|
316–319 |
Does that mean that checkASTCodeBody doesn't get run on lambda bodies, and neither do any of our path-insensitive checks? |
Btw, sorry. I somehow misunderstood the snippet and tought that the test fails after the change. I am also ok with dropping this.
Ok, here's the crashing example with ObjCForCollectionStmt. It should be saved as an .mm file and it crashes under pure --analyze.
@interface Item // ... @end @interface Collection // ... @end typedef void (^Blk)(); struct RAII { Blk blk; public: RAII(Blk blk): blk(blk) {} ~RAII() { blk(); } }; void foo(Collection *coll) { RAII raii(^{}); for (Item *item in coll) {} }
The CFG ("allocate a variable, pick the item and put it into that variable, execute the body, repeat"):
The interesting part of the ExplodedGraph:
And here's the FIXME that you're looking for:
... 44 /// Generate a node in \p Bldr for an iteration statement using ObjC 45 /// for-loop iterator. 46 static void populateObjCForDestinationSet( 47 ExplodedNodeSet &dstLocation, SValBuilder &svalBuilder, 48 const ObjCForCollectionStmt *S, const Stmt *elem, SVal elementV, 49 SymbolManager &SymMgr, const NodeBuilderContext *currBldrCtx, 50 StmtNodeBuilder &Bldr, bool hasElements) { ... 56 SVal hasElementsV = svalBuilder.makeTruthVal(hasElements); 57 58 // FIXME: S is not an expression. We should not be binding values to it. 59 ProgramStateRef nextState = state->BindExpr(S, LCtx, hasElementsV); ...
So, like, the engine is conveniently assigning 0 or 1 to the collection-statement in the Environment when the collection is assumed to be empty or not.
It's obviously a hack. This shouldn't be in the Environment. This should have been a GDM trait attached to the collection. Ideally it should also be modeled, i.e. sometimes we do know whether the collection is empty, and it might even be modeled occasionally. But in any case this shouldn't be in the Environment.
Thank you so much, you really went out of your way to dig that out! I'll try to patch this somehow.
Wow, I never realized I accidentally landed that assert (D82122#2172360), but I guess its great to have that covered. Would you prefer to have that reverted as I'm looking to fix this for good?
I still wonder what made this statement live in my example. There must have been some non-trivial liveness analysis going on that caused a statement to be live; probably something to do with the C++ destructor elements.
It doesn't add any actual functionality so i think it makes sense to revert unless you have a quick fix.
Also if the assert is in fact true then i'd rather make a much stronger statement by banning statements from the Environment entirely, as in, like, at compile time by making it accept Expr * instead of Stmt *.
Pushed rG032b78a0762bee129f33e4255ada6d374aa70c71. Mind that no ObjCForCollectionStmt was found in the DumpLiveStms run.
Rename the live statements checker to live expressions checker. The test file added in a revert commit changed rather heavily, but it makes sense that these entries are removed IMO. Unless somebody objects, I intend to land this as-is.
So now all of the dependencies are landed and this should be ready for its prime time, right?
..this part of the code caused the issue. Looking at the related CFG,
its clear that element 5 added the live statement, and I think that that this entire CFG just simply isn't right. Shouldn't we have a distinct element for the assignment?