This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][Liveness][NFC] Get rid of statement liveness, because such a thing doesn't exist
ClosedPublic

Authored by Szelethus on Jun 25 2020, 1:43 PM.

Details

Summary

The summary and very short discussion in D82122 summarizes whats happening here.

Diff Detail

Event Timeline

Szelethus created this revision.Jun 25 2020, 1:43 PM

I didn't run this on big projects just yet, and judging from the fact that Environment only contained ObjetiveC examples of non-expression statements, I might need a tip on how to test on ObjC code :)

I always wondered if we actually need to track the liveness of exprs at all. We could just kill all subexpr at the end of the full expression without precomputing anything. But I might miss something.

NoQ accepted this revision.Jun 25 2020, 3:41 PM

We could just kill all subexpr at the end of the full expression

I suspect that it would be pretty bad if you, say, kill the condition of the if-statement before picking the branch. Or kill the initializer in the DeclStmt before putting it into the variable (same for CXXCtorInitializer which isn't even a Stmt!).

I might need a tip on how to test on ObjC code :)

Unfortunately i don't have any helpful tips for you :( Such code is not cross-compiled very often. I vaguely remember seeing some Objective-C code in linux software but it's most likely not very modern/representative. So i guess it's on us to inform you of the potential problems.

This revision is now accepted and ready to land.Jun 25 2020, 3:41 PM
In D82598#2115656, @NoQ wrote:

We could just kill all subexpr at the end of the full expression

I suspect that it would be pretty bad if you, say, kill the condition of the if-statement before picking the branch. Or kill the initializer in the DeclStmt before putting it into the variable (same for CXXCtorInitializer which isn't even a Stmt!).

I would argue that the end of the full expression is AFTER the if was evaluated in this case. But I do see what you mean, thanks :)

NoQ added a comment.Jun 26 2020, 10:46 AM

I suspect that it would be pretty bad if you, say, kill the condition of the if-statement before picking the branch. Or kill the initializer in the DeclStmt before putting it into the variable (same for CXXCtorInitializer which isn't even a Stmt!).

I would argue that the end of the full expression is AFTER the if was evaluated in this case. But I do see what you mean, thanks :)

Well, formally it isn't. The notion of full-expression is defined pretty strictly because that's the moment of time when temporary destructors are invoked. So the language pays a lot of attention to what exactly constitutes an expression, and as of now if-statements don't. In particular, temporary destructors will run before the branch is chosen.

I guess your point is that live expressions analysis could be replaced with imperative cleanup requests from, say, ExprEngine to the Environment. I.e., "We've just finished evaluating the if-statement, now we should actively tell the Environment to remove the condition expression". I guess it could totally work that way, but it'd be pretty hard to not forget such cleanups. Given that we'd also barely ever notice that we forgot one of those, i'm very much in favor of having liveness analysis instead, that would declaratively describe which expressions are live when, so that to automatically guarantee that we always only track what's necessary and never snowball our state with dead expressions.

xazax.hun accepted this revision.Jun 29 2020, 3:37 AM

@Szelethus
The patch looks good to me and I find it a welcome change that should make things easier to understand and maybe even a bit more efficient, I hope this won't break ObjC :) My discussion with Artem is orthogonal to this change.

In D82598#2117432, @NoQ wrote:

Well, formally it isn't.

Right. Sorry for being unclear there.

I guess your point is that live expressions analysis could be replaced with imperative cleanup requests from, say, ExprEngine to the Environment. I.e., "We've just finished evaluating the if-statement, now we should actively tell the Environment to remove the condition expression".

Yes. This is what I am proposing and this is what we did with the lifetime analysis. Unfortunately, I did not have time yet to identify all the cleanup points and I do agree that this is not a trivial task but I felt that this is much more lightweight in terms of memory and computation.

I guess it could totally work that way, but it'd be pretty hard to not forget such cleanups.

I agree.

Given that we'd also barely ever notice that we forgot one of those, i'm very much in favor of having liveness analysis instead, that would declaratively describe which expressions are live when, so that to automatically guarantee that we always only track what's necessary and never snowball our state with dead expressions.

Fair point. I do not remember any problem with the current liveness analysis so I guess if something is not broken do not fix it. We also don't have any evidence of this being a bottleneck.

Note to self: rename debug.DumpLiveStmts to debug.DumpLiveExprs. Thanks for the review btw! I don't immediately have something to add to your discussion though :)

NoQ added a comment.Jun 29 2020, 8:35 AM

Given that we'd also barely ever notice that we forgot one of those, i'm very much in favor of having liveness analysis instead, that would declaratively describe which expressions are live when, so that to automatically guarantee that we always only track what's necessary and never snowball our state with dead expressions.

Fair point. I do not remember any problem with the current liveness analysis so I guess if something is not broken do not fix it. We also don't have any evidence of this being a bottleneck.

D55566!

I chased my own tail for weeks before realizing that there is indeed another instance when a live statement is stored, other then ObjCForCollectionStmt...

void clang_analyzer_eval(bool);

void test_lambda_refcapture() {
  int a = 6;
  [&](int &a) { a = 42; }(a);
  clang_analyzer_eval(a == 42); // expected-warning{{TRUE}}
}

// CHECK: [ B0 (live statements at block exit) ]
// CHECK-EMPTY:
// CHECK-EMPTY:
// CHECK-NEXT: [ B1 (live statements at block exit) ]
// CHECK-EMPTY:
// CHECK-EMPTY:
// CHECK-NEXT: [ B2 (live statements at block exit) ]
// CHECK-EMPTY:
// CHECK-NEXT: CompoundStmt {{.*}}
// CHECK-NEXT: `-BinaryOperator {{.*}} 'int' lvalue '='
// CHECK-NEXT:   |-DeclRefExpr {{.*}} 'int' lvalue ParmVar {{.*}} 'a' 'int &'
// CHECK-NEXT:   `-IntegerLiteral {{.*}} 'int' 42

I chased my own tail for weeks before realizing that there is indeed another instance when a live statement is stored, other then ObjCForCollectionStmt...

void clang_analyzer_eval(bool);

void test_lambda_refcapture() {
  int a = 6;
  [&](int &a) { a = 42; }(a);
  clang_analyzer_eval(a == 42); // expected-warning{{TRUE}}
}

Hmm, interesting. I don't really understand why do we need to keep that block live, as we definitely won't use any of the value it provides (since it does not provide a value at all).

I am only doing guessing here, but maybe the side effect of the statement is pruned from the store as soon as it is no longer live?

I might be wrong here, but maybe solving this problem would not be the job of the liveness analysis but we have a bug elsewhere, as this binding should be pruned when a is a dead variable.

Szelethus marked an inline comment as done.Jul 21 2020, 8:26 AM

I chased my own tail for weeks before realizing that there is indeed another instance when a live statement is stored, other then ObjCForCollectionStmt...

void clang_analyzer_eval(bool);

void test_lambda_refcapture() {
  int a = 6;
  [&](int &a) { a = 42; }(a);
  clang_analyzer_eval(a == 42); // expected-warning{{TRUE}}
}

Hmm, interesting. I don't really understand why do we need to keep that block live, as we definitely won't use any of the value it provides (since it does not provide a value at all).

Actually, what I said initially is true:

[...] the only non-expression statements it queried are ObjCForCollectionStmts [...]

so I think it'd be okay to simply drop this. Also, its easy to see why this popped up, because... (cont. in inline)

clang/lib/Analysis/LiveVariables.cpp
316–319

..this part of the code caused the issue. Looking at the related CFG,

void test_lambda_refcapture()
 [B2 (ENTRY)]
   Succs (1): B1

 [B1]
   1: 6
   2: int a = 6;
   3: operator()
   4: [B1.3] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int &) const)
   5: [&](int &a) {
    a = 42;
}
   6: [B1.5]
   7: [B1.6] (ImplicitCastExpr, NoOp, const class (lambda at /home/szelethus/Documents/llvm-project/clang/test/Analysis/live-stmts.cpp:183:3))
   8: a
   9: [B1.7]([B1.8]) (OperatorCall)
  10: clang_analyzer_eval
  11: [B1.10] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(_Bool))
  12: a
  13: [B1.12] (ImplicitCastExpr, LValueToRValue, int)
  14: 42
  15: [B1.13] == [B1.14]
  16: [B1.11]([B1.15])
   Preds (1): B2
   Succs (1): B0

 [B0 (EXIT)]
   Preds (1): B1

its clear that element 5 added the live statement, and I think that that this entire CFG just simply isn't right. Shouldn't we have a distinct element for the assignment?

xazax.hun added inline comments.Jul 21 2020, 8:57 AM
clang/lib/Analysis/LiveVariables.cpp
316–319

Shouldn't we have a distinct element for the assignment?

Strictly speaking, we have CFGs for a function. The assignment is not in this function, it is in the operator() of the class representing this lambda expression.

So basically, we do have a LambdaExpr to represent the expression, but the body of the lambda is in a separate entity.

Szelethus marked an inline comment as done.Jul 21 2020, 9:47 AM
Szelethus added inline comments.
clang/lib/Analysis/LiveVariables.cpp
316–319

Well, debug.DumpCFG definitely doesn't indulge me with a separate lambda CFG, so I figured this is a (rightful) optimization or compression.

My point is, this entire code snippet is a seriously error prone, best-effort heuristic. Switch casing every small little corner case might be tedious, troublesome in terms of scaling, and I for sure don't want to maintain it 'til eternity, but we have to acknowledge that this isn't a perfect solution either.

NoQ added a comment.Jul 21 2020, 5:54 PM

Hmm, interesting. I don't really understand why do we need to keep that block live, as we definitely won't use any of the value it provides (since it does not provide a value at all).

Actually, what I said initially is true:

[...] the only non-expression statements it queried are ObjCForCollectionStmts [...]

so I think it'd be okay to simply drop this.

Yeah, that sounds about right; you observed the current behavior to be a counterexample but found no evidence that the current behavior makes any sense.

P.S. We've found an example where ObjCForCollectionStmts break (i.e., your recently added assertion gets hit), i'll reduce and think of a patch. Still shocked this wasn't covered with tests.

clang/lib/Analysis/LiveVariables.cpp
316–319

Well, debug.DumpCFG definitely doesn't indulge me with a separate lambda CFG

Does that mean that checkASTCodeBody doesn't get run on lambda bodies, and neither do any of our path-insensitive checks?

Ouch. Let me know how severe this is, because this is a big milestone in my project.

Actually, what I said initially is true:

[...] the only non-expression statements it queried are ObjCForCollectionStmts [...]

Btw, sorry. I somehow misunderstood the snippet and tought that the test fails after the change. I am also ok with dropping this.

NoQ added a comment.Jul 23 2020, 10:18 PM

Ok, here's the crashing example with ObjCForCollectionStmt. It should be saved as an .mm file and it crashes under pure --analyze.

@interface Item
// ...
@end

@interface Collection
// ...
@end

typedef void (^Blk)();

struct RAII {
  Blk blk;

public:
  RAII(Blk blk): blk(blk) {}
  ~RAII() { blk(); }
};

void foo(Collection *coll) {
  RAII raii(^{});
  for (Item *item in coll) {}
}

The CFG ("allocate a variable, pick the item and put it into that variable, execute the body, repeat"):

The interesting part of the ExplodedGraph:

And here's the FIXME that you're looking for:

...
44 /// Generate a node in \p Bldr for an iteration statement using ObjC
45 /// for-loop iterator.
46 static void populateObjCForDestinationSet(
47     ExplodedNodeSet &dstLocation, SValBuilder &svalBuilder,
48     const ObjCForCollectionStmt *S, const Stmt *elem, SVal elementV,
49     SymbolManager &SymMgr, const NodeBuilderContext *currBldrCtx,
50     StmtNodeBuilder &Bldr, bool hasElements) {
...
56     SVal hasElementsV = svalBuilder.makeTruthVal(hasElements);
57
58     // FIXME: S is not an expression. We should not be binding values to it.
59     ProgramStateRef nextState = state->BindExpr(S, LCtx, hasElementsV);
...

So, like, the engine is conveniently assigning 0 or 1 to the collection-statement in the Environment when the collection is assumed to be empty or not.

It's obviously a hack. This shouldn't be in the Environment. This should have been a GDM trait attached to the collection. Ideally it should also be modeled, i.e. sometimes we do know whether the collection is empty, and it might even be modeled occasionally. But in any case this shouldn't be in the Environment.

Thank you so much, you really went out of your way to dig that out! I'll try to patch this somehow.

Wow, I never realized I accidentally landed that assert (D82122#2172360), but I guess its great to have that covered. Would you prefer to have that reverted as I'm looking to fix this for good?

NoQ added a comment.Jul 24 2020, 8:42 AM

I still wonder what made this statement live in my example. There must have been some non-trivial liveness analysis going on that caused a statement to be live; probably something to do with the C++ destructor elements.

Wow, I never realized I accidentally landed that assert (D82122#2172360), but I guess its great to have that covered. Would you prefer to have that reverted as I'm looking to fix this for good?

It doesn't add any actual functionality so i think it makes sense to revert unless you have a quick fix.

Also if the assert is in fact true then i'd rather make a much stronger statement by banning statements from the Environment entirely, as in, like, at compile time by making it accept Expr * instead of Stmt *.

In D82598#2172416, @NoQ wrote:

I still wonder what made this statement live in my example. There must have been some non-trivial liveness analysis going on that caused a statement to be live; probably something to do with the C++ destructor elements.

Pushed rG032b78a0762bee129f33e4255ada6d374aa70c71. Mind that no ObjCForCollectionStmt was found in the DumpLiveStms run.

Szelethus updated this revision to Diff 291231.EditedSep 11 2020, 8:40 AM

Rename the live statements checker to live expressions checker. The test file added in a revert commit changed rather heavily, but it makes sense that these entries are removed IMO. Unless somebody objects, I intend to land this as-is.

xazax.hun accepted this revision.Sep 11 2020, 10:34 AM

So now all of the dependencies are landed and this should be ready for its prime time, right?