This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Fix liveness of LazyCompoundVals
AbandonedPublic

Authored by steakhal on Aug 19 2022, 8:20 AM.

Details

Summary

The SymbolReaper should consider a region live if that appears in
the store bindings anywhere - even if the region is wrapped by
RegionSymbolVals, SymbolicRegions or LazyCompoundVals.

This mistake by the SymbolReaper materialized in false-positives like
in the following example:

void ptr1(List* n) {
  List* n2 = new List(*n); // cctor
  if (!n->next) {
    if (n2->next) {
      clang_analyzer_warnIfReached(); // FP! This should be unreachable.
    }
  }
  delete n2;
}

The store entry of the pointee of n2 looks something like this:

HeapSymRegion{conj_$1{List *, LC1, S2540, #1}}  0(Default)  lazyCompoundVal{0x5641f1ed8830,Element{SymRegion{reg_$2<List * n>},0 S64b,struct List}}

So, any constraints referring to the VarRegion n supposed to be kept alive by that store binding.
Hence, the analyzer should be able to see that reg_$3<List * Element{SymRegion{reg_$2<List * n>},0 S64b,struct List}.next> { [0, 0] } when it reaches the n2->next.

This patch fixes the issue by reusing the SymbolReaper::markLive(MemRegion) for doing the appropriate traversal.

I'm yet to do the performance testing, given that this is actually in
the hot path.

Co-authored-by: Tomasz Kamiński <tomasz.kaminski@sonarsource.com>

Diff Detail

Event Timeline

steakhal created this revision.Aug 19 2022, 8:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 8:20 AM
steakhal requested review of this revision.Aug 19 2022, 8:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 8:20 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
NoQ added a comment.Aug 24 2022, 3:57 PM

Nice catch but I don't think this is the right solution.

Symbol liveness corresponds to a concrete runtime property of the program: can the program obtain this value by executing further runtime instructions? The program cannot obtain a pointer to variable n, or even a pointer to a region *n, just by looking at the data in a struct that was copied from region *n previously. Therefore neither n nor *n should not be marked live just because they are mentioned in a base region of a lazy compound value. In particular, memory leaks should be reported against n if the last place it's mentioned is a base region of a lazy compound value, as the program cannot possibly free that memory. You can most likely write a MallocChecker test for that, which will become a false negative after your patch.

For the same reason, for any region R, if reg<R> or derived<R, $x> is live, does not mean R is live. The program cannot guess a memory address by only looking at a value loaded from that address.

If you look at how a simpler example works:

void clang_analyzer_warnIfReached();

void foo(int **n) {
  int *n2 = *n;
  if (!**n) {
    if (*n2) {
      clang_analyzer_warnIfReached();
    }
  }
}

You will notice that it does not try to keep VarRegion n or even the symbol reg_$0<int ** n> alive. It is *sufficient* to keep reg_$1<int * Element{SymRegion{reg_$0<int ** n>},0 S64b,int *}> alive in order to keep the test passing, and it does match the actual definition of live symbol (reg_$0 can no longer be retrieved by the program but reg_$1 can).

The original code tries to do the same for lazy compound values. I suspect that you'll find the actual bug when you descend into getInterestingValues().

What looks fishy about getInterestingValues() is that it assumes that the amount of interesting values is finite. This sounds incredibly wrong to me. If a lazy compound value contains any pointer symbol $p, then all values in the following infinite series are interesting:

$p,  *$p,  **$p,  ***$p,  ...

So I think the correct solution would be to maintain a list of explicitly-live compound values. Just like we already maintain a list of explicitly live symbols - TheLiving, and explicitly live regions - RegionRoots; neither of these lists enumerates all live symbols or regions as there are infinitely many of them, but each of these lists enumerates the ones that we've learned to be definitely live, and they're sufficient to determine liveness of any other symbol through recursive descent into the identity of that symbol. So with a list of live compound value regions, we can ask a question like "is this symbol stored in any of the known-live compound values?" to determine that symbol's liveness. This will keep the stored symbol alive without making the lazy compound value's base region alive.

NoQ added a comment.Aug 24 2022, 6:43 PM

So to be clear, I think your solution is probably an ok stop-gap solution. False negatives aren't that bad, I'm more worried that existing leak reports may become less understandable because they'll be reported later than necessary, which may obscure the reason why we think they're finally leaking. But I really want us to agree on how this facility is supposed to work, and explore the perfect solution, before implementing an imperfect solution.

Maybe we should introduce "weak region roots" that aren't necessarily live themselves but anything derived from them (any SymbolRegionValue<R'> or SymbolDerived<R', $x> for arbitrary sub-region R' of a weak-region-root R) is live. This could be pretty straightforward.

Nice catch but I don't think this is the right solution.

Symbol liveness corresponds to a concrete runtime property of the program: can the program obtain this value by executing further runtime instructions? The program cannot obtain a pointer to variable n, or even a pointer to a region *n, just by looking at the data in a struct that was copied from region *n previously. Therefore neither n nor *n should not be marked live just because they are mentioned in a base region of a lazy compound value. In particular, memory leaks should be reported against n if the last place it's mentioned is a base region of a lazy compound value, as the program cannot possibly free that memory. You can most likely write a MallocChecker test for that, which will become a false negative after your patch.

For the same reason, for any region R, if reg<R> or derived<R, $x> is live, does not mean R is live. The program cannot guess a memory address by only looking at a value loaded from that address.

If you look at how a simpler example works:

void clang_analyzer_warnIfReached();

void foo(int **n) {
  int *n2 = *n;
  if (!**n) {
    if (*n2) {
      clang_analyzer_warnIfReached();
    }
  }
}

You will notice that it does not try to keep VarRegion n or even the symbol reg_$0<int ** n> alive. It is *sufficient* to keep reg_$1<int * Element{SymRegion{reg_$0<int ** n>},0 S64b,int *}> alive in order to keep the test passing, and it does match the actual definition of live symbol (reg_$0 can no longer be retrieved by the program but reg_$1 can).

Nice explanation, thank you very much! I'll need some time to digest it to the full extent.
It makes sense so far, but I imagine it will become clear when I continue the investigation.

The original code tries to do the same for lazy compound values. I suspect that you'll find the actual bug when you descend into getInterestingValues().

What looks fishy about getInterestingValues() is that it assumes that the amount of interesting values is finite. This sounds incredibly wrong to me. If a lazy compound value contains any pointer symbol $p, then all values in the following infinite series are interesting:

$p,  *$p,  **$p,  ***$p,  ...

Well, there two places where I suspected this bug. The place I changed and the other was getInterestingValues(). The first looked more probable, hence we sticked to that.
We'll continue the investigation with the getInterestingValues() next time.

So I think the correct solution would be to maintain a list of explicitly-live compound values. Just like we already maintain a list of explicitly live symbols - TheLiving, and explicitly live regions - RegionRoots; neither of these lists enumerates all live symbols or regions as there are infinitely many of them, but each of these lists enumerates the ones that we've learned to be definitely live, and they're sufficient to determine liveness of any other symbol through recursive descent into the identity of that symbol. So with a list of live compound value regions, we can ask a question like "is this symbol stored in any of the known-live compound values?" to determine that symbol's liveness. This will keep the stored symbol alive without making the lazy compound value's base region alive.

Okay, I'll continue from here. Thanks for the

So to be clear, I think your solution is probably an ok stop-gap solution. False negatives aren't that bad, I'm more worried that existing leak reports may become less understandable because they'll be reported later than necessary, which may obscure the reason why we think they're finally leaking. But I really want us to agree on how this facility is supposed to work, and explore the perfect solution, before implementing an imperfect solution.

Unfortunately, we found many FPs introduced by this change; so this should be definitely blocked.

Maybe we should introduce "weak region roots" that aren't necessarily live themselves but anything derived from them (any SymbolRegionValue<R'> or SymbolDerived<R', $x> for arbitrary sub-region R' of a weak-region-root R) is live. This could be pretty straightforward.

So, what you are suggesting here is an alternative solution to the one you already proposed in your last comment?

For transparency, we, unfortunately, reached the end of our timebox, and I'll work on something else for a couple of weeks.
I plan to come back to this in ~3 weeks if everything works well.

NoQ added a comment.Aug 26 2022, 10:17 AM

So, what you are suggesting here is an alternative solution to the one you already proposed in your last comment?

I was just expanding the initial suggestion but also symbol reaper is too complicated for me to make such suggestions confidently. So I'm just sharing some ideas but you'll probably have to dig deeper to understand whether they're correct.

Unfortunately, we found many FPs introduced by this change; so this should be definitely blocked.

Interesting, I didn't expect that. It could be insanely valuable to make tests out of those so that future generations didn't make the same mistake! Symbol reaper seems to be very under-tested for its complexity.

steakhal planned changes to this revision.Aug 26 2022, 10:56 AM

So, what you are suggesting here is an alternative solution to the one you already proposed in your last comment?

I was just expanding the initial suggestion but also symbol reaper is too complicated for me to make such suggestions confidently. So I'm just sharing some ideas but you'll probably have to dig deeper to understand whether they're correct.

I see. I very much appreciate any suggestions. Many thanks.

Unfortunately, we found many FPs introduced by this change; so this should be definitely blocked.

Interesting, I didn't expect that. It could be insanely valuable to make tests out of those so that future generations didn't make the same mistake! Symbol reaper seems to be very under-tested for its complexity.

I agree. We were also surprised. I havent spent the time to understand the root cause, as only a single path resulted in a 160Mb+ html. So far I can tell that it messed up something including lambdas capturing by reference. I find it always difficult to catch "what is missing" from the puzzle in such cases. Its much easier to catch if something has a wrong binding etc. We'll see how much effort it takes to reduce. For FPs I dont know how to automate this process. :/

Lets see when we get back to this.

NoQ added a comment.Aug 26 2022, 8:48 PM

For FPs I dont know how to automate this process. :/

Given that our initial hypothesis was that there should be zero new false positives, it could probably work with a creduce predicate "this code has a warning after the patch but not before the patch". The initial hypothesis is probably not entirely correct, given that even a slight change in coverage could result in false positives, but I doubt that if you pick 4-5 examples, all of them would reduce to a situation where it's just a change of coverage. Especially given how common you say the problem appears to be. So I think it's worth a try.

steakhal abandoned this revision.Sep 30 2022, 3:34 AM

I'm abandoning this change in favor of D134947.
I'll leave the patch summary and the discussion here for the history.


For FPs I dont know how to automate this process. :/

Given that our initial hypothesis was that there should be zero new false positives, it could probably work with a creduce predicate "this code has a warning after the patch but not before the patch". The initial hypothesis is probably not entirely correct, given that even a slight change in coverage could result in false positives, but I doubt that if you pick 4-5 examples, all of them would reduce to a situation where it's just a change of coverage. Especially given how common you say the problem appears to be. So I think it's worth a try.

D134941 addresses this comment. I'm also about to investigate other cases with differences - but no promises about that.

What looks fishy about getInterestingValues() is that it assumes that the amount of interesting values is finite. This sounds incredibly wrong to me. If a lazy compound value contains any pointer symbol $p, then all values in the following infinite series are interesting:

$p,  *$p,  **$p,  ***$p,  ...

We have also looked into this, and indeed the getInterestingValues() produces the first level of indirection for storage. However, the code in RemoveDeadBindingsWorker is recursively visiting each of the bindings found, so, at least per our understanding, we should visit all regions that are reachable through indirection:

const RegionStoreManager::SValListTy &Vals = RM.getInterestingValues(*LCS);
for (RegionStoreManager::SValListTy::const_iterator I = Vals.begin(),
                                                    E = Vals.end();
     I != E; ++I)
  VisitBinding(*I);

Also, visiting these regions in a current snapshot of the storage seems correct from the temporal perspective - we can reach current state in region, via the pointer to it, that was present at the time of copy.