This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Track terminator conditions on which a tracked expressions depends
ClosedPublic

Authored by Szelethus on Jun 4 2019, 3:51 PM.

Details

Summary

This patch implements the idea discussed on the mailing list, in fact, the included testfile contains the functions example_1 and example_2 exactly how it's described there.

The idea is to, as the title says, to track the value of the condition of the terminator statement on which the reported node depends on:

01 int flag;
02 bool coin();
03 
04 void foo() {
05   flag = coin(); // no note
06 }
07 
08 int main() {
09   int *x = 0; // x initialized to 0
10   flag = 1;
11   foo();
12   if (flag) // assumed false
13     x = new int;
14   foo();
15 
16   if (flag) // assumed true
17     *x = 5; // warn
18 }

We emit a warning at statement 17. The new BugReporter visitor figures out that statement 16 is in fact a control dependency if the reported node, and uses trackExpressionValue() to track it's condition, in this case, flag, resulting in new notes being placed at for the call to foo() on line 14 and a note about flag being invalidated on line 5.

Now, whether this change is any good is practically impossible to tell without evaluation on production code, so I'll get back with that once I gather some data.

Diff Detail

Event Timeline

Szelethus created this revision.Jun 4 2019, 3:51 PM
Szelethus marked an inline comment as done.Jun 4 2019, 5:14 PM

For the test files where plist files have changed: (note to self: there has got to be a better way to show this off)




(same for the .*objcpp file)




clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2039–2040

@NoQ Any feelings on this?

NoQ added a comment.Jun 4 2019, 7:40 PM

Yay, this thing really works!

Now, whether this change is any good is practically impossible to tell without evaluation on production code, so I'll get back with that once I gather some data.

Yes. This patch deserves some data on how much have reports grown in length (i.e., how many have grown in length and how much did they do so). Ideally we should also have a look at how many of the new notes were truly useful. Eg., you can start with the old report, understand it (or fail to do so), then take the new report. If you get this feeling that's like "ugh, if i've seen this note before i would have figured this out much sooner!", then it's a good note. If the bug report was obvious to begin with and the note didn't add much, it's probably not a good note. But it's not necessarily bad as long as the report remains readable.

Additionally, it's likely that some reports will in fact get suppressed due to inline defensive checks suppressions kicking in over condition values. That'd be a lot of fun to see as well.

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2039–2040

This makes some sense in the long run but i think you should give up here for now. Unlike CXXForRangeStmt, its Objective-C counterpart doesn't mock up the AST that would have made it look like a regular for-loop, so there just isn't an AST node that would represent the part of it that corresponds to "__begin != __end".

Even in the C++ case, i'm not sure your current behavior would make much sense. We should probably delegate this work to a checker that knows how collections work and what makes them empty or have a certain size, something similar to what the IteratorChecker seems to be becoming :)

Should we give mark the report as invalid when we give up here? I've no idea, i guess i'll have to gather more empirical evidence on that.

2070–2073

What if they're in the same function but in different stack frames, i.e. due to recursion? I think you should just compare stack frame contexts here.

2075–2077

I didn't really understand this part.

clang/test/Analysis/diagnostics/no-store-func-path-notes.m
26 ↗(On Diff #203040)

I suspect we'll have to play with the wording a little bit. The current trackExpressionValue() was worded for the scenario in which there's only one value that we're tracking. So it keeps saying things like "THE VALUE" as if we're focused on one value. In reality, however, "the value 0" isn't the same value as the "undefined or garbage value" that the report is about.

I think we should discriminate between these values somehow. Eg., "Passing the condition value 0 via 2nd parameter 'param'". We have this information when we're attaching the visitor, so we can keep passing it around.

clang/test/Analysis/track-control-dependency-conditions.cpp
35

Why?

xazax.hun requested changes to this revision.Jun 5 2019, 2:35 AM
xazax.hun added inline comments.
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
176 ↗(On Diff #203040)

I think we should make clear that this visitor only operates within one function and does not track controll dependencies across functions.

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
744

What was the reason of adding these begin/ends? I prefer to keep such refactorings separate, also it might be controversial whether this change is desired.

2039–2040

I vaguely recall some problem with the results of getTerminatorStmt for logical operators like &&. I believe what we really want is the last expression we evaluated in the basic block which will be the last Stmt of the basic block. So if we can settle with the last stmt we can get rid of this code.

2075–2077

I guess detecting control depdendency only makes sense for the last statement of the basic block (which is the first one we see in the visitor order). But instead of maintaining a state with the visited nodes we could also check if the statement is the last one of its basic block.

This revision now requires changes to proceed.Jun 5 2019, 2:35 AM

I think we should make clear that this visitor only operates within one function and does not track controll dependencies across functions.

Hmm, maybe it's worth investigating whether if we pop the call stack, does the bug report improve if we update the visitor to build control dependencies for the recent function, and update the origin block to be the one where the function call was located.

Szelethus added a comment.EditedJun 10 2019, 2:38 PM

I implemented the fixes you commented on, but during evaluation it turned out that my visitor eats ram for breakfast, and then goes for seconds. I mean, like 5-30x the normal memory consumption, and the same for analysis speed. I counted the number of concurrent instances (incrementing a static counter in ctor, decrementing in dtor), and it seems to go up into the hundreds of thousands, even millions before restarting the counter. I killed the process before crippling the server.

I'm not sure how long it'll take for me to figure out what's wrong, but these numbers are so ridiculous, I suspect a programming error rather then an algorithmic issue.

edit: I seem to have found a solution, I'll share more later when I get to evaluate this :)

NoQ added a comment.Jun 10 2019, 6:59 PM

I'm not sure how long it'll take for me to figure out what's wrong, but these numbers are so ridiculous, I suspect a programming error rather then an algorithmic issue.

edit: I seem to have found a solution, I'll share more later when I get to evaluate this :)

At a glance, it might be that bugreporter::trackExpressionValue() is called on every node within the block, rather than only once in a lifetime of the visitor.

Szelethus updated this revision to Diff 204554.EditedJun 13 2019, 8:42 AM
  • Resolved some reviewer comments
  • Added a BugReport level set to avoid tracking the same condition (which would result in an almost infinite loop)

Aaaand I have some results to show: http://cc.elte.hu:15001/Default/#

Sort by "Storage date", and look for "LLVM/Clang/Clang-tools-extra BEFORE tracking conditions" and "LLVM/Clang/Clang-tools-extra AFTER tracking conditions". I didn't have much time to draw conclusions yet, but it's clear that the amount of extra notes are intolerable.

Also, this visitor should be TU-local, I could have better comments, formatting, etc, but decided to update the patch because this is the version I used for evaluation.

Szelethus marked 6 inline comments as done.Jun 13 2019, 8:45 AM
Szelethus added inline comments.
clang/test/Analysis/track-control-dependency-conditions.cpp
35

I'll remove it in the next diff *pinky swear*.

I marked reports, confusingly, "Confirmed" if the extra notes were meaningful, "Intentional" if they were meaningless, and "False positive" if it's in between.

Szelethus added a comment.EditedJun 13 2019, 5:21 PM

I took a look at the results. You can take a look at selected ones here:

http://cc.elte.hu:15001/Default/#report-hash=&run=LLVM%2FClang%2FClang-tools-extra%20AFTER%20tracking%20conditions&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%2FClang-tools-extra%20AFTER%20tracking%20conditions&is-unique=on

Due to the lack of better categorization, I used the following (so this has in fact no relation to whether the report is correct or not):

  • Intentional reports (green x) are the ones where the bugreport got unquestionably worse.
  • Confirmed reports (red check mark) got definitely better
  • False positives (grey line) are in between.

If you hover your mouse over the icon, you can also read a comment of mine. When you open a report, in the right corner of the source code you'll find a dropdown menu, allowing you to see the report without this patch applied:


There, you can observe the original bug report without this patch.

Some conclusions:

  • Cases where the condition is also a variable initialization tend to expand the bug path greatly. This isn't always bad, but should be noted. In general, unless we have a good heuristic to figure out whether there is meaningful information on the right hand side of the initialization, I think we just shouldn't track these. A good example for this is this one. The 37th event contains pretty much every information we need, it's obvious that the optional could either be None or non-None, since it's in a condition. dyn_cast is a prime example too: in this case, note 9 is all we really need.
  • We should probably believe that operator bool() is implemented correctly, and shouldn't track the value all the way there (at least, when we're only tracking the condition). Example
  • We shouldn't ever track assert-like conditions. Example (note 38-41)
  • When the report didn't suffer from any of the above issues, I found the extra notes to be helpful! :D

I'm doing another analysis to help a bit on evaluation with the following modification:

diff --git a/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp b/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
index 9c40ebead56..4ac4b873675 100644
--- a/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
+++ b/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
@@ -1892,12 +1892,28 @@ TrackControlDependencyCondBRVisitor::VisitNode(const ExplodedNode *N,
   if (!OriginB || !NB)
     return nullptr;
 
-  if (ControlDepTree.isControlDependency(OriginB, NB))
-    if (const Expr *Condition = getTerminatorCondition(NB))
-      if (BR.addTrackedCondition(Condition))
+  if (ControlDepTree.isControlDependency(OriginB, NB)) {
+    if (const Expr *Condition = getTerminatorCondition(NB)) {
+      if (BR.addTrackedCondition(Condition)) {
         bugreporter::trackExpressionValue(
             N, Condition, BR, /*EnableNullFPSuppression=*/false);
 
+        if (BRC.getAnalyzerOptions().AnalysisDiagOpt == PD_NONE)
+          return nullptr;
+
+        std::string ConditionText = Lexer::getSourceText(
+            CharSourceRange::getTokenRange(Condition->getSourceRange()),
+                                           BRC.getSourceManager(),
+                                           BRC.getASTContext().getLangOpts());
+
+        return std::make_shared<PathDiagnosticEventPiece>(
+            PathDiagnosticLocation::createEnd(
+                Condition, BRC.getSourceManager(), N->getLocationContext()),
+                "Tracking condition " + ConditionText);
+      }
+    }
+  }
+
   return nullptr;
 }
NoQ added a comment.Jun 13 2019, 11:01 PM

*sloowly catches up >.<*

sanaanajjar231288 changed the visibility from "Public (No Login Required)" to "All Users".Jun 14 2019, 5:31 AM

I looked at bug path with the Optional. I did not debug into the analyzer but my intuition is the following: DefChainEnd is interesting. TerminatedPaths is the control dependency of DefChainEnd. And there are probably other things that are the control and/or value dependency of TerminatedPaths . One trick we could try is to limit the number of dependencies we track. For example, we could try what happens if we only track the immediate control dependencies of the original interesting region/symbol and does not add more control dependency recursively.

In case we find limiting the number of dependencies useful, I could imagine a user-facing flag like report-verbosity. The default one could include only the notes we find useful after limiting the tracking. A detailed option could add all the notes from the transitive tracking of dependencies. A verbose option could remove path pruning as well. This is just random brainstorming :) First, we need to so whether non-transitive tracking it is actually improving the situation.

I am not sure about assuming operator bool being correct. I think we first could think about other tricks to limit the tracking (see my idea above) and if we fail I would only add such rules as a last resort.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
157

Do we need this? I wonder if marking the result of the condition as interesting is sufficient.

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2009

Why do we need a ptr to non-const CFGBlock?

2054

Maybe this is the worng place to comment, but isControlDependency sounds a bit weird. How about isControlDependenent?

Szelethus changed the visibility from "All Users" to "Public (No Login Required)".EditedJun 14 2019, 7:54 AM

I don't understand why the visibility had to be changed (especially without explanation), and unless there's a good reason for it, I strongly disagree with it.

NoQ added a comment.Jun 14 2019, 1:50 PM

I am not sure about assuming operator bool being correct. I think we first could think about other tricks to limit the tracking (see my idea above) and if we fail I would only add such rules as a last resort.

I think this depends greatly on the stack frame layout. I seem to be perfectly fine with not tracking the value within foo() in

if (foo()) {
  ...
}

whenever foo() returns a bool. Simply because "why else would you add a check if it always returns the same value?".

However, this heuristic breaks when the same code appears in an inlined stack frame. Because given the context, we need to prove that this check makes sense *in this context*.

I strongly suspect that all of the heuristics that you're talking about depend significantly on the surrounding call stacks. Like, i suspect it so strongly that i'm ready to admit that i've no idea what we're doing anymore, and it makes sense to take a pen and a sheet of paper and try to write down the tracking rules, and then come up with an easily editable and extensible architecture for tweaking those rules as we discover more and more of them.

NoQ added a comment.Jun 14 2019, 2:09 PM

Some conclusions:

  • Cases where the condition is also a variable initialization tend to expand the bug path greatly. This isn't always bad, but should be noted. In general, unless we have a good heuristic to figure out whether there is meaningful information on the right hand side of the initialization, I think we just shouldn't track these. A good example for this is this one. The 37th event contains pretty much every information we need, it's obvious that the optional could either be None or non-None, since it's in a condition. dyn_cast is a prime example too: in this case, note 9 is all we really need.
  • We should probably believe that operator bool() is implemented correctly, and shouldn't track the value all the way there (at least, when we're only tracking the condition). Example
  • We shouldn't ever track assert-like conditions. Example (note 38-41)
  • When the report didn't suffer from any of the above issues, I found the extra notes to be helpful! :D

On the other hand, all of these problems seem to be examples of the problem of D62978. Might it be that it's the only thing that we're missing? Like, we need to formulate a rule that'll give us an understanding of when do we track the value past the point where it has been constrained to true or false - do we care about the origins of the value or do we care about its constraints only? In case of flag in the test examples, we clearly do. In case of these bools that come from boolean conversions and assertions and such, we clearly don't. What's the difference?

clang/test/Analysis/track-control-dependency-conditions.cpp
67

It's invalidation of globals by coin() which potentially writes to bar.

81–82

These notes should be in the opposite order. It doesn't matter for the test case, but it improves readability of the test case :)

NoQ added a comment.EditedJun 14 2019, 2:15 PM

How about we track the condition value past its collapse point only if it involves an overwrite of a variable (or other region) from which the value is loaded? Like, if there are no overwrites, stop at the collapse point. If there are overwrites, stop at the oldest overwrite point (i.e., the value was loaded from 'x' which was previously overwritten by the value of value 'y' which was previously overwritten by the initial value of variable 'z' => stop at the overwrite point of 'y').

NoQ added a comment.Jun 14 2019, 2:41 PM

(then, again, not sure what happens if more nested stack frames are around)

In D62883#1544283, @NoQ wrote:

However, this heuristic breaks when the same code appears in an inlined stack frame. Because given the context, we need to prove that this check makes sense *in this context*. I strongly suspect that all of the heuristics that you're talking about depend significantly on the surrounding call stacks.

I don't follow you here, unfortunately, could you please elaborate? Also, I'll come out with this right now: I never really understood these contexts, is there a handy doc or a paper I could take a look at?

Like, i suspect it so strongly that i'm ready to admit that i've no idea what we're doing anymore, and it makes sense to take a pen and a sheet of paper and try to write down the tracking rules, and then come up with an easily editable and extensible architecture for tweaking those rules as we discover more and more of them.

That sounds great. The should-not-have-happened and the must-have-happened cases will all rely on this (since in both cases, at least for now, we'd like to use the extra information to track conditions), so maybe it's due to refresh trackExpressionValue. It would also make publishing and reviewing my works a lot easier.

In D62883#1544302, @NoQ wrote:

On the other hand, all of these problems seem to be examples of the problem of D62978. Might it be that it's the only thing that we're missing? Like, we need to formulate a rule that'll give us an understanding of when do we track the value past the point where it has been constrained to true or false - do we care about the origins of the value or do we care about its constraints only? In case of flag in the test examples, we clearly do. In case of these bools that come from boolean conversions and assertions and such, we clearly don't. What's the difference?

How about we track the condition value past its collapse point only if it involves an overwrite of a variable (or other region) from which the value is loaded? Like, if there are no overwrites, stop at the collapse point. If there are overwrites, stop at the oldest overwrite point (i.e., the value was loaded from 'x' which was previously overwritten by the value of value 'y' which was previously overwritten by the initial value of variable 'z' => stop at the overwrite point of 'y').

(then, again, not sure what happens if more nested stack frames are around)

For now, I'd like to restrict my efforts to simply "we should track this" or "we shouldn't track this". I think simply not tracking cases where the condition is a single variable assignment is a good initial approach. Let how the tracking is done be a worry for another day. Also, I'm a little confused, isn't this comment about how D62978 could be solved?

If you hover your mouse over the icon, you can also read a comment of mine. When you open a report, in the right corner of the source code you'll find a dropdown menu, allowing you to see the report without this patch applied:

I added another run that displays notes at each condition that this new visitor tracks.

Szelethus marked an inline comment as done.Jun 16 2019, 11:44 AM
Szelethus added inline comments.
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
157

Later on, when I'm covering the "should-not-have-happened" case, it's possible that a condition would be added that wasn't even explored by the analyzer, so this is kinda necessary.

Szelethus updated this revision to Diff 204973.Jun 16 2019, 4:11 PM
  • Rebase on the rest of the patches
  • Make TrackControlDependencyCondBRVisitor local to BugReporterVisitors.cpp
  • Hide the current implementation behind the off-by-default analyzer config "track-conditions"
  • Add two more testcases I'd like to improve on in followup patches

Do you think that this patch looks good enough now that it's off by default? Maybe it'd be easier to develop/review followup changes separately.

Szelethus marked 8 inline comments as done.Jun 16 2019, 4:13 PM
Szelethus added inline comments.
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2009

llvm::IDFCalculator takes the blocks as a non-const pointer.

Charusso requested changes to this revision.Jun 16 2019, 5:01 PM

Hey! It is a cool patch as the smallest the scope the better to understand what is going on. I like the direction, but we saw with @NoQ, we have to think about "In which range do we track?".

  • Hide the current implementation behind the off-by-default analyzer config "track-conditions"

Do you think that this patch looks good enough now that it's off by default? Maybe it'd be easier to develop/review followup changes separately.

It would be helpful to see what is exactly new, what trackExpressionValue() cannot provide. Could you adjust the test case with a second run, where you switch off that new feature, please? (I highly recommend that for future developments also.)

In D62883#1544302, @NoQ wrote:

On the other hand, all of these problems seem to be examples of the problem of D62978. Might it be that it's the only thing that we're missing? Like, we need to formulate a rule that'll give us an understanding of when do we track the value past the point where it has been constrained to true or false - do we care about the origins of the value or do we care about its constraints only? In case of flag in the test examples, we clearly do. In case of these bools that come from boolean conversions and assertions and such, we clearly don't. What's the difference?

How about we track the condition value past its collapse point only if it involves an overwrite of a variable (or other region) from which the value is loaded? Like, if there are no overwrites, stop at the collapse point. If there are overwrites, stop at the oldest overwrite point (i.e., the value was loaded from 'x' which was previously overwritten by the value of value 'y' which was previously overwritten by the initial value of variable 'z' => stop at the overwrite point of 'y').

(then, again, not sure what happens if more nested stack frames are around)

For now, I'd like to restrict my efforts to simply "we should track this" or "we shouldn't track this". I think simply not tracking cases where the condition is a single variable assignment is a good initial approach. Let how the tracking is done be a worry for another day. Also, I'm a little confused, isn't this comment about how D62978 could be solved?

As @NoQ pointed out we have some problem with that function. We are tracking *values* without using the redecl()-chain (see in https://clang.llvm.org/doxygen/DeclBase_8h_source.html#l00948), or without tracking conditions. You have solved(?) the latter, but the former is equally important to solve.

As I see that is the current situation:

  • ReturnVisitor: give us too much unrelated information as we go out of scope.
  • FindLastStoreBRVisitor: do not use redecls() so gets confused with all the hackish SVal conversions.
  • TrackControlDependencyCondBRVisitor: swaps notes which is questionable if I want to see what the above two kindly provides for me, or I would remove them instead.

I believe this patch should be on by default, but it requires to fix the "In which range do we track?" problem with D62978.

This revision now requires changes to proceed.Jun 16 2019, 5:01 PM
Szelethus marked an inline comment as done.EditedJun 16 2019, 5:24 PM

Hey! It is a cool patch as the smallest the scope the better to understand what is going on. I like the direction, but we saw with @NoQ, we have to think about "In which range do we track?".

  • Hide the current implementation behind the off-by-default analyzer config "track-conditions"

Do you think that this patch looks good enough now that it's off by default? Maybe it'd be easier to develop/review followup changes separately.

It would be helpful to see what is exactly new, what trackExpressionValue() cannot provide. Could you adjust the test case with a second run, where you switch off that new feature, please? (I highly recommend that for future developments also.)

You can see how this would look like by checking the history of this revision: https://reviews.llvm.org/D62883?vs=204554&id=204973&whitespace=ignore-most#toc. Adding a second RUN: line might be a good idea though.

In D62883#1544302, @NoQ wrote:

On the other hand, all of these problems seem to be examples of the problem of D62978. Might it be that it's the only thing that we're missing? Like, we need to formulate a rule that'll give us an understanding of when do we track the value past the point where it has been constrained to true or false - do we care about the origins of the value or do we care about its constraints only? In case of flag in the test examples, we clearly do. In case of these bools that come from boolean conversions and assertions and such, we clearly don't. What's the difference?

How about we track the condition value past its collapse point only if it involves an overwrite of a variable (or other region) from which the value is loaded? Like, if there are no overwrites, stop at the collapse point. If there are overwrites, stop at the oldest overwrite point (i.e., the value was loaded from 'x' which was previously overwritten by the value of value 'y' which was previously overwritten by the initial value of variable 'z' => stop at the overwrite point of 'y').

(then, again, not sure what happens if more nested stack frames are around)

For now, I'd like to restrict my efforts to simply "we should track this" or "we shouldn't track this". I think simply not tracking cases where the condition is a single variable assignment is a good initial approach. Let how the tracking is done be a worry for another day. Also, I'm a little confused, isn't this comment about how D62978 could be solved?

As @NoQ pointed out we have some problem with that function. We are tracking *values* without using the redecl()-chain (see in https://clang.llvm.org/doxygen/DeclBase_8h_source.html#l00948), or without tracking conditions. You have solved(?) the latter, but the former is equally important to solve.

I'm a little confused, which is the "former" and "latter" you're referring to?

As I see that is the current situation:

  • ReturnVisitor: give us too much unrelated information as we go out of scope.
  • FindLastStoreBRVisitor: do not use redecls() so gets confused with all the hackish SVal conversions.
  • TrackControlDependencyCondBRVisitor: swaps notes which is questionable if I want to see what the above two kindly provides for me, or I would remove them instead.

I believe this patch should be on by default, but it requires to fix the "In which range do we track?" problem with D62978.

I disagree with this. The reason why I'd like to make this an off-by-default option is to implement my followup improvements incrementally (not only that, but a whole family of conditions is going to be added), and allow us to observe the changes those make in relation to this patch -- besides, I don't really see this patch changing even if I manage to fix those issues. Would you like to see a change being made to this specific patch?

As @NoQ pointed out we have some problem with that function. We are tracking *values* without using the redecl()-chain (see in https://clang.llvm.org/doxygen/DeclBase_8h_source.html#l00948), or without tracking conditions. You have solved(?) the latter, but the former is equally important to solve.

I'm a little confused, which is the "former" and "latter" you're referring to?

I believe you have solved the condition tracking as you move in-place what is going on.

I believe this patch should be on by default, but it requires to fix the "In which range do we track?" problem with D62978.

I disagree with this. The reason why I'd like to make this an off-by-default option is to implement my followup improvements incrementally (not only that, but a whole family of conditions is going to be added), and allow us to observe the changes those make in relation to this patch -- besides, I don't really see this patch changing even if I manage to fix those issues. Would you like to see a change being made to this specific patch?

As you moving stuff around it just bypasses the usefulness of the mentioned two visitors, which is a problem. Also you have mentioned some very crazy bad side-effects which is yes, related to in which range do we operate.

Szelethus updated this revision to Diff 204980.Jun 16 2019, 5:51 PM

Add another RUN: line to see more clearly the effects of this patch.

Szelethus added a comment.EditedJun 16 2019, 5:57 PM

As @NoQ pointed out we have some problem with that function. We are tracking *values* without using the redecl()-chain (see in https://clang.llvm.org/doxygen/DeclBase_8h_source.html#l00948), or without tracking conditions. You have solved(?) the latter, but the former is equally important to solve.

I'm a little confused, which is the "former" and "latter" you're referring to?

I believe you have solved the condition tracking as you move in-place what is going on.

I'm still not sure I follow :) What are the two distinct things to be solved here?

I believe this patch should be on by default, but it requires to fix the "In which range do we track?" problem with D62978.

I disagree with this. The reason why I'd like to make this an off-by-default option is to implement my followup improvements incrementally (not only that, but a whole family of conditions is going to be added), and allow us to observe the changes those make in relation to this patch -- besides, I don't really see this patch changing even if I manage to fix those issues. Would you like to see a change being made to this specific patch?

As you moving stuff around it just bypasses the usefulness of the mentioned two visitors, which is a problem. Also you have mentioned some very crazy bad side-effects which is yes, related to in which range do we operate.

Could you please elaborate? The particular thing I'm missing, I guess, is the need to get those issues fixed before this patch, and making this on-by-default. If anything, enabling a flag like this could really demonstrate changes on those problems. I also like to think that tracking a condition (especially a condition of a control dependency of yet another condition) is a drastically different case that tracking a "regular" variable (in particular, I think more aggressive pruning could be used), and such a change would definitely be out of scope for this patch.

I like to think that most of the unimportant notes can be easily categorized, as seen above. With some, admittedly crude heuristics, we could cut these down greatly, and leave some breathing room for fine-tuning it.

Charusso accepted this revision.Jun 17 2019, 7:58 AM

My opinion based on the Summary, which is the kind of the opposite what we have. No invalidation happens, so no swap happens, so it is kind of misleading like writing out "Returning something" where we cannot be sure what we return ("resulting in new notes being placed at for the call to foo() on line 14 and a note about flag being invalidated on line 5.").
Please update the Summary according to the changes in the patch.

Also please note that, we are very pessimistic and everything is uninteresting (like returning unknown values), you only could markInteresting().

I like to think that most of the unimportant notes can be easily categorized, as seen above. With some, admittedly crude heuristics, we could cut these down greatly, and leave some breathing room for fine-tuning it.

I am not against that direction, just you throw oil on fire, and we already have a huge fire (D62978). I believe with that we could see more bad-patterns and we could answer "In which range do we track?".
I hope you find the answer, and thanks for working on that to solve the problem!

NoQ added inline comments.Jun 17 2019, 2:29 PM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1673–1677

A bit clearner:

auto StmtElem = B->rbegin().getAs<CFGStmt>();
if (!StmtElem)
  return nullptr;

const Stmt *S = StmtElem->getStmt();

Also how about CFGBlock::getTerminatorCondition()?

1704–1706

// TODO: This can be cached.

1710

All right, i still don't understand this caching based on condition expression.

You mean, like, if we're encountering the same condition multiple times (say, in a loop), we should only track it once? Why? Like, its values (which are the thing we'll really be tracking) may be different (say, on different loop iterations).

Szelethus marked 3 inline comments as done.Jun 17 2019, 3:17 PM
Szelethus added inline comments.
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1673–1677

I peeked at it's implementation, and it seems to be incorrect.

Referencing @xazax.hun's inline:

I vaguely recall some problem with the results of getTerminatorStmt for logical operators like &&. I believe what we really want is the last expression we evaluated in the basic block which will be the last Stmt of the basic block. So if we can settle with the last stmt we can get rid of this code.

And this is what CFGBlock::getTerminatorCondition() does:

Stmt *Terminator = getTerminatorStmt();
if (!Terminator)
  return nullptr;
 
Expr *E = nullptr;

switch (Terminator->getStmtClass()) {
  // ...
  case Stmt::BinaryOperatorClass: // '&&' and '||'
    E = cast<BinaryOperator>(Terminator)->getLHS();
  // ....
}
return E;

But nevertheless, maybe it'd be good to fix this in there and use it here.

1704–1706

Like, caching the CFGBlocks for given ExplodedNodes?

1710

Yup, can't argue with that, I'll revise this.

NoQ added inline comments.Jun 17 2019, 3:23 PM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1704–1706

I mean, on this line it's the same block every time (per instance of a visitor).

Szelethus set the repository for this revision to rG LLVM Github Monorepo.Jun 20 2019, 2:29 PM
Szelethus updated this revision to Diff 205916.Jun 20 2019, 3:41 PM
Szelethus marked an inline comment as done.
  • Now using CFGBlock::getTerminatorConditionExpr()
  • Uniqueing already tracked conditions as an (Expr, ExplodedNode) pair instead of on Expr

I would be surprised to see the same ExploderNode multiple time a lot. Also, ExploderNode will have a program point, so having both an ExploderNode and an Expr feels redundant to me.

NoQ added a comment.Jun 20 2019, 5:08 PM
  • Uniqueing already tracked conditions as an (Expr, ExplodedNode) pair instead of on Expr

I would be surprised to see the same ExploderNode multiple time a lot. Also, ExploderNode will have a program point, so having both an ExploderNode and an Expr feels redundant to me.

(Expr, LocationContext) sounds right.

NoQ added a comment.Jun 20 2019, 5:09 PM

Oh, wait, no, loops.

Szelethus updated this revision to Diff 205954.Jun 21 2019, 2:53 AM
  • Collect only ExplodedNodes. Lesson learned!
  • Add a TODO: in trackExpressionValue about maybe tracking conditions to all bug locations, rather than only for tracked variables.
Szelethus marked an inline comment as done.Jun 21 2019, 2:53 AM
Szelethus retitled this revision from [analyzer] Track conditions of terminator statements on which the reported node depends on to [analyzer] Track terminator conditions on which a tracked expressions depends.Jun 21 2019, 4:28 AM
Szelethus set the repository for this revision to rG LLVM Github Monorepo.
NoQ added a comment.Jun 21 2019, 2:40 PM
  • Add a TODO: in trackExpressionValue about maybe tracking conditions to all bug locations, rather than only for tracked variables.

What do you mean by "all bug locations"?

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
355–357

Pls add a comment that explains when does this function return true or false. I always forget what does insert().second do :)

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1692

Maybe let's add a comment that this is for inter-visitor communication only. Because otherwise we won't visit the same node twice in the same visitor.

In D62883#1554339, @NoQ wrote:
  • Add a TODO: in trackExpressionValue about maybe tracking conditions to all bug locations, rather than only for tracked variables.

What do you mean by "all bug locations"?

This visitor is added when a variable is tracked, yet I feel the control dependency of every report is important (else we would never have discovered the error).

NoQ accepted this revision.Jun 21 2019, 4:13 PM

Aha, ok, got it. I guess the official term is "error node" (where "error" means "warning").

clang/test/Analysis/track-control-dependency-conditions.cpp
2–9

Btw did you try -verify= as in D60732? It might make the test more readable.

NoQ added a comment.Jun 21 2019, 4:14 PM

It should be pretty easy to implement, just add your new visitor to the list of default visitors in findValidReport().

Woohoo! Thanks for everything, this is the most fun I've had working on this project! Let's wait for @xazax.hun to have the final say.

In D62883#1554494, @NoQ wrote:

It should be pretty easy to implement, just add your new visitor to the list of default visitors in findValidReport().

The thinking was to preserve this patch (roughly) in the state as the analyses were run. I also plan to patch CodeChecker so that it can diff two runs only only by bug report hash, but also by bug report length, which should make evaluation a lot easier. This may imply that I'll delay tracking control dependencies to all error nodes for later, as LLVM literally takes a day to analyze each time (and the sheer size of it gives invaluable data, which is why I plan doing it, but will check other projects too).

Szelethus marked an inline comment as done.Jun 21 2019, 4:32 PM
Szelethus added inline comments.
clang/test/Analysis/track-control-dependency-conditions.cpp
2–9

So thats how you do it! Yea, this looks a lot better.

NoQ added a comment.Jun 22 2019, 10:43 PM
In D62883#1554494, @NoQ wrote:

It should be pretty easy to implement, just add your new visitor to the list of default visitors in findValidReport().

The thinking was to preserve this patch (roughly) in the state as the analyses were run.

Sure!

xazax.hun accepted this revision.Jun 25 2019, 6:37 AM

Artem had some comments that are not marked as done, but LGTM!

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
157

When the analyzer did not explore a given code snippet we will not have an exploded node, but we can change this in a followup patch.

355–357

This is not done yet :)

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1692

Also not done.

This revision is now accepted and ready to land.Jun 25 2019, 6:37 AM
Szelethus marked an inline comment as done.Jun 25 2019, 10:27 AM

I usually do the final update right before commiting. There still are non-accepted dependencies, who knows what'll happen.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
157

Ugh, right. Let's keep this as is is for now, dunno how I forgot about it.

Szelethus updated this revision to Diff 208084.Jul 4 2019, 2:46 PM
Szelethus marked 6 inline comments as done.
  • Add two more test cases when a "Returning value" note is meaningful, and one where it's not
  • Fix inlines!
NoQ accepted this revision.Jul 4 2019, 7:52 PM

Accept².

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1824–1825

To think: not sure you want to do this for memory leak reports, because the place where the leak is discovered (i.e., where we randomly decided to run garbage collection and noticed a leak) may be fairly weird. Path to that point may still be moderately interesting (after all, it presumably doesn't leak on other paths), but i'm still worried that we may bring in some completely unrelated stuff. I might be wrong.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 5 2019, 6:30 AM