This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] ConditionBRVisitor: Enhance to write out more information
ClosedPublic

Authored by Charusso on Oct 10 2018, 6:28 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Charusso updated this revision to Diff 168994.Oct 10 2018, 6:29 AM

Added the mentioned second diff.

I'm sadly not that knowledgeable on visitors to help you, but the idea is awesome, thank you for working on this!

The change makes sense to me, but:

  1. Note that "Assuming X" directives are useful for the analyzer developers, since they know that's where the checker makes arbitrary assumptions, but to end users that mostly feels like noise ("Taking true branch" is there already, why there should be "Assuming 'i' is > 0" as well?)
  1. @NoQ do you know why the GDM comparison was there in the first place? The commit was made by Ted in 2011, maybe constraint changes had to be reflected in the GDM at that point (?)

The change makes sense to me, but:

  1. Note that "Assuming X" directives are useful for the analyzer developers, since they know that's where the checker makes arbitrary assumptions, but to end users that mostly feels like noise ("Taking true branch" is there already, why there should be "Assuming 'i' is > 0" as well?)
  1. @NoQ do you know why the GDM comparison was there in the first place? The commit was made by Ted in 2011, maybe constraint changes had to be reflected in the GDM at that point (?)

Based on @baloghadamsoftware's idea (D34508) the main goal is to print out trivial values for the Clang SA, but lot of time consuming nonsense for developers. As you mentioned it is bad to have multiple assumptions, but half of the reporters behave like that, and the other half does not print any/useful information.

That two separate project based on that current project: create the reports, and make the style identical, so that we could enhance them. The best example here is test/Analysis/diagnostics/macros.cpp: on line 27 you could see the message Assuming the condition is true but the upper conditions miss this report.

NoQ added a comment.EditedOct 10 2018, 2:50 PM
  1. Note that "Assuming X" directives are useful for the analyzer developers, since they know that's where the checker makes arbitrary assumptions, but to end users that mostly feels like noise ("Taking true branch" is there already, why there should be "Assuming 'i' is > 0" as well?)

I believe that distinction between "Assuming..." (event piece, yellow bar in scan-build) and "Taking..." (control-flow piece, grey bar in scan-build, depicted with arrows without text in a number of other UIs) is useful for the users as well, not just for hackers. I agree that the distinction isn't obvious and for now this part of the UI is not very useful. But now that you're at it, i think it'd be better to fix this problem by making diagnostics more understandable rather than by simplifying out the distinction.

For example, in the inline-plist.c's bar() on line 45, Static Analyzer indeed doesn't assume that p is equal to null; instead, Static Analyzer *knows* it for sure. There's no guess made here, and that's not an assumption that the user would need to agree or disagree with while looking at the report. Instead, it is an inevitable consequence of the previous events that occurred on this path. So i guess we could say something like "Knowing that 'p' is equal to null" or "'p' is inevitably null" and it should make the distinction apparent to the user. The user would notice that there's a change in the way we're talking about the fact.

The other reason why it's important is that those arbitrary assumptions are one of the fundamental weakness of the technique behind Static Analyzer: the user is indeed allowed to disagree with these assumptions and then mark the positive as false and suppress it with an assertion. In a code with a single branch such approach works fine because it is based on "presumption of no deadcode" (i.e., if there's an if in the code, both branches should be reached sometimes), but when there are N consequent branches, it is not necessary for all 2^N possible execution paths to be feasible: an O(N) number of paths can cover all the branches. But when it comes to actual facts that are inevitably true after the user has agreed with all previous assumptions on the path, the user can't disagree with those facts anymore, and that's an important piece of info to convey.

  1. @NoQ do you know why the GDM comparison was there in the first place? The commit was made by Ted in 2011, maybe constraint changes had to be reflected in the GDM at that point (?)

It's likely that back then GDM only contained constraints and checker info (and program point kind guaranteed that checker info could not have changed). But that's not the case anymore (we have more core traits - dynamic type info, C++ constructor support, taint, etc.), so this code is definitely incorrect; not sure how often does it matter. In order to produce an actual "correct" logic (correct in the sense that it preserves the old, pre-patch behavior), we probably need to add a method to ConstraintManager to ask whether constraints have changed between two states.

dkrupp added a subscriber: dkrupp.Oct 16 2018, 2:10 AM
Charusso updated this revision to Diff 170314.Oct 20 2018, 12:23 PM
Charusso retitled this revision from [analyzer] WIP: Enhance ConditionBRVisitor to write out more information to [analyzer] Enhance ConditionBRVisitor to write out more information.
  • Fixed the duplicated reports.
  • Removed the EndPath function what is remained somehow and confused me.
Charusso added a comment.EditedOct 20 2018, 12:58 PM
In D53076#1261134, @NoQ wrote:

For example, in the inline-plist.c's bar() on line 45, Static Analyzer indeed doesn't assume that p is equal to null; instead, Static Analyzer *knows* it for sure.

Thanks you! This is a great example what I have to cover later on. I have a patch where we print out known integers. The basic style is the following: Assuming 'x' is not equal to 1. I would like to emphasize the value and if it a known value, make it looks like this: Variable 'x' is equal to '1', or Variable '*ptr' is equal to '1'. (If this is the situation: Constant 'x' is equal to '1' would be cool as well.)

I made that patch in a separated file called BugReporterHelpers.cpp next to the BugReporterVisitors. I also would like to move all the helper functions from BugReporterVisitors.cpp to that source file. My first idea with that to create a live documentation, how would a new clang-hacker obtain a value from a certain position (me with testing those things out). Also what you mentioned with this flow-sensitive chaining, this is could not be a short patch, so I think this is the time when we want to have something like this.

What do you think? If this patch goes well, should I attach the mentioned new patch to this, or create a new one?

Charusso edited the summary of this revision. (Show Details)Oct 20 2018, 1:16 PM
Charusso edited the summary of this revision. (Show Details)Oct 20 2018, 1:22 PM
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
80 ↗(On Diff #170314)

Traditionally LLVM projects use llvm::DenseMap

183 ↗(On Diff #170314)

return Constraints.insert(make_pair(Cond, Message)).second ?

george.karpenkov requested changes to this revision.Oct 22 2018, 7:00 PM
george.karpenkov added inline comments.
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
202 ↗(On Diff #170314)

These constraints are conceptually part of the visitor, not part of the constraint manager. Could they be simply stored in the visitor?

include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
235 ↗(On Diff #170314)

Probably should be replaced by the expression above and inlined.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
216

spurious change

1848

capital letter

This revision now requires changes to proceed.Oct 22 2018, 7:00 PM
Szelethus added inline comments.Oct 23 2018, 4:48 AM
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
182 ↗(On Diff #170314)

I would not expect a function called isChanged to change the state of an object. Can you find a name that doesn't feel like it's a simple getter function?

Charusso updated this revision to Diff 170808.Oct 23 2018, 9:27 PM
Charusso marked 7 inline comments as done.
Charusso added a subscriber: gsd.

Refactor.

whisperity added inline comments.Oct 24 2018, 1:22 AM
include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
238 ↗(On Diff #170808)

insertOrUpdateContraintMessage

and mention in the documentation that it returns whether or not the actual insertion or update change took place

george.karpenkov requested changes to this revision.Oct 24 2018, 10:34 AM
george.karpenkov added inline comments.
include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
80 ↗(On Diff #170808)

Generally, StringRef's shouldn't be stored in containers, as containers might outlive them.
It might be fine in this case, but I would prefer to be on the safe side and just use std::string

238 ↗(On Diff #170808)

If constraints is now a property of the visitor, shouldn't this method with the typedef above be moved to the visitor as well?

244 ↗(On Diff #170808)

Isn't that equivalent to Constraints.insert(make_pair(Cond, Message)).second ?
I think I have written that before.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2263

It does not seem necessary, because a new copy of the visitor is created for each new bug report.

This revision now requires changes to proceed.Oct 24 2018, 10:34 AM
Charusso updated this revision to Diff 171037.Oct 24 2018, 9:50 PM
Charusso marked 5 inline comments as done.

More refactor.

Charusso added inline comments.Oct 24 2018, 9:55 PM
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
202 ↗(On Diff #170314)

My idea was to have a generic constraint map as @NoQ mentioned, then we could attach this to other places to reduce noisy reports. But probably this is the best place for now.

include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
244 ↗(On Diff #170808)

We have multiple messages at the same place so we have to update the message. The problem with your code is insert operates with disjunct keys, not values.

george.karpenkov requested changes to this revision.Oct 25 2018, 10:10 AM

Thanks a lot, this is almost ready!
I think it should be good to go after this round of nitpicks.

include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
168 ↗(On Diff #171037)

From a brief inspection this indeed seems dead code. However, this removal should be moved into a separate revision (sorry!)

include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
680 ↗(On Diff #171037)

Same: should be moved into a separate revision, same as the other removal.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1866

How do we know that it's always an Expr?

1867

/*tookTrue=*/tag == tag.first

1995
  1. What about the refactoring I have previously suggested twice?
  2. I know you have started with that -- and sorry for spurious changes -- but I also think your original name of constraintsChanged is better.
lib/StaticAnalyzer/Core/ExprEngine.cpp
2261 ↗(On Diff #171037)

Should be moved together with other two removals.

lib/StaticAnalyzer/Core/ProgramState.cpp
29 ↗(On Diff #171037)

While this minor formatting is correct, it's better to remove it to simplify future archaeology.

This revision now requires changes to proceed.Oct 25 2018, 10:10 AM
Charusso updated this revision to Diff 171143.Oct 25 2018, 10:53 AM
Charusso marked 9 inline comments as done.

Removed the unnecessary changes.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1866

VisitTrueTest operates with Expr only, so I just left the type.

1995

We have multiple messages at the same place so we have to update the message. The problem with your code is insert operates with disjunct keys, not values.

I have commented this after the last refactor, sorry for the inconvenience.

This revision is now accepted and ready to land.Oct 25 2018, 10:58 AM

Thanks everyone for the review!

@george.karpenkov could you commit it please? I am interested in your ideas in the little discussion started with @NoQ at the beginning of the project.

george.karpenkov requested changes to this revision.Oct 25 2018, 11:48 AM

Actually, apologies again: I have rushed through this too much.
@NoQ has a good point: we need to preserve the distinction between the things analyzer "assumes" and the things analyzer "knows".

This revision now requires changes to proceed.Oct 25 2018, 11:48 AM
NoQ added a comment.Oct 25 2018, 11:51 AM

I mean, the idea of checking constraints instead of matching program points is generally good, but the example i gave above suggests that there's a bug somewhere.

@NoQ has a good point: we need to preserve the distinction between the things analyzer "assumes" and the things analyzer "knows".

Yes, but I think it should be a new patch because I would like to add a value-dump method on integers/booleans and stuff I already mentioned in the beginning.

In D53076#1276277, @NoQ wrote:

I mean, the idea of checking constraints instead of matching program points is generally good, but the example i gave above suggests that there's a bug somewhere.

I think it is an unimplemented feature which appear like 1:500 time, but we will see.

NoQ added a comment.Oct 25 2018, 1:01 PM
In D53076#1276277, @NoQ wrote:

I mean, the idea of checking constraints instead of matching program points is generally good, but the example i gave above suggests that there's a bug somewhere.

I think it is an unimplemented feature which appear like 1:500 time, but we will see.

The original behavior is perfectly consistent with my understanding of Static Analyzer's reports that i've been reading continuously for years. There might have been a few places where assumption is made but isn't highlighted, but the opposite has never happened, and also the opposite is more confusing to the user because it demonstrates an explicit text that the user has to trust. So i think that one way or another, the new behavior is a regression from the original behavior on an on-by-default functionality on some test cases, and we should not commit this patch until this regression is debugged and fixed.

I highlight a few more test cases that i believe have regressed in inline comments.

test/Analysis/MisusedMovedObject.cpp
187 ↗(On Diff #171143)

These assumptions were already made on the previous branches. There should be no extra assumptions here.

221 ↗(On Diff #171143)

We have assumed that i is >= 10 on the previous branch. It imples that i is greater than 5, so no additional assumption is being made here.

test/Analysis/NewDelete-path-notes.cpp
9–10

Static Analyzer knows that the standard operator new never returns null. Therefore no assumption is being made here.

test/Analysis/diagnostics/macros.cpp
33 ↗(On Diff #171143)

This one's good. Static Analyzer doesn't understand floats, so this branch is indeed non-trivial. There should indeed be an assuming... piece here.

test/Analysis/diagnostics/no-store-func-path-notes.cpp
23–24

This method is called from use() with param equal to concrete 0. It is not analyzed as a top-level function. There is no assumption made here, like in most other places in this file.

test/Analysis/diagnostics/no-store-func-path-notes.m
13–14

This method is called from foo() with param equal to concrete 0. It is not analyzed as a top-level function. There is no assumption made here.

26–27

-initVar returns concrete 0 when called with these parameters. There is no assumption being made here.

34–35

This function is called from inifFromBlock() with x equal to concrete 0. It is not analyzed as a top-level function. Therefore, no assumption is being made here.

test/Analysis/inline-plist.c
46–47

The condition !!p above being assumed to false ensures that p is equal to null here. We are not assuming it again here.

test/Analysis/uninit-vals.m
167–168

These are pretty weird. As far as i understand the test, these should be there. But i'm suddenly unable to debug why were they not shown before, because there's either something wrong with exploded graph dumps or with the exploded graph itself; it appears to be missing an edge right after size > 0 is assumed. I'll look more into those.

Charusso updated this revision to Diff 171398.Oct 27 2018, 6:07 AM
Charusso marked 7 inline comments as done.
Charusso edited the summary of this revision. (Show Details)

Added a function to write out known integers and booleans.

test/Analysis/MisusedMovedObject.cpp
187 ↗(On Diff #171143)

Agree but only if there is no extra constraint EventPiece between them.

221 ↗(On Diff #171143)

Agree but only if there is no extra constraint EventPiece between them.

test/Analysis/NewDelete-path-notes.cpp
9–10

As I see SA knows nothing. Where to teach it?

test/Analysis/inline-plist.c
46–47

Agree but only if there is no extra constraint EventPiece between them.

Charusso added inline comments.Oct 27 2018, 6:14 AM
test/Analysis/Inputs/expected-plists/cxx-for-range.cpp.plist
900

Double negating is not in standard English, so this behaviour is documented here.

test/Analysis/Inputs/expected-plists/edges-new.mm.plist
9028

@NoQ this is the only place (as I see) where SA works with flow-sensitive constraint-handling what you mentioned. Is it a good change?

NoQ added a comment.Nov 1 2018, 8:52 PM

@Charusso, i believe you have some misconception about what constraints mean and how they work. Not sure what this misconception is, so i'll make a blind guess and annoy you a little bit with a narcissistic rant and you'll have to bear me, sry!

The most important thing to know about constraints is that they are applied to "symbols", not to variables. We explain it to the user in terms of variables (when possible) because symbols are immaterial in the code and as such cannot be easily explained to the user, but in reality constraints are entirely about symbols.

Let me explain this with an example. If you say "let x denote the number of sheep in a herd" and then a new sheep is born and joins the herd, the number of sheep becomes (x + 1) and no longer remains equal to x, while x can still be used to represent the *original* number of sheep. In this example, the number of sheep is the variable, and x is the symbol.

The good thing about symbols is that they denote the same unknown value throughout the whole analysis. Values of variables may change (hence the nomenclature), but values denoted by symbols never change. They are kinda like Static Analyzer's substitute for transforming the program into SSA notation. When constraints are assumed, we make assumptions about the unknown value denoted by the symbol and it becomes "less unknown", but it is still the same value, it's just we know more about it. Which means that:

  • "The Golden Rule of Constraint Solving": (i came up with this title 3 seconds ago) In any two points A and B during analysis (i.e., nodes in the exploded graph), if A precedes B along any execution path (i.e., there's a directed path from A to B in the exploded graph), constraints for any symbol x in point B are a subset of constraints for x in point A.

For example, if Static Analyzer assumes that a symbol x is greater than 10, it cannot later assume that x is less than 5, because [-∞, 4] is not a subset of [11, +∞]. Therefore Static Analyzer will not explore the path on which x is less than 5, and it will not update range constraints for the symbol. But it will explore the path on which x is also less than 15 by setting range constraints for x to [11, 14] because [11, 14] is indeed a subset of [11, +∞].

The formal definition of "Assuming" pieces is that they indicate that the constraints change for the symbol (eg., for the symbol that represents the *current* value of the variable mentioned in the message of the event piece). This is the precise meaning of "Static Analyzer is assuming". And when explained in terms of variables, this precise meaning is comprehend-able by the user and should be conveyed properly. The distinction between presence and absence of assumptions is crucial to convey. Therefore, if the branch condition is denoted by a symbol and constraints over that symbol do not change, there should be no "Assuming..." piece for the branch condition symbol, and the respective mechanism should not be removed. But there may be a new sort of event piece, eg. "Knowing...", if you still want to highlight the motivation behind "Taking..." a specific branch, which seems to be the purpose of your patch.

"Assuming..." pieces over the same variable may contradict each other (i.e., look as if they violate the Golden Rule) when contents of the variable change. So, yeah, in this case another "Assuming..." piece will be necessary in some cases. But it doesn't mean that the Golden Rule is actually violated; it just means that the variable now has different value and different assumptions can be made about it. This doesn't happen in the examples that i pointed out, so there should not be an "Assuming..." piece.

In an inline comment i show what this means on the test case that seems to be the easiest.

test/Analysis/MisusedMovedObject.cpp
187 ↗(On Diff #171143)

I think i answered this concern in the out-of-line comment. Because constraints never change in a contradictory manner but only grow shorter, a much stronger statement can be made here: there are also no extra assumptions when the variable is not reassigned since the last "Assuming..." piece. Of course, even if it is reassigned, we may still not want to have the "Assuming..." piece - it depends entirely on the current constraints over the symbol that represents the branch condition.

test/Analysis/NewDelete-path-notes.cpp
9–10

It does know it. The ultimate way to observe what Static Analyzer knows or thinks at any point during analysis and how its knowledge evolves is to dump the "exploded graph":

As you see, as soon as operator new is evaluated, even before the if-statement is evaluated, the symbol conj_$0{int, LC1, S772, #1} that represents the unknown value of the pointer produced by operator new becomes constrained to [1, 2⁶⁴ - 1] (assuming the target system is 64-bit), which means exactly that: it can take any value except 0.

Therefore when Static Analyzer reaches the if-statement later, it has no choice but to proceed to the true-branch, and not only the false branch is entirely skipped (the execution path doesn't split in two), but also the set of constraints remains unchanged, indicating that no assumption is being made:

For the reference, here's the complete exploded graph for the test:

Charusso updated this revision to Diff 174750.Nov 20 2018, 4:54 AM
Charusso edited the summary of this revision. (Show Details)
  • Implemented a 'look forward' feature to see if the constraint changed.
  • Removed known value printing.
  • Removed 'Assuming...' pieces where the condition is known to be true.
  • Side effect: too many generic assumption message.
Charusso marked 8 inline comments as done.Nov 20 2018, 4:56 AM

@NoQ thanks you for the great explanation! I really wanted to write out known constant integers and booleans but I think 'Knowing...' pieces would be more cool.

I tried to minimize the changes after a little e-mailing with @george.karpenkov on the mailing list. The problem is the following: we go backwards on the bug-path, that is why we see range information changes backwards, but the user information goes forwards: from the top to the bottom. I have implemented the 'check upwards' feature, so now everything working fine, but I think it is a very time-consuming approach. (The idea is copied from BugReporter.cpp/generateVisitorsDiagnostics() where I tried to flip the path, unsuccessfully: http://clang-developers.42468.n3.nabble.com/Visit-nodes-in-the-order-of-the-user-output-to-create-better-reports-td4062898.html )

Charusso added inline comments.Nov 20 2018, 6:09 AM
test/Analysis/diagnostics/macros.cpp
33 ↗(On Diff #171143)

A little bit misunderstandable, because we do not know anything about doubles. May the generic message fit better?

test/Analysis/diagnostics/undef-value-param.m
56 ↗(On Diff #174750)

I am not sure why but the range here [1, (2^64)-1]. There is would not be any number like 0?

test/Analysis/new-ctor-malloc.cpp
11 ↗(On Diff #174750)

The range information is [1, (2^64)-1] but as I saw malloc could return null. Who is lying here?

test/Analysis/uninit-vals.m
167–168

We gather information with clang-analyzer-eval functions so the conditions would be Knowing... pieces but the BugReporter.cpp does not know about these extra assumptations. Where to teach to do not Assuming..., but Knowing... these?

test/Analysis/virtualcall.cpp
172

Because this X x(i - 1); we assume these conditions? This one is tricky to be Knowing....

In the meanwhile we managed to figure out where the problem lays, so if you're interested, I'm going to document it here.

The problem this patch attempts to solve is that ConditionBRVisitor adds event pieces to the bugpath when the state has changed from the previous state, not only when the ConstraintManager differs in between the changes [1]. The patch attempts to solve this by inspecting the actual condition, and tries to find out whether the symbols inside that condition has changed in between the states, but this is overkill solution, as comparing the two ConstraintManager objects would achieve the same thing. This should probably be executed by extending the interface of ConstraintManager with a comparison method.

[1] https://github.com/llvm-mirror/clang/blob/master/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp#L1808

In the meanwhile we managed to figure out where the problem lays, so if you're interested, I'm going to document it here.

The patch attempts to solve this by inspecting the actual condition, and tries to find out whether the symbols inside that condition has changed in between the states, but this is overkill solution, as comparing the two ConstraintManager objects would achieve the same thing.

I tried to investigate what is happening after we entered into a condition, which is working well with 'Knowing...' pieces, but with too much overhead. Because this patch is all about write out more information, rather than removing the GDM-checking I will move forward with 'Knowing...' pieces. The current questions are real for that project, just let me simplify the code and create that feature.

Charusso retitled this revision from [analyzer] Enhance ConditionBRVisitor to write out more information to [analyzer] WIP: Enhance ConditionBRVisitor to write out more information.Nov 21 2018, 1:35 PM
Charusso edited the summary of this revision. (Show Details)
Charusso updated this revision to Diff 175177.Nov 25 2018, 7:41 AM
Charusso marked 4 inline comments as done.
Charusso retitled this revision from [analyzer] WIP: Enhance ConditionBRVisitor to write out more information to [analyzer] Enhance ConditionBRVisitor to write out more information.
Charusso edited the summary of this revision. (Show Details)
  • Better approach to prevent duplicated report messages.

Side effects:

  • Because we rely on constraint range information we measure the bounds with equality operators so lost precision, but now the style of the reports is identical.
  • We create Assuming... pieces on unknown values where generic message has not shown up.

Sorry for the huge noise! It took me a while to see what is the problem.

I think the reason why the printed message was either along the lines of "this is 0" and "this is non-0", is that we don't necessarily know what constraint solver we're using, and adding more non-general code BugReporter is most probably a bad approach. How about we tinker with ContraintManager a little more, maybe add an explainValueOfVarDecl, that would return the value of a variable in a given ProgramState? We could implement it for RangeConstraintSolver (essentially move the code you already wrote there), and leave a TODO with a super generic implementation for Z3.

Although, I can't quite write an essay on top of my head about golden rules of constraint solving, so take my advice with a grain of salt.

include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
163

We only get to know what this field is for while reading the actual implementation, please add comments.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1846–1874

ConstraintRange, as far as I know, is the data RangedConstraintManager stores in the GDM. What if we're using a different constraint solver? I think it'd be better to take the more abstract approach, extend ConstraintManager's interface with an isEqual or operator== pure virtual method, and implement it within it's descendants.

Charusso updated this revision to Diff 175181.Nov 25 2018, 1:21 PM
Charusso marked 8 inline comments as done.
  • The final piece of that puzzle: apply extra Assuming... pieces without the generic message. (A little bit too many.)
  • Comment on IsAssuming.

@Szelethus thanks you for the comments!

I think the reason why the printed message was either along the lines of "this is 0" and "this is non-0", is that we don't necessarily know what constraint solver we're using, and adding more non-general code BugReporter is most probably a bad approach.

It was an unimplemented feature to write out known stuff. There is no constraint solving, it is rather constant value dumping.

How about we tinker with ContraintManager a little more, maybe add an explainValueOfVarDecl, that would return the value of a variable in a given ProgramState? We could implement it for RangeConstraintSolver (essentially move the code you already wrote there), and leave a TODO with a super generic implementation for Z3.

I agree with @george.karpenkov, we want to have the smallest scope.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1846–1874

Because we only operate in the ConditionBRVisitor it would be useless for now.

Charusso marked an inline comment as done.Nov 25 2018, 1:22 PM
gerazo added a subscriber: gerazo.Nov 26 2018, 7:20 AM

This looks more reasonable, thanks!
What about just dropping the Knowing prefix?
Just having arr is null, taking true branch seems considerably more readable.

george.karpenkov requested changes to this revision.Nov 26 2018, 1:25 PM
george.karpenkov added inline comments.
test/Analysis/uninit-vals.m
222

Same here: we should know that testObj->size == 0

324

That does not seem right: from calloc the analyzer should know that the testObj->size is actually zero.

test/Analysis/virtualcall.cpp
170

Could you describe what happens here?
Why the assuming notes weren't there before?

This revision now requires changes to proceed.Nov 26 2018, 1:25 PM
Charusso marked 2 inline comments as done.Nov 26 2018, 2:37 PM

@george.karpenkov thanks you for the comments!

What about just dropping the Knowing prefix?
Just having arr is null, taking true branch seems considerably more readable.

I wanted to create something identical to the current reports. If we are about to change readability, I would change that two first:

  • identical operator printing with <, <=, >..., so equal to is = and not equal to is !=.
  • emphasize the value information with quotes, so Assuming 'i' >= '2'.
test/Analysis/uninit-vals.m
324

That Assuming... piece is rely on the change between the last two diffs: I just print out more VarDecls in patternMatch(). I thought everything works fine, so I will check that problem if @NoQ will not be faster as he already found some problematic code piece in that test file.

test/Analysis/virtualcall.cpp
170

We do not have range information in the *newly seen Assuming... pieces. Because the analyzer has not learnt new information - as there is no information - we have not entered to VisitTrueTest() to report these. A good reference for that new behaviour is in test/Analysis/Inputs/expected-plists/cxx-for-range.cpp.plist.

*I have added these with https://reviews.llvm.org/D53076?id=175177

NoQ added a comment.Nov 29 2018, 6:44 PM

There still seem to be a couple of regressions with Assuming... pieces, but other than that, i believe that an awesome piece of progress has been made here. I really like how it looks now. I agree with George that dropping "Knowing" from the message looks fancier.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
85–95

The usual style in LLVM for this sort of stuff is

if (const auto *DRE = dyn_cast_or_null<DeclRefExpr>(CondVarExpr)) {
  // Do stuff with DRE.
}

CondVarExpr definitely cannot successfully cast to both DeclRefExpr and BinaryOperator at the same time, so this sort of structure is much easier to understand. Also checks after && are definitely redundant.

115

Hmm, why are you even able to access it here? The get<> key should only be visible in a single translation unit in which it is defined, otherwise it's buggy because it relies on addresses of static globals to work.

Actually, could you separate the work of displaying the exact range boundary (this whole RangeSet business) into a separate diff? I think it's a separate change that requires separate discussion.

Also instead of accessing the constraint manager's private trait directly, you should use the getRange() method. It adds all the necessary implicit assumptions that aren't represented explicitly as items in the map.

126

Why int?

That's not what getExtValue() returns (so you can accidentally print a truncated value), and generally the size of the int type on the host machine shouldn't affect analysis results.

I think you should print the APSInt directly instead of converting it into a raw integer.

130–132

Even if the Store is currently null, every live variable in the program inevitably has a value, so i don't think we should bail out here.

139–141

There are cornercases where the range consists of exactly one point, so the value is technically concrete but wasn't yet collapsed into a concrete value by the engine.

In any case, as far as i understand, this code is there only for optimization, and i don't think it's actually making things faster because the lookup is already expensive.

144

auto shouldn't be used here and in a few other places according to the coding guidelines.

Essentially, it should only be used for cast results, where the type is already written out.

146

What do you mean by "inline"? There are no function calls going on here.

1839–1842

I'm worried that checker transitions can screw up the order of nodes here.

Every time a checker does addTransition() or generateNonFatalErrorNode(), it injects an intermediate node, and there can be indefinitely many of those between the nodes you're actually looking for.

You should probably rely on program points instead, i.e. "this is the node in which the terminator condition was evaluated", "this is the node in which we jumped from one CFG block to another".

2173

Did you mean ->end()->To()? We need a test in which the RangeSet contains more than one range.

test/Analysis/diagnostics/macros.cpp
33 ↗(On Diff #171143)

Yup, the generic message should be perfectly fine, but i also don't see any problems with the non-generic message, as it is "technically the truth".

test/Analysis/diagnostics/undef-value-param.m
56 ↗(On Diff #174750)

That's suspicious, could you have a look at when exactly does this range appear? There should be an Assuming... piece here. I suspect that you might be looking at the state after the split, which already has the range, and not looking at the state before the split, which doesn't have the range.

test/Analysis/new-ctor-malloc.cpp
11 ↗(On Diff #174750)

Yes, malloc is nullable. There should be an Assuming... piece here. It's probably the same problem that we had with anotherCreateRef().

test/Analysis/uninit-vals.m
167–168

Aha, i see.

testObj->size evaluates to pure UnknownVal because region store cannot deal with two overlapping lazy bindings (which is the whole point of the test).

Therefore no new constraints are added. So we cannot figure out what happens by looking at the constraints.

So it's roughly the same thing that happened in the example with floats.

We should just outright always add an "Assuming..." piece when the condition evaluates to Unknown. The new behavior is correct here.

324

testObj->size is an UnknownVal at this point (see the parallel comment on the older revision's thread). So there should be an Assuming... piece here. Ultimately there shouldn't, but throwing an Assuming... piece is the correct behavior for the sub-system that decides whether an Assuming... piece should be emitted.

test/Analysis/virtualcall.cpp
172

These are the notes for X x1(1); in main. In this case i is a concrete value: exactly 1 on the first iteration and exactly 0 on the second iteration.

The new behavior is incorrect; there should not be an Assuming... piece here.

Charusso updated this revision to Diff 184329.Jan 30 2019, 11:15 AM
Charusso marked 19 inline comments as done.
Charusso edited the summary of this revision. (Show Details)
  • Better way to measure if the value is known.
  • Moved extra Assuming... pieces into a different patch (D57410).
  • New generic assumptions created on known MemberExprs and entire conditions.
  • Side effect: ninja check-clang-analyzer-z3 passes without the error mentioned in D54811. ("Like magic and unicorns?")

TODO list for upcoming patches:

  • Removed bounds checking with RangeSets.
  • Removed bool value support.
  • Removed MemberExpr value support.

Thanks you @NoQ! Luckily it is rely on ProgramPoints now. The only problem as I see the mentioned Z3-test, where I have no idea what happened.

NoQ added inline comments.Jan 31 2019, 7:30 PM
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
166–173

I'll take a closer look at the rest of the patch, but this code snippet looks suspicious to me at a glance. Does it ever add anything on top of the "hard-coded" case? When it does, is it correct?

I mean, the "hard-coded" case does not actually correspond to an integer literal expression; it corresponds to any circumstances in which the Static Analyzer could compute that the value is going to be exactly that concrete integer on this execution path. If the Static Analyzer could not compute that, i doubt that this simple procedure will do better.

It might be that this is needed because you're looking at a wrong ExplodedNode on which the condition expression is either not computed yet or already dropped. In this case i'd prefer to try harder to find the right node, because getSVal(CondVarExpr, LCtx) is just so much more powerful.

Charusso updated this revision to Diff 185543.Feb 6 2019, 6:44 AM
  • Rebased.
  • Removed DeclRefExpr part from getConcreteIntegerValue().
  • Added constant integer early-return.
Charusso updated this revision to Diff 185544.Feb 6 2019, 6:45 AM
  • Added back DeclRefExpr part to getConcreteIntegerValue() for proof by example.
Charusso marked an inline comment as done.Feb 6 2019, 6:45 AM
Charusso updated this revision to Diff 186659.Feb 13 2019, 7:35 AM
  • Generic known condition evaluations.
  • check-clang-analyzer-z3 passes as expected:
Failing Tests (1):
    Clang :: Analysis/plist-macros.cpp

  Expected Passes    : 565
  Expected Failures  : 2
  Unsupported Tests  : 3
  Unexpected Failures: 1
NoQ added a comment.Feb 19 2019, 7:01 PM

I'll take a closer look in a few days, sorry for the delays! The progress you're making is fantastic and i really appreciate your work, but i have an urgent piece of work of my own to deal with this week.

In D53076#1403436, @NoQ wrote:

I'll take a closer look in a few days, sorry for the delays! The progress you're making is fantastic and i really appreciate your work, but i have an urgent piece of work of my own to deal with this week.

@NoQ, please take your time, that new bug-report logic is the end of my work on reporting business (as I mentioned I wanted to create the same patch), so that is more important than this.

Charusso updated this revision to Diff 189921.Mar 8 2019, 1:52 PM
Charusso retitled this revision from [analyzer] Enhance ConditionBRVisitor to write out more information to [analyzer] ConditionBRVisitor: enhance to write out more information.
  • Rebased.
NoQ added a comment.Mar 25 2019, 2:19 PM

Btw, did you try running this patch against big projects? 'Cause it's a high-impact change and we'd have to be careful with it.

I ran it on LLVM real quick and noticed that 97.3% (!) of reports contained at least one of the new notes, with number of HTML path pieces increased by 23.7% on average (note that the impact on plist reports would be much higher because they don't have any text for gray pieces).

I'll try to get back with some actual hand-picked examples to demonstrate my impressions. My overall feel was that it increases quality of life quite a bit by reducing the amount of time spent scrolling and jumping around the report - the necessary information becomes available exactly where you need it and it's wonderful. At the same time, i noticed a few corner-cases where the new pieces were duplicating the existing information or otherwise not helpful:

This doesn't look like a fundamental problem to me; it should be possible to improve upon these corner-cases by varying note messages; probably it also makes sense to skip some of them when they're clearly saying the same info 10 times in a row, but i'm not sure it's easy to decide where to apply that.

Because it might be a large amount of work, i'm thinking about committing this new feature off-by-default first (eg., -analyzer-config knowing-notes=[false|true]) and then doing follow-up commits (you already have quite a bit!) to make sure it's polished when we turn it on for everyone, WDYT?

include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
169–173

Does this absolutely need to be a member variable? Maybe pass it down the stack, like tookTrue? I understand that there are already a lot of flags passed down, but global mutable state is still more dangerous and can easily get out of hand.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
187–190

I don't see this triggered on tests. It looks to me that this function is for now only applied to DeclRefExprs that are always glvalues and therefore should never be evaluated to a nonloc value.

1991–1992

I advocate for more parentheses :)

NoQ added a comment.Mar 28 2019, 5:32 PM

Instead of having those as events similar to "Assuming", we could turn the new "Knowing" pieces into floating pop-ups - imagine you hover the mouse over a condition foo() and it says "foo() is false". That is, instead of PathDiagnosticsEventPiece, a new kind of piece can be introduced that is shown in scan-build as a pop-up similar to macro pop-ups. Like this:

We already have infrastructure for hover pop-up hints in HTML reports - we use it to display macro expansions, so it could be something similar, just of a different color.

Charusso updated this revision to Diff 195071.Apr 14 2019, 9:02 AM
Charusso marked 4 inline comments as done.
Charusso retitled this revision from [analyzer] ConditionBRVisitor: enhance to write out more information to [analyzer] ConditionBRVisitor: Enhance to write out more information.

Address comments:

  • Removed member variable.
  • Introducing PathDiagnosticPopUpPiece(D60670)

Extras:

  • Removed generic known condition evaluation printing
  • So that we could not hook integers, removed that early-return code as well.
Charusso added a comment.EditedApr 14 2019, 9:24 AM

Thanks you @NoQ for the great idea!

Without the duplicated BugReporter.cpp reports it could be an on-by-default patch, see:

Before:

After:

It is not perfect, but I like that and I have already got ideas how to use these new pieces.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
187–190

Good catch!

NoQ accepted this revision.Apr 16 2019, 4:30 PM

Yay!! I'm happy with this change and redirected all my wrath on its dependencies.

Charusso edited the summary of this revision. (Show Details)May 29 2019, 12:42 PM
Charusso set the repository for this revision to rC Clang.
This revision was not accepted when it landed; it landed in state Needs Review.May 29 2019, 1:04 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMay 29 2019, 1:04 PM