Page MenuHomePhabricator

[analyzer] ConditionBRVisitor: Enhance to write out more information
ClosedPublic

Authored by Charusso on Oct 10 2018, 6:28 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Charusso added inline comments.Nov 20 2018, 6:09 AM
test/Analysis/diagnostics/macros.cpp
33 ↗(On Diff #171143)

A little bit misunderstandable, because we do not know anything about doubles. May the generic message fit better?

test/Analysis/diagnostics/undef-value-param.m
56 ↗(On Diff #174750)

I am not sure why but the range here [1, (2^64)-1]. There is would not be any number like 0?

test/Analysis/new-ctor-malloc.cpp
11 ↗(On Diff #174750)

The range information is [1, (2^64)-1] but as I saw malloc could return null. Who is lying here?

test/Analysis/uninit-vals.m
167–168

We gather information with clang-analyzer-eval functions so the conditions would be Knowing... pieces but the BugReporter.cpp does not know about these extra assumptations. Where to teach to do not Assuming..., but Knowing... these?

test/Analysis/virtualcall.cpp
172

Because this X x(i - 1); we assume these conditions? This one is tricky to be Knowing....

In the meanwhile we managed to figure out where the problem lays, so if you're interested, I'm going to document it here.

The problem this patch attempts to solve is that ConditionBRVisitor adds event pieces to the bugpath when the state has changed from the previous state, not only when the ConstraintManager differs in between the changes [1]. The patch attempts to solve this by inspecting the actual condition, and tries to find out whether the symbols inside that condition has changed in between the states, but this is overkill solution, as comparing the two ConstraintManager objects would achieve the same thing. This should probably be executed by extending the interface of ConstraintManager with a comparison method.

[1] https://github.com/llvm-mirror/clang/blob/master/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp#L1808

In the meanwhile we managed to figure out where the problem lays, so if you're interested, I'm going to document it here.

The patch attempts to solve this by inspecting the actual condition, and tries to find out whether the symbols inside that condition has changed in between the states, but this is overkill solution, as comparing the two ConstraintManager objects would achieve the same thing.

I tried to investigate what is happening after we entered into a condition, which is working well with 'Knowing...' pieces, but with too much overhead. Because this patch is all about write out more information, rather than removing the GDM-checking I will move forward with 'Knowing...' pieces. The current questions are real for that project, just let me simplify the code and create that feature.

Charusso retitled this revision from [analyzer] Enhance ConditionBRVisitor to write out more information to [analyzer] WIP: Enhance ConditionBRVisitor to write out more information.Nov 21 2018, 1:35 PM
Charusso edited the summary of this revision. (Show Details)
Charusso updated this revision to Diff 175177.Nov 25 2018, 7:41 AM
Charusso marked 4 inline comments as done.
Charusso retitled this revision from [analyzer] WIP: Enhance ConditionBRVisitor to write out more information to [analyzer] Enhance ConditionBRVisitor to write out more information.
Charusso edited the summary of this revision. (Show Details)
  • Better approach to prevent duplicated report messages.

Side effects:

  • Because we rely on constraint range information we measure the bounds with equality operators so lost precision, but now the style of the reports is identical.
  • We create Assuming... pieces on unknown values where generic message has not shown up.

Sorry for the huge noise! It took me a while to see what is the problem.

I think the reason why the printed message was either along the lines of "this is 0" and "this is non-0", is that we don't necessarily know what constraint solver we're using, and adding more non-general code BugReporter is most probably a bad approach. How about we tinker with ContraintManager a little more, maybe add an explainValueOfVarDecl, that would return the value of a variable in a given ProgramState? We could implement it for RangeConstraintSolver (essentially move the code you already wrote there), and leave a TODO with a super generic implementation for Z3.

Although, I can't quite write an essay on top of my head about golden rules of constraint solving, so take my advice with a grain of salt.

include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
163

We only get to know what this field is for while reading the actual implementation, please add comments.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1819–1847

ConstraintRange, as far as I know, is the data RangedConstraintManager stores in the GDM. What if we're using a different constraint solver? I think it'd be better to take the more abstract approach, extend ConstraintManager's interface with an isEqual or operator== pure virtual method, and implement it within it's descendants.

Charusso updated this revision to Diff 175181.Nov 25 2018, 1:21 PM
Charusso marked 8 inline comments as done.
  • The final piece of that puzzle: apply extra Assuming... pieces without the generic message. (A little bit too many.)
  • Comment on IsAssuming.

@Szelethus thanks you for the comments!

I think the reason why the printed message was either along the lines of "this is 0" and "this is non-0", is that we don't necessarily know what constraint solver we're using, and adding more non-general code BugReporter is most probably a bad approach.

It was an unimplemented feature to write out known stuff. There is no constraint solving, it is rather constant value dumping.

How about we tinker with ContraintManager a little more, maybe add an explainValueOfVarDecl, that would return the value of a variable in a given ProgramState? We could implement it for RangeConstraintSolver (essentially move the code you already wrote there), and leave a TODO with a super generic implementation for Z3.

I agree with @george.karpenkov, we want to have the smallest scope.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1819–1847

Because we only operate in the ConditionBRVisitor it would be useless for now.

Charusso marked an inline comment as done.Nov 25 2018, 1:22 PM
gerazo added a subscriber: gerazo.Nov 26 2018, 7:20 AM

This looks more reasonable, thanks!
What about just dropping the Knowing prefix?
Just having arr is null, taking true branch seems considerably more readable.

george.karpenkov requested changes to this revision.Nov 26 2018, 1:25 PM
george.karpenkov added inline comments.
test/Analysis/uninit-vals.m
222

Same here: we should know that testObj->size == 0

324

That does not seem right: from calloc the analyzer should know that the testObj->size is actually zero.

test/Analysis/virtualcall.cpp
170

Could you describe what happens here?
Why the assuming notes weren't there before?

This revision now requires changes to proceed.Nov 26 2018, 1:25 PM
Charusso marked 2 inline comments as done.Nov 26 2018, 2:37 PM

@george.karpenkov thanks you for the comments!

What about just dropping the Knowing prefix?
Just having arr is null, taking true branch seems considerably more readable.

I wanted to create something identical to the current reports. If we are about to change readability, I would change that two first:

  • identical operator printing with <, <=, >..., so equal to is = and not equal to is !=.
  • emphasize the value information with quotes, so Assuming 'i' >= '2'.
test/Analysis/uninit-vals.m
324

That Assuming... piece is rely on the change between the last two diffs: I just print out more VarDecls in patternMatch(). I thought everything works fine, so I will check that problem if @NoQ will not be faster as he already found some problematic code piece in that test file.

test/Analysis/virtualcall.cpp
170

We do not have range information in the *newly seen Assuming... pieces. Because the analyzer has not learnt new information - as there is no information - we have not entered to VisitTrueTest() to report these. A good reference for that new behaviour is in test/Analysis/Inputs/expected-plists/cxx-for-range.cpp.plist.

*I have added these with https://reviews.llvm.org/D53076?id=175177

NoQ added a comment.Nov 29 2018, 6:44 PM

There still seem to be a couple of regressions with Assuming... pieces, but other than that, i believe that an awesome piece of progress has been made here. I really like how it looks now. I agree with George that dropping "Knowing" from the message looks fancier.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
85–95

The usual style in LLVM for this sort of stuff is

if (const auto *DRE = dyn_cast_or_null<DeclRefExpr>(CondVarExpr)) {
  // Do stuff with DRE.
}

CondVarExpr definitely cannot successfully cast to both DeclRefExpr and BinaryOperator at the same time, so this sort of structure is much easier to understand. Also checks after && are definitely redundant.

115

Hmm, why are you even able to access it here? The get<> key should only be visible in a single translation unit in which it is defined, otherwise it's buggy because it relies on addresses of static globals to work.

Actually, could you separate the work of displaying the exact range boundary (this whole RangeSet business) into a separate diff? I think it's a separate change that requires separate discussion.

Also instead of accessing the constraint manager's private trait directly, you should use the getRange() method. It adds all the necessary implicit assumptions that aren't represented explicitly as items in the map.

126

Why int?

That's not what getExtValue() returns (so you can accidentally print a truncated value), and generally the size of the int type on the host machine shouldn't affect analysis results.

I think you should print the APSInt directly instead of converting it into a raw integer.

130–132

Even if the Store is currently null, every live variable in the program inevitably has a value, so i don't think we should bail out here.

139–141

There are cornercases where the range consists of exactly one point, so the value is technically concrete but wasn't yet collapsed into a concrete value by the engine.

In any case, as far as i understand, this code is there only for optimization, and i don't think it's actually making things faster because the lookup is already expensive.

144

auto shouldn't be used here and in a few other places according to the coding guidelines.

Essentially, it should only be used for cast results, where the type is already written out.

146

What do you mean by "inline"? There are no function calls going on here.

1812–1815

I'm worried that checker transitions can screw up the order of nodes here.

Every time a checker does addTransition() or generateNonFatalErrorNode(), it injects an intermediate node, and there can be indefinitely many of those between the nodes you're actually looking for.

You should probably rely on program points instead, i.e. "this is the node in which the terminator condition was evaluated", "this is the node in which we jumped from one CFG block to another".

2143

Did you mean ->end()->To()? We need a test in which the RangeSet contains more than one range.

test/Analysis/diagnostics/macros.cpp
33 ↗(On Diff #171143)

Yup, the generic message should be perfectly fine, but i also don't see any problems with the non-generic message, as it is "technically the truth".

test/Analysis/diagnostics/undef-value-param.m
56 ↗(On Diff #174750)

That's suspicious, could you have a look at when exactly does this range appear? There should be an Assuming... piece here. I suspect that you might be looking at the state after the split, which already has the range, and not looking at the state before the split, which doesn't have the range.

test/Analysis/new-ctor-malloc.cpp
11 ↗(On Diff #174750)

Yes, malloc is nullable. There should be an Assuming... piece here. It's probably the same problem that we had with anotherCreateRef().

test/Analysis/uninit-vals.m
167–168

Aha, i see.

testObj->size evaluates to pure UnknownVal because region store cannot deal with two overlapping lazy bindings (which is the whole point of the test).

Therefore no new constraints are added. So we cannot figure out what happens by looking at the constraints.

So it's roughly the same thing that happened in the example with floats.

We should just outright always add an "Assuming..." piece when the condition evaluates to Unknown. The new behavior is correct here.

324

testObj->size is an UnknownVal at this point (see the parallel comment on the older revision's thread). So there should be an Assuming... piece here. Ultimately there shouldn't, but throwing an Assuming... piece is the correct behavior for the sub-system that decides whether an Assuming... piece should be emitted.

test/Analysis/virtualcall.cpp
172

These are the notes for X x1(1); in main. In this case i is a concrete value: exactly 1 on the first iteration and exactly 0 on the second iteration.

The new behavior is incorrect; there should not be an Assuming... piece here.

Charusso updated this revision to Diff 184329.Jan 30 2019, 11:15 AM
Charusso marked 19 inline comments as done.
Charusso edited the summary of this revision. (Show Details)
  • Better way to measure if the value is known.
  • Moved extra Assuming... pieces into a different patch (D57410).
  • New generic assumptions created on known MemberExprs and entire conditions.
  • Side effect: ninja check-clang-analyzer-z3 passes without the error mentioned in D54811. ("Like magic and unicorns?")

TODO list for upcoming patches:

  • Removed bounds checking with RangeSets.
  • Removed bool value support.
  • Removed MemberExpr value support.

Thanks you @NoQ! Luckily it is rely on ProgramPoints now. The only problem as I see the mentioned Z3-test, where I have no idea what happened.

NoQ added inline comments.Jan 31 2019, 7:30 PM
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
166–173

I'll take a closer look at the rest of the patch, but this code snippet looks suspicious to me at a glance. Does it ever add anything on top of the "hard-coded" case? When it does, is it correct?

I mean, the "hard-coded" case does not actually correspond to an integer literal expression; it corresponds to any circumstances in which the Static Analyzer could compute that the value is going to be exactly that concrete integer on this execution path. If the Static Analyzer could not compute that, i doubt that this simple procedure will do better.

It might be that this is needed because you're looking at a wrong ExplodedNode on which the condition expression is either not computed yet or already dropped. In this case i'd prefer to try harder to find the right node, because getSVal(CondVarExpr, LCtx) is just so much more powerful.

Charusso updated this revision to Diff 185543.Feb 6 2019, 6:44 AM
  • Rebased.
  • Removed DeclRefExpr part from getConcreteIntegerValue().
  • Added constant integer early-return.
Charusso updated this revision to Diff 185544.Feb 6 2019, 6:45 AM
  • Added back DeclRefExpr part to getConcreteIntegerValue() for proof by example.
Charusso marked an inline comment as done.Feb 6 2019, 6:45 AM
Charusso updated this revision to Diff 186659.Feb 13 2019, 7:35 AM
  • Generic known condition evaluations.
  • check-clang-analyzer-z3 passes as expected:
Failing Tests (1):
    Clang :: Analysis/plist-macros.cpp

  Expected Passes    : 565
  Expected Failures  : 2
  Unsupported Tests  : 3
  Unexpected Failures: 1
NoQ added a comment.Feb 19 2019, 7:01 PM

I'll take a closer look in a few days, sorry for the delays! The progress you're making is fantastic and i really appreciate your work, but i have an urgent piece of work of my own to deal with this week.

In D53076#1403436, @NoQ wrote:

I'll take a closer look in a few days, sorry for the delays! The progress you're making is fantastic and i really appreciate your work, but i have an urgent piece of work of my own to deal with this week.

@NoQ, please take your time, that new bug-report logic is the end of my work on reporting business (as I mentioned I wanted to create the same patch), so that is more important than this.

Charusso updated this revision to Diff 189921.Mar 8 2019, 1:52 PM
Charusso retitled this revision from [analyzer] Enhance ConditionBRVisitor to write out more information to [analyzer] ConditionBRVisitor: enhance to write out more information.
  • Rebased.
NoQ added a comment.Mar 25 2019, 2:19 PM

Btw, did you try running this patch against big projects? 'Cause it's a high-impact change and we'd have to be careful with it.

I ran it on LLVM real quick and noticed that 97.3% (!) of reports contained at least one of the new notes, with number of HTML path pieces increased by 23.7% on average (note that the impact on plist reports would be much higher because they don't have any text for gray pieces).

I'll try to get back with some actual hand-picked examples to demonstrate my impressions. My overall feel was that it increases quality of life quite a bit by reducing the amount of time spent scrolling and jumping around the report - the necessary information becomes available exactly where you need it and it's wonderful. At the same time, i noticed a few corner-cases where the new pieces were duplicating the existing information or otherwise not helpful:

This doesn't look like a fundamental problem to me; it should be possible to improve upon these corner-cases by varying note messages; probably it also makes sense to skip some of them when they're clearly saying the same info 10 times in a row, but i'm not sure it's easy to decide where to apply that.

Because it might be a large amount of work, i'm thinking about committing this new feature off-by-default first (eg., -analyzer-config knowing-notes=[false|true]) and then doing follow-up commits (you already have quite a bit!) to make sure it's polished when we turn it on for everyone, WDYT?

include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
170–174

Does this absolutely need to be a member variable? Maybe pass it down the stack, like tookTrue? I understand that there are already a lot of flags passed down, but global mutable state is still more dangerous and can easily get out of hand.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
161–164

I don't see this triggered on tests. It looks to me that this function is for now only applied to DeclRefExprs that are always glvalues and therefore should never be evaluated to a nonloc value.

1962–1963

I advocate for more parentheses :)

NoQ added a comment.Mar 28 2019, 5:32 PM

Instead of having those as events similar to "Assuming", we could turn the new "Knowing" pieces into floating pop-ups - imagine you hover the mouse over a condition foo() and it says "foo() is false". That is, instead of PathDiagnosticsEventPiece, a new kind of piece can be introduced that is shown in scan-build as a pop-up similar to macro pop-ups. Like this:

We already have infrastructure for hover pop-up hints in HTML reports - we use it to display macro expansions, so it could be something similar, just of a different color.

Charusso updated this revision to Diff 195071.Apr 14 2019, 9:02 AM
Charusso marked 4 inline comments as done.
Charusso retitled this revision from [analyzer] ConditionBRVisitor: enhance to write out more information to [analyzer] ConditionBRVisitor: Enhance to write out more information.

Address comments:

  • Removed member variable.
  • Introducing PathDiagnosticPopUpPiece(D60670)

Extras:

  • Removed generic known condition evaluation printing
  • So that we could not hook integers, removed that early-return code as well.
Charusso added a comment.EditedApr 14 2019, 9:24 AM

Thanks you @NoQ for the great idea!

Without the duplicated BugReporter.cpp reports it could be an on-by-default patch, see:

Before:

After:

It is not perfect, but I like that and I have already got ideas how to use these new pieces.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
161–164

Good catch!

NoQ accepted this revision.Apr 16 2019, 4:30 PM

Yay!! I'm happy with this change and redirected all my wrath on its dependencies.

Charusso edited the summary of this revision. (Show Details)Wed, May 29, 12:42 PM
Charusso set the repository for this revision to rC Clang.
This revision was not accepted when it landed; it landed in state Needs Review.Wed, May 29, 1:04 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptWed, May 29, 1:04 PM