This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Eliminate analyzer limitations on symbolic constraint generation
ClosedPublic

Authored by ddcc on Jan 20 2017, 8:09 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

ddcc created this revision.Jan 20 2017, 8:09 AM
ddcc updated this revision to Diff 85139.Jan 20 2017, 8:09 AM

Fix rebase

ddcc updated this revision to Diff 85314.Jan 22 2017, 8:24 PM

Rebase

xazax.hun edited edge metadata.May 9 2017, 2:56 AM

Do you have a benchmark how this affects the performance and memory usage when the old constraint manager is used? I wonder if most of people are using the old one, it might make no sense to generate symbolic expressions that can not be solved anyway.
Maybe the analyzer could only generate these symbolic expressions when Z3 is used?

ddcc added a comment.EditedMay 10 2017, 3:26 PM

It's been a while since I looked at the code, but I don't believe that all of the new constraints are necessarily unsupported by the current range constraint manager. Rather, they were just not being generated by the SimpleSValBuilder. The changes pass the testsuite for both the Range and Z3 constraint managers, without any special testcase handling that is Z3 specific.

Here are some runtime performance numbers on the testsuite:

Pre-Commit
RangeConstraintManager: 8.24s
Z3ConstraintManager: 247.29s

Post-Commit
RangeConstraintManager: 10.23s
Z3ConstraintManager: 250.35s

And for memory usage (max RSS):

Pre-Commit
edges-new.mm (RangeConstraintManager): 42688k
edges-new.mm (Z3ConstraintManager): 52916k
malloc-plist.c (RangeConstraintManager): 39812k
malloc-plist.c (Z3ConstraintManager): 50544k

Post-Commit
edges-new.mm (RangeConstraintManager): 43032k
edges-new.mm (Z3ConstraintManager): 52956k
malloc-plist.c (RangeConstraintManager): 40204k
malloc-plist.c (Z3ConstraintManager): 50364k

Edit: The Z3ConstraintManager runtime numbers are inflated, because I forgot that llvm-lit now runs both constraint managers with the new test setup. So the actual time for Z3 needs to have the RangeConstraintManager time subtracted out.

ddcc updated this revision to Diff 98545.May 10 2017, 3:34 PM

Rebase

ddcc updated this revision to Diff 99392.May 17 2017, 10:10 PM

Address SVal simplification from D31886

ddcc added a comment.EditedMay 17 2017, 10:18 PM

I've updated this revision to account for the recent SVal simplification commit by @NoQ, but now there is an exponential blowup problem that prevents testcase PR24184.cpp from terminating, due to an interaction between Simplifier::VisitNonLocSymbolVal() and SValBuilder::makeSymExprValNN() when expanding the loop in case 3. I'm not quite sure what the best way to resolve this is; from some blind testing, I ended up needing to set MaxComp to 10 to force termination in a reasonable amount of time, but this restricts its usefulness for generating other symbolic constraints.

ddcc updated this revision to Diff 99521.May 18 2017, 10:36 PM

Fix typo in SymbolCast simplification

zaks.anna added inline comments.Jun 14 2017, 4:09 PM
lib/StaticAnalyzer/Core/SValBuilder.cpp
356 ↗(On Diff #99521)

I am concerned that removing the guard will regress performance in the vanilla case. (Note that Z3 support as well as taint are not on by default.)

I am curious how much of the regression you've measured could be gained back if we make this conditional.

363 ↗(On Diff #99521)

Reducing the MaxComp is going to regress taint analysis..

I've updated this revision to account for the recent SVal simplification commit by @NoQ,

Which commit introduced the regression?

but now there is an exponential blowup problem that prevents testcase PR24184.cpp from terminating,

What triggers the regression? Removing the if statement above? Does the regression only effect the Z3 "mode" (I am guessing not since you said "due to an interaction between Simplifier::VisitNonLocSymbolVal() and SValBuilder::makeSymExprValNN()")?

ddcc added inline comments.Jun 15 2017, 5:09 PM
lib/StaticAnalyzer/Core/SValBuilder.cpp
356 ↗(On Diff #99521)

To clarify, the changes made in this patch aren't specific to Z3 support, especially simplifying SymbolCast and IntSymExpr. With the exception of PR24184.cpp and plist-macros.cpp, all testcases pass with both the default and Z3 constraint managers. However, creating additional constraints does have performance overhead, and it may be useful to consider the parameters for gating this functionality.

On a combined execution (Range + Z3) through the testcases, except the two mentioned above, the runtime is 327 sec with this patch applied, and 195 sec without this patch applied. On a separate execution through the testcases with only the Z3 constraint manager, I get runtimes 320 and 191, respectively.

For testing purposes, I also tried the following code, which has combined runtime 311 sec, but loses the accuracy improvements with the Range constraint manager on bitwise-ops.c, conditional-path-notes.c, explain-svals.cpp, and std-c-library-functions.c.

ConstraintManager &CM = getStateManager().getConstraintManager();
if (!State->isTainted(RHS) && !State->isTainted(LHS) && !CM.isZ3())
363 ↗(On Diff #99521)

Reducing the MaxComp is going to regress taint analysis..

I think the original intention was to increase MaxComp, not decrease it, but I will restore the original value here.

What triggers the regression? Removing the if statement above? Does the regression only effect the Z3 "mode"

No, the regression isn't specifically due to this code, but with @NoQ 's patch for SVal simplification (rL300178), and this commit extending it to handle SymbolCast and IntSymExpr, the cast of ST * used in the loop of case 3 of PR24184.cpp becomes "simplified" (inlined) repeatedly on each recursive iteration through Simplifier::VisitNonLocSymbolVal() -> SValBuilder::makeSymExprValNN(), causing a big slowdown in runtime performance.

The naive way to prevent it is to drop MaxComp (but this isn't reasonable, since it needs to be absurdly low, e.g. 10). Alternatively, simplification for SymbolCast can be dropped from this commit (but it will eliminate some of the other analysis improvements), or, most likely, introduce another parameter to reduce recursion between these two functions.

zaks.anna added inline comments.Jun 17 2017, 12:04 AM
lib/StaticAnalyzer/Core/SValBuilder.cpp
356 ↗(On Diff #99521)

Thanks for the explanation!

Regressing the current default behavior is my main concern. By looking at the numbers you provided before (Pre-commit and Post-Commit for RangeConstraintManager), it seems that this patch will introduce a performance regression of about 20%, which is large. But you are pointing out that there are improvements to the modeling as well and those are captured by the updated tests.

Here are a couple of ideas that came into mind on gaining back performance for the RangeConstraintManager case:

  • Can we drop computing these for some expressions that we know the RangeConstraintManager will not utilize?
  • We could implement the TODO described below and possibly also lower the MaxComp value. This means that instead of keeping a complex expression and constraints on every symbol used in that expression, we would conjure a new symbol and associate a new constrain derived from the expression with it. (Would this strategy also work for the Z3 case?)

I have to point out that this code is currently only used by taint analysis, which is experimental and the current MaxComp value is targeted at that. We would probably need to refine the strategy here if we want to apply this logic to a general case. It's possible that different MaxComp should be used for different cases.

It would be valuable to run this on real code and measure how the number of reports we get differs depending on these values.

NoQ edited edge metadata.Jun 17 2017, 2:33 AM

Hmm, curious, having a look. A couple of blind guesses before i actually understand what's going on:

(1) The simplifySVal() code has its own complexity threshold:

1060     SVal VisitNonLocSymbolVal(nonloc::SymbolVal V) {
1061       // Simplification is much more costly than computing complexity.
1062       // For high complexity, it may be not worth it.
1063       if (V.getSymbol()->computeComplexity() > 100)
1064         return V;
1065       return Visit(V.getSymbol());
1066     }

Would it help to decrease 100 to 10 here?

(2) With RangeConstraintManager, simplification is not entirely idempotent: we may simplify a symbol further when one of its sub-symbols gets constrained to a constant in a new state. However, apart from that, we might be able to avoid re-simplifying the same symbol by caching results based on the (symbol, state's constraint data) pair. Probably it may work with Z3 as well.

ddcc added a comment.Jun 20 2017, 11:08 AM

Can we drop computing these for some expressions that we know the RangeConstraintManager will not utilize?

It's possible, though I'm not sure what the actual limitations of the RangeConstraintManager are, since there are a lot of intermediate steps that attempt to transform unsupported expressions into supported ones.

We could implement the TODO described below and possibly also lower the MaxComp value. This means that instead of keeping a complex expression and constraints on every symbol used in that expression, we would conjure a new symbol and associate a new constrain derived from the expression with it. (Would this strategy also work for the Z3 case?)

I think it's feasible, but would probably require some more to code to handle SymbolConjured (right now all SymbolData are treated as variables).

Would it help to decrease 100 to 10 here?

Yes, changing it to 10 eliminates the excessive recursion; combined runtime drops to 282 sec on the testcases (~8 sec for Range only, ~271 sec for Z3 only; doesn't sum due to separate executions). This appears to be the most straightforward solution.

(2) With RangeConstraintManager, simplification is not entirely idempotent: we may simplify a symbol further when one of its sub-symbols gets constrained to a constant in a new state. However, apart from that, we might be able to avoid re-simplifying the same symbol by caching results based on the (symbol, state's constraint data) pair. Probably it may work with Z3 as well.

Yeah, that would also fix this issue. In general, I think there's plenty of room for improvements from caching, especially for queries to e.g. getSymVal().

ddcc updated this revision to Diff 103239.Jun 20 2017, 11:09 AM

Rebase, decrease simplification complexity

ddcc added a comment.EditedJun 20 2017, 11:12 AM

I forgot to mention that the only remaining (Z3) test failure is on plist-macros.cpp; there is a Assuming condition is true path note that only appears with the RangeConstraintManager but not with Z3ConstraintManager, and I can't #ifdef it because the annotations are checked by FileCheck.

NoQ added a comment.Jul 6 2017, 4:23 AM

Because this patch affects the default behavior, i think it's necessary to understand the performance impact on a large codebase. I may lend a hand eventually, but if you're in a hurry you could probably at least run an overnight analysis over llvm and sqlite with range constraint manager and see how performance changes with this patch, and ideally also if we get new reports or lose old reports.

In D28953#785679, @ddcc wrote:

I forgot to mention that the only remaining (Z3) test failure is on plist-macros.cpp; there is a Assuming condition is true path note that only appears with the RangeConstraintManager but not with Z3ConstraintManager, and I can't #ifdef it because the annotations are checked by FileCheck.

This test case could be separated into a different file if nothing else helps.

lib/StaticAnalyzer/Core/SValBuilder.cpp
363 ↗(On Diff #99521)

We talked a bit and we're pretty much fine with reduce MaxComp to 10 if it's worth it in terms of performance.

lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
997–1003 ↗(On Diff #103239)

These two ifs are completely identical (apart from the fact that the one you added ignores symbolic region values due to explicit NonLoc cast). In fact getAsSymbol() (with its optional argument equal to false) and getAsSymExpr() are equivalent; we need to refactor this API.

I believe that the FIXME should in fact be addressed in RangeConstraintManager, in which currently getSymVal() works for atomic symbols only. Or, alternatively, we should stick simplifySVal() here - i think i'd have a look at this soon.

In any case, i suggest removing this change for now, unless i missed anything and it does in fact have an effect.

test/Analysis/plist-macros.cpp
2 ↗(On Diff #103239)

path-diagnostics-alternate=ture

Ouch. Thanks for fixing this.

Maybe it'd be great to implement an option that enables validating analyzer config option values, and turn this option on in %clang_analyze_cc1.

ddcc updated this revision to Diff 105913.Jul 10 2017, 1:31 PM
ddcc marked an inline comment as done.

Drop duplicate code

ddcc added a comment.Jul 10 2017, 1:31 PM

I tested the following software, both before and after applying this patch, using RangeConstraintManager.

Software, Version, Compile Time (before), Bugs Reported (before), Compile Time (after), Bugs Reported (after)
openssl, 1.1.0f, 11 min, 126, 12 min, 126
sqlite, 3.18.2, 31 min, 86, 31 min, 82
wireshark, 2.2.7, 105 min, 109, 119 min, 108

For sqlite, the differences of four reported bugs all appear to be false positives, but for wireshark, the one difference appears to be a false negative.

lib/StaticAnalyzer/Core/SValBuilder.cpp
363 ↗(On Diff #99521)

Just to clarify, the current change from 100 to 10 in simplifySVal::SimplyVisitNonLocSymbolVal resolves the problem; the value of MaxComp here is unchanged.

test/Analysis/plist-macros.cpp
2 ↗(On Diff #103239)

I agree that'd be useful, but I think that it should be a separate future patch.

ddcc added a comment.Jul 10 2017, 1:45 PM

I've also uploaded the results to https://dcddcc.com/csa

NoQ accepted this revision.Jul 11 2017, 1:36 AM

This looks great, thank you very much.

test/Analysis/plist-macros.cpp
2 ↗(On Diff #103239)

Sure sure.

This revision is now accepted and ready to land.Jul 11 2017, 1:36 AM
ddcc updated this revision to Diff 106084.Jul 11 2017, 12:46 PM

Split plist-macros.cpp, and update analyzer_test.py to support tests that require not z3

This revision was automatically updated to reflect the committed changes.

r307833 is causing the sanitizer-x86_64-linux-fast buildbot to fail during clang regression tests with the following error:

clang: /mnt/b/sanitizer-buildbot3/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Core/RangedConstraintManager.cpp:55: virtual clang::ento::ProgramStateRef clang::ento::RangedConstraintManager::assumeSym(clang::ento::ProgramStateRef, clang::ento::SymbolRef, bool): Assertion `Loc::isLocType(SSE->getLHS()->getType())' failed.

Could you please revert and/or fix this revision?

ddcc added a comment.Jul 12 2017, 2:48 PM
This comment was removed by ddcc.