Page MenuHomePhabricator

[Analyzer] SValBuilder Comparison Rearrangement (with Restrictions and Analyzer Option)
ClosedPublic

Authored by baloghadamsoftware on Jan 11 2018, 1:44 AM.

Details

Summary

This patch is a "light" version of D35109:

Since the range-based constraint manager (default) is weak in handling comparisons where symbols are on both sides it is wise to rearrange them to have symbols only on the left side. Thus e.g. A + n >= B + m becomes A - B >= m - n which enables the constraint manager to store a range m - n .. MAX_VALUE for the symbolic expression A - B. This can be used later to check whether e.g. A + k == B + l can be true, which is also rearranged to A - B == l - k so the constraint manager can check whether l - k is in the range (thus greater than or equal to m - n).

The restriction in this version is the the rearrangement happens only if both the symbols and the concrete integers are within the range [min/4 .. max/4] where min and max are the minimal and maximal values of their type.

The rearrangement is not enabled by default. It has to be enabled by using -analyzer-config aggressive-relational-comparison-simplification=true.

Co-author of this patch is Artem Dergachev (NoQ).

Diff Detail

Repository
rL LLVM

Event Timeline

baloghadamsoftware edited the summary of this revision. (Show Details)Jan 11 2018, 6:15 AM
NoQ added a comment.Jan 17 2018, 10:16 AM

I think it'd be fine to do the rearrangement for additive ops without the option check, as we discussed. I.e., rearrange when it's either an additive op, or both the flag is set and the values are overflow-clamped. And remove the clamp checks (which are presumably heavy) for the additive ops, as they're not needed.

include/clang/StaticAnalyzer/Core/AnalyzerOptions.h
591–596 ↗(On Diff #129406)

Fixed some typos:

/// Returns true if SValBuilder should rearrange comparisons of symbolic
/// expressions which consist of a sum of a symbol and a concrete integer
/// into the format where symbols are on the left-hand side and the integer is on the
/// right. This is only done if both concrete integers are greater than or equal to
/// the quarter of the minimum value of the type and less than or equal to the
/// quarter of the maximum value of that type.

(line breaks might need fixing)

test/Analysis/svalbuilder-rearrange-comparisons.c
5 ↗(On Diff #129406)

I guess you're not going to need the printState thing once these tests are settled down.

Updated according to the comments.

include/clang/StaticAnalyzer/Core/AnalyzerOptions.h
597 ↗(On Diff #130403)

High level comment: can you list all rewriting rules you are applying in a formula explicitly in a comment?
From the phabricator discussion I see that you are applying

A + n >= B + m -> A - B >= n + m // A, B symbolic, n and m are integers

but from the code it seems that you are applying more canonicalization rules?

lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
317 ↗(On Diff #130403)

Can we have a better function name here? I can't think of one straight away, but "clamped" is not very self-descriptive. isWithinConstantOverflowBounds?

326 ↗(On Diff #130403)

Would just division produce the same result? Also probably it's better to make "4" a constant, at least with #define

330 ↗(On Diff #130403)

6 lines of branching is probably better expressed as

if (!isa<DefinedSVal>(IsCappedFromAbove) || State->assume(*dyn_cast<DefinedSVal>(IsCappedFromAbove), false))
   return false
337 ↗(On Diff #130403)

326-335 duplicates 338-346.
Perhaps we could have

static bool isCappedFrom(bool Above, llvm::APSInt bound, ...)

?

352 ↗(On Diff #130403)

Again, could we do better with a function name?

357 ↗(On Diff #130403)

I think you want

return (I <= Max) && (I >= -Max)

instead of 358-365

376 ↗(On Diff #130403)

Is it beneficial to do this though? At the point where decomposeSymbol is called we are still checking whether the rearrangement could be performed, so maybe just returning a false flag would be better?

403 ↗(On Diff #130403)

I think this is a separate rewrite, which is better performed in a separate function.
If you move it to decomposeSymbol or to a separate function, you would get a lot of simplifications:

  1. This function would be performing only a single rewrite.
  2. You would no longer need to take Lop and Rop as parameters
  3. At that point, two separate cases would be clearly seen: Op is a comparison operator, or Op is an additive operator.

I would separate those two rewrites in separate functions, distinguishing between them at the callsite.

429 ↗(On Diff #130403)

It seems that having a concrete positive RHS is a part of your rewrite rule. In that case, that should be documented in a high-level overview of the rewrite rules.

432 ↗(On Diff #130403)

I think this could be shortened and made more explicit by constructing the LHS and RHS first, and then reversing both and the comparison operator if RHS is negative.

472 ↗(On Diff #130403)

I am confused why the option shouldAggressivelySimplifyRelationalComparison is checked here. Do we rewrite cases where Op is additive even if the option is not checked? It's generally better to check the flag outside of the rearranging function, so that the high-level logic can be seen.

489 ↗(On Diff #130403)

This check is weird: you have just set SingleTy to ResultTy at line 477.
Instead of this whole if-block (from line 487), why not just set SingleTy above line 473. where you have already checked that the operator is performing comparison?

494 ↗(On Diff #130403)

I would also move this assert up, below line 481.

497 ↗(On Diff #130403)

this check should be inside the block at line 477, since it does not make sense for relational comparisons.

508 ↗(On Diff #130403)

Why isComparisonOp() check is done in 512-516, but not here? Additionally, can this check be factored out into a function?

774 ↗(On Diff #130403)

I would expect that checking the analyzer option would be performed here?

test/Analysis/svalbuilder-rearrange-comparisons.c
19 ↗(On Diff #130403)

assert would express intent better

28 ↗(On Diff #130403)

Can we also have integration-like tests where rearrangement would affect observable behavior?

Also, clang_analyzer_dump is a debugging function, and I am not sure whether we should rely on its output in tests.

Thank you for your comments. Since the original author of this particular code is Artem, I think he will answer your questions.

NoQ added a comment.Jan 24 2018, 10:41 AM

George's style comments are usually super spot-on. Please feel free to improve my code. Also it was only written as a proof-of-concept because i failed to explain my approach with natural language, so it definitely needs polishing. I'd let you know when i disagree with anything :)

lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
472 ↗(On Diff #130403)

Do we rewrite cases where Op is additive even if the option is not checked?

Yes. Because for additive ops the presumably-expensive clamped-by-max/4 check is not necessary (and is indeed skipped below).

This should be documented though.

test/Analysis/svalbuilder-rearrange-comparisons.c
19 ↗(On Diff #130403)

Yeah, but it's also a bit more painful to write in tests (i.e. __builtin_assert() and macros).

28 ↗(On Diff #130403)

Yeah, this test would be painful to maintain if we change the dump output. I guess we definitely need more integration tests.

I still wish there were ways to write "this" sort of tests without relying on dumps. Eg.:

clang_analyzer_denote(x, "$x");
clang_analyzer_denote(y, "$y");
clang_analyzer_express(x == y); // expected-warning{{($x - $y) == 0}}

It should be easy to make an SValVisitor that does that.

Given how painful this would be to actually update these numerous tests, i'm fine with leaving a comment explaining the above, as a bit of technical debt - for anyone who would need to update them in the future.

We still need more integration tests though.

100 ↗(On Diff #130403)

This one, for instance, is essentially an integration test. You can easily replace the last dump with clang_analyzer_eval(x == y) and the test would still test the same thing. All tests here that expect a concrete integer in the last check should probably be just converted to clang_analyzer_eval tests.

lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
330 ↗(On Diff #130403)

SVal is not a pointer, so isa<>() and dyn_cast<>() does not work here.

376 ↗(On Diff #130403)

Failing to decompose a symbol does not mean the rearrangement could not be performed. If we have just A on the left side instead of A+m or A-m, then we regard it as A+0.

432 ↗(On Diff #130403)

Are you sure it would be shorter? Anyway, how to reverse a SymSymExpr?

Updated according to the comments.

baloghadamsoftware marked 12 inline comments as done.Mar 19 2018, 9:44 AM
baloghadamsoftware added inline comments.
lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
326 ↗(On Diff #130403)

I changed it to division, but I am not sure if we a constant would be more readable here than 4.

337 ↗(On Diff #130403)

isInRelation(), because the opcode itself is passed instead of a bool.

NoQ accepted this revision.Mar 27 2018, 4:08 PM

This looks good with a super tiny nit regarding comments about signed integers. George, are you happy with the changes? (:

include/clang/StaticAnalyzer/Core/AnalyzerOptions.h
675–677 ↗(On Diff #138954)

I believe that we should mention that the integers are signed.

lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
330 ↗(On Diff #130403)

.getAs<>() would have been similar, but i'm not fond of double checks either.

429 ↗(On Diff #130403)

That's an invariant that we're maintaining in other places as well.

774 ↗(On Diff #130403)

Nope, not yet. The flag is only for range-checked rearrangements.

This revision is now accepted and ready to land.Mar 27 2018, 4:08 PM
baloghadamsoftware marked an inline comment as done.

Comment fixed.

baloghadamsoftware marked an inline comment as done.Mar 28 2018, 3:47 AM
baloghadamsoftware added inline comments.
include/clang/StaticAnalyzer/Core/AnalyzerOptions.h
675–677 ↗(On Diff #138954)

Also the symbols were forgotten. They must be in the range as well and must also be signed integers.

baloghadamsoftware marked an inline comment as done.Apr 9 2018, 1:32 AM

George or Devin, please accept it or give me some feedback if not. Since this patch affects the core infrastructure I think it is wise to merge it only if at least two of you have accepted it. Artem is one, I need a second one as well.

george.karpenkov accepted this revision.Apr 9 2018, 10:51 AM

LGTM. Sorry for the delay, I thought that a single acceptance was sufficient.

This revision was automatically updated to reflect the committed changes.
NoQ added a comment.Apr 11 2018, 3:32 PM

We've found a crash on our internal buildbot, would you like to have a look?:

$ cat repro.c

int foo(int x, int y) {
  short a = x - 1U;
  return a - y;
}

$ clang -cc1 -analyze -analyzer-checker=core repro.c

Assertion failed: (APSIntType(LInt) == BV.getAPSIntType(SymTy) && "Integers are not of the same type as symbols!"), function doRearrangeUnchecked, file /Users/adergachev/llvm/tools/clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp, line 383.

1U can be replaced with 1L or 1UL.

NoQ added a comment.Jul 18 2018, 3:57 PM

There are still performance regressions coming in, and this time it doesn't look like it's my fault: https://bugs.llvm.org/show_bug.cgi?id=38208

I suspect that this might be because we aren't enforcing complexity thresholds over all the symbols we're constructing, but that's not certain, we need to get to the bottom of it some day.

I suggest reverting the patch or putting it behind an off-by-default flag until we debug these cases.

In D41938#1167313, @NoQ wrote:

There are still performance regressions coming in, and this time it doesn't look like it's my fault: https://bugs.llvm.org/show_bug.cgi?id=38208

I suspect that this might be because we aren't enforcing complexity thresholds over all the symbols we're constructing, but that's not certain, we need to get to the bottom of it some day.

I suggest reverting the patch or putting it behind an off-by-default flag until we debug these cases.

The flag is off by default. Except the rearrangement of additive operations. Should we put it also behind the flag?

The flag is off by default. Except the rearrangement of additive operations. Should we put it also behind the flag?

I did it as a temporary quick fix: D49536.