This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Basic support for multiplication and division in the constraint manager
Needs ReviewPublic

Authored by baloghadamsoftware on Jul 9 2018, 5:19 AM.

Details

Summary

Add support for multiplications and divisions of symbols when using comparison operators >, >=, < and <= in the range-based constraint manager.

Diff Detail

Event Timeline

All existing tests pass. I could not find the appropriate test file so I created a new one, but this is probably not the correct way.

@baloghadamsoftware @dkrupp @xazax.hun Interesting. What do you think about instead using Z3 cross-check functionality recently added, to solve this and all other similar problems instead?

george.karpenkov requested changes to this revision.Jul 9 2018, 10:57 AM

Let's discuss alternatives first.

This revision now requires changes to proceed.Jul 9 2018, 10:57 AM
NoQ added inline comments.Jul 9 2018, 3:36 PM
test/Analysis/constraint_manager_scale.c
78 ↗(On Diff #154578)

If int is 32-bit and x equal to 2³¹ - 1, we have x * 2 equal to -2 which is less than 8, while x is greater than 4.

The overall point is that writing this kind of code is *extremely* error-prone.
We are actually considering going in a different direction and doing a rollback for the previous rearrangement patches due to some issues.
Could you see whether Z3 visitor would meet your needs?

baloghadamsoftware added a comment.EditedJul 11 2018, 12:24 AM

@baloghadamsoftware @dkrupp @xazax.hun Interesting. What do you think about instead using Z3 cross-check functionality recently added, to solve this and all other similar problems instead?

As far as I know the purpose of the cross-check functionality is to execute Z3 only on actual bug report paths. This may remove false-positives (like the example above), but we surely cannot find new errors where multiplicative operations are used.

As we discussed with @dcoughlin at the Euro LLVM 2018 the community should consider a new constraint manager, which is in between the range-based one and the Z3 both feature and performance wise. However, this is something long term, until then we should improve the existing one as far as reasonable. I am sure that multiplicative operations can be solved similarly to the additive ones.

test/Analysis/constraint_manager_scale.c
78 ↗(On Diff #154578)

Thank you for pointing me out this! I uploaded this patch as WIP since I was sure I missed something because of modular arithmetic. I think this can be solved, even if not so easily. Maybe we should restrict the first version to division only?

The overall point is that writing this kind of code is *extremely* error-prone.

Every modifications on the core is extremely error-prone. That is why we must cross check it and only apply it if it is 100% mathematically correct. This patch is only WIP for now. Freezing the core in its current state helps us to not introduce new errors but also prevents improving the current functionality which is not yet sufficient for many of our customers. What I did is not hacking, multiplicative operations can be implemented a similar way to additive ones, they are just a bit more complex.

We are actually considering going in a different direction and doing a rollback for the previous rearrangement patches due to some issues.

What issues could it cause since it is guarded by an option? As for the rearrangement of the additive operations it was @NoQ's idea but I agree there. Actually I feel this rearrangement patch a bit of hacking, but I also agree with @NoQ that it may be useful for other checkers as well. That was the reason to put it into SValBuilder instead of the checker itself. As you may be aware of it I originally did the rearrangement inside the checker.

Could you see whether Z3 visitor would meet your needs?

We will look at it, but please read my previous answer.

This comment was removed by baloghadamsoftware.

What issues could it cause since it is guarded by an option?

I've meant the rearrangement. Actually, I would like to apologize for that point, I thought it was causing some issues, but it wasn't, we've just checked it yesterday.

As far as I know the purpose of the cross-check functionality is to execute Z3 only on actual bug report paths.

Yes.

This may remove false-positives (like the example above), but we surely cannot find new errors where multiplicative operations are used.

Do you have examples of those? In my understanding, (almost) all of the imprecisions in the solver lead purely to false positives.
The only example where you lose bugs is when you stop exploring due to reaching a (false) fatal error.

This may remove false-positives (like the example above), but we surely cannot find new errors where multiplicative operations are used.

Do you have examples of those? In my understanding, (almost) all of the imprecisions in the solver lead purely to false positives.
The only example where you lose bugs is when you stop exploring due to reaching a (false) fatal error.

This is not an example from the iterator checker (hopefully Adam will provide us with one example from there) but think of division by zero. The division by zero check will only warn if a value is constrained to be zero. The imprecision in the built in solver might result in failure to constrain a value to zero while the Z3 might be able to do that.

The imprecision in the built in solver might result in failure to constrain a value to zero while the Z3 might be able to do that.

Ah right, that's a good point, we only throw a null-pointer deref/division by zero when the pointer/value is definitely zero.

Do you have *motivating* examples though, like an actual code samples where you got a bug and the analyzer does not warn?
In my experience, we almost never warn about division by zero anyway,
and the null pointer dereference check is too noisy if anything.

MTC added a subscriber: MTC.Jul 11 2018, 7:21 PM
NoQ added a comment.Jul 12 2018, 1:54 PM

I'd also rather stick to integer arithmetic and avoid using floats even in intermediate calculations. It'd be hard to make sure that no rounding errors kick in if we use floats.

In D49074#1160793, @NoQ wrote:

I'd also rather stick to integer arithmetic and avoid using floats even in intermediate calculations. It'd be hard to make sure that no rounding errors kick in if we use floats.

Yes, I agree. I think an integer plus a flag (whether to take the reciprocal) is better since we only use n or 1/n for scaling.

baloghadamsoftware edited the summary of this revision. (Show Details)

Completely rewritten: works correctly for modular arithmetic (multiplication), works correctly for truncation (division), only uses integers, no floats.

@baloghadamsoftware @xazax.hun we've had very promising results with using Z3 for refutation so far, almost at no cost, see Mikhail's recent email on cfe-dev (and sometimes at a negative cost!).
Do you still not want to try it first? False negatives could be addressed as well separately (e.g. by giving a checkers an option to cross-check the report with a more expensive solver before throwing), currently false positives are a far more pressing problem.

Of course we would like to try the Z3 for refutation, we do not dispute its usefulness. This patch is about something really different. It extends the range-based constraint manager in a very natural way. Is the code of the range-based constraint manager frozen for some reason? If not, then why not add multiplication and division as well, where addition and subtraction is already supported?

where addition and subtraction is already supported?

Is it though? From what I recall we had to disable it from the default set of options due to the fact that it gives rise to the exponential running time in some cases.
I am somewhat afraid of similar unexpected side-effects from this and subsequent patches.

Disabled? No, that was a different thing. I have some ideas there as well, how to solve it, but it was the rearrangement in the SValBuilder. This one is one level lower, the ConstraintManager which only calculate the range for a symbol where we have and assumption for it. Here operations such as Symbol + Integer1 <= Integer2 have been supported for many years (+ can be - as well, similarily we can replace <= with any other comparison operator). This patch adds support for * and / so it extends support for expressions like Symbol * Integer0 + Integer1 <= Integer2. It has marginal impact on the performance, a constant and surely not exponential.

This also solves the examples in the comment of bug 32911.

Since modifications of the infrastructure are always error-prone I tried to generate tests to have full coverage for smaller numbers. However for full coverage I need type signed char and unsigned char, but these tests failed because of this bug: Bug 39138. If that bug is fixed we can have full coverage for both signed and unsigned char types which is more than enough. Of course the generated test should not be part of the standard test suite because of its execution time.

Rebased on the previous part. Tests updated for the now default eagerly assume mode.

Here is the proof of the algorithm for signed char and unsigned char types. The dumper which tests every n for the n*k <comparison operator> m relation and creates ranges from values satisfying the expression for every k and m produces exactly the same output as the generator which generates the ranges using the algorithm from the patch. One the machine I run them they take between 30 and 60 seconds to run.

I tested this on several open-source projects. Lots of false-positives disappeared. There are also some new findings, mainly new false-positives but not because of this feature but of "loops executed 0 times".

I evaluated this patch on different open-source projects, with Z3 refutation off and on:

I cannot always decide whether a bug is false positive or true positive. Most of them seem false positives. Honestly, we cannot except too many true positives in these projects. My findings:

  • Execution time is largely unaffected. The most interesting thing here is that in one case (PostGreS) Z3 refutation causes 7x execution time with the original range-based constraint manager, but only 1,5x with our patch. The reason for this could be that with the more advanced range-based constraint manager the analyzer does not produce the problematic false positive on which the Z3 refutation spends a huge time.
  • In case of Curl and also in case of BitCoin we found false positives not produced with our patch but produced with the original, even with Z3 enabled.
  • The extra false positives produced by the analyzer with our patch are not caused by bugs in the implementation of division and multiplication of ranges but the less "unknown" symbolic values. Without the patch the result of a multiplication or division was unknown so the analyzer could not reason about it. Now with the patch it has a valid range so the analyzer can reason about it and report bugs (which may be false positives as well).
baloghadamsoftware retitled this revision from [Analyzer] [WIP] Basic support for multiplication and division in the constraint manager to [Analyzer] Basic support for multiplication and division in the constraint manager.Jul 26 2019, 7:11 AM

We have applied this patch in our internal release and our users have used it for more than half a year without any problems: no crashes, no freezes and no suspicious false positives caused by it.

ping! Any chance of this patch being accepted? This patch can help some SA false positives.