This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Introduce reasoning about symbolic remainder operator
ClosedPublic

Authored by vsavchenko on May 18 2020, 4:20 AM.

Details

Summary

New logic tries to narrow possible result values of the remainder operation
based on its operands and their ranges. It also tries to be conservative
with negative operands because according to the standard the sign of
the result is implementation-defined.

Diff Detail

Event Timeline

vsavchenko created this revision.May 18 2020, 4:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2020, 4:20 AM

Here is a short summary of the performance testing I conducted across a bunch of open-source projects:

vimgittmuxrediscmakepytorchbitcoinprotobuf
Time (before)20m56s18m41s11m40s1h15m34s30m34s6h35m18s9m27s6m03s
Time (after)22m16s19m58s29m52s1h17m32s33m03s9h46m41s9m33s6m03s
Delta +6.4% +6.9% +155.8% +2.6% +8.1% +48.4% -1.1% +0.1%

Time (before) was measured on a commit before any of my solver changes.

This shows that performance tweaks discussed in various TODOs are indeed required to reduce the hit.

NoQ added inline comments.May 18 2020, 8:30 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
504

(Origin.From() + 1).isMinSignedValue() is another sufficient condition(?)

507

You mean zero, right?

vsavchenko marked 2 inline comments as done.May 18 2020, 8:39 AM
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
504

I'm sorry, I don't quite get what cases does this check cover. Can you please explain what you have in mind?

507

No, not always. It still can be signed at this point.

NoQ added inline comments.May 18 2020, 8:55 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
507

Ok, so i misunderstood. This function computes range of abs($x) aka |$x| given the range for $x, right?

vsavchenko marked an inline comment as done.May 18 2020, 9:13 AM
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
507

I guess I should fix my comments (and maybe the name for this function).
This function finds absolute maximum, i.e. the value C: |$x| <= C and returns the range [-C, C] for signed $xs and [0, C] for unsigned $xs.

So this new range is guaranteed to include the original range.

NoQ added inline comments.May 18 2020, 9:19 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
504

Aha, ok, nvm, a different issue then: For range [INT_MIN + 1, INT_MAX], the correct answer should be `[INT_MIN + 1, INT_MAX] (which is [-C, C] for C = INT_MAX]) rather than [INT_MIN, INT_MAX].

Fix code review remarks.

xazax.hun added inline comments.May 19 2020, 5:38 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
686

I wonder if we actually need this? I vaguely recall that we are doing a lot of simplifications during building symbolic expressions. I would be surprised if this identity is not handled there. (And in that case, probable this should be added there.) Or we might need a comment to explain why do we need this simplification at both places.

@vsavchenko
I've made some assumptions.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
504–514

I think you should swap if statements. I'll explain.
Let's consider the input is an uint8 range [42, 242] and you will return [0, 242] in the second if.
But if the input is an uint8 range [128, 242] you will return [0, 255] in the first if, because 128 is an equivalent of -128(INT8_MIN) in binary representation so the condition in the first if would be true.
What is the great difference between [42, 242] and [128, 242] to have different results? Or you've just missed this case?

P.S. I think your function's name doesn't fit its body, since absolute value is always positive (without sign) from its definition, but you output range may have negative values. You'd better write an explanation above the function and rename it.

526

As for me, the last reason fully covers previous special cases, so you can omit those ones, thus simplify the comment.

712

Extend the comment, please, why we should move bounds to zero at all.

737

Is it OK to return this rangeset in case when one of operands(or both) is negative, since this rangeset can vary from specific implementation?

Move 0 % x case to SValBuilder

vsavchenko marked 5 inline comments as done.May 27 2020, 9:10 AM
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
504–514

It is a valid point, I will add this test and change this behaviour!

The name is confusing indeed, maybe you have any ideas what would be more appropriate?

526

I really want to be clear about the first two cases to explain why this works for any sign of From and To

686

Yeah, we don't do it in SValBuilder, but it is definitely a better place for that particular case. I'll move it.

712

Good point!

737

Yes, it is a conservative range for any ranges because only the sign of the operation is specific to different implementations

vsavchenko marked an inline comment as done.May 27 2020, 9:11 AM

Fix code review remarks

vsavchenko marked 3 inline comments as done.May 27 2020, 10:02 AM
vsavchenko marked an inline comment as done.
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
504–514

@NoQ , @ASDenysPetrov what do you think about this name instead (i.e. getSymmetricalRange).

Performance-wise, I've investigated huge slowdowns on tmux and pytorch.

  • pytorch build produces a lot of warnings and simply trashed my terminal. I guess one time it had more troubles with displaying all that than the other. Here is a table with new times:
pytorch
Time (before)2h21m33s
Time (after)2h19m23s
Delta -1.5%

As you can see, these numbers are way smaller than the original ones.

  • tmux is a much smaller project, so I decided to run it 20 times for each case.


After consistently shows slower runtimes, but the overall difference (for median times) is only +3%.

I believe that as of now we can submit these modifications as is and explore performance optimizations later if needed.

NoQ accepted this revision.May 28 2020, 3:45 AM

Aha, so performance regressions on real code weren't real, that's a relief :)

I believe that as of now we can submit these modifications as is and explore performance optimizations later if needed.

I still encourage you to explore the tests we have from our previous attempts to simplify expressions recursively without memoization (test/Analysis/hangs.c). I'm asking because these aren't all that artificial: this kind of code was previously reported by a frustrated user as "the analyzer started hanging on my code". Like, please replace a bunch of +es with &/|/% and see if this causes your code to perform exponentially over the size of the program. If so, i'd rather have us hurry up and implement memoization.

The math in this patch looks great, thanks!

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
506–507

I suggest not trying to express signed types and unsigned types in a single formula, the reader will have to unwrap it back into the two cases anyway in order to understand what's going on.

The following would imho be easier to read: "If T is signed, return the smallest range [-x..x] that covers the original range, or [-min(T), max(T)] if the aforementioned symmetric range doesn't exist due to original range covering min(T)). If T is unsigned, return the smallest range [0..x] that covers the original range".

This revision is now accepted and ready to land.May 28 2020, 3:45 AM
vsavchenko marked an inline comment as done.May 28 2020, 5:55 AM
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
506–507

That is a perfect explanation, thanks!

NoQ added a comment.May 28 2020, 7:35 AM
In D80117#2059567, @NoQ wrote:

I believe that as of now we can submit these modifications as is and explore performance optimizations later if needed.

I still encourage you to explore the tests we have from our previous attempts to simplify expressions recursively without memoization (test/Analysis/hangs.c). I'm asking because these aren't all that artificial: this kind of code was previously reported by a frustrated user as "the analyzer started hanging on my code". Like, please replace a bunch of +es with &/|/% and see if this causes your code to perform exponentially over the size of the program. If so, i'd rather have us hurry up and implement memoization.

Ok, looks like my memories on this subject are heavily messed up. The actual problem that made us hang was solved by D47155. This is a dumb bug that would have been avoided if we had memoization but it doesn't require memoization to be avoided and it doesn't look like this code risks repeating that mistake.

Then, our experience with memoization in D47402 wasn't as good as i expected; it turned out that there are other exponential parts of the analysis in such cases that we still couldn't avoid. We should probably still do it (given how difficult it is now to identify these "other parts" that are exponential, i'd rather not add more such parts consciously) but i guess it's not that much of a blocker.

Fix code review remarks

This revision was automatically updated to reflect the committed changes.