New logic tries to narrow possible result values of the remainder operation
based on its operands and their ranges. It also tries to be conservative
with negative operands because according to the standard the sign of
the result is implementation-defined.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Here is a proof in Z3:
https://gist.github.com/SavchenkoValeriy/559ca923b050f2c01e340c1be543b7e0
Here is a short summary of the performance testing I conducted across a bunch of open-source projects:
vim | git | tmux | redis | cmake | pytorch | bitcoin | protobuf | |
---|---|---|---|---|---|---|---|---|
Time (before) | 20m56s | 18m41s | 11m40s | 1h15m34s | 30m34s | 6h35m18s | 9m27s | 6m03s |
Time (after) | 22m16s | 19m58s | 29m52s | 1h17m32s | 33m03s | 9h46m41s | 9m33s | 6m03s |
Delta | +6.4% | +6.9% | +155.8% | +2.6% | +8.1% | +48.4% | -1.1% | +0.1% |
Time (before) was measured on a commit before any of my solver changes.
This shows that performance tweaks discussed in various TODOs are indeed required to reduce the hit.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
507 | Ok, so i misunderstood. This function computes range of abs($x) aka |$x| given the range for $x, right? |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
507 | I guess I should fix my comments (and maybe the name for this function). So this new range is guaranteed to include the original range. |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
504 | Aha, ok, nvm, a different issue then: For range [INT_MIN + 1, INT_MAX], the correct answer should be `[INT_MIN + 1, INT_MAX] (which is [-C, C] for C = INT_MAX]) rather than [INT_MIN, INT_MAX]. |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
686 | I wonder if we actually need this? I vaguely recall that we are doing a lot of simplifications during building symbolic expressions. I would be surprised if this identity is not handled there. (And in that case, probable this should be added there.) Or we might need a comment to explain why do we need this simplification at both places. |
@vsavchenko
I've made some assumptions.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
504–514 | I think you should swap if statements. I'll explain. P.S. I think your function's name doesn't fit its body, since absolute value is always positive (without sign) from its definition, but you output range may have negative values. You'd better write an explanation above the function and rename it. | |
526 | As for me, the last reason fully covers previous special cases, so you can omit those ones, thus simplify the comment. | |
712 | Extend the comment, please, why we should move bounds to zero at all. | |
737 | Is it OK to return this rangeset in case when one of operands(or both) is negative, since this rangeset can vary from specific implementation? |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
504–514 | It is a valid point, I will add this test and change this behaviour! The name is confusing indeed, maybe you have any ideas what would be more appropriate? | |
526 | I really want to be clear about the first two cases to explain why this works for any sign of From and To | |
686 | Yeah, we don't do it in SValBuilder, but it is definitely a better place for that particular case. I'll move it. | |
712 | Good point! | |
737 | Yes, it is a conservative range for any ranges because only the sign of the operation is specific to different implementations |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
504–514 | @NoQ , @ASDenysPetrov what do you think about this name instead (i.e. getSymmetricalRange). |
Performance-wise, I've investigated huge slowdowns on tmux and pytorch.
- pytorch build produces a lot of warnings and simply trashed my terminal. I guess one time it had more troubles with displaying all that than the other. Here is a table with new times:
pytorch | |
---|---|
Time (before) | 2h21m33s |
Time (after) | 2h19m23s |
Delta | -1.5% |
As you can see, these numbers are way smaller than the original ones.
- tmux is a much smaller project, so I decided to run it 20 times for each case.
After consistently shows slower runtimes, but the overall difference (for median times) is only +3%.
I believe that as of now we can submit these modifications as is and explore performance optimizations later if needed.
Aha, so performance regressions on real code weren't real, that's a relief :)
I believe that as of now we can submit these modifications as is and explore performance optimizations later if needed.
I still encourage you to explore the tests we have from our previous attempts to simplify expressions recursively without memoization (test/Analysis/hangs.c). I'm asking because these aren't all that artificial: this kind of code was previously reported by a frustrated user as "the analyzer started hanging on my code". Like, please replace a bunch of +es with &/|/% and see if this causes your code to perform exponentially over the size of the program. If so, i'd rather have us hurry up and implement memoization.
The math in this patch looks great, thanks!
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
506–507 | I suggest not trying to express signed types and unsigned types in a single formula, the reader will have to unwrap it back into the two cases anyway in order to understand what's going on. The following would imho be easier to read: "If T is signed, return the smallest range [-x..x] that covers the original range, or [-min(T), max(T)] if the aforementioned symmetric range doesn't exist due to original range covering min(T)). If T is unsigned, return the smallest range [0..x] that covers the original range". |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
506–507 | That is a perfect explanation, thanks! |
Ok, looks like my memories on this subject are heavily messed up. The actual problem that made us hang was solved by D47155. This is a dumb bug that would have been avoided if we had memoization but it doesn't require memoization to be avoided and it doesn't look like this code risks repeating that mistake.
Then, our experience with memoization in D47402 wasn't as good as i expected; it turned out that there are other exponential parts of the analysis in such cases that we still couldn't avoid. We should probably still do it (given how difficult it is now to identify these "other parts" that are exponential, i'd rather not add more such parts consciously) but i guess it's not that much of a blocker.
(Origin.From() + 1).isMinSignedValue() is another sufficient condition(?)