Dear All,
We have been looking at the following problem, where any code after the constant bound loop is not analyzed because of the limit on how many times the same block is visited, as described in bugzillas #7638 and #23438. This problem is of interest to us because we have identified significant bugs that the checkers are not locating. We have been discussing a solution involving ranges as a longer term project, but I would like to propose a patch to improve the current implementation.
Example issue:
for (int i = 0; i < 1000; ++i) {...something...} int *p = 0; *p = 0xDEADBEEF;
The proposal is to go through the first and last iterations of the loop. The patch creates an exploded node for the approximate last iteration of constant bound loops, before the max loop limit / block visit limit is reached. It does this by identifying the variable in the loop condition and finding the value which is “one away” from the loop being false. For example, if the condition is (x < 10), then an exploded node is created where the value of x is 9. Evaluating the loop body with x = 9 will then result in the analysis continuing after the loop, providing x is incremented.
The patch passes all the tests, with some modifications to coverage.c, in order to make the ‘function_which_gives_up’ continue to give up, since the changes allowed the analysis to progress past the loop.
This patch does introduce possible false positives, as a result of not knowing the state of variables which might be modified in the loop. I believe that, as a user, I would rather have false positives after loops than do no analysis at all. I understand this may not be the common opinion and am interested in hearing your views. There are also issues regarding break statements, which are not considered. A more advanced implementation of this approach might be able to consider other conditions in the loop, which would allow paths leading to breaks to be analyzed.
Lastly, I have performed a study on large code bases and I think there is little benefit in having “max-loop” default to 4 with the patch. For variable bound loops this tends to result in duplicated analysis after the loop, and it makes little difference to any constant bound loop which will do more than a few iterations. It might be beneficial to lower the default to 2, especially for the shallow analysis setting.
Please let me know your opinions on this approach to processing constant bound loops and the patch itself.
Regards,
Sean Eveson
SN Systems - Sony Computer Entertainment Group
What is the purpose of checking whether the block has already been visited by some other path?
If I understand correctly, this will stop the analyzer from widening before the "last" iteration through the loop and so will result in a sink after that iteration. What this means is that we will never explore the code beyond the loop in the widened state -- but isn't this the whole point of the widening?
So, for example, in your variable_bound_exiting_loops_not_widened() test, don't we want the clang_analyzer_eval() statement after the loop to be symbolically executed on 4 separate paths? That is, once when i is 0, once when i is 1, once when i is 2 and once when i is $conj_i$ + 1 where $conj_i$ is the value conjured for i when widening.
Also, in general, analysis of one path should not depend at all on whether another path has been explored. This is because we want the analyzer to be free choose different strategies about path exploration (e.g., BFS vs. DFS, prioritizing some paths over others, etc.) without changing the issues reported on along any given path. For this reason, I don't think we necessarily want to track and expose blockWasVisited() on CoreEngine or use this to determine when to permit a sink.