This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
include/clang/StaticAnalyzer/Core/
-
clang/
-
StaticAnalyzer/
-
Core/
2/2
AnalyzerOptions.h
-
PathSensitive/
-
LoopWidening.h
-
lib/StaticAnalyzer/Core/
-
StaticAnalyzer/
-
Core/
-
AnalyzerOptions.cpp
-
CMakeLists.txt
5
ExprEngine.cpp
7/8
LoopWidening.cpp
-
test/Analysis/
-
Analysis/
-
analyzer-config.c
-
analyzer-config.cpp
3/6
loop-widening.c

Differential D12358

[Analyzer] Widening loops which do not exit
ClosedPublic

Authored by seaneveson on Aug 26 2015, 3:59 AM.

Download Raw Diff

Details

Reviewers

dcoughlin
krememek
zaks.anna
xazax.hun
jordan_rose

Commits

rG83390e45b3c5: [Analyzer] Widening loops which do not exit
rC251621: [Analyzer] Widening loops which do not exit
rL251621: [Analyzer] Widening loops which do not exit

Summary

Dear All,

We have been looking at the following problem, where any code after the constant bound loop is not analyzed because of the limit on how many times the same block is visited, as described in bugzillas #7638 and #23438. This problem is of interest to us because we have identified significant bugs that the checkers are not locating. We have been discussing a solution involving ranges as a longer term project, but I would like to propose a patch to improve the current implementation.

Example issue:

for (int i = 0; i < 1000; ++i) {...something...}
int *p = 0;
*p = 0xDEADBEEF;

The proposal is to go through the first and last iterations of the loop. The patch creates an exploded node for the approximate last iteration of constant bound loops, before the max loop limit / block visit limit is reached. It does this by identifying the variable in the loop condition and finding the value which is “one away” from the loop being false. For example, if the condition is (x < 10), then an exploded node is created where the value of x is 9. Evaluating the loop body with x = 9 will then result in the analysis continuing after the loop, providing x is incremented.

The patch passes all the tests, with some modifications to coverage.c, in order to make the ‘function_which_gives_up’ continue to give up, since the changes allowed the analysis to progress past the loop.

This patch does introduce possible false positives, as a result of not knowing the state of variables which might be modified in the loop. I believe that, as a user, I would rather have false positives after loops than do no analysis at all. I understand this may not be the common opinion and am interested in hearing your views. There are also issues regarding break statements, which are not considered. A more advanced implementation of this approach might be able to consider other conditions in the loop, which would allow paths leading to breaks to be analyzed.

Lastly, I have performed a study on large code bases and I think there is little benefit in having “max-loop” default to 4 with the patch. For variable bound loops this tends to result in duplicated analysis after the loop, and it makes little difference to any constant bound loop which will do more than a few iterations. It might be beneficial to lower the default to 2, especially for the shallow analysis setting.

Please let me know your opinions on this approach to processing constant bound loops and the patch itself.

Regards,

Sean Eveson
SN Systems - Sony Computer Entertainment Group

Diff Detail

Event Timeline

seaneveson updated this revision to Diff 33190.Aug 26 2015, 3:59 AM

seaneveson retitled this revision from to [Analyzer] Handling constant bound loops.

seaneveson updated this object.

seaneveson added a subscriber: cfe-commits.

xazax.hun added reviewers: zaks.anna, krememek, jordan_rose, dcoughlin, xazax.hun.Aug 26 2015, 3:08 PM

xazax.hun added a subscriber: xazax.hun.

I agree that the way the static analyzer handles loops with known bounds is quite bad and we loose coverage because of it, so this is definitely an important problem to solve!

I've briefly looked at the patch and the main problem is that while you reset the value of the counter variable, you do not change the values of the other variables. This will leave us in a state that is wrong and will likely lead to false positives and inconsistencies, avoiding which is extremely important.

A way this could be improved is by invalidating all the values that the loops effects, both the values on the stack and on the heap. (We could even start overly conservative and invalidate all the values in the state; setting the known values to unknown values.)

This revision now requires changes to proceed.Aug 26 2015, 6:29 PM

A conservative solution should work.

But if we want to preserve some precision I think it should be possible to not to invalidate those values on the stack that are not effected by the loop. The hard problem here is that, it is impossible to reason about the heap without global knowledge, so if there is a pointer on the heap to a local variable that should be invalidated as well.

What do you think?

In D12358#233983, @zaks.anna wrote:

A way this could be improved is by invalidating all the values that the loops effects, both the values on the stack and on the heap. (We could even start overly conservative and invalidate all the values in the state; setting the known values to unknown values.)

I worry that this would be far too conservative, and would introduce new false positives because of lost precision.

One unsound hack would be to analyze a loop once by unrolling, and then invalidate anything that was touched in the ProgramState during that loop iteration. That invalidation could also be transitive, e.g. any subregions of touched state, etc.

Another, more principled hack, would be to look at all DeclRefExprs within a loop and invalidate all memory in the cone-of-influence of those variables (i.e., values they point to, etc.), but that's it.

Then there is the problem of called functions within the loop, as they won't be analyzed. Those could interfere with the ability of a checker to do its job.

My recommendation is that we still unroll loops a couple times, getting full precision, and then employ a widening technique like the ones being discussed to allow the last loop iteration to act as the last one, but as a conservative over-approximation.

msg-12340-169.txt159 BDownload

FWIW, I do think this is a great problem to work on. It is easy to come up with solutions that work for specific examples but fall over on general code. I completely agree that failing to analyzing code after the loop is a major hole and lost opportunity to find bugs, but fixing that should not be at a tradeoff of a huge influx in false positives. Some basic invalidation of values touched by the loop, which includes possibly invalidating checker state, will likely be necessary. I think this is what Anna was getting to in her comment.

One thing about the tests passing: that's great, but that's obviously insufficient. We probably have fewer tests showing that the analyzer can't handle something than tests that show it does handle something.

When a patch like this lands, which expands the analysis scope of the core analyzer, a few things are worth measuring:

(1) The impacted analysis time on a typical project.

(2) The impacted memory usage of the analyzer on a typical project.

(3) Increased analysis coverage. Are we covering more statements in practice? Etc. Since that is the motivation of this patch, it would be good to benchmark it on some real code.

We should, as a community, start keeping track of such things for a few projects so we know whether or not experiments like these are objectively a real win.

In D12358#233983, @zaks.anna wrote:

This will leave us in a state that is wrong and will likely lead to false positives and inconsistencies, avoiding which is extremely important.

I accept that my current patch is not a comprehensive solution to the problem and that it may introduce false positives, however I do think it is an improvement, where it is preferable to have false positives over doing no analysis after the loop.

In D12358#234016, @krememek wrote:

My recommendation is that we still unroll loops a couple times, getting full precision, and then employ a widening technique like the ones being discussed to allow the last loop iteration to act as the last one, but as a conservative over-approximation.

In the patch, constant bound loops are unrolled with the “max-loop” setting (default of 4), (i = 0, 1, 2, 99) rather than (i = 0, 1, 2, 3); as such, we analyze the same number of paths through the loop body.

In my experience, constant bound loops are normally used to make simple modifications to fixed length collections of data, I think the behaviour of the majority of these loops will be represented by the first and last iteration.

In D12358#234016, @krememek wrote:

Another, more principled hack, would be to look at all DeclRefExprs within a loop and invalidate all memory in the cone-of-influence of those variables (i.e., values they point to, etc.), but that's it.

Could this be done easily with a visitor and the existing invalidate methods? In cases where the anaylzer is unsure which values might be modified by the loop, it could fall back to invalidating all the values in the state.

In D12358#234016, @krememek wrote:

Then there is the problem of called functions within the loop, as they won't be analyzed. Those could interfere with the ability of a checker to do its job.

My recommendation is that we still unroll loops a couple times, getting full precision, and then employ a widening technique like the ones being discussed to allow the last loop iteration to act as the last one, but as a conservative over-approximation.

Wouldn’t widening before the last iteration result in more paths being explored than additional unrolling? i.e. as a result of values in conditions being unknown.

Regards,

Sean Eveson
SN Systems - Sony Computer Entertainment Group

I accept that my current patch is not a comprehensive solution to the problem and that it may introduce > false positives, however I do think it is an improvement, where it is preferable to have false positives
over doing no analysis after the loop.

We try to avoid false positives as much as possible. They are very painful for users to deal with.

In my experience, constant bound loops are normally used to make simple modifications to fixed
length collections of data, I think the behaviour of the majority of these loops will be represented by
the first and last iteration.

The main issue with the patch is that it produces a false path on which value of only one of the variables is reset to the last iteration of the loop and the rest of them are set as if it is the 3d iteration. A way to solve this is to compute what can be invalidated by the loop and set those to unknown values (a widening operation).

You should develop this feature behind a flag. This would allow for incremental development and simplify evaluation.

In D12358#234949, @zaks.anna wrote:

I accept that my current patch is not a comprehensive solution to the problem and that it may introduce > false positives, however I do think it is an improvement, where it is preferable to have false positives
over doing no analysis after the loop.

We try to avoid false positives as much as possible. They are very painful for users to deal with.

In my experience, constant bound loops are normally used to make simple modifications to fixed
length collections of data, I think the behaviour of the majority of these loops will be represented by
the first and last iteration.

The main issue with the patch is that it produces a false path on which value of only one of the variables is reset to the last iteration of the loop and the rest of them are set as if it is the 3d iteration. A way to solve this is to compute what can be invalidated by the loop and set those to unknown values (a widening operation).

You should develop this feature behind a flag. This would allow for incremental development and simplify evaluation.

I agree this should be under a flag, or more specifically, an -analyzer-config option.

I think the functionality started here by doing something clever with loops is complex enough to warrant putting this into a separate .cpp file. We can keep this here for now, but this seems like complexity that is going to naturally creep up and make this file even more monolithic than it already is.

This change also unconditionally never evaluates the body of a loop if the loop bound is constant. That seems like a major degradation in precision, as it is worth analyzing the loop body at least once to get full precision of the paths explored within the loop body. That's why I suggest unrolling at least a couple times. The first unrolling will get good coverage of the code within the loop, and the second unrolling will capture some correlated conditions across loop iterations that can diagnose some interesting behavior. After the second unrolling (or 'nth' for some 'n'), this widening seems applicable. I've also seen some static analyzers that focused on buffer overflow detection unroll looks like this by simply extending 'n' to be the constant loop bounds, opting for maximum precision at the cost of analyzing all those iterations (which isn't necessarily the best option either, but it is an option).

It seems the patch needs two things:

A general scheme for widening, which can just be invalidation of anything touched within a loop. I think that can be done by having an algorithm that does an AST walk, and looks at all variables whose lvalues are taken (used to store values or otherwise get their address using '&') or anything passed-by-reference to a function. That set of VarDecls can be cached in a side table, mapping ForStmt*'s (other loops as well) to the set of VarDecls that are invalidated. Those could then be passed to something that does the invalidation using the currently invalidation logic we have in place. The invalidation logic should also handle checker state, which also needs to get invalidated alongside variable values.
We need to consult the number of times a loop has been executed to determine when to apply this widening. We can look at the basic block counts, which are already used to cull off too many loop iterations.

This patch cannot go in as is, as it will seriously degrade the quality of the analyzer without further refinements. It is a great start, and I would be supportive of the patch landing more-or-less as is if it is behind a flag, and not on by default until the improvements suggested are made.

Hi Sean,

I responded with some more feedback. Conceptually the patch is quite simple, but I think Anna’s points are all spot on. I think we’d all be quite amendable for this work to go in under a flag, with further improvements going in on top of that. That way we can all collectively start hashing this out in trunk instead of feature creeping a patch.

Ted

In D12358#234975, @krememek wrote:

We need to consult the number of times a loop has been executed to determine when to apply this widening. We can look at the basic block counts, which are already used to cull off too many loop iterations.

On closer inspection, this is actually already in the patch:

BldCtx.blockCount() == AMgr.options.maxBlockVisitOnPath - 1

I do wonder if nested loops will cause a problem here, however. If a loop is nested, blockCount() could already be the same as maxBlockVisitOnPath() because of the outer loop. For example:

for (int i = 0; i < 3; i++) {
  for (int j = 0; j < 2; j++) { ... }
}

On the second iteration of the outer loop, but the first iteration of the inner loop within that second outer loop iteration, the block count for the body of the nested loop will already be 2. The analyzer will prune paths, regardless of this loop widening heuristic. I don't think the suggested patch actually helps with this case because the block count gets exhausted. We may need a way to enhance the accounting of block counts when taking into account heuristics for loops.

In D12358#234949, @zaks.anna wrote:

We try to avoid false positives as much as possible. They are very painful for users to deal with.

You should develop this feature behind a flag. This would allow for incremental development and simplify evaluation.

I absolutely want to minimize false positives, but I also want to make a start on just doing something after the loop, where the solution could then be improved upon. Developing behind a flag seems to be an ideal solution to me.

In D12358#234975, @krememek wrote:

I think the functionality started here by doing something clever with loops is complex enough to warrant putting this into a separate .cpp file.

Good idea.

In D12358#234986, @krememek wrote:

I do wonder if nested loops will cause a problem here

Yes the patch only works with the inner most loop. A fix to this should ideally prevent inner loops being analyzed again from an identical state. I haven’t looked into if this will ‘just happen’ as a result of the framework, or how to do it otherwise.

In D12358#234977, @krememek wrote:

I think we’d all be quite amendable for this work to go in under a flag, with further improvements going in on top of that. That way we can all collectively start hashing this out in trunk instead of feature creeping a patch.

If I refactor this patch in its current state into a separate file and put it behind a flag, would it then be acceptable? I would then like to take some time to look at the invalidation improvements as discussed in this thread.

In D12358#235142, @seaneveson wrote:

If I refactor this patch in its current state into a separate file and put it behind a flag, would it then be acceptable? I would then like to take some time to look at the invalidation improvements as discussed in this thread.

I think I'd like to see two things, primarily:

The logic behind and -analyzer-config option.
Very basic invalidation of values, as Anna suggests, to make this patch minimally viable.

For the latter, I think there is a simple solution. Notice on ProgramState there is a method called invalidateRegions. This is a worklist algorithm that takes a set of memory regions, and invalidates all memory regions (including checker state) that is touched by those regions, or reachable via memory bindings, etc. It's the complete transitive closure. To get the conservative invalidation that Anna suggests (actually, a little less conservative), I think you can just pass in a two MemRegions as the input to that method and get a new ProgramState with the appropriate regions invalidated. That MemRegions are the MemRegion corresponding to the current stack frame, specifically divided into locals and parameters. You can get them by getting the current StackFrameContext (which you get from the LocationContext), and using the following two methods form MemRegionManager:

getStackLocalsRegion(const StackFrameContext *STC);
getStackArgumentsRegion(const StackFrameContext *STC);

Those two MemRegions will be the parent regions of all VarRegions, etc. on the current stack frame. If you pass those to invalidateRegions, that will invalidate all memory on the current stack frame, and anything they reference.

Actually, it would probably be also good to pass in the MemRegion returned by getGlobalsRegion() as well, to invalidate all global memory. But just those three will give the conservative invalidation Anna suggested, and would give a viable initial version of the patch (which should still be guarded under a flag). All told, that's probably about 10 lines of code.

With that, I think you would have a viable initial patch, and something that would be meaningfully testable.

Thank you for working on this!

In D12358#237099, @krememek wrote:

To get the conservative invalidation that Anna suggests (actually, a little less conservative), I think you can just pass in a two MemRegions as the input to that method and get a new ProgramState with the appropriate regions invalidated.

Thank you for the suggestion. Unfortunately nothing seems to get invalidated when I call invalidateRegions, in the following code:

const StackFrameContext *STC = LCtx->getCurrentStackFrame();
MemRegionManager &MRMgr = svalBuilder.getRegionManager();
const MemRegion *Regions[] = {
  MRMgr.getStackLocalsRegion(STC),
  MRMgr.getStackArgumentsRegion(STC),
  MRMgr.getGlobalsRegion()
};
ProgramStateRef State;
State = PrevState->invalidateRegions(Regions, Cond, BlockCount, LCtx, false);

Do you have any suggestions on what I have done wrong?

If there is no simple way to invalidate all three regions I will work on implementing something like the following:

In D12358#234975, @krememek wrote:

A general scheme for widening, which can just be invalidation of anything touched within a loop. I think that can be done by having an algorithm that does an AST walk, and looks at all variables whose lvalues are taken (used to store values or otherwise get their address using '&') or anything passed-by-reference to a function. That set of VarDecls can be cached in a side table, mapping ForStmt*'s (other loops as well) to the set of VarDecls that are invalidated. Those could then be passed to something that does the invalidation using the currently invalidation logic we have in place. The invalidation logic should also handle checker state, which also needs to get invalidated alongside variable values.

In D12358#238303, @seaneveson wrote:
In D12358#237099, @krememek wrote:

To get the conservative invalidation that Anna suggests (actually, a little less conservative), I think you can just pass in a two MemRegions as the input to that method and get a new ProgramState with the appropriate regions invalidated.

Thank you for the suggestion. Unfortunately nothing seems to get invalidated when I call invalidateRegions, in the following code:
const StackFrameContext *STC = LCtx->getCurrentStackFrame();
MemRegionManager &MRMgr = svalBuilder.getRegionManager();
const MemRegion *Regions[] = {
  MRMgr.getStackLocalsRegion(STC),
  MRMgr.getStackArgumentsRegion(STC),
  MRMgr.getGlobalsRegion()
};
ProgramStateRef State;
State = PrevState->invalidateRegions(Regions, Cond, BlockCount, LCtx, false);
Do you have any suggestions on what I have done wrong?

I suspect this has to do with invalidateRegions itself. I will take a look. In the meantime, can you provide an updated patch that I can try out so I can step through the algorithm (if necessary) in the debugger and reproduce what you are seeing?

Refactored into a new file: LoopWidening.cpp (and LoopWidening.h).
Added an analyzer-config option, which defaults to false: widen-constant-bound-loops=false
Modified analyzer-config tests to check for the new option.
Added code to invalidate the stack locals, stack arguments and global regions (WIP).
Added a test to check variables are unknown after a widened loop, which currently fails.

Hi Sean,

Ted provided more details off-list. He suspects that the problem is that we likely don't add MemSpaceRegions to the worklist because every region is a subregion of a MemSpaceRegion, and thus we would invalidate, by default, all regions that were in the same MemSpace as the regions we were invalidating. He thinks we want to not change that behavior, but also provide a way of invalidating an entire MemSpace if so desired. That's probably just a slight tweak to the algorithm.

I’ll take a look at your updated patch to reproduce what you are seeing and investigate to see if that is what is going on.

Thanks,
Devin

In D12358#241631, @cfe-commits wrote:

Hi Sean,

Ted provided more details off-list. He suspects that the problem is that we likely don't add MemSpaceRegions to the worklist because every region is a subregion of a MemSpaceRegion, and thus we would invalidate, by default, all regions that were in the same MemSpace as the regions we were invalidating. He thinks we want to not change that behavior, but also provide a way of invalidating an entire MemSpace if so desired. That's probably just a slight tweak to the algorithm.

I’ll take a look at your updated patch to reproduce what you are seeing and investigate to see if that is what is going on.

Thanks,
Devin

Hi Devin,

That sounds good, thanks for looking at this. Can you let me know when you will be able to look at this? I am doing some work that builds on this patch's functionality and having this patch accepted would help.

Thanks,
Sean

I'll look at this today. Thanks for your patience!

I looked at the behavior of invalidateRegions() under the patch. It looks like MemSpaceRegions are being added to the worklist but these regions don't have clusters associated with them so invalidating the regions themselves doesn't remove any bindings.

Ted: What would you think about extending the logic in GenerateClusters() that invalidates globals for function calls to handle this more general case? That logic already checks to see if a region is in a particular global MemSpaceRegion and -- if so -- adds it to the worklist.

We could, for example, add a new RegionAndSymbolInvalidationTraits (e.g., TK_InvalidateEntireMemSpace). Clients of invalidateRegions could set this trait on the MemSpace they want to invalidate and GenerateClusters() could add any region with a memory space that has this trait to the work list. Does this seem like a reasonable approach?

There is a patch to add memspace region invalidation in http://reviews.llvm.org/D12993.

The TK_EntireMemSpace trait is now used when invalidating. The trait was added in http://reviews.llvm.org/D12993, thank you Devin for the patch.
Updated to the latest trunk.

seaneveson added a parent revision: D12993: [analyzer] Add TK_EntireMemSpace invalidation trait..Sep 21 2015, 3:03 AM

nit: Please, use proper punctuation in the comments.

lib/StaticAnalyzer/Core/ExprEngine.cpp
23	Do we loose precision for loops that need to be executed exactly maxBlockVisitOnPath times?
lib/StaticAnalyzer/Core/LoopWidening.cpp
11	This would allow to widen not just the constant bound loops! I think we should widen any loop that we know we did not fully execute.
172	iteration -> iteration What do you mean by "approximate"? Note that people rely on the analyzer to track the exact values of variables. They expect it to know what the value is. It can be very confusing if the analyzer states that it knows the value of the variable and it is the wrong value. I am concerned with changing the value of a single variable of the loop based on a syntactic check. Also, I am not sure there is a strong need for preserving the value of the index variable either.
test/Analysis/constant-bound-loops.c
1 ↗	(On Diff #35216)	Should we test that the loop is still executed n times before it is widened?
174 ↗	(On Diff #35216)	I think we should extend this test case. Ex: what about heap, what about variables touched by the loop variables declared in the loop's scope?

dcoughlin added inline comments.Sep 21 2015, 10:38 PM

lib/StaticAnalyzer/Core/LoopWidening.cpp
150	This doesn't seem quite right. Consider: int i; for (i = 0; i < 21; i += 3) {} clang_analyzer_eval(i == 23); The value of `i` should be 21 after the loop, but this code sets it to 23. And what happens if `i` starts at 1 instead of 0? Another (ridiculous) case to consider: for (i = 0; i < 21; i += 3) { if (random() % 2 == 1) { i = i * i; } else { i--; } } What are the possible values of `i` after the loop?

Thank you for your comments.

@zaks.anna wrote:

What do you mean by "approximate"?

I think @dcoughlin gave a perfect example in the following comment:

@dcoughlin wrote:
This doesn't seem quite right. Consider:
int i;
for (i = 0; i < 21; i += 3) {}
clang_analyzer_eval(i == 23);
The value of i should be 21 after the loop, but this code sets it to 23. And what happens if i starts at 1 instead of 0?

The patch only examines the loop condition to avoid having to consider any of the loop body. It approximates an exit value for the variable in the condition by evaluating an iteration with the variable at its maximum/minimum value. This is loosely based on the widening and narrowing described in this paper: http://dl.acm.org/citation.cfm?id=512973

@zaks.anna wrote:

Note that people rely on the analyzer to track the exact values of variables. They expect it to know what the value is. It can be very confusing if the analyzer states that it knows the value of the variable and it is the wrong value.

I am concerned with changing the value of a single variable of the loop based on a syntactic check. Also, I am not sure there is a strong need for preserving the value of the index variable either.

Since the goal of this patch is to get the analyzer to move past loops, I see no problem with invalidating the index variable. That said, I still think it is worth doing the approximation, as it identifies infinite loops.

Further work could then be done to set the value of the index variable after 'simple' loops, where the analyzer can be sure of the value. In addition to this, the detection of infinite loops could be improved to work for any loop. This would allow the widening to be applied to any non-infinite, non-exiting loop.

lib/StaticAnalyzer/Core/ExprEngine.cpp
23	Yes. To be precise, the loss is for loops that need to be executed (maxBlockVisitOnPath - 1) times, because processCFGBlockEntrance generates a sink if blockCount >= maxBlockVisitOnPath. With the default value for maxBlockVisitOnPath, a loop that iterated three times would be fully analyzed with widen=false, but would be unnecessarily widened if widen=true. This could be fixed by (effectively) incrementing the value of maxBlockVisitOnPath, when the widening was enabled. e.g. replacing the following line from processCFGBlockEntrance: if (nodeBuilder.getContext().blockCount() >= AMgr.options.maxBlockVisitOnPath) { ... With something like: int blockLimit = AMgr.options.maxBlockVisitOnPath + (AMgr.options.shouldWidenConstantBoundLoops() ? 1 : 0); if (nodeBuilder.getContext().blockCount() >= blockLimit) { ... Does that seem resonable?
test/Analysis/constant-bound-loops.c
174 ↗	(On Diff #35216)	Good point. I actually encountered a false positive while improving this case. int h_ptr = malloc(sizeof(int)); h_ptr = 3; for (int i = 0; i < 10; ++i) {} // WARNING: Potential leak of memory pointed to by 'h_ptr' The value of h_ptr is invalidated, but I'm not sure about *h_ptr. Is it likely this warning is produced because the pointer is invalidated, but not the memory?

zaks.anna added inline comments.Sep 23 2015, 10:14 AM

lib/StaticAnalyzer/Core/ExprEngine.cpp
23	This would increment the value for all loops... I think we should first answer the question below. Do we want to widen all loops, not just loops with "constant bounds"?
test/Analysis/constant-bound-loops.c
174 ↗	(On Diff #35216)	Could you provide the full example? There is leak in the code segment above.

Sean,

One important difference between what you are proposing and what Cousot and Cousot describe in the paper you cite is that they don't drop coverage. They widen for termination on infinite-height lattices and their narrowing still maintains an over-approximation (it covers all potential paths through a loop -- and perhaps some extra ones as well). What the analyzer had before with constant-bound loops was an under-approximation: it would just stop exploring paths after a while

What you are proposing is neither an under-approximation nor an over-approximation because it will both drop coverage (e.g., the path in my simple example above where i is 21 after the loop) and add spurious paths (e.g., the path where i is 23). This is something that it is best to avoid when possible because it will cause the analyzer to both miss true positives and lead to false positives.

My initial approach was for the analyzer to have as much information as possible after the loop. This means there are cases where the information is incorrect. Future work would be to reduce these cases.

I believe your preferred approach is to have no inaccuracies after the loop, which can initially be achieved by providing less information. Further work would add more (guaranteed accurate) information. In this way the analyzer is naturally conservative when it isn't sure of something. I now agree that this is a better approach to take.

What do people think about me creating a new patch based on your feedback? The aim would be to widen any non-exiting loops by invalidation. The initial patch would be behind a flag and would use the TK_EntireMemSpace trait to conservatively invalidate 'everything' when a loop does not exit. It would then run through the loop body in the invalidated state, resulting in the analysis of all possible paths. The loop would then exit from the (now possibly false) loop condition, or a (now reachable) break statement. Loops that never exit regardless of any variables would be an exception; see my comment below for thoughts on handling infinite loops.

Future improvements would prevent unnecessary invalidation and calculate the values of modified variables (after the loop).

lib/StaticAnalyzer/Core/LoopWidening.cpp
11	I agree that the goal should be to widen any loop that isn't fully executed, but we need to consider infinite loops, which might be falsely exited after widening. The way I see it, assuming the widening will be done by invalidating variables/memory, we could either: Only widen loops which definitely exit. Widen all loops unless they are definitely infinite (or widen them in a way which minimizes the number of infinite loops which then exit). Come up with a method which decides if a loops is infinite or not (but tolerate mistakes either way) and widen when we "think" the loop will exit. This is similar to the current approach for constant bound loops. My current preference would be option 2. This is based on the assumption that loops are generally written to exit at some point, and if they aren't, they are unlikely to have code after them anyway. When it does not know which branch the program will go down, the analyzer's approach is to go down both. Similarly if the analyzer does not know whether the loop exits, it should optimistically go down the exit branch (IMO).
test/Analysis/constant-bound-loops.c
174 ↗	(On Diff #35216)	Sorry, I meant to say the warning is there regardless of what comes after the loop. Here is a full test case: typedef __typeof(sizeof(int)) size_t; void malloc(size_t); void free(void ); void test() { int h_ptr = (int ) malloc(sizeof(int)); for (int i = 0; i < 2; ++i) {} // No warning. for (int i = 0; i < 10; ++i) {} free(h_ptr); } It produces the following for me: clang.exe -cc1 -analyze -analyzer-checker=core,unix.Malloc -analyzer-config widen-constant-bound-loops=true test.cpp test.cpp:8:31: warning: Potential leak of memory pointed to by 'h_ptr' for (int i = 0; i < 10; ++i) {} ^ 1 warning generated.

What do people think about me creating a new patch based on your feedback?

I don't think you need to create a new review -- we can update this one and keep the history.

The aim would be to widen any non-exiting loops by invalidation. The initial patch would be behind a flag and would use the TK_EntireMemSpace trait to conservatively invalidate 'everything' when a loop does not exit. It would then run through the loop body in the invalidated state, resulting in the analysis of all possible paths. The loop would then exit from the (now possibly false) loop condition, or a (now reachable) break statement.

This sounds great to me. I think it shouldn't be too difficult to add coarse widening before the last iteration through the loop for both concrete-bounded loops and loops with symbolic bounds. This will be a very imprecise hammer. It will invalidate everything (heap, locals, globals). I think you can choose the widening location (i.e, a loop back edge) to make it possible to still learn from the guard condition. For example, for the following loop I think the analyzer should be able widen and still know that ‘j’ < ’N’ on the final, widened iteration through the loop and that ‘j’ >= ’N’ after the loop (assuming there is no ‘break’):

for (int j = 0; j < N; j++) { .. }

This will let us evaluate what kinds of improvements in coverage you get from not generating a sink but will almost certainly be too imprecise to turn on by default. I think a slightly tricky challenge here will be to handle nested loops without unnecessary sinks.

You could then make the widening more precise by invalidating only memory regions that a pre-analysis indicates may be modified by the loop. This pre-analysis can be relatively simple — invalidating the cone of influence of the DeclRefExprs in the the loop (as Ted suggested above.) With this implemented, we can evaluate precision and performance to decide whether to turn this on by default — but the hope is yes. If not, we may need to play with unsound tricks that leave less invalidated.

Loops that never exit regardless of any variables would be an exception; see my comment below for thoughts on handling infinite loops.

Generalized non-termination checking is a very hard problem and is not something analyzer is particularly well-suited for. While this would be an interesting avenue for the analyzer to pursue, I think it should be in separate checker.

Updated to latest revision.
Change patch to widen all loops which do not exit.
Changed to widen on block entrance so the guard condition is accounted for (thanks for the suggestion).
Updated tests to reflect changes.

There are some issues which haven't been resolved yet:

There is a loss of precision for loops that need to be executed exactly maxBlockVisitOnPath times, as the loop body is executed with the widened state instead of the last iteration.
The invalidation still causes memory leak false positives (see failing test: pointer_doesnt_leak_from_loop).

seaneveson retitled this revision from [Analyzer] Handling constant bound loops to [Analyzer] Widening loops which do not exit.Oct 8 2015, 12:49 AM

There is a loss of precision for loops that need to be executed exactly maxBlockVisitOnPath times, as the loop body is executed with the widened state instead of the last iteration.

I think this is an acceptable loss of precision because, in general, it is unlikely that a concrete-bounded loop will be executed *exactly* maxBlockVisitOnPath times. You could add special syntactic checks to try to detect this, but I think it is unlikely to make much different in practice.

lib/StaticAnalyzer/Core/LoopWidening.cpp
65	Passing `true` here instead of `false` for `CausedByPointerEscape` should fix the false alarm. This will cause invalidation to tell the MallocChecker that the pointer has escaped and so shouldn't be treated as a leak.
test/Analysis/loop-widening.c
4	There seems to be some weirdness when using `_Bool` in the prototype (vs. int). It adds an extra `!= 0` to symbolic expressions, which can result in clang_analyzer_eval() yielding `UNKNOWN` when using `extern void clang_analyze_eval(int)` would print `TRUE` or `FALSE`. We should track this down and fix it, but for now I think it is better to use `extern void clang_analyze_eval(int)`.
14	This is clever.
112	After changing the `clang_analyzer_eval()` prototype to take an int, you can use `clang_analyzer_eval(i >= 50); // expected-warning {{TRUE}}`.
159	I think it would be good to add some nested loop tests. For example: void nested1() { int i = 0, j = 0; for (i = 0; i < 10; i++) { clang_analyzer_eval(i < 10); // expected-warning {{TRUE}} for (j = 0; j < 2; j++) { clang_analyzer_eval(j < 2); // expected-warning {{TRUE}} } clang_analyzer_eval(j >= 2); // expected-warning {{TRUE}} } clang_analyzer_eval(i >= 10); // expected-warning {{TRUE}} } void nested2() { int i = 0, j = 0; for (i = 0; i < 2; i++) { clang_analyzer_eval(i < 2); // expected-warning {{TRUE}} for (j = 0; j < 10; j++) { clang_analyzer_eval(j < 10); // expected-warning {{TRUE}} } clang_analyzer_eval(j >= 10); // expected-warning {{TRUE}} } clang_analyzer_eval(i >= 2); // expected-warning {{TRUE}} }

Set CausedByPointerEscape to true
Check if the loop has already been exited before widening
Changed tests to use void clang_analyze_eval(int)
Added variable bound loop tests
Added nested loop tests

In D12358#266391, @dcoughlin wrote:

I think this is an acceptable loss of precision because, in general, it is unlikely that a concrete-bounded loop will be executed *exactly* maxBlockVisitOnPath times. You could add special syntactic checks to try to detect this, but I think it is unlikely to make much different in practice.

I agree.

The problem can be solved by executing the widened state after the last iteration, rather than instead of it. This can only really be done by changing the max block visit approach. This needs to be done in the future in order to support nested loops. This precision issue could also be resolved at that time.

include/clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h
84 ↗	(On Diff #37462)	The BlockCount is specific to the path of analysis, so it can't be used to check if a loop has been exited before. I've added this set to store and identify previously visited blocks. Is there a better way this can/could be done?
lib/StaticAnalyzer/Core/LoopWidening.cpp
66	Perfect, thanks. Perhaps it could be changed back to false when the invalidation is better? Then it could catch true positives e.g. int *a = new int[10] for (int i = 0; i < 100; ++i) { if (i > 90) ++a; ... }
test/Analysis/loop-widening.c
160	Thank you for the example tests. When an inner loop is widened, any outer loops are widened 'coincidentally' since everything is invalidated. This means the second example test passes. When an inner loop is not widened the outer loop will not be widened. This is because a sink is generated when the max block count is reached the second time through the outer loop. i.e. for (i = 0; i < 10; i++) { for (j = 0; j < 2; j++) { // A sink is generated here on the second iteration of the outer loop, // since this block was already visited twice (when i == 0). ... } } // Never reach here This means the first example test does not pass. I’ve added the tests with a FIXME comment.

Hi Sean,

Sorry it took me so long to get back to you.

include/clang/StaticAnalyzer/Core/AnalyzerOptions.h
9	Is this field used?
lib/StaticAnalyzer/Core/ExprEngine.cpp
31	What is the purpose of checking whether the block has already been visited by some other path? If I understand correctly, this will stop the analyzer from widening before the "last" iteration through the loop and so will result in a sink after that iteration. What this means is that we will never explore the code beyond the loop in the widened state -- but isn't this the whole point of the widening? So, for example, in your `variable_bound_exiting_loops_not_widened()` test, don't we want the clang_analyzer_eval() statement after the loop to be symbolically executed on 4 separate paths? That is, once when i is 0, once when i is 1, once when i is 2 and once when i is $conj_i$ + 1 where $conj_i$ is the value conjured for i when widening. Also, in general, analysis of one path should not depend at all on whether another path has been explored. This is because we want the analyzer to be free choose different strategies about path exploration (e.g., BFS vs. DFS, prioritizing some paths over others, etc.) without changing the issues reported on along any given path. For this reason, I don't think we necessarily want to track and expose `blockWasVisited()` on CoreEngine or use this to determine when to permit a sink.
lib/StaticAnalyzer/Core/LoopWidening.cpp
99	I get a warning here about comparing a signed int (RegionIndex) to an unsigned (NumRegions). I think you can avoid this and simplify things by using a range-based for loop: const MemRegion Regions[] = { ... }; RegionAndSymbolInvalidationTraits ITraits; for (auto Region : Regions) { ... }

Hi Devin,

Sorry it also took me so long to get back to you.

include/clang/StaticAnalyzer/Core/AnalyzerOptions.h
9	No. Thanks I'll fix that.
lib/StaticAnalyzer/Core/ExprEngine.cpp
31	I was trying to avoid doing unnecessary invalidation, where the variable bound loop had already exited. I suppose this won’t be a concern when the invalidation is improved. If everyone is happy for loops that have already exited to be widened then I will remove the related changes.
lib/StaticAnalyzer/Core/LoopWidening.cpp
99	Will do. Thanks.

Removed checking for already exited loops
Addressed Comments

seaneveson marked 5 inline comments as done.Oct 27 2015, 3:57 AM

seaneveson added inline comments.

test/Analysis/loop-widening.c
160	The inner loop will now be widened in the first example test, which 'coincidentally' widens the outer loop as well. This means this text now passes.

LGTM. Please commit. Thanks for tackling this, Sean!

Updated to latest revision

Closed by commit rL251621: [Analyzer] Widening loops which do not exit (authored by seaneveson). · Explain WhyOct 29 2015, 3:07 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

include/

clang/

StaticAnalyzer/

Core/

	AnalyzerOptions.h
	AnalyzerOptions.h (revision 251386)

7 lines

PathSensitive/

	LoopWidening.h
	LoopWidening.h (revision 0)

37 lines

lib/

StaticAnalyzer/

Core/

	AnalyzerOptions.cpp
	AnalyzerOptions.cpp (revision 251386)

6 lines

	CMakeLists.txt
	CMakeLists.txt (revision 251386)

1 line

	ExprEngine.cpp
	ExprEngine.cpp (revision 251386)

20 lines

	LoopWidening.cpp
	LoopWidening.cpp (revision 0)

68 lines

test/

Analysis/

	analyzer-config.c
	analyzer-config.c (revision 251386)

3 lines

	analyzer-config.cpp
	analyzer-config.cpp (revision 251386)

3 lines

	loop-widening.c
	loop-widening.c (revision 0)

190 lines

Diff 38519

include/clang/StaticAnalyzer/Core/AnalyzerOptions.h

//===--- AnalyzerOptions.h - Analysis Engine Options ------------- C++ --===//		//===--- AnalyzerOptions.h - Analysis Engine Options ------------- C++ --===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
		dcoughlinUnsubmitted Done Reply Inline Actions Is this field used? dcoughlin: Is this field used?
		seanevesonAuthorUnsubmitted Done Reply Inline Actions No. Thanks I'll fix that. seaneveson: No. Thanks I'll fix that.
// This header defines various options for the static analyzer that are set		// This header defines various options for the static analyzer that are set
// by the frontend and are consulted throughout the analyzer.		// by the frontend and are consulted throughout the analyzer.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_STATICANALYZER_CORE_ANALYZEROPTIONS_H		#ifndef LLVM_CLANG_STATICANALYZER_CORE_ANALYZEROPTIONS_H
#define LLVM_CLANG_STATICANALYZER_CORE_ANALYZEROPTIONS_H		#define LLVM_CLANG_STATICANALYZER_CORE_ANALYZEROPTIONS_H

▲ Show 20 Lines • Show All 239 Lines • ▼ Show 20 Lines	private:
Optional<unsigned> MinCFGSizeTreatFunctionsAsLarge;		Optional<unsigned> MinCFGSizeTreatFunctionsAsLarge;

/// \sa getMaxNodesPerTopLevelFunction		/// \sa getMaxNodesPerTopLevelFunction
Optional<unsigned> MaxNodesPerTopLevelFunction;		Optional<unsigned> MaxNodesPerTopLevelFunction;

/// \sa shouldInlineLambdas		/// \sa shouldInlineLambdas
Optional<bool> InlineLambdas;		Optional<bool> InlineLambdas;

		/// \sa shouldWidenLoops
		Optional<bool> WidenLoops;

/// A helper function that retrieves option for a given full-qualified		/// A helper function that retrieves option for a given full-qualified
/// checker name.		/// checker name.
/// Options for checkers can be specified via 'analyzer-config' command-line		/// Options for checkers can be specified via 'analyzer-config' command-line
/// option.		/// option.
/// Example:		/// Example:
/// @code-analyzer-config unix.Malloc:OptionName=CheckerOptionValue @endcode		/// @code-analyzer-config unix.Malloc:OptionName=CheckerOptionValue @endcode
/// or @code-analyzer-config unix:OptionName=GroupOptionValue @endcode		/// or @code-analyzer-config unix:OptionName=GroupOptionValue @endcode
/// for groups of checkers.		/// for groups of checkers.
▲ Show 20 Lines • Show All 248 Lines • ▼ Show 20 Lines	public:
///		///
/// This is controlled by the 'max-nodes' config option.		/// This is controlled by the 'max-nodes' config option.
unsigned getMaxNodesPerTopLevelFunction();		unsigned getMaxNodesPerTopLevelFunction();

/// Returns true if lambdas should be inlined. Otherwise a sink node will be		/// Returns true if lambdas should be inlined. Otherwise a sink node will be
/// generated each time a LambdaExpr is visited.		/// generated each time a LambdaExpr is visited.
bool shouldInlineLambdas();		bool shouldInlineLambdas();

		/// Returns true if the analysis should try to widen loops.
		/// This is controlled by the 'widen-loops' config option.
		bool shouldWidenLoops();

public:		public:
AnalyzerOptions() :		AnalyzerOptions() :
AnalysisStoreOpt(RegionStoreModel),		AnalysisStoreOpt(RegionStoreModel),
AnalysisConstraintsOpt(RangeConstraintsModel),		AnalysisConstraintsOpt(RangeConstraintsModel),
AnalysisDiagOpt(PD_HTML),		AnalysisDiagOpt(PD_HTML),
AnalysisPurgeOpt(PurgeStmt),		AnalysisPurgeOpt(PurgeStmt),
DisableAllChecks(0),		DisableAllChecks(0),
ShowCheckerHelp(0),		ShowCheckerHelp(0),
Show All 24 Lines

include/clang/StaticAnalyzer/Core/PathSensitive/LoopWidening.h

				//===--- LoopWidening.h - Instruction class definition ----------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// This header contains the declarations of functions which are used to widen
				/// loops which do not otherwise exit. The widening is done by invalidating
				/// anything which might be modified by the body of the loop.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_LOOPWIDENING_H
				#define LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_LOOPWIDENING_H

				#include "clang/Analysis/CFG.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"

				namespace clang {
				namespace ento {

				/// \brief Get the states that result from widening the loop.
				///
				/// Widen the loop by invalidating anything that might be modified
				/// by the loop body in any iteration.
				ProgramStateRef getWidenedLoopState(ProgramStateRef PrevState,
				const LocationContext *LCtx,
				unsigned BlockCount,
				const Stmt *LoopStmt);

				} // end namespace ento
				} // end namespace clang

				#endif

lib/StaticAnalyzer/Core/AnalyzerOptions.cpp

Show First 20 Lines • Show All 332 Lines • ▼ Show 20 Lines	bool AnalyzerOptions::shouldConditionalizeStaticInitializers() {
return getBooleanOption("cfg-conditional-static-initializers", true);		return getBooleanOption("cfg-conditional-static-initializers", true);
}		}

bool AnalyzerOptions::shouldInlineLambdas() {		bool AnalyzerOptions::shouldInlineLambdas() {
if (!InlineLambdas.hasValue())		if (!InlineLambdas.hasValue())
InlineLambdas = getBooleanOption("inline-lambdas", /Default=/true);		InlineLambdas = getBooleanOption("inline-lambdas", /Default=/true);
return InlineLambdas.getValue();		return InlineLambdas.getValue();
}		}

		bool AnalyzerOptions::shouldWidenLoops() {
		if (!WidenLoops.hasValue())
		WidenLoops = getBooleanOption("widen-loops", /Default=/false);
		return WidenLoops.getValue();
		}

lib/StaticAnalyzer/Core/CMakeLists.txt

Show All 22 Lines	add_clang_library(clangStaticAnalyzerCore
ExplodedGraph.cpp		ExplodedGraph.cpp
ExprEngine.cpp		ExprEngine.cpp
ExprEngineC.cpp		ExprEngineC.cpp
ExprEngineCXX.cpp		ExprEngineCXX.cpp
ExprEngineCallAndReturn.cpp		ExprEngineCallAndReturn.cpp
ExprEngineObjC.cpp		ExprEngineObjC.cpp
FunctionSummary.cpp		FunctionSummary.cpp
HTMLDiagnostics.cpp		HTMLDiagnostics.cpp
		LoopWidening.cpp
MemRegion.cpp		MemRegion.cpp
PathDiagnostic.cpp		PathDiagnostic.cpp
PlistDiagnostics.cpp		PlistDiagnostics.cpp
ProgramState.cpp		ProgramState.cpp
RangeConstraintManager.cpp		RangeConstraintManager.cpp
RegionStore.cpp		RegionStore.cpp
SValBuilder.cpp		SValBuilder.cpp
SVals.cpp		SVals.cpp
Show All 13 Lines

lib/StaticAnalyzer/Core/ExprEngine.cpp

	Show All 14 Lines

	#include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"			#include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
	#include "PrettyStackTraceLocationContext.h"			#include "PrettyStackTraceLocationContext.h"
	#include "clang/AST/CharUnits.h"			#include "clang/AST/CharUnits.h"
	#include "clang/AST/ParentMap.h"			#include "clang/AST/ParentMap.h"
	#include "clang/AST/StmtCXX.h"			#include "clang/AST/StmtCXX.h"
	#include "clang/AST/StmtObjC.h"			#include "clang/AST/StmtObjC.h"
	#include "clang/Basic/Builtins.h"			#include "clang/Basic/Builtins.h"
	#include "clang/Basic/PrettyStackTrace.h"			#include "clang/Basic/PrettyStackTrace.h"
				zaks.annaUnsubmitted Not Done Reply Inline Actions Do we loose precision for loops that need to be executed exactly maxBlockVisitOnPath times? zaks.anna: Do we loose precision for loops that need to be executed exactly maxBlockVisitOnPath times?
				seanevesonAuthorUnsubmitted Not Done Reply Inline Actions Yes. To be precise, the loss is for loops that need to be executed (maxBlockVisitOnPath - 1) times, because processCFGBlockEntrance generates a sink if blockCount >= maxBlockVisitOnPath. With the default value for maxBlockVisitOnPath, a loop that iterated three times would be fully analyzed with widen=false, but would be unnecessarily widened if widen=true. This could be fixed by (effectively) incrementing the value of maxBlockVisitOnPath, when the widening was enabled. e.g. replacing the following line from processCFGBlockEntrance: if (nodeBuilder.getContext().blockCount() >= AMgr.options.maxBlockVisitOnPath) { ... With something like: int blockLimit = AMgr.options.maxBlockVisitOnPath + (AMgr.options.shouldWidenConstantBoundLoops() ? 1 : 0); if (nodeBuilder.getContext().blockCount() >= blockLimit) { ... Does that seem resonable? seaneveson: Yes. To be precise, the loss is for loops that need to be executed (maxBlockVisitOnPath - 1)…
				zaks.annaUnsubmitted Not Done Reply Inline Actions This would increment the value for all loops... I think we should first answer the question below. Do we want to widen all loops, not just loops with "constant bounds"? zaks.anna: This would increment the value for all loops... I think we should first answer the question…
	#include "clang/Basic/SourceManager.h"			#include "clang/Basic/SourceManager.h"
	#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"			#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
	#include "clang/StaticAnalyzer/Core/CheckerManager.h"			#include "clang/StaticAnalyzer/Core/CheckerManager.h"
	#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"			#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"
	#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"			#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/LoopWidening.h"
	#include "llvm/ADT/ImmutableList.h"			#include "llvm/ADT/ImmutableList.h"
	#include "llvm/ADT/Statistic.h"			#include "llvm/ADT/Statistic.h"
				dcoughlinUnsubmitted Not Done Reply Inline Actions What is the purpose of checking whether the block has already been visited by some other path? If I understand correctly, this will stop the analyzer from widening before the "last" iteration through the loop and so will result in a sink after that iteration. What this means is that we will never explore the code beyond the loop in the widened state -- but isn't this the whole point of the widening? So, for example, in your `variable_bound_exiting_loops_not_widened()` test, don't we want the clang_analyzer_eval() statement after the loop to be symbolically executed on 4 separate paths? That is, once when i is 0, once when i is 1, once when i is 2 and once when i is $conj_i$ + 1 where $conj_i$ is the value conjured for i when widening. Also, in general, analysis of one path should not depend at all on whether another path has been explored. This is because we want the analyzer to be free choose different strategies about path exploration (e.g., BFS vs. DFS, prioritizing some paths over others, etc.) without changing the issues reported on along any given path. For this reason, I don't think we necessarily want to track and expose `blockWasVisited()` on CoreEngine or use this to determine when to permit a sink. dcoughlin: What is the purpose of checking whether the block has already been visited by some other path?
				seanevesonAuthorUnsubmitted Not Done Reply Inline Actions I was trying to avoid doing unnecessary invalidation, where the variable bound loop had already exited. I suppose this won’t be a concern when the invalidation is improved. If everyone is happy for loops that have already exited to be widened then I will remove the related changes. seaneveson: I was trying to avoid doing unnecessary invalidation, where the variable bound loop had already…
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"

	#ifndef NDEBUG			#ifndef NDEBUG
	#include "llvm/Support/GraphWriter.h"			#include "llvm/Support/GraphWriter.h"
	#endif			#endif

	using namespace clang;			using namespace clang;
	using namespace ento;			using namespace ento;
	▲ Show 20 Lines • Show All 1,347 Lines • ▼ Show 20 Lines
	}			}

	/// Block entrance. (Update counters).			/// Block entrance. (Update counters).
	void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,			void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
	NodeBuilderWithSinks &nodeBuilder,			NodeBuilderWithSinks &nodeBuilder,
	ExplodedNode *Pred) {			ExplodedNode *Pred) {
	PrettyStackTraceLocationContext CrashInfo(Pred->getLocationContext());			PrettyStackTraceLocationContext CrashInfo(Pred->getLocationContext());

				// If this block is terminated by a loop and it has already been visited the
				// maximum number of times, widen the loop.
				unsigned int BlockCount = nodeBuilder.getContext().blockCount();
				if (BlockCount == AMgr.options.maxBlockVisitOnPath - 1 &&
				AMgr.options.shouldWidenLoops()) {
				const Stmt *Term = nodeBuilder.getContext().getBlock()->getTerminator();
				if (!(Term &&
				(isa<ForStmt>(Term) \|\| isa<WhileStmt>(Term) \|\| isa<DoStmt>(Term))))
				return;
				// Widen.
				const LocationContext *LCtx = Pred->getLocationContext();
				ProgramStateRef WidenedState =
				getWidenedLoopState(Pred->getState(), LCtx, BlockCount, Term);
				nodeBuilder.generateNode(WidenedState, Pred);
				return;
				}

	// FIXME: Refactor this into a checker.			// FIXME: Refactor this into a checker.
	if (nodeBuilder.getContext().blockCount() >= AMgr.options.maxBlockVisitOnPath) {			if (BlockCount >= AMgr.options.maxBlockVisitOnPath) {
	static SimpleProgramPointTag tag(TagProviderName, "Block count exceeded");			static SimpleProgramPointTag tag(TagProviderName, "Block count exceeded");
	const ExplodedNode *Sink =			const ExplodedNode *Sink =
	nodeBuilder.generateSink(Pred->getState(), Pred, &tag);			nodeBuilder.generateSink(Pred->getState(), Pred, &tag);

	// Check if we stopped at the top level function or not.			// Check if we stopped at the top level function or not.
	// Root node should have the location context of the top most function.			// Root node should have the location context of the top most function.
	const LocationContext *CalleeLC = Pred->getLocation().getLocationContext();			const LocationContext *CalleeLC = Pred->getLocation().getLocationContext();
	const LocationContext *CalleeSF = CalleeLC->getCurrentStackFrame();			const LocationContext *CalleeSF = CalleeLC->getCurrentStackFrame();
	▲ Show 20 Lines • Show All 1,312 Lines • Show Last 20 Lines

lib/StaticAnalyzer/Core/LoopWidening.cpp

				//===--- LoopWidening.cpp - Instruction class definition --------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// This file contains functions which are used to widen loops. A loop may be
				/// widened to approximate the exit state(s), without analyzing every
				zaks.annaUnsubmitted Done Reply Inline Actions This would allow to widen not just the constant bound loops! I think we should widen any loop that we know we did not fully execute. zaks.anna: This would allow to widen not just the constant bound loops! I think we should widen any loop…
				seanevesonAuthorUnsubmitted Done Reply Inline Actions I agree that the goal should be to widen any loop that isn't fully executed, but we need to consider infinite loops, which might be falsely exited after widening. The way I see it, assuming the widening will be done by invalidating variables/memory, we could either: Only widen loops which definitely exit. Widen all loops unless they are definitely infinite (or widen them in a way which minimizes the number of infinite loops which then exit). Come up with a method which decides if a loops is infinite or not (but tolerate mistakes either way) and widen when we "think" the loop will exit. This is similar to the current approach for constant bound loops. My current preference would be option 2. This is based on the assumption that loops are generally written to exit at some point, and if they aren't, they are unlikely to have code after them anyway. When it does not know which branch the program will go down, the analyzer's approach is to go down both. Similarly if the analyzer does not know whether the loop exits, it should optimistically go down the exit branch (IMO). seaneveson: I agree that the goal should be to widen any loop that isn't fully executed, but we need to…
				/// iteration. The widening is done by invalidating anything which might be
				/// modified by the body of the loop.
				///
				//===----------------------------------------------------------------------===//

				#include "clang/StaticAnalyzer/Core/PathSensitive/LoopWidening.h"

				using namespace clang;
				using namespace ento;

				/// Return the loops condition Stmt or NULL if LoopStmt is not a loop
				static const Expr getLoopCondition(const Stmt LoopStmt) {
				switch (LoopStmt->getStmtClass()) {
				default:
				return NULL;
				case Stmt::ForStmtClass:
				return cast<ForStmt>(LoopStmt)->getCond();
				case Stmt::WhileStmtClass:
				return cast<WhileStmt>(LoopStmt)->getCond();
				case Stmt::DoStmtClass:
				return cast<DoStmt>(LoopStmt)->getCond();
				}
				}

				namespace clang {
				namespace ento {

				ProgramStateRef getWidenedLoopState(ProgramStateRef PrevState,
				const LocationContext *LCtx,
				unsigned BlockCount, const Stmt *LoopStmt) {

				assert(isa<ForStmt>(LoopStmt) \|\| isa<WhileStmt>(LoopStmt) \|\|
				isa<DoStmt>(LoopStmt));

				// Invalidate values in the current state.
				// TODO Make this more conservative by only invalidating values that might
				// be modified by the body of the loop.
				// TODO Nested loops are currently widened as a result of the invalidation
				// being so inprecise. When the invalidation is improved, the handling
				// of nested loops will also need to be improved.
				const StackFrameContext *STC = LCtx->getCurrentStackFrame();
				MemRegionManager &MRMgr = PrevState->getStateManager().getRegionManager();
				const MemRegion *Regions[] = {MRMgr.getStackLocalsRegion(STC),
				MRMgr.getStackArgumentsRegion(STC),
				MRMgr.getGlobalsRegion()};
				RegionAndSymbolInvalidationTraits ITraits;
				for (auto *Region : Regions) {
				ITraits.setTrait(Region,
				RegionAndSymbolInvalidationTraits::TK_EntireMemSpace);
				}
				return PrevState->invalidateRegions(Regions, getLoopCondition(LoopStmt),
				BlockCount, LCtx, true, nullptr, nullptr,
				&ITraits);
				}
				dcoughlinUnsubmitted Done Reply Inline Actions Passing `true` here instead of `false` for `CausedByPointerEscape` should fix the false alarm. This will cause invalidation to tell the MallocChecker that the pointer has escaped and so shouldn't be treated as a leak. dcoughlin: Passing `true` here instead of `false` for `CausedByPointerEscape` should fix the false alarm.

				seanevesonAuthorUnsubmitted Not Done Reply Inline Actions Perfect, thanks. Perhaps it could be changed back to false when the invalidation is better? Then it could catch true positives e.g. int a = new int[10] for (int i = 0; i < 100; ++i) { if (i > 90) ++a; ... } seaneveson:* Perfect, thanks. Perhaps it could be changed back to false when the invalidation is better?
				} // end namespace ento
				} // end namespace clang
				zaks.annaUnsubmitted Done Reply Inline Actions iteration -> iteration What do you mean by "approximate"? Note that people rely on the analyzer to track the exact values of variables. They expect it to know what the value is. It can be very confusing if the analyzer states that it knows the value of the variable and it is the wrong value. I am concerned with changing the value of a single variable of the loop based on a syntactic check. Also, I am not sure there is a strong need for preserving the value of the index variable either. zaks.anna: iteration -> iteration What do you mean by "approximate"? Note that people rely on the…
				dcoughlinUnsubmitted Done Reply Inline Actions This doesn't seem quite right. Consider: int i; for (i = 0; i < 21; i += 3) {} clang_analyzer_eval(i == 23); The value of `i` should be 21 after the loop, but this code sets it to 23. And what happens if `i` starts at 1 instead of 0? Another (ridiculous) case to consider: for (i = 0; i < 21; i += 3) { if (random() % 2 == 1) { i = i * i; } else { i--; } } What are the possible values of `i` after the loop? dcoughlin: This doesn't seem quite right. Consider: ``` int i; for (i = 0; i < 21; i += 3) {}…
				dcoughlinUnsubmitted Done Reply Inline Actions I get a warning here about comparing a signed int (RegionIndex) to an unsigned (NumRegions). I think you can avoid this and simplify things by using a range-based for loop: const MemRegion Regions[] = { ... }; RegionAndSymbolInvalidationTraits ITraits; for (auto Region : Regions) { ... } dcoughlin: I get a warning here about comparing a signed int (RegionIndex) to an unsigned (NumRegions). I…
				seanevesonAuthorUnsubmitted Done Reply Inline Actions Will do. Thanks. seaneveson: Will do. Thanks.

test/Analysis/analyzer-config.c

	Show All 19 Lines
	// CHECK-NEXT: ipa-always-inline-size = 3			// CHECK-NEXT: ipa-always-inline-size = 3
	// CHECK-NEXT: leak-diagnostics-reference-allocation = false			// CHECK-NEXT: leak-diagnostics-reference-allocation = false
	// CHECK-NEXT: max-inlinable-size = 50			// CHECK-NEXT: max-inlinable-size = 50
	// CHECK-NEXT: max-nodes = 150000			// CHECK-NEXT: max-nodes = 150000
	// CHECK-NEXT: max-times-inline-large = 32			// CHECK-NEXT: max-times-inline-large = 32
	// CHECK-NEXT: min-cfg-size-treat-functions-as-large = 14			// CHECK-NEXT: min-cfg-size-treat-functions-as-large = 14
	// CHECK-NEXT: mode = deep			// CHECK-NEXT: mode = deep
	// CHECK-NEXT: region-store-small-struct-limit = 2			// CHECK-NEXT: region-store-small-struct-limit = 2
				// CHECK-NEXT: widen-loops = false
	// CHECK-NEXT: [stats]			// CHECK-NEXT: [stats]
	// CHECK-NEXT: num-entries = 14			// CHECK-NEXT: num-entries = 15

test/Analysis/analyzer-config.cpp

	Show All 30 Lines
	// CHECK-NEXT: ipa-always-inline-size = 3			// CHECK-NEXT: ipa-always-inline-size = 3
	// CHECK-NEXT: leak-diagnostics-reference-allocation = false			// CHECK-NEXT: leak-diagnostics-reference-allocation = false
	// CHECK-NEXT: max-inlinable-size = 50			// CHECK-NEXT: max-inlinable-size = 50
	// CHECK-NEXT: max-nodes = 150000			// CHECK-NEXT: max-nodes = 150000
	// CHECK-NEXT: max-times-inline-large = 32			// CHECK-NEXT: max-times-inline-large = 32
	// CHECK-NEXT: min-cfg-size-treat-functions-as-large = 14			// CHECK-NEXT: min-cfg-size-treat-functions-as-large = 14
	// CHECK-NEXT: mode = deep			// CHECK-NEXT: mode = deep
	// CHECK-NEXT: region-store-small-struct-limit = 2			// CHECK-NEXT: region-store-small-struct-limit = 2
				// CHECK-NEXT: widen-loops = false
	// CHECK-NEXT: [stats]			// CHECK-NEXT: [stats]
	// CHECK-NEXT: num-entries = 19			// CHECK-NEXT: num-entries = 20

test/Analysis/loop-widening.c

				// RUN: %clang_cc1 -analyze -analyzer-checker=core,unix.Malloc,debug.ExprInspection -analyzer-max-loop 4 -analyzer-config widen-loops=true -verify %s

				void clang_analyzer_eval(int);
				void clang_analyzer_warnIfReached();
				dcoughlinUnsubmitted Done Reply Inline Actions There seems to be some weirdness when using `_Bool` in the prototype (vs. int). It adds an extra `!= 0` to symbolic expressions, which can result in clang_analyzer_eval() yielding `UNKNOWN` when using `extern void clang_analyze_eval(int)` would print `TRUE` or `FALSE`. We should track this down and fix it, but for now I think it is better to use `extern void clang_analyze_eval(int)`. dcoughlin: There seems to be some weirdness when using `_Bool` in the prototype (vs. int). It adds an…

				typedef __typeof(sizeof(int)) size_t;
				void *malloc(size_t);
				void free(void *);

				void loop_which_iterates_limit_times_not_widened() {
				int i;
				int x = 1;
				// Check loop isn't widened by checking x isn't invalidated
				for (i = 0; i < 1; ++i) {}
				dcoughlinUnsubmitted Not Done Reply Inline Actions This is clever. dcoughlin: This is clever.
				clang_analyzer_eval(x == 1); // expected-warning {{TRUE}}
				for (i = 0; i < 2; ++i) {}
				clang_analyzer_eval(x == 1); // expected-warning {{TRUE}}
				for (i = 0; i < 3; ++i) {}
				// FIXME loss of precision as a result of evaluating the widened loop body
				// instead of the last iteration.
				clang_analyzer_eval(x == 1); // expected-warning {{UNKNOWN}}
				}

				int a_global;

				void loop_evaluated_before_widening() {
				int i;
				a_global = 1;
				for (i = 0; i < 10; ++i) {
				if (i == 2) {
				// True before widening then unknown after.
				clang_analyzer_eval(a_global == 1); // expected-warning{{TRUE}} expected-warning{{UNKNOWN}}
				}
				}
				clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
				}

				void warnings_after_loop() {
				int i;
				for (i = 0; i < 10; ++i) {}
				char m = (char)malloc(12);
				} // expected-warning {{Potential leak of memory pointed to by 'm'}}

				void for_loop_exits() {
				int i;
				for (i = 0; i < 10; ++i) {}
				clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
				}

				void while_loop_exits() {
				int i = 0;
				while (i < 10) {++i;}
				clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
				}

				void do_while_loop_exits() {
				int i = 0;
				do {++i;} while (i < 10);
				clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
				}

				void loop_body_is_widened() {
				int i = 0;
				while (i < 100) {
				if (i > 10) {
				clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
				}
				++i;
				}
				clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
				}

				void invariably_infinite_loop() {
				int i = 0;
				while (1) { ++i; }
				clang_analyzer_warnIfReached(); // no-warning
				}

				void invariably_infinite_break_loop() {
				int i = 0;
				while (1) {
				++i;
				int x = 1;
				if (!x) break;
				}
				clang_analyzer_warnIfReached(); // no-warning
				}

				void reachable_break_loop() {
				int i = 0;
				while (1) {
				++i;
				if (i == 100) break;
				}
				clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
				}

				void condition_constrained_true_in_loop() {
				int i = 0;
				while (i < 50) {
				clang_analyzer_eval(i < 50); // expected-warning {{TRUE}}
				++i;
				}
				clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
				}

				void condition_constrained_false_after_loop() {
				int i = 0;
				while (i < 50) {
				++i;
				}
				clang_analyzer_eval(i >= 50); // expected-warning {{TRUE}}
				dcoughlinUnsubmitted Done Reply Inline Actions After changing the `clang_analyzer_eval()` prototype to take an int, you can use `clang_analyzer_eval(i >= 50); // expected-warning {{TRUE}}`. dcoughlin: After changing the `clang_analyzer_eval()` prototype to take an int, you can use…
				clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
				}

				void multiple_exit_test() {
				int x = 0;
				int i = 0;
				while (i < 50) {
				if (x) {
				i = 10;
				break;
				}
				++i;
				}
				// Reachable by 'normal' exit
				if (i == 50) clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
				// Reachable by break point
				if (i == 10) clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
				// Not reachable
				if (i < 10) clang_analyzer_warnIfReached(); // no-warning
				if (i > 10 && i < 50) clang_analyzer_warnIfReached(); // no-warning
				}

				void pointer_doesnt_leak_from_loop() {
				int h_ptr = (int ) malloc(sizeof(int));
				for (int i = 0; i < 2; ++i) {}
				for (int i = 0; i < 10; ++i) {} // no-warning
				free(h_ptr);
				}

				int g_global;

				void unknown_after_loop(int s_arg) {
				g_global = 0;
				s_arg = 1;
				int s_local = 2;
				int *h_ptr = malloc(sizeof(int));

				for (int i = 0; i < 10; ++i) {}

				clang_analyzer_eval(g_global); // expected-warning {{UNKNOWN}}
				clang_analyzer_eval(s_arg); // expected-warning {{UNKNOWN}}
				clang_analyzer_eval(s_local); // expected-warning {{UNKNOWN}}
				clang_analyzer_eval(h_ptr == 0); // expected-warning {{UNKNOWN}}
				free(h_ptr);
				}

				void variable_bound_exiting_loops_widened(int x) {
				dcoughlinUnsubmitted Done Reply Inline Actions I think it would be good to add some nested loop tests. For example: void nested1() { int i = 0, j = 0; for (i = 0; i < 10; i++) { clang_analyzer_eval(i < 10); // expected-warning {{TRUE}} for (j = 0; j < 2; j++) { clang_analyzer_eval(j < 2); // expected-warning {{TRUE}} } clang_analyzer_eval(j >= 2); // expected-warning {{TRUE}} } clang_analyzer_eval(i >= 10); // expected-warning {{TRUE}} } void nested2() { int i = 0, j = 0; for (i = 0; i < 2; i++) { clang_analyzer_eval(i < 2); // expected-warning {{TRUE}} for (j = 0; j < 10; j++) { clang_analyzer_eval(j < 10); // expected-warning {{TRUE}} } clang_analyzer_eval(j >= 10); // expected-warning {{TRUE}} } clang_analyzer_eval(i >= 2); // expected-warning {{TRUE}} } dcoughlin: I think it would be good to add some nested loop tests. For example: ``` void nested1() {…
				int i = 0;
				seanevesonAuthorUnsubmitted Not Done Reply Inline Actions Thank you for the example tests. When an inner loop is widened, any outer loops are widened 'coincidentally' since everything is invalidated. This means the second example test passes. When an inner loop is not widened the outer loop will not be widened. This is because a sink is generated when the max block count is reached the second time through the outer loop. i.e. for (i = 0; i < 10; i++) { for (j = 0; j < 2; j++) { // A sink is generated here on the second iteration of the outer loop, // since this block was already visited twice (when i == 0). ... } } // Never reach here This means the first example test does not pass. I’ve added the tests with a FIXME comment. seaneveson: Thank you for the example tests. When an inner loop is widened, any outer loops are widened…
				seanevesonAuthorUnsubmitted Not Done Reply Inline Actions The inner loop will now be widened in the first example test, which 'coincidentally' widens the outer loop as well. This means this text now passes. seaneveson: The inner loop will now be widened in the first example test, which 'coincidentally' widens the…
				int t = 1;
				while (i < x) {
				++i;
				}
				clang_analyzer_eval(t == 1); // expected-warning {{TRUE}} // expected-warning {{UNKNOWN}}
				}

				void nested_loop_outer_widen() {
				int i = 0, j = 0;
				for (i = 0; i < 10; i++) {
				clang_analyzer_eval(i < 10); // expected-warning {{TRUE}}
				for (j = 0; j < 2; j++) {
				clang_analyzer_eval(j < 2); // expected-warning {{TRUE}}
				}
				clang_analyzer_eval(j >= 2); // expected-warning {{TRUE}}
				}
				clang_analyzer_eval(i >= 10); // expected-warning {{TRUE}}
				}

				void nested_loop_inner_widen() {
				int i = 0, j = 0;
				for (i = 0; i < 2; i++) {
				clang_analyzer_eval(i < 2); // expected-warning {{TRUE}}
				for (j = 0; j < 10; j++) {
				clang_analyzer_eval(j < 10); // expected-warning {{TRUE}}
				}
				clang_analyzer_eval(j >= 10); // expected-warning {{TRUE}}
				}
				clang_analyzer_eval(i >= 2); // expected-warning {{TRUE}}
				}

This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Widening loops which do not exitClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 38519

include/clang/StaticAnalyzer/Core/AnalyzerOptions.h

include/clang/StaticAnalyzer/Core/PathSensitive/LoopWidening.h

lib/StaticAnalyzer/Core/AnalyzerOptions.cpp

lib/StaticAnalyzer/Core/CMakeLists.txt

lib/StaticAnalyzer/Core/ExprEngine.cpp

lib/StaticAnalyzer/Core/LoopWidening.cpp

test/Analysis/analyzer-config.c

test/Analysis/analyzer-config.cpp

test/Analysis/loop-widening.c

[Analyzer] Widening loops which do not exit
ClosedPublic