This is an archive of the discontinued LLVM Phabricator instance.

[StaticAnalyser] Don't merge different returns in ExplodedGraph
ClosedPublic

Authored by danielmarjamaki on Oct 6 2016, 8:06 AM.

Details

Summary

Returns when calling an inline function should not be merged in the ExplodedGraph unless they are same.

Background post on cfe-dev:
http://lists.llvm.org/pipermail/cfe-dev/2016-October/051001.html

Here is an example patch that solves my false positives and also fixes 2 false negatives in existing tests.

What do you think about this approach?

Diff Detail

Repository
rL LLVM

Event Timeline

danielmarjamaki retitled this revision from to [StaticAnalyser] Don't merge different returns in ExplodedGraph.
danielmarjamaki updated this object.
danielmarjamaki set the repository for this revision to rL LLVM.

Daniel, please, add reviewers to this patch.

NoQ added a comment.Oct 6 2016, 7:07 PM

Funny facts about the Environment:

(a) Environment allows taking SVals of ReturnStmt, which is not an expression, by transparently converting it into its sub-expression. In fact, it only stores expressions.

Having just noticed (a), i also understand that (a) is not of direct importance to the question, however:

(b) With a similar trick, Environment allows taking SVal of literal expressions, but never stores literal expressions. Instead, it reconstructs the constant value by looking at the literal and returns the freshly constructed value when asked.

This leads to "return 0;" and "return 1;" branches having the same program state. The difference should have been within Environment (return values are different), however return values are literals, and they aren't stored in the Environment, and hence the Environment is equal in both states. If only the function returned, say, 0 and i, the problem wouldn't have been there.

This leaves us with two options:

  1. Tell the Environment to store everything. This would make things heavy. However, i do not completely understand the consequences of this bug at the moment - perhaps there would be more problems due to this state merging unless we start storing everything.
  1. Rely on the ProgramPoint to remember where we are. After all, why do we merge when program points should clearly be different? The program point that fails us is CallExitBegin - we could construct it with ReturnStmt and its block count to discriminate between different returns. It should fix this issue, and is similar to the approach taken in this patch (just puts things to their own place), however, as i mentioned, there might be more problems with misbehavior (b) of the Environment, need to think.

Finally, i'm not quite sure why CallExitBegin is at all necessary. I wonder if we could just remove it and jump straight to Bind Return Value, also need to think.

In D25326#564082, @NoQ wrote:

Funny facts about the Environment:

(a) Environment allows taking SVals of ReturnStmt, which is not an expression, by transparently converting it into its sub-expression. In fact, it only stores expressions.

Having just noticed (a), i also understand that (a) is not of direct importance to the question, however:

(b) With a similar trick, Environment allows taking SVal of literal expressions, but never stores literal expressions. Instead, it reconstructs the constant value by looking at the literal and returns the freshly constructed value when asked.

This leads to "return 0;" and "return 1;" branches having the same program state. The difference should have been within Environment (return values are different), however return values are literals, and they aren't stored in the Environment, and hence the Environment is equal in both states. If only the function returned, say, 0 and i, the problem wouldn't have been there.

I did not know "a" and "b", thanks for info.

This leaves us with two options:

  1. Tell the Environment to store everything. This would make things heavy. However, i do not completely understand the consequences of this bug at the moment - perhaps there would be more problems due to this state merging unless we start storing everything.
  1. Rely on the ProgramPoint to remember where we are. After all, why do we merge when program points should clearly be different? The program point that fails us is CallExitBegin - we could construct it with ReturnStmt and its block count to discriminate between different returns. It should fix this issue, and is similar to the approach taken in this patch (just puts things to their own place), however, as i mentioned, there might be more problems with misbehavior (b) of the Environment, need to think.
  1. yes sounds heavy.
  2. I assume you are right.. however as I understand it the ProgramPoint when CallExitBegin is called is the same (in the exit block). do you suggest that we take the ProgramPoint for the exit block's predecessor?

Finally, i'm not quite sure why CallExitBegin is at all necessary. I wonder if we could just remove it and jump straight to Bind Return Value, also need to think.

me too. however there is much I wonder about when it comes to ExplodedGraph. :-)

NoQ edited edge metadata.Oct 7 2016, 1:29 AM

as I understand it the ProgramPoint when CallExitBegin is called is the same (in the exit block). do you suggest that we take the ProgramPoint for the exit block's predecessor?

CallExitBegin is the program point. I propose to make different exits from the same function be different program points. This could be achieved by adding more members to the CallExitBegin class - return statement and block count would probably be sufficient.

In fact, not sure if we need block count. If we're, say, returning from a loop with the same statement, then we're either returning different values, or returning the same literal expression, so Environment would do the job for us well in both cases.

ok. As far as I see it's not trivial to know which ReturnStmt there was when CallExitBegin is created. Do you suggest that I move it or that I try to lookup the ReturnStmt? I guess it can be looked up by looking in the predecessors in the ExplodedGraph?

Finally, i'm not quite sure why CallExitBegin is at all necessary. I wonder if we could just remove it and jump straight to Bind Return Value, also need to think.

.. unless that is better apprach

NoQ added a comment.Oct 7 2016, 3:23 AM

ok. As far as I see it's not trivial to know which ReturnStmt there was when CallExitBegin is created.

We're in HandleBlockEdge, just pass down the statement from CFG here?

.. unless that is better apprach

It seems to have something to do with separation of duties between CoreEngine and ExprEngine. Kind of, CoreEngine explores the CFG, ExprEngine models effects of statements, and noticing end of function is CoreEngine's duty, while binding the return value is ExprEngine's duty, and CallExitBegin acts like a message from CoreEngine to ExprEngine so that they could work together. That's how it seems to me, but i'm not sure of the original intention here.

In D25326#564283, @NoQ wrote:

ok. As far as I see it's not trivial to know which ReturnStmt there was when CallExitBegin is created.

We're in HandleBlockEdge, just pass down the statement from CFG here?

I don't directly see how you mean. Code is:

void CoreEngine::HandleBlockEdge(const BlockEdge &L, ExplodedNode *Pred) {

  const CFGBlock *Blk = L.getDst();

The Blk->dump() says:

[B0 (EXIT)]
   Preds (2): B1 B2
NoQ added a comment.Oct 7 2016, 3:54 AM

The Blk->dump() says:

[B0 (EXIT)]
   Preds (2): B1 B2

What about the source block?

In D25326#564283, @NoQ wrote:

ok. As far as I see it's not trivial to know which ReturnStmt there was when CallExitBegin is created.

We're in HandleBlockEdge, just pass down the statement from CFG here?

I don't directly see how you mean. Code is:

void CoreEngine::HandleBlockEdge(const BlockEdge &L, ExplodedNode *Pred) {

  const CFGBlock *Blk = L.getDst();

The Blk->dump() says:

[B0 (EXIT)]
   Preds (2): B1 B2

Sorry... I think I see. L.getSrc() will give me the cfg block I am interested in.

danielmarjamaki edited edge metadata.
danielmarjamaki removed rL LLVM as the repository for this revision.

Refactoring.

NoQ accepted this revision.Oct 7 2016, 7:17 AM
NoQ edited edge metadata.

Yay, this is great. Thanks for investigating all those loss of coverage issues!

lib/StaticAnalyzer/Core/CoreEngine.cpp
610 ↗(On Diff #73926)

Space after ",".

639 ↗(On Diff #73926)

Space after ",".

lib/StaticAnalyzer/Core/ExprEngine.cpp
1792 ↗(On Diff #73926)

Space after ",".

This revision is now accepted and ready to land.Oct 7 2016, 7:17 AM
zaks.anna accepted this revision.Oct 7 2016, 9:23 AM
zaks.anna edited edge metadata.

Please, fix the style issues before committing.

include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
266 ↗(On Diff #73926)

Add spaces around '=.'

dcoughlin edited edge metadata.Oct 7 2016, 10:03 AM

Can you also add a test that tests this more directly (i.e., with clang_analyzer_warnIfReached). I don't think it is good to have the only test for this core coverage issue to be in tests for an alpha checker. Adding the direct test would also make it easier to track down any regression if it happens. The 'func.c' test file might be a good place for such a test.

a.sidorin added inline comments.Oct 7 2016, 10:13 AM
test/Analysis/unreachable-code-path.c
201 ↗(On Diff #73926)

I have a small question. Is it possible to simplify this sample with removing of table[] array? Like putting something like i != 0 into condition. As I understand, the problem is not array-related.

NoQ added inline comments.Oct 7 2016, 10:15 AM
test/Analysis/unreachable-code-path.c
201 ↗(On Diff #73926)

Any UnknownVal in the condition would trigger this issue.

seurer added a subscriber: seurer.Oct 7 2016, 10:36 AM
This revision was automatically updated to reflect the committed changes.
test/Analysis/unreachable-code-path.c
201 ↗(On Diff #73926)

Is it possible to simplify this sample with removing of table[] array?

I tried to simplify as much as possible. But as NoQ says an UnknownVal is required here for this test.

Please, fix the style issues before committing.

Sorry I missed that.

Ideally it would be possible to run clang-format on the files before committing. but currently I get lots of unrelated changes then.

Would it be ok to run clang-format on some files to clean up the formatting? At least some minor changes like removing trailing spaces.

NoQ added a comment.Oct 10 2016, 1:01 AM

Ideally it would be possible to run clang-format on the files before committing. but currently I get lots of unrelated changes then.

Would it be ok to run clang-format on some files to clean up the formatting? At least some minor changes like removing trailing spaces.

clang-format should be able to format small sections of the code, keeping in mind the rest of the file but not touching it. In vim, i use a hotkey for formatting the statement under cursor. I've seen a plugin for Xcode that clang-format's selected text. Also, there's this git clang-format thingy, which should be able to clang-format git changes and only them, i didn't try, but it might be universally useful.

dcoughlin added inline comments.Oct 10 2016, 8:23 AM
test/Analysis/unreachable-code-path.c
201 ↗(On Diff #73926)

This is worth a comment in the test then, so that if we ever add symbolic reasoning we'll know how to adjust the test.

You could also try to add a canary with clang analyzer eval after the if statement to force the test to fail if we do add this symbolic reasoning.

Please, fix the style issues before committing.

Would it be ok to run clang-format on some files to clean up the formatting? At least some minor changes like removing trailing spaces.

We generally try to avoid large-scale fixes of formatting and spacing issues if they are not on lines already modified by a patch. This makes it a bit easier to track down the original commit for a line and also reduces merge conflicts.

I agree with the comments from you dcoughlin but I am not sure how to do it.

Can you also add a test that tests this more directly (i.e., with clang_analyzer_warnIfReached). I don't think it is good to have the only test for this core coverage issue to be in tests for an alpha checker. Adding the direct test would also make it easier to track down any regression if it happens. The 'func.c' test file might be a good place for such a test.

I totally agree.

In func.c there such comments:
// expected-warning{{FALSE|TRUE|UNKNOWN}}

what does those FALSE|TRUE|UNKNOWN do?

I don't see what this will do:

clang_analyzer_eval(!f);

I want that both returns are reached. and I want to ensure that result from function is both 1 and 0.

You could also try to add a canary with clang analyzer eval after the if statement to force the test to fail if we do add this symbolic reasoning.

sounds good. sorry but I don't see how to do it.

You could also try to add a canary with clang analyzer eval after the if statement to force the test to fail if we do add this symbolic reasoning.

sounds good. sorry but I don't see how to do it.

The trick is to not first store the UnknownVal into a local (the analyzer will automatically create a symbol for that when read).

Instead you can do something like:

if (table[i] != 0) { }
clang_analyzer_eval((table[i] != 0)) // expected-warning {{UNKNOWN}}

This way if the analyzer ever starts generating symbols for a read from table[i] then the case split will change the eval to emit TRUE on one path and FALSE on the other.