This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
lib/StaticAnalyzer/Core/
-
StaticAnalyzer/
-
Core/
18/44
RangeConstraintManager.cpp
-
test/Analysis/
-
Analysis/
2/3
constraint_manager_conditions.cpp

Differential D78933

[analyzer] Reasoning about comparison expressions in RangeConstraintManager
ClosedPublic

Authored by ASDenysPetrov on Apr 27 2020, 8:00 AM.

Download Raw Diff

Details

Reviewers

NoQ
baloghadamsoftware
steakhal
xazax.hun
dcoughlin
Charusso
vsavchenko

Commits

rGe1741e34e00b: [analyzer] Reasoning about comparison expressions in RangeConstraintManager

Summary

I got an idea how to make RangeConstraintManager more sofisticated within comparison operations.
I want you speak out, share your vision about this idea. Briefly speaking, RangeConstraintManager shall care about the sanity of comparison expressions sequences and help reasonably to branch an exploded graph.

The idea came to me as a thing which resolves this bug PR13426.
Let's consider the next snippet:

int foo(int x, int y) {
    int z;
    if (x == y) {
        z = 0;
    }
    if (x != y) {
        z = 1;
    }
    return z;
}

Obviously that z will be initialized, but CSA reports next:

warning: Undefined or garbage value returned to caller
      [core.uninitialized.UndefReturn]
        return z;

It happenes because CSA does not take into account that x == y and x != y are just two opposites, as if it was:

if (x == y) {
    z = 0;
} else {
    z = 1;
}

So my improvements are in handling case above and similar ones. Assume for some comparison x < y we look for any other comparisons with the same operands (x > y, y != x, etc.). Then we do logical calculations and refuse impossible branches.
For example:

x < y and x > y at the same time are impossible.
x >= y and x != y at the same time makes x > y true only.
x == y and y == x are just reversed but the same.

It covers all possible combinations and is described in the next table.

	<	>	<=	>=	==	!=	UnknownX2
<	1	0	*	0	0	*	1
>	0	1	0	*	0	*	1
<=	1	0	1	*	1	*	0
>=	0	1	*	1	1	*	0
==	0	0	*	*	1	0	1
!=	1	1	*	*	0	1	0

The table holds states for the preceding comparison operation assuming the latter is true.
For instance, assume x < y is true that means x != y is surely true:

if (x preciding_operation y) // <    | !=      | >
    if (x operation y)       // !=   | >       | <
         tristate            // True | Unknown | False

Columns stands for a preceding operator. Rows stands for a current operator. Each row has exactly two Unknown cases. UnknownX2 means that both Unknown preceding operators are met in code, and there is a special column for that, for example:

if (x >= y)
  if (x != y)
    if (x <= y)
      False only

As a result of processing an example below, we have

before.html3 MBDownload

and

after.html293 KBDownload

exploded graphs.

void foo(int y, int x) {
  if (x < y) {
    if (y > x);
    if (x > y);
    if (x <= y);
    if (y <= x);
    if (x == y);
    if (y != x);
  }
}

Please, express your thoughts down here, if it is worth to be.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ASDenysPetrov created this revision.Apr 27 2020, 8:00 AM

Herald added subscribers: cfe-commits, martong, Charusso, rnkovacs. · View Herald TranscriptApr 27 2020, 8:00 AM

ASDenysPetrov updated this revision to Diff 260331.Apr 27 2020, 8:03 AM

ASDenysPetrov added reviewers: dcoughlin, Charusso.

ASDenysPetrov edited the summary of this revision. (Show Details)Apr 27 2020, 8:12 AM

ASDenysPetrov edited the summary of this revision. (Show Details)Apr 27 2020, 8:26 AM

ASDenysPetrov retitled this revision from RangeConstraintManager optimizations in comparison expressions to [analyzer] RangeConstraintManager optimizations in comparison expressions.

Herald added subscribers: dkrupp, donat.nagy, Szelethus and 3 others. · View Herald TranscriptApr 27 2020, 8:27 AM

ASDenysPetrov edited the summary of this revision. (Show Details)Apr 27 2020, 8:32 AM

Harbormaster failed remote builds in B54807: Diff 260302!Apr 27 2020, 9:39 AM

ASDenysPetrov edited the summary of this revision. (Show Details)May 1 2020, 2:47 PM

ASDenysPetrov added a reviewer: vsavchenko.

ASDenysPetrov updated this revision to Diff 261552.May 1 2020, 2:49 PM

ASDenysPetrov edited the summary of this revision. (Show Details)May 1 2020, 2:59 PM

ASDenysPetrov edited the summary of this revision. (Show Details)

ASDenysPetrov updated this revision to Diff 261556.May 1 2020, 3:02 PM

ASDenysPetrov updated this revision to Diff 261563.May 1 2020, 3:39 PM

ASDenysPetrov edited the summary of this revision. (Show Details)

NoQ mentioned this in D79232: [analyzer] Refactor range inference for symbolic expressions.May 1 2020, 11:16 PM

ASDenysPetrov marked an inline comment as done.May 6 2020, 1:53 AM

ASDenysPetrov added inline comments.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1182–1183	Folk, is this a good idea to explicitly create bool ranges as a return value? As for me, comparisons like `>`, `<`, etc. can only produce bool-based ranges, otherwise it would be weird.

Guys,
@xazax.hun, @Charusso, @steakhal, @vsavchenko, @NoQ, @baloghadamsoftware,
please, tell something. What do you think of this improvement?

In D78933#2022288, @ASDenysPetrov wrote:

Guys,
@xazax.hun, @Charusso, @steakhal, @vsavchenko, @NoQ, @baloghadamsoftware,
please, tell something. What do you think of this improvement?

Sorry, I will definitely take a look at this later today! In the meantime, we are always curious about the performance and usability implications of those patches.
Did you do some measurements on some projects how the run-time changes?
Also, it would be useful to see how statistics like coverage patterns, number of exploded nodes etc changes. See the output of -analyzer-stats option for details.

clang/test/Analysis/constraint_manager_conditions.cpp
214	Nit: please add new line at the end of the file.

xazax.hun added inline comments.May 6 2020, 5:02 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1141	I have really hard time processing how to interpret `AnyX2`. For example in the code below: if (x >= y) if (x != y) if (x <= y) return false if (x >= y) if (x == y) if (x <= y) return true We would get different results for `<=`. So I do not really get how I should read the `AnyX2` column.
1175	Could `LHS` and `RHS` be other expressions? Does it make sense to continue executing this function if one of them is not a simple symbol?
1182–1183	I think modeling booleans with ranges is OK. This is what the analyzer is doing at the moment. But I guess the question is about whether boolean results of comparisons is a good way to store the relationships between the symbols. I am not convinced about that see my inline comment below.
1188	I am not really familiar with how constraints are represented but I vaguely recall the analyzer sometimes normalizing some expressions like conversion `A == B` to `A - B == 0`. I am just wondering if this API to look this expression up is not the right abstraction as it might be better to handle such normalizations in a unified, central way. Also, note that this method does not handle transitivity. I wonder if maintaining set of ranges is the right representation for this information at all. The ordering between the symbols will define a lattice. Representing that lattice directly instead of using ranges might be more efficient.

Thanks for working on this, I do believe the analyzer would greatly profit from better constraint solving capabilities. Unfortunately, we had some troubles in the past trying to improve upon the current status and we had to revert multiple patches. This is why the community is super cautious when it comes to changes like this.

A high-level note:

Currently the constraint solver we have is interval based.
I think this patch makes a step towards a more refined domain: octagons [1].

At this point I wonder, if we should study octagons first, and see if there are some efficient implementation strategies are available.
I do not insist on rewriting the whole constraint manager to be octagon-based, but it would be useful to see what we actually pursuing here and what are the implications.

[1]: https://arxiv.org/pdf/cs/0703084.pdf

@xazax.hun, many thanks for looking at the patch.

Did you do some measurements on some projects how the run-time changes?

I have never run them before. How I can do this? What project to use as a sample?

[1]: https://arxiv.org/pdf/cs/0703084.pdf

I will definitely read it, thank you.

Also, it would be useful to see how statistics like coverage patterns, number of exploded nodes etc changes.

You can check

before.html3 MBDownload

and

after.html293 KBDownload

exploded graphs, to see the number of nodes.

See the output of -analyzer-stats option for details.

I didn't find this option in clang --help. I've never use it. Please, explain how to use it.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1141	I'm sorry for the obscure explanation. I was really thinking about how to explain it more clear. Let's consider this code: if (x >= y) if (x != y) if (x <= y) If 'x >= y' is true, then following `x <= y` is `Any` ( can be true or false ). See table in the summary. If 'x != y' is true, then following `x <= y` is `Any` ( can be true or false ). If 'x >= y && x != y' is true, then following `x <= y` is `False` only (it means that we previously met two Any states, what I've called as AnyX2).
1175	I think it makes sense, because `(x+3y) > z` and `z < (x+3y)` perfectly fits in the logic.
1182–1183	Once I met a case when logic operator `<` retruns me int-based ranges {0,0} on the false branch and {1, 429496725} (instead of bool-based {1,1}) on the true branch. It confused me, so the question arised.
1188	conversion A == B to A - B == 0. All my samples `x == y` worked as `x == y`. I havn't seen such transformations. Representing that lattice directly instead of using ranges might be more efficient. How do you imagine that lattice? Could you detail it?

In D78933#2022684, @ASDenysPetrov wrote:

I have never run them before. How I can do this? What project to use as a sample?

Unfortunately, we do not really have a well-defined set of benchmarks. But as a first step, you could run the analyzer on LLVM before and after applying your patch and measure the run-time. Or you could start with a smaller project (you can pick any well-known project from GitHub or other similar sites) as checking LLVM can be really time-consuming (especially in debug mode). To run the analyzer on a project I recommend using scan-build or CodeChecker. If you feel like, you can also experiment with the CSA-testbench (https://github.com/Xazax-hun/csa-testbench), but it requires quite a lot of effort to set up upfront.

Also, it would be useful to see how statistics like coverage patterns, number of exploded nodes etc changes.

You can check
before.html3 MBDownload
and
after.html293 KBDownload
exploded graphs, to see the number of nodes.

This is only for a small artificial example. We would like to learn more about how the node count changes for a large real-world project like LLVM. We do not need to see the actual exploded graph for those big projects as they are really hard to interpret, we just want to know the numbers.

See the output of -analyzer-stats option for details.

I didn't find this option in clang --help. I've never use it. Please, explain how to use it.

I forgot to mention, not all of the analyzer flags have a corresponding flag in the driver. You can check all the analyzer related flags in the frontend (as opposed to the driver) using the following command: clang -cc1 -help | grep "analyzer"
You can either add -analyzer-stats to your cc1 invocation or you can use the -Xclang to pass the flag through the driver this way: -Xclang -analyzer-stats. This will modify the output of the analyzer to include additional statistics about the analysis.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1141	Could you tell me how the reasoning would be different for if (x >= y) if (x == y) if (x <= y) ?
1175	Makes sense, thanks. Maybe it would be worth to rephrase the comment to note that `x` and `y` can also stand for subexpressions, not only for actual symbols.
1182–1183	I see. I do not see any problem with that immediately, but @NoQ is much more well-versed to answer this question.
1188	A lattice is basically a DAG, where the nodes are the values (symbols, or subexpressions), and the edges are ordering. So it defines an ordering. E.g., A is smaller than B, if and only if there is a path from A to B. I am not sure whether this is the right representation, but it would take care of transitivity.

xazax.hun added inline comments.May 6 2020, 7:53 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1141	Oh never mind, after `x == y` the `x <= y` would be `true`. But still, I find this logic hard to explain, understand, and prove correct. I'd like to see at least some informal reasoning why will it always work. Also, if someone would like to extend this table in the future (e.g. with operator <=>), how to make it in such a way that does not break this logic.

You can either add -analyzer-stats to your cc1 invocation

It helps, thanks.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1141	When I was creating a table it was just 6x6 without AnyX2 column. It should tell of what RangeSet we shall return (true, false or empty). It covers cases with only one preceding condition. Then I found that two certain sequential conditions can also tell us what happen to the next condition. And those two conditions happened to be described as `Any` in the table. Thus I deduced one more column for this purpose. In other words: if (x >= y) // 3. Aha, we found `x >= y`, which is also `Any` for `x <= y` in the table if (x != y) // 2. Aha, we found `x == y`, which is `Any` for `x <= y` in the table if (x <= y) // 1. Reasoning a RangeSet about this `x <= y` // 4. OK, we found two `Any` states. Let's look at the value in `AnyX2` column for `x <= y`. // 5. The value is `False`, so return `FalseRangeSet`, assuming CSA builds false branch only. You can substitute any other row from the table and corresponding Any-comparisons. Honestly, I don’t know how else to explain this. :) I find this logic hard to explain, understand, and prove correct. If you come up with it, It becomes really easy and clear. You'll see. But I agree that AnyX2 can be changed to something more obvious.
1175	Yes! Great!
1188	I will inspect this idea. Thanks.

xazax.hun added inline comments.May 6 2020, 12:01 PM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1141	Ok, in the meantime I revisited the algorithm and I think now I understand why does it work. I think the main observation I missed was the following: Every row has exactly two `Any` entries, so we only need to think about one combination per row. I think originally I interpreted your matrix wrong (the transposed one) and it did not really add up. So I think you should document the fact that each row has exactly two unknown cases. But for each of the rows, if we have both facts, we can actually evaluate the current operator. The goal is that if someone wants to extend this code later on (e.g. if the standard committee introduces a new operator that we handle), the next person should be able to modify this algorithm without breaking the existing code. So as you can see, adding a new column/row could break this invariant of having exactly 2 unknowns in each row rendering this code incorrect. This is why it is important, to explain what makes the algorithm work. I might miss something, but I do not really see test cases that are covering these. I think some renaming might also help others understanding what is going on: Capitalize variable names, for example, `tristate`. Chose a bit more descriptive names. `tristate` from the previous example describes the type of the variable. But variable names should describe their contents rather than their type. It can be `Result` in case it describes the result of the operation. I think in 3-valued logic we tend to use `Unknown` rather than `Any`. You have code like `auto tristate = CmpOpTable[IndexOP][Index];`. This is not really readable as it is really hard to tell what index corresponds to what. One of the indexes correspond to the evaluated operator while the other corresponds to the query. The variable names should reflect that.

ASDenysPetrov marked an inline comment as done.May 7 2020, 3:35 AM

ASDenysPetrov added inline comments.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1141	Thank you. I'll take into account all your notes. I'll update the patch soon.

ASDenysPetrov updated this revision to Diff 262667.May 7 2020, 8:36 AM

ASDenysPetrov edited the summary of this revision. (Show Details)

Updated the patch due to comments of @xazax.hun

clang/test/Analysis/constraint_manager_conditions.cpp
214	I've added a new line and checked twice before creating a patch. But it didn't appear.

Thanks!

I still have some nits inline, but overall the implementation looks good to me.
I think, however, that we should not only focus on the implementation but on the high-level questions.
Is this the way forward we want? Can we do it more efficiently? What is the performance penalty of this solution?

The measurements I mentioned earlier about runtimes, node count and so on could help answer some of those questions.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1140	I think it would be better to use slightly different terminology. Instead of preceding operator I would say something like "previously known fact".
1150	I think this table together with the operations that transform between indices and operators could be factored out into a separate class. That would make this method slightly cleaner.
1166	In LLVM we usually only use `auto` for iterators and to avoid repeating the type twice. Auto is ok above with the `dynamic_cast`, but here and some other places the type should be spelled out.
1221	I think `IndexPrevOP` is confusing as the "previous" operation is not necessarily the previous operation we evaluated. In fact `IndexPrevOP` could have been evaluated any time in the past. We query the known facts that are known for the current state, I think this is what the names should reflect. I suggest to use `CurrentOP` and `QueriedOP`.
clang/test/Analysis/constraint_manager_conditions.cpp
214	This looks good now. If it misses the new line there is a message saying that explicitly.

The measurements I mentioned earlier about runtimes, node count and so on could help answer some of those questions.

I'm looking for an appropriate project to measure performance, since I have some troubles running the test-suite on Windows.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1150	I'll try to come up how to turn it into the class.
1166	Thanks for the guidance, I'll check the code for implicit types.
1221	No problem. I'll rename them.

Updated due to @xazax.hun comments.

xazax.hun added inline comments.May 13 2020, 7:08 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
91	I think this function should not work with indices. It should work with operators. So the caller of this function never need to invoke `IndexFromOp` or `OpFromIndex` making this function easier to use.

Updated. @xazax.hun you were right. Now it looks more readable. Thanks.

I've made some performance measurements on https://github.com/vim/vim project.
Since I'm using Windows I had no chance to do a lot of test on different projects, because of Unix-environment dependency.
I've tested the most weighty files running clang --analyze -Xclang "-Iproto" -Xclang -analyzer-stats src/file.c on each.
In this way I've tested 84 files. And here is a result table (Each row contains a result of testing an individual c/cpp-file).

performance_test_vim.xlsx15 KBDownload

Total results are:
test 1

Before	After	Delta
770,9s	760,8s	1,31%

test 2

Before	After	Delta
791,4s	771,8s	2,48%

@xazax.hun, any thoughts?

Updated. Rebased with D79232.
@vsavchenko, @NoQ, @xazax.hun please look.

In D78933#2054195, @ASDenysPetrov wrote:

@xazax.hun, any thoughts?

I think we should check it on some more projects. We saw vastly different analyzer behavior on other projects in the past. I think you should be able to run it on any cmake project that supports windows by generating compilation database and using scan-build-py (in case other methods did not work).

Overall, the performance penalty does not look that bad. But I wonder how would this compare to a solution that also handles transitivity. We did not really look into the alternatives yet.

I'd like to know what @vsavchenko thinks as Valeriy is more into this kind of stuff lately.

Sorry for postponing this review.
I'll get to it right away!

@xazax.hun

I think we should check it on some more projects. We saw vastly different analyzer behavior on other projects in the past.

I completely agree with you. But, unfortunately, vim-proj is the only I could squeeze from that bunch.

I think you should be able to run it on any cmake project that supports windows by generating compilation database and using scan-build-py (in case other methods did not work).

I'll appreciate if you'd give some of such projects. Another point is that I don't know how to get printed stats from the scan-build.

But I wonder how would this compare to a solution that also handles transitivity. We did not really look into the alternatives yet.

I've read an article about Octagons you gave me. The article covers much bigger and more complicated scope, then mine solution. For now I'm not really ready to enlarge it, as it needs a deep investigation. I think for now it can be accepted as an intermadiate proposal.

vsavchenko added inline comments.May 29 2020, 7:40 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
32	+1 for this static assert! It's good to ensure such assumptions, even if those are very unlikely to change
80	I would prefer function names to comply with LLVM coding standards (start with a verb and a lowercase letter https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly
665	I believe that when we use auto we still try to be more verbose with it, so in this case it should be smth like `const auto *SSE`
689	Sorry for nitpicking, but it seems like the grammar is a bit off in the the all columns except the last part
690	I think that treat is not the best option here. Maybe smith like process will do?
696	Maybe we can name this constant somehow differently to be more verbose?
700–705	Please, correct me if I'm wrong, but I thought that we are iterating over all comparison operators (excluding `<=>`), which means that if we don't find it on this iteration for let's say `x < y` then we'll find it (or already did) for `x > y`. So, my question is - why do we have this clause then? And it confuses me that `RangeSet` now corresponds to a reversed operator, while `QueriedOP` remains the same. Maybe a good commentary explaining why we need it could help!

Do not apologize for syntax, language or code style remarks. Them make code better and it's really important for me to know about my faults.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
80	I'll add a get prefix. Saying about lower/uppercases, I'd say this is a mess inside this file. I've just decided to use an upper one.
665	I'll do.
689	Do you mean to change to all of the columns exept the last one ('UnknownX2')?
690	I met a lot of treat usage in terms of handle/process among the code, so just followed it.
696	What do you think about FoundRangeSet?
700–705	No-no. Please, look carefully. At first we try to find `X op Y`, when op is a comparison operator. If we failed then try to find the reversed (flipped) version of the expression `Y reversed(op) X`. Examples: `x > y and y < x`, `x != y and y != x`, `x <= y and y >= x` We don't care about operand positions, we just want to find any of its previous relation, and which branch of it we are now in.

Great job! Thank you for your work!

I would say, fixing a couple of minor style issues and testing it on a few more projects, and it is ready to land!

Another point is that I don't know how to get printed stats from the scan-build.

That option in scan-build is called -internal-stats.

I completely agree with you. But, unfortunately, vim-proj is the only I could squeeze from that bunch.

My opinion here is that any projects (with a good amount of users or stars on GitHub ^-^ ) will do. If it is hard to build them on Windows, move to the next project. I guess projects with a minimal number of dependencies should be fairly easy on any platform.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
80	I agree that it is a mess, but if we want to reduce it with time, I'd say we should stick to the official style guide.
689	Yep, sounds good!
696	I was thinking about adding more context to the name, something like `RelatedRangeSet` or similar. To show not only what it is (`RangeSet`) and how we got it (`found`), but also its purpose or meaning.
700–705	Oh, I see now. I missed the part where `RHS` is on the left now :-)

Updated due to comments.

@vsavchenko I've compiled some performance stats using csa-testbench. The result are within the margin of error.

Project	Before	After	Delta
TMUX	0:01:13	0:01:13	0,00%
Redis	0:02:37	0:02:39	1,27%
Vim	0:03:13	0:03:14	0,52%
Xerces	0:04:14	0:04:17	1,18%
OpenSSL	0:04:25	0:04:26	0,38%
protobuf	0:05:52	0:05:55	0,85%
SQLite	0:07:15	0:07:34	4,37%
PostgreSQL	0:08:27	0:08:37	1,97%
LLVM	1:06:45	1:06:15	-0,75%
TOTAL	1:44:01	1:44:10	0,14%

Upd: replaced a bit ill-collected statistics with more relevant one.

In D78933#2087386, @ASDenysPetrov wrote:

@vsavchenko I've compiled some performance stats using csa-testbench. The result are within the margin of error.

Awesome! Thank you for your work!

This revision is now accepted and ready to land.Jun 11 2020, 5:48 AM

@xazax.hun,
I've made performance measurements you concerned about. Could you look at it and make decision on this patch?

I would not call the results of the measurement within the margin of error but the results do not look bad. Unless there is some objection from someone else I am ok with committing this but please change the title of revision before committing. When one says optimization we often think about performance. It should say something like reasoning about more comparison operations.

In D78933#2090076, @xazax.hun wrote:

I would not call the results of the measurement within the margin of error but the results do not look bad. Unless there is some objection from someone else I am ok with committing this but please change the title of revision before committing. When one says optimization we often think about performance. It should say something like reasoning about more comparison operations.

Thanks. I'll change the name.
I mentioned it as a "margin of error" because if you run tests on the same build twice you can easily get the similar difference between two tests in a row.

ASDenysPetrov retitled this revision from [analyzer] RangeConstraintManager optimizations in comparison expressions to [analyzer] Reasoning about comparison expressions in RangeConstraintManager.Jun 15 2020, 2:42 AM

Closed by commit rGe1741e34e00b: [analyzer] Reasoning about comparison expressions in RangeConstraintManager (authored by ASDenysPetrov). · Explain WhyJun 15 2020, 8:43 AM

This revision was automatically updated to reflect the committed changes.

Szelethus mentioned this in D86533: (Urgent!) [release][docs][analyzer] Add 11.0.0. release notes.Sep 7 2020, 6:46 AM

Revision Contents

Path

Size

clang/

lib/

StaticAnalyzer/

Core/

RangeConstraintManager.cpp

170 lines

test/

Analysis/

constraint_manager_conditions.cpp

213 lines

Diff 270767

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp

Show All 18 Lines
#include "clang/StaticAnalyzer/Core/PathSensitive/SValVisitor.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/SValVisitor.h"
#include "llvm/ADT/FoldingSet.h"		#include "llvm/ADT/FoldingSet.h"
#include "llvm/ADT/ImmutableSet.h"		#include "llvm/ADT/ImmutableSet.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

using namespace clang;		using namespace clang;
using namespace ento;		using namespace ento;

		// This class can be extended with other tables which will help to reason
		// about ranges more precisely.
		class OperatorRelationsTable {
		static_assert(BO_LT < BO_GT && BO_GT < BO_LE && BO_LE < BO_GE &&
		BO_GE < BO_EQ && BO_EQ < BO_NE,
		"This class relies on operators order. Rework it otherwise.");
		vsavchenkoUnsubmitted Not Done Reply Inline Actions +1 for this static assert! It's good to ensure such assumptions, even if those are very unlikely to change vsavchenko: +1 for this static assert! It's good to ensure such assumptions, even if those are very…

		public:
		enum TriStateKind {
		False = 0,
		True,
		Unknown,
		};

		private:
		// CmpOpTable holds states which represent the corresponding range for
		// branching an exploded graph. We can reason about the branch if there is
		// a previously known fact of the existence of a comparison expression with
		// operands used in the current expression.
		// E.g. assuming (x < y) is true that means (x != y) is surely true.
		// if (x previous_operation y) // < \| != \| >
		// if (x operation y) // != \| > \| <
		// tristate // True \| Unknown \| False
		//
		// CmpOpTable represents next:
		// __\|< \|> \|<=\|>=\|==\|!=\|UnknownX2\|
		// < \|1 \|0 \|* \|0 \|0 \|* \|1 \|
		// > \|0 \|1 \|0 \|* \|0 \|* \|1 \|
		// <=\|1 \|0 \|1 \|* \|1 \|* \|0 \|
		// >=\|0 \|1 \|* \|1 \|1 \|* \|0 \|
		// ==\|0 \|0 \|* \|* \|1 \|0 \|1 \|
		// !=\|1 \|1 \|* \|* \|0 \|1 \|0 \|
		//
		// Columns stands for a previous operator.
		// Rows stands for a current operator.
		// Each row has exactly two `Unknown` cases.
		// UnknownX2 means that both `Unknown` previous operators are met in code,
		// and there is a special column for that, for example:
		// if (x >= y)
		// if (x != y)
		// if (x <= y)
		// False only
		static constexpr size_t CmpOpCount = BO_NE - BO_LT + 1;
		const TriStateKind CmpOpTable[CmpOpCount][CmpOpCount + 1] = {
		// < > <= >= == != UnknownX2
		{True, False, Unknown, False, False, Unknown, True}, // <
		{False, True, False, Unknown, False, Unknown, True}, // >
		{True, False, True, Unknown, True, Unknown, False}, // <=
		{False, True, Unknown, True, True, Unknown, False}, // >=
		{False, False, Unknown, Unknown, True, False, True}, // ==
		{True, True, Unknown, Unknown, False, True, False}, // !=
		};

		static size_t getIndexFromOp(BinaryOperatorKind OP) {
		vsavchenkoUnsubmitted Not Done Reply Inline Actions I would prefer function names to comply with LLVM coding standards (start with a verb and a lowercase letter https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly vsavchenko: I would prefer function names to comply with LLVM coding standards (start with a verb and a…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions I'll add a get prefix. Saying about lower/uppercases, I'd say this is a mess inside this file. I've just decided to use an upper one. ASDenysPetrov: I'll add a get prefix. Saying about lower/uppercases, I'd say this is a mess inside this…
		vsavchenkoUnsubmitted Not Done Reply Inline Actions I agree that it is a mess, but if we want to reduce it with time, I'd say we should stick to the official style guide. vsavchenko: I agree that it is a mess, but if we want to reduce it with time, I'd say we should stick to…
		return static_cast<size_t>(OP - BO_LT);
		}

		public:
		constexpr size_t getCmpOpCount() const { return CmpOpCount; }

		static BinaryOperatorKind getOpFromIndex(size_t Index) {
		return static_cast<BinaryOperatorKind>(Index + BO_LT);
		}

		TriStateKind getCmpOpState(BinaryOperatorKind CurrentOP,
		xazax.hunUnsubmitted Not Done Reply Inline Actions I think this function should not work with indices. It should work with operators. So the caller of this function never need to invoke `IndexFromOp` or `OpFromIndex` making this function easier to use. xazax.hun: I think this function should not work with indices. It should work with operators. So the…
		BinaryOperatorKind QueriedOP) const {
		return CmpOpTable[getIndexFromOp(CurrentOP)][getIndexFromOp(QueriedOP)];
		}

		TriStateKind getCmpOpStateForUnknownX2(BinaryOperatorKind CurrentOP) const {
		return CmpOpTable[getIndexFromOp(CurrentOP)][CmpOpCount];
		}
		};
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// RangeSet implementation		// RangeSet implementation
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

void RangeSet::IntersectInRange(BasicValueFactory &BV, Factory &F,		void RangeSet::IntersectInRange(BasicValueFactory &BV, Factory &F,
const llvm::APSInt &Lower,		const llvm::APSInt &Lower,
const llvm::APSInt &Upper,		const llvm::APSInt &Upper,
PrimRangeSet &newRanges,		PrimRangeSet &newRanges,
▲ Show 20 Lines • Show All 350 Lines • ▼ Show 20 Lines	if (AssociatedRange && RangeAssociatedWithNegatedSym)
RangeAssociatedWithNegatedSym->Negate(ValueFactory, RangeFactory));		RangeAssociatedWithNegatedSym->Negate(ValueFactory, RangeFactory));

if (AssociatedRange)		if (AssociatedRange)
return *AssociatedRange;		return *AssociatedRange;

if (RangeAssociatedWithNegatedSym)		if (RangeAssociatedWithNegatedSym)
return RangeAssociatedWithNegatedSym->Negate(ValueFactory, RangeFactory);		return RangeAssociatedWithNegatedSym->Negate(ValueFactory, RangeFactory);

		// If Sym is a comparison expression (except <=>),
		// find any other comparisons with the same operands.
		// See function description.
		const RangeSet CmpRangeSet = getRangeForComparisonSymbol(State, Sym);
		if (!CmpRangeSet.isEmpty())
		return CmpRangeSet;

return Visit(Sym);		return Visit(Sym);
}		}

/// Infer range information solely from the type.		/// Infer range information solely from the type.
RangeSet infer(QualType T) {		RangeSet infer(QualType T) {
// Lazily generate a new RangeSet representing all possible values for the		// Lazily generate a new RangeSet representing all possible values for the
// given symbol type.		// given symbol type.
RangeSet Result(RangeFactory, ValueFactory.getMinValue(T),		RangeSet Result(RangeFactory, ValueFactory.getMinValue(T),
▲ Show 20 Lines • Show All 163 Lines • ▼ Show 20 Lines	if (const SymSymExpr *SSE = dyn_cast<SymSymExpr>(Sym)) {
T->isSignedIntegerOrEnumerationType())		T->isSignedIntegerOrEnumerationType())
return negV;		return negV;
}		}
}		}
}		}
return nullptr;		return nullptr;
}		}

		// Returns ranges only for binary comparison operators (except <=>)
		// when left and right operands are symbolic values.
		// Finds any other comparisons with the same operands.
		// Then do logical calculations and refuse impossible branches.
		// E.g. (x < y) and (x > y) at the same time are impossible.
		// E.g. (x >= y) and (x != y) at the same time makes (x > y) true only.
		// E.g. (x == y) and (y == x) are just reversed but the same.
		// It covers all possible combinations (see CmpOpTable description).
		// Note that `x` and `y` can also stand for subexpressions,
		// not only for actual symbols.
		RangeSet getRangeForComparisonSymbol(ProgramStateRef State, SymbolRef Sym) {
		const RangeSet EmptyRangeSet = RangeFactory.getEmptySet();

		auto SSE = dyn_cast<SymSymExpr>(Sym);
		vsavchenkoUnsubmitted Not Done Reply Inline Actions I believe that when we use auto we still try to be more verbose with it, so in this case it should be smth like `const auto SSE` vsavchenko:* I believe that when we use auto we still try to be more verbose with it, so in this case it…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions I'll do. ASDenysPetrov: I'll do.
		if (!SSE)
		return EmptyRangeSet;

		BinaryOperatorKind CurrentOP = SSE->getOpcode();

		// We currently do not support <=> (C++20).
		if (!BinaryOperator::isComparisonOp(CurrentOP) \|\| (CurrentOP == BO_Cmp))
		return EmptyRangeSet;

		static const OperatorRelationsTable CmpOpTable;

		const SymExpr *LHS = SSE->getLHS();
		const SymExpr *RHS = SSE->getRHS();
		QualType T = SSE->getType();

		SymbolManager &SymMgr = State->getSymbolManager();
		const llvm::APSInt &Zero = ValueFactory.getValue(0, T);
		const llvm::APSInt &One = ValueFactory.getValue(1, T);
		const RangeSet TrueRangeSet(RangeFactory, One, One);
		const RangeSet FalseRangeSet(RangeFactory, Zero, Zero);

		int UnknownStates = 0;

		// Loop goes through all of the columns exept the last one ('UnknownX2').
		vsavchenkoUnsubmitted Not Done Reply Inline Actions Sorry for nitpicking, but it seems like the grammar is a bit off in the the all columns except the last part vsavchenko: Sorry for nitpicking, but it seems like the grammar is a bit off in the **the all columns…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions Do you mean to change to all of the columns exept the last one ('UnknownX2')? ASDenysPetrov: Do you mean to change to all of the columns exept the last one ('UnknownX2')?
		vsavchenkoUnsubmitted Not Done Reply Inline Actions Yep, sounds good! vsavchenko: Yep, sounds good!
		// We treat `UnknownX2` column separately at the end of the loop body.
		vsavchenkoUnsubmitted Not Done Reply Inline Actions I think that treat is not the best option here. Maybe smith like process will do? vsavchenko: I think that treat is not the best option here. Maybe smith like process will do?
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions I met a lot of treat usage in terms of handle/process among the code, so just followed it. ASDenysPetrov: I met a lot of treat usage in terms of handle/process among the code, so just followed…
		for (size_t i = 0; i < CmpOpTable.getCmpOpCount(); ++i) {

		// Let's find an expression e.g. (x < y).
		BinaryOperatorKind QueriedOP = OperatorRelationsTable::getOpFromIndex(i);
		const SymSymExpr *SymSym = SymMgr.getSymSymExpr(LHS, QueriedOP, RHS, T);
		const RangeSet *QueriedRangeSet = State->get<ConstraintRange>(SymSym);
		vsavchenkoUnsubmitted Not Done Reply Inline Actions Maybe we can name this constant somehow differently to be more verbose? vsavchenko: Maybe we can name this constant somehow differently to be more verbose?
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions What do you think about FoundRangeSet? ASDenysPetrov: What do you think about FoundRangeSet?
		vsavchenkoUnsubmitted Not Done Reply Inline Actions I was thinking about adding more context to the name, something like `RelatedRangeSet` or similar. To show not only what it is (`RangeSet`) and how we got it (`found`), but also its purpose or meaning. vsavchenko: I was thinking about adding more context to the name, something like `RelatedRangeSet` or…

		// If ranges were not previously found,
		// try to find a reversed expression (y > x).
		if (!QueriedRangeSet) {
		const BinaryOperatorKind ROP =
		BinaryOperator::reverseComparisonOp(QueriedOP);
		SymSym = SymMgr.getSymSymExpr(RHS, ROP, LHS, T);
		QueriedRangeSet = State->get<ConstraintRange>(SymSym);
		}
		vsavchenkoUnsubmitted Not Done Reply Inline Actions Please, correct me if I'm wrong, but I thought that we are iterating over all comparison operators (excluding `<=>`), which means that if we don't find it on this iteration for let's say `x < y` then we'll find it (or already did) for `x > y`. So, my question is - why do we have this clause then? And it confuses me that `RangeSet` now corresponds to a reversed operator, while `QueriedOP` remains the same. Maybe a good commentary explaining why we need it could help! vsavchenko: Please, correct me if I'm wrong, but I thought that we are iterating over //all// comparison…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions No-no. Please, look carefully. At first we try to find `X op Y`, when op is a comparison operator. If we failed then try to find the reversed (flipped) version of the expression `Y reversed(op) X`. Examples: `x > y and y < x`, `x != y and y != x`, `x <= y and y >= x` We don't care about operand positions, we just want to find any of its previous relation, and which branch of it we are now in. ASDenysPetrov: No-no. Please, look carefully. At first we try to find `X op Y`, when op is a comparison…
		vsavchenkoUnsubmitted Not Done Reply Inline Actions Oh, I see now. I missed the part where `RHS` is on the left now :-) vsavchenko: Oh, I see now. I missed the part where `RHS` is on the left now :-)

		if (!QueriedRangeSet \|\| QueriedRangeSet->isEmpty())
		continue;

		const llvm::APSInt *ConcreteValue = QueriedRangeSet->getConcreteValue();
		const bool isInFalseBranch =
		ConcreteValue ? (*ConcreteValue == 0) : false;

		// If it is a false branch, we shall be guided by opposite operator,
		// because the table is made assuming we are in the true branch.
		// E.g. when (x <= y) is false, then (x > y) is true.
		if (isInFalseBranch)
		QueriedOP = BinaryOperator::negateComparisonOp(QueriedOP);

		OperatorRelationsTable::TriStateKind BranchState =
		CmpOpTable.getCmpOpState(CurrentOP, QueriedOP);

		if (BranchState == OperatorRelationsTable::Unknown) {
		if (++UnknownStates == 2)
		// If we met both Unknown states.
		// if (x <= y) // assume true
		// if (x != y) // assume true
		// if (x < y) // would be also true
		// Get a state from `UnknownX2` column.
		BranchState = CmpOpTable.getCmpOpStateForUnknownX2(CurrentOP);
		else
		continue;
		}

		return (BranchState == OperatorRelationsTable::True) ? TrueRangeSet
		: FalseRangeSet;
		}

		return EmptyRangeSet;
		}

BasicValueFactory &ValueFactory;		BasicValueFactory &ValueFactory;
RangeSet::Factory &RangeFactory;		RangeSet::Factory &RangeFactory;
ProgramStateRef State;		ProgramStateRef State;
};		};

template <>		template <>
RangeSet SymbolicRangeInferrer::VisitBinaryOperator<BO_Or>(Range LHS, Range RHS,		RangeSet SymbolicRangeInferrer::VisitBinaryOperator<BO_Or>(Range LHS, Range RHS,
QualType T) {		QualType T) {
▲ Show 20 Lines • Show All 382 Lines • ▼ Show 20 Lines

ProgramStateRef		ProgramStateRef
RangeConstraintManager::assumeSymEQ(ProgramStateRef St, SymbolRef Sym,		RangeConstraintManager::assumeSymEQ(ProgramStateRef St, SymbolRef Sym,
const llvm::APSInt &Int,		const llvm::APSInt &Int,
const llvm::APSInt &Adjustment) {		const llvm::APSInt &Adjustment) {
// Before we do any real work, see if the value can even show up.		// Before we do any real work, see if the value can even show up.
APSIntType AdjustmentType(Adjustment);		APSIntType AdjustmentType(Adjustment);
if (AdjustmentType.testInRange(Int, true) != APSIntType::RTR_Within)		if (AdjustmentType.testInRange(Int, true) != APSIntType::RTR_Within)
return nullptr;		return nullptr;
		xazax.hunUnsubmitted Not Done Reply Inline Actions I think it would be better to use slightly different terminology. Instead of preceding operator I would say something like "previously known fact". xazax.hun: I think it would be better to use slightly different terminology. Instead of preceding operator…

		xazax.hunUnsubmitted Not Done Reply Inline Actions I have really hard time processing how to interpret `AnyX2`. For example in the code below: if (x >= y) if (x != y) if (x <= y) return false if (x >= y) if (x == y) if (x <= y) return true We would get different results for `<=`. So I do not really get how I should read the `AnyX2` column. xazax.hun: I have really hard time processing how to interpret `AnyX2`. For example in the code below…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions I'm sorry for the obscure explanation. I was really thinking about how to explain it more clear. Let's consider this code: if (x >= y) if (x != y) if (x <= y) If 'x >= y' is true, then following `x <= y` is `Any` ( can be true or false ). See table in the summary. If 'x != y' is true, then following `x <= y` is `Any` ( can be true or false ). If 'x >= y && x != y' is true, then following `x <= y` is `False` only (it means that we previously met two Any states, what I've called as AnyX2). ASDenysPetrov: I'm sorry for the obscure explanation. I was really thinking about how to explain it more clear.
		xazax.hunUnsubmitted Not Done Reply Inline Actions Could you tell me how the reasoning would be different for if (x >= y) if (x == y) if (x <= y) ? xazax.hun: Could you tell me how the reasoning would be different for ``` if (x >= y) if (x == y)…
		xazax.hunUnsubmitted Not Done Reply Inline Actions Oh never mind, after `x == y` the `x <= y` would be `true`. But still, I find this logic hard to explain, understand, and prove correct. I'd like to see at least some informal reasoning why will it always work. Also, if someone would like to extend this table in the future (e.g. with operator <=>), how to make it in such a way that does not break this logic. xazax.hun: Oh never mind, after `x == y` the `x <= y` would be `true`. But still, I find this logic hard…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions When I was creating a table it was just 6x6 without AnyX2 column. It should tell of what RangeSet we shall return (true, false or empty). It covers cases with only one preceding condition. Then I found that two certain sequential conditions can also tell us what happen to the next condition. And those two conditions happened to be described as `Any` in the table. Thus I deduced one more column for this purpose. In other words: if (x >= y) // 3. Aha, we found `x >= y`, which is also `Any` for `x <= y` in the table if (x != y) // 2. Aha, we found `x == y`, which is `Any` for `x <= y` in the table if (x <= y) // 1. Reasoning a RangeSet about this `x <= y` // 4. OK, we found two `Any` states. Let's look at the value in `AnyX2` column for `x <= y`. // 5. The value is `False`, so return `FalseRangeSet`, assuming CSA builds false branch only. You can substitute any other row from the table and corresponding Any-comparisons. Honestly, I don’t know how else to explain this. :) I find this logic hard to explain, understand, and prove correct. If you come up with it, It becomes really easy and clear. You'll see. But I agree that AnyX2 can be changed to something more obvious. ASDenysPetrov: When I was creating a table it was just 6x6 without AnyX2 column. It should tell of what…
		xazax.hunUnsubmitted Not Done Reply Inline Actions Ok, in the meantime I revisited the algorithm and I think now I understand why does it work. I think the main observation I missed was the following: Every row has exactly two `Any` entries, so we only need to think about one combination per row. I think originally I interpreted your matrix wrong (the transposed one) and it did not really add up. So I think you should document the fact that each row has exactly two unknown cases. But for each of the rows, if we have both facts, we can actually evaluate the current operator. The goal is that if someone wants to extend this code later on (e.g. if the standard committee introduces a new operator that we handle), the next person should be able to modify this algorithm without breaking the existing code. So as you can see, adding a new column/row could break this invariant of having exactly 2 unknowns in each row rendering this code incorrect. This is why it is important, to explain what makes the algorithm work. I might miss something, but I do not really see test cases that are covering these. I think some renaming might also help others understanding what is going on: Capitalize variable names, for example, `tristate`. Chose a bit more descriptive names. `tristate` from the previous example describes the type of the variable. But variable names should describe their contents rather than their type. It can be `Result` in case it describes the result of the operation. I think in 3-valued logic we tend to use `Unknown` rather than `Any`. You have code like `auto tristate = CmpOpTable[IndexOP][Index];`. This is not really readable as it is really hard to tell what index corresponds to what. One of the indexes correspond to the evaluated operator while the other corresponds to the query. The variable names should reflect that. xazax.hun: Ok, in the meantime I revisited the algorithm and I think now I understand why does it work. I…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions Thank you. I'll take into account all your notes. I'll update the patch soon. ASDenysPetrov: Thank you. I'll take into account all your notes. I'll update the patch soon.
// [Int-Adjustment, Int-Adjustment]		// [Int-Adjustment, Int-Adjustment]
llvm::APSInt AdjInt = AdjustmentType.convert(Int) - Adjustment;		llvm::APSInt AdjInt = AdjustmentType.convert(Int) - Adjustment;
RangeSet New = getRange(St, Sym).Intersect(getBasicVals(), F, AdjInt, AdjInt);		RangeSet New = getRange(St, Sym).Intersect(getBasicVals(), F, AdjInt, AdjInt);
return New.isEmpty() ? nullptr : St->set<ConstraintRange>(Sym, New);		return New.isEmpty() ? nullptr : St->set<ConstraintRange>(Sym, New);
}		}

RangeSet RangeConstraintManager::getSymLTRange(ProgramStateRef St,		RangeSet RangeConstraintManager::getSymLTRange(ProgramStateRef St,
SymbolRef Sym,		SymbolRef Sym,
const llvm::APSInt &Int,		const llvm::APSInt &Int,
		xazax.hunUnsubmitted Not Done Reply Inline Actions I think this table together with the operations that transform between indices and operators could be factored out into a separate class. That would make this method slightly cleaner. xazax.hun: I think this table together with the operations that transform between indices and operators…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions I'll try to come up how to turn it into the class. ASDenysPetrov: I'll try to come up how to turn it into the class.
const llvm::APSInt &Adjustment) {		const llvm::APSInt &Adjustment) {
// Before we do any real work, see if the value can even show up.		// Before we do any real work, see if the value can even show up.
APSIntType AdjustmentType(Adjustment);		APSIntType AdjustmentType(Adjustment);
switch (AdjustmentType.testInRange(Int, true)) {		switch (AdjustmentType.testInRange(Int, true)) {
case APSIntType::RTR_Below:		case APSIntType::RTR_Below:
return F.getEmptySet();		return F.getEmptySet();
case APSIntType::RTR_Within:		case APSIntType::RTR_Within:
break;		break;
case APSIntType::RTR_Above:		case APSIntType::RTR_Above:
return getRange(St, Sym);		return getRange(St, Sym);
}		}

// Special case for Int == Min. This is always false.		// Special case for Int == Min. This is always false.
llvm::APSInt ComparisonVal = AdjustmentType.convert(Int);		llvm::APSInt ComparisonVal = AdjustmentType.convert(Int);
llvm::APSInt Min = AdjustmentType.getMinValue();		llvm::APSInt Min = AdjustmentType.getMinValue();
if (ComparisonVal == Min)		if (ComparisonVal == Min)
		xazax.hunUnsubmitted Not Done Reply Inline Actions In LLVM we usually only use `auto` for iterators and to avoid repeating the type twice. Auto is ok above with the `dynamic_cast`, but here and some other places the type should be spelled out. xazax.hun: In LLVM we usually only use `auto` for iterators and to avoid repeating the type twice. Auto is…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions Thanks for the guidance, I'll check the code for implicit types. ASDenysPetrov: Thanks for the guidance, I'll check the code for implicit types.
return F.getEmptySet();		return F.getEmptySet();

llvm::APSInt Lower = Min - Adjustment;		llvm::APSInt Lower = Min - Adjustment;
llvm::APSInt Upper = ComparisonVal - Adjustment;		llvm::APSInt Upper = ComparisonVal - Adjustment;
--Upper;		--Upper;

return getRange(St, Sym).Intersect(getBasicVals(), F, Lower, Upper);		return getRange(St, Sym).Intersect(getBasicVals(), F, Lower, Upper);
}		}

		xazax.hunUnsubmitted Not Done Reply Inline Actions Could `LHS` and `RHS` be other expressions? Does it make sense to continue executing this function if one of them is not a simple symbol? xazax.hun: Could `LHS` and `RHS` be other expressions? Does it make sense to continue executing this…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions I think it makes sense, because `(x+3y) > z` and `z < (x+3y)` perfectly fits in the logic. ASDenysPetrov: I think it makes sense, because `(x+3y) > z` and `z < (x+3y)` perfectly fits in the logic.
		xazax.hunUnsubmitted Not Done Reply Inline Actions Makes sense, thanks. Maybe it would be worth to rephrase the comment to note that `x` and `y` can also stand for subexpressions, not only for actual symbols. xazax.hun: Makes sense, thanks. Maybe it would be worth to rephrase the comment to note that `x` and `y`…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions Yes! Great! ASDenysPetrov: Yes! Great!
ProgramStateRef		ProgramStateRef
RangeConstraintManager::assumeSymLT(ProgramStateRef St, SymbolRef Sym,		RangeConstraintManager::assumeSymLT(ProgramStateRef St, SymbolRef Sym,
const llvm::APSInt &Int,		const llvm::APSInt &Int,
const llvm::APSInt &Adjustment) {		const llvm::APSInt &Adjustment) {
RangeSet New = getSymLTRange(St, Sym, Int, Adjustment);		RangeSet New = getSymLTRange(St, Sym, Int, Adjustment);
return New.isEmpty() ? nullptr : St->set<ConstraintRange>(Sym, New);		return New.isEmpty() ? nullptr : St->set<ConstraintRange>(Sym, New);
}		}

		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions Folk, is this a good idea to explicitly create bool ranges as a return value? As for me, comparisons like `>`, `<`, etc. can only produce bool-based ranges, otherwise it would be weird. ASDenysPetrov: Folk, is this a good idea to explicitly create bool ranges as a return value? As for me…
		xazax.hunUnsubmitted Not Done Reply Inline Actions I think modeling booleans with ranges is OK. This is what the analyzer is doing at the moment. But I guess the question is about whether boolean results of comparisons is a good way to store the relationships between the symbols. I am not convinced about that see my inline comment below. xazax.hun: I think modeling booleans with ranges is OK. This is what the analyzer is doing at the moment.
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions Once I met a case when logic operator `<` retruns me int-based ranges {0,0} on the false branch and {1, 429496725} (instead of bool-based {1,1}) on the true branch. It confused me, so the question arised. ASDenysPetrov: Once I met a case when logic operator `<` retruns me int-based ranges {0,0} on the //false//…
		xazax.hunUnsubmitted Not Done Reply Inline Actions I see. I do not see any problem with that immediately, but @NoQ is much more well-versed to answer this question. xazax.hun: I see. I do not see any problem with that immediately, but @NoQ is much more well-versed to…
RangeSet RangeConstraintManager::getSymGTRange(ProgramStateRef St,		RangeSet RangeConstraintManager::getSymGTRange(ProgramStateRef St,
SymbolRef Sym,		SymbolRef Sym,
const llvm::APSInt &Int,		const llvm::APSInt &Int,
const llvm::APSInt &Adjustment) {		const llvm::APSInt &Adjustment) {
// Before we do any real work, see if the value can even show up.		// Before we do any real work, see if the value can even show up.
		xazax.hunUnsubmitted Not Done Reply Inline Actions I am not really familiar with how constraints are represented but I vaguely recall the analyzer sometimes normalizing some expressions like conversion `A == B` to `A - B == 0`. I am just wondering if this API to look this expression up is not the right abstraction as it might be better to handle such normalizations in a unified, central way. Also, note that this method does not handle transitivity. I wonder if maintaining set of ranges is the right representation for this information at all. The ordering between the symbols will define a lattice. Representing that lattice directly instead of using ranges might be more efficient. xazax.hun: I am not really familiar with how constraints are represented but I vaguely recall the analyzer…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions conversion A == B to A - B == 0. All my samples `x == y` worked as `x == y`. I havn't seen such transformations. Representing that lattice directly instead of using ranges might be more efficient. How do you imagine that lattice? Could you detail it? ASDenysPetrov: >conversion A == B to A - B == 0. All my samples `x == y` worked as `x == y`. I havn't seen…
		xazax.hunUnsubmitted Not Done Reply Inline Actions A lattice is basically a DAG, where the nodes are the values (symbols, or subexpressions), and the edges are ordering. So it defines an ordering. E.g., A is smaller than B, if and only if there is a path from A to B. I am not sure whether this is the right representation, but it would take care of transitivity. xazax.hun: A lattice is basically a DAG, where the nodes are the values (symbols, or subexpressions), and…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions I will inspect this idea. Thanks. ASDenysPetrov: I will inspect this idea. Thanks.
APSIntType AdjustmentType(Adjustment);		APSIntType AdjustmentType(Adjustment);
switch (AdjustmentType.testInRange(Int, true)) {		switch (AdjustmentType.testInRange(Int, true)) {
case APSIntType::RTR_Below:		case APSIntType::RTR_Below:
return getRange(St, Sym);		return getRange(St, Sym);
case APSIntType::RTR_Within:		case APSIntType::RTR_Within:
break;		break;
case APSIntType::RTR_Above:		case APSIntType::RTR_Above:
return F.getEmptySet();		return F.getEmptySet();
Show All 16 Lines
RangeConstraintManager::assumeSymGT(ProgramStateRef St, SymbolRef Sym,		RangeConstraintManager::assumeSymGT(ProgramStateRef St, SymbolRef Sym,
const llvm::APSInt &Int,		const llvm::APSInt &Int,
const llvm::APSInt &Adjustment) {		const llvm::APSInt &Adjustment) {
RangeSet New = getSymGTRange(St, Sym, Int, Adjustment);		RangeSet New = getSymGTRange(St, Sym, Int, Adjustment);
return New.isEmpty() ? nullptr : St->set<ConstraintRange>(Sym, New);		return New.isEmpty() ? nullptr : St->set<ConstraintRange>(Sym, New);
}		}

RangeSet RangeConstraintManager::getSymGERange(ProgramStateRef St,		RangeSet RangeConstraintManager::getSymGERange(ProgramStateRef St,
SymbolRef Sym,		SymbolRef Sym,
		xazax.hunUnsubmitted Not Done Reply Inline Actions I think `IndexPrevOP` is confusing as the "previous" operation is not necessarily the previous operation we evaluated. In fact `IndexPrevOP` could have been evaluated any time in the past. We query the known facts that are known for the current state, I think this is what the names should reflect. I suggest to use `CurrentOP` and `QueriedOP`. xazax.hun: I think `IndexPrevOP` is confusing as the "previous" operation is not necessarily the previous…
		ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions No problem. I'll rename them. ASDenysPetrov: No problem. I'll rename them.
const llvm::APSInt &Int,		const llvm::APSInt &Int,
const llvm::APSInt &Adjustment) {		const llvm::APSInt &Adjustment) {
// Before we do any real work, see if the value can even show up.		// Before we do any real work, see if the value can even show up.
APSIntType AdjustmentType(Adjustment);		APSIntType AdjustmentType(Adjustment);
switch (AdjustmentType.testInRange(Int, true)) {		switch (AdjustmentType.testInRange(Int, true)) {
case APSIntType::RTR_Below:		case APSIntType::RTR_Below:
return getRange(St, Sym);		return getRange(St, Sym);
case APSIntType::RTR_Within:		case APSIntType::RTR_Within:
▲ Show 20 Lines • Show All 120 Lines • Show Last 20 Lines

clang/test/Analysis/constraint_manager_conditions.cpp

This file was added.

				// RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection -verify %s

				void clang_analyzer_eval(int);

				void comparison_lt(int x, int y) {
				if (x < y) {
				clang_analyzer_eval(x < y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y > x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x > y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y < x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x <= y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y >= x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x >= y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y <= x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x == y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y == x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x != y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y != x); // expected-warning{{TRUE}}
				} else {
				clang_analyzer_eval(x < y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y > x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x > y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y < x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x <= y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y >= x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x >= y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y <= x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x == y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y == x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x != y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y != x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				}
				}

				void comparison_gt(int x, int y) {
				if (x > y) {
				clang_analyzer_eval(x < y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y > x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x > y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y < x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x <= y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y >= x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x >= y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y <= x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x == y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y == x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x != y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y != x); // expected-warning{{TRUE}}
				} else {
				clang_analyzer_eval(x < y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y > x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x > y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y < x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x <= y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y >= x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x >= y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y <= x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x == y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y == x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x != y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y != x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				}
				}

				void comparison_le(int x, int y) {
				if (x <= y) {
				clang_analyzer_eval(x < y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y > x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x > y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y < x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x <= y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y >= x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x >= y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y <= x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x == y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y == x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x != y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y != x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				} else {
				clang_analyzer_eval(x < y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y > x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x > y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y < x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x <= y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y >= x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x >= y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y <= x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x == y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y == x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x != y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y != x); // expected-warning{{TRUE}}
				}
				}

				void comparison_ge(int x, int y) {
				if (x >= y) {
				clang_analyzer_eval(x < y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y > x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x > y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y < x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x <= y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y >= x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x >= y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y <= x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x == y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y == x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x != y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y != x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				} else {
				clang_analyzer_eval(x < y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y > x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x > y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y < x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x <= y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y >= x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x >= y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y <= x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x == y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y == x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x != y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y != x); // expected-warning{{TRUE}}
				}
				}

				void comparison_eq(int x, int y) {
				if (x == y) {
				clang_analyzer_eval(x < y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y > x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x > y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y < x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x <= y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y >= x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x >= y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y <= x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x == y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y == x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x != y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y != x); // expected-warning{{FALSE}}
				} else {
				clang_analyzer_eval(x < y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y > x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x > y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y < x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x <= y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y >= x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x >= y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y <= x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x == y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y == x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x != y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y != x); // expected-warning{{TRUE}}
				}
				}

				void comparison_ne(int x, int y) {
				if (x != y) {
				clang_analyzer_eval(x < y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y > x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x > y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y < x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x <= y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y >= x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x >= y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(y <= x); // expected-warning{{TRUE}} expected-warning{{FALSE}}
				clang_analyzer_eval(x == y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y == x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x != y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y != x); // expected-warning{{TRUE}}
				} else {
				clang_analyzer_eval(x < y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y > x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x > y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y < x); // expected-warning{{FALSE}}
				clang_analyzer_eval(x <= y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y >= x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x >= y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y <= x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x == y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y == x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x != y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y != x); // expected-warning{{FALSE}}
				}
				}

				void comparison_le_ne(int x, int y) {
				if (x <= y)
				if (x != y) {
				clang_analyzer_eval(x < y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y > x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x >= y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y <= x); // expected-warning{{FALSE}}
				}
				}

				void comparison_ge_ne(int x, int y) {
				if (x >= y)
				if (x != y) {
				clang_analyzer_eval(x > y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y < x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x <= y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y >= x); // expected-warning{{FALSE}}
				}
				}

				void comparison_le_ge(int x, int y) {
				if (x <= y)
				if (x >= y) {
				clang_analyzer_eval(x == y); // expected-warning{{TRUE}}
				clang_analyzer_eval(y == x); // expected-warning{{TRUE}}
				clang_analyzer_eval(x != y); // expected-warning{{FALSE}}
				clang_analyzer_eval(y != x); // expected-warning{{FALSE}}
				}
				}
				xazax.hunUnsubmitted Not Done Reply Inline Actions Nit: please add new line at the end of the file. xazax.hun: Nit: please add new line at the end of the file.
				ASDenysPetrovAuthorUnsubmitted Done Reply Inline Actions I've added a new line and checked twice before creating a patch. But it didn't appear. ASDenysPetrov: I've added a new line and checked twice before creating a patch. But it didn't appear.
				xazax.hunUnsubmitted Done Reply Inline Actions This looks good now. If it misses the new line there is a message saying that explicitly. xazax.hun: This looks good now. If it misses the new line there is a message saying that explicitly.

This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Reasoning about comparison expressions in RangeConstraintManagerClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 270767

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp

clang/test/Analysis/constraint_manager_conditions.cpp

[analyzer] Reasoning about comparison expressions in RangeConstraintManager
ClosedPublic