This is an archive of the discontinued LLVM Phabricator instance.


Differential D103096

[analyzer] Implement cast for ranges of symbolic integers
Needs Review · Public

Authored by ASDenysPetrov on May 25 2021, 9:19 AM.


Details

Reviewers

NoQ
vsavchenko
steakhal
martong
dcoughlin
baloghadamsoftware

Summary

Support integral casts for ranges of symbolic integers. Previously, we only supported integral casts for concrete integers.
Reason about the ranges of SymbolCast expressions. Apply truncations, promotions and conversions to get a correct range set using nested types of a SymbolCast.

Fixes: https://github.com/llvm/llvm-project/issues/50380

The solution

Create a map with a bitwidth as the key and a range set as the value. Call it CastMap.
CastMap = Map<uint32_t, RangeSet>

NOTE: LLVM-IR has the ability to represent integers with a bitwidth from 1 all the way to 16'777'215. See _ExtInt Feature for details.

NOTE: The particular signedness of a RangeSet stored in CastMap does not matter, but all RangeSets stored in the map shall have the same signedness.

Create a map with a symbol as the key and a CastMap as the value. Call it SymCastMap.
SymCastMap = Map<SymbolRef, CastMap>

Store and update SymCastMap for every SymbolCast and every SymExpr which represents an integer.
Use a root symbol of SymbolCast as a key of the map. E.g. for (int16)(uint8)(int32 x) root symbol is (int32 x).
For SymExpr use the symbol itself as a key of the map.

Getting a constraint

Get a key symbol from SymbolCast/SymExpr.
Get a CastMap of constraints from SymCastMap using a key symbol.
Find the smallest type of the given cast symbolic expression.
Find the RangeSet in the CastMap for a bitwidth equal to that of the smallest type or, failing that, for the first bigger bitwidth.
If no RangeSet was found, create a new full RangeSet for the smallest type.
Sequentially cast the RangeSet across the chain of types, starting from the innermost one.

Pseudocode

GivenSymbol = (int16)(uint8)(int32 x)

RootSymbol = GetRoot(GivenSymbol) // (int32 x)
CastMap = GetCastMap(RootSymbol) // CastMap for (int32 x)
MinType = FindMinType(GivenSymbol) // uint8
MinBitwidth = BitwidthOf(MinType) // 8

RangeSet = FindRange(CastMap , MinBitwidth)  // range for bitwidth of 8
if(!RangeSet)
  RangeSet = FindNextRange(CastMap, MinBitwidth) // range for bitwidth of 9+
if(!RangeSet)
  RangeSet = CreateRangeForType(MinType) // full range for uint8

CastChain = GetCastChain(GivenSymbol) // int32 -> uint8 -> int16
ResultRangeSet = RangeThroughCastChain(RangeSet, CastChain)

return ResultRangeSet
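
The lookup steps above can be sketched in C++ roughly as follows. This is a minimal illustration under simplifying assumptions, not the code of the patch: `IntType`, `Range`, `fullRange`, `castTo` and `getConstraint` are hypothetical names, a range set is reduced to a single closed interval, and `castTo` is deliberately conservative (it keeps a range that fits the target type and otherwise widens to the full range of that type) instead of modelling wraparound.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration only; not the analyzer's classes.
struct IntType { uint32_t Bits; bool Signed; };
struct Range { int64_t Lo, Hi; };           // one closed interval [Lo, Hi]
using CastMap = std::map<uint32_t, Range>;  // bitwidth -> range

// Full range of a type. Bits is assumed to be between 1 and 32.
Range fullRange(IntType T) {
  if (T.Signed)
    return {-(int64_t{1} << (T.Bits - 1)), (int64_t{1} << (T.Bits - 1)) - 1};
  return {0, (int64_t{1} << T.Bits) - 1};
}

// Conservative cast: keep the range if it fits the target type, otherwise
// give up and return the full range of the target type.
Range castTo(Range R, IntType T) {
  Range F = fullRange(T);
  return (R.Lo >= F.Lo && R.Hi <= F.Hi) ? R : F;
}

// Chain lists the types from the innermost (root) type to the outermost
// cast and is assumed to be non-empty.
Range getConstraint(const CastMap &M, const std::vector<IntType> &Chain) {
  // Find the smallest type in the chain.
  IntType Min = Chain.front();
  for (IntType T : Chain)
    if (T.Bits < Min.Bits)
      Min = T;
  // Entry for an equal bitwidth or, failing that, the first bigger one.
  auto It = M.lower_bound(Min.Bits);
  Range R = It != M.end() ? It->second : fullRange(Min);
  // Cast the range through the chain, innermost type first.
  for (IntType T : Chain)
    R = castTo(R, T);
  return R;
}
```

For the chain int32 -> uint8 -> int16, a map holding an 8-bit entry [3, 5] yields [3, 5], while an empty map falls back to the full uint8 range.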

Setting a constraint

Get a key symbol from SymbolCast/SymExpr.
Get a map of constraints from SymCastMap using a key symbol.
Find the smallest type of the given cast symbolic expression.
Find and update all RangeSets in the CastMap for bitwidths which are equal to or lower than the bitwidth of the smallest type.
If there is no constraint for the bitwidth of the smallest type in the map, add a new entry with the given RangeSet.

Pseudocode

GivenRangeSet = [N, M]
GivenSymbol = (int16)(uint8)(int32 x)

RootSymbol = GetRoot(GivenSymbol) // (int32 x)
CastMap = GetCastMap(RootSymbol) // CastMap for (int32 x)
MinType = FindMinType(GivenSymbol) // uint8
MinBitwidth = BitwidthOf(MinType) // 8

Bitwidth = MinBitwidth
while (Bitwidth > 0) // update all constraints whose bitwidth is equal to or lower than the minimal one
  RangeSet = FindRange(CastMap, Bitwidth)  // range for bitwidth of 8 and lower
  if (RangeSet) // only bitwidths that already have an entry are updated
    UpdateRange(CastMap, Bitwidth, RangeSet ∩ GivenRangeSet) // intersect ranges and store the result back to the map
  Bitwidth--

if(!RangeExistsInMap(CastMap, MinBitwidth))
  AddRange(CastMap, MinBitwidth, GivenRangeSet)  // store the given range to the map

See tests in symbol-integral-cast.cpp for examples.
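
As a rough C++ rendering of the update steps above, consider the sketch below. It is only an illustration: `Range`, `CastMap` and `setConstraint` are hypothetical names, a range set is reduced to a single closed interval, and the given range is assumed to be already representable in every smaller bitwidth, so no truncation is modelled here.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for illustration only; not the analyzer's classes.
struct Range { int64_t Lo, Hi; };           // one closed interval [Lo, Hi]
using CastMap = std::map<uint32_t, Range>;  // bitwidth -> range

void setConstraint(CastMap &M, uint32_t MinBits, Range Given) {
  // Narrow every stored range whose bitwidth is equal to or lower than
  // MinBits. std::map iterates in ascending key order, so we can stop at
  // the first bigger bitwidth.
  for (auto &[Bits, R] : M) {
    if (Bits > MinBits)
      break;
    // Intersection of two closed intervals; Lo > Hi would mean "empty".
    R = {std::max(R.Lo, Given.Lo), std::min(R.Hi, Given.Hi)};
  }
  // Add an entry for MinBits only if the map had none.
  M.try_emplace(MinBits, Given);
}
```

With an 8-bit entry [-10, 10] and a 32-bit entry already in the map, setting [8, 8] for 16 bits narrows the 8-bit entry to [8, 8], adds a 16-bit entry, and leaves the 32-bit entry untouched.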


Event Timeline


Herald added subscribers: manas, dkrupp, donat.nagy and 6 others. · May 25 2021, 9:19 AM

ASDenysPetrov requested review of this revision. · May 25 2021, 9:19 AM

Herald added a subscriber: cfe-commits. · May 25 2021, 9:19 AM

ASDenysPetrov added a parent revision: D103094: [analyzer] Implemented RangeSet::Factory::castTo function to perform promotions, truncations and conversions. · May 25 2021, 9:19 AM

Harbormaster completed remote builds in B106111: Diff 347702. · May 25 2021, 9:31 AM

ASDenysPetrov mentioned this in D99797: [analyzer] Implemented RangeSet::Factory::unite function to handle intersections and adjacency. · May 26 2021, 5:55 AM

ASDenysPetrov mentioned this in D97388: [analyzer] Replace StoreManager::evalIntegralCast with SValBuilder::evalCast. · May 27 2021, 8:45 AM

It sounds like you indeed solved a lot of problems that prevented us from enabling SymbolCast. But this still requires massive testing, a lot more than a typical constraint solver patch; extraordinary claims require extraordinary evidence. If it works out though, it might be the best thing to happen to the static analyzer in years.

With SymbolCasts in place our equations become much more complicated and therefore the constraint solver becomes much more likely to produce false positives in cases where it previously erred on the side of false negatives.

Another thing to test is our ability to explain bug paths. People are often careless about integral types and it may lead to bugs which your patch helps uncover. But it is worthless to uncover these bugs if the user can't understand them. I'm thinking of scenarios like this:

01  void foo(long x) {
02    if (x == 0)
03      return;
04
05    bar(x, nullptr);
06  }
07
08  void bar(int y, int *p) {
09    if (y == 0)
10      *p = 1; // warning: null dereference
11  }

The user will discard this as a false positive because "I checked for zero in foo(), it obviously can't be zero in bar() in this context". There needs to be a note that explains the implicit truncation of an interesting* symbol on line 5. Maybe even mention the truncation on line 9 as well (not sure how to word that).
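
To make the truncation concrete, here is a small sketch of why the check in foo() does not protect bar(). It assumes a target where long is 64 bits and int is 32 bits; `truncateToInt32` and `reachesNullDeref` are hypothetical helpers that only model the control flow of the example and use fixed-width types so the result does not depend on the platform.

```cpp
#include <cstdint>

// Hypothetical helper modelling the implicit conversion at the call on line 5.
int32_t truncateToInt32(int64_t X) {
  // The conversion to unsigned is modular. The final step to int32_t is
  // modular since C++20 and, in practice, on two's complement targets before.
  return static_cast<int32_t>(static_cast<uint32_t>(X));
}

// Mirrors foo() and bar(): returns true when the path reaches the
// null dereference on line 10.
bool reachesNullDeref(int64_t X) {
  if (X == 0)
    return false;                  // foo() returns early on line 3
  return truncateToInt32(X) == 0;  // bar() sees y == 0 on line 9
}
```

Any nonzero value whose low 32 bits are all zero, such as 1 << 32, passes the check in foo() and still arrives in bar() as zero.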

I also wonder if a lot of such reports will also be false positives simply because the presumption of potential overflow is baseless. On paper it looks like "if the user didn't want to pass large values, they'd just use int instead of long". But in practice there may be other reasons to use a larger integer type, such as API requirements (e.g., how isascii() accepts an int but only uses 256 values). There's also the usual problem of overflow being impossible specifically on the current path; in this case we have to make sure that an appropriate assert() would actually suppress the warning (i.e., the constraint solver would be able to correctly solve the assert condition as well).

__
*In this case it's interesting as a control flow dependency of the bug location; it sounds like without @Szelethus's control flow dependency tracking this advancement would have been virtually impossible.

@NoQ
This solution only intends to make correct calculations whenever a cast occurs. We can mark this as alpha or add an argument flag to turn cast reasoning on/off, or we can even disable any part of this patch with argument settings.

But this still requires massive testing, a lot more than a typical constraint solver patch; extraordinary claims require extraordinary evidence.

What kind of tests do you think we need?

If it works out though, it might be the best thing to happen to the static analyzer in years.

Thank you. I appreciate your positive evaluation.

With SymbolCasts in place our equations become much more complicated and therefore the constraint solver becomes much more likely to produce false positives in cases where it previously erred on the side of false negatives.

Another thing to test is our ability to explain bug paths.

My proposition is to design and describe a paper of:

what cases shall be considered as erroneous and be reported;
what cases shall be ignored or considered as exceptions (i.e. static_cast);
what wordings shall we use in reports;
how paths of those reports shall look like;
your options;

But in practice there may be other reasons to use a larger integer type, such as API requirements (e.g., how isascii() accepts an int but only uses 256 values).

IMO it is great when we tell the user that they should make sure of the value bounds before passing the arguments to such APIs.

There's also the usual problem of overflow being impossible specifically on the current path; in this case we have to make sure that an appropriate assert() would actually suppress the warning (i.e., the constraint solver would be able to correctly solve the assert condition as well).

For example, this test easily passes.

void test(int x) {
  assert(0 < x && x < 42);
  char c = x;
  clang_analyzer_eval(c <= 0); // expected-warning {{FALSE}}
  clang_analyzer_eval(c >= 42); // expected-warning {{FALSE}}
}

Or did you mean some other cases?

__
*In this case it's interesting as a control flow dependency of the bug location; it sounds like without @Szelethus's control flow dependency tracking this advancement would have been virtually impossible.

Can this tracking mechanism be adjusted then?

In the end, should we go further with this patch and make more adjustments in CSA or reject it in view of your concerns?

ASDenysPetrov mentioned this in D103317: [Analyzer][Core] Make SValBuilder to better simplify svals with 3 symbols in the tree. · Jun 2 2021, 3:47 AM

In D103096#2789439, @ASDenysPetrov wrote:

@NoQ
This solution only intends to make correct calculations whenever a cast occurs. We can mark this as alpha or add an argument flag to turn cast reasoning on/off, or we can even disable any part of this patch with argument settings.

That would be awesome. I think we should land this patch under an -analyzer-config flag. This will allow us to experiment with viability of enabling cast symbols at any time as we improve the constraint solver. Additionally, presence of cast symbols is extremely valuable for z3 runs in which our concerns for increased complexity are eliminated.

But this still requires massive testing, a lot more than a typical constraint solver patch; extraordinary claims require extraordinary evidence.

What kind of tests do you think we need?

Test on a lot of real-world code. Like, seriously, *A LOT* of real-world code. Say, 20x the amount of code we have in docker and csa-testbench, from a large variety of projects. Investigate newly appeared reports carefully to understand the impact. I'll be happy to help with this at some point.

With SymbolCasts in place our equations become much more complicated and therefore the constraint solver becomes much more likely to produce false positives in cases where it previously erred on the side of false negatives.

Another thing to test is our ability to explain bug paths.

My proposition is to design and describe a paper of:

what cases shall be considered as erroneous and be reported;

what cases shall be ignored or considered as exceptions (i.e. static_cast);

what wordings shall we use in reports;

how paths of those reports shall look like;

your options;

I think we should start from examples. While testing on real-world code, find poorly explained reports and see what piece of information is missing and preventing the user from understanding the warning. Then add that piece of information.

But in practice there may be other reasons to use a larger integer type, such as API requirements (e.g., how isascii() accepts an int but only uses 256 values).

IMO it is great when we tell the user that they should make sure of the value bounds before passing the arguments to such APIs.

I'm thinking about warnings inside the implementations of such APIs.

There's also the usual problem of overflow being impossible specifically on the current path; in this case we have to make sure that an appropriate assert() would actually suppress the warning (i.e., the constraint solver would be able to correctly solve the assert condition as well).

For example, this test easily passes.

void test(int x) {
  assert(0 < x && x < 42);
  char c = x;
  clang_analyzer_eval(c <= 0); // expected-warning {{FALSE}}
  clang_analyzer_eval(c >= 42); // expected-warning {{FALSE}}
}

Or did you mean some other cases?

I meant real-world examples. We should see if it works on real-world code where constraints are significantly more complex.

There is so much fancy stuff going on upstream. Awesome to see.
I'm trying to catch up ASAP, I'm finally done with my master's thesis.

In D103096#2798238, @NoQ wrote:

Additionally, presence of cast symbols is extremely valuable for z3 runs in which our concerns for increased complexity are eliminated.

You are probably referring to D85528. (Without that patch, Z3 refutation crashes all over the place due to unmodeled widening/narrowing casts.)

But this still requires massive testing, a lot more than a typical constraint solver patch; extraordinary claims require extraordinary evidence.

What kind of tests do you think we need?

Test on a lot of real-world code. Like, seriously, *A LOT* of real-world code. Say, 20x the amount of code we have in docker and csa-testbench, from a large variety of projects. Investigate newly appeared reports carefully to understand the impact. I'll be happy to help with this at some point.

The CSA-testbench is capable of using the Conan package manager.
By cloning the https://github.com/conan-io/conan-center-index you can get a bunch of Conan package recipes, with tests actually using the given library.
Running their tests would ensure that header-only libraries get analyzed as well as normal libraries. However, frequently used packages would be analyzed over and over again, similarly to how headers suffer from this.

We planned to make use of this in the future to make CTU analysis and Z3 refutation more and more robust.
As soon as we have the infrastructure and scale, we plan to let a small set of contributors initiate such measurements, but don't expect it in the near future.

Added a boolean option handle-integral-cast-for-ranges under the -analyzer-config flag. The feature is disabled by default.

@NoQ, @steakhal
Do you think it's necessary to make any changes to SMTConstraintManager in the scope of this patch?

Harbormaster completed remote builds in B108193: Diff 350579. · Jun 8 2021, 4:50 AM

What about this patch?

Hey, great work! I think that casts are extremely important, but it looks like you mixed so many things into this patch. Let's take one step at a time and split it into (at least) a couple of patches.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
950–951	That definitely regresses the interface, so `NominalTypeList` should definitely be reworked.
1214–1241	This looks like a very `static` data structure to me, I don't see any reason why the user should be able to create multiple copies of it. If it becomes a static data structure, there will be no need to pass it around.
1236–1272	I think all this extra logic about how we infer ranges for casts is interesting, but should be a separate patch. For now, you can simply put `return Visit(Sym->getOperand());`. First, it will unblock you from depending on that `RangeFactory` feature. Also, I have quite a few questions about this particular implementation, so it would stall this patch as well.
3137–3196	I need more explanation why we have this function and why we call it where we call it. Additionally, it again looks like it belongs in a separate patch.

This revision now requires changes to proceed. · Jun 16 2021, 5:34 AM

In D103096#2821750, @vsavchenko wrote:

Hey, great work! I think that casts are extremely important, but it looks like you mixed so many things into this patch. Let's take one step at a time and split it into (at least) a couple of patches.

Thanks for the tips. I'll address them in the next update. Actually, I thought about splitting before the first upload and split it into D103094 and the current one. This particular patch provides the full mechanism needed to make the test cases pass. Honestly, I don't know what part could be cut to keep this mechanism holistic and self-sufficient. But I'll see what I can do.

In D103096#2822965, @ASDenysPetrov wrote:

In D103096#2821750, @vsavchenko wrote:

Hey, great work! I think that casts are extremely important, but it looks like you mixed so many things into this patch. Let's take one step at a time and split it into (at least) a couple of patches.

Honestly, I don't know what part could be cut to keep this mechanism holistic and self-sufficient. But I'll see what I can do.

I know: solver part = separate patch.
As I said, introduce very minimal support in the solver (aka VisitSymbolCast in Inferrer) and that's it. All other algorithms, like looking for constraints for the same expression cast to a larger type, logically belong in a separate patch where you actually start producing symbolic casts.

vsavchenko added inline comments. · Jun 17 2021, 1:23 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3179–3193	What's the point of this when you do the reverse operation in `Inferrer`? As far as I understood, in `VisitSymbolCast`, you iterate over larger types and see if the same symbol was cast to any of those, and if yes you truncate the result and use that range. Here, when we are about to set the constraint for a cast symbol, you iterate over smaller types, truncate this range for a smaller type, construct a cast to that smaller type, and add a constraint for that symbol as well. So, if this is correct, these two pieces of code DO THE SAME WORK and ONLY ONE should remain.

ASDenysPetrov mentioned this in D105340: [analyzer] Produce SymbolCast symbols for integral types in SValBuilder::evalCast. · Jul 2 2021, 2:47 AM

Split this revision and moved the SValBuilder-related changes to a separate patch, D105340. Added detailed comments. Made NominalTypeList static and, as a result, removed the forwarding across the functions. Split the handleSymbolCast logic into three methods: modifySymbolAndConstraints, updateExistingConstraints, getProperSymbolAndConstraint.

ASDenysPetrov edited parent revisions, added: D105340: [analyzer] Produce SymbolCast symbols for integral types in SValBuilder::evalCast; removed: D103094: [analyzer] Implemented RangeSet::Factory::castTo function to perform promotions, truncations and conversions. · Jul 6 2021, 3:47 AM

Harbormaster completed remote builds in B112574: Diff 356668. · Jul 6 2021, 3:47 AM

ASDenysPetrov edited the summary of this revision. · Jul 6 2021, 3:48 AM

This is a very complicated patch, I think we'll have to iterate on it quite a lot.
Additionally, we have to be sure that this doesn't hurt our performance.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1214	Comments on: why do we need it? why does it have four types? why do we not care about signed/unsigned types?
3254–3256	OK, but I still don't understand one thing. Here you go over all "smaller" types and artificially create constraints for them, and at the same time in `VisitSymbolCast` you do the opposite operation? Why? Shouldn't the map have constraints for smaller types already because of this action? Why do we need to do both?
3258–3259	This looks like a pattern and we should probably make it into a method of `SymbolCast`.

I found some issues. Working on improvement.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3254–3256	I've been preparing an answer for you, but suddenly you inspired me to make some improvements. Thanks.
3258–3259	I did it :) but then dropped it. It would just turn into: if (isa<SymbolCast>(Sym)) Sym = cast<SymbolCast>(Sym)->getRootOperand(); It looks pretty much the same and brings no benefit IMO, does it? Every time I used `getRootOperand` I also needed an additional traversal through the types to get some other information, so I couldn't avoid the `while` loop there. So I decided not to introduce a new method in `SymbolCast`.

vsavchenko added inline comments. · Jul 6 2021, 9:07 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3258–3259	Aha, I see your point. I guess we can take it into `SymExpr` and call it not `getRootOperand`, which won't tell much to a person reading the name, but something like `ignoreCasts`. It will fit well with `Expr::IgnoreCasts`, `Expr::IgnoreParens`, etc.

ASDenysPetrov added inline comments. · Jul 6 2021, 10:11 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3258–3259	Nice idea! True, `getRootOperand` would only be meaningful to a reader in the scope of `SymbolCast`. I'll try to implement this in the next update.

Added SymExpr::ignoreCasts method. Added descriptive comments.

Harbormaster completed remote builds in B112999: Diff 357240. · Jul 8 2021, 8:56 AM

Added more descriptive comments. Fixed RangeConstraintManager::updateExistingConstraints function.

Harbormaster completed remote builds in B113164: Diff 357463. · Jul 9 2021, 3:04 AM

Can you please explain why you do the same thing in two different ways?

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)

@vsavchenko

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3254–3256	I've fixed `RangeConstraintManager::updateExistingConstraints`. There was a mistake: I updated the smaller types starting from the root symbol, but the correct symbol is the given one, before `ignoreCasts()` is called. Maybe it is clearer for you now.

In D103096#2866704, @ASDenysPetrov wrote:

@vsavchenko

That's not the question I'm asking. Why do you need to set constraints for other symbolic expressions, when SymbolicInferrer can look them up on its own? Which cases will fail if we remove that part altogether?

In D103096#2866730, @vsavchenko wrote:

In D103096#2866704, @ASDenysPetrov wrote:

@vsavchenko

That's not the question I'm asking. Why do you need to set constraints for other symbolic expressions, when SymbolicInferrer can look them up on its own? Which cases will fail if we remove that part altogether?

I see. Here is what fails if we don't update other constraints:

void test(int x) {
  if ((char)x > -10 && (char)x < 10) {
    if ((short)x == 8) {
      // If you remove updateExistingConstraints,
      // then `c` won't be 8. It would be [-10, 10] instead.
      char c = x;
      if (c != 8)
        clang_analyzer_warnIfReached(); // should be no-warning, but it fails
    }
  }
}

In D103096#2867021, @ASDenysPetrov wrote:
In D103096#2866730, @vsavchenko wrote:

In D103096#2866704, @ASDenysPetrov wrote:

@vsavchenko

That's not the question I'm asking. Why do you need to set constraints for other symbolic expressions, when SymbolicInferrer can look them up on its own? Which cases will fail if we remove that part altogether?

I see. Here is what fails if we don't update other constraints:
void test(int x) {
  if ((char)x > -10 && (char)x < 10) {
    if ((short)x == 8) {
      // If you remove updateExistingConstraints,
      // then `c` won't be 8. It would be [-10, 10] instead.
      char c = x;
      if (c != 8)
        clang_analyzer_warnIfReached(); // should be no-warning, but it fails
    }
  }
}

OK, it's something! Good!
I still want to hear a good explanation of why it is done this way. Here c is mapped to (char)x, and we have [-10, 10] directly associated with it, but we also have (short)x associated with [8, 8]. Why can't VisitSymbolCast look up constraints for (short)x, when it already looks up constraints for different casts?

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1232	Why do you use `VisitSymExpr` here? You want to interrupt all `Visit`s or... I'm not sure I fully understand.
1244	Can we get a test for that?
1245	Same goes here.
1270	Why do you get associated constraint directly without consulting with what `SymbolRangeInferrer` can tell you about it?

Generally, with this patch we kinda have several constraints for each cast of a single symbol, and we have to take care of all of those constraints and update them in a timely manner (if possible).
For instance, we have int x and encounter casts of this symbol in the code:

int x;
(char)x; // we can reason about the 1st byte
(short)x; // we can reason about the 2 lowest bytes
(ushort)x; // we can reason about the 2 lowest bytes (in this case we may not store for unsigned separately, as we already stored 2 bytes for signed)

It is as if we had knowledge of the lower part of the integer. Every time we get a new constraint, for example for (short)x (aka 2 bytes), we have to update all the constraints that cover two bytes or fewer ((char)x in this case) as well, to keep them consistent.
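
The "lower part of the integer" idea can be illustrated with a short sketch. The helpers `asChar`, `asShort` and `charFollowsFromShort` are hypothetical and only demonstrate that the low byte is fully determined by the low two bytes; they say nothing about how the patch stores its constraints.

```cpp
#include <cstdint>

// Hypothetical helpers: keep only the low 8 or 16 bits of a 32-bit value.
// The unsigned step makes the truncation modular; the final conversion to
// the signed type is modular since C++20 and in practice before that.
int8_t asChar(int32_t X) {
  return static_cast<int8_t>(static_cast<uint8_t>(X));
}
int16_t asShort(int32_t X) {
  return static_cast<int16_t>(static_cast<uint16_t>(X));
}

// (char)x is fully determined by (short)x: truncating in two steps gives
// the same low byte as truncating directly.
bool charFollowsFromShort(int32_t X) {
  return asChar(asShort(X)) == asChar(X);
}
```

This is why a new constraint on (short)x can narrow what is known about (char)x, while a constraint on (char)x alone says nothing about the second byte of (short)x.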

@vsavchenko

I still want to hear a good explanation of why it is done this way. Here c is mapped to (char)x, and we have [-10, 10] directly associated with it, but we also have (short)x associated with [8, 8]. Why can't VisitSymbolCast look up constraints for (short)x, when it already looks up constraints for different casts?

Hm, you've confused me. I'll do some debugging and report back.

In D103096#2867104, @ASDenysPetrov wrote:
Generally, with this patch we kinda have several constraints for each cast of a single symbol, and we have to take care of all of those constraints and update them in a timely manner (if possible).
For instance, we have int x and encounter casts of this symbol in the code:
int x;
(char)x; // we can reason about the 1st byte
(short)x; // we can reason about the 2 lowest bytes
(ushort)x; // we can reason about the 2 lowest bytes (in this case we may not store for unsigned separately, as we already stored 2 bytes for signed)
It is as if we had knowledge of the lower part of the integer. Every time we get a new constraint, for example for (short)x (aka 2 bytes), we have to update all the constraints that cover two bytes or fewer ((char)x in this case) as well, to keep them consistent.

What we do in Inferrer is that we try to look at many sources of information and intersect their ranges. And I repeat my question again in a slightly different form: why can't it look up constraints for (char)x and for (short)x and intersect them?
You have to admit you never really addressed this question. Why can't VisitSymbolCast do everything?

In D103096#2867136, @ASDenysPetrov wrote:

@vsavchenko

I still want to hear a good explanation of why it is done this way. Here c is mapped to (char)x, and we have [-10, 10] directly associated with it, but we also have (short)x associated with [8, 8]. Why can't VisitSymbolCast look up constraints for (short)x, when it already looks up constraints for different casts?

Hm, you've confused me. I'll do some debugging and report back.

It should not be about debugging, it's your code! Why did you write it this way!?

@vsavchenko

Why did you write it this way!?

I want the map to contain only valid constraints at any time, so we can easily get them without traversing all the variants and intersecting them with each other. I'm gonna move the updateExistingConstraints logic to VisitSymbolCast. I think your suggestion can even improve the feature and cover some more cases. I'll add more tests in the next update. Thanks!

ASDenysPetrov added inline comments. · Jul 9 2021, 10:50 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1270	What do you mean? I didn't get it. Could you give an example?

In D103096#2867441, @ASDenysPetrov wrote:

@vsavchenko

Why did you write it this way!?

I want the map to contain only valid constraints at any time, so we can easily get them without traversing all the variants and intersecting them with each other. I'm gonna move the updateExistingConstraints logic to VisitSymbolCast. I think your suggestion can even improve the feature and cover some more cases. I'll add more tests in the next update. Thanks!

[-10, 10] is also valid, right? You can't keep things at their best all the time. And if you want all constraints directly in the map, then what's all this logic in VisitSymbolCast? That's why I keep asking why you need both parts of this solution, and I haven't gotten any answer so far.
I'm hands down for the incremental approach and adding small-to-medium size improvements on top of each other. That makes my life as a reviewer easier :) That said, I don't want to commit to a big solution where the author doesn't want to explain why there are two parts of the solution instead of one.

I want you to tell me why the code that's in VisitSymbolCast does what it does. And the same about updateExistingConstraints. Also I want to hear a solid reason why it's split this way and why we need both of them.

You should understand that I'm not picking on you personally. The review process takes a lot of my time too. I want to make it easier for both of us. When the reviewer understands what you are going for, it is much easier for them to help you in refining your solution. This patch is very big, but the summary doesn't cover the main part: the approach. And you leave me here dragging it out of you.

ASDenysPetrov edited the summary of this revision. · Jul 12 2021, 9:47 AM

@vsavchenko I've updated the summary. I hope, I addressed your question. Thanks.

ASDenysPetrov edited the summary of this revision. · Jul 12 2021, 10:16 AM

ASDenysPetrov added inline comments. · Jul 13 2021, 5:52 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1232	Here we want to delegate the reasoning to another handler as we don't support non-integral casts yet.
1244	I'll add some.

ASDenysPetrov added inline comments. · Jul 13 2021, 6:23 AM

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	Do you think the recursive call is better than the loop? But, I guess, I see your point. Your option could be safer if we had another implementation of the virtual method. Or do you think another cast-like symbol is possible in the future? Well, for now `ignoreCasts` doesn't make sense for any other `SymExpr` subclasses.

I'll allocate some time to get into your summary, but for now here are my concerns about SymbolRangeInferrer and VisitSymbolCast.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1232	You are not delegating it here. `Visit` includes a runtime dispatch that calls a correct `VisitTYPE` method. Here you call `VisitSymExpr` directly, which is one of the `VisitTYPE` methods. No dispatch will happen, and since we use `VisitSymExpr` as the last resort (it's the most base class, if we got there, we don't actually support the expression), you interrupt the `Visit` and go directly to "the last resort". See the problem now?
1270	`getConstraint` returns whatever constraint we have stored directly in the constraint map. That's the main source of information for ranges, but not the only one. Here is the list of things that you skip when you do `getConstraint` here:

we can understand that something is an equality/disequality check and find the corresponding info in the Equivalence Classes data structure;
we can see that the expression has the form `A - B` and we can find a constraint for `B - A`;
we can see that the expression is a comparison `A op B` and check what other comparison info we have on `A` and `B` (your own change);
we can see that the expression is of the form `A op B` and check if we know something about `A` and `B`, and produce a reasonable constraint out of this information.

In order to use the right information, you should use `infer`, which will actually do all the other things as well. That's how `SymbolRangeInferrer` is designed: to be recursive.

Speaking of recursiveness: all these loops and manual checks of the types of the cast's operand go against this pattern. Recursive visitors should call `Visit` for children nodes (like `RecursiveASTVisitor`). In other words, if `f(x)` is a visit function, it should be defined like this:

f(x) = g(f(x->operand_1), f(x->operand_2), ... , f(x->operand_N))

or, if we talk about your case specifically:

f(x: SymbolCast) = h(f(x->Operand))

and the `h` function should transform the range set returned by `f(x->Operand)` into a range set appropriate for `x`.

NOTE: `h` can also intersect different ranges.
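
The recursive shape f(x: SymbolCast) = h(f(x->Operand)) can be sketched as below. All names here (`Range`, `Sym`, `fullRange`, `h`, `infer`) are hypothetical stand-ins, a range set is reduced to a single closed interval, only signed types are modelled, and the transformation in `h` is conservative rather than exact; the sketch is meant to show the recursion pattern only, not the real `SymbolRangeInferrer`.

```cpp
#include <algorithm>
#include <cstdint>

struct Range { int64_t Lo, Hi; };  // one closed interval [Lo, Hi]

// Hypothetical symbol node: a leaf when Operand is null, otherwise a cast
// of Operand to a signed type of the given width (assumed to be 1..32 bits).
struct Sym {
  uint32_t Bits;
  Range Stored;                  // range recorded for this node itself
  const Sym *Operand = nullptr;
};

Range fullRange(uint32_t Bits) {
  int64_t Half = int64_t{1} << (Bits - 1);
  return {-Half, Half - 1};
}

// h: adapt the operand's range to the cast's type, then intersect it with
// the range stored for the cast itself. A range that does not fit the type
// is widened to the type's full range instead of modelling wraparound.
Range h(Range FromOperand, const Sym &Cast) {
  Range F = fullRange(Cast.Bits);
  if (FromOperand.Lo < F.Lo || FromOperand.Hi > F.Hi)
    FromOperand = F;
  return {std::max(FromOperand.Lo, Cast.Stored.Lo),
          std::min(FromOperand.Hi, Cast.Stored.Hi)};
}

// f: the visit function recurses into the operand instead of looping over
// the cast chain by hand.
Range infer(const Sym &S) {
  return S.Operand ? h(infer(*S.Operand), S) : S.Stored;
}
```

Nested casts need no extra code in this shape: each level calls `infer` on its operand, and `h` combines whatever comes back with the information stored for that level.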

vsavchenko added inline comments.Jul 13 2021, 6:58 AM

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	Oh, wait, why is it even virtual? I don't think that it should be virtual. Are similar functions in `Expr` virtual? And I think that this implementation should live in `SymExpr` directly. Then it would look like: if (const SymbolCast *ThisAsCast = dyn_cast<SymbolCast>(this)) { return ThisAsCast->ignoreCasts(); } return this;

ASDenysPetrov added inline comments.Jul 13 2021, 7:37 AM

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	Yes, `SymExpr` is an abstract class. And because of limitations and dependency of inheritance we are not able to know the implementation of `SymbolCast`. Unfortunately, this is not a CRTP.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1232	OK. I rejected this idea before. If we call `Visit` inside `VisitSymbolCast`, we will go into a recursive loop, because it will return us back to `VisitSymbolCast` as we have passed `Sym` as is. (This is theoretical, I didn't check in practice.) Or am I missing something? I chose `VisitSymExpr` here because all kinds of `SymbolCast` were previously handled here. So I decided to pass all unsupported forms of casts there.
1270	Thank you for useful notes! I'll take them into account.

vsavchenko added inline comments.Jul 13 2021, 8:01 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1232	Did I suggest to `Visit(Sym)`? Of course it is going to end up in a loop! Why isn't it `Visit(Sym->getOperand())` here? Before we started producing casts, casts were transparent. This logic would fit perfectly with that.
1245	And here, since we couldn't really reason about it, we usually return `infer(T)`.

OK, thanks for putting a summary. I now got a good idea why you need both.
At the same time, take a look at D105692. I'm about to land it and I think it's going to be useful for you.

ASDenysPetrov added inline comments.Jul 13 2021, 11:32 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1232	"were transparent." Not exactly. There are still some cases when symbols are not equal to their roots (aka Operands). Such cases were handled by `VisitSymExpr`, which uses `infer(Sym->getType());` instead of `getOperand`. So this needs some more thought. Also I see a problem with `EquivalenceClass`es. Consider the following:
int x, y;
if(x == y)
  if ((char)x == 2)
    if(y == 259)
      // Here we shall update `(char)x` and find this branch infeasible.
Also consider cases like:
if(x == (short)y)
  // What we should do(store) with(in) `EquivalenceClass`es.
Currently, I have only a vague idea of the solution.

// 1. `VisitSymbolCast`.
// Get a range for main `reg_$0<int x>` - [-2147483648, 2147483647]
// Cast main range to `short` - [-2147483648, 2147483647] -> [-32768, 32767].
// Now we get a valid range for further bifurcation - [-32768, 32767].
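The truncation step in the comment above (casting the `int` range down to `short`) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch's implementation: `Range` and `truncateToInt16` are hypothetical names, and only a single input range is handled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Inclusive range of values; a stand-in for one entry of a RangeSet.
struct Range {
  int64_t lo, hi;
};

// Truncate the inclusive 32-bit range [lo, hi] to int16_t.
// The result is one range, or two ranges when the truncated values
// wrap around the int16_t boundary.
std::vector<Range> truncateToInt16(int32_t lo, int32_t hi) {
  assert(lo <= hi);
  std::vector<Range> out;
  const int64_t span = static_cast<int64_t>(hi) - static_cast<int64_t>(lo);
  if (span >= 65535) { // every 16-bit pattern occurs at least once
    out.push_back({INT16_MIN, INT16_MAX});
    return out;
  }
  // Reduce an endpoint modulo 2^16 and reinterpret it as a signed value.
  auto wrap = [](int32_t v) -> int64_t {
    int64_t low = static_cast<uint32_t>(v) & 0xFFFFu; // low 16 bits
    return low > INT16_MAX ? low - 65536 : low;
  };
  const int64_t l = wrap(lo), h = wrap(hi);
  if (l <= h) {
    out.push_back({l, h});
  } else { // the range crossed the boundary: split it in two
    out.push_back({INT16_MIN, h});
    out.push_back({l, INT16_MAX});
  }
  return out;
}
```

For the full `int` range this yields the single range [-32768, 32767], as in the comment; a narrow range that straddles 32767/32768 comes back as two pieces.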

That's a great example, thanks for putting it together. I can see your point now!

Please, rebase your change and make use of ConstraintAssignor, and rework VisitSymbolCast.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	Re-read my comment, please.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1232	"There are still some cases when symbols are not equal to there roots(aka Operands)"
Right now we don't have casts, this is what we do currently. However faulty it is, it is the existing solution and we should respect that.
"Also I see a problem with EquivalenceClass'es."
Because of the current situation with casts (or more precisely with their lack), `EquivalenceClass`es do not get merged for symbols with different types. It is as simple as that. You can find similar tests in `equality_tracking.c`.

ASDenysPetrov added inline comments.Jul 14 2021, 10:26 AM

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	"Oh, wait, why is it even virtual?"
`ignoreCasts` is a virtual function because I haven't found a better way to implement it.
"I don't think that it should be virtual."
Unfortunately, this is not a CRTP to avoid dynamic linking.
"Are similar functions in Expr virtual?"
`SymExpr` is an abstract class. I'm not sure about similarity, but `SymExpr` has these virtual methods: `computeComplexity`, `getType`, `getOriginRegion`.
"And I think that this implementation should live in SymExpr directly."
It's impossible due to the `SymExpr` implementation design. `SymExpr` knows nothing about the implementation details of `SymbolCast` to invoke `ignoreCasts()`.

vsavchenko added inline comments.Jul 14 2021, 10:38 AM

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	a) `Expr` is also an abstract class b) I put the implementation right there in the comment above. I don't see any reasons not to use it. c) I don't buy it about "impossible" and "implementation design" because you can always declare function in one place and define it in the other.
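Point (c) above, declaring a function in one place and defining it in another, can be sketched with a toy hierarchy. The class names mirror the ones discussed in this thread, but these are simplified stand-ins written for illustration: the `Kind` tag and `static_cast` replace LLVM's `dyn_cast`, and none of this is the actual Clang code.

```cpp
#include <cassert>

class SymExpr {
public:
  enum Kind { DataKind, CastKind };
  explicit SymExpr(Kind K) : K(K) {}
  Kind getKind() const { return K; }

  // Non-virtual. Declared here, defined below once SymbolCast is complete.
  const SymExpr *ignoreCasts() const;

private:
  Kind K;
};

class SymbolData : public SymExpr {
public:
  SymbolData() : SymExpr(DataKind) {}
};

class SymbolCast : public SymExpr {
public:
  explicit SymbolCast(const SymExpr *Op) : SymExpr(CastKind), Operand(Op) {
    assert(Op && "a cast needs an operand");
  }
  const SymExpr *getOperand() const { return Operand; }

private:
  const SymExpr *Operand;
};

// The definition lives after SymbolCast, so the base class can use it.
const SymExpr *SymExpr::ignoreCasts() const {
  const SymExpr *S = this;
  while (S->getKind() == CastKind)
    S = static_cast<const SymbolCast *>(S)->getOperand();
  return S;
}
```

In a real code base the definition would typically sit in a `.cpp` file that includes the headers of both classes.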

Rebased

Made ignoreCast non-virtual.
P.S. IMO, this change is not something that can be taken as a pattern, though.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	I think I've addressed your concern.

Harbormaster completed remote builds in B114049: Diff 358685.Jul 14 2021, 1:25 PM

In D103096#2877818, @ASDenysPetrov wrote:

Made ignoreCast non-virtual.
P.S. IMO, this change is not something that can be taken as a pattern, though.

It is already a pattern in other type hierarchies.
Virtual functions are only good, when they can have multiple implementations. ignoreCasts by its name can have only one implementation and couldn't be virtual. That's it! It is more useable now, and less confusing for its users. The fact that its definition lives in some other cpp file doesn't change it.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	This function should be removed then.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3150–3152	No, please, remove duplication by putting it inside of the constraint assignor. It is designed specifically so we don't duplicate code around `assumeSymXX` functions.

@vsavchenko

It is already a pattern in other type hierarchies.

I've just rarely come across them, and it was hard for me to read the code while searching for the implementation all over the place.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	NP.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3150–3152	+1. That's what I've recently thought about. :)

Improved ignoreCasts implementation. Adapted to ConstraintAssignor.

vsavchenko added inline comments.Jul 15 2021, 9:15 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2155–2159	That's not using `ConstraintAssignor`, you simply put your implementation in here. That won't do!

ASDenysPetrov added inline comments.Jul 15 2021, 9:31 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2155–2159	OK, please tell me how to use it correctly in my case.

vsavchenko added inline comments.Jul 15 2021, 9:36 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2155–2159	Can you read the comments first and then ask me if you have any specific questions?

Harbormaster completed remote builds in B114266: Diff 359007.Jul 15 2021, 11:20 AM

Adapted solution to ConstraintAssignor API. Added tests.

ASDenysPetrov added inline comments.Jul 16 2021, 6:38 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2155–2159	I think I did it. Could you please review the changes?

Harbormaster completed remote builds in B114485: Diff 359305.Jul 16 2021, 7:29 AM

Ping.

ASDenysPetrov retitled this revision from [analyzer] Implement cast for ranges of symbolic integers. to [analyzer] Implement cast for ranges of symbolic integers.Nov 17 2021, 1:59 AM

Rebased.

Harbormaster completed remote builds in B135157: Diff 388552.Nov 19 2021, 12:11 PM

ASDenysPetrov mentioned this in D114718: [analyzer] Implement a new checker for Strict Aliasing Rule..Nov 30 2021, 8:31 AM

ASDenysPetrov mentioned this in D115932: [Analyzer] Create and handle SymbolCast for pointer to integral conversion.Jan 10 2022, 4:35 AM

Ping. Is anybody interested in this? :)

Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2022, 10:13 AM

First of all, thanks Denys for working on this, nice work! Here are my concerns and remarks.

I think this fixed set of types in NominalTypeList is too rigid. I don't like the fact that we have to iterate over all four types whenever we set a new constraint or when we try to infer. Also, I am thinking about downstream hardware architectures, where there might be integers with different bit-widths (@vabridgers). Also, at some point people will push us to support integers with arbitrary bitwidth (see _ExtInt).

Thus, I am proposing an alternative approach. We should have a SymExpr -> Set of SymExpr mapping in the State that represents the relation of symbols that are connected via some cast operations (see REGISTER_MAP_WITH_PROGRAMSTATE). Let's call this mapping as CastMap. The key should be the root symbol, i.e the symbol that is being declared first before all cast operations.

E.g. Let's have

int16 a = 128;

then we have a constraint [128,128] stored for $a. Then

if ((int8)a < 0)

creates a new symbol $a2 (SymbolCast) that has a new constraint [-128,-128] assigned to it. And we also keep track in the State that $a and $a2 refer to the same root symbol (a). We now have in the CastMap $a -> [$a2].

Now, let's say we have

if ((_ExtInt(7))a > 64)

then we can dig up the existing constraints from CastMap to check for the State's validity and we can update all the constraints of $a and $a2 as needed. Also, CastMap is updated: $a -> [$a2, $a3].

clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp
421–426 ↗	(On Diff #388552)	Does it really matter? I mean, why do we need this change?
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2286	Could you please rebase? The "simplification" code part had been merged already to llvm/main and it is not part of this change.
2301–2325	I think this hunk should remain in `assignSymExprToConst`. Why did you move it?
2354–2364	I think, we should definitely store the constraints as they appear in the analyzed code. This way we mix the infer logic into the constraint setting, which is bad. I mean, we should simply store the constraint directly for the symbol as it is. And then only in `VisitSymbolCast` should we infer the proper value from the stored constraint (if we can). (Of course, if we have related symbols (casts of the original symbol) then their constraints must be updated.)
2355
2358
2359
2360
2396
2420–2424	Instead of a noop we should be more conservative in this case. We should invalidate (remove) the constraints of all the symbols that have more bits than the currently set symbol. However, we might be clever in cases of special values (e.g `0` or in case of the `true` rangeSet {[MIN, -1], [1, MAX]}).

Thank you for the review @martong! Your work is not less hard than mine. I'll rework and update the revision ASAP.

clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp
421–426 ↗	(On Diff #388552)	I investigated. This change is no longer necessary. I'll remove it.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2301–2325	I'll remove. It's unrelated one.
2354–2364	I see what you mean. I thought about this. Here is what I've faced. Let's say you meet `(wchar_t)x > 0`, which you store like a pair {(wchar_t)x, [1,32767]}. Then you meet `(short)x < 0`, where you have to answer whether it's `true` or `false`. So what would be your next step? Brute forcing all the constraint pairs to find any `x`-related symbol? Obviously, O(n) complexity for every single integral symbol is inefficient. What I propose is to "canonize" arbitrary types to a general form, where this form could be a part of the key along with `x`, and we could get a constraint with classic map complexity. So that:
You meet `(wchar_t)x > 0`, where you convert `wchar_t` to `int16` and store it like a pair {(int16)x, [1,32767]}.
Then you meet `(short)x < 0`, where you convert `short` to `int16` and get a constraint.
That's why I've introduced `NominalTypeList`. But now I accept your concern about arbitrary integer sizes and will redesign my solution.
2420–2424	No, it's incorrect. Consider the following:
int x;
if(x > 1000000 || x < 100000)
  return;
// x (100'000, 1000'000)
if((int8)x != 42)
  return;
// x (100'000, 1000'000) && (int8)x (42, 42)
We can't just remove or invalidate `x (100'000, 1000'000)` because this range still holds. Strictly speaking, the `x` range should be updated with the values 100394, 102442, 108586, ..., 960554 and any other value within the range whose lowest byte equals 42. We can't update the `RangeSet` with such a large number of values due to performance issues. So we just accept a less accurate range.
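To make the size of that exact update concrete, the values in question can be enumerated by brute force. This is only an illustration of the argument above; `valuesWithLowByte` is a hypothetical helper, not something the analyzer does or should do.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Collect every x in [lo, hi] whose lowest byte equals `byte`.
// For 0 <= byte <= 127 these are exactly the x with (int8_t)x == byte.
std::vector<int32_t> valuesWithLowByte(int32_t lo, int32_t hi, uint8_t byte) {
  assert(lo <= hi);
  std::vector<int32_t> out;
  for (int64_t v = lo; v <= hi; ++v)
    if ((static_cast<uint32_t>(v) & 0xFFu) == byte)
      out.push_back(static_cast<int32_t>(v));
  return out;
}
```

Calling `valuesWithLowByte(100000, 1000000, 42)` returns thousands of values spaced 256 apart. Since no two of them are adjacent, an exact `RangeSet` would need one single-point range per value, which is the performance concern raised above.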

martong added inline comments.Apr 25 2022, 3:07 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2354–2364	"So would be your next step? Brute forcing all the constraint pairs to find any x-related symbol? Obviously, O(n) complexity for every single integral symbol is inefficient."
I don't think we need a brute force search among the constraints if we have the additional `CastMap` (that I have previously proposed). So, the next step would be: look up the root symbol of `(short)x`, that is `(wchar_t)x` (supposing we have seen a declaration `wchar_t x;` first during the symbolic execution, but having a different root might work out as well). Then from the `CastMap` we can query in O(1) the set of the related cast symbols of the root symbol. From this set, it remains to query the constraints for each of the members from the existing equivalenceClass->constraint mapping. That's O(1) times the number of the members of the cast set (which is assumed to be very few in the usual case).
2420–2424	Okay, this makes perfect sense, thanks for the example!

@martong thank you for the idea. I've tried to implement it. Could you look at the patch once again, please? I've also described a new solution in the Summary.

Harbormaster completed remote builds in B161422: Diff 425248.Apr 26 2022, 9:37 AM

ASDenysPetrov marked 2 inline comments as done.Apr 26 2022, 9:37 AM

Thanks Denys for the update! This is getting really good.

I have some concerns though about the CastMap = Map<uint32_t, RangeSet>. I think we should have CastMap = Map<uint32_t, EquivalenceClass> instead, and we could get the RangeSet from the existing ConstraintRange mapping. By storing directly the RangeSet, the State might get out-of-sync when we introduce a constraint to another member in an equivalence class. (Besides that, our mapping of constraints is happening always by using the EquivalenceClasses as keys.)
I think this could solve the problematic code you posted earlier

int x, y;
if(x == y)
  if ((char)x == 2)
    if(y == 259)
      // Here we shall update `(char)x` and find this branch infeasible.

Here we have EqClass1: [x, y] , EqClass2: [(char)x] and they are not the same class, thus when you iterate over the CastMap, you can get the updated RangeSet for both classes, and the infeasibility can be discovered.

About this:

if(x == (short)y)
  // What we should do(store) with(in) `EquivalenceClass`es.

In this case, we have one EqClass with two members, the SymbolRef x and the SymbolCast (short)y. They both must have the same RangeSet associated with them. And this is already implemented. By referring to the EqClass in the CastMap, we can simply reuse this information.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1248–1249	By using `llvm::Expected<T>` we would be more aligned with the llvm error handling practices. Besides, the `bool` in the tuple and the `Success` variable in the function below would not be needed.

NoQ added inline comments.Apr 28 2022, 1:54 PM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
891–892	These maps will need to be cleaned up when symbols become dead (as in `RangeConstraintManager::removeDeadBindings()`).

Giving it some more thought, the SymCastMap = Map<SymbolRef, CastMap> should be keyed as well with an equivalence class : SymCastMap = Map<EquivalenceClass, CastMap>. This is the only way to use the equivalence info correctly when we process the casts.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
891–892	Yes, the same way as we clean up e.g. the `DisequalityMap`.

martong added inline comments.Apr 29 2022, 3:04 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1308–1310	There might be a problem here because the iteration of the map is non-deterministic. We should probably have a copy that is sorted, or the container should be sorted (sorted immutable list maybe?). Your tests below passed probably because the cast chains are too small. Could you please have a test where the chain is really long (20 maybe) and shuffled? (My thanks to @steakhal for this additional comment.)
2392–2395	Same here.
clang/test/Analysis/symbol-integral-cast.cpp
32

martong mentioned this in D124658: [analyzer] Canonicalize SymIntExpr so the RHS is positive when possible.Apr 29 2022, 5:33 AM

Ping

In D103096#3502955, @martong wrote:

Ping

Thank you, folks, for taking your time. I'll surely make the corresponding changes according to your suggestions and notify you then. Sorry, @martong, for the late response. I've been pretty busy lately.

ASDenysPetrov edited the summary of this revision. (Show Details)May 19 2022, 10:26 AM

martong added a parent revision: D126481: [analyzer] Handle SymbolCast in SValBuilder.May 26 2022, 8:44 AM

Denys, I've created a very simple patch that makes the SValBuilder to be able to look up and use a constraint for an operand of a SymbolCast. That change passes 2 of your test cases, thus I made that a parent patch.

clang/test/Analysis/symbol-integral-cast.cpp
13–37	These two tests are redundant because they are handled by the Parent patch I've just created. https://reviews.llvm.org/D126481

ASDenysPetrov mentioned this in D126481: [analyzer] Handle SymbolCast in SValBuilder.May 27 2022, 11:14 AM

@martong Just FYI. I've been working on reworking this solution to use EquivalenceClasses for several weeks. It turned out that this is an extremely hard task to accomplish. There are a lot of cast cases like: (int8)x==y, (uint16)a==(int64)b, (uint8)y == b. Merging and inferring all of this without going beyond O(n) complexity is really tricky.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1308–1310	I've checked. `ImmutableSet` gave me a sorted order. But I agree that it could be just a coincidence. I'll try to add more tests.

In D103096#3613059, @ASDenysPetrov wrote:

@martong Just FYI. I've been working on reworking this solution to use EquivalenceClasses for several weeks. It turned out that this is an extremely hard task to accomplish. There are a lot of cast cases like: (int8)x==y, (uint16)a==(int64)b, (uint8)y == b. Merging and inferring all of this without going beyond O(n) complexity is really tricky.

Please elaborate. I don't see how it is different from merging and inferring without the handling of casts. My understanding is that we indeed have more symbols (additional SymbolCasts). But the merging algorithm should not change at all; it should be agnostic to the actual symbol kind (whether that is a SymbolCast or a SymbolData or a SymSymExpr).
The infer algorithm might be different though, but there I think the original algorithm you devised for SymExprs should work as well for EquivalenceClasses.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1308–1310	Yes, you are right, and I was wrong. `ImmutableSet` is based on an AVL tree, which is a balanced binary tree, and `begin()` gives an iterator that performs an in-order (thus sorted) traversal. And the key is an integer. (I don't know how now, but we were misled, @steakhal too. I guess we thought that the key of `CastMap` is a pointer.)
2392–2395	Please disregard the comment above.
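The ordering property discussed above can be illustrated with a standard container. Note the substitution: this sketch uses `std::map`, an ordered container from the C++ standard library, as a stand-in for the LLVM immutable containers. `iterationOrder` is a hypothetical helper, and the `int` payload is a placeholder for the real mapped value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Insert bit widths in an arbitrary order and return the order in which
// iteration visits them.
std::vector<uint32_t> iterationOrder(const std::vector<uint32_t> &inserted) {
  std::map<uint32_t, int> castMap; // bit width -> placeholder payload
  for (uint32_t width : inserted)
    castMap.emplace(width, 0);

  std::vector<uint32_t> visited;
  for (const auto &entry : castMap)
    visited.push_back(entry.first);

  // Iteration over an ordered map keyed by integers is always ascending,
  // regardless of insertion order.
  assert(std::is_sorted(visited.begin(), visited.end()));
  return visited;
}
```

With pointer keys the order would instead follow the addresses, which can differ between runs. That is the kind of non-determinism the earlier comment was worried about.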

@martong Thank you for your patience. I started moving in a slightly different direction so that we can improve it iteratively.
Just spoiling, in my latest solution a single symbol will be associated to many classes. Here are also tricky cases:

Consider equalities

a32 == (int8)b32
b32 == c32

Class-Symbol map is:

class1 { a32 , (int8)b32 }
class2 { b32 , c32 }

Symbol-Class map is:

a32 { 32 : class1 }
b32 {  8 : class1, 32 : class2  }
c32 { 32 : class2 }

If we know:

(int8)c32 == -1

then what is:

(int8)a32 - ?

Should we traverse like a -> 32 -> class1 -> (int8)b32 -> b32 -> class2 -> c32 -> (int8)c32 ?

The x8 == y32 we can treat as a range of int8 ( {-128, 127} or {0, 255} ).

For (int8)x32 == (int16)x32 we can eliminate one of the symbols in the class as a redundant one.

If x32 == 0 then we can simplify next classes (int16)x32 == y and (int8)x32 == z merging them into a single class {x32, y, z}.

I believe there are more cases.

Thanks Denys for your continued work on this. These are very good questions that must be answered, we need exactly such thinking to implement this properly. I believe we can advance gradually.

In D103096#3642681, @ASDenysPetrov wrote:
@martong Thank you for your patience. I started moving in a slightly different direction so that we can improve it iteratively.
Just spoiling, in my latest solution a single symbol will be associated to many classes. Here are also tricky cases:

Consider equalities
a32 == (int8)b32
b32 == c32
Class-Symbol map is:
class1 { a32 , (int8)b32 }
class2 { b32 , c32 }
Symbol-Class map is:
a32 { 32 : class1 }
b32 {  8 : class1, 32 : class2  }
c32 { 32 : class2 }
If we know:
(int8)c32 == -1
then what is:
(int8)a32 - ?
Should we traverse like a -> 32 -> class1 -> (int8)b32 -> b32 -> class2 -> c32 -> (int8)c32 ?

I think, we should have only a -> 32 -> class1 -> (int8)b32. The (int8)b32 -> b32 step would be incorrect according to the modulo logic.
In other words, we should check the equivalence class of the root symbol (and no other eq classes should be considered); in this case this is only class1. (More precisely, we should check the SymCastMap of class1.)
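The Symbol-Class map from the quoted example can be written down as a two-level lookup. This is a sketch under loud assumptions: symbols and classes are represented by plain strings, and `ClassMap`, `SymClassMap`, `makeExample` and `lookupClass` are illustrative names rather than the program-state traits registered in the patch.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

using ClassId = std::string;
using ClassMap = std::map<unsigned, ClassId>;        // bit width -> class
using SymClassMap = std::map<std::string, ClassMap>; // root symbol -> ClassMap

// Build the Symbol-Class map from the example:
//   a32 { 32 : class1 }, b32 { 8 : class1, 32 : class2 }, c32 { 32 : class2 }
SymClassMap makeExample() {
  SymClassMap M;
  M["a32"][32] = "class1";
  M["b32"][8] = "class1";
  M["b32"][32] = "class2";
  M["c32"][32] = "class2";
  return M;
}

// Find the class recorded for `Root` viewed at `BitWidth`, if any.
std::optional<ClassId> lookupClass(const SymClassMap &M,
                                   const std::string &Root,
                                   unsigned BitWidth) {
  assert(BitWidth > 0 && "bit width must be positive");
  auto SymIt = M.find(Root);
  if (SymIt == M.end())
    return std::nullopt;
  auto ClsIt = SymIt->second.find(BitWidth);
  if (ClsIt == SymIt->second.end())
    return std::nullopt;
  return ClsIt->second;
}
```

Here `lookupClass(M, "a32", 32)` finds class1, while `lookupClass(M, "a32", 8)` finds nothing, since no class was ever recorded for `(int8)a32` itself.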

The x8 == y32 we can treat as a range of int8 ( {-128, 127} or {0, 255} ).

I am not sure what you mean here.
You mean, when we bifurcate on x8 == y32? Then we have two branches: the false case x8 == y32: [0, 0] and the true case x8 == y32: [[INT_MIN, -1], [1, INT_MAX]]

For (int8)x32 == (int16)x32 we can eliminate one of the symbols in the class as a redundant one.

Yes, but this is more like an optimization step. I'd handle this with low priority and with a FIXME comment.

If x32 == 0 then we can simplify next classes (int16)x32 == y and (int8)x32 == z merging them into a single class {x32, y, z}.

Good point. This is an optimization for precision. I'd like to have this, but in a subsequent patch. Let's try to have the absolute simplest working version in this patch.
Also, this can extend to any concrete value that is meaningful in the smaller types. E.g. any single value of x32 in [0,127] could simplify (int8)x32.

I believe there are more cases.

Yes. Consider liveness for example. We should remove the class from SymCastMap if the class itself becomes dead. This should be part of this patch.

ASDenysPetrov mentioned this in D112621: [analyzer][solver] Introduce reasoning for not equal to operator.Jul 15 2022, 10:03 AM

Completely reworked solution.

ASDenysPetrov edited parent revisions, added: D138319: [analyzer] Prepare structures for integral cast feature introducing; removed: D126481: [analyzer] Handle SymbolCast in SValBuilder, D105340: [analyzer] Produce SymbolCast symbols for integral types in SValBuilder::evalCast.Nov 22 2022, 9:33 AM

Harbormaster completed remote builds in B199009: Diff 477233.Nov 22 2022, 9:33 AM

Revision Contents

Path	Size
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp	136 lines
clang/test/Analysis/svalbuilder-casts.cpp	12 lines
clang/test/Analysis/symbol-integral-cast.cpp	395 lines

Diff 477233

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp

Show All 21 Lines

#include "llvm/ADT/ImmutableSet.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/raw_ostream.h"
#include <algorithm>
#include <iterator>

#include <set>

using namespace clang;
using namespace ento;

// This class can be extended with other tables which will help to reason
// about ranges more precisely.
class OperatorRelationsTable {
  static_assert(BO_LT < BO_GT && BO_GT < BO_LE && BO_LE < BO_GE &&

▲ Show 20 Lines • Show All 844 Lines • ▼ Show 20 Lines

REGISTER_SET_FACTORY_WITH_PROGRAMSTATE(ClassSet, EquivalenceClass)
REGISTER_MAP_FACTORY_WITH_PROGRAMSTATE(ClassMap, BitWidthType, EquivalenceClass)
REGISTER_MAP_WITH_PROGRAMSTATE(SymClassMap, SymbolRef, ClassMap)
REGISTER_MAP_WITH_PROGRAMSTATE(ClassMembers, EquivalenceClass, SymbolSet)
REGISTER_MAP_WITH_PROGRAMSTATE(ConstraintRange, EquivalenceClass, RangeSet)
REGISTER_MAP_WITH_PROGRAMSTATE(DisequalityMap, EquivalenceClass, ClassSet)

namespace {
/// This class encapsulates a set of symbols equal to each other.

NoQUnsubmitted

Done

These maps will need to be cleaned up when symbols become dead (as in RangeConstraintManager::removeDeadBindings()).


martongUnsubmitted

Done

Yes, the same way as we clean up e.g. the DisequalityMap.


///
/// The main idea of the approach requiring such classes is in narrowing
/// and sharing constraints between symbols within the class. Also we can
/// conclude that there is no practical need in storing constraints for
/// every member of the class separately.
///
/// Main terminology:
///

▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines public:

  /// members and then during the removal of dead symbols we remove one of its
  /// members. In this case, the class is still non-trivial (it still has the
  /// mappings in ClassMembers), even though it has only one member.
  [[nodiscard]] inline bool isTrivial(ProgramStateRef State) const;

  /// Return true if the current class is trivial and its only member is dead.
  [[nodiscard]] inline bool isTriviallyDead(ProgramStateRef State,
                                            SymbolReaper &Reaper) const;

  [[nodiscard]] static inline ProgramStateRef

vsavchenkoUnsubmitted

Not Done

That definitely regresses the interface, so NominalTypeList should definitely be reworked.


  markDisequal(RangeSet::Factory &F, ProgramStateRef State, SymbolRef First,
               SymbolRef Second);
  [[nodiscard]] static inline ProgramStateRef
  markDisequal(RangeSet::Factory &F, ProgramStateRef State,
               EquivalenceClass First, EquivalenceClass Second);
  [[nodiscard]] inline ProgramStateRef
  markDisequal(RangeSet::Factory &F, ProgramStateRef State,
               EquivalenceClass Other) const;

▲ Show 20 Lines • Show All 246 Lines • ▼ Show 20 Lines [[nodiscard]] inline

  }
  return intersect(F, Second, Tail...);
}

//===----------------------------------------------------------------------===//
// Symbolic reasoning logic
//===----------------------------------------------------------------------===//

/// A little component aggregating all of the reasoning we have about

vsavchenkoUnsubmitted

Not Done

Comments on:

why do we need it?
why does it have four types?
why do we not care about signed/unsigned types?


/// the ranges of symbolic expressions.
///
/// Even when we don't know the exact values of the operands, we still
/// can get a pretty good estimate of the result's range.
class SymbolicRangeInferrer
    : public SymExprVisitor<SymbolicRangeInferrer, RangeSet> {
public:
  template <class SourceType>
  static RangeSet inferRange(RangeSet::Factory &F, ProgramStateRef State,
                             SourceType Origin) {
    SymbolicRangeInferrer Inferrer(F, State);
    return Inferrer.infer(Origin);
  }

  RangeSet VisitSymExpr(SymbolRef Sym) {
    if (Optional<RangeSet> RS = getRangeForNegatedSym(Sym))
      return *RS;

if (Optional<RangeSet> RS = inferRangeForIntegralCast(State, Sym))

vsavchenkoUnsubmitted

Not Done

Why do you use VisitSymExpr here? You want to interrupt all Visits or... I'm not sure I fully understand.


ASDenysPetrovAuthorUnsubmitted

Done

Here we want to delegate the reasoning to another handler as we don't support non-integral cast yet.


vsavchenkoUnsubmitted

Not Done

You are not delegating it here. `Visit` includes a runtime dispatch that calls the correct `VisitTYPE` method. Here you call `VisitSymExpr` directly, which is one of the `VisitTYPE` methods. No dispatch will happen, and since we use `VisitSymExpr` as the last resort (it handles the most base class; if we got there, we don't actually support the expression), you interrupt the `Visit` and go directly to "the last resort".

See the problem now?
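
A minimal sketch of the dispatch issue described in this comment. The types `Kind`, `Node`, and `ToyVisitor` are hypothetical stand-ins, not the analyzer's real `SymExpr` or `SymExprVisitor` classes; the only point is that calling the fallback handler directly skips the dispatch that `Visit` would perform.

```cpp
#include <string>

// Hypothetical, simplified node kinds (not the analyzer's real classes).
enum class Kind { Data, Cast };

struct Node {
  Kind K;
  const Node *Operand; // Only meaningful for Kind::Cast.
};

struct ToyVisitor {
  // Dispatching entry point: picks the handler that matches the node kind.
  std::string Visit(const Node *N) {
    switch (N->K) {
    case Kind::Cast:
      return VisitCast(N);
    case Kind::Data:
      return VisitNode(N);
    }
    return VisitNode(N);
  }

  // Specific handler: recurses into the operand through the dispatcher.
  std::string VisitCast(const Node *N) {
    return "cast(" + Visit(N->Operand) + ")";
  }

  // Last-resort handler: knows nothing about the node's structure.
  std::string VisitNode(const Node *) { return "fallback"; }
};
```

With a chain `Outer -> Inner -> Leaf`, `Visit(&Outer)` walks through both casts, while `VisitNode(&Outer)` returns the fallback answer immediately and never looks at the operands.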


ASDenysPetrovAuthorUnsubmitted

Done

OK. I rejected this idea before. If we call `Visit` inside `VisitSymbolCast`, we will go into a recursive loop, because it will return us back to `VisitSymbolCast` as we have passed Sym as is. (This is in theory; I didn't check in practice.) Or am I missing something?
I chose `VisitSymExpr` here because all kinds of SymbolCast were previously handled there. So I decided to pass all unsupported forms of casts there.


vsavchenkoUnsubmitted

Not Done

Did I suggest to Visit(Sym)? Of course it is going to end up in a loop!
Why isn't it Visit(Sym->getOperand()) here? Before we started producing casts, casts were transparent. This logic would fit perfectly with that.


ASDenysPetrovAuthorUnsubmitted

Done

> were transparent.

Not exactly. There are still some cases where symbols are not equal to their roots (aka operands). Such cases were handled by `VisitSymExpr`, which uses `infer(Sym->getType());` instead of `getOperand`. So this needs some more thought. I also see a problem with `EquivalenceClass`es. Consider the following:

int x, y;
if(x == y)
  if ((char)x == 2)
    if(y == 259)
      // Here we shall update `(char)x` and find this branch infeasible.

Also such cases like:

if(x == (short)y)
  // What should we do with (store in) `EquivalenceClass`es?

Currently, I only have a vague idea of the solution.
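
To make the first example concrete, here is a small sketch of the arithmetic behind it. The helpers `lowByte` and `compatible` are hypothetical names introduced only for illustration, and `std::uint8_t` is used instead of `char` so the truncation is well-defined regardless of the signedness of `char`.

```cpp
#include <cstdint>

// Truncate a 32-bit value to its low 8 bits (well-defined modular conversion).
constexpr std::uint8_t lowByte(std::int32_t V) {
  return static_cast<std::uint8_t>(V);
}

// True if knowing "x == Y" is consistent with knowing "(8-bit)x == C".
constexpr bool compatible(std::int32_t Y, std::uint8_t C) {
  return lowByte(Y) == C;
}
```

Since 259 = 256 + 3, the low byte of 259 is 3, so `x == y`, `(char)x == 2`, and `y == 259` cannot all hold; a value such as 258 would be consistent.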


vsavchenkoUnsubmitted

Not Done

> There are still some cases when symbols are not equal to their roots (aka Operands)

Right now we don't have casts, this is what we do currently. However faulty it is, it is the existing solution and we should respect that.

> Also I see a problem with EquivalenceClass'es.

Because of the current situation with casts (or more precisely with their lack), EquivalenceClasses do not get merged for symbols with different types. It is as simple as that.
You can find similar tests in equality_tracking.c.


return *RS;

// If we've reached this line, the actual type of the symbolic

// expression is not supported for advanced inference.

// In this case, we simply backoff to the default "let's simply

// infer the range from the expression's type".

return infer(Sym->getType());

}

RangeSet VisitUnarySymExpr(const UnarySymExpr *USE) {

vsavchenkoUnsubmitted

Not Done

This looks like a very `static` data structure to me; I don't see any reason why the user should be able to create multiple copies of it.
If it becomes a static data structure, there will be no need to pass it around.


if (Optional<RangeSet> RS = getRangeForNegatedUnarySym(USE))

return *RS;

return infer(USE->getType());

vsavchenkoUnsubmitted

Not Done

Can we get a test for that?


ASDenysPetrovAuthorUnsubmitted

Done

I'll add some.


}

vsavchenkoUnsubmitted

Not Done

Same goes here.


vsavchenkoUnsubmitted

Not Done

And here, since we couldn't really reason about it, we usually return infer(T).


RangeSet VisitSymIntExpr(const SymIntExpr *Sym) {

return VisitBinaryOperator(Sym);

}

martongUnsubmitted

Done

By using llvm::Expected<T> we would be more aligned with the llvm error handling practices. Besides, the bool in the tuple and the Success variable in the function below would not be needed.
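
As a rough illustration of the shape of this suggestion, the sketch below returns a "value or nothing" result instead of a tuple carrying a `bool` success flag. It uses `std::optional` as a stand-in, because `llvm::Expected<T>` is not available in a self-contained snippet and additionally carries an error payload. `Parsed` and `parseWidths` are hypothetical names, and treating a width of 0 as "unsupported" is an assumption made only for this example.

```cpp
#include <optional>
#include <vector>

// Hypothetical result type: the collected widths and the smallest of them.
struct Parsed {
  std::vector<unsigned> Widths;
  unsigned MinWidth;
};

// Returns std::nullopt on failure, so no separate success flag is needed.
std::optional<Parsed> parseWidths(const std::vector<unsigned> &Widths) {
  if (Widths.empty())
    return std::nullopt;
  Parsed P{Widths, Widths.front()};
  for (unsigned W : Widths) {
    if (W == 0) // Assumed marker for an unsupported type in this sketch.
      return std::nullopt;
    if (W < P.MinWidth)
      P.MinWidth = W;
  }
  return P;
}
```

The caller then tests the result directly (`if (auto P = parseWidths(...))`) rather than unpacking a tuple and checking its last element.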


RangeSet VisitIntSymExpr(const IntSymExpr *Sym) {

return VisitBinaryOperator(Sym);

}

RangeSet VisitSymSymExpr(const SymSymExpr *SSE) {

return intersect(

RangeFactory,

// If Sym is a difference of symbols A - B, then maybe we have range

// set stored for B - A.

//

// If we have range set stored for both A - B and B - A then

// calculate the effective range set by intersecting the range set

// for A - B and the negated range set of B - A.

getRangeForNegatedSymSym(SSE),

// If Sym is a comparison expression (except <=>),

// find any other comparisons with the same operands.

// See function description.

getRangeForComparisonSymbol(SSE),

// If Sym is (dis)equality, we might have some information

// on that in our equality classes data structure.

vsavchenkoUnsubmitted

Not Done

Why do you get the associated constraint directly without consulting what `SymbolicRangeInferrer` can tell you about it?


ASDenysPetrovAuthorUnsubmitted

Done

What do you mean? I didn't get it. Could you give an example?


vsavchenkoUnsubmitted

Not Done

getConstraint returns whatever constraint we have stored directly in the constraint map. That's the main source of information for ranges, but not the only one.

Here is the list of things that you skip when you do `getConstraint` here:

* we can understand that something is an equality/disequality check and find the corresponding info in the Equivalence Classes data structure
* we can see that the expression has the form A - B and we can find a constraint for B - A
* we can see that the expression is a comparison A op B and check what other comparison info we have on A and B (your own change)
* we can see that the expression is of the form A op B and check if we know something about A and B, and produce a reasonable constraint out of this information

In order to use the right information, you should use `infer`, which will actually do all the other things as well. That's how `SymbolicRangeInferrer` is designed: to be recursive.

Speaking of recursion: all these loops and manual checks of the type of the cast's operand go against this pattern. Recursive visitors should call `Visit` for child nodes (like `RecursiveASTVisitor`). In other words, if f(x) is a visit function, it should be defined like this:

f(x) = g(f(x->operand_1), f(x->operand_2), ... , f(x->operand_N))

or if we talk about your case specifically:

f(x: SymbolCast) = h(f(x->Operand))

and the h function should transform the range set returned by f(x->Operand) into a range set appropriate for x.

NOTE: h can also intersect different ranges
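
The recursive shape f(x: SymbolCast) = h(f(x->Operand)) can be sketched as follows. Everything here is a hypothetical toy model: `Range` is a single unsigned interval rather than a `RangeSet`, `Sym` stands in for a symbol or cast, and `castTo` plays the role of h. It is deliberately conservative (it widens to the full range whenever truncation could wrap) and is not the patch's `RangeFactory.castTo`.

```cpp
#include <cstdint>

struct Range {
  std::uint64_t Lo, Hi; // Inclusive bounds, unsigned values only.
};

struct Sym {
  unsigned Bits;      // Width of this expression's unsigned type, 1..63.
  const Sym *Operand; // Non-null for a cast, null for a root symbol.
  Range Known;        // Constraint recorded for a root symbol.
};

constexpr std::uint64_t maxOf(unsigned Bits) {
  return (std::uint64_t{1} << Bits) - 1;
}

// h: adapt the operand's range to the destination width.
Range castTo(Range R, unsigned Bits) {
  if (R.Hi <= maxOf(Bits))
    return R;               // Every value fits; the range is unchanged.
  return {0, maxOf(Bits)};  // Truncation may wrap; fall back to the full range.
}

// f: a root returns its own constraint, a cast transforms its operand's result.
Range infer(const Sym *S) {
  if (!S->Operand)
    return S->Known;
  return castTo(infer(S->Operand), S->Bits);
}
```

Note that `infer` never inspects the kinds of nested operands in a loop; each level only transforms what the level below returned.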


ASDenysPetrovAuthorUnsubmitted

Done

Thank you for the useful notes! I'll take them into account.


getRangeForEqualities(SSE),

// And we should always check what we can get from the operands.

vsavchenkoUnsubmitted

Not Done

I think all this extra logic about how we infer ranges for casts is interesting, but should be a separate patch.
For now, you can simply put return Visit(Sym->getOperand());.

First, it will unblock you from depending on that RangeFactory feature.
I also have quite a few questions about this particular implementation, so it would stall this patch as well.


VisitBinaryOperator(SSE));

}

private:

SymbolicRangeInferrer(RangeSet::Factory &F, ProgramStateRef S)

: ValueFactory(F.getValueFactory()), RangeFactory(F), State(S) {}

/// Infer range information from the given integer constant.

Show All 13 Lines if (ActualType->isIntegralOrEnumerationType() ||

return infer(Sym);

}

// Otherwise, let's simply infer from the destination type.

// We couldn't figure out nothing else about that expression.

return infer(DestType);

}

RangeSet infer(SymbolRef Sym) {

AnalyzerOptions &Opts = State->getAnalysisManager().getAnalyzerOptions();

if (Opts.ShouldSupportSymbolicIntegerCasts)

return Visit(Sym);

return intersect(RangeFactory,

// Of course, we should take the constraint directly

// associated with this symbol into consideration.

getConstraint(State, Sym),

// Apart from the Sym itself, we can infer quite a lot if

martongUnsubmitted

Done

There might be a problem here because the iteration of the map is non-deterministic. We should probably have a copy that is sorted, or the container should be sorted (sorted immutable list maybe?).

Your tests below probably passed because the cast chains are too short. Could you please add a test where the chain is really long (20 maybe) and shuffled?
(My thanks to @steakhal for this additional comment.)


ASDenysPetrovAuthorUnsubmitted

Done

I've checked. ImmutableSet gave me a sorted order. But I agree that it could be just a coincidence. I'll try to add more tests.


martongUnsubmitted

Done

Yes, you are right, and I was wrong. `ImmutableSet` is based on an AVL tree, which is a balanced binary tree, and `begin()` gives an iterator that performs an in-order (and thus sorted) traversal. And the key is an integer. (I don't know how, but we were misled, @steakhal too. I guess we thought that the key of CastMap is a pointer.)
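
The in-order property can be illustrated with `std::map`, used here only as a stand-in: it is a different container from LLVM's immutable AVL-based ones, but it is also an ordered tree whose iteration visits integer keys in ascending order regardless of insertion order. `ToyCastMap` and `widthsInIterationOrder` are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for CastMap: bit width -> some payload.
using ToyCastMap = std::map<std::uint32_t, int>;

// Collect the keys in the order the container's iterators visit them.
std::vector<std::uint32_t> widthsInIterationOrder(const ToyCastMap &M) {
  std::vector<std::uint32_t> Out;
  for (const auto &Entry : M)
    Out.push_back(Entry.first);
  return Out;
}
```

Inserting widths in a shuffled order such as 64, 8, 32, 16 still yields 8, 16, 32, 64 on iteration. With pointer keys the order would follow addresses instead, which is where non-determinism across runs could come from.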


// we look into subexpressions of Sym.

Visit(Sym));

}

RangeSet infer(EquivalenceClass Class) {

if (const RangeSet *AssociatedConstraint = getConstraint(State, Class))

return *AssociatedConstraint;

▲ Show 20 Lines • Show All 312 Lines • ▼ Show 20 Lines RangeSet getTrueRange(QualType T) {

return assumeNonZero(TypeRange, T);

}

RangeSet getFalseRange(QualType T) {

const llvm::APSInt &Zero = ValueFactory.getValue(0, T);

return RangeSet(RangeFactory, Zero);

}

Optional<RangeSet> inferRangeForIntegralCast(ProgramStateRef State,

SymbolRef Sym) {

AnalyzerOptions &Opts = State->getAnalysisManager().getAnalyzerOptions();

if (!Opts.ShouldSupportSymbolicIntegerCasts)

return None;

using NestedCastTypes = SmallVector<QualType, 3>;

using ParsedSym = std::tuple<

NestedCastTypes, // Nested types of cast. E.g. (int)(char)(short x).

BitWidthType, // Minimal bitwidth. E.g. (int)(char)(short x) ->

// sizeof(char).

SymbolRef, // Root symbol. E.g. (int)(char)(short x) -> short x.

bool // Success flag.

>;

auto ParseSym = [&](SymbolRef Sym) -> ParsedSym {

ASTContext &C = State->getStateManager().getContext();

BitWidthType MinBitWidth = std::numeric_limits<BitWidthType>::max();

NestedCastTypes Types;

do {

const QualType T = Sym->getType();

// Check if we can handle the symbol.

if (!T->isIntegralOrEnumerationType())

return {{}, {}, {}, false};

// Find a minimal bitwidth.

const BitWidthType BitWidth = C.getIntWidth(T);

if (MinBitWidth > BitWidth)

MinBitWidth = BitWidth;

// Collect nested cast types.

Types.push_back(T);

// Traverse through cast symbols.

auto *SC = dyn_cast<SymbolCast>(Sym);

if (!SC)

break;

// Go to the root symbol.

Sym = SC->getOperand();

} while (true);

return {std::move(Types), MinBitWidth, Sym, true};

};

ParsedSym PS = ParseSym(Sym);

if (!std::get<3>(PS)) // Success flag.

return None;

auto DoSequenceCast = [&](RangeSet RS,

const NestedCastTypes &Types) -> RangeSet {

if (Types.size() == 1)

return RangeFactory.castTo(RS, Types.front());

auto TypesReversed = llvm::make_range(Types.rbegin(), Types.rend());

for (const QualType T : TypesReversed)

RS = RangeFactory.castTo(RS, T);

return RS;

};

auto findClosestRange = [&](SymbolRef Sym,

BitWidthType BW) -> const RangeSet * {

const ClassMap *CM = State->get<SymClassMap>(Sym);

if (!CM)

return nullptr;

EquivalenceClass EC;

bool Found = false;

if (const EquivalenceClass *ECPtr = CM->lookup(BW)) {

Found = true;

EC = *ECPtr;

} else {

BitWidthType ClosestBiggerBW = std::numeric_limits<BitWidthType>::max();

for (ClassMap::value_type &V : *CM) {

if (V.first >= BW) {

if (V.first < ClosestBiggerBW) {

ClosestBiggerBW = V.first;

EC = V.second;

}

}

}

Found = (ClosestBiggerBW != std::numeric_limits<BitWidthType>::max());

}

if (!Found)

return nullptr;

if (const RangeSet *RS = State->get<ConstraintRange>(EC))

return RS;

return nullptr;

};

if (const RangeSet *RS = findClosestRange(std::get<2>(PS), std::get<1>(PS)))

return DoSequenceCast(*RS, std::get<0>(PS));

RangeSet RS = infer(std::get<2>(PS)->getType());

RS = DoSequenceCast(RS, std::get<0>(PS));

return RS;

}

BasicValueFactory &ValueFactory;

RangeSet::Factory &RangeFactory;

ProgramStateRef State;

};

//===----------------------------------------------------------------------===//

// Range-based reasoning about symbolic operations

//===----------------------------------------------------------------------===//

▲ Show 20 Lines • Show All 406 Lines • ▼ Show 20 Lines

private:

ConstraintAssignor(ProgramStateRef State, SValBuilder &Builder,

RangeSet::Factory &F)

: State(State), Builder(Builder), RangeFactory(F) {}

using Base = ConstraintAssignorBase<ConstraintAssignor>;

/// Base method for handling new constraints for symbols.

[[nodiscard]] ProgramStateRef assign(SymbolRef Sym, RangeSet NewConstraint) {

// All constraints are actually associated with equivalence classes, and

// that's what we are going to do first.

ClassMap::Factory &CMF = State->get_context<ClassMap>();

MinBitWidthAndSymbolRoot MBWSR = parseSymbolCast(State, Sym);

const ClassMap *CMPtr = State->get<SymClassMap>(MBWSR.second);

vsavchenkoUnsubmitted

Done

That's not using ConstraintAssignor, you simply put your implementation in here. That won't do!


ASDenysPetrovAuthorUnsubmitted

Done

OK, please tell me how to use it correctly in my case.


vsavchenkoUnsubmitted

Done

Can you read the comments first and then ask me if you have any specific questions?


ASDenysPetrovAuthorUnsubmitted

Done

I think I did it. Could you please review the changes?


const EquivalenceClass *ECPtr =

CMPtr ? CMPtr->lookup(MBWSR.first) : nullptr;

ClassMap CM = CMPtr ? *CMPtr : CMF.getEmptyMap();

EquivalenceClass EC;

if (!CMPtr || !ECPtr) {

EC = EquivalenceClass{Sym};

CM = CMF.add(CM, MBWSR.first, EC);

State = State->set<SymClassMap>(MBWSR.second, CM);

Show All 9 Lines [[nodiscard]] ProgramStateRef assign(SymbolRef Sym, RangeSet NewConstraint) {

// constraint.

Base::assign(Sym, NewConstraint);

return State;

}

/// Base method for handling new constraints for classes.

[[nodiscard]] ProgramStateRef assign(EquivalenceClass Class,

RangeSet NewConstraint) {

AnalyzerOptions &Opts = State->getAnalysisManager().getAnalyzerOptions();

if (Opts.ShouldSupportSymbolicIntegerCasts)

if (!intersectWithExistingIntegralCastSymbol(State, Class, NewConstraint))

return nullptr;

// There is a chance that we might need to update constraints for the

// classes that are known to be disequal to Class.

//

// In order for this to be even possible, the new constraint should

// be simply a constant because we can't reason about range disequalities.

if (const llvm::APSInt *Point = NewConstraint.getConcreteValue()) {

ConstraintRangeTy Constraints = State->get<ConstraintRange>();

Show All 19 Lines if (const llvm::APSInt *Point = NewConstraint.getConcreteValue()) {

"a state with infeasible constraints");

return setConstraints(State, Constraints);

}

return setConstraint(State, Class, NewConstraint);

}

bool intersectWithExistingIntegralCastSymbol(ProgramStateRef State,

EquivalenceClass Class,

RangeSet NewConstraint) {

std::map<EquivalenceClass, BitWidthType> M;

SymbolSet SS = Class.getClassMembers(State);

BitWidthType BW = std::numeric_limits<BitWidthType>::max();

for (SymbolRef S : SS) {

MinBitWidthAndSymbolRoot MBWSR = parseSymbolCast(State, S);

BW = std::min(BW, MBWSR.first);

if (const ClassMap *CM = State->get<SymClassMap>(MBWSR.second))

for (const ClassMap::value_type &BWEC : *CM) {

auto it = M.find(BWEC.second);

if (it != M.end())

it->second = std::min(BWEC.first, it->second);

else

M.emplace_hint(it, BWEC.second, BWEC.first);

}

}

for (const auto &P : M)

if (const RangeSet *RSPtr = getConstraint(State, P.first)) {

const APSIntType Ty{std::min(BW, P.second), true};

RangeSet RS1 = RangeFactory.castTo(*RSPtr, Ty);

RangeSet RS2 = RangeFactory.castTo(NewConstraint, Ty);

RS1 = RangeFactory.intersect(RS1, RS2);

if (RS1.isEmpty())

return false;

}

return true;

}

ProgramStateRef trackDisequality(ProgramStateRef State, SymbolRef LHS,

SymbolRef RHS) {

return EquivalenceClass::markDisequal(RangeFactory, State, LHS, RHS);

}

ProgramStateRef trackEquality(ProgramStateRef State, SymbolRef LHS,

SymbolRef RHS) {

return EquivalenceClass::merge(RangeFactory, State, LHS, RHS);

Show All 15 Lines private:

SValBuilder &Builder;

RangeSet::Factory &RangeFactory;

};

bool ConstraintAssignor::assignSymExprToConst(const SymExpr *Sym,

const llvm::APSInt &Constraint) {

llvm::SmallSet<EquivalenceClass, 4> SimplifiedClasses;

// Iterate over all equivalence classes and try to simplify them.

martongUnsubmitted

Done

Could you please rebase? The "simplification" code has already been merged into llvm/main and is not part of this change.


ClassMembersTy Members = State->get<ClassMembers>();

for (std::pair<EquivalenceClass, SymbolSet> ClassToSymbolSet : Members) {

EquivalenceClass Class = ClassToSymbolSet.first;

State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);

if (!State)

return false;

SimplifiedClasses.insert(Class);

}

// Trivial equivalence classes (those that have only one symbol member) are

// not stored in the State. Thus, we must skim through the constraints as

// well. And we try to simplify symbols in the constraints.

ConstraintRangeTy Constraints = State->get<ConstraintRange>();

for (std::pair<EquivalenceClass, RangeSet> ClassConstraint : Constraints) {

EquivalenceClass Class = ClassConstraint.first;

if (SimplifiedClasses.count(Class)) // Already simplified.

continue;

State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);

if (!State)

return false;

}

// We may have trivial equivalence classes in the disequality info as

// well, and we need to simplify them.

DisequalityMapTy DisequalityInfo = State->get<DisequalityMap>();

for (std::pair<EquivalenceClass, ClassSet> DisequalityEntry :

DisequalityInfo) {

EquivalenceClass Class = DisequalityEntry.first;

ClassSet DisequalClasses = DisequalityEntry.second;

State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);

if (!State)

return false;

}

return true;

}

bool ConstraintAssignor::assignSymSymExprToRangeSet(const SymSymExpr *Sym,

RangeSet Constraint) {

martongUnsubmitted

Done

I think this hunk should remain in assignSymExprToConst. Why did you move it?


ASDenysPetrovAuthorUnsubmitted

Done

I'll remove it. It's an unrelated one.


if (!handleRemainderOp(Sym, Constraint))

return false;

Optional<bool> ConstraintAsBool = interpreteAsBool(Constraint);

if (!ConstraintAsBool)

return true;

Show All 12 Lines if (Optional<bool> Equality = meansEquality(Sym)) {

if (!State)

return false;

}

return true;

}

} // end anonymous namespace

martongUnsubmitted

Done

/// Return a symbol which is the best canidate to save it in the constraint

- /// map. We should correct symbol because in case of truncation cast we can

+ /// map. We should correct the symbol because in case of truncation cast we can

/// only reason about truncated bytes but not the whole value. E.g. (char)(int


std::unique_ptr<ConstraintManager>

ento::CreateRangeConstraintManager(ProgramStateManager &StMgr,

ExprEngine *Eng) {

martongUnsubmitted

Done

/// x), we can store constraints for the first lower byte but we still don't

- /// know the root value. Also in case of promotion or converion we should

+ /// know the root value. Also, in case of promotion or conversion we should

/// store the root value instead of cast symbol, because we can always get


return std::make_unique<RangeConstraintManager>(Eng, StMgr.getSValBuilder());

martongUnsubmitted

Done

/// know the root value. Also in case of promotion or converion we should

- /// store the root value instead of cast symbol, because we can always get

+ /// store the root value instead of the cast symbol because we can always get

/// a correct range using `castTo` metho. And we are not intrested in any


}

martongUnsubmitted

Done

/// store the root value instead of cast symbol, because we can always get

- /// a correct range using `castTo` metho. And we are not intrested in any

+ /// a correct range using the `castTo` method. And we are not interested in any

/// constraints of cast symbol but the root symbol in `if` expression


ConstraintMap ento::getConstraintMap(ProgramStateRef State) {

ConstraintMap::Factory &F = State->get_context<ConstraintMap>();

ConstraintMap Result = F.getEmptyMap();

martongUnsubmitted

Done

I think, we should definitely store the constraints as they appear in the analyzed code. This way we mix the infer logic into the constraint setting, which is bad.
I mean, we should simply store the constraint directly for the symbol as it is. And then only in VisitSymbolCast should we infer the proper value from the stored constraint (if we can).

(Of course, if we have related symbols (casts of the original symbol) then their constraints must be updated.)


ASDenysPetrovAuthorUnsubmitted

Done

I see what you mean. I thought about this. Here is what I ran into.

Let's say you meet (wchar_t)x > 0, which you store as a pair {(wchar_t)x, [1,32767]}.
Then you meet (short)x < 0, where you have to answer whether it's true or false.
So what would be your next step? Brute-forcing all the constraint pairs to find any x-related symbol? Obviously, O(n) complexity for every single integral symbol is inefficient.

What I propose is to "canonicalize" arbitrary types to a general form, so that this form could be part of the key along with x and we could get a constraint with the usual map lookup complexity. So that:

You meet (wchar_t)x > 0, convert wchar_t to int16, and store it as a pair {(int16)x, [1,32767]}.
Then you meet (short)x < 0, convert short to int16, and get the constraint.

That's why I've introduced NominalTypeList.
But I now accept your concern about arbitrary integer sizes and will redesign my solution.
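
A minimal sketch of the canonicalization idea from this comment, keyed by bit width rather than by a named type (in line with the CastMap design in the summary). `ToyConstraints`, `Key`, and `Range` are hypothetical; symbols are plain strings, and the 16-bit width for both `wchar_t` and `short` is the assumption taken from the example above (it is platform-dependent in practice).

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

struct Range {
  std::int64_t Lo, Hi; // Inclusive bounds.
};

// Key: (root symbol name, canonical bit width of the cast).
using Key = std::pair<std::string, unsigned>;

class ToyConstraints {
  std::map<Key, Range> Store;

public:
  // Record a constraint for a cast of Root to a type that is Bits wide.
  void set(const std::string &Root, unsigned Bits, Range R) {
    Store[{Root, Bits}] = R;
  }

  // Look up a constraint by canonical width; nullptr if none is stored.
  const Range *get(const std::string &Root, unsigned Bits) const {
    auto It = Store.find({Root, Bits});
    return It == Store.end() ? nullptr : &It->second;
  }
};
```

Storing `(wchar_t)x > 0` as `set("x", 16, {1, 32767})` lets a later query for `(short)x` use `get("x", 16)` and find the same entry with a single map lookup, so `(short)x < 0` can be answered as false without scanning all constraints.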


martongUnsubmitted

Done

> So what would be your next step? Brute-forcing all the constraint pairs to find any x-related symbol? Obviously, O(n) complexity for every single integral symbol is inefficient.

I don't think we need a brute-force search among the constraints if we have the additional CastMap (that I have previously proposed).
So, the next step would be: look up the root symbol of (short)x, which is (wchar_t)x (supposing we have seen a declaration wchar_t x; first during the symbolic execution, but having a different root might work out as well).
Then from the CastMap we can query in O(1) the set of cast symbols related to the root symbol. For each member of this set, we then query the constraints from the existing equivalenceClass->constraint mapping. That's O(1) times the number of members of the cast set (which is assumed to be very small in the usual case).
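
The lookup described in this comment might be sketched like this. `ToyState` and `relatedConstraints` are hypothetical, symbols are modelled as strings, and constraints are stored per symbol exactly as written in the analyzed code. Note that `std::map` gives logarithmic rather than constant-time lookup; it is used here only to keep the sketch dependency-free.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

struct Range {
  long long Lo, Hi; // Inclusive bounds.
};

struct ToyState {
  // Root symbol -> cast symbols of that root seen so far.
  std::map<std::string, std::set<std::string>> CastMap;
  // Symbol (as written in the code) -> its stored constraint.
  std::map<std::string, Range> Constraints;
};

// Gather the constraints of every cast symbol related to Root.
std::vector<Range> relatedConstraints(const ToyState &S,
                                      const std::string &Root) {
  std::vector<Range> Out;
  auto It = S.CastMap.find(Root);
  if (It == S.CastMap.end())
    return Out;
  for (const std::string &Cast : It->second) {
    auto C = S.Constraints.find(Cast);
    if (C != S.Constraints.end())
      Out.push_back(C->second);
  }
  return Out;
}
```

The work is proportional to the number of casts recorded for one root, not to the total number of constraints in the state.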

  ConstraintRangeTy Constraints = State->get<ConstraintRange>();
  for (std::pair<EquivalenceClass, RangeSet> ClassConstraint : Constraints) {
    EquivalenceClass Class = ClassConstraint.first;
    SymbolSet ClassMembers = Class.getClassMembers(State);
    assert(!ClassMembers.isEmpty() &&
           "Class must always have at least one member!");

Show All 11 Lines

LLVM_DUMP_METHOD void EquivalenceClass::dumpToStream(ProgramStateRef State,
                                                     raw_ostream &os) const {
  SymbolSet ClassMembers = getClassMembers(State);
  for (const SymbolRef &MemberSym : ClassMembers) {
    MemberSym->dump();
    os << "\n";
  }
}

inline EquivalenceClass EquivalenceClass::find(ProgramStateRef State,
                                               SymbolRef Sym) {
  assert(State && "State should not be null");

martongUnsubmitted

Done

Same here.

martongUnsubmitted

Done

Please disregard the comment above.

  assert(Sym && "Symbol should not be null");

martongUnsubmitted

Done

if (IsTruncated) {

- // Trancation occurred. High bits lost. We can't reason about ranges of

+ // Truncation occurred. High bits lost. We can't reason about ranges of

// the original(root) operand in this case, so we should not add it to the

  MinBitWidthAndSymbolRoot MBWSR = parseSymbolCast(State, Sym);
  // We store far from all Symbol -> Class mappings
  if (const ClassMap *CM = State->get<SymClassMap>(MBWSR.second))
    if (const EquivalenceClass *EC = CM->lookup(MBWSR.first))
      return *EC;

  // This is a trivial class of Sym.
  return Sym;
}

inline ProgramStateRef EquivalenceClass::merge(RangeSet::Factory &F,
                                               ProgramStateRef State,
                                               SymbolRef First,
                                               SymbolRef Second) {
  EquivalenceClass FirstClass = find(State, First);
  EquivalenceClass SecondClass = find(State, Second);

  return FirstClass.merge(F, State, SecondClass);
}

inline ProgramStateRef EquivalenceClass::merge(RangeSet::Factory &F,
                                               ProgramStateRef State,
                                               EquivalenceClass Other) {
  // It is already the same class.
  if (*this == Other)
    return State;

  // FIXME: As of now, we support only equivalence classes of the same type.

martongUnsubmitted

Done

Instead of a no-op, we should be more conservative in this case. We should invalidate (remove) the constraints of all the symbols that have more bits than the currently set symbol. However, we might be clever in cases of special values (e.g. 0, or in the case of the true RangeSet {[MIN, -1], [1, MAX]}).

ASDenysPetrovAuthorUnsubmitted

Done

No, it's incorrect. Consider the following:

int x;
if(x > 1000000 || x < 100000) 
  return;
// x [100'000, 1'000'000]
if((int8)x != 42) 
  return;
// x [100'000, 1'000'000] && (int8)x [42, 42]

We can't just remove or invalidate x [100'000, 1'000'000] because this range still holds.
Strictly speaking, the range of x should be narrowed to the values 100394, 102442, 108586, ..., 960554 and any other value within the range whose lowest byte equals 42.
We can't update the RangeSet with such a large number of values due to performance issues. So we accept the less accurate range.
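To put a number on "such a big amount of values", here is a small brute-force sketch. `countWithLowByte` is a hypothetical helper written for this illustration and has nothing to do with the analyzer's code. It counts how many values in a closed range have a given lowest byte.

```cpp
#include <cassert>
#include <cstdint>

// Counts the values V in the closed range [Lo, Hi] whose lowest byte equals
// LowByte. The cast to uint8_t is a well-defined modular truncation.
inline std::uint32_t countWithLowByte(std::int32_t Lo, std::int32_t Hi,
                                      std::uint8_t LowByte) {
  assert(Lo <= Hi && "range must not be empty");
  std::uint32_t Count = 0;
  // Iterate in 64 bits so that V <= Hi cannot overflow at INT32_MAX.
  for (std::int64_t V = Lo; V <= Hi; ++V)
    if (static_cast<std::uint8_t>(V) == LowByte)
      ++Count;
  return Count;
}
```

For [100000, 1000000] and a lowest byte of 42, the count is 3516, so an exact RangeSet would need that many single-point ranges.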

martongUnsubmitted

Done

Okay, this makes perfect sense, thanks for the example!

  // This limitation is connected to the lack of explicit casts in
  // our symbolic expression model.
  //
  // That means that for `int x` and `char y` we don't distinguish
  // between these two very different cases:
  //   * `x == y`
  //   * `(char)x == y`
  //

▲ Show 20 Lines • Show All 696 Lines • ▼ Show 20 Lines

// The syntax for ranges below is mathematical, using [x, y] for closed ranges
// and (x, y) for open ranges. These ranges are modular, corresponding with
// a common treatment of C integer overflow. This means that these methods
// do not have to worry about overflow; RangeSet::Intersect can handle such a
// "wraparound" range.
// As an example, the range [UINT_MAX-1, 3) contains five values: UINT_MAX-1,
// UINT_MAX, 0, 1, and 2.

ProgramStateRef
RangeConstraintManager::assumeSymNE(ProgramStateRef St, SymbolRef Sym,
                                    const llvm::APSInt &Int,
                                    const llvm::APSInt &Adjustment) {
  // Before we do any real work, see if the value can even show up.
  APSIntType AdjustmentType(Adjustment);
  if (AdjustmentType.testInRange(Int, true) != APSIntType::RTR_Within)
    return St;

  llvm::APSInt Point = AdjustmentType.convert(Int) - Adjustment;
  RangeSet New = getRange(St, Sym);
  New = F.deletePoint(New, Point);

  return setRange(St, Sym, New);
}

vsavchenkoUnsubmitted

Done

No, please, remove duplication by putting it inside of the constraint assignor. It is designed specifically so we don't duplicate code around assumeSymXX functions.

ASDenysPetrovAuthorUnsubmitted

Done

+1. That's what I've recently thought about. :)
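For reference, the source comment preceding assumeSymNE describes ranges as modular and gives [UINT_MAX-1, 3) as an example with five values. A minimal sketch of that reading, assuming 32-bit unsigned values (`enumerateModular` and its `Limit` parameter are hypothetical, not part of RangeSet):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Enumerates the half-open modular range [From, To) over 32-bit unsigned
// values. Unsigned increment wraps modulo 2^32, which is well defined in C++.
// From == To is treated as the empty range here; a real range set has to
// decide separately how to represent the full range.
inline std::vector<std::uint32_t>
enumerateModular(std::uint32_t From, std::uint32_t To,
                 std::size_t Limit = 1024) {
  std::vector<std::uint32_t> Values;
  for (std::uint32_t V = From; V != To; ++V) {
    assert(Values.size() < Limit && "range too large to enumerate");
    Values.push_back(V);
  }
  return Values;
}
```

Calling enumerateModular(UINT32_MAX - 1, 3) yields UINT32_MAX-1, UINT32_MAX, 0, 1, 2, matching the five values listed in the comment.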

ProgramStateRef
RangeConstraintManager::assumeSymEQ(ProgramStateRef St, SymbolRef Sym,
                                    const llvm::APSInt &Int,
                                    const llvm::APSInt &Adjustment) {
  // Before we do any real work, see if the value can even show up.
  APSIntType AdjustmentType(Adjustment);
  if (AdjustmentType.testInRange(Int, true) != APSIntType::RTR_Within)
    return nullptr;

  // [Int-Adjustment, Int-Adjustment]
  llvm::APSInt AdjInt = AdjustmentType.convert(Int) - Adjustment;
  RangeSet New = getRange(St, Sym);
  New = F.intersect(New, AdjInt);

  return setRange(St, Sym, New);
}

RangeSet RangeConstraintManager::getSymLTRange(ProgramStateRef St,
                                               SymbolRef Sym,
                                               const llvm::APSInt &Int,
                                               const llvm::APSInt &Adjustment) {
  // Before we do any real work, see if the value can even show up.
  APSIntType AdjustmentType(Adjustment);
  switch (AdjustmentType.testInRange(Int, true)) {
  case APSIntType::RTR_Below:
    return F.getEmptySet();
  case APSIntType::RTR_Within:
    break;
  case APSIntType::RTR_Above:
    return getRange(St, Sym);
  }

  // Special case for Int == Min. This is always false.
  llvm::APSInt ComparisonVal = AdjustmentType.convert(Int);
  llvm::APSInt Min = AdjustmentType.getMinValue();
  if (ComparisonVal == Min)
    return F.getEmptySet();

  llvm::APSInt Lower = Min - Adjustment;
  llvm::APSInt Upper = ComparisonVal - Adjustment;
  --Upper;

vsavchenkoUnsubmitted

Done

What's the point of this when you do the reverse operation in Inferrer?

As far as I understand, in VisitSymbolCast you iterate over larger types and check whether the same symbol was cast to any of those, and if so, you truncate the result and use that range.
Here, when we are about to set the constraint for a cast symbol, you iterate over smaller types, truncate this range for a smaller type, construct a cast to that smaller type, and add a constraint for that symbol as well.

So, if this is correct, these two pieces of code DO THE SAME WORK and ONLY ONE should remain.

  RangeSet Result = getRange(St, Sym);
  return F.intersect(Result, Lower, Upper);

vsavchenkoUnsubmitted

Done

I need more explanation why we have this function and why we call it where we call it. Additionally, it again looks like it belongs in a separate patch.

}

ProgramStateRef
RangeConstraintManager::assumeSymLT(ProgramStateRef St, SymbolRef Sym,
                                    const llvm::APSInt &Int,
                                    const llvm::APSInt &Adjustment) {
  RangeSet New = getSymLTRange(St, Sym, Int, Adjustment);
  return setRange(St, Sym, New);

▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines RangeSet RangeConstraintManager::getSymGERange(ProgramStateRef St,

                                               const llvm::APSInt &Int,
                                               const llvm::APSInt &Adjustment) {
  // Before we do any real work, see if the value can even show up.
  APSIntType AdjustmentType(Adjustment);
  switch (AdjustmentType.testInRange(Int, true)) {
  case APSIntType::RTR_Below:
    return getRange(St, Sym);
  case APSIntType::RTR_Within:
    break;
  case APSIntType::RTR_Above:
    return F.getEmptySet();

vsavchenkoUnsubmitted

Done

OK, but I still don't understand one thing.
Here you go over all "smaller" types and artificially create constraints for them, and at the same time in VisitSymbolCast you do the opposite operation? Why? Shouldn't the map have constraints for smaller types already because of this action? Why do we need to do both?

ASDenysPetrovAuthorUnsubmitted

Done

I've been preparing an answer for you, but you suddenly inspired me to make some improvements. Thanks.

ASDenysPetrovAuthorUnsubmitted

Done

I've fixed RangeConstraintManager::updateExistingConstraints. There was a mistake: I updated smaller types from the root symbol, but the correct symbol is the given symbol, as it is before calling ignoreCast().
Maybe now it will be clearer to you.

  }

  // Special case for Int == Min. This is always feasible.

vsavchenkoUnsubmitted

Done

This looks like a pattern, and we should probably make it into a method of SymbolCast.

ASDenysPetrovAuthorUnsubmitted

Done

I tried it :) but dropped it. It would just turn into:

if (isa<SymbolCast>(Sym))
  Sym = cast<SymbolCast>(Sym)->getRootOperand();

It looks pretty much the same and brings no benefit IMO, does it?
Every time I used getRootOperand, I also needed an additional traversal through the types to get some other information, so I couldn't avoid the while loop there. So I decided not to introduce a new method in SymbolCast.

vsavchenkoUnsubmitted

Done

Aha, I see your point. I guess we can move it into SymExpr and, instead of getRootOperand, which won't tell much to a person reading the name, call it something like ignoreCasts. It will fit well with Expr::IgnoreCasts, Expr::IgnoreParens, etc.

ASDenysPetrovAuthorUnsubmitted

Done

Nice idea! True, getRootOperand would only be meaningful to a user within the scope of SymbolCast. I'll try to implement this in the next update.
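A rough sketch of what such a helper might look like, using a hypothetical miniature node type rather than the real SymExpr/SymbolCast classes (`SymNode`, `ignoreCasts` and `minBitWidth` below are illustrative names only). It also shows why one traversal often needs to collect extra information, such as the narrowest type in the cast chain.

```cpp
#include <cassert>

// Hypothetical miniature of a symbolic expression: a cast node wraps an
// operand; a leaf (root) symbol has no operand.
struct SymNode {
  const SymNode *Operand; // nullptr for a leaf symbol
  unsigned BitWidth;      // width of this node's type
};

// Strips every cast layer and returns the innermost (root) symbol.
inline const SymNode *ignoreCasts(const SymNode *S) {
  assert(S && "symbol must not be null");
  while (S->Operand)
    S = S->Operand;
  return S;
}

// The same walk, but tracking the narrowest type seen along the chain.
inline unsigned minBitWidth(const SymNode *S) {
  assert(S && "symbol must not be null");
  unsigned Min = S->BitWidth;
  for (; S; S = S->Operand)
    if (S->BitWidth < Min)
      Min = S->BitWidth;
  return Min;
}
```

For the chain (int16)(uint8)(int32 x) from the summary, ignoreCasts returns the node for x and minBitWidth returns 8.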

  llvm::APSInt ComparisonVal = AdjustmentType.convert(Int);
  llvm::APSInt Min = AdjustmentType.getMinValue();
  if (ComparisonVal == Min)
    return getRange(St, Sym);

  llvm::APSInt Max = AdjustmentType.getMaxValue();
  llvm::APSInt Lower = ComparisonVal - Adjustment;
  llvm::APSInt Upper = Max - Adjustment;

▲ Show 20 Lines • Show All 277 Lines • Show Last 20 Lines

clang/test/Analysis/svalbuilder-casts.cpp

Show All 33 Lines	void test1(int x, int y) {
static_assert((short)65536 == 0, "");		static_assert((short)65536 == 0, "");
static_assert((short)-65536 == 0, "");		static_assert((short)-65536 == 0, "");
static_assert((short)131072 == 0, "");		static_assert((short)131072 == 0, "");
static_assert((short)-131072 == 0, "");		static_assert((short)-131072 == 0, "");
clang_analyzer_eval(x == 0); // expected-warning{{UNKNOWN}}		clang_analyzer_eval(x == 0); // expected-warning{{UNKNOWN}}

// These are not truncated to short as zero.		// These are not truncated to short as zero.
static_assert((short)1 != 0, "");		static_assert((short)1 != 0, "");
clang_analyzer_eval(x == 1); // expected-warning{{UNKNOWN}}		clang_analyzer_eval(x == 1); // expected-warning{{FALSE}}
static_assert((short)-1 != 0, "");		static_assert((short)-1 != 0, "");
clang_analyzer_eval(x == -1); // expected-warning{{UNKNOWN}}		clang_analyzer_eval(x == -1); // expected-warning{{FALSE}}
static_assert((short)65537 != 0, "");		static_assert((short)65537 != 0, "");
clang_analyzer_eval(x == 65537); // expected-warning{{UNKNOWN}}		clang_analyzer_eval(x == 65537); // expected-warning{{FALSE}}
static_assert((short)-65537 != 0, "");		static_assert((short)-65537 != 0, "");
clang_analyzer_eval(x == -65537); // expected-warning{{UNKNOWN}}		clang_analyzer_eval(x == -65537); // expected-warning{{FALSE}}
static_assert((short)131073 != 0, "");		static_assert((short)131073 != 0, "");
clang_analyzer_eval(x == 131073); // expected-warning{{UNKNOWN}}		clang_analyzer_eval(x == 131073); // expected-warning{{FALSE}}
static_assert((short)-131073 != 0, "");		static_assert((short)-131073 != 0, "");
clang_analyzer_eval(x == -131073); // expected-warning{{UNKNOWN}}		clang_analyzer_eval(x == -131073); // expected-warning{{FALSE}}

// Check for implicit cast.		// Check for implicit cast.
short s = y;		short s = y;
assert(s == 0);		assert(s == 0);
clang_analyzer_eval(y == 0); // expected-warning{{UNKNOWN}}		clang_analyzer_eval(y == 0); // expected-warning{{UNKNOWN}}
}		}

clang/test/Analysis/symbol-integral-cast.cpp

This file was added.

// RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection -analyzer-config eagerly-assume=false -analyzer-config support-symbolic-integer-casts=true -verify %s

template <typename T>

void clang_analyzer_eval(T);

void clang_analyzer_warnIfReached();

template<typename T>

void clang_analyzer_value(T x);

typedef short int16_t;

typedef int int32_t;

typedef unsigned short uint16_t;

typedef unsigned int uint32_t;

void test1(int x) {

// Even if the two lower bytes of `x` are equal to zero, it doesn't mean that

// the entire `x` is zero. We are not able to know the exact value of x.

// It can be one of 65536 possible values, such as 0, 65536, 131072, etc.

// To avoid huge range sets we still assume `x` is in the range

// [INT_MIN, INT_MAX].

if (!(short)x) {

if (!x) {

clang_analyzer_value((short)x); // expected-warning {{16s:0}}

clang_analyzer_value(x); // expected-warning {{32s:0}}

} else {

// FIXME: Shall be simplified to ConcreteInt 16s:0.

clang_analyzer_value((short)x); // expected-warning {{16s:{ [0, 0] }}}

clang_analyzer_value(x); // expected-warning {{32s:{ [-2147483648, -1], [1, 2147483647] }}}

}

}

}

void test2(int x) {

martongUnsubmitted

Done

if (!s) {

- if (x == 65537)

+ if (x == 65537 || x == 131073)

clang_analyzer_warnIfReached(); // no-warning

// If the two lower bytes of `x` are equal to zero, `x` cannot be 65537,

// which does not truncate to zero as a short. Thus the branch is infeasible.

short s = x;

if (!s) {

if (x == 65537) {

martongUnsubmitted

Not Done

These two tests are redundant because they are handled by the Parent patch I've just created. https://reviews.llvm.org/D126481

clang_analyzer_warnIfReached(); // no-warning

} else {

clang_analyzer_value(s); // expected-warning {{16s:0}}

clang_analyzer_value(x); // expected-warning {{32s:{ [-2147483648, 65536], [65538, 2147483647] }}}

}

}

}

void test3(int x, short s) {

s = x;

if ((short)x > -10 && s < 10) {

if (x > 0 && x < 10) {

// FIXME: If the range of the whole variable was constrained then reason

// again about truncated bytes to make the ranges more precise.

// Shall be 16s:{ [1, 9] }

clang_analyzer_value((short)x); // expected-warning {{16s:{ [-9, 9] }}}

clang_analyzer_value(x); // expected-warning {{32s:{ [1, 9] }}}

}

}

}

void test4(unsigned x) {

if ((char)x > 8) {

clang_analyzer_value((char)x); // expected-warning {{8s:{ [9, 127] }}}

if (x < 42) {

// FIXME: Update lower(less) bytes if higher(more) bytes are updated.

// Should be 8s:{ [9, 41] }.

clang_analyzer_value((char)x); // expected-warning {{8s:{ [9, 127] }}}

}

}

}

void test5(unsigned x) {

if ((char)x > -10 && (char)x < 10) {

if ((short)x == 8) {

clang_analyzer_value(x); // expected-warning {{32u:{ [0, 4294967295] }}}

clang_analyzer_value((short)x); // expected-warning {{16s:{ [8, 8] }}}

// FIXME: Update lower(less) bytes if higher(more) bytes are updated.

// Should be 8s:{ [8, 8] }.

clang_analyzer_value((char)x); // expected-warning {{8s:{ [-9, 9] }}}

}

}

}

void test6(int x) {

// Even if the two lower bytes of `x` are less than zero, it doesn't mean that `x`

// can't be greater than zero. Hence we don't change the native range of

// `x` and this branch is feasible.

if (x > 0)

if ((short)x < 0)

clang_analyzer_value(x); // expected-warning {{32s:{ [1, 2147483647] }}}

}

void test7(int x) {

// The range of two lower bytes of `x` [1, SHORT_MAX] is enough to cover

// all possible values of char [CHAR_MIN, CHAR_MAX]. So the lowest byte

// can be lower than zero.

if ((short)x > 0) {

if ((char)x < 0)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

}

}

void test8(int x) {

// Promotion from `signed int` to `signed long long` also lets us reason about the

// original range, because we know that even after promotion it

// remains in the range [INT_MIN, INT_MAX].

if ((long long)x < 0)

clang_analyzer_value(x); // expected-warning {{32s:{ [-2147483648, -1] }}}

}

void test9(signed int x) {

// Any cast from `signed` to `unsigned` produces an unsigned range, which is

// [0, UNSIGNED_MAX] and can not be lower than zero.

if ((unsigned long long)x < 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

if ((unsigned int)x < 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

if ((unsigned short)x < 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

if ((unsigned char)x < 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

}

void test10(unsigned int x, signed char sc) {

// Promotion from `unsigned` to `signed` produces a signed range,

// which is able to cover all the values of the original,

// so that such cast is not lower than zero.

if ((signed long long)x < 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

// Any other cast(conversion or truncation) from `unsigned` to `signed`

// produces a signed range, which is [SIGNED_MIN, SIGNED_MAX]

// and can be lower than zero.

if ((signed int)x < 0) // explicit cast

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

signed short ss = x; // initialization

if (ss < 0)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

sc = x; // assignment

if (sc < 0)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

}

void test11(unsigned int x) {

// Promotion from 'unsigned' to 'signed' entirely covers the original range.

// Hence such a cast is not lower than zero and the `true` branch is

// infeasible. But it doesn't affect the original range, which still remains

// as [0, UNSIGNED_MAX].

if ((signed long long)x < 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_eval(x < 0); // expected-warning {{FALSE}}

// Any other cast(conversion or truncation) from `unsigned` to `signed`

// produces a signed range, which is [SIGNED_MIN, SIGNED_MAX]. But it doesn't

// affect the original range, which still remains as [0, UNSIGNED_MAX].

if ((signed int)x < 0)

clang_analyzer_eval(x < 0); // expected-warning {{FALSE}}

if ((signed short)x < 0)

clang_analyzer_eval(x < 0); // expected-warning {{FALSE}}

if ((signed char)x < 0)

clang_analyzer_eval(x < 0); // expected-warning {{FALSE}}

}

void test12(int x, char c) {

if (x >= 5308) {

if (x <= 5419) {

// Truncation on assignment: int[5308, 5419] -> char[-68, 43]

c = x;

clang_analyzer_value(c); // expected-warning {{8s:{ [-68, 43] }}}

// Truncation on initialization: int[5308, 5419] -> char[-68, 43]

char c1 = x;

clang_analyzer_value(c1); // expected-warning {{8s:{ [-68, 43] }}}

}

}

}

void test13(int x) {

if (x > 913440767 && x < 913440769) { // 0x36720000

// Truncation: int[913440768] -> short[0]

clang_analyzer_value((short)x); // expected-warning {{16s:0}}

short s = x;

clang_analyzer_value(s); // expected-warning {{16s:0}}

}

}

void test14(int x) {

if (x >= -1569193983 && x <= 578290016) {

// The big range of `x` covers all possible values of short.

// Truncation: int[-1569193983, 578290016] -> short[-32768, 32767]

if ((short)x > 0) {

clang_analyzer_value(x); // expected-warning {{32s:{ [-1569193983, 578290016] }}}

short s = x;

clang_analyzer_value(s); // expected-warning {{16s:{ [1, 32767] }}}

}

}

}

void test15(int x) {

if (x >= -1569193983 && x <= -1569193871) { // [0xA2780001, 0xA2780071]

// The small range of `x` covers only several values of short.

// Truncation: int[-1569193983, -1569193871] -> short[1, 113]

clang_analyzer_value(x); // expected-warning {{32s:{ [-1569193983, -1569193871] }}}

clang_analyzer_value((short)x); // expected-warning {{16s:{ [1, 113] }}}

}

}
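The truncation results expected in test12 through test15 can be reproduced with a small standalone sketch. `Range` and `truncateSigned` are hypothetical helpers written for this illustration; they are not the patch's implementation and handle only a single closed input range of 32-bit values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Range {
  std::int64_t From, To;
};

// Truncates the closed range [Lo, Hi] of a wider signed integer to a signed
// type of Bits bits. Returns one range, or two when the truncated values wrap
// around the signed boundary of the narrow type.
inline std::vector<Range> truncateSigned(std::int64_t Lo, std::int64_t Hi,
                                         unsigned Bits) {
  assert(Lo <= Hi && Bits >= 1 && Bits <= 32);
  assert(Lo >= INT32_MIN && Hi <= INT32_MAX && "sketch covers int inputs");
  const std::int64_t Size = std::int64_t{1} << Bits;
  const std::int64_t Min = -(Size / 2), Max = Size / 2 - 1;
  // The input spans at least 2^Bits values: every narrow value is reachable.
  if (Hi - Lo >= Size - 1)
    return {Range{Min, Max}};
  // Maps V to its two's complement value in [Min, Max].
  auto Wrap = [=](std::int64_t V) {
    std::int64_t M = ((V % Size) + Size) % Size; // now in [0, Size)
    return M > Max ? M - Size : M;
  };
  const std::int64_t L = Wrap(Lo), H = Wrap(Hi);
  if (L <= H)
    return {Range{L, H}};
  return {Range{Min, H}, Range{L, Max}};
}
```

With these inputs, int[5308, 5419] truncated to 8 bits gives [-68, 43], and int[-1569193983, -1569193871] truncated to 16 bits gives [1, 113], in line with the expectations in the tests above.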

void test16(char x) {

if (x < 0)

clang_analyzer_value(x); // expected-warning {{8s:{ [-128, -1] }}}

else

clang_analyzer_value(x); // expected-warning {{8s:{ [0, 127] }}}

}

void test17(char x) {

if (-11 <= x && x <= -10) {

unsigned u = x;

// Conversion: char[-11, -10] -> unsigned int[4294967285, 4294967286]

clang_analyzer_value(u); // expected-warning {{32u:{ [4294967285, 4294967286] }}}

unsigned short us = x;

// Conversion: char[-11, -10] -> unsigned short[65525, 65526]

clang_analyzer_value(us); // expected-warning {{16u:{ [65525, 65526] }}}

unsigned char uc = x;

// Conversion: char[-11, -10] -> unsigned char[245, 246]

clang_analyzer_value(uc); // expected-warning {{8u:{ [245, 246] }}}

}

}
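The bounds expected in test17 follow from the language's modular signed-to-unsigned conversion. A minimal sketch, assuming `char` is signed and 8 bits wide (modeled here with std::int8_t); `convertRange` is a hypothetical helper, not analyzer code:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Converts the closed range [Lo, Hi] of a signed 8-bit value to the unsigned
// type To by converting both bounds. This keeps the bounds ordered only when
// Lo and Hi have the same sign, which the assertion enforces.
template <typename To>
std::pair<To, To> convertRange(std::int8_t Lo, std::int8_t Hi) {
  assert(Lo <= Hi && ((Lo < 0) == (Hi < 0)));
  // Signed-to-unsigned conversion is modular: V + 2^N for negative V.
  return {static_cast<To>(Lo), static_cast<To>(Hi)};
}
```

For [-11, -10] this gives [4294967285, 4294967286] as a 32-bit unsigned range, [65525, 65526] as 16-bit, and [245, 246] as 8-bit.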

void test18(char c, short s, int i) {

// Any char value is always less than 1000.

int OneThousand = 1000;

c = i;

clang_analyzer_eval(c < OneThousand); // expected-warning {{TRUE}}

int MinusFourtyThousands = -40000;

s = i;

clang_analyzer_eval(s > MinusFourtyThousands); // expected-warning {{TRUE}}

}

void test19(char x, short y) {

if (-43 <= x && x <= -42) { // x[-43, -42]

y = 42;

clang_analyzer_eval(int16_t(x) < int16_t(y)); // expected-warning {{TRUE}}

clang_analyzer_eval(int16_t(x) < int32_t(y)); // expected-warning {{TRUE}}

clang_analyzer_eval(int32_t(x) < int16_t(y)); // expected-warning {{TRUE}}

clang_analyzer_eval(int32_t(x) < int32_t(y)); // expected-warning {{TRUE}}

clang_analyzer_eval(int16_t(x) < uint16_t(y)); // expected-warning {{TRUE}}

clang_analyzer_eval(int16_t(x) < uint32_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(int32_t(x) < uint16_t(y)); // expected-warning {{TRUE}}

clang_analyzer_eval(int32_t(x) < uint32_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint16_t(x) < int16_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint16_t(x) < int32_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint32_t(x) < int16_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint32_t(x) < int32_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint16_t(x) < uint16_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint16_t(x) < uint32_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint32_t(x) < uint16_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint32_t(x) < uint32_t(y)); // expected-warning {{FALSE}}

}

}
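The TRUE/FALSE pattern in test19 comes from the usual arithmetic conversions applied after each cast. The sketch below makes the common type explicit, assuming a 32-bit `int`; `lessAfterCasts` is a hypothetical helper used only to illustrate the rule.

```cpp
#include <cassert>
#include <cstdint>

// Evaluates A(X) < B(Y) the way the language does: both operands are first
// cast, then converted to their common type (the type of Lhs + Rhs).
template <typename A, typename B>
bool lessAfterCasts(int X, int Y) {
  static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");
  assert(X >= -128 && X <= 127 && "X models a signed char value");
  const A Lhs = static_cast<A>(X);
  const B Rhs = static_cast<B>(Y);
  // int for two 16-bit operands; unsigned int if either is uint32_t.
  using Common = decltype(Lhs + Rhs);
  return static_cast<Common>(Lhs) < static_cast<Common>(Rhs);
}
```

With x = -42 and y = 42, int16_t(x) < uint16_t(y) is compared as int and holds, while int16_t(x) < uint32_t(y) is compared as unsigned int, where -42 becomes 4294967254, and does not hold.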

void test20(char x, short y) {

if (42 <= y && y <= 43) { // y[42, 43]

x = -42;