This is an archive of the discontinued LLVM Phabricator instance.

Differential D103096

[analyzer] Implement cast for ranges of symbolic integers
Needs ReviewPublic

Authored by ASDenysPetrov on May 25 2021, 9:19 AM.

Details

Reviewers

NoQ
vsavchenko
steakhal
martong
dcoughlin
baloghadamsoftware

Summary

Support integral cast for ranges of symbolic integers. Previously we only supported integral cast for concrete integers.
Reason about the ranges of SymbolCast expressions. Apply truncations, promotions and conversions to get a correct range set using nested types of a SymbolCast.

Fixes: https://github.com/llvm/llvm-project/issues/50380

The solution

Create a map with a bitwidth as the key and a range set as the value. Call it CastMap.
CastMap = Map<uint32_t, RangeSet>

NOTE: LLVM-IR has the ability to represent integers with a bitwidth from 1 all the way to 16'777'215. See _ExtInt Feature for details.

NOTE: We don't care about the particular signedness of a RangeSet stored in CastMap, but all RangeSets stored in the map shall have the same signedness.

Create a map with a symbol as the key and a CastMap as the value. Call it SymCastMap.
SymCastMap = Map<SymbolRef, CastMap>

Store and update SymCastMap for every SymbolCast and every SymExpr which represents an integer.
Use a root symbol of SymbolCast as a key of the map. E.g. for (int16)(uint8)(int32 x) root symbol is (int32 x).
For SymExpr use the symbol itself as a key of the map.

Getting a constraint

Get a key symbol from SymbolCast/SymExpr.
Get a CastMap of constraints from SymCastMap using a key symbol.
Find the smallest type of the given cast symbolic expression.
Find a RangeSet in the CastMap for a bitwidth equal to that of the smallest type or, failing that, for the first bigger bitwidth.
If no RangeSet was found, create a new full RangeSet for the smallest type.
Sequentially cast the RangeSet across the chain of types starting from the innermost one.

Pseudocode

GivenSymbol = (int16)(uint8)(int32 x)

RootSymbol = GetRoot(GivenSymbol) // (int32 x)
CastMap = GetCastMap(RootSymbol) // CastMap for (int32 x)
MinType = FindMinType(GivenSymbol) // uint8
MinBitwidth = BitwidthOf(MinType) // 8

RangeSet = FindRange(CastMap, MinBitwidth)  // range for bitwidth of 8
if(!RangeSet)
  RangeSet = FindNextRange(CastMap, MinBitwidth) // range for bitwidth of 9+
if(!RangeSet)
  RangeSet = CreateRangeForType(MinType) // full range for uint8

CastChain = GetCastChain(GivenSymbol) // int32 -> uint8 -> int16
ResultRangeSet = RangeThroughCastChain(RangeSet, CastChain)

return ResultRangeSet
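
The lookup part of the steps above can be sketched in C++ roughly as follows. This is a simplified model and not the patch's code: `Range`, `fullRange`, `findRange` and `getStartingRange` are hypothetical names, a single unsigned interval stands in for RangeSet, and the final cast through the chain of types is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for RangeSet: one closed interval of unsigned values.
struct Range {
  uint64_t From;
  uint64_t To;
};

// Bitwidth -> range known for the lowest `bitwidth` bits of the root symbol.
using CastMap = std::map<uint32_t, Range>;

// Full range of an unsigned integer that is Bits wide.
Range fullRange(uint32_t Bits) {
  assert(Bits >= 1 && Bits <= 64 && "unsupported bitwidth in this sketch");
  uint64_t Max = Bits == 64 ? UINT64_MAX : (uint64_t{1} << Bits) - 1;
  return {0, Max};
}

// Returns the entry for MinBitwidth or, failing that, for the first bigger
// bitwidth; nullptr if neither exists. std::map keeps its keys sorted, so a
// single lower_bound call covers both FindRange and FindNextRange.
const Range *findRange(const CastMap &Map, uint32_t MinBitwidth) {
  auto It = Map.lower_bound(MinBitwidth);
  return It == Map.end() ? nullptr : &It->second;
}

// Starting range for the cast chain: a stored range if there is one,
// otherwise the full range of the smallest type.
Range getStartingRange(const CastMap &Map, uint32_t MinBitwidth) {
  if (const Range *R = findRange(Map, MinBitwidth))
    return *R;
  return fullRange(MinBitwidth);
}
```

With only a 16-bit entry stored, a query for 8 bits picks up that 16-bit entry, while a query for 32 bits finds nothing and falls back to the full 32-bit range.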

Setting a constraint

Get a key symbol from SymbolCast/SymExpr.
Get a map of constraints from SymCastMap using a key symbol.
Find the smallest type of the given cast symbolic expression.
Find and update all RangeSets in the CastMap for bitwidths which are equal to or lower than the bitwidth of the smallest type.
If there is no constraint for the bitwidth of the smallest type in the map, add a new entry with the given RangeSet.

Pseudocode

GivenRangeSet = [N, M]
GivenSymbol = (int16)(uint8)(int32 x)

RootSymbol = GetRoot(GivenSymbol) // (int32 x)
CastMap = GetCastMap(RootSymbol) // CastMap for (int32 x)
MinType = FindMinType(GivenSymbol) // uint8
MinBitwidth = BitwidthOf(MinType) // 8

Bitwidth = MinBitwidth
while (Bitwidth > 0) // update all constraints whose bitwidth is equal to or lower than the minimal one
  RangeSet = FindRange(CastMap, Bitwidth)  // range for bitwidth of 8 and lower
  if (RangeSet) // skip bitwidths that have no stored constraint
    UpdateRange(CastMap, Bitwidth, RangeSet ∩ GivenRangeSet) // intersect ranges and store the result back to the map
  Bitwidth--

if(!RangeExistsInMap(CastMap, MinBitwidth))
  AddRange(CastMap, MinBitwidth, GivenRangeSet)  // store the given range to the map
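
A minimal C++ sketch of the update step, under the same simplifying assumptions (hypothetical names, one unsigned interval instead of a RangeSet). One deliberate deviation from the pseudocode is labeled in the comments: a narrower entry is only intersected when the given range fits into that narrower width, because otherwise the values would wrap on truncation and a plain intersection would be wrong.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for RangeSet: one closed interval of unsigned values.
struct Range {
  uint64_t From;
  uint64_t To;
};

using CastMap = std::map<uint32_t, Range>;

// Largest value of an unsigned integer that is Bits wide.
uint64_t maxValue(uint32_t Bits) {
  return Bits >= 64 ? UINT64_MAX : (uint64_t{1} << Bits) - 1;
}

// Narrows every stored range whose bitwidth is <= MinBitwidth and adds an
// entry for MinBitwidth if there is none. Returns false when an intersection
// becomes empty, i.e. the constraint is infeasible.
bool setConstraint(CastMap &Map, uint32_t MinBitwidth, Range Given) {
  assert(Given.From <= Given.To && "empty input range");
  auto End = Map.upper_bound(MinBitwidth); // first key > MinBitwidth
  for (auto It = Map.begin(); It != End; ++It) {
    // Assumption of this sketch: skip entries the given range does not fit
    // into, since truncation would wrap (conservative, loses precision).
    if (Given.To > maxValue(It->first))
      continue;
    Range &R = It->second;
    R.From = std::max(R.From, Given.From);
    R.To = std::min(R.To, Given.To);
    if (R.From > R.To)
      return false;
  }
  // emplace leaves an existing (already narrowed) entry untouched.
  Map.emplace(MinBitwidth, Given);
  return true;
}
```

For example, with an 8-bit entry of [0, 9] stored, setting a 16-bit constraint of [8, 8] narrows the 8-bit entry to [8, 8] and adds a 16-bit entry.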

See tests in symbol-integral-cast.cpp for examples.

Event Timeline

Herald added subscribers: manas, dkrupp, donat.nagy and 6 others. May 25 2021, 9:19 AM

ASDenysPetrov requested review of this revision. May 25 2021, 9:19 AM

Herald added a subscriber: cfe-commits. May 25 2021, 9:19 AM

ASDenysPetrov added a parent revision: D103094: [analyzer] Implemented RangeSet::Factory::castTo function to perform promotions, truncations and conversions. May 25 2021, 9:19 AM

Harbormaster completed remote builds in B106111: Diff 347702. May 25 2021, 9:31 AM

ASDenysPetrov mentioned this in D99797: [analyzer] Implemented RangeSet::Factory::unite function to handle intersections and adjacency. May 26 2021, 5:55 AM

ASDenysPetrov mentioned this in D97388: [analyzer] Replace StoreManager::evalIntegralCast with SValBuilder::evalCast. May 27 2021, 8:45 AM

It sounds like you indeed solved a lot of problems that prevented us from enabling SymbolCast. But this still requires massive testing, a lot more than a typical constraint solver patch; extraordinary claims require extraordinary evidence. If it works out though, it might be the best thing to happen to the static analyzer in years.

With SymbolCasts in place our equations become much more complicated and therefore the constraint solver becomes much more likely to produce false positives in cases where it previously erred on the side of false negatives.

Another thing to test is our ability to explain bug paths. People are often careless about integral types and it may lead to bugs which your patch helps uncover. But it is worthless to uncover these bugs if the user can't understand them. I'm thinking of scenarios like this:

01  void foo(long x) {
02    if (x == 0)
03      return;
04
05    bar(x, nullptr);
06  }
07
08  void bar(int y, int *p) {
09    if (y == 0)
10      *p = 1; // warning: null dereference
11  }

The user will discard this as false positive because "I checked for zero in foo(), it obviously can't be zero in bar() in this context". There needs to be a note that explains the implicit truncation of an interesting* symbol on line 5. Maybe even mention truncation on line 9 as well (not sure how to word that).
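
To make the truncation in this scenario concrete, here is a small sketch, assuming a 64-bit `long` and a 32-bit `int` as on LP64 platforms. Fixed-width types are used so the sketch does not depend on the platform, and `isZeroAfterTruncation` and `demo` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Models what bar() observes after the implicit 64-bit -> 32-bit conversion.
// Going through uint32_t keeps the result well-defined (modulo 2^32); a zero
// bit pattern means the int parameter y compares equal to 0.
bool isZeroAfterTruncation(int64_t X) {
  uint32_t Low = static_cast<uint32_t>(X);
  return Low == 0;
}

// Replays foo() -> bar() for one concrete argument.
void demo() {
  int64_t X = int64_t{1} << 32;     // non-zero, so foo() does not return early
  assert(X != 0);
  assert(isZeroAfterTruncation(X)); // yet bar() would see y == 0
}
```

Any non-zero multiple of 2^32 passes the check in foo() and still reaches the dereference in bar().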

I also wonder if a lot of such reports will also be false positives simply because the presumption of potential overflow is baseless. On paper it looks like "if the user didn't want to pass large values, they'd just use int instead of long". But in practice there may be other reasons to use a larger integer type, such as API requirements (eg., how isascii() accepts an int but only uses 266 values). There's also the usual problem of overflow being impossible specifically on the current path; in this case we have to make sure that an appropriate assert() would actually suppress the warning (i.e., the constraint solver would be able to correctly solve the assert condition as well).

__
*In this case it's interesting as a control flow dependency of the bug location; it sounds like without @Szelethus's control flow dependency tracking this advancement would have been virtually impossible.

@NoQ
This solution only intends to make correct calculations whenever a cast occurs. We can mark this as alpha or add an argument flag to turn cast reasoning on/off, or we can even disable any part of this patch with argument settings.

But this still requires massive testing, a lot more than a typical constraint solver patch; extraordinary claims require extraordinary evidence.

What kind of tests do you think we need?

If it works out though, it might be the best thing to happen to the static analyzer in years.

Thank you. I appreciate your positive evaluation.

With SymbolCasts in place our equations become much more complicated and therefore the constraint solver becomes much more likely to produce false positives in cases where it previously erred on the side of false negatives.

Another thing to test is our ability to explain bug paths.

My proposal is to write up a document describing:

which cases shall be considered erroneous and be reported;
which cases shall be ignored or considered exceptions (e.g. static_cast);
what wording we shall use in reports;
what the paths of those reports shall look like;
your options;

But in practice there may be other reasons to use a larger integer type, such as API requirements (eg., how isascii() accepts an int but only uses 266 values).

IMO it is great when we tell the user that they should check the value bounds before passing arguments to such APIs.

There's also the usual problem of overflow being impossible specifically on the current path; in this case we have to make sure that an appropriate assert() would actually suppress the warning (i.e., the constraint solver would be able to correctly solve the assert condition as well).

For example, this test easily passes.

void test(int x) {
  assert(0 < x && x < 42);
  char c = x;
  clang_analyzer_eval(c <= 0); // expected-warning {{FALSE}}
  clang_analyzer_eval(c >= 42); // expected-warning {{FALSE}}
}

Or did you mean some other cases?

__
*In this case it's interesting as a control flow dependency of the bug location; it sounds like without @Szelethus's control flow dependency tracking this advancement would have been virtually impossible.

Can this tracking mechanism be adjusted then?

In the end, should we go further with this patch and make more adjustments in CSA or reject it in view of your concerns?

ASDenysPetrov mentioned this in D103317: [Analyzer][Core] Make SValBuilder to better simplify svals with 3 symbols in the tree. Jun 2 2021, 3:47 AM

In D103096#2789439, @ASDenysPetrov wrote:

@NoQ
This solution only intends to make correct calculations whenever a cast occurs. We can mark this as alpha or add an argument flag to turn cast reasoning on/off, or we can even disable any part of this patch with argument settings.

That would be awesome. I think we should land this patch under an -analyzer-config flag. This will allow us to experiment with viability of enabling cast symbols at any time as we improve the constraint solver. Additionally, presence of cast symbols is extremely valuable for z3 runs in which our concerns for increased complexity are eliminated.

But this still requires massive testing, a lot more than a typical constraint solver patch; extraordinary claims require extraordinary evidence.

What kind of tests do you think we need?

Test on a lot of real-world code. Like, seriously, *A LOT* of real-world code. Say, 20x the amount of code we have in docker and csa-testbench, from a large variety of projects. Investigate newly appeared reports carefully to understand the impact. I'll be happy to help with this at some point.

With SymbolCasts in place our equations become much more complicated and therefore the constraint solver becomes much more likely to produce false positives in cases where it previously erred on the side of false negatives.

Another thing to test is our ability to explain bug paths.

My proposal is to write up a document describing:

which cases shall be considered erroneous and be reported;

which cases shall be ignored or considered exceptions (e.g. static_cast);

what wording we shall use in reports;

what the paths of those reports shall look like;

your options;

I think we should start from examples. While testing on real-world code, find poorly explained reports and see what piece of information is missing and preventing the user from understanding the warning. Then add that piece of information.

But in practice there may be other reasons to use a larger integer type, such as API requirements (eg., how isascii() accepts an int but only uses 266 values).

IMO it is great when we tell the user that they should check the value bounds before passing arguments to such APIs.

I'm thinking about warnings inside the implementations of such APIs.

There's also the usual problem of overflow being impossible specifically on the current path; in this case we have to make sure that an appropriate assert() would actually suppress the warning (i.e., the constraint solver would be able to correctly solve the assert condition as well).

For example, this test easily passes.
void test(int x) {
  assert(0 < x && x < 42);
  char c = x;
  clang_analyzer_eval(c <= 0); // expected-warning {{FALSE}}
  clang_analyzer_eval(c >= 42); // expected-warning {{FALSE}}
}
Or did you mean some other cases?

I meant real-world examples. We should see if it works on real-world code where constraints are significantly more complex.

There is so much fancy stuff going on upstream. Awesome to see.
I'm trying to catch up ASAP, I'm finally done with my master's thesis.

In D103096#2798238, @NoQ wrote:

Additionally, presence of cast symbols is extremely valuable for z3 runs in which our concerns for increased complexity are eliminated.

You are probably referring to D85528. (Without that patch, Z3 refutation crashes all over the place due to the not modeled widening/narrowing casts.)

But this still requires massive testing, a lot more than a typical constraint solver patch; extraordinary claims require extraordinary evidence.

What kind of tests do you think we need?

Test on a lot of real-world code. Like, seriously, *A LOT* of real-world code. Say, 20x the amount of code we have in docker and csa-testbench, from a large variety of projects. Investigate newly appeared reports carefully to understand the impact. I'll be happy to help with this at some point.

The CSA-testbench is capable of using the Conan package manager.
By cloning the https://github.com/conan-io/conan-center-index you can get a bunch of Conan package recipes, with tests actually using the given library.
Running their tests would ensure that header-only libraries get analyzed as well as normal libraries. However, frequently used packages would be analyzed over and over again, similarly to how headers suffer from this.

We planned to make use of this in the future to make CTU analysis and Z3 refutation more and more robust.
As soon as we have the infrastructure and scale, we plan to let a small set of contributors initiate such measurements, but don't expect it in the near future.

Added a boolean option handle-integral-cast-for-ranges under -analyzer-config flag. Disabled the feature by default.

@NoQ, @steakhal
Do you think it's necessary to make any changes to SMTConstraintManager within the scope of this patch?

Harbormaster completed remote builds in B108193: Diff 350579. Jun 8 2021, 4:50 AM

What about this patch?

Hey, great work! I think that casts are extremely important, but it looks like you mixed so many things into this patch. Let's take one step at a time and split it into (at least) a couple of patches.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
931–940	That definitely regresses the interface, so `NominalTypeList` should definitely be reworked.
1190–1217	This looks like a very `static` data structure to me; I don't see any reason why the user should be able to create multiple copies of it. If it becomes a static data structure, there will be no need to pass it around.
1266–1302	I think all this extra logic about how we infer ranges for casts is interesting, but should be a separate patch. For now, you can simply put `return Visit(Sym->getOperand());`. First, it will unblock you from depending on that `RangeFactory` feature. And I also have quite a few questions about this particular implementation, so it would hold up this patch as well.
3226–3285	I need more explanation of why we have this function and why we call it where we call it. Additionally, it again looks like it belongs in a separate patch.

This revision now requires changes to proceed. Jun 16 2021, 5:34 AM

In D103096#2821750, @vsavchenko wrote:

Hey, great work! I think that casts are extremely important, but it looks like you mixed so many things into this patch. Let's take one step at a time and split it into (at least) a couple of patches.

Thanks for the tips. I'll address them in the next update. Actually, I thought about splitting before the first upload and split it into D103094 and the current one. This particular patch provides the full mechanism needed for the test cases to pass. Honestly, I don't know what part could be cut while keeping this mechanism holistic and self-sufficient. But I'll see what I can do.

In D103096#2822965, @ASDenysPetrov wrote:

In D103096#2821750, @vsavchenko wrote:

Hey, great work! I think that casts are extremely important, but it looks like you mixed so many things into this patch. Let's take one step at a time and split it into (at least) a couple of patches.

Honestly, I don't know what part could be cut while keeping this mechanism holistic and self-sufficient. But I'll see what I can do.

I know: solver part = separate patch.
As I said, introduce very minimal support in the solver (aka VisitSymbolCast in Inferrer) and that's it. All other algorithms, like looking for constraints for the same expression cast to a larger type, logically belong in a separate patch where you actually start producing symbolic casts.

vsavchenko added inline comments. Jun 17 2021, 1:23 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3268–3282	What's the point of this when you do the reverse operation in `Inferrer`? As far as I understand, in `VisitSymbolCast` you iterate over larger types and see if the same symbol was cast to any of those, and if yes, you truncate the result and use that range. Here, when we are about to set the constraint for a cast symbol, you iterate over smaller types, truncate this range for a smaller type, construct a cast to that smaller type, and add a constraint for that symbol as well. So, if this is correct, these two pieces of code DO THE SAME WORK and ONLY ONE should remain.

ASDenysPetrov mentioned this in D105340: [analyzer] Produce SymbolCast symbols for integral types in SValBuilder::evalCast. Jul 2 2021, 2:47 AM

Split this revision and moved the SValBuilder-related changes to a separate patch, D105340. Added detailed comments. Made NominalTypeList static and as a result removed the forwarding across the functions. Spread the handleSymbolCast logic across three methods: modifySymbolAndConstraints, updateExistingConstraints, getProperSymbolAndConstraint.

ASDenysPetrov edited parent revisions, added: D105340: [analyzer] Produce SymbolCast symbols for integral types in SValBuilder::evalCast; removed: D103094: [analyzer] Implemented RangeSet::Factory::castTo function to perform promotions, truncations and conversions. Jul 6 2021, 3:47 AM

Harbormaster completed remote builds in B112574: Diff 356668. Jul 6 2021, 3:47 AM

ASDenysPetrov edited the summary of this revision. Jul 6 2021, 3:48 AM

This is a very complicated patch, I think we'll have to iterate on it quite a lot.
Additionally, we have to be sure that this doesn't crash our performance.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1190	Comments on: why do we need it? why does it have four types? why do we not care about signed/unsigned types?
3343–3345	OK, but I still don't understand one thing. Here you go over all "smaller" types and artificially create constraints for them, and at the same time in `VisitSymbolCast` you do the opposite operation? Why? Shouldn't the map have constraints for smaller types already because of this action? Why do we need to do both?
3347–3348	This looks like a pattern and we should probably make it into a method of `SymbolCast`

I found some issues. Working on improvement.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3343–3345	I've been preparing an answer for you, but suddenly you inspired me to make some improvements. Thanks.
3347–3348	I did it :) but then dropped it. It would just turn into: if (isa<SymbolCast>(Sym)) Sym = cast<SymbolCast>(Sym)->getRootOperand(); It looks pretty much the same and brings no benefit IMO, does it? Every time I used `getRootOperand` I also needed an additional traversal through the types to get some other information, so I couldn't avoid the `while` loop there. So I decided not to introduce a new method in `SymbolCast`.

vsavchenko added inline comments. Jul 6 2021, 9:07 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3347–3348	Aha, I see your point. I guess we can take it into `SymExpr` and call it not `getRootOperand`, which won't tell much to a person reading the name, but something like `ignoreCasts`. It will fit well with `Expr::IgnoreCasts`, `Expr::IgnoreParens`, etc.
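
As an illustration of the helper being discussed, here is a toy model of an `ignoreCasts` that peels off all enclosing casts. It is only a sketch: `ToySymExpr` is a hypothetical stand-in and not the real `SymExpr`/`SymbolCast` hierarchy, and it uses a kind tag in the base class so that the method can be non-virtual.

```cpp
#include <cassert>

// Toy model, not the real clang classes.
class ToySymExpr {
public:
  enum Kind { DataKind, CastKind };

  // Leaf symbol.
  ToySymExpr() : K(DataKind), Operand(nullptr) {}

  // Cast wrapping another symbol.
  explicit ToySymExpr(const ToySymExpr *Op) : K(CastKind), Operand(Op) {
    assert(Op && "a cast needs an operand");
  }

  Kind getKind() const { return K; }

  // Non-virtual: walks through every enclosing cast and returns the root
  // symbol. For a non-cast symbol it returns the symbol itself.
  const ToySymExpr *ignoreCasts() const {
    const ToySymExpr *S = this;
    while (S->K == CastKind)
      S = S->Operand;
    return S;
  }

private:
  Kind K;
  const ToySymExpr *Operand;
};
```

For a chain such as (int16)(uint8)(int32 x), calling `ignoreCasts()` on the outermost node of this toy model yields the node for (int32 x).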

ASDenysPetrov added inline comments. Jul 6 2021, 10:11 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3347–3348	Nice idea! True, `getRootOperand` would only be meaningful to a reader in the scope of `SymbolCast`. I'll try to implement this in the next update.

Added SymExpr::ignoreCasts method. Added descriptive comments.

Harbormaster completed remote builds in B112999: Diff 357240. Jul 8 2021, 8:56 AM

Added more descriptive comments. Fixed RangeConstraintManager::updateExistingConstraints function.

Harbormaster completed remote builds in B113164: Diff 357463. Jul 9 2021, 3:04 AM

Can you please explain why you do the same thing in two different ways?

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)

@vsavchenko

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3343–3345	I've fixed `RangeConstraintManager::updateExistingConstraints`. There was a mistake: I updated smaller types from the root symbol, but the correct symbol is the given symbol, i.e. the one before calling `ignoreCasts()`. Maybe it is clearer for you now.

In D103096#2866704, @ASDenysPetrov wrote:

@vsavchenko

That's not the question I'm asking. Why do you need to set constraints for other symbolic expressions, when SymbolicInferrer can look them up on its own? Which cases will fail if we remove that part altogether?

In D103096#2866730, @vsavchenko wrote:

In D103096#2866704, @ASDenysPetrov wrote:

@vsavchenko

That's not the question I'm asking. Why do you need to set constraints for other symbolic expressions, when SymbolicInferrer can look them up on its own? Which cases will fail if we remove that part altogether?

I see. Here is what fails if we don't update other constraints:

void test(int x) {
  if ((char)x > -10 && (char)x < 10) {
    if ((short)x == 8) {
      // If you remove updateExistingConstraints,
      // then `c` won't be 8. It would be [-10, 10] instead.
      char c = x;
      if (c != 8)
        clang_analyzer_warnIfReached(); // should be no-warning, but fails
    }
  }
}

In D103096#2867021, @ASDenysPetrov wrote:
In D103096#2866730, @vsavchenko wrote:

In D103096#2866704, @ASDenysPetrov wrote:

@vsavchenko

That's not the question I'm asking. Why do you need to set constraints for other symbolic expressions, when SymbolicInferrer can look them up on its own? Which cases will fail if we remove that part altogether?

I see. Here is what fails if we don't update other constraints:
void test(int x) {
  if ((char)x > -10 && (char)x < 10) {
    if ((short)x == 8) {
      // If you remove updateExistingConstraints,
      // then `c` won't be 8. It would be [-10, 10] instead.
      char c = x;
      if (c != 8)
        clang_analyzer_warnIfReached(); // should be no-warning, but fails
    }
  }
}

OK, it's something! Good!
I still want to hear a good explanation of why it is done this way. Here c is mapped to (char)x, and we have [-10, 10] directly associated with it, but we also have (short)x associated with [8, 8]. Why can't VisitSymbolCast look up constraints for (short)x, given that it already looks up constraints for different casts?

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1262	Why do you use `VisitSymExpr` here? You want to interrupt all `Visit`s or... I'm not sure I fully understand.
1274	Can we get a test for that?
1275	Same goes here.
1300	Why do you get associated constraint directly without consulting with what `SymbolRangeInferrer` can tell you about it?

Generally, with this patch we kind of have several constraints for each cast of a single symbol. And we have to take care of all of those constraints and update them in a timely manner (if possible).
For instance, we have int x and encounter casts of this symbol in the code:

int x;
(char)x; // we can reason about the 1st byte
(short)x; // we can reason about the 2 lowest bytes
(ushort)x; // we can reason about the 2 lowest bytes (in this case we may not store for unsigned separately, as we already stored 2 bytes for signed)

It's as if we have knowledge of the lower part of the integer. And every time we get a new constraint, for example for (short)x (aka 2 bytes), we have to update all the constraints that cover two bytes or fewer ((char)x in this case) as well, to keep them consistent.

@vsavchenko

I still want to hear a good explanation of why it is done this way. Here c is mapped to (char)x, and we have [-10, 10] directly associated with it, but we also have (short)x associated with [8, 8]. Why can't VisitSymbolCast look up constraints for (short)x, given that it already looks up constraints for different casts?

Hm, you've confused me. I'll do some debugging and report back.

In D103096#2867104, @ASDenysPetrov wrote:
Generally, with this patch we kind of have several constraints for each cast of a single symbol. And we have to take care of all of those constraints and update them in a timely manner (if possible).
For instance, we have int x and encounter casts of this symbol in the code:
int x;
(char)x; // we can reason about the 1st byte
(short)x; // we can reason about the 2 lowest bytes
(ushort)x; // we can reason about the 2 lowest bytes (in this case we may not store for unsigned separately, as we already stored 2 bytes for signed)
It's as if we have knowledge of the lower part of the integer. And every time we get a new constraint, for example for (short)x (aka 2 bytes), we have to update all the constraints that cover two bytes or fewer ((char)x in this case) as well, to keep them consistent.

What we do in Inferrer is that we try to look at many sources of information and intersect their ranges. And I repeat my question again in a bit different form, why can't it look up constraints for (char)x and for (short)x and intersect them?
You must admit you never really addressed this question. Why can't VisitSymbolCast do everything?

In D103096#2867136, @ASDenysPetrov wrote:

@vsavchenko

I still want to hear a good explanation of why it is done this way. Here c is mapped to (char)x, and we have [-10, 10] directly associated with it, but we also have (short)x associated with [8, 8]. Why can't VisitSymbolCast look up constraints for (short)x, given that it already looks up constraints for different casts?

Hm, you've confused me. I'll do some debugging and report back.

It should not be about debugging, it's your code! Why did you write it this way!?

@vsavchenko

Why did you write it this way!?

I want the map to contain only valid constraints at any time, so we can easily get them without traversing all the variants and intersecting them with each other. I'm going to move the updateExistingConstraints logic to VisitSymbolCast. I think your suggestion can even improve the feature and cover some more cases. I'll add more tests in the next update. Thanks!

ASDenysPetrov added inline comments. Jul 9 2021, 10:50 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1300	What do you mean? I didn't get it. Could you give an example?

In D103096#2867441, @ASDenysPetrov wrote:

@vsavchenko

Why did you write it this way!?

I want the map to contain only valid constraints at any time, so we can easily get them without traversing all the variants and intersecting them with each other. I'm going to move the updateExistingConstraints logic to VisitSymbolCast. I think your suggestion can even improve the feature and cover some more cases. I'll add more tests in the next update. Thanks!

[-10, 10] is also valid, right? You can't keep things at their best all the time. And if you want all constraints directly in the map, then what's all this logic in VisitSymbolCast? That's why I keep asking why you need both parts of this solution, and I haven't gotten an answer so far.
I'm hands down for the incremental approach and adding small-to-medium size improvements on top of each other. That makes my life as a reviewer easier :) That said, I don't want to commit to a big solution where the author doesn't want to explain why there are two parts of the solution instead of one.

I want you to tell me why the code that's in VisitSymbolCast does what it does. And the same about updateExistingConstraints. Also I want to hear a solid reason why it's split this way and why we need both of them.

You should understand that I'm not picking on you personally. The review process takes a lot of my time too. I want to make it easier for both of us. When the reviewer understands what you are going for, it is much easier for them to help you refine your solution. This patch is very big, but the summary doesn't cover the main part: the approach. And you leave me here dragging it out of you.

ASDenysPetrov edited the summary of this revision. Jul 12 2021, 9:47 AM

@vsavchenko I've updated the summary. I hope I addressed your question. Thanks.

ASDenysPetrov edited the summary of this revision. Jul 12 2021, 10:16 AM

ASDenysPetrov added inline comments. Jul 13 2021, 5:52 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1262	Here we want to delegate the reasoning to another handler as we don't support non-integral cast yet.
1274	I'll add some.

ASDenysPetrov added inline comments. Jul 13 2021, 6:23 AM

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	Do you think the recursive call is better than the loop? But I guess I see your point. Your option could be safer if we had another implementation of the virtual method. Or do you think a similar cast symbol is possible in the future? Well, for now `ignoreCasts` doesn't make sense for any other `Expr` successors.

I'll allocate some time to get into your summary, but for now here are my concerns about SymbolRangeInferrer and VisitSymbolCast.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1262	You are not delegating it here. `Visit` includes a runtime dispatch that calls a correct `VisitTYPE` method. Here you call `VisitSymExpr` directly, which is one of the `VisitTYPE` methods. No dispatch will happen, and since we use `VisitSymExpr` as the last resort (it's the most base class, if we got there, we don't actually support the expression), you interrupt the `Visit` and go directly to "the last resort". See the problem now?
1300	`getConstraint` returns whatever constraint we have stored directly in the constraint map. That's the main source of information for ranges, but not the only one. Here is the list of things that you skip when you do `getConstraint` here:
we can understand that something is an equality/disequality check and find the corresponding info in the Equivalence Classes data structure
we can see that the expression has the form `A - B` and we can find a constraint for `B - A`
we can see that the expression is a comparison `A op B` and check what other comparison info we have on `A` and `B` (your own change)
we can see that the expression is of the form `A op B` and check if we know something about `A` and `B`, and produce a reasonable constraint out of this information
In order to use the right information, you should use `infer`, which will actually do all the other things as well. That's how `SymbolRangeInferrer` is designed: to be recursive.
Speaking of recursiveness: all these loops and manual checks for the types of the cast's operand go against this pattern. Recursive visitors should call `Visit` for children nodes (like `RecursiveASTVisitor`). In other words, if `f(x)` is a visit function, it should be defined like this:
f(x) = g(f(x->operand_1), f(x->operand_2), ... , f(x->operand_N))
or, if we talk about your case specifically:
f(x: SymbolCast) = h(f(x->Operand))
and the `h` function should transform the range set returned by `f(x->Operand)` into a range set appropriate for `x`.
NOTE: `h` can also intersect different ranges
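
The recursive shape described in this comment can be sketched on a toy symbol tree as follows. This is not `SymbolRangeInferrer`: `Sym`, `castRange` (playing the role of `h`) and `infer` (playing the role of `f`) are hypothetical, all types are treated as unsigned, and a single interval stands in for a range set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for RangeSet: one closed interval of unsigned values.
struct Range {
  uint64_t From;
  uint64_t To;
};

// Toy symbol: either a leaf with a known range, or an unsigned cast of Operand.
struct Sym {
  uint32_t Bits;      // bitwidth of this symbol's type
  const Sym *Operand; // nullptr for a leaf
  Range Known;        // only meaningful for a leaf
};

// h: adapts the operand's range to a Bits-wide unsigned type.
Range castRange(Range R, uint32_t Bits) {
  assert(Bits >= 1 && Bits < 64 && "unsupported bitwidth in this sketch");
  uint64_t Max = (uint64_t{1} << Bits) - 1;
  if (R.To <= Max)
    return R;      // every value fits, so the cast changes nothing
  return {0, Max}; // values may wrap; fall back to the full range
}

// f: the recursive visit, f(cast) = h(f(cast->Operand)).
Range infer(const Sym &S) {
  if (!S.Operand)
    return S.Known;
  return castRange(infer(*S.Operand), S.Bits);
}
```

The cast case never inspects the type of its operand; it only recurses and then adapts the result, which is the property the comment asks for.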

vsavchenko added inline comments.Jul 13 2021, 6:58 AM

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	Oh, wait, why is it even virtual? I don't think that it should be virtual. Are similar functions in `Expr` virtual? And I think that this implementation should live in `SymExpr` directly. Then it would look like:
  if (const SymbolCast *ThisAsCast = dyn_cast<SymbolCast>(this)) {
    return ThisAsCast->ignoreCasts();
  }
  return this;

ASDenysPetrov added inline comments.Jul 13 2021, 7:37 AM

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	Yes, `SymExpr` is an abstract class. And because of the limitations and dependencies of inheritance, we are not able to know the implementation of `SymbolCast`. Unfortunately, this is not a CRTP.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1262	OK. I rejected this idea before. If we call `Visit` inside `VisitSymbolCast`, we will go into a recursive loop, because it will return us back to `VisitSymbolCast`, as we have passed `Sym` as is. (This is theoretical; I didn't check it in practice.) Or am I missing something? I chose `VisitSymExpr` here because all kinds of `SymbolCast` were previously handled there. So I decided to pass all unsupported forms of casts there.
1300	Thank you for useful notes! I'll take them into account.

vsavchenko added inline comments.Jul 13 2021, 8:01 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1262	Did I suggest to `Visit(Sym)`? Of course it is going to end up in a loop! Why isn't it `Visit(Sym->getOperand())` here? Before we started producing casts, casts were transparent. This logic would fit perfectly with that.
1275	And here, since we couldn't really reason about it, we usually return `infer(T)`.

OK, thanks for putting a summary. I now got a good idea why you need both.
At the same time, take a look at D105692. I'm about to land it and I think it's going to be useful for you.

ASDenysPetrov added inline comments.Jul 13 2021, 11:32 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1262	"were transparent." Not exactly. There are still some cases when symbols are not equal to their roots (aka Operands). Such cases were handled by `VisitSymExpr`, which uses `infer(Sym->getType());` instead of `getOperand`. So this needs a second thought. Also, I see a problem with `EquivalenceClass`es. Consider the following:
  int x, y;
  if(x == y)
    if ((char)x == 2)
      if(y == 259)
        // Here we shall update `(char)x` and find this branch infeasible.
Also, consider cases like:
  if(x == (short)y)
    // What should we do (store) with (in) `EquivalenceClass`es?
Currently, I have only a vague vision of the solution.

// 1. `VisitSymbolCast`.
// Get a range for main `reg_$0<int x>` - [-2147483648, 2147483647]
// Cast main range to `short` - [-2147483648, 2147483647] -> [-32768, 32767].
// Now we get a valid range for further bifurcation - [-32768, 32767].
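The "cast main range to `short`" step can be made precise for a single interval. The sketch below is only an illustration under stated assumptions: `Range`, `wrapSigned` and `truncate` are hypothetical names, values are held in `int64_t` and assumed to come from a type of at most 32 bits, and the result is a plain vector rather than the analyzer's `RangeSet`. An interval that covers the whole target type collapses to the full range; a narrower one is wrapped modulo 2^Bits and may split in two.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Range {
  int64_t From, To; // inclusive bounds
};

// Reduce V modulo 2^Bits into the signed range [-2^(Bits-1), 2^(Bits-1) - 1].
inline int64_t wrapSigned(int64_t V, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 32);
  const int64_t Mod = int64_t{1} << Bits;
  int64_t U = ((V % Mod) + Mod) % Mod; // now in [0, Mod)
  return U >= Mod / 2 ? U - Mod : U;
}

// Truncate one interval of a wider signed type to a Bits-wide signed type.
inline std::vector<Range> truncate(Range R, unsigned Bits) {
  assert(R.From <= R.To);
  const int64_t Mod = int64_t{1} << Bits;
  const int64_t Min = -(Mod / 2), Max = Mod / 2 - 1;
  // The interval hits every residue, so the result is the full range.
  if (R.To - R.From >= Mod - 1)
    return std::vector<Range>{Range{Min, Max}};
  const int64_t F = wrapSigned(R.From, Bits), T = wrapSigned(R.To, Bits);
  if (F <= T)
    return std::vector<Range>{Range{F, T}};
  // The interval crosses the wrap-around point and splits in two.
  return std::vector<Range>{Range{Min, T}, Range{F, Max}};
}
```

For instance, `truncate({32767, 32769}, 16)` yields two pieces, `[-32768, -32767]` and `[32767, 32767]`, because 32768 and 32769 wrap to the bottom of the `int16` range.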

That's a great example, thanks for putting it together. I can see your point now!

Please, rebase your change and make use of ConstraintAssignor, and rework VisitSymbolCast.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	Re-read my comment, please.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1262	"There are still some cases when symbols are not equal to there roots(aka Operands)" Right now we don't have casts, this is what we do currently. However faulty it is, it is the existing solution and we should respect that.
"Also I see a problem with EquivalenceClass'es." Because of the current situation with casts (or more precisely with their lack), `EquivalenceClass`es do not get merged for symbols with different types. It is as simple as that. You can find similar tests in `equality_tracking.c`.

ASDenysPetrov added inline comments.Jul 14 2021, 10:26 AM

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	"Oh, wait, why is it even virtual?" `ignoreCasts` is a virtual function because I haven't found a better way to implement it.
"I don't think that it should be virtual." Unfortunately, this is not a CRTP, so we can't avoid dynamic dispatch.
"Are similar functions in Expr virtual?" `SymExpr` is an abstract class. I'm not sure about similarity, but `SymExpr` has these virtual methods: `computeComplexity`, `getType`, `getOriginRegion`.
"And I think that this implementation should live in SymExpr directly." It's impossible due to the design of `SymExpr`. `SymExpr` knows nothing about the implementation details of `SymbolCast`, so it can't invoke `ignoreCasts()`.

vsavchenko added inline comments.Jul 14 2021, 10:38 AM

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	a) `Expr` is also an abstract class b) I put the implementation right there in the comment above. I don't see any reasons not to use it. c) I don't buy it about "impossible" and "implementation design" because you can always declare function in one place and define it in the other.
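Point (c) can be shown on a toy hierarchy. This is a hedged sketch, not the code from the patch: the classes below only borrow the names `SymExpr` and `SymbolCast`, the kind check is a hand-rolled replacement for `dyn_cast`, and the loop is one possible body for `ignoreCasts`. The base class declares the non-virtual member, and the definition is placed where the derived class is already a complete type.

```cpp
#include <cassert>

// Toy hierarchy; the real SymExpr and SymbolCast carry much more state.
class SymExpr {
public:
  enum Kind { DataKind, CastKind };
  explicit SymExpr(Kind K) : K(K) {}
  Kind getKind() const { return K; }

  // Declared here, non-virtual. The definition needs SymbolCast to be a
  // complete type, so it comes after that class.
  const SymExpr *ignoreCasts() const;

private:
  Kind K;
};

class SymbolCast : public SymExpr {
public:
  explicit SymbolCast(const SymExpr *Operand)
      : SymExpr(CastKind), Operand(Operand) {
    assert(Operand && "a cast needs an operand");
  }
  const SymExpr *getOperand() const { return Operand; }

private:
  const SymExpr *Operand;
};

// In a real code base this definition would sit in a .cpp file.
const SymExpr *SymExpr::ignoreCasts() const {
  const SymExpr *S = this;
  // Peel off casts until a non-cast symbol is reached.
  while (S->getKind() == CastKind)
    S = static_cast<const SymbolCast *>(S)->getOperand();
  return S;
}
```

Callers get a single entry point on the base class and no virtual dispatch is involved.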

Rebased

Made ignoreCast non-virtual.
P.S. IMO, this change is not something that can be taken as a pattern, though.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	I think I've addressed your concern.

Harbormaster completed remote builds in B114049: Diff 358685.Jul 14 2021, 1:25 PM

In D103096#2877818, @ASDenysPetrov wrote:

Made ignoreCast non-virtual.
P.S. IMO, this change is not something that can be taken as a pattern, though.

It is already a pattern in other type hierarchies.
Virtual functions are only good, when they can have multiple implementations. ignoreCasts by its name can have only one implementation and couldn't be virtual. That's it! It is more useable now, and less confusing for its users. The fact that its definition lives in some other cpp file doesn't change it.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	This function should be removed then.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3239–3241	No, please, remove duplication by putting it inside of the constraint assignor. It is designed specifically so we don't duplicate code around `assumeSymXX` functions.

@vsavchenko

It is already a pattern in other type hierarchies.

I've just rarely come across them, and it was hard for me to read the code while searching for the implementation all over the place.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗	(On Diff #357463)	NP.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3239–3241	+1. That's what I've recently thought about. :)

Improved ignoreCasts implementation. Adapted to ConstraintAssignor.

vsavchenko added inline comments.Jul 15 2021, 9:15 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2128–2132	That's not using `ConstraintAssignor`, you simply put your implementation in here. That won't do!

ASDenysPetrov added inline comments.Jul 15 2021, 9:31 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2128–2132	OK, please tell me how to use it correctly in my case.

vsavchenko added inline comments.Jul 15 2021, 9:36 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2128–2132	Can you read the comments first and then ask me if you have any specific questions?

Harbormaster completed remote builds in B114266: Diff 359007.Jul 15 2021, 11:20 AM

Adapted solution to ConstraintAssignor API. Added tests.

ASDenysPetrov added inline comments.Jul 16 2021, 6:38 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2128–2132	I think I did it. Could you please review the changes?

Harbormaster completed remote builds in B114485: Diff 359305.Jul 16 2021, 7:29 AM

Ping.

ASDenysPetrov retitled this revision from [analyzer] Implement cast for ranges of symbolic integers. to [analyzer] Implement cast for ranges of symbolic integers.Nov 17 2021, 1:59 AM

Rebased.

Harbormaster completed remote builds in B135157: Diff 388552.Nov 19 2021, 12:11 PM

ASDenysPetrov mentioned this in D114718: [analyzer] Implement a new checker for Strict Aliasing Rule..Nov 30 2021, 8:31 AM

ASDenysPetrov mentioned this in D115932: [Analyzer] Create and handle SymbolCast for pointer to integral conversion.Jan 10 2022, 4:35 AM

Ping. Is anybody interested in this? :)

Herald added a project: Restricted Project.Mar 23 2022, 10:13 AM

First of all, thanks Denys for working on this, nice work! Here are my concerns and remarks.

I think this fixed set of types in NominalTypeList is too rigid. I don't like the fact that we have to iterate over all four types whenever we set a new constraint or when we try to infer. Also, I am thinking about downstream hardware architectures, where there might be integers with different bit-widths (@vabridgers). Also, at some point people will push us to support integers with arbitrary bitwidth (see _ExtInt).

Thus, I am proposing an alternative approach. We should have a SymExpr -> Set of SymExpr mapping in the State that represents the relation of symbols that are connected via some cast operations (see REGISTER_MAP_WITH_PROGRAMSTATE). Let's call this mapping CastMap. The key should be the root symbol, i.e. the symbol that is declared first, before all cast operations.

E.g. Let's have

int16 a = 128;

then we have a constraint [128,128] stored for $a. Then

if ((int8)a < 0)

creates a new symbol $a2 (SymbolCast) that has a new constraint [-128,-128] assigned to it. We also keep track in the State that $a and $a2 refer to the same root symbol (a). We now have in the CastMap $a -> [$a2].

Now, let's say we have

if ((_ExtInt(7))a > 64)

then we can dig up the existing constraints from CastMap to check the State's validity, and we can update all the constraints of $a and $a2 as needed. Also, CastMap is updated: $a -> [$a2, $a3].
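A rough sketch of the bookkeeping proposed here, under loud assumptions: symbols are plain strings, the state is a mutable `std::map` (the analyzer's program state is immutable and would use `REGISTER_MAP_WITH_PROGRAMSTATE`), and `ToyState` and `addCast` are hypothetical names. It only shows how the root symbol gives access to every related cast symbol whose constraint may need revisiting.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using Symbol = std::string; // hypothetical: symbols identified by name

struct ToyState {
  // Root symbol -> cast symbols derived from that root.
  std::map<Symbol, std::set<Symbol>> CastMap;
};

// Record that CastSym is a cast of Root and return every symbol whose
// constraint should be re-checked when any one of them changes.
inline std::set<Symbol> addCast(ToyState &St, const Symbol &Root,
                                const Symbol &CastSym) {
  assert(Root != CastSym);
  St.CastMap[Root].insert(CastSym);
  std::set<Symbol> Related = St.CastMap[Root];
  Related.insert(Root);
  return Related;
}
```

Following the example above, adding `$a2` and then `$a3` under `$a` leaves `$a -> [$a2, $a3]` in the map, and the returned set names all three symbols.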

clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp
421–426	Does it really matter? I mean, why do we need this change?
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2212	Could you please rebase? The "simplification" code part had been merged already to llvm/main and it is not part of this change.
2227–2251	I think this hunk should remain in `assignSymExprToConst`. Why did you move it?
2280–2290	I think, we should definitely store the constraints as they appear in the analyzed code. This way we mix the infer logic into the constraint setting, which is bad. I mean, we should simply store the constraint directly for the symbol as it is. And then only in `VisitSymbolCast` should we infer the proper value from the stored constraint (if we can). (Of course, if we have related symbols (casts of the original symbol) then their constraints must be updated.)
2346–2350	Instead of a noop we should be more conservative in this case. We should invalidate (remove) the constraints of all the symbols that have more bits than the currently set symbol. However, we might be clever in cases of special values (e.g `0` or in case of the `true` rangeSet {[MIN, -1], [1, MAX]}).

Thank you for the review @martong! Your work is not less hard than mine. I'll rework and update the revision ASAP.

clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp
421–426	I investigated. This change is no longer necessary. I'll remove it.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2227–2251	I'll remove. It's unrelated one.
2280–2290	I see what you mean. I thought about this. Here is what I ran into. Let's say you meet `(wchar_t)x > 0`, which you store as a pair {(wchar_t)x, [1,32767]}. Then you meet `(short)x < 0`, where you have to answer whether it's `true` or `false`. So what would your next step be? Brute-forcing all the constraint pairs to find any `x`-related symbol? Obviously, O(n) complexity for every single integral symbol is inefficient. What I propose is to "canonicalize" arbitrary types to a general form, where this form could be a part of the key along with `x`, so that we could get a constraint with classic map complexity. So that:
- You meet `(wchar_t)x > 0`: you convert `wchar_t` to `int16` and store a pair {(int16)x, [1,32767]}.
- Then you meet `(short)x < 0`: you convert `short` to `int16` and get the constraint.
That's why I introduced `NominalTypeList`. But I now accept your concern about integers of arbitrary size and will redesign my solution.
2346–2350	No, it's incorrect. Consider the following:
  int x;
  if(x > 1000000 || x < 100000)
    return;
  // x [100'000, 1'000'000]
  if((int8)x != 42)
    return;
  // x [100'000, 1'000'000] && (int8)x [42, 42]
We can't just remove or invalidate `x [100'000, 1'000'000]` because this range still holds. Strictly speaking, the range of `x` should be narrowed to values such as 100394, 102442, 108586, ..., 960554 and any other value within the range whose lowest byte equals 42. We can't update the `RangeSet` with such a large number of values due to performance issues. So we just accept a less accurate result.
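The canonicalization idea from the 2280–2290 comment above can be sketched as a map keyed by the pair (root symbol, bit width). The names `IntType`, `Key` and `CastConstraints` are hypothetical, widths are spelled out explicitly rather than taken from the host platform (the comment assumes a 16-bit `wchar_t`), and signedness is ignored, as the Summary's note on `CastMap` allows.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

struct Range {
  long long From, To; // inclusive bounds
};

// Hypothetical type descriptor with an explicit width.
struct IntType {
  std::string Name;
  unsigned Bits;
};

using Key = std::pair<std::string, unsigned>; // (root symbol, bit width)

class CastConstraints {
  std::map<Key, Range> Map;

public:
  // Store a constraint for a cast of Root to T; only T's width is kept.
  void set(const std::string &Root, const IntType &T, Range R) {
    assert(R.From <= R.To);
    Map[Key(Root, T.Bits)] = R;
  }

  // Ordinary map lookup; returns nullptr if nothing is stored for the width.
  const Range *get(const std::string &Root, const IntType &T) const {
    auto It = Map.find(Key(Root, T.Bits));
    return It == Map.end() ? nullptr : &It->second;
  }
};
```

A constraint stored through `wchar_t` is then found when queried through `short`, because both collapse to the same 16-bit key, with no scan over unrelated constraints.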

martong added inline comments.Apr 25 2022, 3:07 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2280–2290	"So would be your next step? Brute forcing all the constraint pairs to find any x-related symbol? Obviously, O(n) complexity for every single integral symbol is inefficient." I don't think we need a brute-force search among the constraints if we have the additional `CastMap` (that I have previously proposed). So, the next step would be: look up the root symbol of `(short)x`, that is `(wchar_t)x` (supposing we have seen a declaration `wchar_t x;` first during the symbolic execution, but having a different root might work out as well). Then from the `CastMap` we can query in O(1) the set of the related cast symbols of the root symbol. For each member of this set, we then query the constraints from the existing equivalenceClass->constraint mapping. That's O(1) times the number of members of the cast set (which is assumed to be very small in the usual case).
2346–2350	Okay, this makes perfect sense, thanks for the example!
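To put a number on the `(int8)x != 42` example discussed above, the following sketch counts the values in a range whose lowest byte has a given value. `Summary` and `valuesWithLowByte` are names made up for this illustration, and the helper assumes a non-negative lower bound.

```cpp
#include <cassert>
#include <cstdint>

struct Summary {
  int64_t First, Last, Count;
};

// Values V in [Lo, Hi] with V mod 256 == Byte, for non-negative Lo.
inline Summary valuesWithLowByte(int64_t Lo, int64_t Hi, int64_t Byte) {
  assert(0 <= Lo && Lo <= Hi && 0 <= Byte && Byte < 256);
  int64_t First = Lo - Lo % 256 + Byte; // candidate in Lo's 256-block
  if (First < Lo)
    First += 256;
  int64_t Last = Hi - Hi % 256 + Byte; // candidate in Hi's 256-block
  if (Last > Hi)
    Last -= 256;
  if (First > Last)
    return Summary{0, 0, 0};
  return Summary{First, Last, (Last - First) / 256 + 1};
}
```

For the range [100000, 1000000] and byte 42 this gives 3516 isolated values, from 100138 to 999978, each of which would have to become its own one-point range if `x` were narrowed exactly.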

@martong thank you for the idea. I've tried to implement it. Could you look at the patch once again, please? I've also described a new solution in the Summary.

Harbormaster completed remote builds in B161422: Diff 425248.Apr 26 2022, 9:37 AM

ASDenysPetrov marked 2 inline comments as done.Apr 26 2022, 9:37 AM

Thanks Denys for the update! This is getting really good.

I have some concerns though about the CastMap = Map<uint32_t, RangeSet>. I think we should have CastMap = Map<uint32_t, EquivalenceClass> instead, and we could get the RangeSet from the existing ConstraintRange mapping. By storing directly the RangeSet, the State might get out-of-sync when we introduce a constraint to another member in an equivalence class. (Besides that, our mapping of constraints is happening always by using the EquivalenceClasses as keys.)
I think this could solve the problematic code you posted earlier

int x, y;
if(x == y)
  if ((char)x == 2)
    if(y == 259)
      // Here we shall update `(char)x` and find this branch infeasible.

Here we have EqClass1: [x, y] , EqClass2: [(char)x] and they are not the same class, thus when you iterate over the CastMap, you can get the updated RangeSet for both classes, and the infeasibility can be discovered.
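The arithmetic behind the infeasible branch is short: 259 = 256 + 3, so the low byte of 259 is 3, which contradicts `(char)x == 2` once `x == y` and `y == 259` are known. A small check, using hypothetical helper names and going through unsigned conversion so that the truncation is well-defined modular arithmetic:

```cpp
#include <cassert>
#include <cstdint>

// Low 8 bits of a value, computed via unsigned conversion.
constexpr uint8_t lowByte(int V) {
  return static_cast<uint8_t>(static_cast<unsigned>(V));
}

// Can `(char)x == CharValue` and `x == IntValue` hold at the same time?
constexpr bool compatible(int IntValue, uint8_t CharValue) {
  return lowByte(IntValue) == CharValue;
}

static_assert(lowByte(259) == 3, "259 = 256 + 3");
static_assert(!compatible(259, 2), "the branch in the example is infeasible");
static_assert(compatible(258, 2), "258 = 256 + 2 would be feasible");
```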

About this:

if(x == (short)y)
  // What we should do(store) with(in) `EquivalenceClass`es.

In this case, we have one EqClass with two members, the SymbolRef x and the SymbolCast (short)y. They both must have the same RangeSet associated with them. And this is already implemented. By referring to the EqClass in the CastMap, we can simply reuse this information.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1337–1338	By using `llvm::Expected<T>` we would be more aligned with the llvm error handling practices. Besides, the `bool` in the tuple and the `Success` variable in the function below would not be needed.

NoQ added inline comments.Apr 28 2022, 1:54 PM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
871–872	These maps will need to be cleaned up when symbols become dead (as in `RangeConstraintManager::removeDeadBindings()`).

Giving it some more thought, the SymCastMap = Map<SymbolRef, CastMap> should be keyed as well with an equivalence class : SymCastMap = Map<EquivalenceClass, CastMap>. This is the only way to use the equivalence info correctly when we process the casts.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
871–872	Yes, the same way as we clean up e.g. the `DisequalityMap`.

martong added inline comments.Apr 29 2022, 3:04 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1397–1399	There might be a problem here because the iteration of the map is non-deterministic. We should probably have a copy that is sorted, or the container should be sorted (sorted immutable list maybe?). Your tests below passed probably because the cast chains are too small. Could you please have a test, where the chain is really long (20 maybe) and shuffled. (My thanks for @steakhal for this additional comment.)
2496–2499	Same here.

martong mentioned this in D124658: [analyzer] Canonicalize SymIntExpr so the RHS is positive when possible.Apr 29 2022, 5:33 AM

Ping

In D103096#3502955, @martong wrote:

Ping

Thank you, folks, for taking your time. I'll surely make the corresponding changes according to your suggestions and notify you then. Sorry, @martong, for the late response. I've been pretty busy lately.

ASDenysPetrov edited the summary of this revision. (Show Details)May 19 2022, 10:26 AM

martong added a parent revision: D126481: [analyzer] Handle SymbolCast in SValBuilder.May 26 2022, 8:44 AM

Denys, I've created a very simple patch that makes the SValBuilder able to look up and use a constraint for an operand of a SymbolCast. That change passes 2 of your test cases, thus I made that a parent patch.

clang/test/Analysis/symbol-integral-cast.cpp
13–37	These two tests are redundant because they are handled by the Parent patch I've just created. https://reviews.llvm.org/D126481

ASDenysPetrov mentioned this in D126481: [analyzer] Handle SymbolCast in SValBuilder.May 27 2022, 11:14 AM

@martong Just FYI. I've been working on reworking this solution to use EquivalenceClasses for several weeks. It turned out that this is an extremely hard task to accomplish. There are a lot of cast cases like: (int8)x==y, (uint16)a==(int64)b, (uint8)y == b. Merging and inferring all of this without going beyond O(n) complexity is really tricky.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1397–1399	I've checked. `ImmutableSet` gave me a sorted order. But I agree that it could be just a coincidence. I'll try to add more tests.

In D103096#3613059, @ASDenysPetrov wrote:

@martong Just FYI. I've been working on reworking this solution to use EquivalenceClasses for several weeks. It turned out that this is an extremely hard task to accomplish. There are a lot of cast cases like: (int8)x==y, (uint16)a==(int64)b, (uint8)y == b. Merging and inferring all of this without going beyond O(n) complexity is really tricky.

Please elaborate. I don't see how it is different from merging and inferring without the handling of casts. My understanding is that we do indeed have more symbols (the additional SymbolCasts). But the merging algorithm should not change at all; it should be agnostic to the actual symbol kind (whether that is a SymbolCast or a SymbolData or a SymSymExpr).
The infer algorithm might be different, though, but there I think the original algorithm you developed for SymExprs should work for EquivalenceClasses as well.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1397–1399	Yes, you are right, and I was wrong. `ImmutableSet` is based on an AVL tree, which is a balanced binary tree, and `begin()` gives an iterator that performs an in-order (thus sorted) traversal. And the key is an integer. (I don't know how, but we were misled, @steakhal too. I guess we thought that the key of `CastMap` is a pointer.)
2496–2499	Please disregard the comment above.
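The determinism argument for 1397–1399 rests on two facts: the container is an ordered tree and the key is an integer bit width rather than a pointer. The same property can be observed with `std::map`, used below purely as a stand-in; it is not `llvm::ImmutableMap` or `llvm::ImmutableSet`, and `ToyCastMap` and `keysInIterationOrder` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for CastMap: an ordered tree keyed by bit width.
using ToyCastMap = std::map<uint32_t, std::string>;

// Collect the keys in the order in which iteration visits them.
inline std::vector<uint32_t> keysInIterationOrder(const ToyCastMap &M) {
  std::vector<uint32_t> Keys;
  for (const auto &Entry : M)
    Keys.push_back(Entry.first);
  return Keys;
}
```

Whatever the insertion order, iteration visits the widths in ascending order. With pointer keys the order would depend on allocation addresses, which is the non-determinism the original review comment worried about.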

@martong Thank you for your patience. I started moving in a slightly different direction, so that we can improve it iteratively.
As a spoiler, in my latest solution a single symbol will be associated with many classes. Here are some tricky cases:

Consider equalities

a32 == (int8)b32
b32 == c32

Class-Symbol map is:

class1 { a32 , (int8)b32 }
class2 { b32 , c32 }

Symbol-Class map is:

a32 { 32 : class1 }
b32 {  8 : class1, 32 : class2  }
c32 { 32 : class2 }

If we know:

(int8)c32 == -1

then what is:

(int8)a32 - ?

Should we traverse like a -> 32 -> class1 -> (int8)b32 -> b32 -> class2 -> c32 -> (int8)c32 ?

The x8 == y32 we can treat as a range of int8 ( {-128, 127} or {0, 255} ).

For (int8)x32 == (int16)x32 we can eliminate one of the symbols in the class as a redundant one.

If x32 == 0 then we can simplify next classes (int16)x32 == y and (int8)x32 == z merging them into a single class {x32, y, z}.

I believe there are more cases.

Thanks Denys for your continued work on this. These are very good questions that must be answered, we need exactly such thinking to implement this properly. I believe we can advance gradually.

In D103096#3642681, @ASDenysPetrov wrote:
@martong Thank you for your patience. I started moving in a slightly different direction, so that we can improve it iteratively.
As a spoiler, in my latest solution a single symbol will be associated with many classes. Here are some tricky cases:

Consider equalities
a32 == (int8)b32
b32 == c32
Class-Symbol map is:
class1 { a32 , (int8)b32 }
class2 { b32 , c32 }
Symbol-Class map is:
a32 { 32 : class1 }
b32 {  8 : class1, 32 : class2  }
c32 { 32 : class2 }
If we know:
(int8)c32 == -1
then what is:
(int8)a32 - ?
Should we traverse like a -> 32 -> class1 -> (int8)b32 -> b32 -> class2 -> c32 -> (int8)c32 ?

I think, we should have only a -> 32 -> class1 -> (int8)b32. The (int8)b32 -> b32 step would be incorrect according to the modulo logic.
In other words, we should check the equivalence class of the root symbol (and no other eq classes should be considered); in this case this is only class1. (More precisely, we should check the SymCastMap of class1.)
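The lookup limited to the root symbol's own class can be sketched on the example data from the quoted message. This is a toy model with string symbols and integer class ids; `ToyState`, `makeExample` and `rootClassMembers` are hypothetical names, and the real state would use the analyzer's immutable maps.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using ClassId = int;

struct ToyState {
  // Symbol -> (bit width -> class).
  std::map<std::string, std::map<unsigned, ClassId>> SymbolClasses;
  // Class -> members.
  std::map<ClassId, std::set<std::string>> ClassMembers;
};

// The state after `a32 == (int8)b32` and `b32 == c32`.
inline ToyState makeExample() {
  ToyState S;
  S.ClassMembers[1] = std::set<std::string>{"a32", "(int8)b32"};
  S.ClassMembers[2] = std::set<std::string>{"b32", "c32"};
  S.SymbolClasses["a32"][32] = 1;
  S.SymbolClasses["b32"][8] = 1;
  S.SymbolClasses["b32"][32] = 2;
  S.SymbolClasses["c32"][32] = 2;
  return S;
}

// Members of the class that holds Root at the given width. There are
// deliberately no hops from a cast member to the classes of its operand.
inline std::set<std::string> rootClassMembers(const ToyState &S,
                                              const std::string &Root,
                                              unsigned Bits) {
  auto SymIt = S.SymbolClasses.find(Root);
  if (SymIt == S.SymbolClasses.end())
    return std::set<std::string>();
  auto ClsIt = SymIt->second.find(Bits);
  if (ClsIt == SymIt->second.end())
    return std::set<std::string>();
  auto MemIt = S.ClassMembers.find(ClsIt->second);
  assert(MemIt != S.ClassMembers.end() && "class without members");
  return MemIt->second;
}
```

For `a32` at width 32 this returns `{a32, (int8)b32}` and never reaches `c32`, which matches the `a -> 32 -> class1 -> (int8)b32` path.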

The x8 == y32 we can treat as a range of int8 ( {-128, 127} or {0, 255} ).

I am not sure what you mean here.
You mean, when we bifurcate on x8 == y32? Then we have two branches: the false case x8 == y32: [0, 0] and the true case x8 == y32: [[INT_MIN, -1], [1, INT_MAX]]

For (int8)x32 == (int16)x32 we can eliminate one of the symbols in the class as a redundant one.

Yes, but this is more like an optimization step. I'd handle this with low priority and with a FIXME comment.

If x32 == 0 then we can simplify next classes (int16)x32 == y and (int8)x32 == z merging them into a single class {x32, y, z}.

Good point. This is an optimization for precision. I'd like to have this, but in a subsequent patch. Let's try to have the absolute simplest working version in this patch.
Also, this can extend to any concrete value that is meaningful in the smaller types. E.g. any single value of x32 in [0,127] could simplify (int8)x32.

I believe there are more cases.

Yes. Consider liveness for example. We should remove the class from SymCastMap if the class itself becomes dead. This should be part of this patch.
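The cleanup mentioned here could look roughly like the sketch below. It assumes mutable `std::map` containers and integer class ids in place of the analyzer's immutable program-state maps and `EquivalenceClass` keys; `removeDead` is a hypothetical helper, not `RangeConstraintManager::removeDeadBindings()`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

using ClassId = int;

struct Range {
  long long From, To; // inclusive bounds
};

using CastMap = std::map<uint32_t, Range>;     // bit width -> range
using SymCastMap = std::map<ClassId, CastMap>; // class -> its cast ranges

// Drop every entry whose class is not live; returns how many were removed.
inline unsigned removeDead(SymCastMap &M, const std::set<ClassId> &Live) {
  unsigned Removed = 0;
  for (auto It = M.begin(); It != M.end();) {
    if (Live.count(It->first)) {
      ++It;
    } else {
      It = M.erase(It); // erase returns the next valid iterator
      ++Removed;
    }
  }
  return Removed;
}
```

Without such a pass, entries for dead classes would stay in the state and keep otherwise identical states from being recognized as equal.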

ASDenysPetrov mentioned this in D112621: [analyzer][solver] Introduce reasoning for not equal to operator.Jul 15 2022, 10:03 AM

Completely reworked solution.

ASDenysPetrov edited parent revisions, added: D138319: [analyzer] Prepare structures for integral cast feature introducing; removed: D126481: [analyzer] Handle SymbolCast in SValBuilder, D105340: [analyzer] Produce SymbolCast symbols for integral types in SValBuilder::evalCast.Nov 22 2022, 9:33 AM

Harbormaster completed remote builds in B199009: Diff 477233.Nov 22 2022, 9:33 AM

Revision Contents

Path	Size
clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h	5 lines
clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h	2 lines
clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp	6 lines
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp	321 lines
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp	5 lines
clang/lib/StaticAnalyzer/Core/SymbolManager.cpp	7 lines
clang/test/Analysis/symbol-integral-cast.cpp	374 lines

Diff 388552

clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h

(129 lines not shown)	public:
    // Add the relevant code once it does.

    std::string VisitSymSymExpr(const SymSymExpr *S) {
      return "(" + Visit(S->getLHS()) + ") " +
             std::string(BinaryOperator::getOpcodeStr(S->getOpcode())) +
             " (" + Visit(S->getRHS()) + ")";
    }

-   // TODO: SymbolCast doesn't appear in practice.
-   // Add the relevant code once it does.
+   std::string VisitSymbolCast(const SymbolCast *S) {
+     return "(" + S->getType().getAsString() + ")" + Visit(S->getOperand());
+   }

    std::string VisitSymbolicRegion(const SymbolicRegion *R) {
      // Explain 'this' object here.
      // TODO: Explain CXXThisRegion itself, find a way to test it.
      if (isThisObject(R))
        return "'this' object";
      // Objective-C objects are not normal symbolic regions. At least,
      // they're always on the heap.
(135 lines not shown)

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h

(56 lines not shown)	public:

    virtual void dump() const;

    virtual void dumpToStream(raw_ostream &os) const {}

    virtual QualType getType() const = 0;
    virtual void Profile(llvm::FoldingSetNodeID &profile) = 0;

+   const SymExpr *ignoreCasts() const;

    /// Iterator over symbols that the current symbol depends on.
    ///
    /// For SymbolData, it's the symbol itself; for expressions, it's the
    /// expression symbol and all the operands in it. Note, SymbolDerived is
    /// treated as SymbolData - the iterator will NOT visit the parent region.
    class symbol_iterator {
      SmallVector<const SymExpr *, 5> itr;

(76 lines not shown)

clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp

(412 lines not shown)	void ExprInspectionChecker::analyzerDenote(const CallExpr *CE,
    const auto *E = dyn_cast<StringLiteral>(CE->getArg(1)->IgnoreParenCasts());
    if (!E) {
      reportBug("Not a string literal", C);
      return;
    }

    ProgramStateRef State = C.getState();

+   // Unwrap symbolic expression to skip argument casts on function call.
+   // This is useful when there is no way for overloading function in C
+   // but we need to pass different types of arguments and
+   // implicit cast occurs.
+   Sym = Sym->ignoreCasts();

    martong: Does it really matter? I mean, why do we need this change?
    ASDenysPetrov: I investigated. This change is no longer necessary. I'll remove it.

    C.addTransition(C.getState()->set<DenotedSymbols>(Sym, E));
  }

  namespace {
  class SymbolExpressor
      : public SymExprVisitor<SymbolExpressor, Optional<std::string>> {
    ProgramStateRef State;

(86 lines not shown)

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp

//== RangeConstraintManager.cpp - Manage range constraints.------*- C++ -*--==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file defines RangeConstraintManager, a class that tracks simple
// equality and inequality constraints on symbolic values of ProgramState.
//
//===----------------------------------------------------------------------===//
#include "clang/Basic/JsonSupport.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/APSIntType.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/SValVisitor.h"
#include "llvm/ADT/FoldingSet.h"
#include "llvm/ADT/ImmutableSet.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/raw_ostream.h"
#include <algorithm>
#include <iterator>
using namespace clang;
using namespace ento;

(829 lines not shown)

REGISTER_MAP_WITH_PROGRAMSTATE(ClassMap, SymbolRef, EquivalenceClass) REGISTER_MAP_WITH_PROGRAMSTATE(ClassMap, SymbolRef, EquivalenceClass)

REGISTER_MAP_WITH_PROGRAMSTATE(ClassMembers, EquivalenceClass, SymbolSet) REGISTER_MAP_WITH_PROGRAMSTATE(ClassMembers, EquivalenceClass, SymbolSet)

REGISTER_MAP_WITH_PROGRAMSTATE(ConstraintRange, EquivalenceClass, RangeSet) REGISTER_MAP_WITH_PROGRAMSTATE(ConstraintRange, EquivalenceClass, RangeSet)

REGISTER_SET_FACTORY_WITH_PROGRAMSTATE(ClassSet, EquivalenceClass) REGISTER_SET_FACTORY_WITH_PROGRAMSTATE(ClassSet, EquivalenceClass)

REGISTER_MAP_WITH_PROGRAMSTATE(DisequalityMap, EquivalenceClass, ClassSet) REGISTER_MAP_WITH_PROGRAMSTATE(DisequalityMap, EquivalenceClass, ClassSet)

namespace { namespace {

/// This class encapsulates a set of symbols equal to each other. /// This class encapsulates a set of symbols equal to each other.

NoQ (inline comment, Done):

These maps will need to be cleaned up when symbols become dead (as in `RangeConstraintManager::removeDeadBindings()`).

martong (inline comment, Done):

Yes, the same way as we clean up e.g. the `DisequalityMap`.

/// ///

/// The main idea of the approach requiring such classes is in narrowing /// The main idea of the approach requiring such classes is in narrowing

/// and sharing constraints between symbols within the class. Also we can /// and sharing constraints between symbols within the class. Also we can

/// conclude that there is no practical need in storing constraints for /// conclude that there is no practical need in storing constraints for

/// every member of the class separately. /// every member of the class separately.

/// ///

/// Main terminology: /// Main terminology:

/// ///

[42 lines not shown; hunk context: public:]

/// members. In this case, the class is still non-trivial (it still has the /// members. In this case, the class is still non-trivial (it still has the

/// mappings in ClassMembers), even though it has only one member. /// mappings in ClassMembers), even though it has only one member.

LLVM_NODISCARD inline bool isTrivial(ProgramStateRef State) const; LLVM_NODISCARD inline bool isTrivial(ProgramStateRef State) const;

/// Return true if the current class is trivial and its only member is dead. /// Return true if the current class is trivial and its only member is dead.

LLVM_NODISCARD inline bool isTriviallyDead(ProgramStateRef State, LLVM_NODISCARD inline bool isTriviallyDead(ProgramStateRef State,

SymbolReaper &Reaper) const; SymbolReaper &Reaper) const;

LLVM_NODISCARD static inline ProgramStateRef LLVM_NODISCARD static inline ProgramStateRef

markDisequal(RangeSet::Factory &F, ProgramStateRef State, SymbolRef First, markDisequal(RangeSet::Factory &F, ProgramStateRef State, SymbolRef First,

SymbolRef Second); SymbolRef Second);

LLVM_NODISCARD static inline ProgramStateRef LLVM_NODISCARD static inline ProgramStateRef

markDisequal(RangeSet::Factory &F, ProgramStateRef State, markDisequal(RangeSet::Factory &F, ProgramStateRef State,

EquivalenceClass First, EquivalenceClass Second); EquivalenceClass First, EquivalenceClass Second);

LLVM_NODISCARD inline ProgramStateRef LLVM_NODISCARD inline ProgramStateRef

markDisequal(RangeSet::Factory &F, ProgramStateRef State, markDisequal(RangeSet::Factory &F, ProgramStateRef State,

EquivalenceClass Other) const; EquivalenceClass Other) const;

LLVM_NODISCARD static inline ClassSet LLVM_NODISCARD static inline ClassSet

vsavchenko (inline comment, Not Done):

That definitely regresses the interface, so `NominalTypeList` should definitely be reworked.

getDisequalClasses(ProgramStateRef State, SymbolRef Sym); getDisequalClasses(ProgramStateRef State, SymbolRef Sym);

LLVM_NODISCARD inline ClassSet LLVM_NODISCARD inline ClassSet

getDisequalClasses(ProgramStateRef State) const; getDisequalClasses(ProgramStateRef State) const;

LLVM_NODISCARD inline ClassSet LLVM_NODISCARD inline ClassSet

getDisequalClasses(DisequalityMapTy Map, ClassSet::Factory &Factory) const; getDisequalClasses(DisequalityMapTy Map, ClassSet::Factory &Factory) const;

LLVM_NODISCARD static inline Optional<bool> areEqual(ProgramStateRef State, LLVM_NODISCARD static inline Optional<bool> areEqual(ProgramStateRef State,

EquivalenceClass First, EquivalenceClass First,

[95 lines not shown]

} }

LLVM_NODISCARD ProgramStateRef setConstraint(ProgramStateRef State, LLVM_NODISCARD ProgramStateRef setConstraint(ProgramStateRef State,

EquivalenceClass Class, EquivalenceClass Class,

RangeSet Constraint) { RangeSet Constraint) {

return State->set<ConstraintRange>(Class, Constraint); return State->set<ConstraintRange>(Class, Constraint);

} }

LLVM_NODISCARD ProgramStateRef setConstraint(ProgramStateRef State,

SymbolRef Sym,

RangeSet Constraint) {

return State->set<ConstraintRange>(EquivalenceClass::find(State, Sym),

Constraint);

}

LLVM_NODISCARD ProgramStateRef setConstraints(ProgramStateRef State, LLVM_NODISCARD ProgramStateRef setConstraints(ProgramStateRef State,

ConstraintRangeTy Constraints) { ConstraintRangeTy Constraints) {

return State->set<ConstraintRange>(Constraints); return State->set<ConstraintRange>(Constraints);

} }

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// Equality/disequality abstraction // Equality/disequality abstraction

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

[115 lines not shown; hunk context: LLVM_NODISCARD inline]

} }

return intersect(F, Second, Tail...); return intersect(F, Second, Tail...);

} }

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// Symbolic reasoning logic // Symbolic reasoning logic

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

/// This class is used for integral symbolic casts feature as a helper instance.

vsavchenko (inline comment, Not Done):

Comments on:

- why do we need it?
- why does it have four types?
- why do we not care about signed/unsigned types?

///

/// It represents a list of integral types of different sizes going in ascending

/// order from 1 to 8 bytes. It aggregates several functions for convenience of

/// usage. We can iterate through the types and find a type by size (bit width).

///

/// We use FOUR integer types: `int8`, `int16`, `int32`, `int64`, because we

/// only support casts between types, which are lower or equal to 64-bit width.

///

/// We use these types for creating SymbolCast to find constraints in the

/// constraint map. This allows to canonize a `key-value` to store and retrieve

/// constraints instead of brute force.

///

/// We don't care about the type signedness. Signedness is just a way of bits

/// representation. We just care about saving data. It's enough for us to store

/// specific constraints for the type for a specific bit width. We never use

/// retrieved constraint directly. We always use RangeSet::Factory::castTo to

/// get ranges for a needed type (signed or unsigned) after retrieving.

class NominalTypeList {

CanQualType Types[4];

public:

using Iterator = CanQualType *;

void init(ASTContext &C) {

Types[0] = C.Char8Ty;

Types[1] = C.Char16Ty;

Types[2] = C.Char32Ty;

vsavchenko (inline comment, Not Done):

This looks like a very `static` data structure to me; I don't see any reason why the user should be able to create multiple copies of it.
If it becomes a static data structure, there will be no need to pass it around.

Types[3] = C.LongLongTy;

}

Iterator findByWidth(uint32_t Width) {

int index = 4;

switch (Width) {

case 8:

index = 0;

break;

case 16:

index = 1;

break;

case 32:

index = 2;

break;

case 64:

index = 3;

};

return Types + index;

}

Iterator begin() { return std::begin(Types); }

Iterator end() { return std::end(Types); }

};

// We should initialize NTL with `init` method before use.

static NominalTypeList NTL;

/// A little component aggregating all of the reasoning we have about /// A little component aggregating all of the reasoning we have about

/// the ranges of symbolic expressions. /// the ranges of symbolic expressions.

/// ///

/// Even when we don't know the exact values of the operands, we still /// Even when we don't know the exact values of the operands, we still

/// can get a pretty good estimate of the result's range. /// can get a pretty good estimate of the result's range.

class SymbolicRangeInferrer class SymbolicRangeInferrer

: public SymExprVisitor<SymbolicRangeInferrer, RangeSet> { : public SymExprVisitor<SymbolicRangeInferrer, RangeSet> {

public: public:

template <class SourceType> template <class SourceType>

static RangeSet inferRange(RangeSet::Factory &F, ProgramStateRef State, static RangeSet inferRange(RangeSet::Factory &F, ProgramStateRef State,

SourceType Origin) { SourceType Origin) {

SymbolicRangeInferrer Inferrer(F, State); SymbolicRangeInferrer Inferrer(F, State);

return Inferrer.infer(Origin); return Inferrer.infer(Origin);

} }

RangeSet VisitSymbolCast(const SymbolCast *Sym) {

AnalyzerOptions &Opts = State->getAnalysisManager().getAnalyzerOptions();

if (!Opts.ShouldSupportSymbolicIntegerCasts)

return VisitSymExpr(Sym);

vsavchenko (inline comment, Not Done):

Why do you use `VisitSymExpr` here? Do you want to interrupt all `Visit`s or... I'm not sure I fully understand.

ASDenysPetrov (author, inline comment, Done):

Here we want to delegate the reasoning to another handler, as we don't support non-integral casts yet.

vsavchenko (inline comment, Not Done):

You are not delegating it here. `Visit` includes a runtime dispatch that calls the correct `VisitTYPE` method. Here you call `VisitSymExpr` directly, which is one of the `VisitTYPE` methods. No dispatch will happen, and since we use `VisitSymExpr` as the last resort (it's the most base class; if we got there, we don't actually support the expression), you interrupt the `Visit` and go directly to "the last resort".

See the problem now?

ASDenysPetrov (author, inline comment, Done):

OK. I rejected this idea before. If we call `Visit` inside `VisitSymbolCast`, we will go into a recursive loop, because it will bring us back to `VisitSymbolCast`, as we have passed `Sym` as is. (This is in theory; I didn't check it in practice.) Or am I missing something?
I chose `VisitSymExpr` here because all kinds of `SymbolCast` were previously handled there. So I decided to pass all unsupported forms of casts there.

vsavchenko (inline comment, Not Done):

Did I suggest `Visit(Sym)`? Of course it is going to end up in a loop!
Why isn't it `Visit(Sym->getOperand())` here? Before we started producing casts, casts were transparent. This logic would fit perfectly with that.

ASDenysPetrov (author, inline comment, Done):

> were transparent.

Not exactly. There are still some cases when symbols are not equal to their roots (aka operands). Such cases were handled by `VisitSymExpr`, which uses `infer(Sym->getType());` instead of `getOperand`. So this needs some more thought. Also I see a problem with `EquivalenceClass`es. Consider the following:

int x, y;
if(x == y)
  if ((char)x == 2)
    if(y == 259)
      // Here we shall update `(char)x` and find this branch infeasible.

Also such cases like:

if(x == (short)y)
  // What we should do(store) with(in) `EquivalenceClass`es.

Currently, I have an obscure vision of the solution.
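To make the first example concrete, here is a minimal standalone sketch (plain C++, not analyzer code) of why the innermost branch is infeasible: once `x == y` and `y == 259` hold, truncating `x` to 8 bits yields 3, never 2. The helper names `lowByte` and `chainIsFeasible` are hypothetical, and an unsigned 8-bit type is used instead of `char` to sidestep implementation-defined signedness.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the value left after truncating an int to 8 bits.
// Conversion to an unsigned type is defined as reduction modulo 2^8.
std::uint8_t lowByte(int V) { return static_cast<std::uint8_t>(V); }

// Returns true if the chain `x == y`, `(char)x == 2`, `y == Y` can hold.
bool chainIsFeasible(int Y) {
  const int X = Y;        // x == y
  return lowByte(X) == 2; // (char)x == 2
}
```

With `y == 258` the chain would be feasible, since 258 modulo 256 is 2; with `y == 259` it is not.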


vsavchenko (inline comment, Not Done):

> There are still some cases when symbols are not equal to their roots (aka operands)

Right now we don't have casts; this is what we do currently. However faulty it is, it is the existing solution and we should respect that.

> Also I see a problem with `EquivalenceClass`es.

Because of the current situation with casts (or more precisely, with their lack), `EquivalenceClass`es do not get merged for symbols with different types. It is as simple as that.
You can find similar tests in `equality_tracking.c`.

// Unwrap symbol to get an underlying(root) symbol.

// Store every next type except the inner(original) one.

SmallVector<QualType, 2> Types;

uint32_t MinBitWidth = UINT32_MAX;

SymbolRef RootSym = Sym;

ASTContext &C = ValueFactory.getContext();

do {

// We only handle integral cast, when all the types are integrals.

// Otherwise, pass the given symbol to VisitSymExpr.

QualType T = RootSym->getType();

if (!T->isIntegralOrEnumerationType())

vsavchenko (inline comment, Not Done):

Can we get a test for that?

ASDenysPetrov (author, inline comment, Done):

I'll add some.

return VisitSymExpr(Sym);

vsavchenko (inline comment, Not Done):

Same goes here.

vsavchenko (inline comment, Not Done):

And here, since we couldn't really reason about it, we usually return `infer(T)`.

MinBitWidth = std::min(MinBitWidth, C.getIntWidth(T));

Types.push_back(T);

RootSym = cast<SymbolCast>(RootSym)->getOperand();

} while (isa<SymbolCast>(RootSym));

QualType RootTy = RootSym->getType();

const uint32_t RootBitWidth = C.getIntWidth(RootTy);

// Check if we have any known truncated ranges of the root symbol.

// Truncated ranges usually are more precise than the original one.

// The more truncated the range is, the more precise it should be.

// Example: Consider the given SymbolCast is (int8)(int64)(int16){int32 x}.

// `int8` - is the smallest type. Then the range will fit in it.

// Traverse through NTL types, that are smaller than the root type:

// [int8, int32).

const RangeSet *RSPtr = nullptr;

auto It = NTL.findByWidth(MinBitWidth);

auto E = NTL.findByWidth(RootBitWidth);

for (; !RSPtr && It < E; ++It) {

// Produce canonical symbols with the nominal type.

SymbolRef S =

State->getSymbolManager().getCastSymbol(RootSym, RootTy, *It);

// Find the first constraint and exit the loop.

RSPtr = getConstraint(State, S);

vsavchenko (inline comment, Not Done):

Why do you get the associated constraint directly without consulting what `SymbolicRangeInferrer` can tell you about it?

ASDenysPetrov (author, inline comment, Done):

What do you mean? I didn't get it. Could you give an example?

vsavchenko (inline comment, Not Done):

`getConstraint` returns whatever constraint we have stored directly in the constraint map. That's the main source of information for ranges, but not the only one.

Here is the list of things that you skip when you call `getConstraint` here:

- we can understand that something is an equality/disequality check and find the corresponding info in the Equivalence Classes data structure
- we can see that the expression has the form `A - B` and we can find the constraint for `B - A`
- we can see that the expression is a comparison `A op B` and check what other comparison info we have on `A` and `B` (your own change)
- we can see that the expression is of the form `A op B`, check if we know something about `A` and `B`, and produce a reasonable constraint out of this information

In order to use the right information, you should use `infer`, which will actually do all the other things as well. That's how `SymbolicRangeInferrer` is designed: to be recursive.

Speaking of recursiveness: all these loops and manual checks of the types of the cast's operand go against this pattern. Recursive visitors should call `Visit` for children nodes (like `RecursiveASTVisitor`). In other words, if f(x) is a visit function, it should be defined like this:

f(x) = g(f(x->operand_1), f(x->operand_2), ... , f(x->operand_N))

or if we talk about your case specifically:

f(x: SymbolCast) = h(f(x->Operand))

and the h function should transform the range set returned by f(x->Operand) into a range set appropriate for x.

NOTE: h can also intersect different ranges
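The recursive shape `f(x: SymbolCast) = h(f(x->Operand))` can be sketched on a toy model. Everything below is hypothetical and deliberately simplified: `Range`, `Expr`, `makeLeaf`, `makeCast`, `castToUnsigned` and `infer` are illustration-only names, not the analyzer's `RangeSet`, `SymExpr` or `SymbolicRangeInferrer`, and only unsigned target types narrower than 63 bits are modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Toy model only: these types do not mirror the analyzer's classes.
struct Range {
  std::int64_t Lo, Hi;
};

struct Expr {
  enum Kind { Leaf, Cast } K;
  Range Known;                   // Leaf: range recorded for the symbol.
  unsigned Bits;                 // Cast: width of the unsigned target type.
  std::shared_ptr<Expr> Operand; // Cast: the expression being cast.
};

std::shared_ptr<Expr> makeLeaf(Range R) {
  return std::make_shared<Expr>(Expr{Expr::Leaf, R, 0, nullptr});
}

std::shared_ptr<Expr> makeCast(unsigned Bits, std::shared_ptr<Expr> Op) {
  return std::make_shared<Expr>(
      Expr{Expr::Cast, Range{0, 0}, Bits, std::move(Op)});
}

// h: turn the operand's range into one valid for an unsigned cast result.
// Assumes Bits < 63 so the shift below cannot overflow.
Range castToUnsigned(Range R, unsigned Bits) {
  const std::int64_t Max = (std::int64_t{1} << Bits) - 1;
  if (R.Lo >= 0 && R.Hi <= Max)
    return R;           // Everything fits: truncation changes nothing.
  return Range{0, Max}; // Otherwise stay conservative: full target range.
}

// f: dispatch on the node kind and recurse into the child.
Range infer(const Expr &E) {
  if (E.K == Expr::Cast)
    return castToUnsigned(infer(*E.Operand), E.Bits);
  return E.Known;
}
```

For a nested cast such as a 16-bit cast of an 8-bit cast of a leaf, `infer` recurses to the leaf first and applies `castToUnsigned` once per cast on the way back, with no manual loop over the cast chain.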


ASDenysPetrov (author, inline comment, Done):

Thank you for the useful notes! I'll take them into account.

}

// If we didn't find any truncated ranges, look for the constraint for

vsavchenko (inline comment, Not Done):

I think all this extra logic about how we infer ranges for casts is interesting, but should be a separate patch.
For now, you can simply put `return Visit(Sym->getOperand());`.

First, it will unblock you from depending on that `RangeFactory` feature.
And I also have quite a few questions about this particular implementation, so it would hold up this patch as well.

// the root type.

// Example (cont.): Use the root symbol `{int32 x}`.

if (!RSPtr)

RSPtr = getConstraint(State, RootSym);

// If there's no existing range, create it based on the root type.

// Example (cont.): Make range based on `int32`.

RangeSet RS = RSPtr ? *RSPtr : infer(RootTy);

// Cast the range to the cast types from inner to outer one by one.

// Example (cont.): Go through 3 types from `int16` to `int8`.

auto TypesReversedRange = llvm::make_range(Types.rbegin(), Types.rend());

for (const QualType T : TypesReversedRange)

RS = RangeFactory.castTo(RS, T);

// Finally we got a range of Sym->getType() type.

// Example (cont.): Type of range is `int8`.

return RS;

}

RangeSet VisitSymExpr(SymbolRef Sym) { RangeSet VisitSymExpr(SymbolRef Sym) {

// If we got to this function, the actual type of the symbolic // If we got to this function, the actual type of the symbolic

// expression is not supported for advanced inference. // expression is not supported for advanced inference.

// In this case, we simply backoff to the default "let's simply // In this case, we simply backoff to the default "let's simply

// infer the range from the expression's type". // infer the range from the expression's type".

return infer(Sym->getType()); return infer(Sym->getType());

} }

RangeSet VisitSymIntExpr(const SymIntExpr *Sym) { RangeSet VisitSymIntExpr(const SymIntExpr *Sym) {

return VisitBinaryOperator(Sym); return VisitBinaryOperator(Sym);

} }

RangeSet VisitIntSymExpr(const IntSymExpr *Sym) { RangeSet VisitIntSymExpr(const IntSymExpr *Sym) {

return VisitBinaryOperator(Sym); return VisitBinaryOperator(Sym);

} }

RangeSet VisitSymSymExpr(const SymSymExpr *Sym) { RangeSet VisitSymSymExpr(const SymSymExpr *Sym) {

martong (inline comment, Done):

By using `llvm::Expected<T>` we would be more aligned with the LLVM error handling practices. Besides, the `bool` in the tuple and the `Success` variable in the function below would not be needed.

return intersect( return intersect(

RangeFactory, RangeFactory,

// If Sym is (dis)equality, we might have some information // If Sym is (dis)equality, we might have some information

// on that in our equality classes data structure. // on that in our equality classes data structure.

getRangeForEqualities(Sym), getRangeForEqualities(Sym),

// And we should always check what we can get from the operands. // And we should always check what we can get from the operands.

VisitBinaryOperator(Sym)); VisitBinaryOperator(Sym));

} }

[42 lines not shown; hunk context: return intersect(]

getRangeForComparisonSymbol(Sym), getRangeForComparisonSymbol(Sym),

// Apart from the Sym itself, we can infer quite a lot if we look // Apart from the Sym itself, we can infer quite a lot if we look

// into subexpressions of Sym. // into subexpressions of Sym.

Visit(Sym)); Visit(Sym));

} }

RangeSet infer(EquivalenceClass Class) { RangeSet infer(EquivalenceClass Class) {

if (const RangeSet *AssociatedConstraint = getConstraint(State, Class)) if (const RangeSet *AssociatedConstraint = getConstraint(State, Class))

return *AssociatedConstraint; return *AssociatedConstraint;

return infer(Class.getType()); return infer(Class.getType());

martong (inline comment, Done):

There might be a problem here because the iteration of the map is non-deterministic. We should probably have a copy that is sorted, or the container should be sorted (sorted immutable list maybe?).

Your tests below probably passed because the cast chains are too small. Could you please add a test where the chain is really long (20 maybe) and shuffled?
(My thanks to @steakhal for this additional comment.)

ASDenysPetrov (author, inline comment, Done):

I've checked. `ImmutableSet` gave me a sorted order. But I agree that it could be just a coincidence. I'll try to add more tests.

martong (inline comment, Done):

Yes, you are right, and I was wrong. `ImmutableSet` is based on an AVL tree, which is a balanced binary tree, and `begin()` gives an iterator that conducts an in-order, and thus sorted, traversal. And the key is an integer. (I don't know how now, but we were misled, @steakhal too. I guess we thought that the key of `CastMap` is a pointer.)
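The point about in-order traversal can be illustrated with a standard-library stand-in. The sketch below uses `std::map` (typically a red-black tree, so not LLVM's AVL-based container) purely as an analogy: with integer bit-width keys, iteration order is ascending regardless of insertion order. The function names and the string payload are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for a bit-width -> range map; the string payload is a placeholder.
using ToyCastMap = std::map<std::uint32_t, std::string>;

// Insert the widths in a deliberately shuffled order.
ToyCastMap buildShuffled() {
  ToyCastMap M;
  for (std::uint32_t W : {32u, 8u, 64u, 16u})
    M.emplace(W, "range for " + std::to_string(W) + " bits");
  return M;
}

// Collect the keys in the order the container's iterator visits them.
std::vector<std::uint32_t> keysInIterationOrder(const ToyCastMap &M) {
  std::vector<std::uint32_t> Keys;
  for (const auto &KV : M)
    Keys.push_back(KV.first);
  return Keys;
}
```

Had the key been a pointer, the traversal would still be ordered, but by address, which is what makes such iteration non-deterministic across runs.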


} }

/// Infer range information solely from the type. /// Infer range information solely from the type.

RangeSet infer(QualType T) { RangeSet infer(QualType T) {

// Lazily generate a new RangeSet representing all possible values for the // Lazily generate a new RangeSet representing all possible values for the

// given symbol type. // given symbol type.

RangeSet Result(RangeFactory, ValueFactory.getMinValue(T), RangeSet Result(RangeFactory, ValueFactory.getMinValue(T),

ValueFactory.getMaxValue(T)); ValueFactory.getMaxValue(T));

[463 lines not shown]

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// Constraint manager implementation details // Constraint manager implementation details

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

class RangeConstraintManager : public RangedConstraintManager { class RangeConstraintManager : public RangedConstraintManager {

public: public:

RangeConstraintManager(ExprEngine *EE, SValBuilder &SVB) RangeConstraintManager(ExprEngine *EE, SValBuilder &SVB)

: RangedConstraintManager(EE, SVB), F(getBasicVals()) {} : RangedConstraintManager(EE, SVB), F(getBasicVals()) {

NTL.init(SVB.getContext());

}

//===------------------------------------------------------------------===// //===------------------------------------------------------------------===//

// Implementation for interface from ConstraintManager. // Implementation for interface from ConstraintManager.

//===------------------------------------------------------------------===// //===------------------------------------------------------------------===//

bool haveEqualConstraints(ProgramStateRef S1, bool haveEqualConstraints(ProgramStateRef S1,

ProgramStateRef S2) const override { ProgramStateRef S2) const override {

// NOTE: ClassMembers are as simple as back pointers for ClassMap, // NOTE: ClassMembers are as simple as back pointers for ClassMap,

[94 lines not shown]

/// ConstraintAssignorBase is a small utility class that unifies visitor /// ConstraintAssignorBase is a small utility class that unifies visitor

/// for ranges with a visitor for constraints (rangeset/range/constant). /// for ranges with a visitor for constraints (rangeset/range/constant).

/// ///

/// It is designed to have one derived class, but generally it can have more. /// It is designed to have one derived class, but generally it can have more.

/// Derived class can control which types we handle by defining methods of the /// Derived class can control which types we handle by defining methods of the

/// following form: /// following form:

/// ///

/// bool handle${SYMBOL}To${CONSTRAINT}(const SYMBOL *Sym, /// bool assign${SYMBOL}To${CONSTRAINT}(const SYMBOL *Sym,

/// CONSTRAINT Constraint); /// CONSTRAINT Constraint);

/// ///

/// where SYMBOL is the type of the symbol (e.g. SymSymExpr, SymbolCast, etc.) /// where SYMBOL is the type of the symbol (e.g. SymSymExpr, SymbolCast, etc.)

/// CONSTRAINT is the type of constraint (RangeSet/Range/Const) /// CONSTRAINT is the type of constraint (RangeSet/Range/Const)

/// return value signifies whether we should try other handle methods /// return value signifies whether we should try other handle methods

/// (i.e. false would mean to stop right after calling this method) /// (i.e. false would mean to stop right after calling this method)

template <class Derived> class ConstraintAssignorBase { template <class Derived> class ConstraintAssignorBase {

public: public:

[101 lines not shown; hunk context: if (!Constraint.containsZero()) {]

if (!State) if (!State)

return false; return false;

} }

return true; return true;

} }

inline bool assignSymExprToConst(const SymExpr *Sym, Const Constraint); inline bool assignSymExprToConst(const SymExpr *Sym, Const Constraint);

inline bool assignSymExprToRangeSet(const SymExpr *Sym, RangeSet Constraint);

inline bool assignSymIntExprToRangeSet(const SymIntExpr *Sym, inline bool assignSymIntExprToRangeSet(const SymIntExpr *Sym,

RangeSet Constraint) { RangeSet Constraint) {

return handleRemainderOp(Sym, Constraint); return handleRemainderOp(Sym, Constraint);

} }

inline bool assignSymSymExprToRangeSet(const SymSymExpr *Sym, inline bool assignSymSymExprToRangeSet(const SymSymExpr *Sym,

RangeSet Constraint); RangeSet Constraint);

inline bool assignSymbolCastToRangeSet(const SymbolCast *Sym,

RangeSet Constraint);

private: private:

ConstraintAssignor(ProgramStateRef State, SValBuilder &Builder, ConstraintAssignor(ProgramStateRef State, SValBuilder &Builder,

RangeSet::Factory &F) RangeSet::Factory &F)

: State(State), Builder(Builder), RangeFactory(F) {} : State(State), Builder(Builder), RangeFactory(F) {}

using Base = ConstraintAssignorBase<ConstraintAssignor>; using Base = ConstraintAssignorBase<ConstraintAssignor>;

/// Base method for handling new constraints for symbols. /// Base method for handling new constraints for symbols.

LLVM_NODISCARD ProgramStateRef assign(SymbolRef Sym, RangeSet NewConstraint) { LLVM_NODISCARD ProgramStateRef assign(SymbolRef Sym, RangeSet NewConstraint) {

// All constraints are actually associated with equivalence classes, and // All constraints are actually associated with equivalence classes, and

// that's what we are going to do first. // that's what we are going to do first.

State = assign(EquivalenceClass::find(State, Sym), NewConstraint); State = assign(EquivalenceClass::find(State, Sym), NewConstraint);

if (!State) if (!State)

return nullptr; return nullptr;

vsavchenko (inline comment, Done):

That's not using `ConstraintAssignor`; you simply put your implementation in here. That won't do!

ASDenysPetrov (author, inline comment, Done):

OK, please tell me how to use it correctly in my case.

vsavchenko (inline comment, Done):

Can you read the comments first and then ask me if you have any specific questions?

ASDenysPetrov (author, inline comment, Done):

I think I did it. Could you please review the changes?

// And after that we can check what other things we can get from this // And after that we can check what other things we can get from this

// constraint. // constraint.

Base::assign(Sym, NewConstraint); Base::assign(Sym, NewConstraint);

return State; return State;

} }

/// Base method for handling new constraints for classes. /// Base method for handling new constraints for classes.

[42 lines not shown; hunk context: private:]

ProgramStateRef trackEquality(ProgramStateRef State, SymbolRef LHS, ProgramStateRef trackEquality(ProgramStateRef State, SymbolRef LHS,

SymbolRef RHS) { SymbolRef RHS) {

return EquivalenceClass::merge(RangeFactory, State, LHS, RHS); return EquivalenceClass::merge(RangeFactory, State, LHS, RHS);

} }

LLVM_NODISCARD Optional<bool> interpreteAsBool(RangeSet Constraint) { LLVM_NODISCARD Optional<bool> interpreteAsBool(RangeSet Constraint) {

assert(!Constraint.isEmpty() && "Empty ranges shouldn't get here"); assert(!Constraint.isEmpty() && "Empty ranges shouldn't get here");

if (Constraint.getConcreteValue()) if (const llvm::APSInt *Int = Constraint.getConcreteValue())

return !Constraint.getConcreteValue()->isZero(); return !Int->isZero();

if (!Constraint.containsZero()) if (!Constraint.containsZero())

return true; return true;

return llvm::None; return llvm::None;

} }

void updateExistingConstraints(SymbolRef Sym, RangeSet R);

SymbolRef getProperSymbol(SymbolRef Sym);

ProgramStateRef State; ProgramStateRef State;

SValBuilder &Builder; SValBuilder &Builder;

RangeSet::Factory &RangeFactory; RangeSet::Factory &RangeFactory;

}; };

//===----------------------------------------------------------------------===//

// ConstraintAssignor implementation details

//===----------------------------------------------------------------------===//

bool ConstraintAssignor::assignSymExprToRangeSet(const SymExpr *Sym,

martong (inline comment, Done):

Could you please rebase? The "simplification" code part has already been merged to llvm/main and is not part of this change.

RangeSet Constraint) {

AnalyzerOptions &Opts = State->getAnalysisManager().getAnalyzerOptions();

if (Opts.ShouldSupportSymbolicIntegerCasts ||

!Sym->getType()->isIntegralOrEnumerationType()) {

updateExistingConstraints(Sym, Constraint);

if (!State)

return false;

}

// The next assignments are based on the fact that Constraint is a concrete value.

// Make sure of this.

if (!Constraint.getConcreteValue())

return true;

llvm::SmallSet<EquivalenceClass, 4> SimplifiedClasses;

// Iterate over all equivalence classes and try to simplify them.

ClassMembersTy Members = State->get<ClassMembers>();

for (std::pair<EquivalenceClass, SymbolSet> ClassToSymbolSet : Members) {

EquivalenceClass Class = ClassToSymbolSet.first;

State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);

if (!State)

return false;

SimplifiedClasses.insert(Class);

}

// Trivial equivalence classes (those that have only one symbol member) are

// not stored in the State. Thus, we must skim through the constraints as

// well. And we try to simplify symbols in the constraints.

ConstraintRangeTy Constraints = State->get<ConstraintRange>();

for (std::pair<EquivalenceClass, RangeSet> ClassConstraint : Constraints) {

EquivalenceClass Class = ClassConstraint.first;

if (SimplifiedClasses.count(Class)) // Already simplified.

continue;

State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);

if (!State)

return false;

}

return true;

martongUnsubmitted

Done

I think this hunk should remain in assignSymExprToConst. Why did you move it?

ASDenysPetrovAuthorUnsubmitted

Done

I'll remove it. It's unrelated.

}

bool ConstraintAssignor::assignSymbolCastToRangeSet(const SymbolCast *Sym,

RangeSet R) {

AnalyzerOptions &Opts = State->getAnalysisManager().getAnalyzerOptions();

// If symbol is not integral or the option is off, we need another handler.

if (!Opts.ShouldSupportSymbolicIntegerCasts ||

!Sym->getType()->isIntegralOrEnumerationType())

return false;

// If range is empty, the branch is infeasible.

if (R.isEmpty()) {

State = nullptr;

return false;

}

SymbolRef S = getProperSymbol(Sym);

// If symbol is not integral, we need another handler.

if (!S)

return true;

R = RangeFactory.castTo(R, S->getType());

updateExistingConstraints(S, R);

State = setConstraint(State, S, R);

return false;

}

/// Return a symbol which is the best canidate to save it in the constraint

/// map. We should correct symbol because in case of truncation cast we can

martongUnsubmitted

Done

/// Return a symbol which is the best canidate to save it in the constraint

- /// map. We should correct symbol because in case of truncation cast we can

+ /// map. We should correct the symbol because in case of truncation cast we can

/// only reason about truncated bytes but not the whole value. E.g. (char)(int

/// only reason about truncated bytes but not the whole value. E.g. (char)(int

/// x), we can store constraints for the first lower byte but we still don't

/// know the root value. Also in case of promotion or converion we should

martongUnsubmitted

Done

/// x), we can store constraints for the first lower byte but we still don't

- /// know the root value. Also in case of promotion or converion we should

+ /// know the root value. Also, in case of promotion or conversion we should

/// store the root value instead of cast symbol, because we can always get

/// store the root value instead of cast symbol, because we can always get

martongUnsubmitted

Done

/// know the root value. Also in case of promotion or converion we should

- /// store the root value instead of cast symbol, because we can always get

+ /// store the root value instead of the cast symbol because we can always get

/// a correct range using `castTo` metho. And we are not intrested in any

/// a correct range using `castTo` metho. And we are not intrested in any

martongUnsubmitted

Done

/// store the root value instead of cast symbol, because we can always get

- /// a correct range using `castTo` metho. And we are not intrested in any

+ /// a correct range using the `castTo` method. And we are not interested in any

/// constraints of cast symbol but the root symbol in `if` expression

/// constraints of cast symbol but the root symbol in `if` expression

/// or any bifurcation. We can return:

/// - a new symbol based on the root, in case of a truncation,

/// - a root symbol if it is not a truncation.

martongUnsubmitted

Done

I think we should definitely store the constraints as they appear in the analyzed code. The current approach mixes the inference logic into the constraint setting, which is bad.
I mean, we should simply store the constraint directly for the symbol as it is. Only in VisitSymbolCast should we then infer the proper value from the stored constraint (if we can).

(Of course, if we have related symbols (casts of the original symbol), then their constraints must be updated.)

ASDenysPetrovAuthorUnsubmitted

Done

I see what you mean. I thought about this. Here is what I ran into.

Let's say you meet (wchar_t)x > 0, which you store as a pair {(wchar_t)x, [1,32767]}.
Then you meet (short)x < 0, where you have to answer whether it's true or false.
So what would be your next step? Brute-forcing all the constraint pairs to find any x-related symbol? Obviously, O(n) complexity for every single integral symbol is inefficient.

What I propose is to "canonize" arbitrary types to a general form, where this form could be part of the key along with x, so that we could get a constraint with classic map complexity. So that:

You meet (wchar_t)x > 0, convert wchar_t to int16, and store the pair {(int16)x, [1,32767]}.
Then you meet (short)x < 0, convert short to int16, and get the constraint.

That's why I've introduced NominalTypeList.
But I now accept your concern about arbitrary integer sizes and will redesign my solution.
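
The canonization idea in this comment can be sketched as a table keyed by the root symbol and the bit width of the cast type. This is only an illustration under stated assumptions, not the patch's implementation: `Interval`, `CastKey`, `CanonicalConstraints`, and `knownNonNegative16` are hypothetical names, symbols are modeled as plain strings, and `wchar_t` is assumed to be 16 bits wide as in the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins, not analyzer types: a symbol is a name and a
// constraint is one closed interval.
struct Interval {
  int64_t Lo;
  int64_t Hi;
};

// (root symbol, bit width of the cast type). Casts to different types of the
// same width, e.g. a 16-bit wchar_t and short, share one key.
using CastKey = std::pair<std::string, uint32_t>;

class CanonicalConstraints {
  std::map<CastKey, Interval> Table;

public:
  void set(const std::string &Root, uint32_t BitWidth, Interval R) {
    assert(BitWidth > 0 && "a cast type must have a non-zero width");
    Table[CastKey(Root, BitWidth)] = R;
  }

  // One ordered-map lookup; returns nullptr when nothing is stored.
  const Interval *get(const std::string &Root, uint32_t BitWidth) const {
    auto It = Table.find(CastKey(Root, BitWidth));
    return It == Table.end() ? nullptr : &It->second;
  }
};

// Answers whether the 16-bit view of Root is known to be non-negative.
bool knownNonNegative16(const CanonicalConstraints &C,
                        const std::string &Root) {
  const Interval *R = C.get(Root, 16);
  return R && R->Lo >= 0;
}
```

With this layout, the range recorded for (wchar_t)x > 0 is stored under ("x", 16), and the later question about (short)x < 0 is answered from the same entry without scanning unrelated constraints.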

martongUnsubmitted

Done

> So what would be your next step? Brute-forcing all the constraint pairs to find any x-related symbol? Obviously, O(n) complexity for every single integral symbol is inefficient.

I don't think we need a brute-force search among the constraints if we have the additional CastMap (which I have previously proposed).
So, the next step would be: look up the root symbol of (short)x, that is (wchar_t)x (supposing we have seen a declaration wchar_t x; first during the symbolic execution, but having a different root might work out as well).
Then from the CastMap we can query in O(1) the set of the related cast symbols of the root symbol. For each member of this set, we then query its constraint from the existing equivalenceClass->constraint mapping. That's O(1) times the number of members of the cast set (which is assumed to be very small in the usual case).
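
The lookup described in this comment can be sketched roughly as follows. This is an assumption-laden toy, not analyzer code: `ToyState`, `addConstraint`, and `relatedConstraints` are hypothetical names, symbols are plain strings, and constraints are stored per symbol rather than per equivalence class.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a range constraint.
struct Interval {
  int64_t Lo;
  int64_t Hi;
};

struct ToyState {
  // Root symbol -> cast symbols seen for it (the proposed CastMap).
  std::unordered_map<std::string, std::set<std::string>> CastMap;
  // Symbol -> constraint, stored exactly as it appeared in the code.
  std::unordered_map<std::string, Interval> Constraints;
};

// Stores the constraint for Sym as-is and remembers that Sym is a cast of Root.
void addConstraint(ToyState &S, const std::string &Root,
                   const std::string &Sym, Interval R) {
  assert(!Sym.empty() && "symbol name must not be empty");
  if (Sym != Root)
    S.CastMap[Root].insert(Sym);
  S.Constraints[Sym] = R;
}

// One hash lookup for the cast set, then one hash lookup per member.
std::vector<std::pair<std::string, Interval>>
relatedConstraints(const ToyState &S, const std::string &Root) {
  std::vector<std::pair<std::string, Interval>> Result;
  auto It = S.CastMap.find(Root);
  if (It == S.CastMap.end())
    return Result;
  for (const std::string &CastSym : It->second) {
    auto C = S.Constraints.find(CastSym);
    if (C != S.Constraints.end())
      Result.push_back(std::make_pair(CastSym, C->second));
  }
  return Result;
}
```

The cost is proportional to the number of casts recorded for one root, not to the total number of constraints in the state.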

///

/// \param Sym -- a given symbol.

/// \returns a corrected symbol based on a given one. Symbol is null if the

/// given symbol is unsupported. We support only integral casts.

SymbolRef ConstraintAssignor::getProperSymbol(SymbolRef Sym) {

// We don't need to do any extra work for non-SymbolCast's.

if (!isa<SymbolCast>(Sym))

return Sym;

// Extract a root symbol and compare it to outer types.

ASTContext &C = Builder.getContext();

SymbolRef RootSym = Sym;

// Get the root symbol.

uint32_t MinBitWidth = UINT32_MAX;

do {

// We only handle integral casts, where all the types are integral.

// Return a null symbol otherwise to notify the caller that we cannot

// handle a non-integral SymbolCast.

QualType T = RootSym->getType();

if (!T->isIntegralOrEnumerationType())

return nullptr;

MinBitWidth = std::min(MinBitWidth, C.getIntWidth(T));

RootSym = cast<SymbolCast>(RootSym)->getOperand();

} while (isa<SymbolCast>(RootSym));

// Check for truncation.

QualType RootTy = RootSym->getType();

uint32_t RootBitWidth = C.getIntWidth(RootTy);

const bool IsTruncated = (MinBitWidth < RootBitWidth);

if (IsTruncated) {

// Trancation occurred. High bits lost. We can't reason about ranges of

martongUnsubmitted

Done

if (IsTruncated) {

- // Trancation occurred. High bits lost. We can't reason about ranges of

+ // Truncation occurred. High bits lost. We can't reason about ranges of

// the original(root) operand in this case, so we should not add it to the

// the original(root) operand in this case, so we should not add it to the

// constraint map. Canonize Sym instead.

// We produce a new symbol using a NTL type equals to the smallest type of

// Sym. For instance:

// - (int)(uchar)x -> (char8)x

// - (long)(ushort)(short)x -> (char16)x

// Produce a new SymbolCast.

CanQualType Ty = *NTL.findByWidth(MinBitWidth);

RootSym = State->getSymbolManager().getCastSymbol(RootSym, RootTy, Ty);

}

return RootSym;

}

/// Update existing constraints for all truncated SymbolCasts of the

/// given symbol whose types are narrower than the current one.

/// For instance, for Sym:

/// - {int8 x} update nothing;

/// - {int16 x} update (int8)x;

/// - {int32 x} update (int8)x, (int16)x;

/// - {int64 x} update (int8)x, (int16)x, (int32)x.

///

/// FIXME: Update bigger casts. We only can reason about ranges of smaller

/// types, because it would be too complicated to update, say, the entire

/// `int` range if you only have knowledge that its lowest byte has been

/// changed. So we don't touch bigger casts and they may be potentially

/// invalid. For future, for:

martongUnsubmitted

Done

Instead of a no-op we should be more conservative in this case. We should invalidate (remove) the constraints of all the symbols that have more bits than the currently set symbol. However, we might be clever in cases of special values (e.g. 0, or the true RangeSet {[MIN, -1], [1, MAX]}).

ASDenysPetrovAuthorUnsubmitted

Done

No, that's incorrect. Consider the following:

int x;
if(x > 1000000 || x < 100000) 
  return;
// x (100'000, 1000'000) 
if((int8)x != 42) 
  return;
// x (100'000, 1000'000) && (int8)x (42, 42)

We can't just remove or invalidate x (100'000, 1000'000) because this range still holds.
Strictly speaking, the range of x should be updated to the values 100394, 102442, 108586, ..., 960554 and every other value within the range whose lowest byte equals 42.
We can't update the RangeSet with that many values because of the performance cost. So we accept a less accurate range.
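
To put a number on "that many values", here is a small arithmetic sketch (the helper names `firstWithLowByte` and `countWithLowByte` are made up for illustration and handle non-negative ranges only). It counts the integers in a closed range whose lowest byte equals a given value, which is the number of single-point ranges an exact RangeSet for x would need.

```cpp
#include <cassert>
#include <cstdint>

// Smallest V >= Lo with V % 256 == Byte (non-negative Lo only).
int64_t firstWithLowByte(int64_t Lo, uint8_t Byte) {
  assert(Lo >= 0 && "this sketch handles non-negative values only");
  int64_t First = Lo - (Lo % 256) + Byte;
  return First < Lo ? First + 256 : First;
}

// Number of V in [Lo, Hi] with V % 256 == Byte.
int64_t countWithLowByte(int64_t Lo, int64_t Hi, uint8_t Byte) {
  assert(Lo <= Hi && "empty range");
  int64_t First = firstWithLowByte(Lo, Byte);
  if (First > Hi)
    return 0;
  // Matching values are spaced exactly 256 apart.
  return (Hi - First) / 256 + 1;
}
```

For the range in the example, `countWithLowByte(100000, 1000000, 42)` gives 3516 matching values, the first being 100138, so an exact representation would need thousands of disjoint ranges for this single constraint.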

martongUnsubmitted

Done

Okay, this makes perfect sense, thanks for the example!

/// - {int8 x} update (int16)x, (int32)x, (int64)x;

/// - {int16 x} update (int32)x, (int64)x;

/// - {int32 x} update (int64)x;

/// - {int64 x} update nothing.

///

/// \param State -- current program state.

/// \param Sym -- a considered symbol.

/// \param R -- a known range for the given symbol.

/// \note: needs check of null state after use.

void ConstraintAssignor::updateExistingConstraints(SymbolRef Sym, RangeSet R) {

unsigned SymBitWidth = Builder.getContext().getIntWidth(Sym->getType());

// Get a root symbol in case of SymbolCast.

Sym = Sym->ignoreCasts();

QualType SymTy = Sym->getType();

auto SmallerNTLTypes =

llvm::make_range(NTL.begin(), NTL.findByWidth(SymBitWidth));

SymbolManager &SM = State->getSymbolManager();

for (const QualType T : SmallerNTLTypes) {

// Use the NTL type to create a canonical SymbolCast to find an existing

// constraint.

SymbolRef S = SM.getCastSymbol(Sym, SymTy, T);

// If such constraint is found, update it by intersecting.

if (const RangeSet *RS = getConstraint(State, S)) {

RangeSet TruncR = RangeFactory.castTo(R, T);

TruncR = RangeFactory.intersect(*RS, TruncR);

// If the intersection is empty, then the branch is infeasible.

if (TruncR.isEmpty()) {

State = nullptr;

break;

}

// Update the constraint.

State = setConstraint(State, S, TruncR);

}

bool ConstraintAssignor::assignSymExprToConst(const SymExpr *Sym,

const llvm::APSInt &Constraint) {

llvm::SmallSet<EquivalenceClass, 4> SimplifiedClasses;

// Iterate over all equivalence classes and try to simplify them.

ClassMembersTy Members = State->get<ClassMembers>();

for (std::pair<EquivalenceClass, SymbolSet> ClassToSymbolSet : Members) {

EquivalenceClass Class = ClassToSymbolSet.first;

▲ Show 20 Lines • Show All 94 Lines • ▼ Show 20 Lines

LLVM_DUMP_METHOD void EquivalenceClass::dumpToStream(ProgramStateRef State,

raw_ostream &os) const {

SymbolSet ClassMembers = getClassMembers(State);

for (const SymbolRef &MemberSym : ClassMembers) {

MemberSym->dump();

os << "\n";

}

inline EquivalenceClass EquivalenceClass::find(ProgramStateRef State,

SymbolRef Sym) {

assert(State && "State should not be null");

martongUnsubmitted

Done

Same here.

martongUnsubmitted

Done

Please disregard the comment above.

assert(Sym && "Symbol should not be null");

// We store far from all Symbol -> Class mappings

if (const EquivalenceClass *NontrivialClass = State->get<ClassMap>(Sym))

return *NontrivialClass;

// This is a trivial class of Sym.

return Sym;

}

▲ Show 20 Lines • Show All 710 Lines • ▼ Show 20 Lines

// The syntax for ranges below is mathematical, using [x, y] for closed ranges

// and (x, y) for open ranges. These ranges are modular, corresponding with

// a common treatment of C integer overflow. This means that these methods

// do not have to worry about overflow; RangeSet::Intersect can handle such a

// "wraparound" range.

// As an example, the range [UINT_MAX-1, 3) contains five values: UINT_MAX-1,

// UINT_MAX, 0, 1, and 2.

ProgramStateRef

RangeConstraintManager::assumeSymNE(ProgramStateRef St, SymbolRef Sym,

const llvm::APSInt &Int,

const llvm::APSInt &Adjustment) {

// Before we do any real work, see if the value can even show up.

APSIntType AdjustmentType(Adjustment);

if (AdjustmentType.testInRange(Int, true) != APSIntType::RTR_Within)

return St;

llvm::APSInt Point = AdjustmentType.convert(Int) - Adjustment;

RangeSet New = getRange(St, Sym);

New = F.deletePoint(New, Point);

return setRange(St, Sym, New);

}

vsavchenkoUnsubmitted

Done

No, please, remove duplication by putting it inside of the constraint assignor. It is designed specifically so we don't duplicate code around assumeSymXX functions.

ASDenysPetrovAuthorUnsubmitted

Done

+1. That's what I've recently thought about. :)

ProgramStateRef

RangeConstraintManager::assumeSymEQ(ProgramStateRef St, SymbolRef Sym,

const llvm::APSInt &Int,

const llvm::APSInt &Adjustment) {

// Before we do any real work, see if the value can even show up.

APSIntType AdjustmentType(Adjustment);

if (AdjustmentType.testInRange(Int, true) != APSIntType::RTR_Within)

return nullptr;

// [Int-Adjustment, Int-Adjustment]

llvm::APSInt AdjInt = AdjustmentType.convert(Int) - Adjustment;

RangeSet New = getRange(St, Sym);

New = F.intersect(New, AdjInt);

return setRange(St, Sym, New);

}

RangeSet RangeConstraintManager::getSymLTRange(ProgramStateRef St,

SymbolRef Sym,

const llvm::APSInt &Int,

const llvm::APSInt &Adjustment) {

// Before we do any real work, see if the value can even show up.

APSIntType AdjustmentType(Adjustment);

switch (AdjustmentType.testInRange(Int, true)) {

case APSIntType::RTR_Below:

return F.getEmptySet();

case APSIntType::RTR_Within:

break;

case APSIntType::RTR_Above:

return getRange(St, Sym);

}

// Special case for Int == Min. This is always false.

llvm::APSInt ComparisonVal = AdjustmentType.convert(Int);

llvm::APSInt Min = AdjustmentType.getMinValue();

if (ComparisonVal == Min)

return F.getEmptySet();

llvm::APSInt Lower = Min - Adjustment;

llvm::APSInt Upper = ComparisonVal - Adjustment;

--Upper;

vsavchenkoUnsubmitted

Done

What's the point of this when you do the reverse operation in Inferrer?

As far as I understand, in VisitSymbolCast you iterate over larger types and see if the same symbol was cast to any of those, and if so, you truncate the result and use that range.
Here, when we are about to set the constraint for a cast symbol, you iterate over smaller types, truncate this range for a smaller type, construct a cast to that smaller type, and add a constraint for that symbol as well.

So, if this is correct, these two pieces of code DO THE SAME WORK and ONLY ONE should remain.
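
The "truncate this range for a smaller type" step that both code paths rely on can be illustrated with a standalone sketch. It is not `RangeSet::Factory::castTo`; `Interval`, `wrapSigned`, and `truncateSigned` are hypothetical helpers that handle one closed interval, signed targets of at most 32 bits, and inputs small enough that `Hi - Lo` does not overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one closed range.
struct Interval {
  int64_t Lo;
  int64_t Hi;
};

// Wraps V into the signed Bits-wide range (two's complement truncation).
int64_t wrapSigned(int64_t V, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 32 && "sketch supports 1..32 bits");
  const int64_t Mod = int64_t(1) << Bits;
  int64_t R = ((V % Mod) + Mod) % Mod; // now in [0, Mod)
  return R >= Mod / 2 ? R - Mod : R;
}

// Truncates [Lo, Hi] to a signed Bits-wide type. The result is one interval,
// or two when the truncated values wrap past the signed maximum.
std::vector<Interval> truncateSigned(Interval R, unsigned Bits) {
  assert(R.Lo <= R.Hi && "empty range");
  const int64_t Mod = int64_t(1) << Bits;
  const int64_t Min = -(Mod / 2);
  const int64_t Max = Mod / 2 - 1;
  // The range covers every residue, so nothing is known after truncation.
  if (R.Hi - R.Lo >= Mod - 1)
    return std::vector<Interval>{Interval{Min, Max}};
  const int64_t Lo = wrapSigned(R.Lo, Bits);
  const int64_t Hi = wrapSigned(R.Hi, Bits);
  if (Lo <= Hi)
    return std::vector<Interval>{Interval{Lo, Hi}};
  return std::vector<Interval>{Interval{Min, Hi}, Interval{Lo, Max}};
}
```

As a check against the patch's test file, `truncateSigned(Interval{5308, 5419}, 8)` yields the single interval [-68, 43], matching the `int[5308, 5419] -> char[-68, 43]` comment there.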

RangeSet Result = getRange(St, Sym);

return F.intersect(Result, Lower, Upper);

vsavchenkoUnsubmitted

Done

I need more explanation why we have this function and why we call it where we call it. Additionally, it again looks like it belongs in a separate patch.

}

ProgramStateRef

RangeConstraintManager::assumeSymLT(ProgramStateRef St, SymbolRef Sym,

const llvm::APSInt &Int,

const llvm::APSInt &Adjustment) {

RangeSet New = getSymLTRange(St, Sym, Int, Adjustment);

return setRange(St, Sym, New);

▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines

RangeSet RangeConstraintManager::getSymGERange(ProgramStateRef St,

const llvm::APSInt &Int,

const llvm::APSInt &Adjustment) {

// Before we do any real work, see if the value can even show up.

APSIntType AdjustmentType(Adjustment);

switch (AdjustmentType.testInRange(Int, true)) {

case APSIntType::RTR_Below:

return getRange(St, Sym);

case APSIntType::RTR_Within:

break;

case APSIntType::RTR_Above:

return F.getEmptySet();

vsavchenkoUnsubmitted

Done

OK, but I still don't understand one thing.
Here you go over all "smaller" types and artificially create constraints for them, and at the same time in VisitSymbolCast you do the opposite operation? Why? Shouldn't the map have constraints for smaller types already because of this action? Why do we need to do both?

ASDenysPetrovAuthorUnsubmitted

Done

I was preparing an answer for you, but you suddenly inspired me to make some improvements. Thanks.

ASDenysPetrovAuthorUnsubmitted

Done

I've fixed RangeConstraintManager::updateExistingConstraints. There was a mistake: I updated the smaller types starting from the root symbol, but the correct symbol is the given one, as it is before calling ignoreCasts().
Maybe it is clearer now.

}

// Special case for Int == Min. This is always feasible.

vsavchenkoUnsubmitted

Done

This looks like a pattern, and we should probably make it into a method of SymbolCast.

ASDenysPetrovAuthorUnsubmitted

Done

I did it :) but then dropped it. It would just turn into:

if (isa<SymbolCast>(Sym))
  Sym = cast<SymbolCast>(Sym)->getRootOperand();

It looks pretty much the same and brings no benefit IMO, does it?
Every time I used getRootOperand I also needed an additional traversal through the types to get some other information, so I couldn't avoid the while loop there. So I decided not to introduce a new method in SymbolCast.

vsavchenkoUnsubmitted

Done

Aha, I see your point. I guess we can move it into SymExpr and name it not getRootOperand, which won't tell much to a person reading the name, but something like ignoreCasts. It will fit well with Expr::IgnoreCasts, Expr::IgnoreParens, etc.
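
A toy model may help show what such a helper does and why the author still needed a separate traversal for the cast widths. The sketch below is not the analyzer's `SymExpr`: `ToySym`, `minCastWidth`, and `isTruncated` are hypothetical, and a cast is simply a node with a non-null operand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical miniature of a symbol chain: Operand is non-null for a cast.
struct ToySym {
  const char *Name;
  uint32_t BitWidth;
  const ToySym *Operand;
};

// Strips every cast layer and returns the root symbol.
const ToySym *ignoreCasts(const ToySym *S) {
  assert(S && "symbol must not be null");
  while (S->Operand)
    S = S->Operand;
  return S;
}

// Narrowest width among the cast layers (the root itself is excluded).
// Returns UINT32_MAX when S is not a cast at all.
uint32_t minCastWidth(const ToySym *S) {
  assert(S && "symbol must not be null");
  uint32_t Min = UINT32_MAX;
  for (; S->Operand; S = S->Operand)
    Min = std::min(Min, S->BitWidth);
  return Min;
}

// A chain truncates when any cast layer is narrower than the root.
bool isTruncated(const ToySym *S) {
  return minCastWidth(S) < ignoreCasts(S)->BitWidth;
}
```

For a chain modeling (int)(uchar)x over a 32-bit x, `ignoreCasts` returns x while `minCastWidth` reports 8, so the chain counts as truncating even though the outermost type is as wide as the root.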

ASDenysPetrovAuthorUnsubmitted

Done

Nice idea! True, getRootOperand would only be meaningful to a reader within the scope of SymbolCast. I'll try to implement this in the next update.

llvm::APSInt ComparisonVal = AdjustmentType.convert(Int);

llvm::APSInt Min = AdjustmentType.getMinValue();

if (ComparisonVal == Min)

return getRange(St, Sym);

llvm::APSInt Max = AdjustmentType.getMaxValue();

llvm::APSInt Lower = ComparisonVal - Adjustment;

llvm::APSInt Upper = Max - Adjustment;

▲ Show 20 Lines • Show All 270 Lines • Show Last 20 Lines

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp

Show First 20 Lines • Show All 526 Lines • ▼ Show 20 Lines
case nonloc::ConcreteIntKind: {
default:
return makeSymExprValNN(op, InputLHS, InputRHS, resultTy);
}
}
case nonloc::SymbolValKind: {
// We only handle LHS as simple symbols or SymIntExprs.
SymbolRef Sym = lhs.castAs<nonloc::SymbolVal>().getSymbol();

// Unwrap SymbolCast trying to find SymIntExpr inside.
SymbolRef S = Sym->ignoreCasts();

// LHS is a symbolic expression.
- if (const SymIntExpr *symIntExpr = dyn_cast<SymIntExpr>(Sym)) {
+ if (const SymIntExpr *symIntExpr = dyn_cast<SymIntExpr>(S)) {

// Is this a logical not? (!x is represented as x == 0.)
if (op == BO_EQ && rhs.isZeroConstant()) {
// We know how to negate certain expressions. Simplify them here.

BinaryOperator::Opcode opc = symIntExpr->getOpcode();
switch (opc) {
default:
▲ Show 20 Lines • Show All 687 Lines • Show Last 20 Lines

clang/lib/StaticAnalyzer/Core/SymbolManager.cpp

Show First 20 Lines • Show All 537 Lines • ▼ Show 20 Lines
if (Store store = reapedStore.getStore()) {
return hasRegion;
}

return false;
}

return VarContext->isParentOf(CurrentContext);
}

SymbolRef SymExpr::ignoreCasts() const {
  SymbolRef Sym = this;
  while (isa<SymbolCast>(Sym))
    Sym = cast<SymbolCast>(Sym)->getOperand();
  return Sym;
}

clang/test/Analysis/symbol-integral-cast.cpp

This file was added.

// RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection -analyzer-config eagerly-assume=false -analyzer-config support-symbolic-integer-casts=true -verify %s

template <typename T>

void clang_analyzer_eval(T);

void clang_analyzer_warnIfReached();

typedef short int16_t;

typedef int int32_t;

typedef unsigned short uint16_t;

typedef unsigned int uint32_t;

void test1(int x) {

// Even if two lower bytes of `x` equal to zero, it doesn't mean that

// the entire `x` is zero. We are not able to know the exact value of x.

// It can be one of 65536 possible values like [0, 65536, 131072, ...]

// and so on. To avoid huge range sets we still assume `x` in the range

// [INT_MIN, INT_MAX].

if (!(short)x) {

if (!x)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

}

void test2(int x) {

// If the two lower bytes of `x` are zero and we know `x` to be 65537,

// which does not truncate to zero as a short, then the branch is infeasible.

short s = x;

if (!s) {

if (x == 65537)

clang_analyzer_warnIfReached(); // no-warning

martongUnsubmitted

Done

if (!s) {

- if (x == 65537)

+ if (x == 65537 || x == 131073)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

}

martongUnsubmitted

Not Done

These two tests are redundant because they are handled by the Parent patch I've just created. https://reviews.llvm.org/D126481

void test3(int x, short s) {

s = x;

if ((short)x > -10 && s < 10) {

if (x > 0 && x < 10) {

// If the range of the whole variable was constrained then reason again

// about truncated bytes to make the ranges more precise.

clang_analyzer_eval((short)x <= 0); // expected-warning {{FALSE}}

}

void test4(unsigned x) {

if ((char)x > 8) {

// Constrain the range of the lowest byte of `x` to [9, CHAR_MAX].

// The original range of `x` still remains [0, UINT_MAX].

clang_analyzer_eval((char)x < 42); // expected-warning {{UNKNOWN}}

if (x < 42) {

// Constrain the original range to [0, 41] and update (re-constrain)

// the range of the lowest byte of `x` to [9, 41].

clang_analyzer_eval((char)x < 42); // expected-warning {{TRUE}}

}

void test5(unsigned x) {

if ((char)x > -10 && (char)x < 10) {

if ((short)x == 8) {

// If the range of higher bytes(short) was constrained then reason again

// about smaller truncated ranges(char) to make it more precise.

clang_analyzer_eval((char)x == 8); // expected-warning {{TRUE}}

clang_analyzer_eval((short)x == 8); // expected-warning {{TRUE}}

// We still assume the full `x` to be in the range [0, UINT_MAX].

clang_analyzer_eval(x == 8); // expected-warning {{UNKNOWN}}

}

void test6(int x) {

// Even if two lower bytes of `x` less than zero, it doesn't mean that `x`

// can't be greater than zero. Thence we don't change the native range of

// `x` and this branch is feasible.

if (x > 0)

if ((short)x < 0)

clang_analyzer_eval(x > 0); // expected-warning {{TRUE}}

}

void test7(int x) {

// The range of two lower bytes of `x` [1, SHORT_MAX] is enough to cover

// all possible values of char [CHAR_MIN, CHAR_MAX]. So the lowest byte

// can be lower than zero.

if ((short)x > 0) {

if ((char)x < 0)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

}

void test8(int x) {

// Promotion from `signed int` to `signed long long` also lets us reason about the

// original range, because we know the fact that even after promotion it

// remains in the range [INT_MIN, INT_MAX].

if ((long long)x < 0)

clang_analyzer_eval(x < 0); // expected-warning {{TRUE}}

}

void test9(signed int x) {

// Any cast `signed` to `unsigned` produces an unsigned range, which is

// [0, UNSIGNED_MAX] and can not be lower than zero.

if ((unsigned long long)x < 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

if ((unsigned int)x < 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

if ((unsigned short)x < 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

if ((unsigned char)x < 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

}

void test10(unsigned int x, signed char sc) {

// Promotion from `unsigned` to `signed` produces a signed range,

// which is able to cover all the values of the original,

// so that such cast is not lower than zero.

if ((signed long long)x < 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

// Any other cast(conversion or truncation) from `unsigned` to `signed`

// produces a signed range, which is [SIGNED_MIN, SIGNED_MAX]

// and can be lower than zero.

if ((signed int)x < 0) // explicit cast

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

signed short ss = x; // initialization

if (ss < 0)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

sc = x; // assignment

if (sc < 0)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

}

void test11(unsigned int x) {

// Promotion from `unsigned` to `signed` entirely covers the original range.

// Hence the result of such a cast is never lower than zero and the `true`

// branch is infeasible. But it doesn't affect the original range, which still

// remains [0, UNSIGNED_MAX].

if ((signed long long)x < 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_eval(x < 0); // expected-warning {{FALSE}}

// Any other cast (conversion or truncation) from `unsigned` to `signed`

// produces a signed range, which is [SIGNED_MIN, SIGNED_MAX]. But it doesn't

// affect the original range, which still remains as [0, UNSIGNED_MAX].

if ((signed int)x < 0)

clang_analyzer_eval(x < 0); // expected-warning {{FALSE}}

if ((signed short)x < 0)

clang_analyzer_eval(x < 0); // expected-warning {{FALSE}}

if ((signed char)x < 0)

clang_analyzer_eval(x < 0); // expected-warning {{FALSE}}

}

void test12(int x, char c) {

if (x >= 5308) {

if (x <= 5419) {

// Truncation on assignment: int[5308, 5419] -> char[-68, 43]

c = x;

clang_analyzer_eval(-68 <= c && c <= 43); // expected-warning {{TRUE}}

if (c < 50)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // no-warning

// Truncation on initialization: int[5308, 5419] -> char[-68, 43]

char c1 = x;

clang_analyzer_eval(-68 <= c1 && c1 <= 43); // expected-warning {{TRUE}}

}

}

}
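
// Illustration only, not part of the patch: a standalone sketch that checks by
// brute force the range arithmetic stated in test12. It assumes `char` is a
// signed, 8-bit, two's-complement type, as on the target the test is written
// for. The names `truncateToInt8`, `Bounds`, `truncatedBounds` and
// `checkTest12Range` are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: keep the low 8 bits of Value and reinterpret them as a
// two's-complement signed byte, which mirrors `char c = x;` on targets where
// `char` is a signed 8-bit type.
inline int truncateToInt8(int Value) {
  unsigned LowByte = static_cast<unsigned>(Value) & 0xFFu;
  return LowByte < 128u ? static_cast<int>(LowByte)
                        : static_cast<int>(LowByte) - 256;
}

struct Bounds {
  int Min;
  int Max;
};

// Truncates every value in [From, To] and returns the smallest and largest
// result. Requires From <= To and To < INT_MAX so the counter cannot overflow.
inline Bounds truncatedBounds(int From, int To) {
  Bounds B = {truncateToInt8(From), truncateToInt8(From)};
  for (int V = From; V <= To; ++V) {
    int T = truncateToInt8(V);
    if (T < B.Min)
      B.Min = T;
    if (T > B.Max)
      B.Max = T;
  }
  return B;
}

// int[5308, 5419] is 0x14BC..0x152B: the low byte runs 0xBC..0xFF (-68..-1)
// and then wraps to 0x00..0x2B (0..43).
inline void checkTest12Range() {
  Bounds B = truncatedBounds(5308, 5419);
  assert(B.Min == -68 && B.Max == 43);
}
```

// The bounds alone do not prove the result is one contiguous range. Here it
// is, because the 112 consecutive inputs map to 112 consecutive byte values.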

void test13(int x) {

if (x > 913440767 && x < 913440769) { // 0x36720000

if ((short)x) // Truncation: int[913440768] -> short[0]

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

if ((short)x != 0)

clang_analyzer_warnIfReached(); // no-warning

else

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

}

}

void test14(int x) {

if (x >= -1569193983 && x <= 578290016) {

// The big range of `x` covers all possible values of short.

// Truncation: int[-1569193983, 578290016] -> short[-32768, 32767]

if ((short)x > 0) {

clang_analyzer_eval(-1569193983 <= x && x <= 578290016); // expected-warning {{TRUE}}

short s = x;

clang_analyzer_eval(-32768 <= s && s <= 32767); // expected-warning {{TRUE}}

}

}

}

void test15(int x) {

if (x >= -1569193983 && x <= -1569193871) { // [0xA2780001, 0xA2780071]

// The small range of `x` covers only several values of short.

// Truncation: int[-1569193983, -1569193871] -> short[1, 113]

if ((short)x)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // no-warning

if ((short)x > 0)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // no-warning

if ((short)x < 114)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // no-warning

}

}

void test16(char x) {

if (x < 0)

clang_analyzer_eval(-128 <= x && x < 0); // expected-warning {{TRUE}}

else

clang_analyzer_eval(0 <= x && x <= 127); // expected-warning {{TRUE}}

}

void test17(char x) {

if (-11 <= x && x <= -10) {

unsigned u = x;

// Conversion: char[-11, -10] -> unsigned int[4294967285, 4294967286]

clang_analyzer_eval(4294967285 <= u && u <= 4294967286); // expected-warning {{TRUE}}

unsigned short us = x;

// Conversion: char[-11, -10] -> unsigned short[65525, 65526]

clang_analyzer_eval(65525 <= us && us <= 65526); // expected-warning {{TRUE}}

unsigned char uc = x;

// Conversion: char[-11, -10] -> unsigned char[245, 246]

clang_analyzer_eval(245 <= uc && uc <= 246); // expected-warning {{TRUE}}

}

}
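
// Illustration only, not part of the patch: the unsigned ranges in test17
// follow from the rule that converting to an N-bit unsigned type reduces the
// value modulo 2^N. The sketch below checks the three ranges under that rule.
// `wrapToUnsigned` and `checkTest17Ranges` are hypothetical names, and the
// fixed-width types stand in for `unsigned`, `unsigned short` and
// `unsigned char` on a typical 32/16/8-bit layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the value obtained by converting Value to an unsigned
// integer type that is Bits wide, i.e. Value modulo 2^Bits.
// Bits must be in [1, 63].
inline std::uint64_t wrapToUnsigned(std::int64_t Value, unsigned Bits) {
  const std::uint64_t Mask = (std::uint64_t(1) << Bits) - 1;
  return static_cast<std::uint64_t>(Value) & Mask;
}

inline void checkTest17Ranges() {
  const signed char Low = -11, High = -10;
  // The built-in conversions agree with the modulo formula at both endpoints.
  assert(static_cast<std::uint32_t>(Low) == wrapToUnsigned(Low, 32));
  assert(static_cast<std::uint32_t>(High) == wrapToUnsigned(High, 32));
  assert(static_cast<std::uint16_t>(Low) == wrapToUnsigned(Low, 16));
  assert(static_cast<std::uint16_t>(High) == wrapToUnsigned(High, 16));
  assert(static_cast<std::uint8_t>(Low) == wrapToUnsigned(Low, 8));
  assert(static_cast<std::uint8_t>(High) == wrapToUnsigned(High, 8));
}
```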

void test18(char c, short s, int i) {

// Any char value is always less than 1000.

int OneThousand = 1000;

c = i;

if (c < OneThousand)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // no-warning

// Any short value is always greater than -40000.

int MinusFourtyThousands = -40000;

s = i;

if (s > MinusFourtyThousands)

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

else

clang_analyzer_warnIfReached(); // no-warning

}

void test19(char x, short y) {

if (-43 <= x && x <= -42) { // x[-43, -42]

y = 42;

clang_analyzer_eval(int16_t(x) < int16_t(y)); // expected-warning {{TRUE}}

clang_analyzer_eval(int16_t(x) < int32_t(y)); // expected-warning {{TRUE}}

clang_analyzer_eval(int32_t(x) < int16_t(y)); // expected-warning {{TRUE}}

clang_analyzer_eval(int32_t(x) < int32_t(y)); // expected-warning {{TRUE}}

clang_analyzer_eval(int16_t(x) < uint16_t(y)); // expected-warning {{TRUE}}

clang_analyzer_eval(int16_t(x) < uint32_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(int32_t(x) < uint16_t(y)); // expected-warning {{TRUE}}

clang_analyzer_eval(int32_t(x) < uint32_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint16_t(x) < int16_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint16_t(x) < int32_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint32_t(x) < int16_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint32_t(x) < int32_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint16_t(x) < uint16_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint16_t(x) < uint32_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint32_t(x) < uint16_t(y)); // expected-warning {{FALSE}}

clang_analyzer_eval(uint32_t(x) < uint32_t(y)); // expected-warning {{FALSE}}

}

}
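
// Illustration only, not part of the patch: the TRUE/FALSE pattern in test19
// comes from the implicit conversions that `<` applies after the explicit
// casts. The sketch below reproduces three representative rows at run time,
// assuming a 32-bit `int`. The function names are hypothetical, and
// `signed char` is used instead of `char` so the sign does not depend on the
// target.

```cpp
#include <cassert>
#include <cstdint>

// int16_t vs uint16_t: both operands are promoted to `int`, so the
// comparison is the ordinary signed one (-42 < 42).
inline bool lessInt16Uint16(signed char X, short Y) {
  return std::int16_t(X) < std::uint16_t(Y);
}

// int16_t vs uint32_t: the left operand is converted to `unsigned int`, so
// -42 becomes 4294967254, which is not less than 42.
inline bool lessInt16Uint32(signed char X, short Y) {
  return std::int16_t(X) < std::uint32_t(Y);
}

// uint16_t vs int16_t: the cast already wraps -42 to 65494, and promotion to
// `int` preserves that value, so the comparison is 65494 < 42.
inline bool lessUint16Int16(signed char X, short Y) {
  return std::uint16_t(X) < std::int16_t(Y);
}

inline void checkTest19Rows() {
  assert(lessInt16Uint16(-42, 42));
  assert(!lessInt16Uint32(-42, 42));
  assert(!lessUint16Int16(-42, 42));
}
```

// A compiler may emit a sign-compare warning for the mixed comparison in
// `lessInt16Uint32`; that mixed comparison is exactly what the row exercises.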

void test20(char x, short y) {

if (42 <= y && y <= 43) { // y[42, 43]

x = -42;