Page MenuHomePhabricator

[analyzer] Implement cast for ranges of symbolic integers.
Needs ReviewPublic

Authored by ASDenysPetrov on May 25 2021, 9:19 AM.

Details

Summary

Support integral cast for ranges of symbolic integers. Previously we only support integral cast for concrete integers.
Reason about the ranges of SymbolCast expressions. Apply truncations, promotions and conversions to get a correct range set using nested types of a SymbolCast.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=51036

NOTE: This patch may only be possible after approving previous patches. See parent revisions in the stack.

The approach:

NOTE: This description considers long long is of 8-byte size, int - 4-byte, short - 2-byte, char - 1-byte.

From this point we consider every integral symbol may have 0-3 auxiliary symbols. Auxiliary symbols are based on the same symbol. For instance (short) (reg_$0<long x>), (int) (reg_$0<long x>), (char) (reg_$0<long x>) are based on reg_$0<long x>. Let's call the symbol reg_$0<long x> as main and cast symbols as aux for convinience.

We introduce a list of 4 types so called nominal types (NTL - nominal type list.). The types are of different sizes from 1 to 4 bytes and ordered in a list. We choose these types(Char8Ty, Char16Ty, Char32Ty and LongLongTy) to be sure about their sizes. Aux symbol is a SymbolCast. We create aux symbol artificially using a main symbol and one of NTL types. Every aux symbol is used specificaly as a key-value for the constraint map. Aux symbol represents a truncated(trimmed) version of the main symbol type and is associated to specific bytes of the main symbol. For example, with (Char8Ty) (reg_$0<int x>) we can set or get the range of the lowest byte of the main symbol reg_$0<int x>. That means that for the main symbol of type char it can't be any aux symbol. And for the main symbol of type int it can be no more than two aux symbols: Char16Ty and Char8Ty.

Let's show graphically. Consider integral symbols as plain bytes:
[][][][][][][][] - main symbol(x) of long long
[][][][] - aux symbol of Char32Ty
[][] - aux symbol of Char16Ty
[] - aux symbol of Char8Ty
After (short)(x) == 0xABCD we have
[][][][][][][][] - main symbol of long long
[][][][] - aux symbol of Char32Ty
ABCD - aux symbol of Char16Ty
[] - aux symbol of Char8Ty
We can not reason about "higher" ranges of Char32Ty and the main one. But we can reason about Char8Ty:
[][][][][][][][] - main symbol of long long
[][][][] - aux symbol of Char32Ty
ABCD - aux symbol of Char16Ty
CD - aux symbol of Char8Ty
Then, if x == 0x12345678ABCDEF12. Then update all existing constraints:
12345678ABCDEF12 - main symbol of long long
ABCDEF12 - aux symbol of Char32Ty
XXXX - aux symbol of Char16Ty - result range is null while intersecting,
XX - aux symbol of Char8Ty - result range is null while intersecting.
The branch, where Char16Ty == ABCD, is infeasible. The branch, where Char8Ty == CD, is infeasible.

NOTE: We don't care about signedness. Signedness is a representation of data. Bytes are a data. We do care about data. Any time we can get any sign by casting data to corresponding type.

We implement VisitSymbolCast to get a valid range from constraint map using main and aux symbols. We get all related symbols from the map and intersect them. Note, that before intersection we shall cast them to needed type (see D103094 for details).

We implement modifySymbolAndConstraints as a wrapper for getProperSymbolAndConstraint and updateExistingConstraints. It invokes from RangeConstraintManager::assume... methods to do additional handling of integral symbols (see below).

We implement getProperSymbolAndConstraint to find(or create) a suitable symbol and adjust a given range to symbol's type. For instance, if we can reason about range of all the bytes of the symbol, then we just return the main symbol. And if we can reason only partially about range of a few bytes, then we produce an aux symbol. Then cast a given range to the return type.

We implement updateExistingConstraints to update constraints in the map for existing aux symbols. Every time we get a new range for some bytes of the main symbol, we shall to update all existing ranges for the symbols by cast'n'intersect, which types are smaller than the new range type. It may happen that the result range is null, then we can conclude that the considered branch is infeasible.

Let's consider an example:

int x;
if ((char)x == 2) {

  // 1. `VisitSymbolCast`.
  // Get a range for main `reg_$0<int x>` - [-2147483648, 2147483647].
  // Cast main range to `char` - [-2147483648, 2147483647] -> [-128, 127].
  // Now we get a valid range for further bifurcation - [-128, 127].
  // 2. `assumeSymEQ`.
  // Intersect retieved range according to the if-condition [-128, 127] x [2, 2].
  // Now we get a range to store in the map - [2, 2].
  // 3. `getProperSymbolAndConstraint`
  // Replace `((char) (reg_$0<int x>))` with `((Char8Ty) (reg_$0<int x>))`.
  // Store the pair in the constraint map.

  if ((short)x == 259) {

    // 1. `VisitSymbolCast`.
    // Get a range for main `reg_$0<int x>` - [-2147483648, 2147483647]
    // Cast main range to `short` - [-2147483648, 2147483647] -> [-32768, 32767].
    // Now we get a valid range for further bifurcation - [-32768, 32767].
    // 2. `assumeSymEQ`.
    // Intersect retieved range according to the if-condition [-32768, 32767] x [259, 259].
    // Now we get a range to store in the map - [259, 259].
    // 3. `getProperSymbolAndConstraint`
    // Replace `((short) (reg_$0<int x>))` with `((Char16Ty) (reg_$0<int x>))`.
    // 4. `updateExistingConstraints`
    // Update `((Char8Ty) (reg_$0<int x>))`.
    // Cast the range [259, 259] to `Char8Ty` - [3, 3].
    // Intersect result range [3, 3] x [2, 2] = null.
    // Now we get a null range which says that the branch is impossible(infeasible).
    // It's important to update smaller existing constraints.
    // Without attempting of intersecting with known ranges of smaller types
    // we'll never know about the branch infeasibility.
    // That may cause unnecessary bifurcation.
  }
}

See tests in symbol-integral-cast.cpp for more examples.

Diff Detail

Unit TestsFailed

TimeTest
2,670 msx64 debian > libarcher.critical::critical.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c
2,470 msx64 debian > libarcher.critical::lock-nested.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/lock-nested.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/lock-nested.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/lock-nested.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/lock-nested.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/lock-nested.c
2,640 msx64 debian > libarcher.parallel::parallel-simple.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp -latomic && env OMP_TOOL_VERBOSE_INIT=stderr env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp.log 2>&1 | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple.c --check-prefixes CHECK,TSAN_ON
2,710 msx64 debian > libarcher.races::critical-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c
3,000 msx64 debian > libarcher.races::lock-nested-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c
View Full Test Results (21 Failed)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Splitted this revision and moved SValBuilder related changes to separate patch D105340. Added detailed comments. Made NominalTypeList static and as a result removed the forwarding across the functions. Spread handleSymbolCast logic to three methods: modifySymbolAndConstraints, updateExistingConstraints, getProperSymbolAndConstraint.

ASDenysPetrov edited the summary of this revision. (Show Details)Tue, Jul 6, 3:48 AM

This is a very complicated patch, I think we'll have to iterate on it quite a lot.
Additionally, we have to be sure that this doesn't crash our performance.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1118

Comments on:

  • why do we need it?
  • why does it have four types?
  • why do we not care about signed/unsigned types?
3107–3109

OK, but I still don't understand one thing.
Here you go over all "smaller" types and artificially create constraints for them, and at the same time in VisitSymbolCast you do the opposite operation? Why? Shouldn't the map have constraints for smaller types already because of this action? Why do we need to do both?

3111–3112

This looks like a pattern and we should probably make into a method of SymbolCast

I found some issues. Working on improvement.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3107–3109

I've been preparing an answer for you, but suddenly you inspired me on some impovements. Thanks.

3111–3112

I did it :) but refused. It will just turn into:

if (isa<SymbolCast>(Sym))
  Sym = cast<SymbolCast>(Sym)->getRootOperand();

It looks pretty the same and brings no benefit IMO, does it?
Every time I used getRootOperand I also needed some additional traverse through the types te get some another information, so I couldn't avoid the while loop there. So I decided not to introduce a new method in SymbolCast.

vsavchenko added inline comments.Tue, Jul 6, 9:07 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3111–3112

Aha, I see your point. I guess we can take it into SymExpr and call it not getRootOperand, which won't tell much to a person reading the name, but something like ignoreCasts. It will fit well with Expr::IgnoreCasts, Expr::IgnoreParens, etc.

ASDenysPetrov added inline comments.Tue, Jul 6, 10:11 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3111–3112

Nice idea! True, getRootOperand would only tell enough to user in scope of SymbolCast. I'll try to implement this in the next update.

Added SymExpr::ignoreCasts method. Added descriptive comments.

Added more descriptive comments. Fixed RangeConstraintManager::updateExistingConstraints function.

Can you please explain why you do the same thing in two different ways?

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗(On Diff #357463)
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3107–3109

I've fixed RangeConstraintManager::updateExistingConstraints. There was a mistake when I update smaller types from the root symbol, but correct symbol is the given symbol which is before calling ignoreCast().
May be now it would be more clear for you.

That's not the question I'm asking. Why do you need to set constraints for other symbolic expressions, when SymbolicInferrer can look them up on its own? Which cases will fail if we remove that part altogether?

That's not the question I'm asking. Why do you need to set constraints for other symbolic expressions, when SymbolicInferrer can look them up on its own? Which cases will fail if we remove that part altogether?

I see. Here is what fails in case if we don't update other constraints:

void test(int x) {
  if ((char)x > -10 && (char)x < 10) {
    if ((short)x == 8) {
      // If you remove updateExistingConstraints,
      // then `c` won't be 8. It would be [-10, 10] instead.
      char c = x;
      if (c != 8)
        clang_analyzer_warnIfReached(); // should no-warning, but fail
    }
  }
}

That's not the question I'm asking. Why do you need to set constraints for other symbolic expressions, when SymbolicInferrer can look them up on its own? Which cases will fail if we remove that part altogether?

I see. Here is what fails in case if we don't update other constraints:

void test(int x) {
  if ((char)x > -10 && (char)x < 10) {
    if ((short)x == 8) {
      // If you remove updateExistingConstraints,
      // then `c` won't be 8. It would be [-10, 10] instead.
      char c = x;
      if (c != 8)
        clang_analyzer_warnIfReached(); // should no-warning, but fail
    }
  }
}

OK, it's something! Good!
I still want to hear a good explanation why is it done this way. Here c is mapped to (char)x, and we have [-10, 10] directly associated with it, but we also have (short)x associated with [8, 8]. Why can't VisitSymbolCast look up constraints for (short)x it already looks up for constraints for different casts already.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1190

Why do you use VisitSymExpr here? You want to interrupt all `Visits or... I'm not sure I fully understand.

1202

Can we get a test for that?

1203

Same goes here.

1228

Why do you get associated constraint directly without consulting with what SymbolRangeInferrer can tell you about it?

ASDenysPetrov added a comment.EditedFri, Jul 9, 7:55 AM

Generally, with this patch we kinda have several constraints for each cast of a single symbol. And we shall care for all of that constraints and timely update them (if possible).
For instance, we have int x and meet casts of this symbol in code:

int x;
(char)x; // we can reason about the 1st byte
(short)x; // we can reason about the 2 lowest bytes
(ushort)x; // we can reason about the 2 lowest bytes (in this case we may not store for unsigned separately, as we already stored 2 bytes for signed)

That's like we have a knowledge of a lower part of the integer. And every time we have a new constraints, for example, for (short)x; (aka 2 bytes) then we have to update all the constraints that have two bytes or lower ((char)xin this case) as well to make them consistent.

@vsavchenko

I still want to hear a good explanation why is it done this way. Here c is mapped to (char)x, and we have [-10, 10] directly associated with it, but we also have (short)x associated with [8, 8]. Why can't VisitSymbolCast look up constraints for (short)x it already looks up for constraints for different casts already.

Hm, you've confused me. I'll make some debugging and report.

Generally, with this patch we kinda have several constraints for each cast of a single symbol. And we shall care for all of that constraints and timely update them (if possible).
For instance, we have int x and meet casts of this symbol in code:

int x;
(char)x; // we can reason about the 1st byte
(short)x; // we can reason about the 2 lowest bytes
(ushort)x; // we can reason about the 2 lowest bytes (in this case we may not store for unsigned separately, as we already stored 2 bytes for signed)

That's like we have a knowledge of a lower part of the integer. And every time we have a new constraints, for example, for (short)x; (aka 2 bytes) then we have to update all the constraints that have two bytes or lower ((char)xin this case) as well to make them consistent.

What we do in Inferrer is that we try to look at many sources of information and intersect their ranges. And I repeat my question again in a bit different form, why can't it look up constraints for (char)x and for (short)x and intersect them?
You should admit you never really address this question. Why can't VisitSymolCast do everything?

@vsavchenko

I still want to hear a good explanation why is it done this way. Here c is mapped to (char)x, and we have [-10, 10] directly associated with it, but we also have (short)x associated with [8, 8]. Why can't VisitSymbolCast look up constraints for (short)x it already looks up for constraints for different casts already.

Hm, you've confused me. I'll make some debugging and report.

It should not be about debugging, it's your code! Why did you write it this way!?

@vsavchenko

Why did you write it this way!?

I want the map contains only valid constraints at any time, so we can easely get them without traversing with all variants intersecting with each other. I'm gonna move updateExistingConstraints logic to VisitSymbolCast. I think your suggestion can even improve the feature and cover some more cases. I'll add more tests in the next update. Thanks!

ASDenysPetrov added inline comments.Fri, Jul 9, 10:50 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1228

What do you mean? I didn't get. Could you give an example?

@vsavchenko

Why did you write it this way!?

I want the map contains only valid constraints at any time, so we can easely get them without traversing with all variants intersecting with each other. I'm gonna move updateExistingConstraints logic to VisitSymbolCast. I think your suggestion can even improve the feature and cover some more cases. I'll add more tests in the next update. Thanks!

[-10, 10] is also valid, right? You can't keep things at their best all the time. And if you want all constraints directly in the map then what's all this logic in VisitSymbolCast? That's why I keep asking why do you need both parts of this solution and didn't get any answer so far.
I'm hands down for the incremental approach and adding small-to-medium size improvements on top of each other. That makes my life as a reviewer easier :) That's said, I don't want to commit to a big solution, where the author doesn't want to explain why there are two parts of the solution instead of one.

I want you to tell me why the code that's in VisitSymbolCast does what it does. And the same about updateExistingConstraints. Also I want to hear a solid reason why it's split this way and why we need both of them.

You should understand that I'm not peaking on you personally. The review process takes a lot of my time too. I want to make it easier for both of us. When the reviewer understand what you are going for, it is much easier for them to help you in refining your solution. This patch is very big, but the summary doesn't cover the main part: the approach. And you leave me here dragging it out of you.

ASDenysPetrov edited the summary of this revision. (Show Details)Mon, Jul 12, 9:47 AM
ASDenysPetrov added a comment.EditedMon, Jul 12, 9:50 AM

@vsavchenko I've updated the summary. I hope, I addressed your question. Thanks.

ASDenysPetrov edited the summary of this revision. (Show Details)Mon, Jul 12, 10:16 AM
ASDenysPetrov added inline comments.Tue, Jul 13, 5:52 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1190

Here we want to delegate the reasoning to another handler as we don't support non-integral cast yet.

1202

I'll add some.

ASDenysPetrov added inline comments.Tue, Jul 13, 6:23 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗(On Diff #357463)

Do you think the recursive call is better than the loop? But, I guess, I see your point. You option could be safer if we had another implementation of the virtual method. Or you think such alike cast symbol is possible in the future? Well, for now ignoreCasts doesn't make sense to any other Expr successors.

I'll allocate some time to get into your summary, but for now here are my concerns about SymbolRangeInferrer and VisitSymbolCast.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1190

You are not delegating it here. Visit includes a runtime dispatch that calls a correct VisitTYPE method. Here you call VisitSymExpr directly, which is one of the VisitTYPE methods. No dispatch will happen, and since we use VisitSymExpr as the last resort (it's the most base class, if we got there, we don't actually support the expression), you interrupt the Visit and go directly to "the last resort".

See the problem now?

1228

getConstraint returns whatever constraint we have stored directly in the constraint map. That's the main source of information for ranges, but not the only one.

Here is the of things that you skip, when you do getConstraint here:

  • we can understand that something is equality/disequality check and find the corresponding info in Equivalence Classes data structure
  • we can see that the expression has the form A - B and we can find constraint for B - A
  • we can see that the expression is comparison A op B and check what other comparison info we have on A and B (your own change)
  • we can see that the expression is of form A op B and check if we know something about A and B, and produce a reasonable constraint out of this information

In order to use the right information, you should use infer that will actually do all other things as well. That's how SymbolRangeInferrer is designed, to be recursive.

Speaking of recursiveness. All these loops and manually checking for types of the cast's operand is against this pattern. Recursive visitors should call Visit for children nodes (like RecursiveASTVisitor). In other words, if f(x) is a visit function, it should be defined like this:

f(x) = g(f(x->operand_1), f(x->operand_2), ... , f(x->operand_N))

or if we talk about your case specifically:

f(x: SymbolCast) = h(f(x->Operand))

and the h function should transform the range set returned by f(x->Operand) into a range set appropriate for x.

NOTE: h can also intersect different ranges
vsavchenko added inline comments.Tue, Jul 13, 6:58 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗(On Diff #357463)

Oh, wait, why is it even virtual? I don't think that it should be virtual.
Are similar functions in Expr virtual?
And I think that this implementation should live in SymExpr directly. Then it would look like:

if (const SymbolCast *ThisAsCast = dyn_cast<SymbolCast>(this)) {
  return ThisAsCast->ignoreCasts();
}
return this;
ASDenysPetrov added inline comments.Tue, Jul 13, 7:37 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗(On Diff #357463)

Yes, SymExpr is an abstract class. And because of limitations and dependency of inheritance we are not able to know the implementaion of SymbolCast. Unfortunately, this is not a CRTP.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1190

OK. I reject this idea before. If we call Visit inside VisitSymbolCast, we will go into recursive loop, because it will return us back to VisitSymbolCast as we have passed Sym as is. (This is theoretically, I didn't check in practice.) Or I'm missing smth?
I choosed VisitSymExpr here because all kinds of SymbolCast were previously handled here. So I decided to pass all unsupproted forms of casts there.

1228

Thank you for useful notes! I'll take them into account.

vsavchenko added inline comments.Tue, Jul 13, 8:01 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1190

Did I suggest to Visit(Sym)? Of course it is going to end up in a loop!
Why isn't it Visit(Sym->getOperand()) here? Before we started producing casts, casts were transparent. This logic would fit perfectly with that.

1203

And here, since we couldn't really reason about it, we usually return infer(T).

OK, thanks for putting a summary. I now got a good idea why you need both.
At the same time, take a look at D105692. I'm about to land it and I think it's going to be useful for you.

ASDenysPetrov added inline comments.Tue, Jul 13, 11:32 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1190

were transparent.

Not exactly. There are still some cases when symbols are not equal to there roots(aka Operands). Such cases were handled by VisitSymExpr which uses infer(Sym->getType()); instead of getOperand`. So this needs a sort of think twice. Also I see a problem with EquivalenceClass'es. Consider next:

int x, y;
if(x == y)
  if ((char)x == 2)
    if(y == 259)
      // Here we shall update `(char)x` and find this branch infeasible.

Also such cases like:

if(x == (short)y)
  // What we should do(store) with(in) `EquivalenceClass`es.

Currently, I have an obscure vision of the solution.

// 1. `VisitSymbolCast`.
// Get a range for main `reg_$0<int x>` - [-2147483648, 2147483647]
// Cast main range to `short` - [-2147483648, 2147483647] -> [-32768, 32767].
// Now we get a valid range for further bifurcation - [-32768, 32767].

That's a great example, thanks for putting it together. I can see your point now!

Please, rebase your change and make use of ConstraintAssignor, and rework VisitSymbolCast.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗(On Diff #357463)

Re-read my comment, please.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1190

There are still some cases when symbols are not equal to there roots(aka Operands)

Right now we don't have casts, this is what we do currently. However faulty it is, it is the existing solution and we should respect that.

Also I see a problem with EquivalenceClass'es.

Because of the current situation with casts (or more precisely with their lack), EquivalenceClasses do not get merged for symbols with different types. It is as simple as that.
You can find similar tests in equality_tracking.c.

ASDenysPetrov added inline comments.Wed, Jul 14, 10:26 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗(On Diff #357463)

Oh, wait, why is it even virtual?

ignoreCasts is a virtual function because I haven't found any other way to implement it better.

I don't think that it should be virtual.

Unfortunately, this is not a CRTP to avoid dynamic linking.

Are similar functions in Expr virtual?

SymExpr is an abstract class. I'm not sure about similarity but SymExpr has such virtual methods:

  • computeComplexity
  • getType
  • getOriginRegion

And I think that this implementation should live in SymExpr directly.

It's impossible due to SymExpr implementation design. SymExpr knows nothing about implementation details of SymbolCast to invoke ignoreCasts().

vsavchenko added inline comments.Wed, Jul 14, 10:38 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗(On Diff #357463)

a) Expr is also an abstract class
b) I put the implementation right there in the comment above. I don't see any reasons not to use it.
c) I don't buy it about "impossible" and "implementation design" because you can always declare function in one place and define it in the other.

ASDenysPetrov updated this revision to Diff 358685.EditedWed, Jul 14, 11:58 AM

Made ignoreCast non-virtual.
P.S. IMO, this change is not something that can be taken as a pattern, though.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗(On Diff #357463)

I think I achieved of what you've been concerned.

Made ignoreCast non-virtual.
P.S. IMO, this change is not something that can be taken as a pattern, though.

It is already a pattern in other type hierarchies.
Virtual functions are only good, when they can have multiple implementations. ignoreCasts by its name can have only one implementation and couldn't be virtual. That's it! It is more useable now, and less confusing for its users. The fact that its definition lives in some other cpp file doesn't change it.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗(On Diff #357463)

This function should be removed then.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3002–3004

No, please, remove duplication by putting it inside of the constraint assignor. It is designed specifically so we don't duplicate code around assumeSymXX functions.

@vsavchenko

It is already a pattern in other type hierarchies.

I just rarely met them. And it was hard to me reading the code searching for implementation all over the places.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
293–296 ↗(On Diff #357463)

NP.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
3002–3004

+1. That's what I've recently thought about. :)

ASDenysPetrov edited the summary of this revision. (Show Details)

Improved ignoreCasts implementation. Adapted to ConstraintAssignor.

vsavchenko added inline comments.Thu, Jul 15, 9:15 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1917–1921

That's not using ConstraintAssignor, you simply put your implementation in here. That won't do!

ASDenysPetrov added inline comments.Thu, Jul 15, 9:31 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1917–1921

OK, please tell me how to use it correctly in my case.

vsavchenko added inline comments.Thu, Jul 15, 9:36 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1917–1921

Can you read the comments first and then ask me if you have any specific questions?

Adapted solution to ConstraintAssignor API. Added tests.

ASDenysPetrov added inline comments.Fri, Jul 16, 6:38 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1917–1921

I think I did it. Could you please review the changes?