This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Extend constraint manager to be able to compare simple SymSymExprs
Needs ReviewPublic

Authored by baloghadamsoftware on Apr 9 2020, 5:03 AM.

Details

Summary

This patch extends the constraint manager to be able to reason about trivial sym-sym comparisons.

We all know that a < b if the maximum value of a is still less than the minimum value of b.
This reasoning can be applied for the <,<=,>,>= relation operators.
This patch solely implements this functionality.

This patch does not address transitity like:
If a < b and b < c than a < c.

This patch is necessary to be able to express hidden function preconditions.
For example, with the D69726 we could express the connection between the
extent size of src and the size (n) parameter of the function.

#define ANALYZER_ASSERT assert

void my_memcpy(char *dst, char *src, int n) {
  ANALYZER_ASSERT(clang_analyzer_getExtent(src) >= n)
  ANALYZER_ASSERT(clang_analyzer_getExtent(dst) >= n)
  
  for (int i = 0; i < n; ++i) {
    // The extent of dst would be compared to the index i.
    dst[i] = src[i]; // each memory access in-bound, no-warning
  }
}

Diff Detail

Event Timeline

steakhal created this revision.Apr 9 2020, 5:03 AM

You seem to have forgotten -U9999 :^)

I like it, nice work!

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
774

Is it possible to ever return with None? Other assume functions usually just return with nullptr on infeasible state as you do. What would be the meaning if None is returned, is that another kind of infeasibility?

797

Perhaps this hunk could be greatly simplified if CompareResult was actually BinaryOperator::Opcode.

Maybe (?):

if (Res == Op)
  return State;
return InfeasibleState;
clang/test/Analysis/constraint-manager-sym-sym.c
71

How is it different than // [0,5] and [5,10]?

173

I think we could benefit from tests for cases like this:

{[0,1],[5,6]} < {[9,9],[11,42],[44,44]}
steakhal updated this revision to Diff 256420.Apr 9 2020, 3:04 PM
steakhal marked 4 inline comments as done.
  • add full diff context
  • NFC refactored RangeSet comparison function
  • add tests for compund RangeSets like: {[0,1],[5,6]} < {[9,9],[11,42],[44,44]}
  • NFC clang-format test file

You seem to have forgotten -U9999 :^)

Nice catch!

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
774

The rest of the functions there is no maybe answer. There is always yes or no, returning the State or the nullptr.
In our case, we don't know in advance that we know a definitive answer.

Imagine the following:
When the analyzer sees the a < b comparison.
It queries whether it canReasonAbout(). In our case that would return true due to my change.
When tries to reason about the given SymSymExpr (a < b), we don't know if the corresponding value ranges of a and b are disjunct or not, we need to check that.
If we can prove that the ranges are disjunct (or just connected: a: [a1,x] and b: [x,b2]) then we know an answer. That can be true (State) or false (nullptr). Otherwise the result should be UNKNOWN (llvm::None).

797

Honestly, I had exactly the same thoughts.

I think an Opcode is a different thing than a result of a comparison.
Here, we have a partial ordering among RangeSets.
But you can convince me :)

But I agree, it looks clunky, looking forward to a better solution. Any idea?

clang/test/Analysis/constraint-manager-sym-sym.c
71

It covers the same. I just wanted a full and clear showcase of the possible intervals.
I can be convinced to remove this testcase.

173

Really good idea.

I added tests for these. But I'm not really sure what's going on there :D
I'm sure that these test cases are not testing what I meant to test.
(The exploded graphs are showing that each subrange is assumed for a given value (l and r) on a given path)

Any idea of how to express the proper value ranges?

steakhal updated this revision to Diff 256551.Apr 10 2020, 5:33 AM
  • rewritten the RangeSet::compare function and checked the assumptions on WolframAlpha
  • moved the RangeSet::compare function to a cpp file
  • added comments to the RangeSet::compare function
  • fixed the comment in RangeConstraintManager::canReasonAbout function
  • introduced the RangeSet::CompareResult::identical enum value to be complete
  • updated the RangeConstraintManager::tryAssumeSymSymOp accoding the identical CompareResult.
  • omited testing the [2,5] < [5,10] testcase, since that is covered by [0,5] < [5,10]
steakhal marked an inline comment as done.Apr 10 2020, 5:34 AM

@steakhal
What about unsigneds? Does it work for unsigned values as well?

NoQ added inline comments.May 5 2020, 5:42 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2293

I believe you don't need to return an optional here. The method simply acknowledges any assumptions it could make in the existing state and returns the updated state. Therefore, if it wasn't able to record any assumptions, it returns the existing state. Because the only reasonable behavior the caller could implement when the current implementation returns None is to roll back to the existing state anyway.

clang/test/Analysis/constraint-manager-sym-sym.c
183

You can also explicitly create a single path with disconnected ranges (as opposed to like 3 different paths on each of which the range is a single segment) like this:

assert(r >= 9 && r <= 44 && r != 10 && r != 43);
ASDenysPetrov added inline comments.May 5 2020, 7:28 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
110–117

Can we use Optional<BinaryOperatorKind> instead, to reduce similar enums? Or you want to separate the meaning in a such way?

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
31–34

As you use here getMaxValue twice which is not O(1), it'd better to use a buffer variable.

199–206

Can we improve llvm::ImmutableSet this to make getMaxValue complexity O(1)?

clang/test/Analysis/constraint-manager-sym-sym.c
182

Could you add some tests for unsigned, unsigned and signed, unsigned?

steakhal abandoned this revision.Aug 18 2020, 12:51 AM

I no longer think that we should support Symbol to Symbol comparisons.
It would introduce certain anomalies, like Why could the CM reason about this and that comparisons wile could not in others.

As of now, it's clear that we can not compare Symbols to Symbols - preventing such confusing questions to arise in the future.
If we were to implement Symbol to Symbol comparisons, we should cover a lot more cases than this patch could.

So I decided to abandon this patch.

baloghadamsoftware commandeered this revision.Sep 18 2020, 5:03 AM
baloghadamsoftware edited reviewers, added: steakhal; removed: baloghadamsoftware.

We found use cases which can be solved using this improvement. That is why I commandeer this patch now.

baloghadamsoftware updated this revision to Diff 292755.

Crash fixed and new tests added.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2293

Actually, tryAssumeSymSymOp() does not assume anything and therefore it does not return a new state. What it actually returns is a ternary logical value: the assumption is either valid, invalid or we cannot reason about it. Maybe Optional<bool> would be better here and we should also chose a less misleading name, because it does not "try to assume" anything, but tries to reason about the existing assumption.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
110–117

Good question. @NoQ?

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
31–34

getMaxValue() for the current and other.getMaxValue() are different.

199–206

llvm::ImmutableSet seems to me a heap-like structure, a tree optimized for minimum-search: the mininum is in the root of the tree. Maximum is linear.

clang/test/Analysis/constraint-manager-sym-sym.c
182

I will do it.

Tests updated: some execution paths merged by refactoring assertions.

baloghadamsoftware marked an inline comment as done.Sep 21 2020, 2:42 AM

Adding Valeriy as a reviewer. He's been working with the solver recently, so his insights might be really useful here.

Hi @baloghadamsoftware

Great job! I also wanted to support more operations for range-based logic.

However, I don't think that this is the right place to make this kind of assumptions. A couple months ago, I added the SymbolicRangeInferrer component to aggregate all of the reasoning we have about range constraints and I strongly believe that the logic from this patch should be integrated in that component.

NoQ added inline comments.Sep 21 2020, 5:50 PM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
110–117

The meaning is obviously completely different even when the names actually match. BO_LT is not a "result", it's the operation itself. It gets even worse for other opcodes (what kind of comparison result is BO_PtrMemI???).

The idea to re-use the enum is noble but we definitely need to find some other enum.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2293

I dislike this design because it turns the code into a spaghetti in which every facility in the constraint manager has to be aware of any other facility in the constraint manager and agree on who's responsible for what. It's too easy to miss something and missing something isn't as bad as doing double work.

Like, should different facilities in the constraint manager try to record as *much* information in the program state ("fat constraints") as possible or as *little* as possible ("thin constraints")? Eg., in this example, if we know that the range for $x is strictly below the range for $y, should we also add the opaque constraint $x < $y to the program state, or should we instead teach every facility to infer that $x < $y by looking at their ranges?

I feel we should go with fat constraints.

  1. Con: They're fat. They eat more memory, cause more immutable tree rebalances, etc.
  2. Con: It's easy to accidentally make two similar states look different. Eg., { $x: [1, 2], $y: [3, 4] } and { $x : [1, 2], $y: [3, 4], $x < $y: true } are the same state but they won't be deduplicated and if they appear on different paths at the same program point these paths won't get merged.
  3. Pro: They minimize the amount of false positives by giving every facility in the constraint manager as much data as possible to conclude that the state is infeasible.
  4. Pro: As i said above, they're easier to implement and to get right. In case of thin constraints, every facility has to actively let other facilites know that they don't need to handle the assumption anymore because this facility has "consumed" it. In our example it means returning an Optional<ProgramStateRef> which would be empty if the constraint wasn't consumed and needs to be handled by another facility (namely, assumeSymUnsupported). In case of fat constraints you simply add your information and you don't care if other facilities ever read it or not.

Con 1. should be dismissed as a premature optimization: we can always add some thinness to fat constraints if we have a proof that their fatness is an actual problem. Con 2. is a real issue but it's a fairly minor issue; path merges are fairly rare anyway; mergeability is a nice goal to pursue but not at the cost of having false positives. So i think pros outweight the cons here.