This is an archive of the discontinued LLVM Phabricator instance.


Differential D105436

[analyzer][solver] Use all sources of constraints
ClosedPublic

Authored by vsavchenko on Jul 5 2021, 9:37 AM.


Details

Reviewers

NoQ
xazax.hun
martong
steakhal
Szelethus
ASDenysPetrov
manas
RedDocMD

Commits

rG6017cb31bb35: [analyzer][solver] Use all sources of constraints

Summary

Prior to this patch, we always gave priority to constraints that we
actually know about symbols in question. However, these can get
outdated and we can get better results if we look at all possible
sources of knowledge, including sub-expressions.
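The difference between the two strategies can be illustrated with a minimal sketch. It is not the analyzer's code: `Interval` is a hypothetical single closed interval standing in for clang's `RangeSet`, each element of the vector is one source that has something to say about the symbol, and `inferFirstAvailable` is a simplification of the old "preferred source wins" behaviour. All names are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for clang's RangeSet: one closed interval [Lo, Hi].
struct Interval {
  uint32_t Lo;
  uint32_t Hi;
};

// Intersection of two closed intervals (Lo > Hi would mean "empty").
Interval intersectIntervals(Interval A, Interval B) {
  return {std::max(A.Lo, B.Lo), std::min(A.Hi, B.Hi)};
}

// Simplified old strategy: the first source that knows anything wins,
// even if that knowledge has become outdated.
Interval inferFirstAvailable(const std::vector<Interval> &Sources,
                             Interval TypeRange) {
  if (!Sources.empty())
    return Sources.front();
  return TypeRange;
}

// Strategy described in the summary: every source contributes, and the
// result is the intersection of all of them with the type's full range.
Interval inferFromAllSources(const std::vector<Interval> &Sources,
                             Interval TypeRange) {
  Interval Result = TypeRange;
  for (const Interval &Source : Sources)
    Result = intersectIntervals(Result, Source);
  return Result;
}
```

With a direct constraint of [0, 30] and a sub-expression-based range of [10, UINT32_MAX], the first function returns [0, 30] while the second returns [10, 30].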

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vsavchenko created this revision. Jul 5 2021, 9:37 AM

Herald added subscribers: dkrupp, donat.nagy, mikhail.ramalho and 4 others. Jul 5 2021, 9:37 AM

vsavchenko requested review of this revision. Jul 5 2021, 9:37 AM

Herald added a project: Restricted Project. Jul 5 2021, 9:37 AM

Herald added a subscriber: cfe-commits.

I compared issues produced by this patch to the issues produced before that on all projects from clang/utils/analyzer/projects, and didn't find any difference.

Performance measurements also show that we are within the same margins.

Harbormaster completed remote builds in B112455: Diff 356518. Jul 5 2021, 10:10 AM


Performance measurements also show that we are within the same margins.

Great! I'd expect massive constraint solver improvements to actually make performance better because they cut infeasible paths. This one's probably not that massive but it's still amazing.

This revision is now accepted and ready to land. Jul 5 2021, 9:43 PM

Closed by commit rG6017cb31bb35: [analyzer][solver] Use all sources of constraints (authored by vsavchenko). Jul 6 2021, 1:09 AM

This revision was automatically updated to reflect the committed changes.

vsavchenko added a commit: rG6017cb31bb35: [analyzer][solver] Use all sources of constraints.

martong added inline comments. Jul 6 2021, 1:40 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
908	Alright. So, this is correct because `Visit` ultimately boils down to either `infer(Sym->getType())` or `VisitBinaryOperator`, and both of them produce a correct over-approximation of the ranges. Please confirm. At first I was a bit concerned because this is neither immediately obvious nor documented here. At first glance it is easy to think this might be faulty: an approximation taken from one operand of a binop might not hold for the whole binop expression. Again, that is not the case, because we approximate only for ops where a correct over-approximation is possible (i.e. `|`, `&` and `%`). My point is, I'd like to see more explanatory comments here.

vsavchenko added inline comments. Jul 6 2021, 2:46 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
908	I'm sorry, but I don't really understand your point here. Everything that this solver provides is conservative ranges, from whatever source it comes. If you intersect two conservative ranges, you get a conservative range. It doesn't matter what we do in `Visit` as long as it is correct. If `Visit` is incorrect then the previous version of this code that gave preference to some sources over the other ones was also incorrect.
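The argument that intersecting two conservative ranges yields a conservative range can be checked with a small brute-force sketch. It rests on the assumption that "conservative" means "contains every value the symbol can actually take"; `Interval`, `contains`, and `intersectionStaysConservative` are hypothetical names and not part of the analyzer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for clang's RangeSet: one closed interval [Lo, Hi].
struct Interval {
  uint32_t Lo;
  uint32_t Hi;
};

bool contains(Interval R, uint32_t V) { return R.Lo <= V && V <= R.Hi; }

// Intersection of two closed intervals (Lo > Hi means "empty").
Interval intersectIntervals(Interval A, Interval B) {
  return {std::max(A.Lo, B.Lo), std::min(A.Hi, B.Hi)};
}

// Checks every value in [0, Limit]: whenever V is allowed by both A and B,
// it must also be allowed by their intersection. If this holds, no feasible
// value is lost by intersecting two over-approximations.
bool intersectionStaysConservative(Interval A, Interval B, uint32_t Limit) {
  Interval Both = intersectIntervals(A, B);
  for (uint32_t V = 0;; ++V) {
    if (contains(A, V) && contains(B, V) && !contains(Both, V))
      return false;
    if (V == Limit)
      break;
  }
  return true;
}
```

The check says nothing about whether each individual source is correct; it only shows that combining correct sources by intersection cannot make the result incorrect.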

martong added inline comments. Jul 6 2021, 3:03 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
908	Thanks for your reply. In other words, I didn't see why it is immediately clear that a range for a sub-expression is a good approximation for the whole expression. Maybe it's just me, but that's not obvious until one checks what exactly happens in `Visit`.

vsavchenko added inline comments. Jul 6 2021, 3:10 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
908	Oh, I mean, that's not correct. Symbolic expressions are N-ary operators, and if we know constraints for at least some of these N operands, we can provide a conservative range for the whole symbol using some knowledge of the operator. It doesn't say anywhere that we use a range for a sub-expression as an approximation for the whole expression. Actually, I want to move some of these other sources inside of `Visit` as well, because they trigger only for very specific kinds of symbolic expressions (e.g. binary minus, equality/disequality, comparisons).
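As a rough illustration of "using some knowledge of the operator", here is a sketch for unsigned `|` and `&`. It is not the analyzer's `VisitBinaryOperator`; `Interval`, `rangeForOr`, `rangeForAnd`, and `operatorRulesHold` are hypothetical names, and the rules are deliberately the simplest sound ones: `a | b` is never smaller than either operand, and `a & b` is never larger than either operand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for clang's RangeSet: one closed interval [Lo, Hi].
struct Interval {
  uint32_t Lo;
  uint32_t Hi;
};

// For unsigned operands, a | b >= max(a, b), so the lower bounds carry over.
Interval rangeForOr(Interval A, Interval B) {
  return {std::max(A.Lo, B.Lo), UINT32_MAX};
}

// For unsigned operands, a & b <= min(a, b), so the upper bounds carry over.
Interval rangeForAnd(Interval A, Interval B) {
  return {0, std::min(A.Hi, B.Hi)};
}

// Brute-force check of both rules over all operand values in A and B.
// Intended for small intervals only.
bool operatorRulesHold(Interval A, Interval B) {
  Interval Or = rangeForOr(A, B);
  Interval And = rangeForAnd(A, B);
  for (uint64_t X = A.Lo; X <= A.Hi; ++X) {
    for (uint64_t Y = B.Lo; Y <= B.Hi; ++Y) {
      uint32_t OrValue = static_cast<uint32_t>(X | Y);
      uint32_t AndValue = static_cast<uint32_t>(X & Y);
      if (OrValue < Or.Lo || OrValue > Or.Hi)
        return false;
      if (AndValue < And.Lo || AndValue > And.Hi)
        return false;
    }
  }
  return true;
}
```

With `b` known to be at least 10 and nothing known about `a`, `rangeForOr` gives a lower bound of 10 for `a | b`. Intersected with a direct constraint of [0, 30] on `a | b`, that yields [10, 30], which is the refinement the new test case in constant-folding.c expects.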

Revision Contents

Path	Size
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp	42 lines
clang/test/Analysis/constant-folding.c	30 lines

Diff 356628

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp

     if (ActualType->isIntegralOrEnumerationType() ||
       return infer(Sym);
     }
     // Otherwise, let's simply infer from the destination type.
     // We couldn't figure out nothing else about that expression.
     return infer(DestType);
   }
 
   RangeSet infer(SymbolRef Sym) {
-    if (Optional<RangeSet> ConstraintBasedRange = intersect(
-            RangeFactory, getConstraint(State, Sym),
+    return intersect(
+        RangeFactory,
+        // Of course, we should take the constraint directly associated with
+        // this symbol into consideration.
+        getConstraint(State, Sym),
         // If Sym is a difference of symbols A - B, then maybe we have range
         // set stored for B - A.
         //
         // If we have range set stored for both A - B and B - A then
         // calculate the effective range set by intersecting the range set
         // for A - B and the negated range set of B - A.
-        getRangeForNegatedSub(Sym), getRangeForEqualities(Sym))) {
-      return *ConstraintBasedRange;
-    }
-
+        getRangeForNegatedSub(Sym),
+        // If Sym is (dis)equality, we might have some information on that
+        // in our equality classes data structure.
+        getRangeForEqualities(Sym),
         // If Sym is a comparison expression (except <=>),
         // find any other comparisons with the same operands.
         // See function description.
-    if (Optional<RangeSet> CmpRangeSet = getRangeForComparisonSymbol(Sym)) {
-      return *CmpRangeSet;
-    }
-
-    return Visit(Sym);
+        getRangeForComparisonSymbol(Sym),
+        // Apart from the Sym itself, we can infer quite a lot if we look
+        // into subexpressions of Sym.
+        Visit(Sym));
   }
 
   RangeSet infer(EquivalenceClass Class) {
     if (const RangeSet *AssociatedConstraint = getConstraint(State, Class))
       return *AssociatedConstraint;
 
     return infer(Class.getType());
   }

clang/test/Analysis/constant-folding.c

   if (a < 10) {
     clang_analyzer_eval((a | 20) >= 20); // expected-warning{{TRUE}}
   }
 
   if (a > 10) {
     clang_analyzer_eval((a & 1) <= 1); // expected-warning{{TRUE}}
   }
 }
 
+unsigned reset();
+
+void testCombinedSources(unsigned a, unsigned b) {
+  if (b >= 10 && (a | b) <= 30) {
+    // Check that we can merge constraints from (a | b), a, and b.
+    // Because of the order of assumptions, we already know that (a | b) is [10, 30].
+    clang_analyzer_eval((a | b) >= 10 && (a | b) <= 30); // expected-warning{{TRUE}}
+  }
+
+  a = reset();
+  b = reset();
+
+  if ((a | b) <= 30 && b >= 10) {
+    // Check that we can merge constraints from (a | b), a, and b.
+    // At this point, we know that (a | b) is [0, 30], but the knowledge
+    // of b >= 10 added later can help us to refine it and change it to [10, 30].
+    clang_analyzer_eval(10 <= (a | b) && (a | b) <= 30); // expected-warning{{TRUE}}
+  }
+
+  a = reset();
+  b = reset();
+
+  unsigned c = (a | b) & (a != b);
+  if (c <= 40 && a == b) {
+    // Even though we have a directo constraint for c [0, 40],
+    // we can get a more precise range by looking at the expression itself.
+    clang_analyzer_eval(c == 0); // expected-warning{{TRUE}}
+  }
+}
+
 void testRemainderRules(unsigned int a, unsigned int b, int c, int d) {
   // Check that we know that remainder of zero divided by any number is still 0.
   clang_analyzer_eval((0 % c) == 0); // expected-warning{{TRUE}}
 
   clang_analyzer_eval((10 % a) <= 10); // expected-warning{{TRUE}}
 
   if (a <= 30 && b <= 50) {
     clang_analyzer_eval((40 % a) < 30); // expected-warning{{TRUE}}