
Details

Reviewers

steakhal
NoQ

Commits

rG5f02ad880e42: [analyzer][solver] Improve reasoning for not equal to operator

Summary

This patch fixes certain cases where the solver was not able to infer
disequality because values overlapped in the RangeSet. This happened when
casting from a smaller signed type to a bigger unsigned type.
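The overlap described in the summary can be reproduced outside the analyzer. The sketch below is not analyzer code: it assumes a 16-bit short, and the helper names `castSignedRangeToUnsigned` and `overlaps` are hypothetical. It casts every value of a signed range to a 16-bit unsigned type and checks whether the result intersects an unsigned range.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical helper (not part of the analyzer): cast every value in the
// signed range [Lo, Hi] to a 16-bit unsigned type and collect the results.
std::set<std::uint16_t> castSignedRangeToUnsigned(std::int16_t Lo,
                                                  std::int16_t Hi) {
  assert(Lo <= Hi);
  std::set<std::uint16_t> Result;
  for (int V = Lo; V <= Hi; ++V)
    Result.insert(static_cast<std::uint16_t>(V)); // wraps negative values
  return Result;
}

// True if any casted value falls inside the unsigned range [Lo, Hi].
bool overlaps(const std::set<std::uint16_t> &Casted, std::uint16_t Lo,
              std::uint16_t Hi) {
  auto It = Casted.lower_bound(Lo);
  return It != Casted.end() && *It <= Hi;
}
```

With the signed range [SHRT_MIN, 0], the casted set is {0} plus everything from SHRT_MAX + 1 up to USHRT_MAX, so it intersects the unsigned range [2, USHRT_MAX] even though no original signed value lies in that range.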

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

manas created this revision. Dec 14 2022, 6:57 PM

Herald added a reviewer: NoQ. Dec 14 2022, 6:57 PM

Herald added a project: Restricted Project.

Herald added subscribers: ASDenysPetrov, martong, dkrupp and 7 others.

manas requested review of this revision. Dec 14 2022, 6:57 PM

Herald added a project: Restricted Project. Dec 14 2022, 6:57 PM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B203270: Diff 483067. Dec 14 2022, 7:47 PM

Rebase

Harbormaster completed remote builds in B203316: Diff 483135. Dec 15 2022, 5:37 AM

Thanks for going the extra mile to address this last thing. I really appreciate it.
I've got only a few minor comments and suggestions.

I'd recommend spell-checking the comments and the summary of this revision.
See my technical comments inline.

The test coverage looks good to me.

Good job.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1639	I think in this context `!=` should achieve the same, and we usually prefer this operator for this.
1639–1640	I was thinking that maybe `if (LHS.isUnsigned() && RHS.isSigned()) {} ... else if (LHS.isSigned() && RHS.isUnsigned())` results in cleaner code, as it would require one fewer level of indentation. The control flow already looks complicated enough.
1640	Why do we need this additional condition? If I remove these, I get no test failures, which suggests to me that we have some undertested code paths here.
1642–1647
1664–1667	I was thinking of using init-ifs, but on second thought I'm not sure if that's more readable.

if (RHS.getAPSIntType().convert(LHS.getMaxValue()) < RHS.getMinValue())
  return getTrueRange(T);

Shouldn't be too bad.
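For readers unfamiliar with the term, "init-ifs" refers to the C++17 if statement with an initializer. The snippet below is a generic illustration with hypothetical names (`Range`, `convertToUnsigned`, `maxIsBelowMin`); it is not the code under review.

```cpp
#include <cassert>
#include <cstdint>

struct Range { // hypothetical stand-in for a range of values
  std::int64_t Min;
  std::int64_t Max;
};

// Hypothetical conversion: reinterpret a value as 32-bit unsigned.
std::uint32_t convertToUnsigned(std::int64_t V) {
  return static_cast<std::uint32_t>(V);
}

// C++17 if-with-initializer: Converted is scoped to the if statement.
bool maxIsBelowMin(const Range &Signed, const Range &Unsigned) {
  assert(Unsigned.Min >= 0);
  if (std::uint32_t Converted = convertToUnsigned(Signed.Max);
      Converted < Unsigned.Min)
    return true;
  return false;
}
```

The initializer keeps the temporary out of the enclosing scope, at the cost of a longer condition, which is the readability trade-off the comment weighs.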

In D140086#3998426, @steakhal wrote:

Thanks for going the extra mile to address this last thing. I really appreciate it.
I've got only a few minor comments and suggestions.

I'd recommend spell-checking the comments and the summary of this revision.
See my technical comments inline.

Thank you once again for reviewing @steakhal :)

I did the spell-checking.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1639–1640	> I was thinking that maybe if (LHS.isUnsigned() && RHS.isSigned()) {} ... else if (LHS.isSigned() && RHS.isUnsigned()) results in cleaner code
	I did this, and I also combined both blocks into one to remove redundant code.
1640	> Why do we need this additional condition?
	Bitwidth was important because we should ideally cast the smaller-bitwidth type to the bigger-bitwidth type. Consider `LHS(u8), RHS(i32)`: without checking the bitwidth, we would be casting RHS's maxValue to LHS's type, which would result in loss of information and would not serve our purpose.
1664–1667	I think it's readable. But to combine the two blocks into one, I had to use different variable names, which makes this expression bigger. Should we still go with init-ifs?

Remove redundant branches

Harbormaster completed remote builds in B203784: Diff 483786. Dec 17 2022, 4:51 PM

About spellings. In the summary you used 'lesser', I think as a synonym for 'smaller' or something like that. Anyway, not important.
Great stuff.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1640	If you think we need that bitwidth check, why did you remove it? I'd like to see test cases demonstrating what we are talking about and see if we want that behavior or not.
1640	Interesting. I like it. However, I'd recommend moving this and the other variable to the beginning of this guarded block. That way it would be easier to see that the guard condition relates to this ternary condition.

manas added inline comments. Dec 18 2022, 3:35 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp

1640

This test fails.

void testfoo(unsigned char u, signed int s) {
  if (u >= 253 && u <= 255 && s < INT_MAX - 2) {
    // u: [253, 254], s: [INT_MIN, INT_MAX - 2]
    clang_analyzer_eval(u != s); // expected-warning{{UNKNOWN}}
                                 // but returns TRUE
  }
}
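The TRUE result can be traced to truncation. A minimal sketch, assuming a 32-bit int and modelling the analyzer's conversion to an 8-bit unsigned type as a plain `static_cast` (the names `unguardedCheckSaysNotEqual` and `canBeEqual` are hypothetical): the condition `s < INT_MAX - 2` makes the largest possible s equal to INT_MAX - 3, whose low byte is 252, and 252 compares below u's minimum of 253.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Model of the check without the bitwidth guard: convert the signed maximum
// to the (narrower) unsigned type and compare it with the unsigned minimum.
// The static_cast stands in for the analyzer's conversion; it truncates.
bool unguardedCheckSaysNotEqual(std::uint8_t UMin, std::int32_t SMax) {
  if (SMax < 0)
    return true;
  return static_cast<std::uint8_t>(SMax) < UMin;
}

// Ground truth for u in [UMin, UMax] and s in [SMin, SMax]: can u == s hold?
bool canBeEqual(std::uint8_t UMin, std::uint8_t UMax, std::int32_t SMin,
                std::int32_t SMax) {
  assert(UMin <= UMax && SMin <= SMax);
  // Both operands promote to int, so compare them as 32-bit signed values.
  return std::int32_t{UMax} >= SMin && std::int32_t{UMin} <= SMax;
}
```

For the ranges in the test, the unguarded model claims disequality while `canBeEqual` reports that equality is possible (for example u == s == 253), so UNKNOWN is the only sound answer.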

Bitwidth was important because we should ideally cast the smaller-bitwidth type to the bigger-bitwidth type.
Consider LHS(u8), RHS(i32): without checking the bitwidth, we would be casting RHS's maxValue to LHS's type, which would result in loss of information and would not serve our purpose.

If you think we need that bitwidth check, why did you remove it?
I'd like to see test cases demonstrating what we are talking about and see if we want that behavior or not.

This test fails.
void testfoo(unsigned char u, signed int s) {
  if (u >= 253 && u <= 255 && s < INT_MAX - 2) {
    // u: [253, 254], s: [INT_MIN, INT_MAX - 2]
    clang_analyzer_eval(u != s); // expected-warning{{UNKNOWN}}
                                 // but returns TRUE
  }
}

I feel like we have something to talk about.
When I do the review pro bono, I'd like to focus on higher-level issues and let the submitter deal with the smaller concerns.
That's why I'm expecting the submitter to:

Explain in the summary what the patch aims to solve (i.e., why you worked on it).
Describe what it implements and how.
Describe the obstacles you hit while implementing it. Because the reviewer will most likely think the same way, it's better to highlight what you tried and why that approach failed.
Most importantly, attach the test cases you uncovered during development that cover the edge cases from the previous point.

I'm also expecting that the change compiles, works, and is well-tested. This generally means that the tests cover all the branches in the modified parts and that the change runs and is capable of analyzing non-trivial projects without crashing or producing unacceptable reports.
In particular, the Core and the range-based solver are the foundation of the engine, hence even more rigorous testing is required; correctness is a must in these contexts.

Coming back to this review, I don't want to validate the correctness of the math. I trust you to do this, which you prove with tests (or with a Z3 solution in addition to that).
Getting more concrete, returning Unknown is fine, but returning the wrong answer, like True for cases where we should not be able to deduce it, is a serious issue.

I hope it helps to align us.

This revision now requires changes to proceed. Dec 21 2022, 1:10 AM

Re-introduce bitwidth comparison

In D140086#4010175, @steakhal wrote:
This test fails.
void testfoo(unsigned char u, signed int s) {
  if (u >= 253 && u <= 255 && s < INT_MAX - 2) {
    // u: [253, 254], s: [INT_MIN, INT_MAX - 2]
    clang_analyzer_eval(u != s); // expected-warning{{UNKNOWN}}
                                 // but returns TRUE
  }
}
I feel like we have something to talk about.
When I do the review pro bono, I'd like to focus on higher-level issues and let the submitter deal with the smaller concerns.

I think there has been some miscommunication. When I mentioned the failing example, I didn't mean to leave/delegate the pending work.

Nonetheless, I fixed it by re-introducing bitwidth comparison. Here is a Z3 proof https://gist.github.com/weirdsmiley/a9917815e71e4ec09e076522df039841
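Independently of the linked Z3 script (which is not reproduced here), the guarded rule can be sanity-checked by brute force on a small model. The sketch below assumes 4-bit types of equal width (unsigned 0..15, signed -8..7) and compares values as mathematical integers, as happens for char and short operands after integer promotion; the names `ruleSaysNotEqual` and `ruleIsSound` are hypothetical.

```cpp
#include <cassert>

// Model of the guarded rule for an unsigned range with minimum UMin and a
// signed range with maximum SMax, both of the same bit width: disequality
// is inferred if the signed range is entirely negative or lies entirely
// below the unsigned range. In this integer model the first test is implied
// by the second; it is kept to mirror the shape of the reviewed condition.
bool ruleSaysNotEqual(int UMin, int SMax) {
  return SMax < 0 || SMax < UMin;
}

// Exhaustively check every pair of 4-bit ranges: whenever the rule infers
// disequality, no value may belong to both ranges.
bool ruleIsSound() {
  for (int UMin = 0; UMin <= 15; ++UMin)
    for (int UMax = UMin; UMax <= 15; ++UMax)
      for (int SMin = -8; SMin <= 7; ++SMin)
        for (int SMax = SMin; SMax <= 7; ++SMax) {
          bool Intersect = UMax >= SMin && UMin <= SMax;
          if (ruleSaysNotEqual(UMin, SMax) && Intersect)
            return false;
        }
  return true;
}
```

A check like this only covers the equal-width case on a toy domain; it complements, and does not replace, a solver-based proof over the real bit widths.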

manas marked 5 inline comments as done. Jan 4 2023, 4:26 PM

manas added inline comments.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1641–1642	I combined the two conditionals as they both were returning TrueRange. Is it readable?

Harbormaster completed remote builds in B205798: Diff 486425. Jan 4 2023, 5:24 PM

Sorry, I don't have the time this week.

Gentle ping.

Looks good. Thanks.

This revision is now accepted and ready to land. Jan 24 2023, 4:42 AM

Closed by commit rG5f02ad880e42: [analyzer][solver] Improve reasoning for not equal to operator (authored by manas). Jan 24 2023, 1:03 PM

This revision was automatically updated to reflect the committed changes.

manas added a commit: rG5f02ad880e42: [analyzer][solver] Improve reasoning for not equal to operator.

Diff 491885

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp

[... 1,616 lines not shown ...]

  RangeSet SymbolicRangeInferrer::VisitBinaryOperator<BO_NE>(RangeSet LHS,
                                                             RangeSet RHS,
                                                             QualType T) {
    assert(!LHS.isEmpty() && !RHS.isEmpty());

    if (LHS.getAPSIntType() == RHS.getAPSIntType()) {
      if (intersect(RangeFactory, LHS, RHS).isEmpty())
        return getTrueRange(T);
    } else {
+     // We can only lose information if we are casting smaller signed type to
+     // bigger unsigned type. For e.g.,
+     //    LHS (unsigned short): [2, USHRT_MAX]
+     //    RHS (signed short): [SHRT_MIN, 0]
+     // Casting RHS to LHS type will leave us with overlapping values
+     //    CastedRHS : [0, 0] U [SHRT_MAX + 1, USHRT_MAX]
+     // We can avoid this by checking if signed type's maximum value is lesser
+     // than unsigned type's minimum value.
+
+     // If both have different signs then only we can get more information.
+     if (LHS.isUnsigned() != RHS.isUnsigned()) {

steakhal (suggested edit): I think in this context != should achieve the same, and we usually prefer this operator for this.

        // If both have different signs then only we can get more information.
-       if (LHS.isUnsigned() ^ RHS.isUnsigned()) {
+       if (LHS.isUnsigned() != RHS.isUnsigned()) {
          if (LHS.isUnsigned() && (LHS.getBitWidth() >= RHS.getBitWidth())) {

+       if (LHS.isUnsigned() && (LHS.getBitWidth() >= RHS.getBitWidth())) {
+         if (RHS.getMaxValue().isNegative() ||
+             LHS.getAPSIntType().convert(RHS.getMaxValue()) < LHS.getMinValue())
+           return getTrueRange(T);
+       } else if (RHS.isUnsigned() && (LHS.getBitWidth() <= RHS.getBitWidth())) {
+         if (LHS.getMaxValue().isNegative() ||
+             RHS.getAPSIntType().convert(LHS.getMaxValue()) < RHS.getMinValue())

steakhal (suggested edit):

          if (LHS.isUnsigned() && (LHS.getBitWidth() >= RHS.getBitWidth())) {
            // If signed range is <Zero, then we can simply infer that expression
            // will return true.
-           llvm::APSInt Zero = RHS.getAPSIntType().getZeroValue();
-           bool IsRHSNegative = RHS.getMaxValue() < Zero;
-           if (IsRHSNegative)
+           if (RHS.getMaxValue().isNegative())
              return getTrueRange(T);
            // If signed range may have APSInt >=Zero, then maximum value of signed

+           return getTrueRange(T);
+       }
+     }
+
      // Both RangeSets should be casted to bigger unsigned type.
      APSIntType CastingType(std::max(LHS.getBitWidth(), RHS.getBitWidth()),
                             LHS.isUnsigned() || RHS.isUnsigned());

      RangeSet CastedLHS = RangeFactory.castTo(LHS, CastingType);
      RangeSet CastedRHS = RangeFactory.castTo(RHS, CastingType);

      if (intersect(RangeFactory, CastedLHS, CastedRHS).isEmpty())
        return getTrueRange(T);
    }

    // In all other cases, the resulting range cannot be deduced.
    return infer(T);
  }

  template <>
  RangeSet SymbolicRangeInferrer::VisitBinaryOperator<BO_Or>(Range LHS, Range RHS,
                                                             QualType T) {
    APSIntType ResultType = ValueFactory.getAPSIntType(T);
    llvm::APSInt Zero = ResultType.getZeroValue();

    bool IsLHSPositiveOrZero = LHS.From() >= Zero;
    bool IsRHSPositiveOrZero = RHS.From() >= Zero;

[... 496 lines not shown; context: [[nodiscard]] std::optional<bool> interpreteAsBool(RangeSet Constraint) { ...]

      return std::nullopt;
    }

    ProgramStateRef State;
    SValBuilder &Builder;
    RangeSet::Factory &RangeFactory;
  };

  bool ConstraintAssignor::assignSymExprToConst(const SymExpr *Sym,
                                                const llvm::APSInt &Constraint) {
    llvm::SmallSet<EquivalenceClass, 4> SimplifiedClasses;
    // Iterate over all equivalence classes and try to simplify them.
    ClassMembersTy Members = State->get<ClassMembers>();
    for (std::pair<EquivalenceClass, SymbolSet> ClassToSymbolSet : Members) {
      EquivalenceClass Class = ClassToSymbolSet.first;
      State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);

[... 1,238 lines not shown ...]

clang/test/Analysis/constant-folding.c

[... 297 lines not shown; context: if (u1 >= INT_MIN && u1 <= INT_MIN + 2 && ...]

        s1 > INT_MIN + 2 && s1 < INT_MIN + 4) {
      // u1: [INT_MAX+1, INT_MAX+1]U[INT_MAX+4, INT_MAX+4],
      // s1: [INT_MIN+3, INT_MIN+3]
      clang_analyzer_eval(u1 != s1); // expected-warning{{TRUE}}
    }

    if (s1 < 0 && s1 > -4 && u1 > UINT_MAX - 4 && u1 < UINT_MAX - 1) {
      // s1: [-3, -1], u1: [UINT_MAX - 3, UINT_MAX - 2]
-     clang_analyzer_eval(u1 != s1); // expected-warning{{UNKNOWN}}
+     clang_analyzer_eval(u1 != s1); // expected-warning{{TRUE}}
+     clang_analyzer_eval(s1 != u1); // expected-warning{{TRUE}}
    }

    if (s1 < 1 && s1 > -6 && s1 != -4 && s1 != -3 &&
        u1 > UINT_MAX - 4 && u1 < UINT_MAX - 1) {
      // s1: [-5, -5]U[-2, 0], u1: [UINT_MAX - 3, UINT_MAX - 2]
      clang_analyzer_eval(u1 != s1); // expected-warning{{TRUE}}
    }

[... 80 lines not shown; context: void testDisequalityRules(unsigned int u1, unsigned int u2, unsigned int u3, ...]

    }

    // Checks for char-uchar types
    if (uch >= 1 && sch <= 1) {
      // uch: [1, UCHAR_MAX], sch: [SCHAR_MIN, 1]
      clang_analyzer_eval(uch != sch); // expected-warning{{UNKNOWN}}
    }

-   // FIXME: Casting smaller signed types to unsigned one may leave us with
-   // overlapping values, falsely indicating UNKNOWN, where it is possible to
-   // assert TRUE.
    if (uch > 1 && sch < 1) {
      // uch: [2, UCHAR_MAX], sch: [SCHAR_MIN, 0]
-     clang_analyzer_eval(uch != sch); // expected-warning{{UNKNOWN}}
+     clang_analyzer_eval(uch != sch); // expected-warning{{TRUE}}
+     clang_analyzer_eval(sch != uch); // expected-warning{{TRUE}}
    }

    if (uch <= 1 && uch >= 1 && sch <= 1 && sch >= 1) {
      // uch: [1, 1], sch: [1, 1]
      clang_analyzer_eval(uch != sch); // expected-warning{{FALSE}}
    }

    // Checks for short-ushort types
    if (ush >= 1 && ssh <= 1) {
      // ush: [1, USHRT_MAX], ssh: [SHRT_MIN, 1]
      clang_analyzer_eval(ush != ssh); // expected-warning{{UNKNOWN}}
    }

-   // FIXME: Casting leave us with overlapping values. Should be TRUE.
    if (ush > 1 && ssh < 1) {
      // ush: [2, USHRT_MAX], ssh: [SHRT_MIN, 0]
-     clang_analyzer_eval(ush != ssh); // expected-warning{{UNKNOWN}}
+     clang_analyzer_eval(ush != ssh); // expected-warning{{TRUE}}
    }

    if (ush <= 1 && ush >= 1 && ssh <= 1 && ssh >= 1) {
      // ush: [1, 1], ssh: [1, 1]
      clang_analyzer_eval(ush != ssh); // expected-warning{{FALSE}}
    }
  }

This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][solver] Improve reasoning for not equal to operator
ClosedPublic
