This is an archive of the discontinued LLVM Phabricator instance.

Differential D106823

[analyzer][solver] Iterate to a fixpoint during symbol simplification with constants
Closed · Public

Authored by martong on Jul 26 2021, 1:58 PM.

Details

Reviewers

vsavchenko
NoQ
steakhal
Szelethus

Commits

rG806329da0700: [analyzer][solver] Iterate to a fixpoint during symbol simplification with…

Summary

D103314 introduced symbol simplification when a new constant constraint is
added. Currently, we simplify existing equivalence classes by iterating over
all existing members of them and trying to simplify each member symbol with
simplifySVal.

At the end of such a simplification round we may end up introducing a
new constant constraint. Example:

if (a + b + c != d)
  return;
if (c + b != 0)
  return;
// Simplification starts here.
if (b != 0)
  return;

The c == 0 constraint is the result of the first simplification iteration.
However, we could do another round of simplification to reach the conclusion
that a == d. Generally, we could keep iterating until we reach a
fixpoint.

We can reach a fixpoint by recursively calling State->assume on the
newly simplified symbol. By calling State->assume we re-ignite the
whole assume machinery (along with, e.g., adjustment handling).
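
To make the idea of iterating to a fixpoint concrete, here is a minimal toy sketch. It is not the analyzer's code: `Fact`, `simplifyOnce` and `simplifyToFixpoint` are hypothetical names, facts are modeled as "sum of symbols equals a symbol or 0", and the fixpoint is reached with a plain loop, whereas the patch reaches it by recursive State->assume calls.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model (NOT the analyzer's data structures): a fact states that the sum
// of the symbols in `Lhs` equals `Rhs`, where `Rhs` is a symbol name or "0".
struct Fact {
  std::set<std::string> Lhs;
  std::string Rhs;
};

// Drop every known-zero symbol from each left-hand side. When a fact shrinks
// to `x == 0`, record `x` as a new zero constant. Returns true on any change.
bool simplifyOnce(std::vector<Fact> &Facts, std::set<std::string> &Zeros) {
  bool Changed = false;
  for (Fact &F : Facts) {
    for (const std::string &Z : Zeros)
      if (F.Lhs.size() > 1 && F.Lhs.erase(Z) > 0)
        Changed = true;
    if (F.Lhs.size() == 1 && F.Rhs == "0" &&
        Zeros.insert(*F.Lhs.begin()).second)
      Changed = true;
  }
  return Changed;
}

// Repeat simplification rounds until a round changes nothing. Returns the
// number of rounds, including the final round that detects no change.
int simplifyToFixpoint(std::vector<Fact> &Facts, std::set<std::string> &Zeros) {
  int Rounds = 0;
  do {
    ++Rounds;
  } while (simplifyOnce(Facts, Zeros));
  return Rounds;
}
```

Starting from the facts `a + b + c == d` and `c + b == 0` with `b` known to be zero, the first round derives `c == 0` and a later round reduces the first fact to `a == d`, which mirrors the example above.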

Why should we do this? By reaching a fixpoint in simplification we are capable
of discovering infeasible states at the moment of the introduction of the
first constant constraint.
Let's modify the previous example just a bit, and consider what happens without
the fixpoint iteration.

if (a + b + c != d)
  return;
if (c + b != 0)
  return;
// Adding a new constraint.
if (a == d)
  return;
// This brings in a contradiction.
if (b != 0)
  return;
clang_analyzer_warnIfReached(); // This produces a warning.
            // The path is already infeasible...
if (c == 0) // ...but we realize that only when we evaluate `c == 0`.
  return;

As the inline comments suggest, without the fixpoint iteration we realize that
we are on an infeasible path only after we have already started walking on it.
With fixpoint iteration we can detect this before stepping onto the path:
clang_analyzer_warnIfReached does not warn in the above example b/c we realize
the contradiction during the evaluation of b == 0. The engine and the checkers
rely on the assumption that either assume(Cond) or assume(!Cond) is
feasible. This is in fact assured by the so-called expensive checks
(LLVM_ENABLE_EXPENSIVE_CHECKS). The StdLibraryFunctionsChecker is notably one of
the checkers that has a very similar assertion.
(Actually, recognizing an infeasible parent state could have happened even
before simplification had been introduced. Because of the imperfection of the
solver, adding a new constraint can reveal that the parent state had been
infeasible already.)

Before this patch, we simply added the simplified symbol to the equivalence
class. In this patch, after we have added the simplified symbol, we remove the
old (more complex) symbol from the members of the equivalence class
(ClassMembers). Removing the old symbol is beneficial because during the next
iteration of the simplification we don't have to consider the old symbol again.

Contrary to how we handle ClassMembers, we don't remove the old Sym->Class
relation from the ClassMap. This is important for two reasons: (1) the
constraints of the old symbol can still be found via the equivalence class
that it used to be a member of; (2) we can spare one removal and thus one
additional tree in the forest of ClassMap.
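
The asymmetry between ClassMembers and ClassMap can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyState`, `constraintOf` and the use of mutable `std::map`/`std::set` are hypothetical stand-ins for the analyzer's immutable containers, and constraints are stored as plain text.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy stand-ins (hypothetical, not the analyzer's immutable maps).
using Symbol = std::string;
using ClassId = int;

struct ToyState {
  std::map<Symbol, ClassId> ClassMap;               // Sym -> Class
  std::map<ClassId, std::set<Symbol>> ClassMembers; // Class -> Syms
  std::map<ClassId, std::string> Constraints;       // Class -> range (as text)
};

// Remove `Old` from the member list of its class only; the Sym -> Class
// entry in ClassMap is deliberately left in place.
void removeMember(ToyState &S, const Symbol &Old) {
  ClassId Cls = S.ClassMap.at(Old);
  std::set<Symbol> &Members = S.ClassMembers.at(Cls);
  assert(Members.count(Old) == 1);
  Members.erase(Old);
  assert(!Members.empty() && "class needs at least two members before removal");
}

// Look up the constraint of a symbol through its (possibly former) class.
std::string constraintOf(const ToyState &S, const Symbol &Sym) {
  return S.Constraints.at(S.ClassMap.at(Sym));
}
```

After `removeMember`, the old symbol no longer shows up among the class members, yet `constraintOf` still finds its constraint because the Sym->Class entry survives.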

Performance and complexity: Let us assume that in a State we have N non-trivial
equivalence classes and that all constraints and disequality info are related to
non-trivial classes. In the worst case, we can simplify only one symbol of one
class in each iteration. The number of symbols in one class cannot grow b/c we
replace the old symbol with the simplified one. Also, the number of
equivalence classes can only decrease, b/c the algorithm optionally does a merge
operation. We need N iterations in this case to reach the fixpoint. Thus, the
number of steps needed in the worst case is proportional to N*N. Empirical
results (attached) show a hardly noticeable run-time and peak
memory discrepancy compared to the baseline. In my opinion, these differences
could be the result of measurement error.
This worst-case scenario can be extended to the case when we have trivial
classes in the constraints and in the disequality map. That case can be reduced
to the case of a State where there are only non-trivial classes, b/c a merge
operation on two trivial classes results in one non-trivial class.
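
The N*N estimate can be checked with a toy counting experiment. The sketch below is an assumption-laden model, not a measurement of the analyzer: `worstCaseVisits` is a hypothetical function that scans all N classes per round and lets only one class make progress in each round.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count class visits in a toy worst case: each round scans all N classes but
// manages to simplify only one of them.
std::size_t worstCaseVisits(std::size_t N) {
  std::vector<bool> Simplified(N, false);
  std::size_t Visits = 0;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < N; ++I) {
      ++Visits;
      // Only the first not-yet-simplified class makes progress this round.
      if (!Changed && !Simplified[I]) {
        Simplified[I] = true;
        Changed = true;
      }
    }
  }
  return Visits;
}
```

In this model there are N productive rounds plus one final round that finds nothing to do, so the visit count is N*(N+1), which grows proportionally to N*N.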

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

martong created this revision. Jul 26 2021, 1:58 PM

Herald added a reviewer: Szelethus. Jul 26 2021, 1:58 PM

Herald added subscribers: manas, ASDenysPetrov, gamesh411 and 10 others.

martong requested review of this revision. Jul 26 2021, 1:58 PM

Herald added a project: Restricted Project. Jul 26 2021, 1:58 PM

Herald added a subscriber: cfe-commits.

For more details please check out this html file:

stats.html (174 KB)

Rename test functions

Harbormaster completed remote builds in B116271: Diff 361793. Jul 26 2021, 3:24 PM

martong edited the summary of this revision. Jul 27 2021, 8:41 AM

Ping. @vsavchenko Could you please take a look?

Looking great!
I have a couple of nit picks and I kind of want to check that it doesn't affect the performance on a different set of projects as well.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2148–2149	Maybe after removing you can check that `ClsMembers` is not empty? I just don't like relying on `getHeight` because it looks like an implementation detail and shouldn't be used. We use it only in one place in the whole codebase (in equivalence classes :) ), but since we can re-write this assertion to have a simpler condition, I think that it should be preferred.
clang/test/Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp
21	Do we need to print states in this test?
clang/test/Analysis/symbol-simplification-fixpoint-one-iteration.cpp
33	Same question here
clang/test/Analysis/symbol-simplification-fixpoint-two-iterations.cpp
37	OK, I understand the next two classes. But how did we produce this one?

Thanks for the review!

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2148–2149	Okay, I've also found it inconvenient to use `getHeight` but the API of immutable maps is quite sparse, I mean there is no way to query the size. The solution you suggest is better in the sense we don't have to use the internal API, so I've changed it that way, though, it has the disadvantage that we must check the precondition of the function in the middle of it which is weird to read. What about having a free function that takes a tree as a param and returns whether it has two members? Or we could even extend the API of the immutable AVL tree with a new member function.
clang/test/Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp
21	Good point, I've accidentally left it here.
clang/test/Analysis/symbol-simplification-fixpoint-one-iteration.cpp
33	> OK, I understand the next two classes. But how did we produce this one?
Simplification is done on each equivalence class we can find in the state, no matter whether it is a non-trivial or a trivial class. Here is what happens in this case: We skim through the constraints and try to simplify all equivalence classes there. During this we start simplifying the trivial equivalence class `((reg_$0<int a>) + (reg_$1<int b>)) != (reg_$2<int c>)`. The first and only member of this class can be simplified to `(reg_$0<int a>) != (reg_$2<int c>)` because `b==0`. Now, we merge the two trivial classes of the original non-simplified and the new simplified symbols. At this point we have a non-trivial class with two members: `((reg_$0<int a>) + (reg_$1<int b>)) != (reg_$2<int c>)` and `(reg_$0<int a>) != (reg_$2<int c>)`. And then we remove the old symbol from the class. That results in a non-trivial class with one member: `(reg_$0<int a>) != (reg_$2<int c>)`.
clang/test/Analysis/symbol-simplification-fixpoint-two-iterations.cpp
37	I am giving an answer to this in the previous test file, because that case is shorter and easier to explain.
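
The free function floated in the `getHeight` reply above (a check that a set has at least two members without querying its size) could look roughly like the following sketch. `hasAtLeastTwoMembers` is a hypothetical name, and the only assumption about the set type is that it exposes `begin()` and `end()`; `std::set` is used here as a stand-in for the immutable set.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper: true if `S` holds two or more elements. It only needs
// begin()/end(), so it does not depend on a size() query or on tree height.
template <typename SetT> bool hasAtLeastTwoMembers(const SetT &S) {
  auto It = S.begin();
  auto End = S.end();
  if (It == End)
    return false;
  ++It;
  return It != End;
}
```

With such a helper the precondition could be asserted at the top of the function, before the removal, instead of checking for emptiness afterwards.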

Remove superfluous clang_analyzer_printState
Assert on isEmpty instead of using getHeight

Harbormaster completed remote builds in B118093: Diff 364373. Aug 5 2021, 2:12 AM

@vsavchenko gentle ping

martong added inline comments. Oct 1 2021, 7:01 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
2144	Remove `const`
2159	Comment that this is a precondition.
2234	TODO add Performance and complexity essay here.

I'm gonna get back to this on Monday.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1741–1742	IMO we should have a `llvm::Statistic` here, tracking the maximum iteration count to reach the fixed point and an average iteration count.
1775	I'd love to see a coverage report of the tests you add with this patch.
clang/test/Analysis/expr-inspection-printState-eq-classes.c
11	Why do you need to change this?
clang/test/Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp
17	Is it important to have this instead of `b + c`?

Reach the fixpoint by recursively calling State->assume on the simplified symbol.
Address review nits.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1741–1742	We can't do this once we reach the fixpoint with recursive `assume` calls.
1775	Ok, I am going to check the coverage and add the missing cases.
clang/test/Analysis/expr-inspection-printState-eq-classes.c
11	We don't need it, I removed.
clang/test/Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp
17	No, I changed it to `b+c` as you suggested.

Add essay about complexity.

Harbormaster completed remote builds in B131989: Diff 384122. Nov 2 2021, 9:06 AM

steakhal added inline comments. Nov 3 2021, 2:55 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1775	> Ok, I am going to check the coverage and add the missing cases.
As of diff 5, line 1767 and all the code in the block at line 2184 are uncovered by the tests you provided.

Add new tests to cover missing line coverage.

> As of diff 5, line 1767 and all the code in the block at line 2184 are uncovered by the tests you provided.

Thanks, I've added new tests that cover the re-assume logic (the block of line 2184).

However, I was unable to add a test that covers the case when the simplification of a trivial symbol in the disequality info results in an infeasible state (line 1767). Here is how I tried: I changed the line to an assertion and then ran the static analysis on the following open-source projects: memcached, tmux, curl, twin, redis, vim, openssl, sqlite, ffmpeg, postgresql, tinyxml2, libwebm, xerces, bitcoin, protobuf. My idea was that if the assertion fired, I would use creduce to create the test case (btw, this is how I added the other infeasible state test case). However, it did not fire.

IMHO, having a defensive check at L1767 is correct b/c there is a slight chance of reaching an infeasible state there. Although the chance is minimal, I cannot prove that it is 0.

Harbormaster completed remote builds in B132493: Diff 384802. Nov 4 2021, 11:58 AM

martong edited the summary of this revision. Nov 5 2021, 3:00 AM

martong edited the summary of this revision. Nov 5 2021, 3:18 AM

There is no runtime or peak memory usage growth with this patch (actually, the runtime decreases by a few percent).
I am attaching the measurement results of the latest Diff.

stats.html (185 KB)

martong added a child revision: D113261: [analyzer][solver] Remove reference to RangedConstraintManager. Nov 5 2021, 4:06 AM

In D106823#3109469, @martong wrote:

> > As of diff 5, line 1767 and all the code in the block at line 2184 are uncovered by the tests you provided.
>
> Thanks, I've added new tests that cover the re-assume logic (the block of line 2184).
>
> However, I was unable to add a test that covers the case when the simplification of a trivial symbol in the disequality info results in an infeasible state (line 1767). Here is how I tried: I changed the line to an assertion and then ran the static analysis on the following open-source projects: memcached, tmux, curl, twin, redis, vim, openssl, sqlite, ffmpeg, postgresql, tinyxml2, libwebm, xerces, bitcoin, protobuf. My idea was that if the assertion fired, I would use creduce to create the test case (btw, this is how I added the other infeasible state test case). However, it did not fire.
>
> IMHO, having a defensive check at L1767 is correct b/c there is a slight chance of reaching an infeasible state there. Although the chance is minimal, I cannot prove that it is 0.

It's fine by me. Thanks for the investigation.

In D106823#3111110, @martong wrote:

> There is no runtime or peak memory usage growth with this patch (actually, the runtime decreases by a few percent).
> I am attaching the measurement results of the latest Diff.
> stats.html (185 KB)

Great!

The tests refer to reg_$0 by spelling the id number, which is unfortunate, but I suspect these tests will break for the slightest changes in the solver anyway so I'm not too bothered with this.

Please wait a week for the rest of the members of the community to have a look before committing.
@NoQ @Szelethus @ASDenysPetrov

This revision is now accepted and ready to land. Nov 5 2021, 9:35 AM

Closed by commit rG806329da0700: [analyzer][solver] Iterate to a fixpoint during symbol simplification with… (authored by martong). Nov 12 2021, 2:57 AM

This revision was automatically updated to reflect the committed changes.

martong added a commit: rG806329da0700: [analyzer][solver] Iterate to a fixpoint during symbol simplification with….

Revision Contents

Path	Size
clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h	14 lines
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp	119 lines
clang/test/Analysis/expr-inspection-printState-eq-classes.c	8 lines
clang/test/Analysis/symbol-simplification-disequality-info.cpp	65 lines
clang/test/Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp	55 lines
clang/test/Analysis/symbol-simplification-fixpoint-one-iteration.cpp	40 lines
clang/test/Analysis/symbol-simplification-fixpoint-two-iterations.cpp	45 lines
clang/test/Analysis/symbol-simplification-reassume.cpp	37 lines

Diff 386775

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h

[281 lines not shown] public:
   /// where N = size(this)
   bool contains(llvm::APSInt Point) const { return containsImpl(Point); }

   bool containsZero() const {
     APSIntType T{getMinValue()};
     return contains(T.getZeroValue());
   }

+  /// Test if the range is the [0,0] range.
+  ///
+  /// Complexity: O(1)
+  bool encodesFalseRange() const {
+    const llvm::APSInt *Constant = getConcreteValue();
+    return Constant && Constant->isZero();
+  }
+
+  /// Test if the range doesn't contain zero.
+  ///
+  /// Complexity: O(logN)
+  /// where N = size(this)
+  bool encodesTrueRange() const { return !containsZero(); }
+
   void dump(raw_ostream &OS) const;
   void dump() const;

   bool operator==(const RangeSet &Other) const { return Impl == Other.Impl; }
   bool operator!=(const RangeSet &Other) const { return !(*this == Other); }

 private:
   /* implicit */ RangeSet(ContainerType RawContainer) : Impl(RawContainer) {}
[121 lines not shown]

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp

[595 lines not shown] public:
   getDisequalClasses(DisequalityMapTy Map, ClassSet::Factory &Factory) const;

   LLVM_NODISCARD static inline Optional<bool> areEqual(ProgramStateRef State,
                                                        EquivalenceClass First,
                                                        EquivalenceClass Second);
   LLVM_NODISCARD static inline Optional<bool>
   areEqual(ProgramStateRef State, SymbolRef First, SymbolRef Second);

+  /// Remove one member from the class.
+  LLVM_NODISCARD ProgramStateRef removeMember(ProgramStateRef State,
+                                              const SymbolRef Old);
+
   /// Iterate over all symbols and try to simplify them.
   LLVM_NODISCARD static inline ProgramStateRef
   simplify(SValBuilder &SVB, RangeSet::Factory &F, RangedConstraintManager &RCM,
            ProgramStateRef State, EquivalenceClass Class);

   void dumpToStream(ProgramStateRef State, raw_ostream &os) const;
   LLVM_DUMP_METHOD void dump(ProgramStateRef State) const {
     dumpToStream(State, llvm::errs());
[39 lines not shown] private:
   SymbolRef getRepresentativeSymbol() const {
     return reinterpret_cast<SymbolRef>(ID);
   }
   static inline SymbolSet::Factory &getMembersFactory(ProgramStateRef State);

   inline ProgramStateRef mergeImpl(RangeSet::Factory &F, ProgramStateRef State,
                                    SymbolSet Members, EquivalenceClass Other,
                                    SymbolSet OtherMembers);

   static inline bool
   addToDisequalityInfo(DisequalityMapTy &Info, ConstraintRangeTy &Constraints,
                        RangeSet::Factory &F, ProgramStateRef State,
                        EquivalenceClass First, EquivalenceClass Second);

   /// This is a unique identifier of the class.
   uintptr_t ID;
 };
[1,061 lines not shown] bool ConstraintAssignor::assignSymExprToConst(const SymExpr *Sym,
   ClassMembersTy Members = State->get<ClassMembers>();
   for (std::pair<EquivalenceClass, SymbolSet> ClassToSymbolSet : Members) {
     EquivalenceClass Class = ClassToSymbolSet.first;
     State =
         EquivalenceClass::simplify(Builder, RangeFactory, RCM, State, Class);
     if (!State)
       return false;
     SimplifiedClasses.insert(Class);
   }

   // Trivial equivalence classes (those that have only one symbol member) are
   // not stored in the State. Thus, we must skim through the constraints as
   // well. And we try to simplify symbols in the constraints.
   ConstraintRangeTy Constraints = State->get<ConstraintRange>();
   for (std::pair<EquivalenceClass, RangeSet> ClassConstraint : Constraints) {
     EquivalenceClass Class = ClassConstraint.first;
     if (SimplifiedClasses.count(Class)) // Already simplified.
       continue;
     State =
         EquivalenceClass::simplify(Builder, RangeFactory, RCM, State, Class);
     if (!State)
       return false;
   }

+  // We may have trivial equivalence classes in the disequality info as
+  // well, and we need to simplify them.
+  DisequalityMapTy DisequalityInfo = State->get<DisequalityMap>();
+  for (std::pair<EquivalenceClass, ClassSet> DisequalityEntry :
+       DisequalityInfo) {
+    EquivalenceClass Class = DisequalityEntry.first;
+    ClassSet DisequalClasses = DisequalityEntry.second;
+    State =
+        EquivalenceClass::simplify(Builder, RangeFactory, RCM, State, Class);
+    if (!State)
+      return false;
+  }
+
   return true;
 }

 bool ConstraintAssignor::assignSymSymExprToRangeSet(const SymSymExpr *Sym,
                                                     RangeSet Constraint) {
   if (!handleRemainderOp(Sym, Constraint))
     return false;

   Optional<bool> ConstraintAsBool = interpreteAsBool(Constraint);

   if (!ConstraintAsBool)
     return true;

   if (Optional<bool> Equality = meansEquality(Sym)) {
[351 lines not shown] inline Optional<bool> EquivalenceClass::areEqual(ProgramStateRef State,
   ClassSet DisequalToFirst = First.getDisequalClasses(State);
   if (DisequalToFirst.contains(Second))
     return false;

   // It is not clear.
   return llvm::None;
 }

+LLVM_NODISCARD ProgramStateRef
+EquivalenceClass::removeMember(ProgramStateRef State, const SymbolRef Old) {
+
+  SymbolSet ClsMembers = getClassMembers(State);
+  assert(ClsMembers.contains(Old));
+
+  // We don't remove `Old`'s Sym->Class relation for two reasons:
+  // 1) This way constraints for the old symbol can still be found via it's
+  // equivalence class that it used to be the member of.
+  // 2) Performance and resource reasons. We can spare one removal and thus one
+  // additional tree in the forest of `ClassMap`.
+
+  // Remove `Old`'s Class->Sym relation.
+  SymbolSet::Factory &F = getMembersFactory(State);
+  ClassMembersTy::Factory &EMFactory = State->get_context<ClassMembers>();
+  ClsMembers = F.remove(ClsMembers, Old);
+  // Ensure another precondition of the removeMember function (we can check
+  // this only with isEmpty, thus we have to do the remove first).
+  assert(!ClsMembers.isEmpty() &&
+         "Class should have had at least two members before member removal");
+  // Overwrite the existing members assigned to this class.
+  ClassMembersTy ClassMembersMap = State->get<ClassMembers>();
+  ClassMembersMap = EMFactory.add(ClassMembersMap, *this, ClsMembers);
+  State = State->set<ClassMembers>(ClassMembersMap);
+
+  return State;
+}
+
+// Re-evaluate an SVal with top-level `State->assume` logic.
+LLVM_NODISCARD ProgramStateRef reAssume(ProgramStateRef State,
+                                        const RangeSet *Constraint,
+                                        SVal TheValue) {
+  if (!Constraint)
+    return State;
+
+  const auto DefinedVal = TheValue.castAs<DefinedSVal>();
+
+  // If the SVal is 0, we can simply interpret that as `false`.
+  if (Constraint->encodesFalseRange())
+    return State->assume(DefinedVal, false);
+
+  // If the constraint does not encode 0 then we can interpret that as `true`
+  // AND as a Range(Set).
+  if (Constraint->encodesTrueRange()) {
+    State = State->assume(DefinedVal, true);
+    if (!State)
+      return nullptr;
+    // Fall through, re-assume based on the range values as well.
+  }
+  // Overestimate the individual Ranges with the RangeSet' lowest and
+  // highest values.
+  return State->assumeInclusiveRange(DefinedVal, Constraint->getMinValue(),
+                                     Constraint->getMaxValue(), true);
+}
+
 // Iterate over all symbols and try to simplify them. Once a symbol is
 // simplified then we check if we can merge the simplified symbol's equivalence
 // class to this class. This way, we simplify not just the symbols but the
 // classes as well: we strive to keep the number of the classes to be the
 // absolute minimum.
 LLVM_NODISCARD ProgramStateRef EquivalenceClass::simplify(
     SValBuilder &SVB, RangeSet::Factory &F, RangedConstraintManager &RCM,
     ProgramStateRef State, EquivalenceClass Class) {
Show All 20 Lines	if (SimplifiedMemberSym && MemberSym != SimplifiedMemberSym) {
ProgramStateRef OldState = State;		ProgramStateRef OldState = State;
State = merge(F, State, MemberSym, SimplifiedMemberSym);		State = merge(F, State, MemberSym, SimplifiedMemberSym);
if (!State)		if (!State)
return nullptr;		return nullptr;
// No state change, no merge happened actually.		// No state change, no merge happened actually.
if (OldState == State)		if (OldState == State)
continue;		continue;

// Initiate the reorganization of the equality information. E.g., if we		assert(find(State, MemberSym) == find(State, SimplifiedMemberSym));
		martongAuthorUnsubmitted Done Reply Inline Actions TODO add Performance and complexity essay here. martong: TODO add Performance and complexity essay here.
// have `c + 1 == 0` then we'd like to express that `c == -1`. It makes		// Remove the old and more complex symbol.
// sense to do this only with `SymIntExpr`s.		State = find(State, MemberSym).removeMember(State, MemberSym);
// TODO Handle `IntSymExpr` as well, once computeAdjustment can handle
// them.		// Query the class constraint again b/c that may have changed during the
if (const SymIntExpr *SIE = dyn_cast<SymIntExpr>(SimplifiedMemberSym)) {		// merge above.
if (const RangeSet *ClassConstraint = getConstraint(State, Class)) {		const RangeSet *ClassConstraint = getConstraint(State, Class);
// Overestimate the individual Ranges with the RangeSet' lowest and
// highest values.		// Re-evaluate an SVal with top-level `State->assume`, this ignites
State = RCM.assumeSymInclusiveRange(		// a RECURSIVE algorithm that will reach a FIXPOINT.
State, SIE, ClassConstraint->getMinValue(),		//
ClassConstraint->getMaxValue(), /InRange=/true);		// About performance and complexity: Let us assume that in a State we
		// have N non-trivial equivalence classes and that all constraints and
		// disequality info is related to non-trivial classes. In the worst case,
		// we can simplify only one symbol of one class in each iteration. The
		// number of symbols in one class cannot grow b/c we replace the old
		// symbol with the simplified one. Also, the number of the equivalence
		// classes can decrease only, b/c the algorithm does a merge operation
		// optionally. We need N iterations in this case to reach the fixpoint.
		// Thus, the steps needed to be done in the worst case is proportional to
		// N*N.
		//
		// This worst case scenario can be extended to that case when we have
		// trivial classes in the constraints and in the disequality map. This
		// case can be reduced to the case with a State where there are only
		// non-trivial classes. This is because a merge operation on two trivial
		// classes results in one non-trivial class.
		State = reAssume(State, ClassConstraint, SimplifiedMemberVal);
if (!State)		if (!State)
return nullptr;		return nullptr;
}		}
}		}
}
}
return State;		return State;
}		}

inline ClassSet EquivalenceClass::getDisequalClasses(ProgramStateRef State,		inline ClassSet EquivalenceClass::getDisequalClasses(ProgramStateRef State,
SymbolRef Sym) {		SymbolRef Sym) {
return find(State, Sym).getDisequalClasses(State);		return find(State, Sym).getDisequalClasses(State);
}		}


clang/test/Analysis/expr-inspection-printState-eq-classes.c

	// RUN: %clang_analyze_cc1 \			// RUN: %clang_analyze_cc1 \
	// RUN: -analyzer-checker=debug.ExprInspection %s 2>&1 | FileCheck %s			// RUN: -analyzer-checker=debug.ExprInspection %s 2>&1 | FileCheck %s

	void clang_analyzer_printState();			void clang_analyzer_printState();

	void test_equivalence_classes(int a, int b, int c, int d) {			void test_equivalence_classes(int a, int b, int c, int d) {
	if (a + b != c)			if (a + b != c)
	return;			return;
	if (a != d)			if (a != d)
	return;			return;
	if (b != 0)			if (b != 0)
				steakhal: Why do you need to change this?
				martong: We don't need it, I removed.
	return;			return;
	clang_analyzer_printState();			clang_analyzer_printState();
	(void)(a * b * c * d);			(void)(a * b * c * d);
	return;			return;
	}			}

	// CHECK: "equivalence_classes": [			// CHECK: "equivalence_classes": [
	// CHECK-NEXT: [ "((reg_$0<int a>) + (reg_$1<int b>)) != (reg_$2<int c>)", "(reg_$0<int a>) != (reg_$2<int c>)" ],			// CHECK-NEXT: [ "(reg_$0<int a>) != (reg_$2<int c>)" ],
	// CHECK-NEXT: [ "(reg_$0<int a>) + (reg_$1<int b>)", "reg_$0<int a>", "reg_$2<int c>", "reg_$3<int d>" ]			// CHECK-NEXT: [ "reg_$0<int a>", "reg_$2<int c>", "reg_$3<int d>" ]
	// CHECK-NEXT: ],			// CHECK-NEXT: ],

clang/test/Analysis/symbol-simplification-disequality-info.cpp

This file was added.

				// RUN: %clang_analyze_cc1 %s \
				// RUN: -analyzer-checker=core \
				// RUN: -analyzer-checker=debug.ExprInspection \
				// RUN: 2>&1 | FileCheck %s

				// In this test we check how the solver's symbol simplification mechanism
				// simplifies the disequality info.

				void clang_analyzer_printState();

				void test(int a, int b, int c, int d) {
				if (a + b + c == d)
				return;
				clang_analyzer_printState();
				// CHECK: "disequality_info": [
				// CHECK-NEXT: {
				// CHECK-NEXT: "class": [ "((reg_$0<int a>) + (reg_$1<int b>)) + (reg_$2<int c>)" ],
				// CHECK-NEXT: "disequal_to": [
				// CHECK-NEXT: [ "reg_$3<int d>" ]]
				// CHECK-NEXT: },
				// CHECK-NEXT: {
				// CHECK-NEXT: "class": [ "reg_$3<int d>" ],
				// CHECK-NEXT: "disequal_to": [
				// CHECK-NEXT: [ "((reg_$0<int a>) + (reg_$1<int b>)) + (reg_$2<int c>)" ]]
				// CHECK-NEXT: }
				// CHECK-NEXT: ],


				// Simplification starts here.
				if (b != 0)
				return;
				clang_analyzer_printState();
				// CHECK: "disequality_info": [
				// CHECK-NEXT: {
				// CHECK-NEXT: "class": [ "(reg_$0<int a>) + (reg_$2<int c>)" ],
				// CHECK-NEXT: "disequal_to": [
				// CHECK-NEXT: [ "reg_$3<int d>" ]]
				// CHECK-NEXT: },
				// CHECK-NEXT: {
				// CHECK-NEXT: "class": [ "reg_$3<int d>" ],
				// CHECK-NEXT: "disequal_to": [
				// CHECK-NEXT: [ "(reg_$0<int a>) + (reg_$2<int c>)" ]]
				// CHECK-NEXT: }
				// CHECK-NEXT: ],

				if (c != 0)
				return;
				clang_analyzer_printState();
				// CHECK: "disequality_info": [
				// CHECK-NEXT: {
				// CHECK-NEXT: "class": [ "reg_$0<int a>" ],
				// CHECK-NEXT: "disequal_to": [
				// CHECK-NEXT: [ "reg_$3<int d>" ]]
				// CHECK-NEXT: },
				// CHECK-NEXT: {
				// CHECK-NEXT: "class": [ "reg_$3<int d>" ],
				// CHECK-NEXT: "disequal_to": [
				// CHECK-NEXT: [ "reg_$0<int a>" ]]
				// CHECK-NEXT: }
				// CHECK-NEXT: ],

				// Keep the symbols and the constraints alive.
				(void)(a * b * c * d);
				return;
				}

clang/test/Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp

This file was added.

				// RUN: %clang_analyze_cc1 %s \
				// RUN: -analyzer-checker=core \
				// RUN: -analyzer-checker=debug.ExprInspection \
				// RUN: -verify

				// In this test we check whether the solver's symbol simplification mechanism
				// is capable of reaching a fixpoint.

				void clang_analyzer_warnIfReached();

				void test_contradiction(int a, int b, int c, int d, int x) {
				if (a + b + c != d)
				return;
				if (a == d)
				return;
				if (b + c != 0)
				return;
				steakhal: Is it important to have this instead of `b + c`?
				martong: No, I changed it to `b+c` as you suggested.
				clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}

				// Bring in the contradiction.
				if (b != 0)
				vsavchenko: Do we need to print states in this test?
				martong: Good point, I've accidentally left it here.
				return;

				// After the simplification of `b == 0` we have:
				// b == 0
				// a + c == d
				// a != d
				// c == 0
				// Doing another iteration we reach the fixpoint (with a contradiction):
				// b == 0
				// a == d
				// a != d
				// c == 0
				clang_analyzer_warnIfReached(); // no-warning, i.e. UNREACHABLE

				// Enabling expensive checks would trigger an assertion failure here without
				// the fixpoint iteration.
				if (a + c == x)
				return;

				// Keep the symbols and the constraints alive.
				(void)(a * b * c * d * x);
				return;
				}

				void test_true_range_contradiction(int a, unsigned b) {
				if (!(b > a)) // unsigned b > int a
				return;
				if (a != -1) // int a == -1
				return; // Starts a simplification of `unsigned b > int a`,
				// that results in `unsigned b > UINT_MAX`,
				// which is always false, so the State is infeasible.
				clang_analyzer_warnIfReached(); // no-warning
				(void)(a * b);
				}
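
The `unsigned b > int a` case in test_true_range_contradiction relies on the usual arithmetic conversions: the `int` operand is converted to `unsigned` before the comparison, so `a == -1` becomes `UINT_MAX`. The snippet below is a minimal standalone sketch of that language rule only. The helper names `convertedOperand` and `unsignedGreaterThanInt` are made up for illustration and are not part of the analyzer.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: the value `a` takes once it is converted to
// `unsigned`, which is what happens to the right operand of `b > a`.
unsigned convertedOperand(int a) { return static_cast<unsigned>(a); }

// Mirrors the branch condition `b > a` from the test. The explicit cast
// spells out the conversion the compiler would apply implicitly.
bool unsignedGreaterThanInt(unsigned b, int a) {
  return b > static_cast<unsigned>(a);
}
```

With `a == -1` the converted operand is `UINT_MAX`, so no value of `b` can satisfy the comparison, which is why the analyzer may treat the state as infeasible.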

clang/test/Analysis/symbol-simplification-fixpoint-one-iteration.cpp

This file was added.

				// RUN: %clang_analyze_cc1 %s \
				// RUN: -analyzer-checker=core \
				// RUN: -analyzer-checker=debug.ExprInspection \
				// RUN: 2>&1 | FileCheck %s

				// In this test we check whether the solver's symbol simplification mechanism
				// is capable of reaching a fixpoint. This should be done after one iteration.

				void clang_analyzer_printState();

				void test(int a, int b, int c) {
				if (a + b != c)
				return;
				clang_analyzer_printState();
				// CHECK: "constraints": [
				// CHECK-NEXT: { "symbol": "((reg_$0<int a>) + (reg_$1<int b>)) != (reg_$2<int c>)", "range": "{ [0, 0] }" }
				// CHECK-NEXT: ],
				// CHECK-NEXT: "equivalence_classes": [
				// CHECK-NEXT: [ "(reg_$0<int a>) + (reg_$1<int b>)", "reg_$2<int c>" ]
				// CHECK-NEXT: ],
				// CHECK-NEXT: "disequality_info": null,

				// Simplification starts here.
				if (b != 0)
				return;
				clang_analyzer_printState();
				// CHECK: "constraints": [
				// CHECK-NEXT: { "symbol": "(reg_$0<int a>) != (reg_$2<int c>)", "range": "{ [0, 0] }" },
				// CHECK-NEXT: { "symbol": "reg_$1<int b>", "range": "{ [0, 0] }" }
				// CHECK-NEXT: ],
				// CHECK-NEXT: "equivalence_classes": [
				// CHECK-NEXT: [ "(reg_$0<int a>) != (reg_$2<int c>)" ],
				// CHECK-NEXT: [ "reg_$0<int a>", "reg_$2<int c>" ]
				vsavchenko: Same question here
				martong:
				> OK, I understand the next two classes. But how did we produce this one?
				Simplification is done on each equivalence class we can find in the state, no matter if they are non-trivial or trivial classes.
				Here is what happens in this case: We skim through the constraints and try to simplify all equivalence classes there. During this we start simplifying the trivial equivalence class `((reg_$0<int a>) + (reg_$1<int b>)) != (reg_$2<int c>)`. The first and only member of this class can be simplified with `(reg_$0<int a>) != (reg_$2<int c>)` because `b==0`.
				Now, we merge the two trivial classes of the original non-simplified and the new simplified symbols. At this point we receive a non-trivial class with two members: `((reg_$0<int a>) + (reg_$1<int b>)) != (reg_$2<int c>)` and `(reg_$0<int a>) != (reg_$2<int c>)`.
				And then we remove the old symbol from the class. That results in a non-trivial class with one member: `(reg_$0<int a>) != (reg_$2<int c>)`.
				// CHECK-NEXT: ],
				// CHECK-NEXT: "disequality_info": null,

				// Keep the symbols and the constraints alive.
				(void)(a * b * c);
				return;
				}
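
The merge-then-remove sequence described in the inline reply in this file can be pictured with a toy model. This is only a sketch under simplifying assumptions: a class is modelled as a plain set of symbol names, and `ToyClass` and `replaceWithSimplified` are hypothetical names, not the analyzer's `EquivalenceClass` API.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for an equivalence class: a set of symbol names.
using ToyClass = std::set<std::string>;

// Merge the class of the old symbol with the (trivial) class of its
// simplified form, then drop the old, more complex symbol.
ToyClass replaceWithSimplified(ToyClass cls, const std::string &oldSym,
                               const std::string &simplifiedSym) {
  assert(cls.count(oldSym) == 1 && "old symbol must be a member");
  cls.insert(simplifiedSym); // merge step: class now has both members
  cls.erase(oldSym);         // remove step: only the simpler symbol stays
  return cls;
}
```

Applied to the one-member class holding `(a + b) != c`, this yields a class whose only member is `a != c`, matching the extra class that shows up in the CHECK lines of this test.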

clang/test/Analysis/symbol-simplification-fixpoint-two-iterations.cpp

This file was added.

				// RUN: %clang_analyze_cc1 %s \
				// RUN: -analyzer-checker=core \
				// RUN: -analyzer-checker=debug.ExprInspection \
				// RUN: 2>&1 | FileCheck %s

				// In this test we check whether the solver's symbol simplification mechanism
				// is capable of reaching a fixpoint. This should be done after TWO iterations.

				void clang_analyzer_printState();

				void test(int a, int b, int c, int d) {
				if (a + b + c != d)
				return;
				if (c + b != 0)
				return;
				clang_analyzer_printState();
				// CHECK: "constraints": [
				// CHECK-NEXT: { "symbol": "(((reg_$0<int a>) + (reg_$1<int b>)) + (reg_$2<int c>)) != (reg_$3<int d>)", "range": "{ [0, 0] }" },
				// CHECK-NEXT: { "symbol": "(reg_$2<int c>) + (reg_$1<int b>)", "range": "{ [0, 0] }" }
				// CHECK-NEXT: ],
				// CHECK-NEXT: "equivalence_classes": [
				// CHECK-NEXT: [ "((reg_$0<int a>) + (reg_$1<int b>)) + (reg_$2<int c>)", "reg_$3<int d>" ]
				// CHECK-NEXT: ],
				// CHECK-NEXT: "disequality_info": null,

				// Simplification starts here.
				if (b != 0)
				return;
				clang_analyzer_printState();
				// CHECK: "constraints": [
				// CHECK-NEXT: { "symbol": "(reg_$0<int a>) != (reg_$3<int d>)", "range": "{ [0, 0] }" },
				// CHECK-NEXT: { "symbol": "reg_$1<int b>", "range": "{ [0, 0] }" },
				// CHECK-NEXT: { "symbol": "reg_$2<int c>", "range": "{ [0, 0] }" }
				// CHECK-NEXT: ],
				// CHECK-NEXT: "equivalence_classes": [
				// CHECK-NEXT: [ "(reg_$0<int a>) != (reg_$3<int d>)" ],
				// CHECK-NEXT: [ "reg_$0<int a>", "reg_$3<int d>" ],
				vsavchenko: OK, I understand the next two classes. But how did we produce this one?
				martong: I am giving an answer to this in the previous test file, because that case is shorter and easier to explain.
				// CHECK-NEXT: [ "reg_$2<int c>" ]
				// CHECK-NEXT: ],
				// CHECK-NEXT: "disequality_info": null,

				// Keep the symbols and the constraints alive.
				(void)(a * b * c * d);
				return;
				}
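
To see why this test needs two rounds, the following toy model may help. It is not the analyzer's algorithm: it assumes constraints are linear equalities of the form sum(coeff * var) == rhs, ignores ranges, disequalities and integer overflow, and all names (`ToyConstraint`, `ToyResult`, `simplifyToFixpoint`) are hypothetical. Each round substitutes the known constants and records any constraint that has collapsed to a single variable as a new constant.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy linear constraint: sum(coeff * var) == rhs. Purely illustrative.
struct ToyConstraint {
  std::map<std::string, int> terms;
  int rhs = 0;
};

struct ToyResult {
  std::map<std::string, int> constants; // variables known to be constant
  std::vector<ToyConstraint> remaining; // constraints that still have terms
  int rounds = 0;                       // rounds that changed something
  bool feasible = true;                 // false once `0 == nonzero` appears
};

ToyResult simplifyToFixpoint(std::vector<ToyConstraint> constraints,
                             std::map<std::string, int> constants) {
  ToyResult result;
  for (;;) {
    bool changed = false;
    // Substitute every known constant into every constraint.
    for (ToyConstraint &c : constraints) {
      for (auto it = c.terms.begin(); it != c.terms.end();) {
        auto known = constants.find(it->first);
        if (known == constants.end()) {
          ++it;
          continue;
        }
        c.rhs -= it->second * known->second;
        it = c.terms.erase(it);
        changed = true;
      }
      if (c.terms.empty() && c.rhs != 0)
        result.feasible = false; // contradiction: 0 == nonzero
    }
    if (!result.feasible)
      break;
    // A constraint with one variable left yields a new constant.
    for (const ToyConstraint &c : constraints) {
      if (c.terms.size() != 1)
        continue;
      const auto &term = *c.terms.begin();
      if (term.second == 0 || c.rhs % term.second != 0)
        continue;
      if (constants.emplace(term.first, c.rhs / term.second).second)
        changed = true;
    }
    if (!changed)
      break; // fixpoint: nothing was substituted or learned
    ++result.rounds;
  }
  result.constants = constants;
  for (const ToyConstraint &c : constraints)
    if (!c.terms.empty())
      result.remaining.push_back(c);
  return result;
}
```

Fed with `a + b + c - d == 0`, `c + b == 0` and the known constant `b == 0`, the first round learns `c == 0` and the second round reduces the remaining constraint to `a - d == 0`, that is `a == d`. A third pass changes nothing, so the loop stops.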

clang/test/Analysis/symbol-simplification-reassume.cpp

This file was added.

				// RUN: %clang_analyze_cc1 %s \
				// RUN: -analyzer-checker=core \
				// RUN: -analyzer-checker=debug.ExprInspection \
				// RUN: -analyzer-config eagerly-assume=false \
				// RUN: -verify

				// In this test we check whether the solver's symbol simplification mechanism
				// is capable of re-assuming simplified symbols.

				void clang_analyzer_eval(bool);
				void clang_analyzer_warnIfReached();

				void test_reassume_false_range(int x, int y) {
				if (x + y != 0) // x + y == 0
				return;
				if (x != 1) // x == 1
				return;
				clang_analyzer_eval(y == -1); // expected-warning{{TRUE}}
				}

				void test_reassume_true_range(int x, int y) {
				if (x + y != 1) // x + y == 1
				return;
				if (x != 1) // x == 1
				return;
				clang_analyzer_eval(y == 0); // expected-warning{{TRUE}}
				}

				void test_reassume_inclusive_range(int x, int y) {
				if (x + y < 0 || x + y > 1) // x + y: [0, 1]
				return;
				if (x != 1) // x == 1
				return;
				// y: [-1, 0]
				clang_analyzer_eval(y > 0); // expected-warning{{FALSE}}
				clang_analyzer_eval(y < -1);// expected-warning{{FALSE}}
				}
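
The ranges in the comments of these tests follow from shifting the range of `x + y` by the now-constant `x`. Below is a minimal sketch of that arithmetic, assuming a single closed interval and no overflow or wraparound. `ToyRange`, `rangeOfSymbol` and `contains` are hypothetical names, not the analyzer's `RangeSet` or adjustment code.

```cpp
#include <cassert>

// Hypothetical closed integer interval [lo, hi]; not the analyzer's RangeSet.
struct ToyRange {
  long long lo;
  long long hi;
};

// Given that `sym + k` lies in `rangeOfSum`, return the range of `sym`.
// Overflow and wraparound are deliberately ignored in this sketch.
ToyRange rangeOfSymbol(ToyRange rangeOfSum, long long k) {
  assert(rangeOfSum.lo <= rangeOfSum.hi && "range must not be empty");
  return {rangeOfSum.lo - k, rangeOfSum.hi - k};
}

// True if `v` lies inside the closed interval `r`.
bool contains(ToyRange r, long long v) { return r.lo <= v && v <= r.hi; }
```

For test_reassume_inclusive_range, `x + y` in [0, 1] with `x == 1` gives `y` in [-1, 0], so both `y > 0` and `y < -1` evaluate to FALSE.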