This is an archive of the discontinued LLVM Phabricator instance.

Differential D48565

[analyzer] Replace the vector of ConstraintSets by a single ConstraintSet and a function to merge ConstraintSets
ClosedPublic

Authored by mikhail.ramalho on Jun 25 2018, 1:27 PM.

Details

Reviewers

NoQ
george.karpenkov

Commits

rG38049a51bda3: [analyzer] Replace the vector of ConstraintSets by a single ConstraintSet and a…
rL336002: [analyzer] Replace the vector of ConstraintSets by a single ConstraintSet and a…
rC336002: [analyzer] Replace the vector of ConstraintSets by a single ConstraintSet and a…

Summary

The previous approach would add every constraint, from every node, into a vector and encode them all at the end of the path. However, duplicated constraints were not removed, so the SMT formula generated by the refutation manager would contain a huge number of duplicated constraints.

This patch implements some code to merge the constraints over the same symbol, removing duplicates. These are the numbers before and after the patch:

Project    |  before  |   after  |
tmux       |  283.222 |  137.927 | 
redis      |  614.858 |  457.721 |
openssl    |  308.292 |  317.049 |
twin       |  274.478 |  266.339 |
git        |  547.687 |  528.171 |
postgresql | 2927.495 | 2167.802 |
sqlite3    | 3264.305 | 1392.721 |

Major speedups in tmux and sqlite3 (less than half of the time); redis and postgresql were about 25% faster, while the rest are basically the same.
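As a quick check of the figures quoted above, the relative saving can be computed as (before - after) / before. The following is a minimal sketch; `timeSaved` is a hypothetical helper name, and the inputs are simply the values copied from the table.

```cpp
#include <cassert>

// Fraction of the original running time eliminated by a change:
// (before - after) / before. Both values must use the same unit.
// A negative result means the "after" run was slower.
double timeSaved(double before, double after) {
  assert(before > 0.0 && "baseline time must be positive");
  return (before - after) / before;
}

// Values from the table above:
//   timeSaved(283.222, 137.927)   is roughly 0.51  (tmux)
//   timeSaved(3264.305, 1392.721) is roughly 0.57  (sqlite3)
//   timeSaved(614.858, 457.721)   is roughly 0.26  (redis)
//   timeSaved(2927.495, 2167.802) is roughly 0.26  (postgresql)
//   timeSaved(308.292, 317.049)   is roughly -0.03 (openssl, slightly slower)
```

Read this way, tmux and sqlite3 finish in less than half the original time, redis and postgresql save roughly a quarter, and openssl is marginally slower with this first version of the patch.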

Diff Detail

Repository: rC Clang

Event Timeline

mikhail.ramalho created this revision. Jun 25 2018, 1:27 PM

Herald added subscribers: a.sidorin, szepet, xazax.hun. Jun 25 2018, 1:27 PM

mikhail.ramalho added a parent revision: D48561: [Analyzer] Moved RangeConstraintManager to header. NFC. Jun 25 2018, 1:27 PM

Ok, it looks like naively just dropping all the constraints at Z3 is not the most efficient way.

I have a question about this approach though. Right now this approach is intersecting the ranges associated with symbols and symbolic expressions between the new and the old state.
I have the feeling, however, this is redundant work. This is exactly what the analyzer is doing during the analysis, and the information should be already there in the exploded graph.

The original refutation patch used a logic that it added the ranges of a symbolic expression exactly once, where the analyzer has the most information about that expression (when the particular symbol becomes dead or it is in the last node).
Using that approach we do not need to compare ranges or intersect them; for this reason, I found that solution simpler.

So one approach is to merge everything; the other is to just pick the right ranges from the right places.

Is there any particular reason that you follow a different path? If yes, I would love to see that documented, to make sure nobody will introduce an optimization that does not work in the future.

The original refutation patch used a logic that it added the ranges of a symbolic expression exactly once, where the analyzer has the most information about that expression

I did not like that approach as it creates tight coupling between the visitor and the analyzer internals which are not normally exposed.
I'm not sure it was correct, as after switching away from that we have noticed that some bugs have disappeared (though IIRC we do not fully understand why).
I think the current approach is actually much cleaner.

Nits: would prefer the visitor in a new file now, clarifications on the merging algorithm required.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2368	This visitor starts to get big, and the logic is quite different to what is usually found in `BugReporterVisitors`. Could we separate it into a different file with its own header, which would be included by `BugReporter.cpp`?
2408	I'm not sure what happens in this branch. If the intersection is empty, seems like the state should be infeasible? Should this branch be in fact marked unreachable, since the state should not exist otherwise?
2412	I am quite confused about why we need both `ResultRange` and `R`, and why we assign from one to the other. Maybe this would be fixed if the branch above is marked unreachable?

This revision now requires changes to proceed. Jun 26 2018, 5:42 PM

I think @xazax.hun has a point. We should not be intersecting any ranges, because older ranges (those closer to the root of the graph) are always supersets of the newer ranges. Essentially, for every symbol we need to take either the final range (if it's present in the last program state) or the latest range (from the last state in which it's present) and feed it to the solver. That's all. No intersections. No PreStmtPurgeDeadSymbols program points. That's a fairly normal visitor workflow; that's exactly why we have a pair of nodes as arguments to VisitNode. Just observe how the state changes.

mikhail.ramalho retitled this revision from [Analyzer] Replace the vector of ConstraintSets by a single ConstraintSet and a function to merge ConstraintSets to [analyzer] Replace the vector of ConstraintSets by a single ConstraintSet and a function to merge ConstraintSets. Jun 27 2018, 6:49 AM

In any case, I think @NoQ and @xazax.hun have a valid point here, the CSA already merges the ranges so this patch feels like duplicated work.

I can have another look at @rnkovacs's patch and try to improve the approach here. What do you think, @george.karpenkov?

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2368	Sure.
2408	This is the case where we have an old set, x = [1,2], [10,20], and the new one is x = [3,4]. When intersecting `[1,2]` with `[3,4]` the result is empty, but both ranges are valid.
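
To make the example in this comment concrete, here is a minimal sketch of interval-set intersection. It uses plain standard-library types as a hypothetical stand-in: `Interval`, `IntervalSet`, and `intersect` are illustrative names, not the analyzer's `RangeSet` API, and the sketch assumes each set is sorted and its intervals are disjoint.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a range set: a sorted list of disjoint closed
// intervals [first, second].
using Interval = std::pair<long, long>;
using IntervalSet = std::vector<Interval>;

// Intersects two interval sets with a two-pointer sweep.
IntervalSet intersect(const IntervalSet &a, const IntervalSet &b) {
  IntervalSet result;
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    assert(a[i].first <= a[i].second && b[j].first <= b[j].second);
    long lo = std::max(a[i].first, b[j].first);
    long hi = std::min(a[i].second, b[j].second);
    if (lo <= hi)
      result.emplace_back(lo, hi); // the two intervals overlap on [lo, hi]
    // Advance whichever interval ends first; it cannot overlap anything else.
    if (a[i].second < b[j].second)
      ++i;
    else
      ++j;
  }
  return result;
}
```

With the old set {[1,2], [10,20]} and the new set {[3,4]}, every pairwise intersection is empty, so `intersect` returns an empty set. That is the situation the empty-intersection branch under discussion has to handle.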

george.karpenkov added inline comments. Jun 27 2018, 7:47 PM

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2408	I am confused, when and why would that happen?

mikhail.ramalho mentioned this in D48561: [Analyzer] Moved RangeConstraintManager to header. NFC. Jun 28 2018, 6:56 AM

Implemented the suggested approach (kinda). Instead of adding the constraints when they are removed, this patch adds them when they first appear and, since we walk the bug report backward, it should be the last set of ranges generated by the CSA for a given symbol.
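
The "first appearance while walking backward" rule can be sketched outside the analyzer as follows. This is only an illustration under simplifying assumptions: `Range`, `State`, and `collectLatestRanges` are hypothetical names, a symbol is modelled as a string, and a range as a single closed interval rather than the analyzer's `ConstraintRangeTy` map.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a range is one closed interval, and a state maps
// each constrained symbol to its range at that node.
using Range = std::pair<long, long>;
using State = std::map<std::string, Range>;

// Walks the path from the bug node back toward the root and keeps, for each
// symbol, only the first range encountered. Because the walk is backward,
// that is the last range the symbol had on the path.
State collectLatestRanges(const std::vector<State> &pathFromBugToRoot) {
  State collected;
  for (const State &state : pathFromBugToRoot)
    for (const auto &entry : state)
      collected.insert(entry); // std::map::insert keeps an existing entry
  return collected;
}
```

For a path whose states (bug node first) are {x: [3,4]}, {x: [1,20], y: [0,0]}, and {x: [-100,100]}, the result is {x: [3,4], y: [0,0]}. Each symbol contributes exactly one range, so nothing is intersected or duplicated.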

The numbers:

Project    |  current |    v1    |    v2    | 
tmux       |  283.222 |  137.927 |  123.052 |
redis      |  614.858 |  457.721 |  400.347 |
openssl    |  308.292 |  317.049 |  307.149 |
twin       |  274.478 |  266.339 |  245.411 |
git        |  547.687 |  528.171 |  477.335 |
postgresql | 2927.495 | 2167.802 | 2002.526 |
sqlite3    | 3264.305 | 1392.721 | 1028.416 |

current: current version in repo
v1: first version of this patch, which merges the ranges
v2: second version of this patch, which adds the constraints when they first appear.

Overall, v2 is even faster than v1 and the current version across all projects.

@mikhail.ramalho LGTM with a few nits.

I assume you've double checked that the results are the same as what you get without this optimization?

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2386	nit: cleaner to just call `VisitNode(EndPathNode, nullptr, BRC, BR)`
2391	Nit: could drop braces according to LLVM style guide

This revision is now accepted and ready to land. Jun 29 2018, 10:06 AM

In D48565#1148097, @george.karpenkov wrote:

@mikhail.ramalho LGTM with a few nits.

I assume you've double checked that the results are the same as what you get without this optimization?

Yes, all the regressions pass and the same bugs are removed from the projects, with and without this patch.

Small requested changes.

Closed by commit rC336002: [analyzer] Replace the vector of ConstraintSets by a single ConstraintSet and a… (authored by mramalho). Jun 29 2018, 11:16 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: cfe-commits. Jun 29 2018, 11:16 AM

Revision Contents

Path | Size
include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h | 8 lines
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | 46 lines

Diff 153545

include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h

 };

 /// The bug visitor will walk all the nodes in a path and collect all the
 /// constraints. When it reaches the root node, will create a refutation
 /// manager and check if the constraints are satisfiable
 class FalsePositiveRefutationBRVisitor final : public BugReporterVisitor {
 private:
   /// Holds the constraints in a given path
-  // TODO: should we use a set?
-  llvm::SmallVector<ConstraintRangeTy, 32> Constraints;
+  ConstraintRangeTy Constraints;

 public:
-  FalsePositiveRefutationBRVisitor() = default;
+  FalsePositiveRefutationBRVisitor();

   void Profile(llvm::FoldingSetNodeID &ID) const override;

   std::shared_ptr<PathDiagnosticPiece> VisitNode(const ExplodedNode *N,
                                                  const ExplodedNode *PrevN,
                                                  BugReporterContext &BRC,
                                                  BugReport &BR) override;
+
+  void finalizeVisitor(BugReporterContext &BRC, const ExplodedNode *EndPathNode,
+                       BugReport &BR) override;
 };

 namespace bugreporter {

 /// Attempts to add visitors to trace a null or undefined value back to its
 /// point of origin, whether it is a symbol constrained to null or an explicit
 /// assignment.
 ///

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp

@@ TaintBugVisitor::VisitNode @@
   PathDiagnosticLocation L =
       PathDiagnosticLocation::createBegin(S, BRC.getSourceManager(), NCtx);
   if (!L.isValid() || !L.asLocation().isValid())
     return nullptr;

   return std::make_shared<PathDiagnosticEventPiece>(L, "Taint originated here");
 }

-static bool
-areConstraintsUnfeasible(BugReporterContext &BRC,
-                         const llvm::SmallVector<ConstraintRangeTy, 32> &Cs) {
+static bool areConstraintsUnfeasible(BugReporterContext &BRC,
+                                     const ConstraintRangeTy &Cs) {
   // Create a refutation manager
   std::unique_ptr<ConstraintManager> RefutationMgr = CreateZ3ConstraintManager(
       BRC.getStateManager(), BRC.getStateManager().getOwningEngine());

   SMTConstraintManager *SMTRefutationMgr =
       static_cast<SMTConstraintManager *>(RefutationMgr.get());

   // Add constraints to the solver
-  for (const auto &C : Cs)
-    SMTRefutationMgr->addRangeConstraints(C);
+  SMTRefutationMgr->addRangeConstraints(Cs);

   // And check for satisfiability
   return SMTRefutationMgr->isModelFeasible().isConstrainedFalse();
 }

+static void addNewConstraints(ConstraintRangeTy &Cs,
+                              const ConstraintRangeTy &NewCs,
+                              ConstraintRangeTy::Factory &CF) {
+  // Add constraints if we don't have them yet
+  for (auto const &C : NewCs) {
+    const SymbolRef &Sym = C.first;
+    if (!Cs.contains(Sym)) {
+      Cs = CF.add(Cs, Sym, C.second);
+    }
+  }
+}
+
+FalsePositiveRefutationBRVisitor::FalsePositiveRefutationBRVisitor()
+    : Constraints(ConstraintRangeTy::Factory().getEmptyMap()) {}
+
+void FalsePositiveRefutationBRVisitor::finalizeVisitor(
+    BugReporterContext &BRC, const ExplodedNode *EndPathNode, BugReport &BR) {
+  // Collect new constraints
+  VisitNode(EndPathNode, nullptr, BRC, BR);
+
+  // Create a new refutation manager and check feasibility
+  if (areConstraintsUnfeasible(BRC, Constraints))
+    BR.markInvalid("Infeasible constraints", EndPathNode->getLocationContext());
+}
+
 std::shared_ptr<PathDiagnosticPiece>
 FalsePositiveRefutationBRVisitor::VisitNode(const ExplodedNode *N,
                                             const ExplodedNode *PrevN,
                                             BugReporterContext &BRC,
                                             BugReport &BR) {
-  // Collect the constraint for the current state
-  const ConstraintRangeTy &CR = N->getState()->get<ConstraintRange>();
-  Constraints.push_back(CR);
-
-  // If there are no predecessor, we reached the root node. In this point,
-  // a new refutation manager will be created and the path will be checked
-  // for reachability
-  if (PrevN->pred_size() == 0 && areConstraintsUnfeasible(BRC, Constraints)) {
-    BR.markInvalid("Infeasible constraints", N->getLocationContext());
-  }
+  // Collect new constraints
+  addNewConstraints(Constraints, N->getState()->get<ConstraintRange>(),
+                    N->getState()->get_context<ConstraintRange>());

   return nullptr;
 }

 void FalsePositiveRefutationBRVisitor::Profile(
     llvm::FoldingSetNodeID &ID) const {
   static int Tag = 0;
   ID.AddPointer(&Tag);
 }