This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Track assume call stack to detect fixpoint
ClosedPublic

Authored by martong on May 27 2022, 11:55 AM.

Details

Summary

Assume functions might recurse (see reAssume or tryRearrange). During the
recursion, the State might not change anymore, that means we reached a
fixpoint. In this patch, we avoid infinite recursion of assume calls by
checking already visited States on the stack of assume function calls. This
patch renders the previous "workaround" solution (D47155) unnecessary.
Note that this is not an NFC patch. If we were to limit the maximum stack
depth of the assume calls to 1 then would it be equivalent with the previous
solution in D47155.

Additionally, in D113753, we simplify the symbols right at the beginning of evalBinOpNN.
So, a call to simplifySVal in getKnownValue (added in D51252) is no longer needed.

Diff Detail

Event Timeline

martong created this revision.May 27 2022, 11:55 AM
martong requested review of this revision.May 27 2022, 11:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2022, 11:55 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
martong updated this revision to Diff 433097.May 31 2022, 8:34 AM
  • Add AssumeStack to track recursive assume calls
martong retitled this revision from [analyzer][NFC] SimpleSValBuilder simplification: Remove superfluous workaround code to [analyzer][NFC] SimpleSValBuilder simplification: Remove superfluous workaround code and track Assume call stack rather.May 31 2022, 8:38 AM
martong edited the summary of this revision. (Show Details)
martong edited the summary of this revision. (Show Details)

This isn't an NFC change. It's more than a refactor/cosmetic change.
Please run some benchmarks to validate the assumed performance characteristics.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
156

Have you considered llvm::SmallPtrSet ?

clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp
56–59

Why don't you return RawSt on the early return path instead? That seems to contain more information than State, right?

You lookup twice the location of the RawSt entry; once in the hasCycle() and again in the push().
I would recommend doing an unconditional insert() directly and reusing the returned bool part of the pair to observe if it did actually insert.

martong marked 2 inline comments as done.Jun 1 2022, 1:52 AM
martong added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
156

Actually, a SmallSet of pointers is transparently implemented with a SmallPtrSet.

clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp
56–59

Why don't you return RawSt on the early return path instead? That seems to contain more information than State, right?

No, RawSt contains the same information as State.

You lookup twice the location of the RawSt entry; once in the hasCycle() and again in the push().
I would recommend doing an unconditional insert() directly and reusing the returned bool part of the pair to observe if it did actually insert.

I am considering to change the container to a SmallVector instead. I'd like to keep the interface of hasCycle and push and pop because those express the intention more clearly. And SmallSet falls back to a simple linear search anyway if size is less than N.

martong marked 2 inline comments as done.Jun 1 2022, 1:54 AM

This isn't an NFC change. It's more than a refactor/cosmetic change.

Still, there is no visible change in the behavior, at least that is intended. Should it be an NFCi ?

Please run some benchmarks to validate the assumed performance characteristics.

Yeah I've thought of it, here it is:

martong updated this revision to Diff 433324.Jun 1 2022, 2:09 AM
martong edited the summary of this revision. (Show Details)
  • Use SmallVector in AssumeStack

This isn't an NFC change. It's more than a refactor/cosmetic change.

Still, there is no visible change in the behavior, at least that is intended. Should it be an NFCi ?

Please run some benchmarks to validate the assumed performance characteristics.

Yeah I've thought of it, here it is:

Yes, let's go with NFCi.

clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp
56–59

I'd like to keep the interface of hasCycle and push and pop because those express the intention more clearly.

I disagree. Double lookups are always a code-smell. I insist on this.

martong updated this revision to Diff 433353.Jun 1 2022, 4:37 AM
  • Remove FIXME comment
martong added inline comments.Jun 1 2022, 4:57 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
140–146

This hunk is probably also superfluous. With tracking the assume stack below, it should be safe to call assume even from evalAssume. But, I'd rather remove that in a subsequent patch, because checkNulls behavior is dependent on that.

steakhal accepted this revision.Jun 1 2022, 7:09 AM
steakhal added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
21

Unused.

140–146

Oh, I haven't thought about that. Thanks. Feel free to do in a followup!

154

Use llvm::is_contained() instead.

clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp
60

Don't we call these 'Guards'?

This revision is now accepted and ready to land.Jun 1 2022, 7:09 AM

I've made some measurements with the diff down below. So, it seems like the chance of being a recursive call is very unlikely (roughly 1/10000). Thus I am going to update this condition with LLVM_UNLIKELY. Besides, the depth of the call stack seems to be less than 4 in the majority of cases (99.98%). However, there are edge cases (ffmpeg) when the depth is even bigger than 16. Consequently, a SmallPtrSet with 4 as the small buffer size would be better than the SmallVector.

Attached the csa-testbanch results.

Plus, attached a summarizing excel sheet about the assume stack depth (courtesy of @steakhal).

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
index f1b75a57115b..097d0ff8d162 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
@@ -153,6 +153,7 @@ protected:
     bool contains(const ProgramState *S) const {
       return llvm::find(Aux, S) != Aux.end();
     }
+    size_t size() { return Aux.size(); }

   private:
     llvm::SmallVector<const ProgramState *, 4> Aux;
diff --git a/clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp b/clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp
index 0866d3d0db74..11bbda9dd899 100644
--- a/clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp
@@ -17,10 +17,25 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState_Fwd.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h"
 #include "llvm/ADT/ScopeExit.h"
+#include "llvm/ADT/Statistic.h"

 using namespace clang;
 using namespace ento;

+#define DEBUG_TYPE "CoreEngine"
+
+STATISTIC(NumAssume, "The # of assume calls");
+STATISTIC(NumAssumeRecurse, "The # of recursive assume calls");
+STATISTIC(AssumeStackSize01, "The # of assume stack size between 0 and 1");
+STATISTIC(AssumeStackSize23, "The # of assume stack size between 2 and 3");
+STATISTIC(AssumeStackSize45, "The # of assume stack size between 4 and 5");
+STATISTIC(AssumeStackSize67, "The # of assume stack size between 6 and 7");
+STATISTIC(AssumeStackSize89, "The # of assume stack size between 8 and 9");
+STATISTIC(AssumeStackSize1011, "The # of assume stack size between 10 and 11");
+STATISTIC(AssumeStackSize1213, "The # of assume stack size between 12 and 13");
+STATISTIC(AssumeStackSize1415, "The # of assume stack size between 14 and 15");
+STATISTIC(AssumeStackSize16XX, "The # of assume stack size greater equal than 16");
+
 ConstraintManager::~ConstraintManager() = default;

 static DefinedSVal getLocFromSymbol(const ProgramStateRef &State,
@@ -53,9 +68,30 @@ ConstraintManager::assumeDualImpl(ProgramStateRef &State,
   // fixpoint.
   // We avoid infinite recursion of assume calls by checking already visited
   // States on the stack of assume function calls.
+  ++NumAssume;
+  if (AssumeStack.size() < 2)
+    ++AssumeStackSize01;
+  else if (AssumeStack.size() < 4)
+    ++AssumeStackSize23;
+  else if (AssumeStack.size() < 6)
+    ++AssumeStackSize45;
+  else if (AssumeStack.size() < 8)
+    ++AssumeStackSize67;
+  else if (AssumeStack.size() < 10)
+    ++AssumeStackSize89;
+  else if (AssumeStack.size() < 12)
+    ++AssumeStackSize1011;
+  else if (AssumeStack.size() < 14)
+    ++AssumeStackSize1213;
+  else if (AssumeStack.size() < 16)
+    ++AssumeStackSize1415;
+  else
+    ++AssumeStackSize16XX;
   const ProgramState *RawSt = State.get();
-  if (AssumeStack.contains(RawSt))
+  if (AssumeStack.contains(RawSt)) {
+    ++NumAssumeRecurse;
     return {State, State};
+  }
   AssumeStack.push(RawSt);
   auto AssumeStackBuilder =
       llvm::make_scope_exit([this]() { AssumeStack.pop(); });
martong retitled this revision from [analyzer][NFC] SimpleSValBuilder simplification: Remove superfluous workaround code and track Assume call stack rather to [analyzer] Track assume call stack to detect fixpoint.Jun 2 2022, 4:42 AM
martong edited the summary of this revision. (Show Details)
martong marked 4 inline comments as done.Jun 2 2022, 5:18 AM

Consequently, a SmallPtrSet with 4 as the small buffer size would be better than the SmallVector

Ahh, SmallVector is still better, because with the set we need to do a search in the `erase, which compares badly to a simple decrement in the vector's pop_back.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
21

Thx.

154

Ok.

clang/lib/StaticAnalyzer/Core/ConstraintManager.cpp
60

Yeah, we tend to call them guards in the context of exception safety and resource management. But, I think, "builder" is more expressive here.

martong updated this revision to Diff 433719.Jun 2 2022, 5:20 AM
martong marked 3 inline comments as done.
martong edited the summary of this revision. (Show Details)
  • Add LLVM_UNLIKELY
steakhal accepted this revision.Jun 2 2022, 6:03 AM

Looks good!

Express why this is not an NFC patch in the summary.

martong edited the summary of this revision. (Show Details)Jun 2 2022, 6:24 AM

Measurement results on the last diff looks good.

This revision was landed with ongoing or failed builds.Jun 6 2022, 11:36 PM
This revision was automatically updated to reflect the committed changes.
uabelho added a subscriber: uabelho.Jun 7 2022, 6:37 AM

New assertion failure:

llvm-project/clang/lib/StaticAnalyzer/Checkers/UndefResultChecker.cpp:72: bool isLeftShiftResultUnrepresentable(const clang::BinaryOperator *, clang::ento::CheckerContext &): Assertion `LHS && RHS && "Values unknown, inconsistent state"' failed.

Reproduction:

cat > extent.c <<EOF
void crashing(long a, _Bool b) {
  a & 1 && 0;
  b = a & 1;
  b << 1;
}
EOF
build/release/bin/clang --analyze -Xclang -analyzer-checker=core extent.c

The important part of the stack trace:

isLeftShiftResultUnrepresentable(clang::BinaryOperator const*, clang::ento::CheckerContext&) llvm-project/clang/lib/StaticAnalyzer/Checkers/UndefResultChecker.cpp:73:20
UndefResultChecker::checkPostStmt(clang::BinaryOperator const*, clang::ento::CheckerContext&) const llvm-project/clang/lib/StaticAnalyzer/Checkers/UndefResultChecker.cpp:154:18

Please @martong have a look at this.

Thanks for the report. I am looking into it.