This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Canonicalize SymIntExpr so the RHS is positive when possible
ClosedPublic

Authored by tomasz-kaminski-sonarsource on Apr 29 2022, 12:48 AM.

Details

Summary

This PR changes the SymIntExpr so the expression that uses
negative value as RHS, for example: x +/- (-N), are modeled as
as x -/+ N. This avoid producing very large RHS in case when the
symbol is cased to unsigned number, and as consequence makes the
value more robust in presence of cast.
This change is not apllied if N is lowest negative value for
which negation would not be representable.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2022, 12:48 AM
Herald added a subscriber: martong. · View Herald Transcript
tomasz-kaminski-sonarsource requested review of this revision.Apr 29 2022, 12:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2022, 12:48 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Without the changes, out of the following unsatisfiable conditions:
A) (l - 1000) > 0
B) l > 1000
C) l > 1000L
D) (l + 0L) > 1000
CSA was considering A and B to satisable. The l is (long)(x + 1 - 10U) and x is in range [10, 100] so no overflow happens.

steakhal retitled this revision from CSA Normalize SymIntExpr so the RHS is positive when possible to [analyzer] Canonicalize SymIntExpr so the RHS is positive when possible.Apr 29 2022, 3:14 AM
steakhal edited reviewers, added: martong, ASDenysPetrov, Szelethus; removed: vsavchenko.

It looks good at first glance.

For the review, it would be nice to see which clang_analyzer_dumps and reachables change with the patch.
Could you mark the ones which have different results depending on whether you apply your fix or not?
E.g put a no-warning where we previously had a report, but now we don't.

Without the changes, out of the following unsatisfiable conditions:
A) (l - 1000) > 0
B) l > 1000
C) l > 1000L
D) (l + 0L) > 1000
CSA was considering A and B to satisable. The l is (long)(x + 1 - 10U) and x is in range [10, 100] so no overflow happens.

Ah, I see. It would be still nice to see a comment in the test :D

PS: we tend to use regular expression matching for masking out the number in the symbol dump: expected-warning-re {{(reg_${{[0-9]+}}<int x>) - 1}}

steakhal added inline comments.Apr 29 2022, 4:27 AM
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
204

I would rather swap these branches though, to leave the default case (aka. this) to the end.

212–219

Somehow I miss a check for signedness here.
Why do you think it would be only triggered for signed types?

I have a guess, that since we already handled x +-0, SymIntExprs like x - (-0) cannot exist here, thus cannot trigger this condition spuriously. I cannot think of any ther example that could cause this misbehaving. So in that sense ConvertedRHSValue > resultIntTy.getMinValue() implies *at this place* that ConvertedRHSValue.isSigned().
I would rather see this redundant check here to make the correctness reasoning local though.

658–664

It feels like this is a completely different problem compared to the simplification of X - (-N) to X + N.
If this is the case, I would rather split that part into a separate patch instead.

clang/test/Analysis/additive-op-on-sym-int-expr.c
2

You assume that int has 8 bytes and that 1 byte is 8 bits. Pin the target triple.

12
36–37

You could simply assign the right value directly to i AFAIK.

This avoid producing very large RHS in case when the symbol is cased to unsigned number, and as consequence makes the value more robust in presence of cast.

Could you please elaborate on this? I understand that you'd like to simplify certain binary operations by merging the RHS and the operand, however, I don't see what are the empirical problems that this patch supposed to fix. E.g. did you encounter a crash without the fix, or was it the mentioned infeasible state ((l - 1000) > 0) that caused a false positive? If that is the case, I believe the huge cast patch is going to solve it. Could you please check if those infeasible cases are solved by the mentioned D103096 (you have to set support-symbolic-integer-casts=true) ?

clang/test/Analysis/additive-op-on-sym-int-expr.c
19

And same below for the other infeasible cases.

tomasz-kaminski-sonarsource marked 2 inline comments as done.Apr 29 2022, 7:13 AM

Could you please elaborate on this? I understand that you'd like to simplify certain binary operations by merging the RHS and the operand, however, I don't see what are the empirical problems that this patch supposed to fix. E.g. did you encounter a crash without the fix, or was it the mentioned infeasible state ((l - 1000) > 0) that caused a false positive?

Yes, the patch is aiming to fix false positives raised from unfeasible paths, that were guarded by conditions that encountered this problem.

If that is the case, I believe the huge cast patch is going to solve it. Could you please check if those infeasible cases are solved by the mentioned D103096 (you have to set support-symbolic-integer-casts=true) ?

Indeed all false positives that I am fixing with this patch would be also fixed by proper modeling of the cast of the integers.
However this patch:

  • does not in any way affect the correctness of the symbol values, or the upcoming patch
  • fixes them even without that support-symbolic-integer-casts enabled, and with nearly zero performance impact (two additional comparisons)
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
204

I folded the RHS.isNegative() into the if for the BinaryOperator::isAssociative(op), as same conversion is performed in final else branch.

212–219

The integer representation does not have negative zeros (the standard and clang assume two's complement). However, this condition does need to check for the signedness of the types. What I mean is that if the RHS is negative, but ConvertedRHSValue the branch will trigger and we will change x - INT_MIN to x + (INT_MAX + 1)U which is ok, as a negation of INT_MIN is representable as an unsigned type of same or lager bit with.

212–219

However, I was not able to reach this point with RHS being signed, and resultTy being unsigned. Any hints how this could be done?

658–664

Without the change, this code would produce X -(-N) cases, that would be detected for signed numbers. This would be handled for signed integers later by the simplification but would lead to double conversion. However, if IntType is is unsigned and first < last is true, then we will produce x + extremely large number, which would not be later simplified to X - (last - first).

In other words, without applying this change, the normalization of x -(-N) would provide no benefit. And only this change without the initialial normalization would fail in case of:

i = x + (-1);
l = i - 10U;

Uploading updated diff with suggested fixes.

steakhal added inline comments.May 2 2022, 5:18 AM
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
204

I think what confused me is that a different API is used for doing the conversion.

  • resultIntTy.convert(RHS)
  • &BasicVals.Convert(resultTy, RHS)

Anyway, leave it as-is.

212–219

I'm not saying that I can follow this thought process. But the clang/test/Analysis/PR49642.c would trigger an assertion like this:

diff --git a/clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp b/clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
index 088c33c8e612..7e59309228e1 100644
--- a/clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
+++ b/clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
@@ -207,6 +207,16 @@ SVal SimpleSValBuilder::MakeSymIntVal(const SymExpr *LHS,
            "number of bits as its operands.");
 
     llvm::APSInt ConvertedRHSValue = resultIntTy.convert(RHS);
+    if (RHS.isSigned() && resultTy->isUnsignedIntegerOrEnumerationType()) {
+      llvm::errs() << "LHS sym:\n";
+      LHS->dump();
+      llvm::errs() << "RHS integral:\n";
+      RHS.dump();
+      llvm::errs() << "OP: " << BinaryOperator::getOpcodeStr(op) << "\n";
+      llvm::errs() << "result type:\n";
+      resultTy->dump();
+      llvm_unreachable("how is it possible??");
+    }
     // Check if the negation of the RHS is representable,
     // i.e., the resultTy is signed, and it is not the lowest
     // representable negative value.

Which can be reduced into this one:

// RUN: %clang_analyze_cc1 -Wno-implicit-function-declaration -w -verify %s \
// RUN:   -analyzer-checker=core \
// RUN:   -analyzer-checker=apiModeling.StdCLibraryFunctions

// expected-no-diagnostics

typedef int ssize_t;
int write(int, const void *, unsigned long);
unsigned c;
void a() {
  int b = write(0, 0, c);
  b != 0;
  c -= b;
  b < 1;
  ++c; // crash simplifySValOnce: derived_$4{conj_$1{int, LC1, S700, #1},c}  op(-)  APInt(32b, 4294967295u -1s)  :: unsigned int
}
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
204

As a note, the use of different APIs was intentional. The BasicVals one is persisting the value, so it is safe to use ptr to it, as a consequence it is more costfull. So, I am delaying its use until I know I will need to persist the value.

212–219

What I mean, is that performing normalization (op and sign switch) is always correct for the unsigned resultInTy, even if RHS is the lowest representable negative number. The code is already behaving correctly in that case (I have verified your example), as the ConvertedRHSValue > resultIntTy.getMinValue() is always passing in a situation when resultIntTy.isUnsigned() is true (zero was eliminated before), so I left simple check.

But, now I see that this is confusing, so I have updated the check to be more explicit and updated the comment.

Just a few nits left.
Consider marking 'done' the corresponding boxes.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
211

It feels odd that the comments refer to resultTy, but you then use resultIntTy instead, and both of these entities are alive at this scope.

215

These should be the same.

tomasz-kaminski-sonarsource marked 7 inline comments as done.May 5 2022, 1:25 AM
tomasz-kaminski-sonarsource marked 2 inline comments as done.
steakhal accepted this revision.May 5 2022, 4:52 AM

Seems right. Thanks.

This revision is now accepted and ready to land.May 5 2022, 4:52 AM

I do not have commit rights. Would it be possible to you to commit the changes on my behalf?

steakhal reopened this revision.May 6 2022, 12:37 AM

This patch triggers a crash with this minimized example.
assertion at L205: "The result operation type must have at least the same number of bits as its operands."

// build/bin/clang -cc1 -triple x86_64-unknown-linux-gnu -analyze -analyzer-checker=core,apiModeling repro.c

typedef long ssize_t;
ssize_t write(int, const void *, unsigned long);

int crash(int x, int fd) {
  unsigned wres = write(fd, "a", 1);
  if (wres) {}
  int t1 = x - wres;
  if (wres < 0) {}
  return x + t1;
}

The debugger showed these values:

op: BO_Sub
resultTy: unsigned int
resultIntTy.isUnsigned(): true
Sym: reg<int x>
RHSValue: APInt(64b, 18446744073709551615u -1s)
resultIntTy.getBitWidth(): 32
RHS.getBitWidth(): 64

Please investigate this @tomasz-kaminski-sonarsource

This revision is now accepted and ready to land.May 6 2022, 12:37 AM
uabelho added a subscriber: uabelho.May 6 2022, 1:45 AM

This patch triggers a crash with this minimized example.
assertion at L205: "The result operation type must have at least the same number of bits as its operands."
[...]
Please investigate this @tomasz-kaminski-sonarsource

Hi,

I've stumbled on this crash as well with this patch.
So

clang --analyze foo.c

crashes with

clang: ../../clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:207: clang::ento::SVal (anonymous namespace)::SimpleSValBuilder::MakeSymIntVal(const clang::ento::SymExpr *, BinaryOperator::Opcode, const llvm::APSInt &, clang::QualType): Assertion `resultIntTy.getBitWidth() >= RHS.getBitWidth() && "The result operation type must have at least the same " "number of bits as its operands."' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /repo/uabelho/master-github/llvm/build-all/bin/clang --analyze foo.c
1.	<eof> parser at end of file
2.	While analyzing stack: 
	#0 Calling h
3.	foo.c:9:9: Error evaluating statement
4.	foo.c:9:9: Error evaluating statement
 #0 0x0000000002e4fbf3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x2e4fbf3)
 #1 0x0000000002e4d86e llvm::sys::RunSignalHandlers() (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x2e4d86e)
 #2 0x0000000002e4ef92 llvm::sys::CleanupOnSignal(unsigned long) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x2e4ef92)
 #3 0x0000000002db4e03 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) CrashRecoveryContext.cpp:0:0
 #4 0x0000000002db4fbd CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #5 0x00007fd21e2ce630 __restore_rt sigaction.c:0:0
 #6 0x00007fd21ba15387 raise (/lib64/libc.so.6+0x36387)
 #7 0x00007fd21ba16a78 abort (/lib64/libc.so.6+0x37a78)
 #8 0x00007fd21ba0e1a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6)
 #9 0x00007fd21ba0e252 (/lib64/libc.so.6+0x2f252)
#10 0x000000000467e0bb (anonymous namespace)::SimpleSValBuilder::MakeSymIntVal(clang::ento::SymExpr const*, clang::BinaryOperatorKind, llvm::APSInt const&, clang::QualType) SimpleSValBuilder.cpp:0:0
#11 0x000000000467a94f (anonymous namespace)::SimpleSValBuilder::evalBinOpNN(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::BinaryOperatorKind, clang::ento::NonLoc, clang::ento::NonLoc, clang::QualType) SimpleSValBuilder.cpp:0:0
#12 0x0000000004688f46 clang::ento::SValBuilder::evalBinOp(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::BinaryOperatorKind, clang::ento::SVal, clang::ento::SVal, clang::QualType) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x4688f46)
#13 0x000000000467f28e clang::ento::SymExprVisitor<(anonymous namespace)::SimpleSValBuilder::simplifySValOnce(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::ento::SVal)::Simplifier, clang::ento::SVal>::Visit(clang::ento::SymExpr const*) SimpleSValBuilder.cpp:0:0
#14 0x000000000467eccc (anonymous namespace)::SimpleSValBuilder::simplifySValOnce(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::ento::SVal) SimpleSValBuilder.cpp:0:0
#15 0x000000000467d814 (anonymous namespace)::SimpleSValBuilder::simplifySVal(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::ento::SVal) SimpleSValBuilder.cpp:0:0
#16 0x0000000004679ec3 (anonymous namespace)::SimpleSValBuilder::evalBinOpNN(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::BinaryOperatorKind, clang::ento::NonLoc, clang::ento::NonLoc, clang::QualType) SimpleSValBuilder.cpp:0:0
#17 0x0000000004688f46 clang::ento::SValBuilder::evalBinOp(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::BinaryOperatorKind, clang::ento::SVal, clang::ento::SVal, clang::QualType) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x4688f46)
#18 0x00000000045f509c clang::ento::ExprEngine::VisitBinaryOperator(clang::BinaryOperator const*, clang::ento::ExplodedNode*, clang::ento::ExplodedNodeSet&) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x45f509c)
#19 0x00000000045e201a clang::ento::ExprEngine::Visit(clang::Stmt const*, clang::ento::ExplodedNode*, clang::ento::ExplodedNodeSet&) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x45e201a)
#20 0x00000000045dd27e clang::ento::ExprEngine::ProcessStmt(clang::Stmt const*, clang::ento::ExplodedNode*) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x45dd27e)
#21 0x00000000045dcf99 clang::ento::ExprEngine::processCFGElement(clang::CFGElement, clang::ento::ExplodedNode*, unsigned int, clang::ento::NodeBuilderContext*) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x45dcf99)
#22 0x00000000045c5300 clang::ento::CoreEngine::HandlePostStmt(clang::CFGBlock const*, unsigned int, clang::ento::ExplodedNode*) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x45c5300)
#23 0x00000000045c437a clang::ento::CoreEngine::ExecuteWorkList(clang::LocationContext const*, unsigned int, llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x45c437a)
#24 0x00000000041f10dc (anonymous namespace)::AnalysisConsumer::HandleCode(clang::Decl*, unsigned int, clang::ento::ExprEngine::InliningModes, llvm::DenseSet<clang::Decl const*, llvm::DenseMapInfo<clang::Decl const*, void> >*) AnalysisConsumer.cpp:0:0
#25 0x00000000041d3e6a (anonymous namespace)::AnalysisConsumer::HandleTranslationUnit(clang::ASTContext&) AnalysisConsumer.cpp:0:0
#26 0x00000000046c5f03 clang::ParseAST(clang::Sema&, bool, bool) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x46c5f03)
#27 0x00000000038748a6 clang::FrontendAction::Execute() (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x38748a6)
#28 0x00000000037e8fc4 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x37e8fc4)
#29 0x0000000003932562 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x3932562)
#30 0x0000000000a0feac cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0xa0feac)
#31 0x0000000000a0dccf ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) driver.cpp:0:0
#32 0x00000000036577e2 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const::$_1>(long) Job.cpp:0:0
#33 0x0000000002db4d21 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x2db4d21)
#34 0x00000000036571d8 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x36571d8)
#35 0x000000000361abc6 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x361abc6)
#36 0x000000000361b11a clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) const (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x361b11a)
#37 0x0000000003637778 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) (/repo/uabelho/master-github/llvm/build-all/bin/clang+0x3637778)
#38 0x0000000000a0d571 main (/repo/uabelho/master-github/llvm/build-all/bin/clang+0xa0d571)
#39 0x00007fd21ba01555 __libc_start_main (/lib64/libc.so.6+0x22555)
#40 0x0000000000a0a90c _start (/repo/uabelho/master-github/llvm/build-all/bin/clang+0xa0a90c)

with foo.c being

long a;
char c;
long(d)(long f, long g) { return g ? f : 0; }
void h() {
  for (;;) {
    long *e = h;
    d(-5L, *e) == a;
    char b = a;
    c = b + c;
  }
}

Thank you for reverting the change.
I am looking into the crashes. The simplest solution is to not apply normalization in such a situation. But, want to understand better how such values are produced in the first case, and if performing normalization would make sense in such case.

I have removed the assertions and updated the code to handle both extensions of reductions of bitwith from RHS to ResultType.
Expanded test, to cover the above scenarios.

@uabelho Could you please apply this change and check if it resolves your crashes?
I'm gonna do the same on our side.

Added missing casts in truncatingToUnsigned tests.

The patch doesn't apply to me.

Updated patch to match master after revert.

Updated patch to match master after revert.

It's still messed up. Please double-check.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
117

else after return is unnecessary.

Not sure if I'm doing something wrong but I can't apply this patch on current top of tree (a4190037fac0)

steakhal reopened this revision.May 9 2022, 1:37 AM

When I download the "raw diff" using the Download Raw Diff button on this page, then I try to git apply it still fails. (error 219: new blank line at EOF.)
Anyway, I managed to apply your path by hand, but now some tests fail.

This revision is now accepted and ready to land.May 9 2022, 1:37 AM
steakhal requested changes to this revision.May 9 2022, 1:37 AM
This revision now requires changes to proceed.May 9 2022, 1:37 AM

It looks like I got change in clang/test/Analysis/expr-inspection.c while updating the patch. :/
Now rebasing, rebuilding llvm.

Firstly, my apologies for the back and forth. My repo was in some strange state, and I wasn't picking up all the changes that got reverted in my diff (this is why the patch was not applying and the test failed later). Uploading a clean diff.

This time I've checked this on our test set, and it seems to run without any issues. I'm still waiting for the results of analyzing LLVM, but I think safe to say that this should work.
Could you please give this another shoot at this @uabelho?

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
107–122

Please follow the naming conventions of LLVM, so variables are UpperCase, and the & sign binds to the variable name.
https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly
I'm not sure why clang-format did not complain about these.

Also, taking a parameter by-const-value is discouraged, since it has no visible effect.

[...]

Could you please give this another shoot at this @uabelho?

Yes, the case that crashed for me works with the updated patch. Haven't done other testing.

Changed variable name and formatting to match coding style.

steakhal accepted this revision.May 11 2022, 2:41 AM

Everything passed this time. Thanks for the patience.

This revision is now accepted and ready to land.May 11 2022, 2:41 AM

Thank you for additional checks. Unfortunatelly, I am not able to perform commits for this and next wee (up to 20th of May), so I wonder if you could commit this one for me.

This revision was landed with ongoing or failed builds.May 12 2022, 6:40 AM
This revision was automatically updated to reflect the committed changes.