With this patch, the solver can infer results for not equal to operator
over Ranges as well. This also fixes the issue of comparison between
different types, by first converting the RangeSets to resulting type,
which then can be used for comparisons.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Please also demonstrate that the patch can deal with sign mismatching ranges.
Aside from that, it looks clean and sweet.
Although, I still miss the point of how it would bypass anything.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1622 | I would probably host this logic into a Range::intersects(Range Other) const member function. | |
1626–1631 | Technically, it's correct, but I would rather follow a similar logic to what the comment would suggest. |
It seems like the test does not exercise the modified code segment.
Please investigate, and make sure that the code you submit is actually exercised by the test you provide.
I have made few changes:
- The failed assertions due to comparison between different types have been fixed by converting all the Ranges to a given type. This will allow comparisons without throwing errors.
- There was also a build error due to explicit specialization in non-namespace scope. This was fixed by @martong previously, but that patch led to the above mentioned bug. So I used @martong 's patch here to make GCC happy.
- I have added a small check for comparison between different types.
https://reviews.llvm.org/D106102 differential contains the background story.
I'm not quite happy.
Next time please make sure that the tests you provide actually exercise the changed code segments.
I deleted all the old lines of the constant-folding.c test file and placed a llvm_unreachable() into SymbolicRangeInferrer::VisitBinaryOperator<BO_NE>() and the test did not cause a crash, which is a clear indication that the test does not test the modified code segments.
Unless I'm overly anxious I would draw false conclusions about the results of the tests and if I approve we could get crashes and complaints, and reverts.
I would kindly ask you to double-check your tests prior to submitting them to review.
About the technical content of this patch.
I would recommend following the same pattern as in the rest of the functions. So, fillGaps() then convert(). That way you would reduce the time-complexity of your code to O(1) from O(N).
I would like to ask for a coverage report that shows that all of the branches of the modified code have been exercised by the constant-folding.c test file.
That being said, I would like to thank you for your effort in fixing this, I'm really looking forward!
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1661–1662 | This and the subsequent similar block could be implemented by a member function of a RangeSet. | |
1668–1670 | ConvertedLHS.getConcreteValue() && ConvertedLHS.getConcreteValue() == ConvertedRHS.getConcreteValue() | |
clang/test/Analysis/constant-folding.c | ||
326 | I would assume from [x, y] that x <= y. |
Fix test cases to make them reachable via VisiBinaryOperator<BO_NE> and using getConcreteValue()
@steakhal apologies for holding onto this for so long. I managed to fix previously untestable test cases. The issue was that I was building expressions as (u1 != u2) != 0 and the solver was canonicalizing this to an equivalent BO_EQ expression. That's why, it wasn't reaching VisitBinaryOperator<BO_NE>(). So I changed all tests as following: (u1 != u2) (removing the latter != 0 part. Also, I utilized the getConcreteValue() for checking of single APSInt in the RangeSets.
Apart from this, I am building for the coverage reports (which I will upload later).
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1661–1662 | I agree. I will try to find similarities which can be extracted from remaining binops and put them I a member function. |
Here are my remarks.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1628–1629 | Avoid filling the gaps, because it's completely possible to compare all the ranges. | |
1633–1634 | The ResultType of BO_NE is bool. You don't need to convert your integer ranges to boolean. Otherwise, you'll lose the information to compare. |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1643–1646 | You can simply check an intersection (RangeSet::Factory::intersect) between the ranges. | |
1650–1654 | This is OK but check ConvertedCoarseRHS->getConcreteValue() for nullptr before getting the value. | |
clang/test/Analysis/constant-folding.c | ||
339 | I'd like to see the cases with concrete integers as well. |
The coverage showing unreachability of VisitBinaryOperator<BO_NE> for concrete integer cases.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1628–1629 | If we don't fill gaps, then we will be making this method O(n) instead of O(1) as of now. As I see it, filling RHS (and not filling LHS), it can iterate over all ranges in LHS, and check each range's intersection with CoarseRHS. Before, I do this, I just wanted to point out the unreachability of concrete cases to this method. I have tried to explain this below. | |
clang/test/Analysis/constant-folding.c | ||
339 | I have checked the coverage reports which show that concrete values are not reaching VisitBinaryOperator<BO_NE>. This is due to them being trivially inferred in RangedConstraintManager.cpp. I attached the coverage for RangeConstraintManager.cpp. I think that I should remove the handling of concrete APSInt all together from SymbolicRangeInferrer for above reason. What do you guys think? |
I'm sorry but it seems like I brought you a new work :-) I've just loaded these two patches (D129678 and D129498) and now you have to rebase your changes. But there is a good news as well. You will be able to use clang_analyzer_value function for your tests if needed.
I appreciate your efforts but I should tell that I'm currently working on the thing that should resolve the issue you are trying to solve D103096.
The coverage showing unreachability of VisitBinaryOperator<BO_NE> for concrete integer cases.
Maybe it's better to remove that unreachable part but leave the tests for concrete ints just to verify that all the cases are covered.
Also I expect to see test case for short-ushort, char-uchar pairs, because if it would turn out that the int-uint is only pair that we handle, the patch would be disappointing, unfortunately.
Remove filling gaps and convert, use castTo, and add tests for short-ushort,
char-uchar pairs
Considering @ASDenysPetrov 's example of LHS = [1, 2] U [8, 9] and RHS = [5, 6], I constructed a test case as following:
(((u1 >= 1 && u1 <= 2) || (u1 >= 8 && u1 <= 9)) && u2 >= 5 && u2 <= 6)
but I can see that the analyzer is bifurcating paths at the OR operator. Essentially, there are two diverged paths:
- 1 <= u1 && u1 <= 2 && 5 <= u2 && u2 <= 6
- 8 <= u1 && u1 <= 9 && 5 <= u2 && u2 <= 6
Separately, they hit VisitBinaryOperator<BO_NE> and in both cases, we get TRUE for (u1 != u2).
Is there any other way to formulate the expression so that it constructs LHS = [1, 2] U [8, 9] and doesn't bifurcate?
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1633–1634 | I have utilized castTo method for conversion of different types so that they can be compared/intersected. | |
1650–1654 | I have removed the concrete case handling as it was unreachable. | |
clang/test/Analysis/constant-folding.c | ||
289 | @ASDenysPetrov I have used your example, but I realized that the path bifurcates and sizeof LHS RangeSet is always 1 inside VisitBinaryOperator<BO_NE>. |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1661–1662 | This comment is about finding intersecting Ranges in ConvertedLHS/ConvertedRHS which was previously done by comparing getMinValue/getMaxValue. Now, we use intersect family of methods for it. |
@manas, constrain into [8, 9] first and then pop out each intermediate element. This should work:
void clang_analyzer_eval(bool); template <typename T> void clang_analyzer_value(T x); extern void abort() __attribute__((__noreturn__)); #define assert(expr) ((expr) ? (void)(0) : abort()) void testDisequalityRules(unsigned int u1, unsigned int u2, unsigned int u3, int s1, int s2, int s3, unsigned char uch, signed char sch, short ssh, unsigned short ush) { assert(1 <= u1 && u1 <= 9); assert(u1 != 3); assert(u1 != 4); assert(u1 != 5); assert(u1 != 6); assert(u1 != 7); clang_analyzer_value(u1); // expected-warning{{32u:{ [1, 2], [8, 9] }}} assert(5 <= u2 && u2 <= 6); clang_analyzer_value(u2); // expected-warning{{32u:{ [5, 6] }}} clang_analyzer_eval(u1 != u2); // expected-warning{{TRUE}} }
Try this u1 > 0 && u1 < 10 && u1 != 3 && u1 != 4 && u1 != 5 && u1 != 6 && u1 != 7 && u2 > 4 && u2 < 7. This is a bit verbose but it will avoid bifurcation.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1624–1626 | Syntax: We usually don't use braces for single-line statements. | |
1628–1629 | Here I'd rather get a correct result and wait a bit then contra-versa. | |
1630–1631 | And again. This is incorrect to cast your un/signeds to bool. Here you should emulate the behavior of C++ abstract machine, hence cast both to the biggest type or unsigned one. | |
1633–1636 | castTo won't produce you empty RangeSets unless the originals were already empty, but we checked for this previously. So, you don't need this check. | |
clang/test/Analysis/constant-folding.c | ||
289 | Just do as I advised in the main comment. |
Thanks @martong and @ASDenysPetrov for the feedback. I utilized (u != n) to disallow bifurcation in test cases.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1630–1631 | So I figured out that according to C standard an expression of x != y has int as resulting type, while C++ standard makes it a bool. And as test file constant-folding.c contains tests for C, I was unintentionally casting RangeSets to int, and that is why those tests were passing. Do you think we should add a C++ test file, similar to constant-folding.c in the suite? | |
1630–1631 | Apart from the tests, consider the example of LHS (8-bit, signed) and RHS (16-bit, unsigned). LHS = [-2, -1] and RHS = [128, 129] which ideally should produce LHS != RHS -> true but after casting smaller signed type to bigger unsigned type(CastedLHS = [128,129]), we may have incorrect information. Should we not cast to bigger signed type instead of bigger unsigned type? Also, I think for cases where smaller bitwidth is signed(say type T1), and bigger bitwidth is unsigned(say type T2), we should "bisect" smaller signed type rangeset into following rangesets: bisect(LHS) => [LHS.MinValue, LHS.MaxNegativeValueLessThanZero] U [LHS.MinPositiveValueGreaterThanEqualToZero, LHS.MaxValue] and we can show that, first RangeSet of above can never have an intersection with RHS(unsigned) type, so we only care about the second RangeSet in bisect(LHS). |
Analysis/constant-folding.c seems to fail.
Please run the check-clang-analysis build target to see what fails and investigate it.
@steakhal thank you for reviewing this! I investigated about the failing tests.
// s1: [-3, -1], u1: [UINT_MAX - 3, UINT_MAX - 2] clang_analyzer_eval(u1 != s1); // expected-warning{{TRUE}} # Line: 312 // uch: [2, CHAR_MAX], sch: [SCHAR_MIN, 0] clang_analyzer_eval(uch != sch); // expected-warning{{TRUE}} # Line: 406 // ush: [2, USHRT_MAX], ssh: [SHRT_MIN, 0] clang_analyzer_eval(ush != ssh); // expected-warning{{TRUE}} # Line: 422
Above tests are failing.
Previously, it was discussed that a good strategy is to "cast both [LHS and RHS] to the biggest type or unsigned one."
And for example, in the first failing test case, casting both rangesets,
s1 = [-3,-1] -> [UINT_MAX-2, UINT_MAX] and u1 = [UINT_MAX-3, UINT_MAX-2] ->(unchanged) [UINT_MAX-3, UINT_MAX-2]
UINT_MAX -2 is overlapping in both RangeSets.
Casting signed types to unsigned ones can leave us with overlapping values as shown above. Essentially, these tests were wrongly written. So, I am correcting these tests accordingly.
I haven't looked at the implementation. I only checked the tests and something is not quite right there. See my comments inline.
BTW the line-coverage is good. I found only two branches which I want to have tests for - see inline.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1625 | This branch is uncovered by tests. | |
1634 | Although the RHS.isUnsigned() is covered 2 times in the check-clang-analysistarget, I think it would make sense to have a dedicated test for this case in the constant-folding.c. | |
clang/test/Analysis/constant-folding.c | ||
289–301 | These two hunks seem to be the same. We should probably keep one. | |
315–319 | Clang thinks it depends on the operands if u1 != s1 is true. | |
321–325 | Clang thinks it depends on the operands if u1 != s1 is true. | |
328–332 | Clang thinks it depends on the operands if u1 != u2 is true. | |
404–407 | Shouldn't it print TRUE? | |
420–423 | It should return TRUE. https://godbolt.org/z/44Pax16d1 |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1625 | (Un)fortunately, empty rangesets don't go into VisitBinaryOperator at all. They are pre-checked and handled accordingly. I don't see any reasons not to remove this branch altogether. | |
clang/test/Analysis/constant-folding.c | ||
289–301 | My bad. Removed one hunk. clang is unable to optimize but gcc does (https://godbolt.org/z/8TaaYa4M9). Is it something the llvm optimizer has not been fullfilling for now? Because to me, it looks fine as LHS and RHS cannot have any APSInt in common, so they will always be unequal in all possible scenarios. | |
315–319 | clang is unable to optimize while gcc can. https://godbolt.org/z/zoe8ddqh4 | |
321–325 | clang can't optimize but gcc can. https://godbolt.org/z/hnahWPacr | |
404–407 | Yes, it ideally should be true. But current implementation is unable to figure that out. I will try to solve it differently. The unknown cases are like this: bigger(or equal) bitwidth(say LHS) is unsigned and smaller(or equal) bitwidth(RHS) is signed, now if we cast RHS to LHS type, then all negative values in RHS will overlap with positive values of LHS, leading us to believe some values may overlap. In this example, uch: [2, CHAR_MAX], sch: [SCHAR_MIN, 0], when we cast sch (signed) to unsigned char type, then we get overlapping values between casted_sch and uch. |
For the test cases where we should be able to prove the case but return Unknown instead, should be marked by a FIXME stating the expected value.
Approximating a concrete value with Unknown is (almost) always correct. So, in this case, you don't need to go and fix them unless you want to do the extra mile.
The patch looks great!
And I think it's correct, but integral promotions and standard conversions are always tricky.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1625 | Let's replace it with an assertion then. | |
1628 | ||
1639–1640 | ||
clang/test/Analysis/constant-folding.c | ||
289–301 | Indeed! My bad. IDK why clang misses this. We should either ask about it on Discord (optimizations channel) or simply file a ticket asking about this. Feel free to ask them. | |
315–319 | Right! | |
321–325 | Right! | |
328–332 | Indeed correct. |
I have added FIXME in constant-folding.c test file for now. I will fix it in a new review.
I have also opened a Discourse thread on the missing optimization opportunity.
Hi,
The following starts crashing with this patch:
clang -cc1 -analyze -analyzer-checker=core bbi-77010.c
It crashes with
bbi-77010.c:6:1: warning: non-void function does not return a value [-Wreturn-type] } ^ clang: ../../clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:1622: clang::ento::RangeSet (anonymous namespace)::SymbolicRangeInferrer::VisitBinaryOperator(clang::ento::RangeSet, clang::ento::RangeSet, clang::QualType): Assertion `!LHS.isEmpty() && !RHS.isEmpty() && "Both ranges should be non-empty"' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: ../../main-github/llvm/build-all/bin/clang -cc1 -analyze -analyzer-checker=core bbi-77010.c 1. <eof> parser at end of file 2. While analyzing stack: #0 Calling g 3. bbi-77010.c:13:12: Error evaluating statement 4. bbi-77010.c:13:12: Error evaluating statement #0 0x0000000002fddbb3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (../../main-github/llvm/build-all/bin/clang+0x2fddbb3) #1 0x0000000002fdb8de llvm::sys::RunSignalHandlers() (../../main-github/llvm/build-all/bin/clang+0x2fdb8de) #2 0x0000000002fddf36 SignalHandler(int) Signals.cpp:0:0 #3 0x00007fe701eb8630 __restore_rt sigaction.c:0:0 #4 0x00007fe6ff5ff387 raise (/lib64/libc.so.6+0x36387) #5 0x00007fe6ff600a78 abort (/lib64/libc.so.6+0x37a78) #6 0x00007fe6ff5f81a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6) #7 0x00007fe6ff5f8252 (/lib64/libc.so.6+0x2f252) #8 0x00000000049caed2 (anonymous namespace)::SymbolicRangeInferrer::VisitBinaryOperator(clang::ento::RangeSet, clang::BinaryOperatorKind, clang::ento::RangeSet, clang::QualType) RangeConstraintManager.cpp:0:0 #9 0x00000000049c9867 (anonymous namespace)::SymbolicRangeInferrer::infer(clang::ento::SymExpr const*) RangeConstraintManager.cpp:0:0 #10 0x00000000049bebf5 (anonymous namespace)::RangeConstraintManager::assumeSymNE(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::ento::SymExpr const*, llvm::APSInt const&, llvm::APSInt const&) RangeConstraintManager.cpp:0:0 #11 0x00000000049d368c clang::ento::RangedConstraintManager::assumeSymUnsupported(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::ento::SymExpr const*, bool) (../../main-github/llvm/build-all/bin/clang+0x49d368c) #12 0x00000000049f0b09 clang::ento::SimpleConstraintManager::assumeAux(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::ento::NonLoc, bool) (../../main-github/llvm/build-all/bin/clang+0x49f0b09) #13 0x00000000049f096a clang::ento::SimpleConstraintManager::assume(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::ento::NonLoc, bool) (../../main-github/llvm/build-all/bin/clang+0x49f096a) #14 0x00000000049f086d clang::ento::SimpleConstraintManager::assumeInternal(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::ento::DefinedSVal, bool) (../../main-github/llvm/build-all/bin/clang+0x49f086d) #15 0x000000000492d3e3 clang::ento::ConstraintManager::assumeDual(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::ento::DefinedSVal) (../../main-github/llvm/build-all/bin/clang+0x492d3e3) #16 0x0000000004955b6d clang::ento::ExprEngine::evalEagerlyAssumeBinOpBifurcation(clang::ento::ExplodedNodeSet&, clang::ento::ExplodedNodeSet&, clang::Expr const*) (../../main-github/llvm/build-all/bin/clang+0x4955b6d) #17 0x00000000049514b6 clang::ento::ExprEngine::Visit(clang::Stmt const*, clang::ento::ExplodedNode*, clang::ento::ExplodedNodeSet&) (../../main-github/llvm/build-all/bin/clang+0x49514b6) #18 0x000000000494c73e clang::ento::ExprEngine::ProcessStmt(clang::Stmt const*, clang::ento::ExplodedNode*) (../../main-github/llvm/build-all/bin/clang+0x494c73e) #19 0x000000000494c459 clang::ento::ExprEngine::processCFGElement(clang::CFGElement, clang::ento::ExplodedNode*, unsigned int, clang::ento::NodeBuilderContext*) (../../main-github/llvm/build-all/bin/clang+0x494c459) #20 0x000000000492f3d0 clang::ento::CoreEngine::HandlePostStmt(clang::CFGBlock const*, unsigned int, clang::ento::ExplodedNode*) (../../main-github/llvm/build-all/bin/clang+0x492f3d0) #21 0x000000000492e1f6 clang::ento::CoreEngine::ExecuteWorkList(clang::LocationContext const*, unsigned int, llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>) (../../main-github/llvm/build-all/bin/clang+0x492e1f6) #22 0x000000000454d931 (anonymous namespace)::AnalysisConsumer::HandleCode(clang::Decl*, unsigned int, clang::ento::ExprEngine::InliningModes, llvm::DenseSet<clang::Decl const*, llvm::DenseMapInfo<clang::Decl const*, void>>*) AnalysisConsumer.cpp:0:0 #23 0x000000000453034e (anonymous namespace)::AnalysisConsumer::HandleTranslationUnit(clang::ASTContext&) AnalysisConsumer.cpp:0:0 #24 0x0000000004a441a5 clang::ParseAST(clang::Sema&, bool, bool) (../../main-github/llvm/build-all/bin/clang+0x4a441a5) #25 0x0000000003ac97f6 clang::FrontendAction::Execute() (../../main-github/llvm/build-all/bin/clang+0x3ac97f6) #26 0x0000000003a3b8a4 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (../../main-github/llvm/build-all/bin/clang+0x3a3b8a4) #27 0x0000000003b8b102 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (../../main-github/llvm/build-all/bin/clang+0x3b8b102) #28 0x00000000009f8516 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (../../main-github/llvm/build-all/bin/clang+0x9f8516) #29 0x00000000009f53b0 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) driver.cpp:0:0 #30 0x00000000009f4e66 clang_main(int, char**) (../../main-github/llvm/build-all/bin/clang+0x9f4e66) #31 0x00007fe6ff5eb555 __libc_start_main (/lib64/libc.so.6+0x22555) #32 0x00000000009f0fbb _start (../../main-github/llvm/build-all/bin/clang+0x9f0fbb) Abort (core dumped)
Thanks for the report. I'll fix it ASAP.
I think I'll replace the assertion with an early return.
BTW, was this uncovered by fuzzing? @uabelho
I would probably host this logic into a Range::intersects(Range Other) const member function.