Substitute explicit comparing to zero with more appropriate ProgramState::assume() in CStringChecke::assumeZero().
Details
Diff Detail
Event Timeline
Seems okay to me, but please wait for other reviewers please. And I have some questions.
Could you please add a reproducer lit test file? E.g. maybe from the bug you mentioned.
Did the crash happen only when driven from clang-tidy?
@martong
File to reproduce is here:
I have not made a test for it yet. If it needs I will.
The crash reproduces not only on clang-tidy, but by simple envoke clang --analyze t37503.cpp
Yes, every commit needs a test, or some other way of preventing regressions (eg., a stronger assert that'd fail on existing tests), assuming it changes the behavior (i.e., not pure refactoring).
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp | ||
---|---|---|
286–288 | This suppresses the assertion but it's preferable to find out why it fails to begin with. Why are we even trying to compare a NonLoc to a null pointer (or a Loc to a numeric null)? Is somebody misusing the API by feeding us with an incorrect SVal or an incorrect type? |
For now I am a bit far from understanding what we shall do with the results of assume and what is the idea of using it and in which cases it is appropriate.
Could you briefly explain me this?
Another question is why we can't compare NonLoc and Loc which potentially can have some values (e.g. such ambiguous nonloc::ConcreteInt and loc::ConcreteInt).
Maybe this will push me to some other fix.
assume is when you record new constraints to the program state. For instance, if you encounter an if-statement, the symbolic value of its condition is assumed to be zero on one path and non-zero on the other path. Or when you dereference a pointer, it's assumed to be non-null. Assumptions may be arbitrary valid symbolic expressions. nonloc::ConcreteInt and loc::ConcreteInt are different classes of values. The difference is like between a zero integer and a null pointer; these are values of different types and their structure reflects that. They probably didn't *need* to be different classes of values but this time it looks like it helped us to catch a problem, so it's great that they're different classes of values. Type safety is important :)
Reworked solution. Simplified CStringChecker::assumeZero.
Added test (taken from the bug).
@Szelethus , @NoQ please, look at my solution. I'm not sure if it is correct, but it actually passes all the tests and throw off the issue.
I still think that the mistake was made earlier. Why are we getting a compound value here? The argument of strcpy is a plain pointer, not a compound value. It points to an array of chars which may be interpreted as a compound value, but in any case our intention is definitely not to make assumes over the whole array.
@NoQ ,
sorry for the late response, was working on other patch.
Why are we getting a compound value here?
We are getting nonloc::SymbolVal here, not a compound value. SA thinks that double-dereference of the first agrument applies to unsigned char* *, instead of char* * * (look at the test sample).
Nevertheless, the root cause was that we compared nonloc with Zero sval which was loc and we got an assertion as a result.
Here I've improved and simplified a bit and now it is handles OK without assertions and passes all the tests.
What do you think, is it worth to be landed?
Sorry for the late answer, I've been kinda busy with other things! And thank you for attending to these bugs! I admit, this isn't really within the realms of my expertise, but nevertheless, here are my two cents.
I think what what be great to see here (and this seems to be the thing @NoQ is fishing for) is not to change an if branch and avoid running into the crash, but rather find out why assumeZero was ever called with a nonloc value, which shouldn't really happen. We're treating the symptom, not curing the illness, if you will. The SVal (if its DefinedSVal) is supposed to be always MemRegionVal here, is that right? Maybe if we tried to add an assert here, that could help nail where the actual issue is coming from. It's probably in evalStrcpyCommon, judging from the bug report you linked in your summary.
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp | ||
---|---|---|
280–284 |
So, in other words the condition seems to be this: if (V.getAs<DefinedSVall>(&& !V.getAs<nonloc::LazyCompoundVal>()) But nonloc::SymbolVal is a subclass of DefinedSVal (and is not a nonloc::LazyCompoundVal), so we're definitely running into the assume call, isn't that right? What really seems to be the problem solver here is that you changed the assumption from x == 0 (creating a state where this expression is false, and one where it isn't) into x (whether x is true or not). |
@Szelethus, @NoQ
I've investigated graph.dot of the sample.
Here is a simplification:
- SA thinks that ptr is a pointer with a structure MemRegion->MemRegion->MemRegion->Element
- Then *(unsigned char **)ptr = (unsigned char *)(func()); occures. Symbolic substitution happens to ptr.
- After that SA thinks that ptr holds a symbolic value MemRegion->MemRegion->Element because of casts.
- **ptr should lead us to MemRegion->MemRegion->MemRegion from C++ point of view, but dereferencing applies to substituted symbolic value from SA point of view and we finally get MemRegion->MemRegion->Element
As I see, this is not treating the symptom. This is exactly handling this particular case which is legal and may take place.
Another solution could be to check the first argument of strcpy for being actially a char* and show a warning otherwise.
Please, explain, what I could miss in my suggestions, because I'm less expertise than you, guys.
This is the exploded graph for which code? The t37503.cpp file you uploaded earlier doesn't have the functions/variables found here, nor does the test case included in this patch.
This is the exploded graph for which code? The t37503.cpp file you uploaded earlier doesn't have the functions/variables found here, nor does the test case included in this patch.
This html corresponds to the test case in the patch. You can find it below in clang/test/Analysis/string.c. However t37503.cpp also contains the same snippet just with other names and it wouldn't change the graph.
Folk,
please look at this patch. It has been hanging for a time here. We should finally make a decision about it.
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp | ||
---|---|---|
281–283 | Basically SVal::getAs<> should not be used for discovering the type of the value; only the exact representation that's used for the value of the type that's already in your possession. Say, it's ok to use it to discriminate between, say, compound value and lazy compound value. It's ok to use it to discriminate between a concrete integer and a symbol. It's ok to use it do discriminate between a known value and an unknown value. But if it's used for discriminating between a compound value and a numeric symbol, i'm 99% sure it's incorrect. You should already know the type from the AST before you even obtain the value. It doesn't make sense to run the checker at all if the function receives a structure. And if it doesn't receive the structure but the run-time value is of structure type, then either the checker isn't obtaining the value correctly or there's bug in path-sensitive analysis. That's why i still believe you're only treating the symptoms. There's nothing normal in the situation where "strcpy suddenly accepts a structure (or an array) by value". |
@NoQ You concerns are absolutly justified. I agree with you. Let me tell what I'm thinking in inlines.
If you are interested in why the assertion happens, please, look D77062#1977613
And, please, share your vision about what the solution could be.
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp | ||
---|---|---|
281–283 | I'll remove V.getAs<nonloc::LazyCompoundVal>(). I mistakenly added this V.getAs<nonloc::LazyCompoundVal>() to keep code safe regardless of the checker intention and sanity. The point is that state->assume(*val) calls ConstraintMgr->assumeDual which finally calls SimpleConstraintManager::assumeAux. assumeAux handles all NonLoc kinds except LazyCompoundValKindand CompoundValKind(I missed to check it). In case of these two kinds it leads to llvm_unreachable("'Assume' not implemented for this NonLoc");. But for now I understand that I failed because unreachable means I might not have check for those kinds, since the function already takes on this work. However it is not an actual fix. The fix is std::tie(states.second, states.first) = state->assume(*val);. You can see the summary above. | |
2271–2279 | This check prevents passing structures. From this point we are sure to work with pointers and integral types in arguments: |
Ok, this simplification is obviously correct. I don't mind landing it regardless of the crash. If it gets rid of the crash, even better.
I'm still worried about the root cause of the crash though. I recommend to add an assertion into CStringChecker::assumeZero() that's equivalent to whatever assertion was previously crashing, and continue debugging from there to see if it uncovers more bugs in the checker.
@NoQ thanks for the approval. But I'm afraid we should return the check for !V.getAs<nonloc::LazyCompoundVal>() back in CStringChecker::assumeZero.
Look, I paid attention to the fix for https://llvm.org/PR24951 and related commit rG480a0c00ca51d909a925120737b71289cbb79eda. I saw that @dcoughlin added a check for V.getAs<nonloc::LazyCompoundVal>() to fix and the old code used it implicitly. My patch returns this issue back. That's why we need to add the check for LazyCompoundVal as well.
Another big question is why LazyCompoundVal can be possible here at all. As I see according to the graph below the answer is somewhere deep in the core. Thus I'd propose to apply this patch in scope of fix https://llvm.org/PR37503.
I updated the patch. I've also added a FIXME caveat.
It is a good practice in many projects to make commit messages into imperative (i.e. "Improve" instead of "Improving" or "Improved"). Again, not everyone follows it, but it is good to keep it consistent, right?
@NoQ knock-knock!
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp | ||
---|---|---|
199–200 | Can you please put static before return type, it will be more consistent with other static methods nearby. | |
282–287 | I know that other methods here don't name variables according to llvm-style guide. Analyzer's code is one of the least complying areas of the whole LLVM project. However, I suggest not to make it worse and capitalize all parameter and local variable names. |
Changed naming due to LLVM rules.
I decided not to change names everywhere to leave the patch more readable.
Ping!
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp | ||
---|---|---|
199–200 | Sure, I just didn't pay attention to that. |
@NoQ just one more ping. You accepted it and then I just revised it again. Could you, please, take a minute and look at it.
I'd close it with a great pleasure. It's been hanging too long.
I would not accept this patch unless this investigation is done. However, I'm not inherently against this change.
I would not accept this patch unless this investigation is done. However, I'm not inherently against this change.
Actually I've done the investigation. You can find it here https://reviews.llvm.org/D77062#1977613
The bug is somewhere deep in the core. I see two solutions:
- Leave the crash presented in the clang and dig deep into the core trying to solve the root cause.
- Accept the change with a FIXME promt and dig deep into the core trying to solve the root cause.
Finally, I made my investigations and I come up with this code:
void test(int *a, char ***b) { *(unsigned char **)b = (unsigned char*)a; // #1 if (**b == nullptr) // will-crash ; }
So, this issue does not relate to CStringChecker. It will crash at ExprEngineC.cpp:100.
It seems that we have to make the choice of how to model type punning.
As you can see in the example, we overwrite the pointer value of *b to point to an unsigned char value (#1).
The static type of b (char***) does not reflect the associated value's type which (unsigned char**) - (note the number of indirections!) in other words, an obvious type pun happened at that line.
If we get the value of **b, we get a NonLoc of type unsigned char.
The dump of **b confirms this: reg_$4<unsigned char Element{SymRegion{reg_$0<int * a>},0 S64b,unsigned char}>, which is a NonLoc in deed.
IMO we should fix the root cause of this in the Core.
I think we should have a symbolic cast back to the static type before doing anything with the SVal (iff the BaseKind differs).
If we do this, we will get a Loc as expected - and neither this bug nor your original bug would fire.
WDYT @NoQ @martong @ASDenysPetrov @Szelethus?
[edit]: removed strcpy() forw.delc. from the example
[edit]: fix italic triple pointers not displayed correctly by using backticks instead
And of course, repro:
./bin/clang -cc1 -analyze -setup-static-analyzer -analyzer-checker=core example.c Assertion `op == BO_Add' failed #0 0x00007f5bea743904 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/elnbbea/git/llvm-project/build/debug/../../llvm/lib/Support/Unix/Signals.inc:563:0 #1 0x00007f5bea7439a8 PrintStackTraceSignalHandler(void*) /home/elnbbea/git/llvm-project/build/debug/../../llvm/lib/Support/Unix/Signals.inc:627:0 #2 0x00007f5bea741759 llvm::sys::RunSignalHandlers() /home/elnbbea/git/llvm-project/build/debug/../../llvm/lib/Support/Signals.cpp:70:0 #3 0x00007f5bea743286 SignalHandler(int) /home/elnbbea/git/llvm-project/build/debug/../../llvm/lib/Support/Unix/Signals.inc:405:0 #4 0x00007f5be9b19fd0 (/lib/x86_64-linux-gnu/libc.so.6+0x3efd0) #5 0x00007f5be9b19f47 raise /build/glibc-2ORdQG/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0 #6 0x00007f5be9b1b8b1 abort /build/glibc-2ORdQG/glibc-2.27/stdlib/abort.c:81:0 #7 0x00007f5be9b0b42a __assert_fail_base /build/glibc-2ORdQG/glibc-2.27/assert/assert.c:89:0 #8 0x00007f5be9b0b4a2 (/lib/x86_64-linux-gnu/libc.so.6+0x304a2) #9 0x00007f5bdece2000 clang::ento::SValBuilder::evalBinOp(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::BinaryOperatorKind, clang::ento::SVal, clang::ento::SVal, clang::QualType) /home/elnbbea/git/llvm-project/build/debug/../../clang/lib/StaticAnalyzer/Core/SValBuilder.cpp:439:0 #10 0x00007f5bdebd28ae clang::ento::ExprEngine::evalBinOp(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::BinaryOperatorKind, clang::ento::SVal, clang::ento::SVal, clang::QualType) /home/elnbbea/git/llvm-project/build/debug/../../clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h:631:0 #11 0x00007f5bdebeb031 clang::ento::ExprEngine::VisitBinaryOperator(clang::BinaryOperator const*, clang::ento::ExplodedNode*, clang::ento::ExplodedNodeSet&) /home/elnbbea/git/llvm-project/build/debug/../../clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp:100:0 #12 0x00007f5bdebc0aa5 clang::ento::ExprEngine::Visit(clang::Stmt const*, clang::ento::ExplodedNode*, clang::ento::ExplodedNodeSet&) /home/elnbbea/git/llvm-project/build/debug/../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:1573:0 #13 0x00007f5bdebbca10 clang::ento::ExprEngine::ProcessStmt(clang::Stmt const*, clang::ento::ExplodedNode*) /home/elnbbea/git/llvm-project/build/debug/../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:792:0
Beware, Phabricator ruins the visual experience of this nice analysis. E.g //char ***// is visible as an italic char *.
I think we should have a symbolic cast back to the static type before doing anything with the SVal (iff the BaseKind differs).
If we do this, we will get a Loc as expected - and neither this bug nor your original bug would fire.
I fully agree, I think this is the way.
Though, the fix probably will not be simple, because the issue itself always requires a 3x indirection. The code that is presented by @steakhal is the least minimal example to get this crash. The reason why we cannot have a crash with a ** is a mystic at the moment.
I think probably the representation of casts is behind this.
Eg. if you reinterpret cast b to int**, and make the type pun that way, we don't crash.
template <typename T> void clang_analyzer_dump(T); void test(int *a, char ***b) { *(int **)b = a; // only this line changed! clang_analyzer_dump(**b); // &SymRegion{reg_$2<char * Element{SymRegion{reg_$0<int * a>},0 S64b,char *}>} if (**b == nullptr) // will-not-crash ; }
If we get the value of **b, we get a NonLoc of type unsigned char.
The dump of **b confirms this: reg_$4<unsigned char Element{SymRegion{reg_$0<int * a>},0 S64b,unsigned char}>, which is a NonLoc in deed.
Exactly. That's what I've been trying to explaine above!
This happens because of the casts, after which CSA stores operates with the symbol (b) as it points to int* (though, it really is char**).
IMO we should fix the root cause of this in the Core.
I can't see a direct fix for now and feel quite unconfident in that part of code. That's why I suggested to accept this change as what makes CSA codebase resistant to current bugs in the Core, which we are not able to fix for now. We can change Summary and the name of this revision for acception without objections.
@steakhal
You told that you suppose a potential fix. It would be nice, if you share the patch to review.
Nice, very interesting!
The contract of RegionStore with respect to type punning is that it stores the value as written, even if its type doesn't match the storage type, but then it casts the value to the correct type upon reading by invoking CastRetrievedVal() on it. That's where the fix should probably be.
I mean, I already told you my suggestion, there was no concrete fix in my mind at the time.
Hm, thank you for the pointer.
I'm not sure if I fixed this in the right way - I'm still puzzled by the spaghetti code :D
Never the less, here is my proposed fix for this: D88477
@ASDenysPetrov Do you still want to rework the API of the assumeZero?
@ASDenysPetrov Do you still want to rework the API of the assumeZero?
This patch more looks like NFC, being just refactored.
Actually I see that if we find and fix the root cause, we can freely refuse this patch.
Another thing I see is that this patch will work after a root cause fix as well.
And the last one is that as long a root cause stays unfixed, clang will emit an assertion without this patch, at least for my testcase.
So, I can't really evaluate this objectively. What do you think?
Fine.
Actually I see that if we find and fix the root cause, we can freely refuse this patch.
Another thing I see is that this patch will work after a root cause fix as well.
Yes, I think so.
And the last one is that as long a root cause stays unfixed, clang will emit an assertion without this patch, at least for my testcase.
So, I can't really evaluate this objectively. What do you think?
Actually, I'm expecting to fix this in the need future. I'm really doing the extra mile to come up with a proper fix.
Unfortunately, it's deeply in the Core and requires a bunch of stuff to work out how to achieve this.
It will be a nice first-time experience though - I've already learned a lot just by debugging it.
Till then, I recommend you to follow my effort at D88477.
Updated. Requalified this patch to non-functional changes.
What do you think we should do with the thing that ProgramState::assume() doesn't support CompoundVal and LazyCompoundVal kinds of SVal?
Can you please put static before return type, it will be more consistent with other static methods nearby.