Page MenuHomePhabricator

[analyzer] [NFC] Simplify CStringChecke::assumeZero function
Needs ReviewPublic

Authored by ASDenysPetrov on Mar 30 2020, 8:30 AM.

Details

Summary

Substitute explicit comparing to zero with more appropriate ProgramState::assume() in CStringChecke::assumeZero().

Diff Detail

Event Timeline

ASDenysPetrov created this revision.Mar 30 2020, 8:30 AM
ASDenysPetrov retitled this revision from Added check for unaccaptable equality operation between Loc and NonLoc types to [analyzer] Added check for unaccaptable equality operation between Loc and NonLoc types.Mar 30 2020, 8:31 AM

Seems okay to me, but please wait for other reviewers please. And I have some questions.
Could you please add a reproducer lit test file? E.g. maybe from the bug you mentioned.
Did the crash happen only when driven from clang-tidy?

@martong
File to reproduce is here:


I have not made a test for it yet. If it needs I will.
The crash reproduces not only on clang-tidy, but by simple envoke clang --analyze t37503.cpp

NoQ added a comment.Mar 30 2020, 11:24 AM

Yes, every commit needs a test, or some other way of preventing regressions (eg., a stronger assert that'd fail on existing tests), assuming it changes the behavior (i.e., not pure refactoring).

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
286–288

This suppresses the assertion but it's preferable to find out why it fails to begin with. Why are we even trying to compare a NonLoc to a null pointer (or a Loc to a numeric null)? Is somebody misusing the API by feeding us with an incorrect SVal or an incorrect type?

@NoQ

For now I am a bit far from understanding what we shall do with the results of assume and what is the idea of using it and in which cases it is appropriate.
Could you briefly explain me this?
Another question is why we can't compare NonLoc and Loc which potentially can have some values (e.g. such ambiguous nonloc::ConcreteInt and loc::ConcreteInt).

Maybe this will push me to some other fix.

NoQ added a comment.Apr 1 2020, 12:17 AM

assume is when you record new constraints to the program state. For instance, if you encounter an if-statement, the symbolic value of its condition is assumed to be zero on one path and non-zero on the other path. Or when you dereference a pointer, it's assumed to be non-null. Assumptions may be arbitrary valid symbolic expressions. nonloc::ConcreteInt and loc::ConcreteInt are different classes of values. The difference is like between a zero integer and a null pointer; these are values of different types and their structure reflects that. They probably didn't *need* to be different classes of values but this time it looks like it helped us to catch a problem, so it's great that they're different classes of values. Type safety is important :)

ASDenysPetrov updated this revision to Diff 254225.EditedApr 1 2020, 9:12 AM
ASDenysPetrov edited the summary of this revision. (Show Details)

Reworked solution. Simplified CStringChecker::assumeZero.
Added test (taken from the bug).

Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2020, 9:12 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ASDenysPetrov retitled this revision from [analyzer] Added check for unaccaptable equality operation between Loc and NonLoc types to [analyzer] Added check for unacceptable equality operation between Loc and NonLoc types.Apr 2 2020, 1:02 PM

@Szelethus , @NoQ please, look at my solution. I'm not sure if it is correct, but it actually passes all the tests and throw off the issue.

NoQ added a comment.Apr 6 2020, 11:25 AM

I still think that the mistake was made earlier. Why are we getting a compound value here? The argument of strcpy is a plain pointer, not a compound value. It points to an array of chars which may be interpreted as a compound value, but in any case our intention is definitely not to make assumes over the whole array.

ASDenysPetrov added a comment.EditedApr 10 2020, 7:31 AM

@NoQ ,
sorry for the late response, was working on other patch.

Why are we getting a compound value here?

We are getting nonloc::SymbolVal here, not a compound value. SA thinks that double-dereference of the first agrument applies to unsigned char* *, instead of char* * * (look at the test sample).
Nevertheless, the root cause was that we compared nonloc with Zero sval which was loc and we got an assertion as a result.
Here I've improved and simplified a bit and now it is handles OK without assertions and passes all the tests.
What do you think, is it worth to be landed?

@Szelethus , @NoQ please, look at my solution. I'm not sure if it is correct, but it actually passes all the tests and throw off the issue.

Sorry for the late answer, I've been kinda busy with other things! And thank you for attending to these bugs! I admit, this isn't really within the realms of my expertise, but nevertheless, here are my two cents.

I think what what be great to see here (and this seems to be the thing @NoQ is fishing for) is not to change an if branch and avoid running into the crash, but rather find out why assumeZero was ever called with a nonloc value, which shouldn't really happen. We're treating the symptom, not curing the illness, if you will. The SVal (if its DefinedSVal) is supposed to be always MemRegionVal here, is that right? Maybe if we tried to add an assert here, that could help nail where the actual issue is coming from. It's probably in evalStrcpyCommon, judging from the bug report you linked in your summary.

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
280–284

Why are we getting a compound value here?

We are getting nonloc::SymbolVal here, not a compound value. SA thinks that double-dereference of the first agrument applies to unsigned char* *, instead of char* * * (look at the test sample).

So, in other words the condition seems to be this:

if (V.getAs<DefinedSVall>(&& !V.getAs<nonloc::LazyCompoundVal>())

But nonloc::SymbolVal is a subclass of DefinedSVal (and is not a nonloc::LazyCompoundVal), so we're definitely running into the assume call, isn't that right?
https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html

What really seems to be the problem solver here is that you changed the assumption from x == 0 (creating a state where this expression is false, and one where it isn't) into x (whether x is true or not).

ASDenysPetrov added a comment.EditedApr 13 2020, 4:26 AM

@Szelethus, @NoQ
I've investigated graph.dot of the sample.


Here is a simplification:

  1. SA thinks that ptr is a pointer with a structure MemRegion->MemRegion->MemRegion->Element
  2. Then *(unsigned char **)ptr = (unsigned char *)(func()); occures. Symbolic substitution happens to ptr.
  3. After that SA thinks that ptr holds a symbolic value MemRegion->MemRegion->Element because of casts.
  4. **ptr should lead us to MemRegion->MemRegion->MemRegion from C++ point of view, but dereferencing applies to substituted symbolic value from SA point of view and we finally get MemRegion->MemRegion->Element

As I see, this is not treating the symptom. This is exactly handling this particular case which is legal and may take place.

Another solution could be to check the first argument of strcpy for being actially a char* and show a warning otherwise.

Please, explain, what I could miss in my suggestions, because I'm less expertise than you, guys.

I've investigated graph.dot of the sample.

This is the exploded graph for which code? The t37503.cpp file you uploaded earlier doesn't have the functions/variables found here, nor does the test case included in this patch.

@Szelethus

This is the exploded graph for which code? The t37503.cpp file you uploaded earlier doesn't have the functions/variables found here, nor does the test case included in this patch.

This html corresponds to the test case in the patch. You can find it below in clang/test/Analysis/string.c. However t37503.cpp also contains the same snippet just with other names and it wouldn't change the graph.

@Szelethus, could you, please, make some suggestions about my investigation above?

Folk,
please look at this patch. It has been hanging for a time here. We should finally make a decision about it.

NoQ added inline comments.Jun 2 2020, 4:19 AM
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
281–283

Basically SVal::getAs<> should not be used for discovering the type of the value; only the exact representation that's used for the value of the type that's already in your possession. Say, it's ok to use it to discriminate between, say, compound value and lazy compound value. It's ok to use it to discriminate between a concrete integer and a symbol. It's ok to use it do discriminate between a known value and an unknown value. But if it's used for discriminating between a compound value and a numeric symbol, i'm 99% sure it's incorrect. You should already know the type from the AST before you even obtain the value. It doesn't make sense to run the checker at all if the function receives a structure. And if it doesn't receive the structure but the run-time value is of structure type, then either the checker isn't obtaining the value correctly or there's bug in path-sensitive analysis. That's why i still believe you're only treating the symptoms. There's nothing normal in the situation where "strcpy suddenly accepts a structure (or an array) by value".

ASDenysPetrov marked 3 inline comments as done.EditedJun 4 2020, 3:51 AM

@NoQ You concerns are absolutly justified. I agree with you. Let me tell what I'm thinking in inlines.

If you are interested in why the assertion happens, please, look D77062#1977613

And, please, share your vision about what the solution could be.

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
281–283

I'll remove V.getAs<nonloc::LazyCompoundVal>(). I mistakenly added this V.getAs<nonloc::LazyCompoundVal>() to keep code safe regardless of the checker intention and sanity. The point is that state->assume(*val) calls ConstraintMgr->assumeDual which finally calls SimpleConstraintManager::assumeAux. assumeAux handles all NonLoc kinds except LazyCompoundValKindand CompoundValKind(I missed to check it). In case of these two kinds it leads to llvm_unreachable("'Assume' not implemented for this NonLoc");. But for now I understand that I failed because unreachable means I might not have check for those kinds, since the function already takes on this work.

However it is not an actual fix. The fix is std::tie(states.second, states.first) = state->assume(*val);. You can see the summary above.

2271–2279

This check prevents passing structures. From this point we are sure to work with pointers and integral types in arguments:
E.g. char *strcpy(int i1, int i2);, char *strcpy(char *c1, char *c2);, char *strcpy(Kind t1, Kind t2);
This confirms @NoQ's supposition that checker narrows argument types.
P.S. Honestly I'd narrow this more aggressively to just a char pointer type, anyway it doesn't relate to the bug and I wouldn't add this to a single patch.

ASDenysPetrov edited the summary of this revision. (Show Details)Jun 4 2020, 4:03 AM
ASDenysPetrov edited the summary of this revision. (Show Details)
ASDenysPetrov edited the summary of this revision. (Show Details)Jun 4 2020, 4:07 AM
ASDenysPetrov edited the summary of this revision. (Show Details)Jun 4 2020, 4:11 AM

Removed V.getAs<nonloc::LazyCompoundVal>() from if.

@NoQ, what do you think about my explanation?

NoQ accepted this revision.Jun 18 2020, 6:52 AM

Ok, this simplification is obviously correct. I don't mind landing it regardless of the crash. If it gets rid of the crash, even better.

I'm still worried about the root cause of the crash though. I recommend to add an assertion into CStringChecker::assumeZero() that's equivalent to whatever assertion was previously crashing, and continue debugging from there to see if it uncovers more bugs in the checker.

This revision is now accepted and ready to land.Jun 18 2020, 6:52 AM
ASDenysPetrov retitled this revision from [analyzer] Added check for unacceptable equality operation between Loc and NonLoc types to [analyzer] Improved zero assumption in CStringChecke::assumeZero.Jun 23 2020, 5:48 AM
ASDenysPetrov edited the summary of this revision. (Show Details)

@NoQ thanks for the approval. But I'm afraid we should return the check for !V.getAs<nonloc::LazyCompoundVal>() back in CStringChecker::assumeZero.

Look, I paid attention to the fix for https://llvm.org/PR24951 and related commit rG480a0c00ca51d909a925120737b71289cbb79eda. I saw that @dcoughlin added a check for V.getAs<nonloc::LazyCompoundVal>() to fix and the old code used it implicitly. My patch returns this issue back. That's why we need to add the check for LazyCompoundVal as well.

Another big question is why LazyCompoundVal can be possible here at all. As I see according to the graph below the answer is somewhere deep in the core. Thus I'd propose to apply this patch in scope of fix https://llvm.org/PR37503.

I updated the patch. I've also added a FIXME caveat.

This comment was removed by ASDenysPetrov.

@NoQ , just one another ping, since it is near to be closed.

ASDenysPetrov requested review of this revision.Jul 15 2020, 3:12 AM

It is a good practice in many projects to make commit messages into imperative (i.e. "Improve" instead of "Improving" or "Improved"). Again, not everyone follows it, but it is good to keep it consistent, right?

@NoQ knock-knock!

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
199–200

Can you please put static before return type, it will be more consistent with other static methods nearby.

282–287

I know that other methods here don't name variables according to llvm-style guide. Analyzer's code is one of the least complying areas of the whole LLVM project. However, I suggest not to make it worse and capitalize all parameter and local variable names.

ASDenysPetrov marked an inline comment as done.

Changed naming due to LLVM rules.
I decided not to change names everywhere to leave the patch more readable.

ASDenysPetrov retitled this revision from [analyzer] Improved zero assumption in CStringChecke::assumeZero to [analyzer] Improve zero assumption in CStringChecke::assumeZero.Jul 16 2020, 5:44 AM
ASDenysPetrov edited the summary of this revision. (Show Details)

Ping!

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
199–200

Sure, I just didn't pay attention to that.

@NoQ just one more ping. You accepted it and then I just revised it again. Could you, please, take a minute and look at it.
I'd close it with a great pleasure. It's been hanging too long.

A gentle notification :-)

One more notification. Please, somebody look at this patch.

I think what what be great to see here (and this seems to be the thing @NoQ is fishing for) is not to change an if branch and avoid running into the crash, but rather find out why assumeZero was ever called with a nonloc value, which shouldn't really happen. We're treating the symptom, not curing the illness, if you will. The SVal (if its DefinedSVal) is supposed to be always MemRegionVal here, is that right? Maybe if we tried to add an assert here, that could help nail where the actual issue is coming from. It's probably in evalStrcpyCommon, judging from the bug report you linked in your summary.

I would not accept this patch unless this investigation is done. However, I'm not inherently against this change.

ASDenysPetrov added a comment.EditedSep 24 2020, 6:09 AM

@steakhal

I would not accept this patch unless this investigation is done. However, I'm not inherently against this change.

Actually I've done the investigation. You can find it here https://reviews.llvm.org/D77062#1977613

The bug is somewhere deep in the core. I see two solutions:

  1. Leave the crash presented in the clang and dig deep into the core trying to solve the root cause.
  2. Accept the change with a FIXME promt and dig deep into the core trying to solve the root cause.
steakhal added a comment.EditedSep 25 2020, 2:18 AM

Finally, I made my investigations and I come up with this code:

void test(int *a, char ***b) {
  *(unsigned char **)b = (unsigned char*)a; // #1
  if (**b == nullptr) // will-crash
    ;
}

So, this issue does not relate to CStringChecker. It will crash at ExprEngineC.cpp:100.
It seems that we have to make the choice of how to model type punning.
As you can see in the example, we overwrite the pointer value of *b to point to an unsigned char value (#1).
The static type of b (char***) does not reflect the associated value's type which (unsigned char**) - (note the number of indirections!) in other words, an obvious type pun happened at that line.
If we get the value of **b, we get a NonLoc of type unsigned char.
The dump of **b confirms this: reg_$4<unsigned char Element{SymRegion{reg_$0<int * a>},0 S64b,unsigned char}>, which is a NonLoc in deed.

IMO we should fix the root cause of this in the Core.
I think we should have a symbolic cast back to the static type before doing anything with the SVal (iff the BaseKind differs).
If we do this, we will get a Loc as expected - and neither this bug nor your original bug would fire.
WDYT @NoQ @martong @ASDenysPetrov @Szelethus?

[edit]: removed strcpy() forw.delc. from the example
[edit]: fix italic triple pointers not displayed correctly by using backticks instead

And of course, repro:

./bin/clang -cc1 -analyze -setup-static-analyzer -analyzer-checker=core example.c

Assertion `op == BO_Add' failed

 #0 0x00007f5bea743904 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/elnbbea/git/llvm-project/build/debug/../../llvm/lib/Support/Unix/Signals.inc:563:0
 #1 0x00007f5bea7439a8 PrintStackTraceSignalHandler(void*) /home/elnbbea/git/llvm-project/build/debug/../../llvm/lib/Support/Unix/Signals.inc:627:0
 #2 0x00007f5bea741759 llvm::sys::RunSignalHandlers() /home/elnbbea/git/llvm-project/build/debug/../../llvm/lib/Support/Signals.cpp:70:0
 #3 0x00007f5bea743286 SignalHandler(int) /home/elnbbea/git/llvm-project/build/debug/../../llvm/lib/Support/Unix/Signals.inc:405:0
 #4 0x00007f5be9b19fd0 (/lib/x86_64-linux-gnu/libc.so.6+0x3efd0)
 #5 0x00007f5be9b19f47 raise /build/glibc-2ORdQG/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007f5be9b1b8b1 abort /build/glibc-2ORdQG/glibc-2.27/stdlib/abort.c:81:0
 #7 0x00007f5be9b0b42a __assert_fail_base /build/glibc-2ORdQG/glibc-2.27/assert/assert.c:89:0
 #8 0x00007f5be9b0b4a2 (/lib/x86_64-linux-gnu/libc.so.6+0x304a2)
 #9 0x00007f5bdece2000 clang::ento::SValBuilder::evalBinOp(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::BinaryOperatorKind, clang::ento::SVal, clang::ento::SVal, clang::QualType) /home/elnbbea/git/llvm-project/build/debug/../../clang/lib/StaticAnalyzer/Core/SValBuilder.cpp:439:0
#10 0x00007f5bdebd28ae clang::ento::ExprEngine::evalBinOp(llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::BinaryOperatorKind, clang::ento::SVal, clang::ento::SVal, clang::QualType) /home/elnbbea/git/llvm-project/build/debug/../../clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h:631:0
#11 0x00007f5bdebeb031 clang::ento::ExprEngine::VisitBinaryOperator(clang::BinaryOperator const*, clang::ento::ExplodedNode*, clang::ento::ExplodedNodeSet&) /home/elnbbea/git/llvm-project/build/debug/../../clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp:100:0
#12 0x00007f5bdebc0aa5 clang::ento::ExprEngine::Visit(clang::Stmt const*, clang::ento::ExplodedNode*, clang::ento::ExplodedNodeSet&) /home/elnbbea/git/llvm-project/build/debug/../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:1573:0
#13 0x00007f5bdebbca10 clang::ento::ExprEngine::ProcessStmt(clang::Stmt const*, clang::ento::ExplodedNode*) /home/elnbbea/git/llvm-project/build/debug/../../clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:792:0

Beware, Phabricator ruins the visual experience of this nice analysis. E.g //char ***// is visible as an italic char *.

I think we should have a symbolic cast back to the static type before doing anything with the SVal (iff the BaseKind differs).
If we do this, we will get a Loc as expected - and neither this bug nor your original bug would fire.

I fully agree, I think this is the way.

Though, the fix probably will not be simple, because the issue itself always requires a 3x indirection. The code that is presented by @steakhal is the least minimal example to get this crash. The reason why we cannot have a crash with a ** is a mystic at the moment.

Though, the fix probably will not be simple, because the issue itself always requires a 3x indirection. The code that is presented by @steakhal is the least minimal example to get this crash. The reason why we cannot have a crash with a ** is a mystic at the moment.

I think probably the representation of casts is behind this.

Eg. if you reinterpret cast b to int**, and make the type pun that way, we don't crash.

template <typename T> void clang_analyzer_dump(T);
void test(int *a, char ***b) {
  *(int **)b = a; // only this line changed!
  clang_analyzer_dump(**b); // &SymRegion{reg_$2<char * Element{SymRegion{reg_$0<int * a>},0 S64b,char *}>}
  if (**b == nullptr) // will-not-crash
    ;
}

@steakhal

If we get the value of **b, we get a NonLoc of type unsigned char.
The dump of **b confirms this: reg_$4<unsigned char Element{SymRegion{reg_$0<int * a>},0 S64b,unsigned char}>, which is a NonLoc in deed.

Exactly. That's what I've been trying to explaine above!
This happens because of the casts, after which CSA stores operates with the symbol (b) as it points to int* (though, it really is char**).

IMO we should fix the root cause of this in the Core.

I can't see a direct fix for now and feel quite unconfident in that part of code. That's why I suggested to accept this change as what makes CSA codebase resistant to current bugs in the Core, which we are not able to fix for now. We can change Summary and the name of this revision for acception without objections.

@steakhal
You told that you suppose a potential fix. It would be nice, if you share the patch to review.

NoQ added a comment.Mon, Sep 28, 10:59 AM

Nice, very interesting!

The contract of RegionStore with respect to type punning is that it stores the value as written, even if its type doesn't match the storage type, but then it casts the value to the correct type upon reading by invoking CastRetrievedVal() on it. That's where the fix should probably be.

@steakhal
You told that you suppose a potential fix. It would be nice, if you share the patch to review.

I mean, I already told you my suggestion, there was no concrete fix in my mind at the time.

In D77062#2298748, @NoQ wrote:

Nice, very interesting!

The contract of RegionStore with respect to type punning is that it stores the value as written, even if its type doesn't match the storage type, but then it casts the value to the correct type upon reading by invoking CastRetrievedVal() on it. That's where the fix should probably be.

Hm, thank you for the pointer.
I'm not sure if I fixed this in the right way - I'm still puzzled by the spaghetti code :D
Never the less, here is my proposed fix for this: D88477


@ASDenysPetrov Do you still want to rework the API of the assumeZero?

@steakhal

@ASDenysPetrov Do you still want to rework the API of the assumeZero?

This patch more looks like NFC, being just refactored.
Actually I see that if we find and fix the root cause, we can freely refuse this patch.
Another thing I see is that this patch will work after a root cause fix as well.
And the last one is that as long a root cause stays unfixed, clang will emit an assertion without this patch, at least for my testcase.

So, I can't really evaluate this objectively. What do you think?

@steakhal

@ASDenysPetrov Do you still want to rework the API of the assumeZero?

This patch more looks like NFC, being just refactored.

Fine.

Actually I see that if we find and fix the root cause, we can freely refuse this patch.
Another thing I see is that this patch will work after a root cause fix as well.

Yes, I think so.

And the last one is that as long a root cause stays unfixed, clang will emit an assertion without this patch, at least for my testcase.

So, I can't really evaluate this objectively. What do you think?

Actually, I'm expecting to fix this in the need future. I'm really doing the extra mile to come up with a proper fix.
Unfortunately, it's deeply in the Core and requires a bunch of stuff to work out how to achieve this.
It will be a nice first-time experience though - I've already learned a lot just by debugging it.

Till then, I recommend you to follow my effort at D88477.

ASDenysPetrov added a comment.EditedFri, Oct 2, 3:35 AM

@steakhal

Till then, I recommend you to follow my effort at D88477.

I'm already on this way. I'm debugging the Store. I think we load a wrong type because we store a wrong type.

ASDenysPetrov retitled this revision from [analyzer] Improve zero assumption in CStringChecke::assumeZero to [analyzer] [NFC] Simplify CStringChecke::assumeZero function.
ASDenysPetrov edited the summary of this revision. (Show Details)

Updated. Requalified this patch to non-functional changes.

What do you think we should do with the thing that ProgramState::assume() doesn't support CompoundVal and LazyCompoundVal kinds of SVal?