This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Fix a crash for dereferencing an empty llvm::Optional variable in SMTConstraintManager.h.
ClosedPublic

Authored by OikawaKirie on Jul 13 2020, 12:52 AM.

Details

Summary

The first crash reported in the bug report 44338.

Condition !isSat.hasValue() || isNotSat.getValue() here should be !isNotSat.hasValue() || isNotSat.getValue().
getValue here crashed when we used the static analyzer to analyze postgresql-12.0.

Diff Detail

Event Timeline

OikawaKirie created this revision.Jul 13 2020, 12:52 AM

Looks like a copy-paste error indeed.

@OikawaKirie do you accidentally have a test case to reproduce the crash, so that we could add it to our regression test suite? 'Cause traditionally we add a test to every patch.

Note that SMT-backed constraint managers aren't an actual supported mode; it's an experimental project that isn't yet supposed to be used anywhere and probably won't be ever officially supported, unless reworked dramatically.

In D83660#2148834, @NoQ wrote:

Looks like a copy-paste error indeed.

@vrnithinkumar @NoQ looks like a good opportunity for SmartPtr checker and llvm::Optional!

do you accidentally have a test case to reproduce the crash

@NoQ I am now working with the reporter of this bug to make a simple test case to trigger the crash.

looks like a good opportunity for SmartPtr checker and llvm::Optional!

@vsavchenko It seems not difficult to write a checker specifically for llvm::Optional. However, the root reason for this crash is an assertion failure. It seems to be better to write a checker to report assertion failures.

In D83660#2148834, @NoQ wrote:

Looks like a copy-paste error indeed.

@OikawaKirie do you accidentally have a test case to reproduce the crash, so that we could add it to our regression test suite? 'Cause traditionally we add a test to every patch.

Note that SMT-backed constraint managers aren't an actual supported mode; it's an experimental project that isn't yet supposed to be used anywhere and probably won't be ever officially supported, unless reworked dramatically.

I cannot agree more. Although I would appreciate some progress improving Z3 constraint manager. Might in the future I would spend some time on it - we will see.

BTW nice catch & fix @OikawaKirie!

Might in the future I would spend some time on it - we will see.

@steakhal My boss always asks me about how to improve the performance for SMT solver based constraint solving, on both the engine side and the SMT solver side. If there is anything that our research group can do, you are free to contact us.

BTW nice catch & fix.

Thank you. This crash was discovered by one of our team members when we were testing our CSA based tool. If the patch gets merged in the future, I'd like to use his name and email for the commit.

Might in the future I would spend some time on it - we will see.

[...] If there is anything that our research group can do, you are free to contact us.

I would use the official LLVM/clang-static-analyzer channel for lightweight discussions.

steakhal accepted this revision.Jul 22 2020, 7:30 AM

Although I'm not the most experienced one here, I think it's good to go.

This revision is now accepted and ready to land.Jul 22 2020, 7:30 AM
NoQ added a comment.Jul 22 2020, 10:35 AM

It seems to be better to write a checker to report assertion failures.

No no no. No no no no no. Assertions don't say "the program crashes if this happens", assertions say "this doesn't happen". We should trust them, not argue against them. They're god-sent for any static analysis and also they themselves constitute a form of dynamic analysis. Like, even if there's assert(false) in the beginning of a function, we shouldn't flag it; it's merely an indication from the user that this function is never called. There are some cases where the assertion indeed doesn't make sense but they're extremely rare. It becomes even more confusing when people occasionally have code to handle a would-be assertion failure gracefully in release builds (i.e., when assertions are disabled).

What we should warn about, though, is precondition failures. And the assertion that we've stepped on here is, in fact, a precondition: "if you call this method on an optional, first make sure the optional isn't empty". I.e., the assertion is put inside the method, but in fact it should be on the method boundary, and that's where it makes sense to warn ("method is misused"). But as of now C/C++ doesn't have a way of formally differentiating assertions from preconditions or postconditions; such feature is known as "contracts" and it was recently delayed until C++23. So manually teaching the static analyzer about preconditions of Optional's methods is the right solution until we have contracts.

NoQ added a comment.Jul 22 2020, 10:38 AM

Although I'm not the most experienced one here, I think it's good to go.

Let's wait for the test to be added, sounds like it's close.

(@OikawaKirie, you can use delta debugging tools like creduce or delta to automatically write test cases for crashes)

In D83660#2167391, @NoQ wrote:

Although I'm not the most experienced one here, I think it's good to go.

Let's wait for the test to be added, sounds like it's close.

(@OikawaKirie, you can use delta debugging tools like creduce or delta to automatically write test cases for crashes)

+1 for the test. It's not like we don't trust the fix, it is pretty obvious. However, it would be good to cover it for the future because it might help with tests around this area of code.

It seems that this patch is stuck.
How can I reproduce the crash? @OikawaKirie

The closest I could come up with was:

void clang_analyzer_eval(int);
void clang_analyzer_printState();

void foo(int a, int b) {
  if (a > 5)
    return;
  if (b < 4)
    return;
  if (a <= b)
    return;

  clang_analyzer_printState();
  clang_analyzer_eval(a == 5); // expected-warning{{TRUE}}
  clang_analyzer_eval(b == 4); // expected-warning{{TRUE}}
  (void)a;
  (void)b;
}

But it still not crash. Any idea?

Ping @OikawaKirie .
How should we proceed?
I would happily participate in creating a minimal repro for this, but I need at least one crash.

We have tried to trigger the crash with the original project where the crash was encountered. But the problem is we cannot trigger the crash with the project, and we have lost all the previous records about this crash. Besides, both I and the bug reporter himself are now working on our own research right now. We have very limited time working on this patch in recent months. We are now planning to just make a simple regression test case to trigger the crash during our spire time. Sorry for the delay.

After reviewing the code of this snippet, I think it would be very difficult to make a regression test case for the crash, as far as what I know about Z3 and SMT solvers.

First of all, all calls to Solver->check() will return true for sat, false for unsat, and empty for a timeout.
On line 132, the manager invokes Z3 for solving the constraints under the current state.
On line 137, the manager invokes Z3 for getting a model (a valid result) satisfying the constraints.
On line 141, the manager adds another constraint to exclude the model gotten from the model.
On line 149, the manager invokes Z3 for solving the excluded constraint.
In summary, the manager will first solver the constraint for a result of the queried symbolic variable. If there is a valid value, it excludes the value and solves again to check whether it is the only valid result.

To trigger the crash, we need to construct a group of constraints that are sat. Then, we need to exclude a valid value for a symbolic variable and make the constraints lead to a timeout (rather than an unsat). Simple linear constraints have very few chances to lead to a timeout. I tried to create a group of constraints from https://stackoverflow.com/questions/20536435/z3-what-might-be-the-reason-for-timeout, which are a group of non-linear unsat constraints that can trigger a timeout (under a timeout of 10 seconds). However, I have not successfully made one, as it has too many things to do with mathematics. And my SMT solver colleagues also think it is quite difficult to make one.

As far as I am thinking, it is also very tricky to trigger a constraint solver timeout. Since it can be impacted by which version of the constraint solver is used, how much time is set for the timeout, how fast your computer runs, and so on. Chances are that even if I reproduced the crash with a test case, the same test case may not work on your computer.

Is it possible to hack the SMT solver creator to make a mock solver for triggering the problem?

After reviewing the code of this snippet, I think it would be very difficult to make a regression test case for the crash, as far as what I know about Z3 and SMT solvers.

First of all, all calls to Solver->check() will return true for sat, false for unsat, and empty for a timeout.
On line 132, the manager invokes Z3 for solving the constraints under the current state.
On line 137, the manager invokes Z3 for getting a model (a valid result) satisfying the constraints.
On line 141, the manager adds another constraint to exclude the model gotten from the model.
On line 149, the manager invokes Z3 for solving the excluded constraint.
In summary, the manager will first solver the constraint for a result of the queried symbolic variable. If there is a valid value, it excludes the value and solves again to check whether it is the only valid result.

To trigger the crash, we need to construct a group of constraints that are sat. Then, we need to exclude a valid value for a symbolic variable and make the constraints lead to a timeout (rather than an unsat). Simple linear constraints have very few chances to lead to a timeout. I tried to create a group of constraints from https://stackoverflow.com/questions/20536435/z3-what-might-be-the-reason-for-timeout, which are a group of non-linear unsat constraints that can trigger a timeout (under a timeout of 10 seconds). However, I have not successfully made one, as it has too many things to do with mathematics. And my SMT solver colleagues also think it is quite difficult to make one.

I suspected this one due to my previous investigation.

As far as I am thinking, it is also very tricky to trigger a constraint solver timeout. Since it can be impacted by which version of the constraint solver is used, how much time is set for the timeout, how fast your computer runs, and so on. Chances are that even if I reproduced the crash with a test case, the same test case may not work on your computer.

I highly agree.
This was the exact reason why I raised a discussion on the mailing list about Limit Z3 without timeouts.
That way we could reproduce this crash without depending on the actual machine.
However, we would still depend on an implementation detail of the solver - and I suspect that this option is highly Z3 specific, so would not fit into the solver-agnostic API very well.
I'm still open to discuss any ideas regarding this, but I can't put any effort into that :|

Is it possible to hack the SMT solver creator to make a mock solver for triggering the problem?

I suspect, no. We don't have a mechanism to do dependency injection into the SMT constant manager implementation. It would be nice to have IMO.

I see two options here:

  1. Implement the dependency injection to let the SMT constant manager implementation use the given SMT solver. Create a mock solver and a unit-test.
  2. Implement the resource limit option in the API. Introduce a flag driving this option and create a LIT test where we set this limit to a really small one, and check a toy example.

I think the second option is easier and the preferred one. At the same time, that would require a greater consensus about changing the corresponding API since that is potentially used by different out-of-tree projects of LLVM.

I'm somewhat bothered that this patch is not landed yet. :|

I made a crude mock to trigger the bug using LD_PRELOAD.


The test reproduces the issue and passes if you apply the fix from this patch.

However, it requires compiling a shared object with the same compiler you used for building clang. I'm using the c++ compiler, for now.
I think it's a good starting point.

OikawaKirie updated this revision to Diff 333708.EditedMar 27 2021, 11:10 PM

I'm somewhat bothered that this patch is not landed yet. :|

I made a crude mock to trigger the bug using LD_PRELOAD.


The test reproduces the issue and passes if you apply the fix from this patch.

However, it requires compiling a shared object with the same compiler you used for building clang. I'm using the c++ compiler, for now.
I think it's a good starting point.

Simply replace the c++ compiler with %host_cxx compiler to compile the mock Z3 solver.

BTW, I was obstructed by the z3 requirement in the regression test case when I tried to understand your test case. Even though I set the variables LLVM_Z3_INSTALL_DIR and LLVM_ENABLE_Z3_SOLVER during CMake, and the CSA can be correctly compiled with Z3, I still cannot make the test case run during make check-all. Therefore, this case was manually executed with llvm-lit -DUSE_Z3_SOLVER=1. Could you please tell me how to enable Z3 during llvm-lit.

Lets see what others think about this.
Im fine with it on my part.

martong accepted this revision.Mar 31 2021, 7:06 AM

I went through the change and it looks good, seems like this is indeed a copy paste error from line 132.
I checked the related conversation, and thanks for all the effort spent with the test.

BTW, I was obstructed by the z3 requirement in the regression test case when I tried to understand your test case. Even though I set the variables LLVM_Z3_INSTALL_DIR and LLVM_ENABLE_Z3_SOLVER during CMake, and the CSA can be correctly compiled with Z3, I still cannot make the test case run during make check-all. Therefore, this case was manually executed with llvm-lit -DUSE_Z3_SOLVER=1. Could you please tell me how to enable Z3 during llvm-lit.

I am a bit unhappy that we cannot run the test automatically, but maybe in the future.
@steakhal, https://reviews.llvm.org/D83677 seems to be related, should we push that?

This is a good fix, I'd like to see landed!

clang/test/Analysis/z3/Inputs/MockZ3_solver_check.c
6

It's not clear why we have to wait for the 5th invocation. Can't we return UNDEF for the first time? That would eliminate the need to call the originalFN.

By the way why do we have call the original, can't we just return immediately with Z3_L_TRUE?

steakhal added inline comments.Mar 31 2021, 7:20 AM
clang/test/Analysis/z3/Inputs/MockZ3_solver_check.c
6

Excellent questions, I should have clarified these upfront!

It's not clear why we have to wait for the 5th invocation.

Probably lies in how the exploded graph is built up. I was also surprised that it was hit exactly 5 times.

Can't we return UNDEF for the first time?

That way we would not reach the 2nd Solver->check() call, just simply early return.

That would eliminate the need to call the originalFN.

Unfortunately, the side-effect is also required. Solver->check() creates a model which is accessed by Solver->getInterpretation(Exp, Value). I don't really want to mock that representation as well. We could though.
I'm gonna have a look at that.

By the way why do we have call the original, can't we just return immediately with Z3_L_TRUE?

We could, but as we have to call the original regardless, there is no point doing that.
This way it can mimic the exact behavior that the Z3 would do, and diverge only for the 5th time.

I went through the change and it looks good, seems like this is indeed a copy paste error from line 132.
I checked the related conversation, and thanks for all the effort spent with the test.

BTW, I was obstructed by the z3 requirement in the regression test case when I tried to understand your test case. Even though I set the variables LLVM_Z3_INSTALL_DIR and LLVM_ENABLE_Z3_SOLVER during CMake, and the CSA can be correctly compiled with Z3, I still cannot make the test case run during make check-all. Therefore, this case was manually executed with llvm-lit -DUSE_Z3_SOLVER=1. Could you please tell me how to enable Z3 during llvm-lit.

I am a bit unhappy that we cannot run the test automatically, but maybe in the future.
@steakhal, https://reviews.llvm.org/D83677 seems to be related, should we push that?

I wouldn't mind fixing that yes, but right now I'm busy with other things.
Let's come back to it later.

BTW, I was obstructed by the z3 requirement in the regression test case when I tried to understand your test case. Even though I set the variables LLVM_Z3_INSTALL_DIR and LLVM_ENABLE_Z3_SOLVER during CMake, and the CSA can be correctly compiled with Z3, I still cannot make the test case run during make check-all. Therefore, this case was manually executed with llvm-lit -DUSE_Z3_SOLVER=1. Could you please tell me how to enable Z3 during llvm-lit.

I am a bit unhappy that we cannot run the test automatically, but maybe in the future.
@steakhal, https://reviews.llvm.org/D83677 seems to be related, should we push that?

I am just reporting this phenomenon and I am not sure whether it is caused by my bad configurations.
Under my configurations, the last step of make check-all will execute llvm-lit with argument --param USE_Z3_SOLVER=0, which will make all test cases requiring z3 skipped.
Although, clang is built with z3 under this configuration.

Can we automatically enable all test cases requiring z3 if clang is built with z3? I do not think the patch D83677 really make the problem fixed.

Can we automatically enable all test cases requiring z3 if clang is built with z3? I do not think the patch D83677 really make the problem fixed.

Z3 constraint manager is unsupported, thus no test runs those parts. And yea, crashes more often than not.

The primary goal of D83677 is to run the analyzer with the dummy range-based constraint manager, but with Z3 bugreport refutation. This scenario is monitored and maintained currently by us, Ericsson. We put effort into fixing crashes relating to this, thus it would be nice to state a test requirement that it // REQUIRES: z3, but only for refutation - which is really fast, has negligible overhead.
The main concern with the Z3 constraint manager is that it's really slow, beyond comfortable for tight CI loops.
So, // REQUIRES: z3 means a little bit different than it supposed. z3-solver would be closer to reality.

Indeed it looks like a copy & paste error, I'm surprised no one found it earlier.

Regarding the tests, we used to have make check-clang-analysis-z3 (or something similar) that would run only the analyzer's tests, but using Z3 as the constraint solver. It looks like this change broke it: https://reviews.llvm.org/D62445

Indeed it looks like a copy & paste error, I'm surprised no one found it earlier.

Regarding the tests, we used to have make check-clang-analysis-z3 (or something similar) that would run only the analyzer's tests, but using Z3 as the constraint solver. It looks like this change broke it: https://reviews.llvm.org/D62445

I add VERBOSE=1 during execution, and command /path/to/llvm-project/build/./bin/llvm-lit -sv --param USE_Z3_SOLVER=0 /path/to/llvm-project/clang/test/Analysis/z3 is used to execute lit for testing.
And this test case is not executed.

Should the execution requirements be changed to make it run if z3 is enabled? Or just keep it as it is now?

If there are no other suggestions for this patch, I'd like to see it landed ASAP. I think it is a far too long period for a fix of a copy & paste error.

Indeed it looks like a copy & paste error, I'm surprised no one found it earlier.

Regarding the tests, we used to have make check-clang-analysis-z3 (or something similar) that would run only the analyzer's tests, but using Z3 as the constraint solver. It looks like this change broke it: https://reviews.llvm.org/D62445

It might worth investigating this build 'target'.

Should the execution requirements be changed to make it run if z3 is enabled? Or just keep it as it is now?

We should keep current behavior IMO.

If there are no other suggestions for this patch, I'd like to see it landed ASAP. I think it is a far too long period for a fix of a copy & paste error.

I investigated if we could mock the whole Z3 shared object.
Turns out, 33 Z3 functions should be mocked if we chose to mock every used function of libZ3.
Z3_dec_ref, Z3_del_config, Z3_del_context, Z3_get_app_decl, Z3_get_ast_id, Z3_get_bv_sort_size, Z3_get_numeral_string, Z3_get_sort_kind, Z3_inc_ref, Z3_is_eq_sort, Z3_mk_bv_sort, Z3_mk_config, Z3_mk_const, Z3_mk_context_rc, Z3_mk_eq, Z3_mk_int64, Z3_mk_not, Z3_mk_simple_solver, Z3_mk_string_symbol, Z3_mk_unsigned_int64, Z3_model_dec_ref, Z3_model_get_const_interp, Z3_model_has_interp, Z3_model_inc_ref, Z3_set_error_handler, Z3_set_param_value, Z3_solver_assert, Z3_solver_check, Z3_solver_dec_ref, Z3_solver_get_model, Z3_solver_inc_ref, Z3_solver_reset, Z3_to_app

I think we should mock only the Z3_solver_check function as done in this current patch. I don't see any immediate 'easier' and 'robust way of testing this.
I'm gonna land this at the end of the week if no objections are made.

PS: Even if this lands, Z3 solver will crash all over the place :D This was the reason why I did not want to 'push' this fix so hard.