Page MenuHomePhabricator

[analyzer] Add support for symbolic float expressions
Needs ReviewPublic

Authored by ddcc on Jan 20 2017, 8:13 AM.



With the Z3 constraint manager, symbolic floating-point constraints can also be reasoned about. This commit includes a basic floating-point checker for domain errors with math functions.

Event Timeline

ddcc created this revision.Jan 20 2017, 8:13 AM
ddcc added a comment.EditedJan 20 2017, 8:45 AM
  • This patch introduces a virtual overloaded llvm::APFloat getSymVal() for returning the value of known floating-point constants. In the long term, both getSymVal() functions should probably be refactored to return something like a llvm::Constant.
  • There are also long-term issues with the flexibility of the SymExpr class. Currently, this patch adds two SymFloatExpr and FloatSymExpr subclasses, but this isn't expressive enough for certain types of symbolic queries. For example, the current implementation checks whether a floating-point value is NaN by comparing whether V == V, which should only be false for NaN. It would be better to add support for special queries for whether a value is NaN, positive infinity, negative infinity, etc.
ddcc updated this revision to Diff 85315.Jan 22 2017, 8:24 PM


ddcc updated this revision to Diff 93590.Mar 30 2017, 10:37 PM

Rebase, update tests, fix bugs

lirhea added a comment.EditedMay 14 2017, 4:16 PM
In D28954#714936, @ddcc wrote:

Rebase, update tests, fix bugs

Excuse me,I want to download full codes of this version,but I have no idea how to do it,can you tell me?
And my system is windows, I haven't install anything about Phabricator.
Thank you!

ddcc added a comment.EditedMay 14 2017, 8:21 PM
In D28954#714936, @ddcc wrote:

Rebase, update tests, fix bugs

Excuse me,I want to download full codes of this version,but I have no idea how to do it,can you tell me?
And my system is windows, I haven't install anything about Phabricator.
Thank you!

You're probably better off waiting for these patches (D28953, D28954, and D28955) to land rather than grabbing the commits, since some of these diffs are old and I haven't had a chance to rebase and retest them yet. But if you want to try them now, you'll need to compile Clang/LLVM from source, since they haven't landed in any release. I'm also not sure how well the CMake bindings for Z3 work on Windows, it's not a platform I've tested. Either grab the raw diffs of these three commits using "download raw diff" from the web interface, or using Arcanist's arc patch commit. Alternatively, I have an older tree with all three commits at , just revert the debugging commit.

ddcc updated this revision to Diff 99522.May 18 2017, 10:37 PM

Rebase, avoid generating floating-point constraints if unsupported by constraint manager

dcoughlin edited edge metadata.Jul 12 2017, 4:58 PM

Thanks for the patch, and apologies for the delay reviewing!

Here are some initial comments -- there are more coming.


The called overload of Convert() initializes 'To', right? Should it be in an assert()? I'm worried about losing the side effect in non-assert builds.


There are multiple checks for overflow or invalid in this file. Can the checks be factored out into a static member function?


What do you think about renaming this to "getSymIntVal" now that we have getSymFloatVal()?


The name seems slightly weird. How about "canReasonAboutFloats" or "canReasonAboutFloatingPoint"?


Note the 80-column violation here.


I don't think we need the 'const SymExpr &lhs' entry point. It seems superfluous and as far as I can tell the analog for getSymIntExpr() is never called and should probably be removed.


Is there a reason you're not using assumeDual() here?


Logical || has control flow implications, which is why it is handled at a higher level. We don't have a good way of representing generalized lazy disjunctive constraints. We can eagerly case split at the top level for disjunction, but this is bad for performance and is usually not needed.

Your trick of using bitwise OR as an ad hoc lazy disjunct seems reasonable here.


This logic is nearly identical to that in getValue(const llvm::APSInt& X). Can the logic be factored out in some sort of zero-costish abstraction? (Perhaps templatized over the value type?)


My compiler is complaining about these being uninitialized at line 1251 when both the if and else if guard conditions are false.

This looks like a very solid start!

One area that I'm very worried about is that in various places the analyzer makes assumptions about algebraic simplifications that don't hold for floating point because of NaN and other floating point oddities. I think it really important to write a fairly comprehensive set of tests for these gotchas to make sure we're not applying for floating point. We should test for these both with and without the Z3 solver.

Here are some examples that should all be UNKNOWN (right?) but are not in the current patch, assuming a and b hold unconstrained symbols of floating-point type:

clang_analyzer_eval(a != 4.0 || a == 4.0);
clang_analyzer_eval(a < 4.0 || a >= 4.0);
clang_analyzer_eval((a != b) == !(a == b));
clang_analyzer_eval((a != 4.0) == !(a == 4.0));

It is fine to have this in alpha now. What package to do envision this in after it is ready? Is this something that should always be on, or should be opted into on a per-project basis?


I think it would be a better user experience if the diagnostic could (1) mention the function name and (2) tell the programmer what the valid domain is. (1) should be easy. Do you think (2) is feasible for floating point?


Can the checking for not NaN be factored out? It looks there is similar checking in multiple places.


Is this comment correct? Is this really a divide by zero?

I'm also a bit surprised APFloat::divide() doesn't handle this case.


Could you add a little more context to this TODO? This way if you don't get around to it someone can pick it up.


LLVM style generally discourages putting an 'else' after a return. Here you can just use 'if'.


Aren't structs already handled by the isRecordType() disjunct above that makes a compound value? Can you provide more detail in the FIXME about what needs to be fixed?


Should this say 'APFloat' in the comment?


I guess this is safe, but it seems quite strange to have operations with side effects inside an assert(). Can you store the return value from convert in a local use and (void)local; to suppress the unused variable diagnostic?


Is this the kind of thing it could eventually be a good idea to issue a diagnostic for?


I'm wondering whether we should rename this method to "getKnownIntValue()" and then just return nullptr here. What are the merits of trapping vs. returning nullptr here?


I'm wondering whether this assert is too strong. Should it be the responsibility of callers of this method to guarantee that it doesn't fire? If so, can we document the implied precondition for the method?


It is probably easier to maintain if you use:


here instead of the duplication.

ddcc updated this revision to Diff 107430.Jul 19 2017, 6:06 PM
ddcc marked 18 inline comments as done.

Revise based on feedback

ddcc added a comment.EditedJul 19 2017, 6:11 PM

As an aside, I think it'd be good to land D28954 and D28955 first, since they affect accuracy and precision of various analyzer parts that this depends on.

Here are some examples that should all be UNKNOWN (right?) but are not in the current patch, assuming a and b hold unconstrained symbols of floating-point type:

Yeah, those should definitely be fixed. In general, I tried to avoid performing simplifications on anything not of floating-point type, particularly in SimpleSValBuilder, but there are probably cases that I've missed.

In your example clang_analyzer_eval(a < 4.0 || a >= 4.0) (and likewise for the rest), the following is happening:

  1. At ExprEngineC.cpp:VisitLogicalExpr(), we hit this logical expression for the first time, the introspection fails, and we generate the SVal (((double) (reg_$0<float a>)) >= 4.0E+0) != 0 that is bound to a < 4.0 || a >= 4.0.
  2. The next time around, the introspection succeeds, and we generate the SVal 1 S32b that is bound to a < 4.0 || a >= 4.0.
  3. Now, when we hit ExprInspectionChecker.cpp:getArgumentValueString(), we retrieve the SVal 1 S32b, and attempt to assert it.
  4. Then, we hit SimpleConstraintManager.cpp:assumeAux(), and fall into the nonloc::ConcreteIntKind case. When Assumption is true, we are fine and return the original State, but then when Assumption is false, we return nullptr.
  5. Back in ExprInspectionChecker.cpp:getArgumentValueString(), we see StTrue != nullptr and StFalse == nullptr, and we print TRUE instead of UNKNOWN.

I'm not familiar with VisitLogicalExpr() and why integer constants are being bound to the logical expressions. Wouldn't we simply want to assume that the logical expression, when expressed as a symbolic constraint, is either true/false in each respective child state?


This checker is a bit of a toy, in that (last I checked) Clang doesn't support the floating-point environment, which is typically used to install global floating-point exception handlers (for e.g. NaN, etc). As a result, there are probably going to lots of false-positives on real codebases.

Additionally, it requires the z3 solver, which probably isn't being built by default in most Linux distributions (and I doubt we're at the point of asking package maintainers to add a dependency for clang on libz3, even for those that do build that package).

So I think it should definitely be optional.


I'm not sure how assumeDual helps here? The explicit checks isPInf and isNInf implement the short-circuit effect of logical or, whereas omitting them and binding directly to isInf miss the short-circuit effect.


The tricky part is that there are associate class member variables for each function (APSIntSet and APFloatSet). I can factor out getValue to a template, but then I'd need to introduce a templated helper function with two specializations to retrieve the class member variable for the input template type. I'm not sure if that'd be zero-cost?


I don't recall why I wrote this...


The trapping should be for debugging purposes only, since it is easy to accidentally call the wrong getKnown*Value function without disambiguating on the input type, and this wasn't previously necessary.


Would you prefer returning nullptr instead?

I guess the question is if calling getSymFloatVal on a non floating-point type implies that the result should be converted to a floating-point type, and if so, whether that should trigger an assertion failure or return a null pointer on failure?

ddcc updated this revision to Diff 107432.Jul 19 2017, 6:29 PM

Drop unnecessary code