This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Path-sensitive different.IntegerOverflow checker
Needs ReviewPublic

Authored by j.trofimovich on Jun 9 2014, 2:02 AM.

Details

Summary

Hi all,

We tried to implement IntegerOverflow checker, which is in the list of potential checkers. Could you review it and share your opinion, please?

Best regards,
Julia Trofimovich

Diff Detail

Event Timeline

j.trofimovich retitled this revision from to [analyzer] [patch] Path-sensitive different.IntegerOverflow checker.
j.trofimovich updated this object.
j.trofimovich edited the test plan for this revision. (Show Details)
j.trofimovich added a subscriber: Unknown Object (MLST).
zaks.anna edited edge metadata.Jul 8 2014, 1:27 PM

Hi Julia,

Thanks for working on this!

In general, we prefer to get incremental patches rather than something that is close to the end-result. That way it is easier to get the guidance along the way and easier for a reviewer to review.

Here are some high level comments.

The package name lacks description.

It would be very helpful to differentiate the overflows that are considered undefined behavior from the ones that are defined. I would have 2 checkers for these, to allow users to turn them on/off independently. However, this might be a separate future improvement.

Could you add more documentation in doxygen style as well as a high level description of the algorithm?

After briefly looking at the patch, I am still not sure how exactly "ExternalSym" is used. Looks like you are tracking symbols that the analyzer may not have values for (like globals). But that should not be needed: if a value is not known by the analyzer you can determine that when evaluating the overflow check. Specifically, State->assume() returns two states, which you can use to find out which situation you are dealing with:

  • there was an overflow
  • there was no overflow
  • under constrained or unknown - this would correspond to the case where both states are feasible.

There should be other posts on the mailing list describing this API as well as other uses of it throughout the analyzer.

Also, I did not notice tests that are testing these heuristics...

The error message seems to be constructed prematurely and I do not see any tests that test it fully.

Another checker that is somewhat related to this one is the String API checker. One reason why it is still in alpha is that it was very difficult to interpret the reports; most of which were due to overflows earlier on the path. It would be very interesting to see how this checker works on various codebases (with respect to bugs found vs false positives).

Cheers,
Anna.

lib/StaticAnalyzer/Checkers/IntegerOverflowChecker.cpp
35

Are you getting multiple reports on the same location? I don't think that should be happening - the bug reporting infrastructure should unique reports.

349

Need a better comment here - the function returns a bool.
Also, comments should be in doxygen form and added to the declarations.

446

addGoodSink seem to add a symbol to a state. Looks like it's used to track symbols we are not supposed to diagnose - symbols with global state? Could you add a better comment/name here?

Sink has a very specific meaning in the analyzer - it's a node at which the simulation stops. Usually, it's used for unrecoverable error nodes.

498

It looks like you are building an error message string before the issue is diagnosed..

Thanks a lot for reviewing!

Unfortunately we have no ability to share our code before it satisfies some quality level...

Could you propose description for "different" package?

You are correct in your assumption about ExternalSym goals. But there are some cases where analyzer fails to determine right value for a symbol, i.e two alerts from android codebase:

  • report-7f4011.html
Fileexternal/mesa3d/src/glsl/linker.cpp
Locationline 779, column 42
DescriptionInteger overflow while subtraction. 0 U32b AND 1 U32b

This alert happens because analyzer have no information about num_shaders and while cross_validate_globals proccessing assumes that num_shaders can be 0. But actually it's never happened because num_shaders is checked for 0 every time before link_intrastage_shaders calling (external/mesa3d/src/glsl/linker.cpp, lines 1602 and 1617).

  • report-51bc27.html
Fileframeworks/av/drm/libdrmframework/plugins/passthru/src/DrmPassthruPlugIn.cpp
Locationline 66, column 41
DescriptionInteger overflow while addition. 4294967295 U32b AND 1 U32b

This alert happens because constructor for value(line 64) doesn't inlined (because this constructor is defined in another translation unit frameworks/​native/​libs/​utils/​String8.cpp) and class member mString is assumed to be 0. So, when value.length() is called(line 66) underflow happens and (0 - 1) is returned. Further addition 1 results in FP overflow.

We tested IntegerOverflow checker on Android codebase where it produced 236 alerts. In brief I guess about 70% of alerts are TP.

If you would like to inspect full results of analysis with enabled/disabled heuristic please suggest place for uploading(size is about 100mb).

I'll try to change the checker according to your comments and it would be nice if you'll find time to review it again!)

Unfortunately we have no ability to share our code before it satisfies some quality level...

The llvm developer policy encourages incremental development. If you cannot send patches before they go though the approval process, maybe you could seek guidance while developing by posting questions on the mailing list and split the patch into logical pieces when sending for review? For example, in this case, the ExternalSym is something that would come in separately with it's own tests as justification. We should also discuss if that's the best approach to suppress the false positives.

You are correct in your assumption about ExternalSym goals. But there are some cases where analyzer fails to determine right value for a symbol, i.e two alerts from android >codebase:

I hope there could be better ways of handling these than a checker keeping a table of symbols that are not handled well by the analyzer. (The current approach does not solve the underlying issues and it introduces overhead for all checkers.)

Looks like there are various sources of false positives..

This alert happens because analyzer have no information about num_shaders and while cross_validate_globals proccessing assumes that num_shaders can be 0. But actually >it's never happened because num_shaders is checked for 0 every time before link_intrastage_shaders calling (external/mesa3d/src/glsl/linker.cpp, lines 1602 and 1617).

This is about an inlined function adding constraints that are not valid for all input values? Would it be possible to construct a small example/test case for this? You can use the debug checkers to show that a particular variable does not have the right value, see docs/analyzer/DebugChecks.rst?

We have added some heuristics for null pointer dereference, where we do not count in the null checks coming from the inlined functions (inlined defensive checks). However, this case might be more complicated.

native/​libs/​utils/​String8.cpp) and class member mString is assumed to be 0. So, when value.length() is called(line 66) underflow happens and (0 - 1) is returned. Further addition 1 results in FP overflow.

It's hard me to tell what this issue is - not all code is available. Again, could you construct a small example and show that the value is wrong with the debug checker? If we fail to inline a constructor, a value should be underconstrained, not '0'.

j.trofimovich retitled this revision from [analyzer] [patch] Path-sensitive different.IntegerOverflow checker to [analyzer] Path-sensitive different.IntegerOverflow checker.
j.trofimovich edited edge metadata.

Improve checker and test suite:

  1. Split into two checkers for def/undef behavior.
  2. Refactor message composing
  3. Add comments
  4. Add test cases

I'm sorry for two-month delay...
ExternalSym is the only way I can suggest to suppress the large number of FP's. I guess inter-unit analysis could have been helpful, but as such approach would require a lot more work, I found ignoring values which may be changed in other units to be the only viable option. But of course we are interested in any discussions and suggestions...

I've wrote some example but didn't find the proper place to add it, so I guess I just post it here:

main.cpp

#include "SomeClass.h"

void clang_analyzer_eval(bool);

int main() {
  SomeClass sc(0);
  bool b = (sc.someFunc() + 1) != 0;
  // FIXME: 'b' is always TRUE
  clang_analyzer_eval(b); // expected-warning{{TRUE}} expected-warning{{FALSE}}
  return 0;
}

SomeClass.h

#ifndef SOMECLASS_H
#define SOMECLASS_H

void someExternalFunc (int a) {
  unsigned k = a ? 2 * a : a; // A comparison with 0 to bifurcate the state
}

class SomeClass {
  unsigned a;
public:
  SomeClass();
  unsigned someFunc() {
    someExternalFunc(a);
    return a - 1;
  }
};

#endif // SOMECLASS_H

SomeClass.cpp

#include "SomeClass.h"

SomeClass::SomeClass(unsigned pa) {
  a = pa ? pa : -1; // a isn't 0
}

Unexpected "FALSE" warning here has the same nature as the FP in DrmPassthruPlugIn.cpp. Lack of information from another translation unit forces analyzer to consider that some variables may take values which cant't really be taken. So 'b' in main.cpp can't be false, but the 'FALSE' warning still happens.

lib/StaticAnalyzer/Checkers/IntegerOverflowChecker.cpp
35

In what way should bug reporting infrastructure unique reports? scan-build prevents existence of fully identical reports by computing digest (Digest::MD5->new->addfile(*FILE)->hexdigest; scan-build, line 247) but cases when alerts differs by message only aren't caught.

I suspect, this code below explains why you are getting the false positives.

The issue you highlight in the example is that sometimes the analyzer doesn't know what the value of a variable is. The existing checkers minimize false positives by issuing a warning only when it's known that a value is "bad". For example, we would only warn if StateOverflow && !StateNotOverflow. This will flag much less issues, but should not produce a lot of false positives.

Are the false positives you are getting being flagged by the first if clause?

if (StateOverflow && StateNotOverflow) {

  if (Pack.LValueIsTainted) {
    Msg.assign("Possible integer overflow while " + Pack.Operation +
               ". Left operand is tainted: " + Pack.LValue + " AND " +
               Pack.RValue);
    reportBug(Msg, C, SL);
  } else if (Pack.RValueIsTainted) {
    Msg.assign("Possible integer overflow while " + Pack.Operation +
               ". Right operand is tainted: " + Pack.LValue + " AND " +
               Pack.RValue);
    reportBug(Msg, C, SL);
  }
  return;
}

if (StateOverflow) {
  Msg.assign("Integer overflow while " + Pack.Operation + ". " + Pack.LValue +
             " AND " + Pack.RValue);
  reportBug(Msg, C, SL);
}
lib/StaticAnalyzer/Checkers/IntegerOverflowChecker.cpp
35

Identical issues should have the same message. Do you have identical issues with different messages?

Hello Anna,

thank you for your reply.
Could you please point a problem in this code sample? I cannot find a sample that doesn't follow this recommendation.
In this checker, we have 2 cases for throwing a report:

  1. We could both overflow and not overflow (StateOverflow && StateNotOverflow). So I check if one of operands is tainted, produce a warning if so and return. It is not a source of false positives.
  2. Then I check if only overflow is possible: StateOverflow && !StateNotOverflow == StateOverflow here, because StateOverflow && StateNotOverflow option was considered before.

I'll try to explain my assumptions why having 'StateOverflow && !StateNotOverflow' is not enough to get a reliable reason to warn.
The first 'if' clause is not a problem because we could definitely know if an operand is tainted so we point that we *can* overflow here. The main source of false positives are values that are reasoned to overflow by analyzer but cannot overflow due to possible checks in the caller. These values are values of function arguments and global variables, for example. I collect these values in 'ExternalSym' set. This is what I am filtering out. In this case analyzer's constraints are not enough to suggest an overflow so we make an additional check.