This is an archive of the discontinued LLVM Phabricator instance.

Add correct "-isystem"/"-isysroot" warning handling to static analysis' BugReporter.
AbandonedPublic

Authored by kmarshall on Mar 3 2017, 3:14 PM.

Details

Summary

The Clang static analyzer doesn't follow the warning suppression semantics of the "-isystem" command line flag. This patch adds a check to BugReporter which causes it to drop any BugReports which originated from a system header (descendant of an -isystem path).

Diff Detail

Repository
rL LLVM

Event Timeline

kmarshall created this revision.Mar 3 2017, 3:14 PM

Note to reviewers: this diff is relative to "llvm/cfe" - I couldn't find a way to specify a repository subpath to use for this diff.

kmarshall retitled this revision from Add correct "-isystem" warning handling to static analysis' BugReporter. to Add correct "-isystem"/"-isysroot" warning handling to static analysis' BugReporter..Mar 3 2017, 3:17 PM
thakis edited edge metadata.Mar 3 2017, 3:26 PM

http://llvm-cs.pcc.me.uk/tools/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h/risInSystemHeader suggests that the analyzer has some plumbing for this, so I added dcoughlin as reviewer, who has touched some of those lines before. dcoughlin, as background: We're playing with running the analyzer on chromium, and we were pretty surprised that it defaults to printing diagnostics for system headers. That's different from what regular clang does, and there isn't much applications can do about diagnostics in system headers. kmarshall wrote a script to manually filter out diagnostics from system headers, but we figured it'd make more sense if the analyzer didn't emit those diagnostics in the first place -- probably by default, but maybe behind some flag. Are you familiar with the design behind the current behavior? Does it make sense to change this? Are we missing some existing flag?

(kmarshall: In the future, please send some context lines with your diffs, see http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface.)

dcoughlin edited edge metadata.

The analyzer has two different kinds of diagnostics: AST-based and path-sensitive. AST-based diagnostics are similar to the diagnostics that clang performs in Sema in that they can usually be localized to a single program point. For AST-based checks, we follow clang's policy and never report a diagnostic in system headers.

In contrast, path-sensitive diagnostics describe errors that occur along a particular path through the program and so they involve multiple program points. Even when these paths *end* in system headers, they may start in application code and ultimately be the application code's fault. For example, if the application passes NULL to an inline function in a C++ header that then dereferences that pointer, we do want to emit a diagnostic even though the location of the diagnostic is in a system header. In this case the application programmer can do something about it: they should not pass NULL to the function.

By design the analyzer doesn't ever *start* a path-sensitive check in a header (either user or system) -- but if a path starts in an application source file and eventually calls into a header and does something bad, we do report it under the assumption that the application code is violating a header precondition.

This can lead to false positives when the analyzer doesn't understand the system headers properly. In those cases we have custom heuristics to suppress to known patterns the analyzer doesn't handle.

What specific diagnostics in headers are you seeing? Are these in libcxx? We know we have at least one issue in <regex> that isn't being properly suppressed. But if there are others we'd love to hear about them.

That makes sense. In the cases that I'm finding, the original call site is in application source code, but the warning is generated from inside a system header. So, I suppose the suppressions are working as intended, and we just need to add one-off suppressions for the known noisy sources? Where is the suppression database located? FWIW, there aren't many unique error sources - I only counted 9 after running the warnings through "uniq | sort".

Re: specifics, here the most common one that I'm seeing. It seems like a bogus buffer overrun warning?

../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1258:9: note: Assigned value is garbage or undefined
        *__s++ = *__first++;
               ^ ~~~~~~~~~~

The error is generated from this trivial logging line:

LOG(FATAL) << "Unknown type: " << type;  // |type| is an enum

The offending C++ library code snippet is here (note the line w/comment "GENERATES WARNING")

template<typename _CharT>
  _CharT*
  __add_grouping(_CharT* __s, _CharT __sep,
                 const char* __gbeg, size_t __gsize,
                 const _CharT* __first, const _CharT* __last)
  {
    size_t __idx = 0;
    size_t __ctr = 0;

    while (__last - __first > __gbeg[__idx]
           && static_cast<signed char>(__gbeg[__idx]) > 0
           && __gbeg[__idx] != __gnu_cxx::__numeric_traits<char>::__max)
      {
        __last -= __gbeg[__idx];
        __idx < __gsize - 1 ? ++__idx : ++__ctr;
      }

    while (__first != __last)
      *__s++ = *__first++;  // <------ GENERATES WARNING

    while (__ctr--)
      {
        *__s++ = __sep;         
        for (char __i = __gbeg[__idx]; __i > 0; --__i)
          *__s++ = *__first++;
      }

    while (__idx--)
      {
        *__s++ = __sep;         
        for (char __i = __gbeg[__idx]; __i > 0; --__i)
          *__s++ = *__first++;
      }

    return __s;
  }

Here's a full snippet of the analysis trace:

../../dbus/message.cc:233:9: note: Calling 'basic_ostream::operator<<'
        LOG(FATAL) << "Unknown type: " << type;
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../base/logging.h:414:23: note: expanded from macro 'LOG'
#define LOG(severity) LAZY_STREAM(LOG_STREAM(severity), LOG_IS_ON(severity))
                      ^
../../base/logging.h:402:62: note: expanded from macro 'LAZY_STREAM'
  !(condition) ? (void) 0 : ::logging::LogMessageVoidify() & (stream)
                                                             ^
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/ostream.tcc:113:11: note: Assuming '__fmt' is not equal to 'oct'
      if (__fmt == ios_base::oct || __fmt == ios_base::hex)
          ^~~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/ostream.tcc:113:11: note: Left side of '||' is false
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/ostream.tcc:113:37: note: Assuming '__fmt' is not equal to 'hex'
      if (__fmt == ios_base::oct || __fmt == ios_base::hex)
                                    ^~~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/ostream.tcc:113:7: note: Taking false branch
      if (__fmt == ios_base::oct || __fmt == ios_base::hex)
      ^
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/ostream.tcc:116:9: note: Calling 'basic_ostream::_M_insert'
        return _M_insert(static_cast<long>(__n));
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/ostream.tcc:69:2: note: Taking true branch
        if (__cerb)
        ^
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/ostream.tcc:75:7: note: Calling 'num_put::put'
                if (__np.put(*this, *this, this->fill(), __v).failed())
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.h:2336:16: note: Calling 'num_put::do_put'
      { return this->do_put(__s, __f, __fill, __v); }
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.h:2475:16: note: Calling 'num_put::_M_insert_int'
      { return _M_insert_int(__s, __io, __fill, __v); } 
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:868:22: note: Assuming '__basefield' is equal to 'oct'
        const bool __dec = (__basefield != ios_base::oct
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:869:8: note: Left side of '&&' is false
                            && __basefield != ios_base::hex);
                            ^
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:870:32: note: Assuming '__v' is <= 0
        const __unsigned_type __u = ((__v > 0 || !__dec)
                                      ^~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:870:32: note: Left side of '||' is false
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:870:31: note: '?' condition is true
        const __unsigned_type __u = ((__v > 0 || !__dec)
                                     ^
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:877:6: note: Assuming the condition is true
        if (__lc->_M_use_grouping)
            ^~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:877:2: note: Taking true branch
        if (__lc->_M_use_grouping)
        ^
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:884:6: note: Calling 'num_put::_M_group_int'
            _M_group_int(__lc->_M_grouping, __lc->_M_grouping_size,
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:839:21: note: Calling '__add_grouping'
      _CharT* __p = std::__add_grouping(__new, __sep, __grouping,
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1249:14: note: Assuming the condition is true
      while (__last - __first > __gbeg[__idx]
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1249:14: note: Left side of '&&' is true
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1250:10: note: Assuming the condition is true
             && static_cast<signed char>(__gbeg[__idx]) > 0
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1249:14: note: Left side of '&&' is true
      while (__last - __first > __gbeg[__idx]
             ^
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1249:7: note: Loop condition is true.  Entering loop body
      while (__last - __first > __gbeg[__idx]
      ^
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1254:4: note: Assuming the condition is false
          __idx < __gsize - 1 ? ++__idx : ++__ctr;
          ^~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1254:4: note: '?' condition is false
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1250:7: note: Left side of '&&' is false
             && static_cast<signed char>(__gbeg[__idx]) > 0
             ^
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1257:7: note: Loop condition is true.  Entering loop body
      while (__first != __last)
      ^
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1257:7: note: Loop condition is true.  Entering loop body
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1257:7: note: Loop condition is true.  Entering loop body
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1257:7: note: Loop condition is true.  Entering loop body
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/locale_facets.tcc:1258:9: note: Assigned value is garbage or undefined
        *__s++ = *__first++;
               ^ ~~~~~~~~~~
kmarshall added a comment.EditedMar 6 2017, 5:22 PM

Also, it looks like some instances of safe unique_ptr assignment using std::move() are tripping up the analyzer.

Error:

../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/unique_ptr.h:190:4: warning: Potential memory leak

C++ library code:

template<typename _Up, typename _Ep, typename = typename
  std::enable_if
    <std::is_convertible<typename unique_ptr<_Up, _Ep>::pointer,
                         pointer>::value
     && !std::is_array<_Up>::value>::type>
  unique_ptr&
  operator=(unique_ptr<_Up, _Ep>&& __u)
  {
    reset(__u.release());
    get_deleter() = std::forward<_Ep>(__u.get_deleter());
    return *this;  // <------------------ GENERATES WARNING
  }

Call site (link):

case Message::STRUCT: {
  MessageReader sub_reader(NULL);
  if (reader->PopStruct(&sub_reader)) {
    std::unique_ptr<base::ListValue> list_value(new base::ListValue);
    if (PopListElements(&sub_reader, list_value.get()))
      result = std::move(list_value);  // <-- GENERATES WARNING. |result| is a std::unique_ptr<base::Value>, so the std::move should be safe.
  }
  break;
}

Analysis trace:

In file included from ../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/memory:85:
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/unique_ptr.h:190:4: warning: Potential memory leak
          return *this;
          ^
../../dbus/values_util.cc:94:3: note: Control jumps to 'case STRUCT:'  at line 197
  switch (reader->GetDataType()) {
  ^
../../dbus/values_util.cc:199:11: note: Assuming the condition is true
      if (reader->PopStruct(&sub_reader)) {
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../dbus/values_util.cc:199:7: note: Taking true branch
      if (reader->PopStruct(&sub_reader)) {
      ^
../../dbus/values_util.cc:200:53: note: Memory is allocated
        std::unique_ptr<base::ListValue> list_value(new base::ListValue);
                                                    ^~~~~~~~~~~~~~~~~~~
../../dbus/values_util.cc:201:9: note: Taking true branch
        if (PopListElements(&sub_reader, list_value.get()))
        ^
../../dbus/values_util.cc:202:11: note: Calling 'unique_ptr::operator='
          result = std::move(list_value);
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/unique_ptr.h:190:4: note: Potential memory leak
          return *this;
          ^

We add the suppression for libcxx issues in LikelyFalsePositiveSuppressionBRVisitor in BugReporterVisitors.cpp. It is ad-hoc pattern recognition of idioms we know cause false positives rather than an explicit database.

It appears that the "suppress-c++-stdlib" option in LikelyFalsePositiveSuppressionBRVisitor is yielding promising results. It removes warnings from the "std" namespace which addresses both false positives that I cited above.

Kevin, would you be willing to file a PR on https://bugs.llvm.org with the 9 false positives you are seeing? This will help us suppress them for users who don't use 'suppress-c++-stdlib'.

OK. I'll make bugs for the false positives I'm confident about.

kmarshall abandoned this revision.Mar 9 2017, 3:38 PM
zaks.anna edited edge metadata.Mar 9 2017, 4:42 PM

@kmarshall,

We are going to turn this off by default, see https://reviews.llvm.org/D30798.

Please, do file a bug that lists all the issues you are seeing and desirably instructions on how to reproduce. (Please, list even the cases you are not 100% sure are false positives)

Thank you!