Clang Static Analyzer checker for detecting forgotten std::ostream modifications.
Diff Detail
Event Timeline
Hello,
Thanks for the patch, and for the impressing amount of work you put into this. Because the patch is huge, so it's going to take some time for somebody to understand how everything works.
I'd probably like to learn a bit more about the motivation behind the checker. You must be working on a project that uses a lot of std streams, and the checker seems to verify a particular coding guideline you're following (revert all changes to the stream locally). I wonder how universally useful this guideline is. For instance, there might exist a project that enjoys writing code like this:
void set_my_favorite_format_specifiers() { std::cout << std::setprecision(12) << std::hex; // warning? } void revert_my_favorite_format_specifiers() { std::cout << std::setprecision(6) << std::dec; } void foo() { set_my_favorite_format_specifiers(); std::cout << 100 << std::endl; revert_my_favorite_format_specifiers(); }
Do you expect such pattern to be common? If we enable the checker by default, would positives on such code cause annoyance, and would there be a way to suppress them?
Hello,
This checker was developed indeed with internal usage in mind. It should not necessary be added as a default checker. However I have run it on the boost-1.63.0 codebase, and there some some mildly interesting findings in examples and tests. There is also a true positive result in the core codebase.
As for the patter described above, it certainly is a possible use-case to set stream state inside functions, but the number of false positives that were cause by this pattern, was not that high. The boost build system was reluctant to cooperate with scan-build-py.
Interesting finding:
boost/progress.hpp
namespace boost { // progress_timer ----------------------------------------------------------// // A progress_timer behaves like a timer except that the destructor displays // an elapsed time message at an appropriate place in an appropriate form. class progress_timer : public timer, private noncopyable { public: explicit progress_timer( std::ostream & os = std::cout ) // os is hint; implementation may ignore, particularly in embedded systems : timer(), noncopyable(), m_os(os) {} ~progress_timer() { // A) Throwing an exception from a destructor is a Bad Thing. // B) The progress_timer destructor does output which may throw. // C) A progress_timer is usually not critical to the application. // Therefore, wrap the I/O in a try block, catch and ignore all exceptions. try { // use istream instead of ios_base to workaround GNU problem (Greg Chicares) std::istream::fmtflags old_flags = m_os.setf( std::istream::fixed, std::istream::floatfield ); std::streamsize old_prec = m_os.precision( 2 ); m_os << elapsed() << " s\n" // "s" is System International d'Unites std << std::endl; m_os.flags( old_flags ); m_os.precision( old_prec ); } catch (...) {} // eat any exceptions } // ~progress_timer private: std::ostream & m_os; };
This is a bug, that could be remedied by using the already existing boost iostream state saving pattern, which provides exception safety.
Also there were results in tests and examples, which could be coded cleaner. It would promote responsible management of global stream state if the checker gave a warning.
Example:
libs/config/test/config_info.cpp
static unsigned int indent = 4; static unsigned int width = 40; void print_macro(const char* name, const char* value) { // if name == value+1 then then macro is not defined, // in which case we don't print anything: if(0 != strcmp(name, value+1)) { for(unsigned i = 0; i < indent; ++i) std::cout.put(' '); std::cout << std::setw(width); cout.setf(istream::left, istream::adjustfield); std::cout << name; if(value[1]) { // macro has a value: std::cout << value << "\n"; } else { // macro is defined but has no value: std::cout << " [no value]\n"; } } }
Hello,
Sorry again for the delays, thank you for your patience. Your checker is in good shape, very well-structured and easy to follow, you definitely know your stuff, and my comments here are relatively minor.
Are you planning on making more varied warning messages, eg. specifying which format modifiers are forgotten, and where in the code they were previously set? The latter could be done by using the "bug reporter visitor" mechanism.
I'm sure we can at the very least eventually land this checker into the optin package, which is the new package for warnings that aren't on by default, but are advised to be turned on by the users if the codebase intends to use certain language or API features or contracts (unlike the alpha package which is for checkers undergoing development to be ultimately moved out of alpha). For moving out of alpha, i think the work on the warning messages needs to be done, and the checker needs to be tested on a large codebase to make sure that the false positive rate is low (i guess you already did it), and that's pretty much it :) On a side note, i've been recently thinking of making optin checkers more visible to the scan-build users, eg. mentioning in the log that certain checks may be worth enabling.
lib/StaticAnalyzer/Checkers/OStreamFormatChecker.cpp | ||
---|---|---|
264–283 | If you ever want to extend the CallDescription class to cover your use case, please let us know :) While most of these aren't functions, the approach used in this class could be used here as well - making initialization code shorter. | |
388 | I think you should use SimpleSValBuilder::getKnownValue(). The EvaluateAsInt part doesn't add much because the analyzer's engine should already be more powerful than that. On the other hand, the ConstraintManager::getSymVal() thing, which is already done in SimpleSValBuilder::getKnownValue(), may be useful. | |
404 | By the way, there's the None thing that represents any empty optional. | |
514 | One of the practical differences between checkPostCall and evalCall is that in the latter you have full control over the function execution, including invalidations that the call causes. Your code not only sets the return value, but also removes invalidations that normally happen. Like, normally when an unknown function is called, it is either inlined and therefore modeled directly, or destroys all information about any global variables or heap memory that it might have touched. By implementing evalCall, you are saying that the only effect of calling operator<<() on a basic_ostream is returning the first argument lvalue, and nothing else happens; inlining is suppressed, invalidation is suppressed. I'm not sure if it's a good thing. On one hand, this is not entirely true, because the operator changes the internal state of the stream and maybe of some global stuff inside the standard library. On the other hand, it is unlikely to matter in practice, as far as i can see. Would it undermine any of your efforts here if you add a manual invalidation of the stream object and of the GlobalSystemSpaceRegion memory space (you could construct a temporary CallEvent and call one of its methods if it turns out to be easier)? I'm not exactly in favor of turning this into checkPostCall() because binding expressions in that callback may cause hard-to-notice conflicts across multiple checkers. Some checkers may even take the old value before it's replaced. For evalCall() we at least have an assert. If you intend to keep the return value as the only effect, there's option of making a synthesized body in our body farm, which is even better at avoiding inter-checker conflicts. Body farms were created for that specific purpose, even though they also have their drawbacks (sometimes evalCall is more flexible than anything we could synthesize, eg. D20811). If you have considered all the alternative options mentioned above and rejected them, could you add a comment explaining why? :) | |
552 | We have a global analyzer flag to suppress those, -analyzer-config suppress-c++-stdlib=true, which has been recently turned on by default. I don't think that duplicating this suppression into every checker separately is worth it. | |
678–683 | ||
728 | This region may be null in case the argument is UnknownVal or UndefinedVal. I don't think you intend to have a null pointer in the map here and in a few other places. | |
test/Analysis/ostream-format.cpp | ||
53 | Did you intend to leave this string here? :) |
Update diff.
lib/StaticAnalyzer/Checkers/OStreamFormatChecker.cpp | ||
---|---|---|
264–283 | I will look at the CallDescription class, and how the checker can benefit. Is it still a problem, that we can not initialize IdentifierInfos during checker construction? If so, how will would a CallDescription implementation solve that? | |
388 | I have tried an implementation of getKnownValue(), and it more terse, and did not handle the cases it should have had to anyway. | |
514 | I am not familiar with the BodyFarm approach, however I tried something along the lines of: It however broke test2 (where the state is set to hex formatting, then, back to dec). Any tips why resetting regions could cause problems? |
Hello,
After experimentation the following AST difference between the mock and the standard library implementation still stands (which necessitates the special handling of the complex manipulators). Example:
// The different behaviour of the AST is illustrated on the test4 function of the test suite: // // void test4(float f) { // std::cout << std::setprecision(2) // << f; // The stream state is chaged, but not restored. // } // expected-warning{{Possibly forgotten ostream format modification in scope}} // // The AST of the function with the standard library include // (note the wrapping CXXConstructorExpr): // |-FunctionDecl 0x5642e3f80b70 <line:33:1, line:36:1> line:33:6 test4 'void (float)' // | |-ParmVarDecl 0x5642e3f80ab0 <col:12, col:18> col:18 used f 'float' // | `-CompoundStmt 0x5642e3f818b0 <col:21, line:36:1> // | `-ExprWithCleanups 0x5642e3f81898 <line:34:3, line:35:16> 'std::basic_ostream<char, struct std::char_traits<char> >::__ostream_type':'class std::basic_ostream<char>' lvalue // | `-CXXOperatorCallExpr 0x5642e3f81850 <line:34:3, line:35:16> 'std::basic_ostream<char, struct std::char_traits<char> >::__ostream_type':'class std::basic_ostream<char>' lvalue // | |-ImplicitCastExpr 0x5642e3f81838 <col:13> 'std::basic_ostream<char, struct std::char_traits<char> >::__ostream_type &(*)(float)' <FunctionToPointerDecay> // | | `-DeclRefExpr 0x5642e3f817b8 <col:13> 'std::basic_ostream<char, struct std::char_traits<char> >::__ostream_type &(float)' lvalue CXXMethod 0x5642e3edf120 'operator<<' 'std::basic_ostream<char, struct std::char_traits<char> >::__ostream_type &(float)' // | |-CXXOperatorCallExpr 0x5642e3f81270 <line:34:3, col:35> 'basic_ostream<char, struct std::char_traits<char> >':'class std::basic_ostream<char>' lvalue // | | |-ImplicitCastExpr 0x5642e3f81258 <col:13> 'basic_ostream<char, struct std::char_traits<char> > &(*)(basic_ostream<char, struct std::char_traits<char> > &, struct std::_Setprecision)' <FunctionToPointerDecay> // | | | `-DeclRefExpr 0x5642e3f811d0 <col:13> 'basic_ostream<char, struct std::char_traits<char> > &(basic_ostream<char, struct std::char_traits<char> > &, struct std::_Setprecision)' lvalue Function 0x5642e3f6a6e0 'operator<<' 'basic_ostream<char, struct std::char_traits<char> > &(basic_ostream<char, struct std::char_traits<char> > &, struct std::_Setprecision)' // | | |-DeclRefExpr 0x5642e3f80c30 <col:3, col:8> 'std::ostream':'class std::basic_ostream<char>' lvalue Var 0x5642e3f56578 'cout' 'std::ostream':'class std::basic_ostream<char>' // | | `-CXXConstructExpr 0x5642e3f81198 <col:16, col:35> 'struct std::_Setprecision' 'void (const struct std::_Setprecision &) throw()' elidable // | | `-MaterializeTemporaryExpr 0x5642e3f81088 <col:16, col:35> 'const struct std::_Setprecision' lvalue // | | `-ImplicitCastExpr 0x5642e3f81070 <col:16, col:35> 'const struct std::_Setprecision' <NoOp> // | | `-CallExpr 0x5642e3f80d20 <col:16, col:35> 'struct std::_Setprecision' // | | |-ImplicitCastExpr 0x5642e3f80d08 <col:16, col:21> 'struct std::_Setprecision (*)(int)' <FunctionToPointerDecay> // | | | `-DeclRefExpr 0x5642e3f80c88 <col:16, col:21> 'struct std::_Setprecision (int)' lvalue Function 0x5642e3f63f70 'setprecision' 'struct std::_Setprecision (int)' // | | `-IntegerLiteral 0x5642e3f80cc0 <col:34> 'int' 2 // | `-ImplicitCastExpr 0x5642e3f817a0 <line:35:16> 'float' <LValueToRValue> // | `-DeclRefExpr 0x5642e3f812b8 <col:16> 'float' lvalue ParmVar 0x5642e3f80ab0 'f' 'float' // | | `-IntegerLiteral 0x5642e3f80cc0 <col:34> 'int' 2 // The AST of the function with the mock library include // |-FunctionDecl 0x55c053dfaa70 <line:32:1, line:35:1> line:32:6 test4 'void (float)' // | |-ParmVarDecl 0x55c053dfa9b0 <col:12, col:18> col:18 used f 'float' // | `-CompoundStmt 0x55c053dfb7c0 <col:21, line:35:1> // | `-ExprWithCleanups 0x55c053dfb7a8 <line:33:3, line:34:16> 'class std::basic_ostream<char>' lvalue // | `-CXXOperatorCallExpr 0x55c053dfb760 <line:33:3, line:34:16> 'class std::basic_ostream<char>' lvalue // | |-ImplicitCastExpr 0x55c053dfb748 <col:13> 'class std::basic_ostream<char> &(*)(float)' <FunctionToPointerDecay> // | | `-DeclRefExpr 0x55c053dfb6f8 <col:13> 'class std::basic_ostream<char> &(float)' lvalue CXXMethod 0x55c053df8120 'operator<<' 'class std::basic_ostream<char> &(float)' // | |-CXXOperatorCallExpr 0x55c053dfb670 <line:33:3, col:35> 'basic_ostream<char>':'class std::basic_ostream<char>' lvalue // | | |-ImplicitCastExpr 0x55c053dfb658 <col:13> 'basic_ostream<char> &(*)(basic_ostream<char> &, const class std::setprecision_manip &)' <FunctionToPointerDecay> // | | | `-DeclRefExpr 0x55c053dfb5d0 <col:13> 'basic_ostream<char> &(basic_ostream<char> &, const class std::setprecision_manip &)' lvalue Function 0x55c053dfb490 'operator<<' 'basic_ostream<char> &(basic_ostream<char> &, const class std::setprecision_manip &)' // | | |-DeclRefExpr 0x55c053dfab30 <col:3, col:8> 'std::ostream':'class std::basic_ostream<char>' lvalue Var 0x55c053df6210 'cout' 'std::ostream':'class std::basic_ostream<char>' // | | `-MaterializeTemporaryExpr 0x55c053dfb5b8 <col:16, col:35> 'const class std::setprecision_manip' lvalue // | | `-ImplicitCastExpr 0x55c053dfb5a0 <col:16, col:35> 'const class std::setprecision_manip' <NoOp> // | | `-CallExpr 0x55c053dfac20 <col:16, col:35> 'class std::setprecision_manip' // | | |-ImplicitCastExpr 0x55c053dfac08 <col:16, col:21> 'class std::setprecision_manip (*)(int)' <FunctionToPointerDecay> // | | | `-DeclRefExpr 0x55c053dfab88 <col:16, col:21> 'class std::setprecision_manip (int)' lvalue Function 0x55c053df2f20 'setprecision' 'class std::setprecision_manip (int)' // | | `-IntegerLiteral 0x55c053dfabc0 <col:34> 'int' 2 // | `-ImplicitCastExpr 0x55c053dfb6e0 <line:34:16> 'float' <LValueToRValue> // | `-DeclRefExpr 0x55c053dfb6b8 <col:16> 'float' lvalue ParmVar 0x55c053dfa9b0 'f' 'float' // // I haven't been able to modify the mock to exhibit the same behaviour.
I think its fine to leave it like this, because if this behaviour changes in case of the real life implementation of the standard library, the fallback branch still catches the case.
As for the evalCall concerns, I have implemented the invalidation of the global stream object. With API of the ProgramState::invalidateRegions I could solve the invalidation of the stream object (which in turn takes care of the corresponding global memory region as well if it is indeed globally stored, like std::cout).
I would like proceed with contributing this checker to codebase, should no further concerns arise.
lib/StaticAnalyzer/Checkers/OStreamFormatChecker.cpp | ||
---|---|---|
514 | I agree that we should not use evalCall(), especially, in an opt-in checker, as it can introduce subtle/hard to debug issues. As was mentioned, if a checker implements evalCall(), the usual evaluation by the analyzer core does not occur, which could lead to various unpredictable results. |
There are multiple new additions to stream formatters, and this patch is way too old. I may consider creating a new checker like this with an extended feature set later.
If you ever want to extend the CallDescription class to cover your use case, please let us know :)
While most of these aren't functions, the approach used in this class could be used here as well - making initialization code shorter.