This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Add a new checker callback, check::NewAllocator.
ClosedPublic

Authored by NoQ on Dec 19 2017, 11:08 AM.

Details

Summary

This patch is roughly based on the discussion we've had in http://lists.llvm.org/pipermail/cfe-dev/2017-December/056314.html about how our support for C++ operator new() should provide useful callbacks to the checkers. As in, this patch tries to do a minimal necessary thing while avoiding all decisions that we didn't have a consensus over.

This patch also absorbs D36708.

In the discussion, we've been mentioning PreAllocator/PostAllocator callbacks. However, we already sort of have them: they are PreCall/PostCall callbacks for the allocator call, which are readily available in the c++-allocator-inlining mode.

This patch also causes no change in the behavior of the pre/post CXXNewExpr callbacks. They remain broken in the c++-allocator-inlining mode for now.

Missing in the current system of callbacks, however, is a callback that would be called before construction while providing the casted return value of operator new (in the sense of D41250: operator new returns void *, but we need to have C *). The user can subscribe on the allocator's PostCall and perform the cast manually, but that's an unpleasant thing to do. So it sounds like a good idea to provide a callback that would contain the relevant information.

A similar "Pre" callback doesn't seem necessary - there's not much additional information we can provide compared to PreCall.

In this patch, I add the callback (with a lot of code that is the usual boilerplate around the callback, though there's still a questionable TODO in CheckNewAllocatorContext::runChecker()) and modify MallocChecker to make use of it when c++-allocator-inlining mode is turned on. Specifically:

  • If the allocator (operator new) call is evaluated conservatively, checkNewAllocator() provides the return value of the allocator that is semantically equivalent to the value in checkPostStmt<CXXNewExpr>() that we've been using earlier. Without the cast, the value is different and the checker crashes.
  • If the allocator call is inlined, the checker does not treat it as an allocator, but instead as a simple inlined function. That is, it does not start tracking the return value here, and in particular avoids crashes, though it may continue tracking the value if it is already being tracked (see the new-ctor-malloc.cpp test). This policy has already been there because previously we could inline the operator when it was written without operator syntax - see tests in test/Analysis/NewDelete-custom.cpp. Whether this policy was correct or not is a matter of discussion. For now it means that in c++-allocator-inlining mode we'd have new negatives (i.e. lost positives) on code that uses custom allocators which do not boil down to simple malloc.

The std::nothrow_t tests in NewDelete-custom.cpp are affected in a surprising manner. I'd fix them in a follow-up commit.

Diff Detail

Repository
rC Clang

Event Timeline

NoQ created this revision.Dec 19 2017, 11:08 AM
NoQ updated this revision to Diff 127579.Dec 19 2017, 11:34 AM
  • Actually call the new callback when the allocator call is inlined.
  • Update checker documentation :)
NoQ added a comment.Dec 19 2017, 3:19 PM

TODOs for the future commits:

  • Constructor shouldn't cause pointer escape of the newly allocated pointer immediately after the NewAllocator callback, otherwise we ain't gonna find no leaks. Without c++-allocator-inlining, we only started tracking the pointer after the constructor, so this wasn't a problem. Should this be a checker-side workaround, or we should globally disable pointer escape (but not invalidation!) for this-pointer and constructor?
  • Refactor other checkers that relied on PostStmt<CXXNewExpr>, namely DynamicTypePropagation and PointerArithmeticChecker, to use the new callback. Then see if we can remove the callback entirely, and probably re-introduce later depending on where the discussion goes.

Maybe debug.AnalysisOrder could be used to test the callback order explicitly. This way the test could also serve as a documentation for the callback order.

NoQ added a comment.Dec 20 2017, 11:24 AM

Maybe debug.AnalysisOrder could be used to test the callback order explicitly. This way the test could also serve as a documentation for the callback order.

Yep, totally, will do.

dcoughlin accepted this revision.Jan 8 2018, 6:10 PM

Looks good to me with some minor nits inside (and also a request to consider factoring out common code).

include/clang/StaticAnalyzer/Core/CheckerManager.h
311

Does 'wasInlined' really need to have a default argument? It looks like there are only two callers. (Also the capitalization is inconsistent).

lib/StaticAnalyzer/Checkers/CheckerDocumentation.cpp
138

I find the following confusing: "Post-call for the allocator is called before step (1)." Did you mean "after" step one?

lib/StaticAnalyzer/Checkers/MallocChecker.cpp
298

It is probably worth explaining what RetVal is and what it means if it is None in the doxygen.

328

Same comment about documenting RetVal here.

968

Nit: Capitalization of "retVal".

1291

Capitalization, again.

1306

I think it is worth a comment here about why we expect this assert succeed: the value will always be the result of a malloc function or an un-inlined call to the new allocator (if I understand correctly).

lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
354

Is it possible/desirable to factor out this logic (running the PostCall callbacks and then the NewAllocator callbacks) into a helper method on ExprEngine then call the helper from both VisitCXXNewAllocatorCall() and processCallExit()?

This revision is now accepted and ready to land.Jan 8 2018, 6:10 PM

Do you have a plan for the new false negatives when c++-allocator-inlining is on? Should the user mark allocation functions with attributes?

Also, I was wondering if it would make sense to only have the PostCall callback return the casted memory region in these specific cases instead of introducing a new callback.

Otherwise looks good.

lib/StaticAnalyzer/Checkers/CheckerDocumentation.cpp
143

Is it possible to have a NonLoc Target? If no, would it be better to make it Loc type?

NoQ added a comment.Jan 9 2018, 8:59 AM

Also, I was wondering if it would make sense to only have the PostCall callback return the casted memory region in these specific cases instead of introducing a new callback.

Yep, i actually think it's a very good idea. Like, it was obvious to me that it's a bad idea because PostCall and CallEvent should describe the function call just like before - in particular, the moment they capture and the return value should correspond to the actual value returned from the function. And it's kinda obvious by now that there are two different values, both of which may be of interest to the user, and certain amount of modeling (namely, pointer cast) occurs between these two moments.

But i guess we could also squeeze both values into CallEvent. Like, make a new CallEvent sub-class called, say, CXXNewAllocatorCall, which would additionally provide a method, say, .getAllocatedObject(), which would re-evaluate the cast and return the casted object. The whole reusable casting procedure would in this case live in SValBuilder and be accessed from both ExprEngine and CallEvent (in the latter case through .getState()->getStateManager()).

Performance-wise, switching to the CallEvent approach would trade an additional exploded node or two for a potentially unlimited number of re-evaluations of the cast (in every checker) - both seem lightweight enough for now to never bother. Storing the casted value in CallEvent would require making the CallEvent object more bulky, which doesn't seem justified. Futureproofness-wise, we may potentially re-introduce the callback when we actually require two program points, or if our cast procedure becomes slow and we'd want to re-use the result, but both is unlikely.

NoQ added a comment.Jan 9 2018, 1:57 PM

which would re-evaluate the cast and return the casted object

Rather not. Because i'm changing my mind again about avoiding the redundant cast in &element{T, HeapSymRegion{conj<T *>}} - this time by not calling evalCast after a conservatively evaluated operator new call (this would ensure that no existing behavior breaks in the conservative case without changing how all casts everywhere work), and this wouldn't be compatible with this CallEvent-based approach because in the CallEvent we have no way of figuring out if the call was inlined or evaluated conservatively. We could still move the cast logic before the PostCall callback, and then retrieve the casted value from the program state (wherever it is). But this is an example of stuff that becomes harder when we merge the callbacks together, and it took me a few hours to stumble upon that, so who knows what other problems would we encounter, so i feel that future-proof-ness is worth it.

NoQ added a comment.Jan 9 2018, 2:25 PM

Like, make a new CallEvent sub-class called, say, CXXNewAllocatorCall

Oh, we already have it (CXXAllocatorCall).

NoQ added a comment.Jan 9 2018, 2:34 PM

But still, i guess, it is also easier for checker authors to discover a checker callback (it's right there in the CheckerDocumentation, which is short enough to read fully) than to discover a CallEvent sub-class (which was so hard that i never discovered it until like now) and then figure out that it would have the method they need (when they don't even know that they need it).

NoQ updated this revision to Diff 129214.Jan 9 2018, 7:39 PM
NoQ marked 6 inline comments as done.

Address review comments. Stick to the callback approach for now.

include/clang/StaticAnalyzer/Core/CheckerManager.h
311

That's copied from other methods (eg. runCheckersForPostCall), should i clean them up as well some day?

lib/StaticAnalyzer/Checkers/CheckerDocumentation.cpp
143

I guess it can be UnknownVal as well, or even UndefinedVal.

lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
354

I think it's more readable inline. It's a small piece of code with subtle differences between copies (eg. here we have one predecessor node, there we have a set of nodes, then we need to pass the WasInlined flag), and also in VisitCXXNewAllocatorCall() all other checker callbacks are written out directly, so it doesn't make much sense to put these two into a sub-function while leaving three other callbacks in place - rather hurts readability. This code is kind of idiomatic and repeated many times in ExprEngine - maybe we should clean up the whole stuff, but i don't think cleaning up just this place is the right thing to do.

NoQ added a comment.Jan 9 2018, 8:00 PM

Do you have a plan for the new false negatives when c++-allocator-inlining is on? Should the user mark allocation functions with attributes?

Not immediately - the immediate plan is to simply believe that we'd either see something reasonable inside the call (eg. malloc() or a concrete array), or we'd rather not even try to understand what's going on and how to properly release memory in this case (avoid false positives, which is also good). I guess we shall see if it makes sense to track the allocated value anyway, even in the inlined case, and in this case we'd need to work around tracking stuff twice (eg. operator new(size) { return malloc(size); } would need to be tracked as both new and malloc for the purposes of mismatched deallocator check) or other unexpected issues (because MallocChecker has strong opinions all over the place on how allocated values normally look like). We don't have any annotations in MallocChecker yet, and i didn't think about adding support for them or even see if there are existing useful annotations around.

NoQ updated this revision to Diff 129369.Jan 10 2018, 4:29 PM

Rebase (getter rename).

This revision was automatically updated to reflect the committed changes.