This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Track multiple raw pointer symbols in DanglingInternalBufferChecker
ClosedPublic

Authored by rnkovacs on Jul 8 2018, 10:06 AM.

Details

Summary

Previously, the checker only tracked one raw pointer symbol for each container object.
But member functions returning a pointer to the object's inner buffer may be called on the object several times.
These pointer symbols are now collected in a set inside the program state map and thus all of them is checked for use-after-free problems.

Diff Detail

Event Timeline

rnkovacs created this revision.Jul 8 2018, 10:06 AM
xazax.hun accepted this revision.Jul 8 2018, 10:21 AM

Overall looks good, some nits inline. Let's run it on some projects to exercise this change.

lib/StaticAnalyzer/Checkers/DanglingInternalBufferChecker.cpp
27–42

I think we should use using now instead of typedef.

137–139

Using both contains and get will result in two lookups. Maybe it would be better to just use get and check if the result is nullptr?

138

Is it possible here the set to be empty? We just added a new element to it above. Maybe turn this into an assert or just omit this if it is impossible?

157

Same comment as above.

test/Analysis/dangling-internal-buffer.cpp
111

We started to use lowercase letters for variable names. Maybe this is not the best, since it is not following the LLVM coding guidelines. So I think it would be better to rename this Cond to start with a lowercase letter in this patch for consistency, and update the tests to follow the LLVM coding styleguide in a separate patch later.

This revision is now accepted and ready to land.Jul 8 2018, 10:21 AM
rnkovacs updated this revision to Diff 154519.Jul 8 2018, 10:57 AM
rnkovacs marked 5 inline comments as done.
rnkovacs edited the summary of this revision. (Show Details)

Addressed comments.

lib/StaticAnalyzer/Checkers/DanglingInternalBufferChecker.cpp
138

I'm not sure whether add() can fail. I turned it into an assert now and will see if it ever fails.

Thanks! The changes look good, I forgot to mark one double lookup though in my previous review.

lib/StaticAnalyzer/Checkers/DanglingInternalBufferChecker.cpp
121

Oh, there is also a double lookup here, sorry I did not spot it in the initial review.

rnkovacs updated this revision to Diff 154520.Jul 8 2018, 11:13 AM
rnkovacs marked an inline comment as done.
rnkovacs added inline comments.
lib/StaticAnalyzer/Checkers/DanglingInternalBufferChecker.cpp
121

Thanks for noticing. I should have seen it in the first place :)

NoQ added a comment.Jul 8 2018, 12:25 PM

Much symbols!

lib/StaticAnalyzer/Checkers/DanglingInternalBufferChecker.cpp
32–40

Please add a comment on how this template is useful. This trick is used by some checkers, but it's a veeeery unobvious trick. We should probably add a macro for that, i.e. REGISTER_FACTORY_WITH_PROGRAMSTATE or something like that.

113

Maybe turn the FIXME into a NOTE or something like that. It indeed seems fine, but let's still make sure the reader realizes that there's a subtle potential problem here.

115

I'd much rather unwrap the symbol here, i.e. if (SymbolRef Sym = RawPtr.getAsSymbol()). A lot of other cornercases may occur if the implementation is accidentally inlined (Undefined, concrete regions). Also, speculatively, .getAsSymbol(/* search for symbolic base = */ true) for the same reason*.

If we didn't care about inlined implementations, the Unknown check would have been redundant. So it should also be safe to straightforwardly ignore inlined implementations by consulting C.wasInlined, then the presence of the symbol can be asserted. But i'd rather speculatively care about inlined implementations as long as it seems easy.

__
* In fact your code relies on a very wonky implementation detail of our SVal hierarchy: namely, pointer-type return values of conservatively evaluated functions are always expressed as &SymRegion{conj_$N<pointer type>} and never as &element{SymRegion{conj_$N<pointer type>}, 0 S32b, pointee type}. Currently nobody knows the rules under which zero element regions are added in different parts of the analyzer, i.e. what is the "canonical" representation of the symbolic pointer, though i made a few attempts to speculate.

119

My favorite idiom for such cases, looks a bit cleaner:

const PtrSet *SetPtr = State->get<RawPtrMap>(ObjRegion)
PtrSet Set = SetPtr ? *SetPtr : F.getEmptySet()
122

This contains check is very unlikely to yield true because if the implementation is not inlined, the symbol is newly born. I'd much rather skip the check; it doesn't sound like it'd make things any faster.

Even better, let's assert that: assert(C.wasInlined || !Set.contains(Sym)). It'll probably help us catch some "reincarnated symbol" bugs in the core.

138

It totally can't. We're just adding an object to a set. What could possibly go wrong?

rnkovacs updated this revision to Diff 154556.Jul 9 2018, 2:33 AM
rnkovacs marked 5 inline comments as done.

Thanks very much for your review!

lib/StaticAnalyzer/Checkers/DanglingInternalBufferChecker.cpp
32–40

Should I do that then? Maybe in CheckerContext.h?

115

I wasn't aware of much of this. Thanks for the detailed explanation :)

NoQ accepted this revision.Jul 10 2018, 11:06 AM

Looks great, thanks!

This revision was automatically updated to reflect the committed changes.