This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] MoveChecker Pt.5: Improve invalidation policies.
ClosedPublic

Authored by NoQ on Dec 4 2018, 12:12 PM.

Details

Summary

If a moved-from object is passed into a conservatively evaluated function by pointer or by reference, we assume that the function may reset its state.

Make sure it doesn't apply to const pointers and const references. Add a test that demonstrates that it does apply to rvalue references.

Additionally, make sure that the object is invalidated when its contents change for reasons other than invalidation caused by evaluating a call conservatively. In particular, when the object's fields are manipulated directly, we should assume that some sort of reset may be happening.

Diff Detail

Event Timeline

NoQ created this revision.Dec 4 2018, 12:12 PM
NoQ added inline comments.Dec 4 2018, 12:14 PM
lib/StaticAnalyzer/Checkers/MoveChecker.cpp
542

This is clumsy. I think we shouldn't include non-invalidated regions in the ExplicitRegions array in the first place.

Hi Artem,
The change looks fine, just some nits.

lib/StaticAnalyzer/Checkers/MoveChecker.cpp
534

It looks like Call is already checked for null, and we can use just dyn_cast.

542

Just a reminder: we have llvm::find and a bunch of nice related range wrappers.

Cool!

lib/StaticAnalyzer/Checkers/MoveChecker.cpp
526–527

This isn't specific to this revision, but I find the parameter name Regions way too vague. Maybe ImplicitRegions?

537–539

Really? Then I guess this needs to be updated in CheckerDocumentation.cpp:

/// \param ExplicitRegions The regions explicitly requested for invalidation.
///        For a function call, this would be the arguments. For a bind, this
///        would be the region being bound to.

To me, this clearly indicates that the elements of ExplicitRegions will be invalidated. Does "requested for" really just mean "requested for potential"? Since this happens before any invalidation, it could easily be interpreted as "invalidated after the end of the callback".

NoQ added inline comments.Dec 5 2018, 12:45 PM
lib/StaticAnalyzer/Checkers/MoveChecker.cpp
537–539

The callback fires after invalidation - cf. ProgramState::invalidateRegionsImpl. Note that the if (Eng) condition is always true, because how the heck were we supposed to run path-sensitive analysis without ExprEngine.

The terminology is really weird here because we the word "invalidation" has too many meanings. Essentially, ExplicitRegions are regions that were specifically requested for invalidation, but it is up to the CallEvent / ProgramState / StoreManager (depending on which invalidateRegions() was called) to decide what invalidation means in this case by filling in the RegionAndSymbolInvalidationTraits map.

For regions that represent values of const pointers/references directly passed into the function, CallEvent decides to set the TK_PreserveContents trait, which says "invalidate the region but keep its contents intact". It would still perform other forms of invalidation on that region, say, pointer escape: if there are pointer values written down somewhere within that region, checkers need to stop tracking them.

Now, the callback never said that it has something to do with "invalidation". Instead, it is about "region changes", which means changes of the region's contents in the Store. This doesn't necessarily happen due to invalidation; it may also happen while evaluating a simple assignment in the program, as we see in the newly added testUpdateField() test. And, as we've seen above, not every "invalidation" changes contents of the region.

Similarly, TK_SuppressEscape would suppress the checkPointerEscape() callback for the region, but will not suppress checkRegionChanges() for that (non-explicit) region.

Therefore we end up in a weird situation when some regions were requested to be invalidated but never were actually invalidated in terms of the Store. It kinda makes sense, but i hate it anyway. The checkRegionChanges() callback is hard to understand and hard to use and too general and has too much stuff stuffed into it. checkPointerEscape is an easier-to-use fork of it, but it doesn't cover the use case of checkers that track objects via regions. I suspect that the right solution here is to start tracking objects as "symbols", i.e., assign a unique opaque ID to every state through which every object in the program goes (regardless of which object it is) and use that as keys in the map. That should remove the stress of dealing with invalidation from C++ checkers that need to track objects opaquely. The problem won't magically disappear, as we still need to identify when exactly does the state change, but an additional level of indirection (Region -> ID -> CheckerState instead of just Region -> CheckerState) allows us to decompose it into smaller parts and de-duplicate some of this work.

But regardless of that, it is still weird that ExplicitRegions is not a sub-set of Regions. We need to either document it or fix it, and for some reason i prefer the latter. In particular, the only checker that actually actively acts upon ExplicitRegions when they're const is RetainCountChecker, but in fact people don't ever use const in stuff that it checks, it just isn't idiomatic.

NoQ updated this revision to Diff 176877.Dec 5 2018, 1:59 PM
NoQ marked 5 inline comments as done.

Address comments :)

lib/StaticAnalyzer/Checkers/MoveChecker.cpp
526–527

How about these?

542

Thx!

NoQ marked an inline comment as done.Dec 5 2018, 2:02 PM
NoQ added inline comments.
lib/StaticAnalyzer/Checkers/MoveChecker.cpp
537–539

Another funny thing about RegionAndSymbolInvalidationTraits is that it races when the same region is passed both via const pointer and a non-const pointer. The region will be invalidated or not depending on which one goes in first. We really need to fix it.

Szelethus marked an inline comment as done.Dec 6 2018, 8:53 AM
Szelethus added inline comments.
lib/StaticAnalyzer/Checkers/MoveChecker.cpp
56

This should be RequestedRegions too then :)

526–527

Awesome, thanks! :)

537–539

Oh okay! Thanks for the detailed explanation! I share your concern with this being a little messy.

We need to either document it or fix it, and for some reason i prefer the latter.

Me too.

NoQ updated this revision to Diff 177047.Dec 6 2018, 2:14 PM

Fxd :)

NoQ marked an inline comment as done.Dec 6 2018, 2:14 PM
This revision is now accepted and ready to land.Dec 6 2018, 2:30 PM
Szelethus accepted this revision.Dec 6 2018, 2:39 PM
This revision was automatically updated to reflect the committed changes.