This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] pr18953: Zeroing constructors, store binding extents, ASTRecordLayouts.
ClosedPublic

Authored by NoQ on May 2 2018, 3:37 PM.

Details

Summary

As the old assertion says,

"Thou shalt not initialize the same memory twice."

Namely, the State->bindDefault()/StoreManager.BindDefault() API has the purpose of overriding the initial value with which memory is initialized. For instance, a checker can express that loading from freshly malloc()ed memory would yield Undefined, while loading from freshly calloc()ed memory would yield zero, even though loading from a normal symbolic region would yield a region-value symbol. BindDefault never was meant for binding arbitrary default values to arbitrary locations (though it was never doxygen'ed anywhere). For that reason it has the aforementioned assertion.

The assertion is actually a bit more relaxed than that; it simply says that the new default binding should never overwrite an existing binding key, which means that not only the base region but also the offset should coincide in order to trigger the assertion.

Other sorts of default store bindings are created by different APIs. Namely, there's also the invalidation case, which consists in default-binding a conjured symbol, and it is performed by State->invalidateRegions()/StoreManager.invalidateRegions() which never calls BindDefault but instead manipulates bindings directly.

Now, when it comes to C++ zeroing constructors (i.e. non-user-supplied constructors that boil down to zero initialization of the object), a lot of weird things can happen, especially when such constructors are applied to fields or base objects, especially with multiple inheritance, especially with default inheritance from empty objects (objects with no fields, eg. a-la java interfaces). The same base object would undergo multiple initializations for every zeroing constructor of a field or a base. And due to empty fields or bases, it may happen that such bindings will have the exact same offset. It may also happen that the new default binding would conflict an old direct binding, as i demonstrated on one of the newly added tests.

It seems that C++ zero-initialization is quite different from normal BindDefault's idea of setting the initial/default value. Therefore i propose to split up the BindDefault API into two parts: BindDefaultInitial() for setting initial/default values and BindDefaultZero() for performing C++-like zero initialization specifically. This would leave us with 3 different APIs for manipulating default bindings, but i guess it's not that bad because these different APIs are quite good at expressing the coder's intent while hiding internals of the RegionStore, and their implementations are significantly different.


Now, recall that there was a previous attempt to relax that assertion in rC190530, for the same reason. The separation of APIs proposed here would allow us to restore the historical assertion in its original strength, while accepting that the assertion doesn't cover C++ zero-initialization. I'm not seeing a good way to cover the zero-initialization case with that, and i'm not convinced that it's necessary.

The interesting part here is, however, the special-case behavior that was chosen on the path on which the assertion did not hold. It was pointed out by Jordan that if a zeroing constructor follows a conservatively evaluated empty-base constructor, it is more correct to preserve the default conjured symbol produced by the empty-base constructor than to replace it with a zero default binding that would completely overwrite the conjured symbol, since we don't model store binding extents. This is more correct because zero-initializer should only cover its respective sub-object, but other parts of the big object must remain invalidated.

In the newly added tests i point out that keeping the original conjured symbol in that case would also not be fully safe. The actually-safe behavior would be to invalidate the object again, because it's no longer the same unknown value that we previously had. So i'm adding a bunch of tests in which we take various field values in different weird moments of time and see if we aren't accidentally assuming that they're the same known or unknown values as the ones taken in different moments of time.

These tests now also demonstrate another bug that would also make our store bindings incorrectly, namely by looking at ASTRecordLayout it is not always apparent what part of the object is going to be overwritten by making a binding to a specific sub-region. See the new architecture-specific tests in which field z misbehaves.

Long story short, RegionStore is modeling zero initializations imperfectly for at least two reasons.

We could work around that by invalidating object contents every time we detect something fishy. The only reasonable detection i can come up with would be to detect if we have a coinciding binding (i.e. what the assertion was previously protecting us from), eg.:

if (B.getDefaultBinding(R) || B.getDirectBinding(R))
  V = UnknownVal();

This detection is imperfect - there are tests that would still fail. In particular it won't fix the ASTRecordLayout problem. And i also suspect that it really destroys a lot more good data than bad data. So i'll be experimenting with that now. In the current version of the patch i remove the special case handling and simply bind the default value blindly even if i know that RegionStore can't handle it precisely.

Diff Detail

Repository
rL LLVM

Event Timeline

NoQ created this revision.May 2 2018, 3:37 PM
MTC added a subscriber: MTC.May 2 2018, 6:22 PM
This revision was not accepted when it landed; it landed in state Needs Review.May 4 2018, 3:01 PM
This revision was automatically updated to reflect the committed changes.
NoQ added a comment.May 4 2018, 3:02 PM

We could work around that by invalidating object contents every time we detect something fishy. The only reasonable detection i can come up with would be to detect if we have a coinciding binding (i.e. what the assertion was previously protecting us from), eg.:

if (B.getDefaultBinding(R) || B.getDirectBinding(R))
  V = UnknownVal();

This detection is imperfect - there are tests that would still fail. In particular it won't fix the ASTRecordLayout problem. And i also suspect that it really destroys a lot more good data than bad data. So i'll be experimenting with that now. In the current version of the patch i remove the special case handling and simply bind the default value blindly even if i know that RegionStore can't handle it precisely.

Committed as is for now. Didn't notice any difference between the three approaches on my C++ testing grounds.