This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Iterator Checkers - Use the region of the topmost base class for iterators stored in a region
ClosedPublic

Authored by baloghadamsoftware on Nov 13 2018, 12:11 AM.

Details

Summary

If an iterator is represented by a derived C++ class but its comparison operator is for its base the iterator checkers cannot recognize the iterators compared. This results in false positives in very straightforward cases (range error when dereferencing an iterator after disclosing that it is equal to the past-end iterator).

To overcome this problem we always use the region of the topmost base class for iterators stored in a region. A new method called getMostDerivedObjectRegion() was added to the MemRegion class to get this region.

Diff Detail

Repository
rL LLVM

Event Timeline

I marked this patch as WIP because I could not create a test-case for it. However in real projects this patch seems to reduce false positives significantly.

The member function getBaseRegion() of MemRegion does not work here because it recursively retrieves the base region of multiple kinds of SubRegion.

Hmmm, shouldn't we add this to MemRegion's interface instead?

NoQ added a comment.Nov 13 2018, 5:09 PM

I marked this patch as WIP because I could not create a test-case for it. However in real projects this patch seems to reduce false positives significantly.

False positives are hard to reduce via delta-debugging because they can be accidentally reduced into true positives, because the distinction between true positives and false positives cannot be tested automatically. But once you fix the false positive, creduce can be used to reduce them by writing down the condition as "the positive is there on the original clang but not on the modified clang". Strongly recommended :)

Hmmm, shouldn't we add this to MemRegion's interface instead?

I wouldn't insist, but this does indeed sound useful. I suggest MemRegion::getMostDerivedObjectRegion() or something like that.

The member function getBaseRegion() of MemRegion does not work here because it recursively retrieves the base region of multiple kinds of SubRegion.

Also, ugh, that nomenclature: the base region of CXXBaseObjectRegion in fact represents the derived object.

baloghadamsoftware retitled this revision from [Analyzer] [WIP] Iterator Checkers - Use the base region of C++ Base Object Regions (recursively) for iterators stored in a region to [Analyzer] Iterator Checkers - Use the region of the topmost base class for iterators stored in a region.
baloghadamsoftware edited the summary of this revision. (Show Details)
In D54466#1297887, @NoQ wrote:

I marked this patch as WIP because I could not create a test-case for it. However in real projects this patch seems to reduce false positives significantly.

False positives are hard to reduce via delta-debugging because they can be accidentally reduced into true positives, because the distinction between true positives and false positives cannot be tested automatically. But once you fix the false positive, creduce can be used to reduce them by writing down the condition as "the positive is there on the original clang but not on the modified clang". Strongly recommended :)

creduce did not work (continous clang_delta crash) but somehow I managed to create tests.

Hmmm, shouldn't we add this to MemRegion's interface instead?

This:

I wouldn't insist, but this does indeed sound useful. I suggest MemRegion::getMostDerivedObjectRegion() or something like that.

vs

Also, ugh, that nomenclature: the base region of CXXBaseObjectRegion in fact represents the derived object.

So if CXXBaseObjectRegion is the derived object, then MemRegion::getMostDerivedObjectRegion() gets the least derived object region now?

In D54466#1297887, @NoQ wrote:

Hmmm, shouldn't we add this to MemRegion's interface instead?

This:

I wouldn't insist, but this does indeed sound useful. I suggest MemRegion::getMostDerivedObjectRegion() or something like that.

vs

Also, ugh, that nomenclature: the base region of CXXBaseObjectRegion in fact represents the derived object.

So if CXXBaseObjectRegion is the derived object, then MemRegion::getMostDerivedObjectRegion() gets the least derived object region now?

Hmm, are there any particular reasons against renaming it to CXXDerivedRegion?

There is [[ https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CXXDerivedObjectRegion.html | CXXDerivedObjectRegion ]] as well. I am totally confused about the terminology now. Is there somewhere a documentation that explains all these things? If I make a class hierarchy, then the region of derived objects are CXXBaseObjectRegion. I did not meet CXXDerivedObjectRegion yet.

More standard-like tests.

NoQ added a comment.Nov 30 2018, 4:22 PM

Huh, gotcha :) The nomenclature is mostly correct, just feels weird.

Super-region of a region is a larger region. Sub-region is a smaller region. Base region is the largest non-memspace superregion, so it's a larger region.

Derived class object is a large object. Base class object is a small object within the derived class object, so it should be represented by a smaller region.

Hence we have CXXBaseObjectRegion which is a small sub-region that corresponds to the object of the base class within the object of a derived class. Which means that yes, you obtain the base region by removing CXXBaseObjectRegion layers (i.e., making the region larger), and that is fine because "base region" in our terminology means a completely different thing.

Now, right, there's the CXXDerivedObjectRegion that i added recently. It represents the region of a derived object if its super-object isn't a CXXBaseObjectRegion (otherwise we can just unwrap the CXXBaseObjectRegion). So it kinda sounds like a sub-region that is larger than its superregion, which contradicts the tradition. But this is fine because if the program under analysis is valid (which we assume unless we can prove the opposite), the only valid use case for CXXDerivedObjectRegion is to place it on top of a SymbolicRegion which points to an object of unknown size. So we kinda don't really know what's larger here. Also, that's not the only example of the situation when a sub-region is not a sub-segment of its super-region. We can always add an ElementRegion with an arbitrary offset over almost any other region and it'll be equally wonky (i.e., a valid program shouldn't make us do that, but if it does, then, well, it's not surprising that weird things are happening).

include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
121 ↗(On Diff #175125)

Let's start writing doxygen thingies!

lib/StaticAnalyzer/Core/MemRegion.cpp
1182–1188 ↗(On Diff #175125)

My attempt:

while (const auto *BR = dyn_cast<CXXBaseObjectRegion>(R))
  R = BR->getSuperRegion();

Now I think I understand the terminology and the concept so I could add Doxygen comment. I also refactored the code as you suggested, original code was based on getBaseRegion().

NoQ accepted this revision.Dec 3 2018, 6:39 PM

Looks great, thanks!

This revision is now accepted and ready to land.Dec 3 2018, 6:39 PM
This revision was automatically updated to reflect the committed changes.