Checking whether two regions are the same is a partially decidable problem: either we know for sure that they are the same or we cannot decide. A typical case for this are the symbolic regions based on conjured symbols. Two different conjured symbols are either the same or they are different. Since we cannot decide this and want to reduce false positives as much as possible we exclude these regions whenever checking whether two containers are the same at iterator mismatch check.
Details
- Reviewers
NoQ george.karpenkov - Commits
- rGd703305e404d: [Analyzer] Skip symbolic regions based on conjured symbols in comparison of the…
rL356049: [Analyzer] Skip symbolic regions based on conjured symbols in comparison of the…
rC356049: [Analyzer] Skip symbolic regions based on conjured symbols in comparison of the…
Diff Detail
Event Timeline
I wonder whether a method in MemRegion called isSameRegion or isSurelySameRegion would be better. I think it's likely that there are (or will be) checkers that would do similar things.
Maybe something like this?
bool MemRegion::isSurelySameRegion(const MemRegion *Other) const { // We can't reason about symbolic regions. if (/* this or Other is symbolic*/) return; return this == Other; }
Unfortunately the question whether two regions are the same is a partially decidable problem. We can never be sure whether two regions are "surely" the same. We can only disclose some highly unreliable region types such as symbolic regions based on conjured symbols to reduce false positives.
What makes you think that conjured symbols are special?
Doesn't the same problem appear in the following example?:
void foo(std::vector<int> &v1, std::vector<int> &v2) { v2.erase(v1.cbegin()); }
In this example if foo() is analyzed as a top level function, the respective symbols would be of SymbolRegionValue kind. It is also easy to come up with a test case that involves SymbolDerived.
I think that here the main difference is that if we analyze this function as top level, then we find a true positive: the regions for v1 and v2 may be the same but generally they are difference (hence the different parameters). However, we do not know anything about the sameness of the regions of different SymbolDeriveds, thus those findings may be false or true positives as well.
Aha, right, that's an interesting heuristic. I guess that the developer may also add a specific check (eg., if (&v1 == &v2) ..., but that's a separate story of aliasing and renaming, and i do admit that i don't see this sort of code being written sensibly.
Well, you can't really rely on my imagination, because i can still say the same about the SymbolConjured examples. I'm really curious how did this originally look, i.e. even if the user knows that a certain function always returns the same container, why would they call it twice? Was this happening in some sort of loop? Is there a more realistic test case that we can add?
Anyway, let's add a huge comment that explains why SymbolConjureds are special and commit. I mean, this definitely deserves a comment :)