This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Add new visitor to the iterator checkers
AbandonedPublic

Authored by baloghadamsoftware on May 28 2019, 7:10 AM.

Details

Reviewers
NoQ
Szelethus
Summary

Iterator bugs such as dereference of out-of-range or any access of invalidated iterators are difficult to understand. A BugReporterVisitor that marks the place where the iterator reached to past-the-end or the first position, became invalidated and marks the execution path with a message about the emptiness of the container helps much in understanding the actual bug. It also helps deciding whether the finding is a true or a false positive. This patch adds such a visitor to the iterator checkers.

Diff Detail

Event Timeline

Before someone asks: NoteTags are not applicable here since invalidation and reaching one end of the range happens in many different places. This is also true for container emptiness.

NoQ added a comment.May 28 2019, 12:48 PM

Before someone asks: NoteTags are not applicable here since invalidation and reaching one end of the range happens in many different places. This is also true for container emptiness.

Mm, what's wrong with many different places? If you're worried about code duplication, just put tag construction into a function (?)

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
281–282

I don't think we should stop here. I think it's worth it to highlight *all* increments and decrements of the interesting iterator, so that it was clear how come that it has the given value.

Additionally, because iterators are copied around, it makes sense to "recursively" apply the visitor to the original object when the iterator is obtained as a copy (similarly to how trackExpressionValue works).

In D62525#1519868, @NoQ wrote:

Before someone asks: NoteTags are not applicable here since invalidation and reaching one end of the range happens in many different places. This is also true for container emptiness.

Mm, what's wrong with many different places? If you're worried about code duplication, just put tag construction into a function (?)

It is not about code duplication. Of course, that can be handled by a function.

The problem is that the transition is added in the top level function but iterator position adjustments happen in the lower level ones. Data flow is one-way, top-down. For example, an insertion happens into a vector that invalidates all the iterators at and after the position where it is inserted. In this case handleInsert() calls a function that iterates all the active iterator positions and calls other one that invalidates the ones affected by the insertion. To propagate back the list of iterators invalidated would be too complex. Increments and decrements are even worse. Thus here the correct way seems to find the relevant transitions in a visitor instead of trying to figure out what note tags to add at each transition.

baloghadamsoftware marked an inline comment as done.May 30 2019, 7:59 AM
baloghadamsoftware added inline comments.
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
281–282

This can be done but an iterator can reach the past-the-end or the first position by changing the container's size as well, not only by incrementing or decrementing the iterator itself.

Visitors now track increments and decrements.

Szelethus added inline comments.Jul 15 2019, 1:29 PM
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
1091

/*PastTheEnd=*/ false

1274–1282

I'm confused, are these changes related to this patch? It doesn't seem to be.

baloghadamsoftware marked an inline comment as done.Jul 18 2019, 2:16 AM
baloghadamsoftware added inline comments.
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
1274–1282

Yes, they are. Since we want to track the iterator positions upwards along the bugpath as far a possible I had to reverse one of my decisions. When I first decided it was really like a coin thrown up, but now it turned out I took the wrong choice considering the visitor. The change is that upon move assignment of a container the iterators are transferred to the new container, except the past-the-end iterator. However we also have iterator positions relative to the past-the-end iterator (thus using the same symbol) which must be transferred. So I had to separate them from the past-the-end positions by generating a new end symbol. I originally used this new symbol for the positions to be transferred and kept the old symbol for the past-the-end positions. However this makes tracking of the non past-the-end positions very difficult so I reversed it and now I transfer the old symbol and use the new for the past-the-end positions.

I don't see obvious red flags here, @NoQ?

In D62525#1519868, @NoQ wrote:

Before someone asks: NoteTags are not applicable here since invalidation and reaching one end of the range happens in many different places. This is also true for container emptiness.

Mm, what's wrong with many different places? If you're worried about code duplication, just put tag construction into a function (?)

It is not about code duplication. Of course, that can be handled by a function.

The problem is that the transition is added in the top level function but iterator position adjustments happen in the lower level ones. Data flow is one-way, top-down. For example, an insertion happens into a vector that invalidates all the iterators at and after the position where it is inserted. In this case handleInsert() calls a function that iterates all the active iterator positions and calls other one that invalidates the ones affected by the insertion. To propagate back the list of iterators invalidated would be too complex. Increments and decrements are even worse. Thus here the correct way seems to find the relevant transitions in a visitor instead of trying to figure out what note tags to add at each transition.

Mhm, I personally find this reasoning sufficient.

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
281–282

Maybe in a followup patch, but I'd love to see a trackIterator function!

1274–1282

Sometimes I just need a reality check to realize how absurdly difficult the problem is that you're solving... Wow. I mentioned this elsewhere, but do we not need a debug checker to validate that we argue about all of these correctly?

1485–1486

For example, an insertion happens into a vector that invalidates all the iterators at and after the position where it is inserted.

Is this actually correct?

https://en.cppreference.com/w/cpp/container/deque

std::deque (double-ended queue) is an indexed sequence container that allows fast insertion and deletion at both its beginning and its end. In addition, insertion and deletion at either end of a deque never invalidates pointers or references to the rest of the elements.

https://en.cppreference.com/w/cpp/container/vector

insert, emplace, resize: If the vector changed capacity, all of the iterators are invalidated. If not, only those after the insertion point.

baloghadamsoftware marked 2 inline comments as done.Jul 31 2019, 4:57 AM
baloghadamsoftware added inline comments.
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
281–282

Hey, but it is done now, in the updated patch! See lines 1782-1803.

1485–1486

https://en.cppreference.com/w/cpp/container/deque/insert

All iterators, including the past-the-end iterator, are invalidated.

I think I speak here about insert(), not about push_back() and push_front().

Since we use a conservative approach we always assume that vectors and double-end queues do not change capacities.

NoQ added a comment.Jul 31 2019, 3:58 PM

The problem is that the transition is added in the top level function but iterator position adjustments happen in the lower level ones. Data flow is one-way, top-down. For example, an insertion happens into a vector that invalidates all the iterators at and after the position where it is inserted. In this case handleInsert() calls a function that iterates all the active iterator positions and calls other one that invalidates the ones affected by the insertion. To propagate back the list of iterators invalidated would be too complex. Increments and decrements are even worse. Thus here the correct way seems to find the relevant transitions in a visitor instead of trying to figure out what note tags to add at each transition.

Mhm, I personally find this reasoning sufficient.

Ok, so is it problematic to define a lambda that checks whether an interesting iterator is invalidated? Or is it difficult to return the note tag from many stack frames in order to finally squeeze it into the transition?

Szelethus added inline comments.Aug 3 2019, 11:21 AM
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
1485–1486

I stand corrected!

In D62525#1609334, @NoQ wrote:

Ok, so is it problematic to define a lambda that checks whether an interesting iterator is invalidated? Or is it difficult to return the note tag from many stack frames in order to finally squeeze it into the transition?

Both. The lambda can iterator through all the iterator positions, but how could it know which one was just invalidated? Or incremented, decremented, reached the first position or the past-the end position? We do not keep history. Returning note tags and passing them through all the functions called, combining them is a nightmare. It makes the code totally unreadable and I am not even sure how to do it correctly. Note tags are great, but here a conventional visitor fits better.