This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Add filter_iterator for filtering elements
ClosedPublic

Authored by timshen on Jul 28 2016, 6:41 PM.

Details

Summary

filter_iterator is an iterator adaptor that filters a given iterator by a given predicate.

Diff Detail

Repository
rL LLVM

Event Timeline

timshen updated this revision to Diff 66070.Jul 28 2016, 6:41 PM
timshen retitled this revision from to [ADT] Add filter_iterator for filtering elements.
timshen updated this object.
timshen added reviewers: dblaikie, chandlerc.
timshen added a subscriber: llvm-commits.
chandlerc edited edge metadata.Jul 28 2016, 7:23 PM

Cool, glad this is getting added!

include/llvm/ADT/STLExtras.h
239 ↗(On Diff #66070)

No need for '\brief'.

242–244 ↗(On Diff #66070)

I might rephrase this as "The predicate parameter should be a callable object that accepts the wrapped iterator's reference type and returns a bool. When incrementing or decrementing the iterator, it will call the predicate on each element and skip any where it returns false."

My goal is to highlight:

  • The predicate is a callable.
  • We will invoke it on every element, and invoke it while incrementing or decrementing as opposed to when dereferencing.
246–248 ↗(On Diff #66070)

Why not support operator--? It seems easy?

252 ↗(On Diff #66070)

Why not use (and add below) a make_filter_iterator? It seems like we will often want to avoid naming the type due to the predicate.

269–273 ↗(On Diff #66070)

Instead, of the std::conditional, use std::common_type:

typename std::common_type<std::forward_iterator_tag,
                          typename std::iterator_traits<
                              WrappedIteratorT>::iterator_category>::type

This will use the derived to base conversion to select the best tag type possible, and as an advantage will fail to compile with an output iterator.

If you add operator--, then std::forward_iterator_tag above becomes std::bidirectional_iterator_tag. And you will *really* want to use common_type because the conditional gets crazy.

274–277 ↗(On Diff #66070)

You don't need these four lines. The adaptor base will do this for you.

302–309 ↗(On Diff #66070)

This needs its own doxygen comment.

unittests/Support/IteratorTest.cpp
104 ↗(On Diff #66070)

Please test other forms of callables. std::function, function pointer at least.

Also test an input iterator and combining filter with some of our other adaptors like a filtered pointee iterator.

timshen updated this revision to Diff 66156.Jul 29 2016, 11:43 AM
timshen marked 2 inline comments as done.
timshen edited edge metadata.

Updated iterator_adaptor_base, iterator category, comments and tests.

timshen marked 5 inline comments as done.Jul 29 2016, 11:45 AM
timshen added inline comments.
include/llvm/ADT/STLExtras.h
274–277 ↗(On Diff #66070)

The adaptor will do PointerT = T* and ReferenceT = T& for me, but I want to forward WrappedIteratorT's pointer and reference type in filter_iterator (because the filtered elements have the same type as the original ones).

unittests/Support/IteratorTest.cpp
104 ↗(On Diff #66070)

During adding the tests I do find a iterator_adaptor_base constructor problem. Changed the constructor to take WrappedIteratorT only. All users seem to still work.

One high level question that I'd love others' thoughts on: should this be spelled "filter_iterator" and "make_filter_range" or "filtered_iterator" and "make_filtered_range". For a type, I personally prefer "filtered_iterator". For the factory function, I wonder if we want something more like "reverse", perhaps "filter"?

include/llvm/ADT/STLExtras.h
246–248 ↗(On Diff #66156)

I think I was very wrong to suggest this. IT's actually really hard to support being bi-directional, because now your iterator needs to retain *both* begin and end in addition to the current position so that it can stop backing up when necessary. The code below, if you back up from end and filter out begin, will walk completely past the start of the wrapped iterator. Oops.

It also forces you to copy the predicate into the begin and the end in order to be able to move the end iterator.

All of this complexity doesn't seem worth it when the user can always reverse the wrapped iterator rather than reversing this iterator. Go back to your original (simpler) design and make it a forward iterator. =D

252 ↗(On Diff #66156)

I think this should be PredicateT here (at least for consistency).

284–287 ↗(On Diff #66156)

I think you should make the constructor(s) for this iterator private and add some comments explaining that because even the begin iterator tracks the end, we need to construct these iterators as a range with begin and end anyways.

Then you can befriend the make_... function below.

Also, I think you should have two constructors, one that is exactly as your current, and one that accepts just one wrapped iterator, the end, and leaves the predicate uninitialized.

With that, you'll be able to avoid unnecessary copying of the predicate (see below) and we shouldn't ever be using the predicate on the end iterator based on my suggestion above.

285 ↗(On Diff #66156)

You should move all of these.

311 ↗(On Diff #66156)

No need for '\brief' here either. =]

313 ↗(On Diff #66156)

As above, this should be PredicateT.

316 ↗(On Diff #66156)

I think you want to accept the predicate by value and move it as we should be modeling "consume" semantics here. See above for avoiding the copy.

And a by-value parameter for an empty struct is more efficient than any kind of reference.

317 ↗(On Diff #66156)

"IterT" is pretty ambiguous in the context including WrappedIteratorT... Also, I'd create the alias in the template parameter list so you can use it in the return type:

template <typename WrappedIteratorT, typename PredicateT>
          typename FilterIteratorT =
              filter_iterator<WrappedIteratorT, PredicateT>>
274–277 ↗(On Diff #66070)

If we want to change this it should be changed in iterator_adaptor_base, not specialized here IMO.

And if you're going to fix that, it should be in a separate patch from the one adding the new iterator.

timshen updated this revision to Diff 66377.Aug 1 2016, 1:54 PM
timshen marked 7 inline comments as done.

Changed it back to forward iterator. Do not keep predicate and the second iterator in end() construction.

include/llvm/ADT/STLExtras.h
246–248 ↗(On Diff #66156)

...so that it can stop backing up when necessary.

I don't think so, since when a user calls operator--() and our wrapped iterator goes past the original begin, the user deserve the undefined behavior, right? It's like in general calling operator--() on begin().

It also forces you to copy the predicate into the begin and the end in order to be able to move the end iterator.

Great point. Saving one predicate copy sounds awesome. I'll change it back to a forward iterator.

274–277 ↗(On Diff #66156)

Changing iterator_adaptor_base is weird, because forwarding the WrappedIteratorT's typedefs only makes sense in filter_iterator. In filter_iterator, the wrapped iterator has the same element as the filter_iterator has, while in the general iterator_adaptor_base it's not true.

Notice that filter_iterator doesn't require typename T parameter as iterator_adaptor_base does.

317 ↗(On Diff #66156)

It's really hard to let a class friend declaration cope with default template argument. I will leave the local type alias alone, with a better name FilterIteratorT.

dblaikie added inline comments.Aug 1 2016, 3:53 PM
unittests/Support/IteratorTest.cpp
156 ↗(On Diff #66377)

rather than repeating '7' here (the length of the array) you can use llvm::array_lengthof, but probably better than that, just use std::end(a) (and std::begin(a) if you like symmetry). Same goes for other cases of the array length constants in this test file.

dblaikie added inline comments.Aug 1 2016, 3:58 PM
unittests/Support/IteratorTest.cpp
113 ↗(On Diff #66377)

Rather than adding something complex like std::function to the test - might be worth just having a simple stateful function object that exercises whatever specific behavior you're interested in.

137 ↗(On Diff #66377)

we don't usually put 'const' on value typed locals - I'd skip this unless there's a particular need/reason (similar feedback for all const locals)

156 ↗(On Diff #66377)

Probably use make_filter_range here and elsewhere in the tests.

timshen updated this revision to Diff 66398.Aug 1 2016, 4:04 PM

filter_iterator -> filtered_iterator
Use std::begin() and std::end() for arrays.

timshen marked an inline comment as done.Aug 1 2016, 4:05 PM
timshen updated this revision to Diff 66400.Aug 1 2016, 4:19 PM

std::function -> Callable object
s/const auto/auto/

unittests/Support/IteratorTest.cpp
156 ↗(On Diff #66398)

Ehhh... I already used them?

timshen marked 2 inline comments as done.Aug 4 2016, 4:03 PM

Friendly ping :)

anemet added a subscriber: anemet.Aug 6 2016, 4:21 PM
timshen updated this revision to Diff 67251.Aug 8 2016, 5:03 PM

Rebase onto D23217.

dblaikie edited edge metadata.Aug 12 2016, 11:08 AM

One high level question that I'd love others' thoughts on: should this be spelled "filter_iterator" and "make_filter_range" or "filtered_iterator" and "make_filtered_range". For a type, I personally prefer "filtered_iterator". For the factory function, I wonder if we want something more like "reverse", perhaps "filter"?

I'll vote for "filter" here - at least the version that takes an existing range (& perhaps that's the only version we should have - we already have utilities for making ranges out of a pair of iterators, so users with 'legacy' iterator APIs can use that to construct a range to pass to filter).

As for the name of the iterator - I'd be OK making that an unmentionable implementation detail. Most of the time its type is going to include a lambda type anyway, right - and thus be unmentionable.

include/llvm/ADT/STLExtras.h
305 ↗(On Diff #67251)

If you like you can just define the friend inline at the friend declaration:

friend void foo() {
  ...
}
unittests/Support/IteratorTest.cpp
161–163 ↗(On Diff #67251)

The distance tests probably aren't necessary if you're testing for exact equality anyway?

timshen updated this revision to Diff 67880.Aug 12 2016, 12:22 PM
timshen marked an inline comment as done.
timshen edited edge metadata.

Removed std::distance check. Changed name to filter_iterator and make_filter_range.

One high level question that I'd love others' thoughts on: should this be spelled "filter_iterator" and "make_filter_range" or "filtered_iterator" and "make_filtered_range". For a type, I personally prefer "filtered_iterator". For the factory function, I wonder if we want something more like "reverse", perhaps "filter"?

I'll vote for "filter" here - at least the version that takes an existing range (& perhaps that's the only version we should have - we already have utilities for making ranges out of a pair of iterators, so users with 'legacy' iterator APIs can use that to construct a range to pass to filter).

As for the name of the iterator - I'd be OK making that an unmentionable implementation detail. Most of the time its type is going to include a lambda type anyway, right - and thus be unmentionable.

I don't have strong opinion on this. Changed to filter_iterator and make_filter_range.

unittests/Support/IteratorTest.cpp
161–163 ↗(On Diff #67251)

Right. Removed.

dblaikie added inline comments.Aug 12 2016, 12:58 PM
include/llvm/ADT/STLExtras.h
311 ↗(On Diff #67880)

This seems to be the wrong API - the function should accept any range, not just an iterator_range. (ie: you should be able to pass a std::vector to this function directly (or a std::set, SmallVector, DenseMap, etc))

So it should probably take the range as totally generic type T, as an rvalue reference/perfect forwarding.

This then gets into the issue of temporaries - which came up recently in the "zip" thread. It'd be good to resolve this aspect of teh design for both these utilities, before they go in.

(or, if you're in a hurry, just put some big comment saying this will do bad things with temporaries+range-for, for now)

unittests/Support/IteratorTest.cpp
182 ↗(On Diff #67880)

Should just be able to pass "A" here, rather than "make_range(InputIterator(std::begin(A)), InputIterator(std::end(A)))"

Similarly in other test cases. I suppose the Composition test isn't able to do that (though the pointee iterator should have a wrapper utility that works as a range adapter) - but I'm not sure how useful/important that test is anyway. The iterators should be orthogonal, so testing combinations seems like it'd create an unbounded test matrix.

timshen added inline comments.Aug 12 2016, 1:26 PM
include/llvm/ADT/STLExtras.h
305 ↗(On Diff #67880)

As discussed, with an inline friend definition we still need a declaration outside of the class, so I keep the function as it is now.

timshen updated this revision to Diff 67912.Aug 12 2016, 2:28 PM

Changed make_filter_range to take an arbitrary range.

timshen marked an inline comment as done.Aug 12 2016, 2:30 PM
timshen added inline comments.
include/llvm/ADT/STLExtras.h
317 ↗(On Diff #67912)

Added a comment with "FIXME".

unittests/Support/IteratorTest.cpp
182 ↗(On Diff #67912)

For this specific one it needs to be make_range(InputIterator(std::begin(A)), InputIterator(std::end(A))), because otherwise the iterator is random access iterator.

For others I have removed the unnecessary make_range calls.

dblaikie accepted this revision.Aug 12 2016, 2:36 PM
dblaikie edited edge metadata.
dblaikie added inline comments.
include/llvm/ADT/STLExtras.h
255 ↗(On Diff #67912)

Might be worth renaming one of the 'A's to make this a bit easier to read.

321 ↗(On Diff #67912)

I'd suggest renaming this to simply 'filter'. But if you'd rather defer that to some broader discussion, that's OK too.

This revision is now accepted and ready to land.Aug 12 2016, 2:36 PM
timshen updated this revision to Diff 67916.Aug 12 2016, 2:40 PM
timshen marked an inline comment as done.
timshen edited edge metadata.

Change the example comment to be more readable.

timshen added inline comments.Aug 12 2016, 2:42 PM
include/llvm/ADT/STLExtras.h
321 ↗(On Diff #67916)

Yeah we can talk about that later.

One quick argument of not naming it to "filter" is that, it's so general that people may think of other possibilities by looking at the name, e.g. in-place filtering of a container or something.

This revision was automatically updated to reflect the committed changes.