This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Add zip_longest iterators.
ClosedPublic

Authored by Meinersbur on Jun 19 2018, 10:39 PM.

Details

Summary

Like the already existing zip_shortest/zip_first iterators, zip_longest iterates over multiple iterators at once, but has as many iterations as the longest sequence.

This means some iterators may reach the end before others do. There are two variants: zip_longest_optional uses llvm::Optional's None value to mark a past-the-end value. zip_longest_default uses the type's default-initialized value for this purpose (which means one cannot distinguish the default value from being in the list and having reached the end). I included both variants in this patch, but we may use only one of the variants.

Neither variant is reverse-iteratable because the tuples iterated over would be different for different length sequences (IMHO for the same reason neither zip_shortest nor zip_first should be reverse-iteratable; one can still reverse the ranges individually if that's the expected behavior).

In contrast to zip_shortest/zip_first, zip_longest tuples contain rvalues instead of references. This is because llvm::Optional cannot contain reference types and the value-initialized default does not have a memory location a reference could point to.

The motivation for these iterators is to use C++ foreach to compare two lists of ordered attributes in D48100 (SemaOverload.cpp and ASTReaderDecl.cpp).

Idea by @hfinkel.

Diff Detail

Repository
rL LLVM

Event Timeline

Meinersbur created this revision.Jun 19 2018, 10:39 PM

Idle thought: Could/should this be composed from other primitives that could be created. Such as an infinitely long (or maybe bounded, not sure) range built from another range and a functor of some kind? (or such a thing specifically tailored for Optional-or-default-T situations) & then zip two of those together & bound that resulting infinitely long range (or bound the input ranges, either way).

Maybe that's overgeneralization, but just an idea to see how others feel about it.

Is the following what you are thinking about?

An zip_infinite begin-iterator, that never reaches the end (maybe with two flavors: zip_infinite_default and zip_infinite_optional) and return a None/value-initialized value for any iterator past the list. Then one can use that with an end iterator, such as end_any (-> zip_shortest) or end_all (-> zip_longest).

Unfortunately, in the zip_infinite_default-case, we cannot distinguish between the default value end the end of the list. We'd have to look into the iterator's list of iterators, which makes this implementation-specific again.

Alternatively, zip_infinite could return tuples of iterators, which end_any/end_all could compare to the end iterators. However, the current implementation of zip_shortest does not need to store the end-iterators for operator++ (because in contrast to the others, it does not need to check whether one of the sequences is past the the end), making the implementation less efficient than it needs to. (template specialization of shortest_iterator<zip_infinite<T>> maybe?)

include/llvm/ADT/STLExtras.h
757–785 ↗(On Diff #152021)

The zip_*_range classes (including zippy) might no be needed: iterator_range could be used as well.

Nah - I was thinking of something where zip wouldn't need any changes,
potentially. One possibility:

  • iterator adapter that takes a (potentially finite) range of T and

produces an infinite range of Optional<T> (with elements beyond the
original range being empty Optionals)

  • same as the above, except potentially finite range of T to an

infinite range of T where the elements beyond the original range are
default constructed

  • A range truncation helper that just does something like:

iterator_range trunc(R r, size_t t) { return {adl_begin(r),
std::next(adl_end(r), t)}; }

  • then you do something like: trunc(zip(inf_opt(R1), inf_opt(R2)),

max(size(R1), size(R2)))

This allows the use of these infinite/padded ranges in other situations
apart from zipping.

But maybe that's niche enough/not worth generalizing over pre-emptively.
I'm not sure. Just a thought.

Getting the size may require iterating the sequence in advance, which I tried to avoid in D48100. The current zip_shortest also does not require a value-initializable type and returns references to elements in the container, both of which would not be possible.

Getting the size may require iterating the sequence in advance, which I tried to avoid in D48100.

Yeah, it's a tradeoff there, to be sure. Not sure if anyone else has thoughts on the design tradeoffs here - would hope other folks might chime in.

The current zip_shortest also does not require a value-initializable type and returns references to elements in the container, both of which would not be possible.

I believe returning references to elements is necessary to implement the iterator concept, unfortunately. This is usually dealt with in cases where there's no underlying sequence, by having the iterator contain a member of the desired type and returning references to that (with the restriction that references and pointers to that element are invalidated when the iterator is incremented - not sure if that requires downgrading the iterator category because of this, I don't think so though (I don't think the categories enshrine the invalidation semantics)).

I believe this requirement comes from the need for "i->x" to be valid when "(*i).x" is valid, and operator-> returns a pointer, so it has to have some storage to back it.

ping. Any other opinions on this?

Getting the size may require iterating the sequence in advance, which I tried to avoid in D48100.

Yeah, it's a tradeoff there, to be sure. Not sure if anyone else has thoughts on the design tradeoffs here - would hope other folks might chime in.

I'd prefer to not have to get the size in advance. The iterator as proposed seems like a cleaner abstraction. @chandlerc, any thoughts on this?

The current zip_shortest also does not require a value-initializable type and returns references to elements in the container, both of which would not be possible.

I believe returning references to elements is necessary to implement the iterator concept, unfortunately. This is usually dealt with in cases where there's no underlying sequence, by having the iterator contain a member of the desired type and returning references to that (with the restriction that references and pointers to that element are invalidated when the iterator is incremented - not sure if that requires downgrading the iterator category because of this, I don't think so though (I don't think the categories enshrine the invalidation semantics)).

I believe this requirement comes from the need for "i->x" to be valid when "(*i).x" is valid, and operator-> returns a pointer, so it has to have some storage to back it.

Hrmm. Can we have an Optional of a reference?

Getting the size may require iterating the sequence in advance, which I tried to avoid in D48100.

Yeah, it's a tradeoff there, to be sure. Not sure if anyone else has thoughts on the design tradeoffs here - would hope other folks might chime in.

IMHO this violates the don't-pay-for-what-you-don't-use principle.

Hrmm. Can we have an Optional of a reference?

Currently, trying to use Optional<int&> results in a compile error (two of Optional's constructors fall together). Note sure how the implementation of Optional can be (whether it makes sense to) changed.

I /think/ this still doesn't meet the iterator requirements (that if "(*i).j" is valid, then "i->j" must be valid too - [input.iterators]p2, Table 95, "a->m is equivalent to (*a).m if a is dereferencable"). I believe the way standard iterators (like istream_iterator) implement this is by having an instance of the value type inside the iterator that has the current value in it, so operator-> can return a pointer to that. Would that be possible/practical?

I /think/ this still doesn't meet the iterator requirements (that if "(*i).j" is valid, then "i->j" must be valid too - [input.iterators]p2, Table 95, "a->m is equivalent to (*a).m if a is dereferencable"). I believe the way standard iterators (like istream_iterator) implement this is by having an instance of the value type inside the iterator that has the current value in it, so operator-> can return a pointer to that. Would that be possible/practical?

The type of *a is a std::tuple of the client-iterator's value types. Storing an internal scratch field inside the the zip-operator would allow operator-> to return a pointer. However, std::tuple has just two methods that could be called: swap and operator=. Both would be useless and operator-> should probably return a const std::tuple* to avoid messing with the internal field anyway. It still would not allow assigning to the tuple's elements.

Experimenting with llvm::zip and llvm::map_iterator shows that they don't meet the requirement either:

const SmallVector<unsigned, 6> pi{3, 1, 4, 1, 5, 9};
auto i = map_iterator(pi.begin(), [](unsigned v) -> std::pair<unsigned,char> {return (v == 3) ? std::make_pair(v,'a') : std::make_pair(v,'\0');} );
(*i).first;
i->first;

error: taking the address of a temporary object of type 'std::pair<unsigned int, char>' [-Waddress-of-temporary]
  PointerT operator->() { return &static_cast<DerivedT *>(this)->operator*(); }
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      SmallVector<unsigned, 6> pi{3, 1, 4, 1, 5, 9};
      auto i = zip(pi,pi).begin();
      std::tuple<const unsigned&,const unsigned&> x{pi[0],pi[1]};
      (*i).operator=(x);
      i->operator=(x);

llvm/ADT/iterator.h:170:34: error: taking the address of a temporary object of type 'llvm::detail::zip_common<llvm::detail::zip_shortest<unsigned int *, unsigned int *>, unsigned int *, unsigned int *>::value_type' (aka 'std::tuple<unsigned int &, unsigned int &>') [-Waddress-of-temporary]
  PointerT operator->() { return &static_cast<DerivedT *>(this)->operator*(); }
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To answer the question. Yes, it it possible to add a scratch std::tuple to zip_longest iterators, but there is no practical gain.

dblaikie accepted this revision.Nov 30 2018, 12:54 PM

I /think/ this still doesn't meet the iterator requirements (that if "(*i).j" is valid, then "i->j" must be valid too - [input.iterators]p2, Table 95, "a->m is equivalent to (*a).m if a is dereferencable"). I believe the way standard iterators (like istream_iterator) implement this is by having an instance of the value type inside the iterator that has the current value in it, so operator-> can return a pointer to that. Would that be possible/practical?

The type of *a is a std::tuple of the client-iterator's value types. Storing an internal scratch field inside the the zip-operator would allow operator-> to return a pointer. However, std::tuple has just two methods that could be called: swap and operator=. Both would be useless and operator-> should probably return a const std::tuple* to avoid messing with the internal field anyway. It still would not allow assigning to the tuple's elements.

Experimenting with llvm::zip and llvm::map_iterator shows that they don't meet the requirement either:

const SmallVector<unsigned, 6> pi{3, 1, 4, 1, 5, 9};
auto i = map_iterator(pi.begin(), [](unsigned v) -> std::pair<unsigned,char> {return (v == 3) ? std::make_pair(v,'a') : std::make_pair(v,'\0');} );
(*i).first;
i->first;

error: taking the address of a temporary object of type 'std::pair<unsigned int, char>' [-Waddress-of-temporary]
  PointerT operator->() { return &static_cast<DerivedT *>(this)->operator*(); }
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      SmallVector<unsigned, 6> pi{3, 1, 4, 1, 5, 9};
      auto i = zip(pi,pi).begin();
      std::tuple<const unsigned&,const unsigned&> x{pi[0],pi[1]};
      (*i).operator=(x);
      i->operator=(x);

llvm/ADT/iterator.h:170:34: error: taking the address of a temporary object of type 'llvm::detail::zip_common<llvm::detail::zip_shortest<unsigned int *, unsigned int *>, unsigned int *, unsigned int *>::value_type' (aka 'std::tuple<unsigned int &, unsigned int &>') [-Waddress-of-temporary]
  PointerT operator->() { return &static_cast<DerivedT *>(this)->operator*(); }
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To answer the question. Yes, it it possible to add a scratch std::tuple to zip_longest iterators, but there is no practical gain.

Fair enough - thanks for the explanation (it should be const and there are no const member functions to use, so there's no "i->u" or "(*i).u" that's valid to support).

This revision is now accepted and ready to land.Nov 30 2018, 12:54 PM
  • Rebase to trunk
  • Fix msvc build

Thanks for accepting. I expected a discussion on whether zip_longest_default or zip_longest_optional is better (or make llvm::Optional support references and use zip_longest_optional with llvm::Optional<T&>). Should I commit both, as-is?

Thanks for accepting. I expected a discussion on whether zip_longest_default or zip_longest_optional is better (or make llvm::Optional support references and use zip_longest_optional with llvm::Optional<T&>). Should I commit both, as-is?

Ah, sure - I'd probably prefer Optional over default (because the Optional makes it strictly more descriptive/non-lossy) & just call it zip_longest. We could have another iterator adapter that collapses optionals to default constructed instances in tuples, for instance... though I guess that'd be a bit single-use anyway.

unittests/ADT/IteratorTest.cpp
464–479 ↗(On Diff #176195)

This testing is a bit convoluted - the bit testing, etc. Could it be done in a more direct manner, perhaps? (either stamping out/loop unrolling each value & testing the values directly against known constants (rather than against each other) or with a separate container containing the expected resulting pairs - and comparing those in a loop?)

(similarly below)

Meinersbur updated this revision to Diff 176486.Dec 3 2018, 2:37 PM
Meinersbur marked an inline comment as done.
  • Rebase
  • Remove zip_longest_default, rename zip_longest_optional to zip_longest
  • Use adl_begin/adl_end instead of std::begin/std::end
  • Deconvolute zipped_and_filtered test
Meinersbur marked an inline comment as done.Dec 3 2018, 2:39 PM
Meinersbur added inline comments.
unittests/ADT/IteratorTest.cpp
464–479 ↗(On Diff #176195)

This was the same patter as ZipIteratorTest above. Changed to use an explicit expected array.

dblaikie added inline comments.Dec 3 2018, 2:46 PM
unittests/ADT/IteratorTest.cpp
331 ↗(On Diff #176486)

I'd probably do this test the way you wrote the other test - two vectors of different lengths of interesting things, then a vector of pairs with the hardcoded pairs of optionals in it - then loop and compare each one? Rather than the bit fiddling, counting, etc?

429 ↗(On Diff #176486)

I probably wouldn't include this test unless there's some really specific relationship between zip and filter - it's not practical to test all the combinations of iterators with each other, etc - so only the specific interface of each one should generally be tested, I think?

Meinersbur updated this revision to Diff 176492.Dec 3 2018, 3:10 PM
  • Remove ZipLongestFilter tests
  • Deconvolute ZipLongestBasic test
dblaikie accepted this revision.Dec 3 2018, 3:13 PM
dblaikie added inline comments.
unittests/ADT/IteratorTest.cpp
333–334 ↗(On Diff #176492)

Wonder if it's worth testing with two different (& incompatible) types - like const char* in the second vector, for instance?

Up to you.

Meinersbur updated this revision to Diff 176507.Dec 3 2018, 4:13 PM
Meinersbur marked 2 inline comments as done.
  • Use StringRef for one side of zip_longest test case
  • Use size_t to avoid signed/unsigned comparison compiler warning
Meinersbur marked 2 inline comments as done.Dec 3 2018, 4:15 PM
Meinersbur added inline comments.
unittests/ADT/IteratorTest.cpp
333–334 ↗(On Diff #176492)

Using StringRef which has an overloaded comparison operator. I get an ambiguity error if just using a literal string in the initializer_list of Optional<StringRef>, so there is an explicit conversion.

dblaikie added inline comments.Dec 3 2018, 4:27 PM
unittests/ADT/IteratorTest.cpp
333–334 ↗(On Diff #176492)

Ah, cool - you might be able to get away with {...} instead of StringRef(...) if you like, but yeah, either's fine - thanks!

Meinersbur updated this revision to Diff 176513.Dec 3 2018, 4:34 PM
Meinersbur marked an inline comment as done.
  • Use brace initialization for Optional<StringRef>
Meinersbur marked 2 inline comments as done.Dec 3 2018, 4:35 PM
Meinersbur added inline comments.
unittests/ADT/IteratorTest.cpp
333–334 ↗(On Diff #176492)

That worked. Thanks, looks nice like this.

Meinersbur updated this revision to Diff 176516.Dec 3 2018, 4:42 PM
Meinersbur marked an inline comment as done.
  • Rename ZipLongestOptionalValueType to ZipLongestValueType
  • Remove unused ZipLongestDefaultValueType
This revision was automatically updated to reflect the committed changes.