This patch allows iterating typed enum via the ADT/Sequence utility.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
214–230 | I don't think it makes sense to force the caller to decide between seq and enum_seq spellings. If I were implementing this facility, I'd just give iota_range a second template type parameter: template<class V, class U> struct iota_range_iterator { using value_type = V; V operator*() const { return static_cast<V>(value_); } ~~~ private: U value_; }; and then here instantiate iota_range<EnumType, std::underlying_type_t<EnumType>>. template <class T, std::enable_if_t<std::is_integral<T>::value, int> = 0> auto seq(T Begin, T End) { return iota_range_iterator<T, T>(Begin, End); } template <class T, std::enable_if_t<std::is_enum<T>::value, int> = 0> auto seq(T Begin, T End) { using U = std::underlying_type_t<T>; return iota_range_iterator<T, U>(U(Begin), U(End)); } Notice you don't need std::move because these types are integral. | |
llvm/include/llvm/CodeGen/ValueTypes.h | ||
43 ↗ | (On Diff #350610) | Get rid of IterableEnum as I propose above, and then you can get rid of this new overload. Ditto throughout. |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
214–230 | Thx for helping me taking a step back. Now conflating the two orthogonal aspect make the implementation a bit more complex but I think it's worth it. There is an additional design point that you may have missed. seq has the C++ iterator semantic, i.e. the end value is exclusive. I've fixed this by adding another method seq_inclusive but I'm not too happy about the naming. Suggestions welcome. |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
214–230 |
Yuck. I had missed that. seq_inclusive seems like a decent solution to me. If you really wanted to be explicit everywhere, you could have a llvm::halfopen_seq and an llvm::closed_seq... but I assume that's a bad idea because the half-open version is used all over the place and we like the shortness of its name? for (int i : llvm::seq<int>(0, 10)) { /* loop from 0 to 9 */ } for (int i : llvm::seq<int>(1, 10).inclusive()) { /* loop from 1 to 10 */ } Watch out, though: the trivial implementation of .inclusive() would lead to integer overflow in llvm::seq<int>(INT_MIN, INT_MAX).inclusive() and so you'd have to think about what you wanted to do there: make it work, at the cost of complicated implementation? make it runtime-assert-fail? just ignore the potential overflow? find a simple implementation I'm not thinking of off the top of my head? ;) I definitely agree that we do not want seq<E>(x, y) to iterate over one more element than seq<underlying_type_t<E>>(x, y); the enum-ness of the type parameter should affect only the enum-ness of the value_type, definitely not the number of things iterated over. Btw, some enumeration types do have "one-past-the-end" values, as in enum E { Apple, Orange, Banana, MaxFruit = Banana+1 }; for (E fruit : llvm::halfopen_seq<E>(Apple, MaxFruit)) { /* is correct */ } and of course all integer types do have "closed" range forms as well: for (int i : llvm::closed_seq<int>(INT_MIN, INT_MAX)) { /* is correct */ } for (int i : llvm::seq<int>(INT_MIN, INT_MAX).inclusive()) { /* is correct */ } |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
214–230 |
I count about 150 call sites most of them are exclusive so yeah, I'd prefer to keep the most used version short.
I usually like fluent interfaces but it has to be omnipotent (setting a value). The way I implemented it would lead to bugs if client calls llvm::seq<int>(1, 10).inclusive().inclusive().
I'll go with runtime assert.
Yeah they ought to be orthogonal.
Let me add more tests to make sure these cases are handled. |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
123–129 | llvm::detail::StorageType is too generic of a name to land-grab, IMHO. This can/should be an implementation detail of iota_range_iterator — maybe it doesn't even need to be a template parameter. template<class T> struct iota_range_iterator { using storage_type = typename std::conditional_t<std::is_enum<T>::value, type_identity<T>, std::underlying_type<T>>::type; Technically, instantiating underlying_type<T> for non-enum T was UB before C++20. |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
199 | I've just noticed that this is unidiomatic: on normal iterators, --begin() would be UB. (std::reverse_iterator always stores one-past-where-it's-"pointing"-to, for this reason.) I guess it's OK here, but if you've got the time, maybe think about some way to avoid the hiccup here. | |
223 | Phab somehow ate my comment here. |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
123–129 |
Unfortunately no, std::conditional must evaluated both types, and std::underlying_type<T> does not provide the type member when T is not an enum. The standard seems to specify this use case explicitly. I had to provide type resolution via template specialization. I tried many different solutions but I couldn't make it much better. I renamed StorageType to iota_range_iterator_storage_type so at least it's less susceptible to accidental hijacking.
type_identity doesn't help in that case. | |
223 |
Because End may be a scoped enum End + 1 is not always a valid statement. return seq(Begin, static_cast<T>(static_cast<detail::StorageType<T>::type>(End) + 1)); Which, IMHO, is redoing most of the work that has been moved to iota_range_iterator (hence the use of iterators to bump the values) |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
123–129 | I'd mixed up the arguments to conditional_t. Try this: |
- Use std::conditional instead of custom specialization
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
123–129 | oh... my apologies. I should have checked as well. I had convinced myself that the error was std::conditional_t needing to resolve both types ... So I had to provide a C++14 type_identity but it works. |
Please take another look @Quuxplusone .
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
199 | I've added a note as I couldn't find a better implementation. |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
215–229 | I would say "from Begin to End, exclusive of End" — or more colloquially but clearer IMO, "from Begin up to but not including End." |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
223 | Yeah, I guess since End + 1 doesn't work for scoped enums, I'm willing to live with *++result.end(). Or... Maybe iota_range should just take three arguments? Does this work? explicit iota_range_iterator(T Value, bool plus1) : Value(static_cast<U>(Value) + plus1) {} explicit iota_range(T Begin, T End, bool inclusive) : Begin(Begin, false), End(End, inclusive) { assert(Begin <= End); } // seq return iota_range<T>(Begin, End, false); // seq_inclusive return iota_range<T>(Begin, End, true); | |
llvm/unittests/ADT/SequenceTest.cpp | ||
77–84 | I'd also test reverse(seq_inclusive(min, min)). | |
llvm/unittests/IR/ConstantRangeTest.cpp | ||
1554–1556 | You shouldn't need the explicit template argument here. |
- Address comments
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
223 | Begin and End are not iterators but values (BTW I renamed them to avoid confusion), this is needed because of rbegin()/rend() must be constructed from values. Now this can be handled by adding an additional member but then it gets really confusing when creating the reverse iterators: value_type Begin; value_type End; bool Inclusive; explicit iota_range(T Begin, T End, bool Inclusive) : Begin(Begin), End(End), Inclusive(Inclusive) { assert(Begin <= End); } auto begin() const { return const_iterator(Begin, false); } auto end() const { return const_iterator(End, Inclusive); } auto rbegin() const { return const_reverse_iterator(*--end(), false); } auto rend() const { return const_reverse_iterator(*--begin(), false); } // <- false here is confusing | |
llvm/unittests/ADT/SequenceTest.cpp | ||
77–84 | Unfortunately I do because + 1 and - 1 trigger integer promotion which makes both arguments of different types. |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
186–230 | @Quuxplusone when using seq_inclusive with End == std::numeric_limits<T>::max() iteration works fine but iterator comparison is broken.
The current implementation is for 1. but I think 2. is more reasonable. |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
192–201 | Hmm. I suggest { assert(Begin <= End); // This assertion forbids iota_range(0, UINT_MAX, true) // which would otherwise be treated as an empty range, // and iota_range(1, UINT_MAX, true) which would otherwise // report that s.begin() > s.end(). // iota_range(INT_MIN, INT_MAX, true) will have already caused // undefined overflow, but this assertion will likely catch it too. if (Inclusive) assert(End < PastEndValue); } FWIW, I like your renaming of Begin to BeginValue etc. | |
llvm/unittests/ADT/SequenceTest.cpp | ||
77–84 | Ah, I see. (FWIW, I'd bikeshed the names to minp1 and maxm1, but whatever.) |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
192–201 |
The assert didn't trigger in the case of iota_range(INT_MIN, INT_MAX, true) so I used numeric_limits. |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
195–196 | I believe teeechnically this is still UB, because if the enumeration has no specified underlying type, and its enumerator values are e.g. enum E { A=1, B=2, C=3 }, then it's UB to evaluate E(4). (It basically works like a bit-field: you can evaluate E(mask) for any mask that doesn't use bits beyond the highest enumerator's bits.) But in practice I think this is fine. if (Inclusive) assert(End != T(std::numeric_limits<typename iterator::underlying_type>::max())); Although now you've got me confused again about why we're storing BeginValue and PastEndValue as value_type (i.e. E) instead of underlying_type. How does your size() method work when value_type is an enum type? I think data "at rest" should always be stored in the underlying integer type; the only time it should ever be converted to value_type (i.e. E) is when the iterator is being dereferenced. |
- Address comments
- Fix clang-tidy warning
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
195–196 |
Thx! So I'm now moving all the logic in the underlying type world.
Thx a lot for challenging me on this one. You are correct that we can use underlying_type instead of value_type directly into iota_range. |
I probably shouldn't be the only person to accept this PR, but it looks fine to me at this point.
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
28 | Nits: Please add a newline before struct; and consider renaming/reordering the parameters so that T comes before U. I think that should be easy and mechanical (if tedious), because neither T nor U is deduced nor defaulted. | |
199 | Note that BeginValue-1 might be UB; e.g. seq(INT_MIN, 0).rend(). But I think it's fine to ignore this corner case. | |
222 | "End is not allowed to be"... what? | |
llvm/unittests/ADT/SequenceTest.cpp | ||
54 | Nit: I don't see why you bother defining LAST when you could just say E below. |
- Trying a different approach
- rebase and use seq_inclusive in a few places
- rename StorageType to iota_range_iterator_storage_type
- Use std::conditional instead of custom specialization
- Address comments
- Address comments
- Prevent overflow of PastEndValue + tests
- Address comments
- Fix clang-tidy warning
- Fix another clang-tidy warning
- Fix UB in reverse iteration with signed type
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
28 |
clang-format puts it back to the same line :-/
Done. | |
199 |
Fixed it. |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
159 | So you can't iota_range and enum with uint8_t underlying type and 256 values ? That seems a bit artificial. |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
224 | Maybe also note that all enum values are generated, even those that do not appear in the enumerator list ? |
- Added warning on validity of enum range iteration.
- Use uintmax_t/intmax_t types for storage.
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
159 |
Fair enough. I'm slightly concerned about integer promotion/conversion rules but it should be safe (famous last words). |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
159 | Well, now I'm wondering why we don't just use uintptr_t all the time. Why does the storage type ever have to be signed intptr_t? Please do verify that the test failures in the latest run aren't your fault (e.g. by rebasing and letting the tests run again) before committing. | |
217–219 | Maybe "seq will generate each consecutive value, even if no enumerator with that value exists." |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
159 |
So there are enums that explicitly use negative values (e.g. FloatingPointMode.h:44) so I think it's reasonable to allow negative values.
Thx a lot for the time you spent on the review. Highly appreciate it.
Yes. |
llvm/include/llvm/ADT/Sequence.h | ||
---|---|---|
159 | For the record, I was talking about the storage type (raw_type) being unsigned, not the value type (value_type). I certainly agree that the value_type needs to be able to be signed or unsigned, enum or integral. But the reason you ever need anything besides uintptr_t as the raw_type is much subtler and harder to see. |
This is currently blocked on failing tests see https://reviews.llvm.org/D83351#2857797 for reference.
llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp | ||
---|---|---|
87–89 | LGTM, but per @lebedev.ri 's comment in D83351, I think this would benefit from a follow-up patch that turns index_begin and index_end into signed ints. | |
llvm/unittests/ADT/SequenceTest.cpp | ||
118–119 | Pre-existing consistency issue: I see all of the following in llvm/unittests/, and they can't all be right: #if !defined(NDEBUG) && GTEST_HAS_DEATH_TEST // 1 #if defined(GTEST_HAS_DEATH_TEST) && !defined(NDEBUG) // 2 #ifndef NDEBUG // 3 and as a two-liner: #ifdef GTEST_HAS_DEATH_TEST #ifndef NDEBUG // 4 In a vacuum, I'd recommend (2) above, because once the #if condition mentions DEATH, then your comment on line 118 becomes redundant and can be removed. |
llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp | ||
---|---|---|
87–89 | This change NACK. for (unsigned SetIdx : seq<int>(AL.index_begin(), AL.index_end())) { or for (unsigned SetIdx : seq((int)AL.index_begin(), (int)AL.index_end())) { |
llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp | ||
---|---|---|
87–89 | I approve of seq((int)AL.index_begin(), (int)AL.index_end()) but not seq<int>(AL.index_begin(), AL.index_end()) — I don't want people writing explicit template arguments all over the place. |
- Fix DEATH_TEST guard
llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp | ||
---|---|---|
87–89 | Ideally, AttributeList should be changed to preserve iterator semantic, this was the intention of one of the previous author (Read summary of D32811). I went for the very explicit version because using int involves two casts:
See Integer conversions in https://en.cppreference.com/w/c/language/conversion I think it's important to make sure the code is crystal clear and the double cast makes it non trivial to reason about. We can:
WDYT? |
llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp | ||
---|---|---|
87–89 | Since I'm blocked on D105485, I'll submit this one using the idiomatic way of iterating through AttributeList. This is consistent with what I did in D105485 as well. % git grep \.index_begin\( include/llvm/IR/Attributes.h:764: unsigned index_begin() const { return AttributeList::FunctionIndex; } lib/Bitcode/Writer/BitcodeWriter.cpp:836: for (unsigned i = AL.index_begin(), e = AL.index_end(); i != e; ++i) { lib/Bitcode/Writer/ValueEnumerator.cpp:1039: for (unsigned i = PAL.index_begin(), e = PAL.index_end(); i != e; ++i) { lib/IR/Attributes.cpp:1550: for (unsigned i = index_begin(), e = index_end(); i != e; ++i) { lib/Transforms/Utils/FunctionComparator.cpp:113: for (unsigned i = L.index_begin(), e = L.index_end(); i != e; ++i) { |
Nits: Please add a newline before struct; and consider renaming/reordering the parameters so that T comes before U. I think that should be easy and mechanical (if tedious), because neither T nor U is deduced nor defaulted.