This patch allows iterating typed enum via the ADT/Sequence utility.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 214–230 | I don't think it makes sense to force the caller to decide between seq and enum_seq spellings. If I were implementing this facility, I'd just give iota_range a second template type parameter: template<class V, class U>
struct iota_range_iterator {
using value_type = V;
V operator*() const { return static_cast<V>(value_); }
~~~
private:
U value_;
};and then here instantiate iota_range<EnumType, std::underlying_type_t<EnumType>>. template <class T, std::enable_if_t<std::is_integral<T>::value, int> = 0>
auto seq(T Begin, T End) {
return iota_range_iterator<T, T>(Begin, End);
}
template <class T, std::enable_if_t<std::is_enum<T>::value, int> = 0>
auto seq(T Begin, T End) {
using U = std::underlying_type_t<T>;
return iota_range_iterator<T, U>(U(Begin), U(End));
}Notice you don't need std::move because these types are integral. | |
| llvm/include/llvm/CodeGen/ValueTypes.h | ||
| 43 ↗ | (On Diff #350610) | Get rid of IterableEnum as I propose above, and then you can get rid of this new overload. Ditto throughout. |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 214–230 | Thx for helping me taking a step back. Now conflating the two orthogonal aspect make the implementation a bit more complex but I think it's worth it. There is an additional design point that you may have missed. seq has the C++ iterator semantic, i.e. the end value is exclusive. I've fixed this by adding another method seq_inclusive but I'm not too happy about the naming. Suggestions welcome. | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 214–230 |
Yuck. I had missed that. seq_inclusive seems like a decent solution to me. If you really wanted to be explicit everywhere, you could have a llvm::halfopen_seq and an llvm::closed_seq... but I assume that's a bad idea because the half-open version is used all over the place and we like the shortness of its name? for (int i : llvm::seq<int>(0, 10)) { /* loop from 0 to 9 */ }
for (int i : llvm::seq<int>(1, 10).inclusive()) { /* loop from 1 to 10 */ }Watch out, though: the trivial implementation of .inclusive() would lead to integer overflow in llvm::seq<int>(INT_MIN, INT_MAX).inclusive() and so you'd have to think about what you wanted to do there: make it work, at the cost of complicated implementation? make it runtime-assert-fail? just ignore the potential overflow? find a simple implementation I'm not thinking of off the top of my head? ;) I definitely agree that we do not want seq<E>(x, y) to iterate over one more element than seq<underlying_type_t<E>>(x, y); the enum-ness of the type parameter should affect only the enum-ness of the value_type, definitely not the number of things iterated over. Btw, some enumeration types do have "one-past-the-end" values, as in enum E { Apple, Orange, Banana, MaxFruit = Banana+1 };
for (E fruit : llvm::halfopen_seq<E>(Apple, MaxFruit)) { /* is correct */ }and of course all integer types do have "closed" range forms as well: for (int i : llvm::closed_seq<int>(INT_MIN, INT_MAX)) { /* is correct */ }
for (int i : llvm::seq<int>(INT_MIN, INT_MAX).inclusive()) { /* is correct */ } | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 214–230 |
I count about 150 call sites most of them are exclusive so yeah, I'd prefer to keep the most used version short.
I usually like fluent interfaces but it has to be omnipotent (setting a value). The way I implemented it would lead to bugs if client calls llvm::seq<int>(1, 10).inclusive().inclusive().
I'll go with runtime assert.
Yeah they ought to be orthogonal.
Let me add more tests to make sure these cases are handled. | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 123–129 | llvm::detail::StorageType is too generic of a name to land-grab, IMHO. This can/should be an implementation detail of iota_range_iterator — maybe it doesn't even need to be a template parameter. template<class T>
struct iota_range_iterator {
using storage_type = typename std::conditional_t<std::is_enum<T>::value,
type_identity<T>, std::underlying_type<T>>::type;Technically, instantiating underlying_type<T> for non-enum T was UB before C++20. | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 199 | I've just noticed that this is unidiomatic: on normal iterators, --begin() would be UB. (std::reverse_iterator always stores one-past-where-it's-"pointing"-to, for this reason.) I guess it's OK here, but if you've got the time, maybe think about some way to avoid the hiccup here. | |
| 223 | Phab somehow ate my comment here. | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 123–129 |
Unfortunately no, std::conditional must evaluated both types, and std::underlying_type<T> does not provide the type member when T is not an enum. The standard seems to specify this use case explicitly. I had to provide type resolution via template specialization. I tried many different solutions but I couldn't make it much better. I renamed StorageType to iota_range_iterator_storage_type so at least it's less susceptible to accidental hijacking.
type_identity doesn't help in that case. | |
| 223 |
Because End may be a scoped enum End + 1 is not always a valid statement. return seq(Begin, static_cast<T>(static_cast<detail::StorageType<T>::type>(End) + 1)); Which, IMHO, is redoing most of the work that has been moved to iota_range_iterator (hence the use of iterators to bump the values) | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 123–129 | I'd mixed up the arguments to conditional_t. Try this: | |
- Use std::conditional instead of custom specialization
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 123–129 | oh... my apologies. I should have checked as well. I had convinced myself that the error was std::conditional_t needing to resolve both types ... So I had to provide a C++14 type_identity but it works. | |
Please take another look @Quuxplusone .
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 199 | I've added a note as I couldn't find a better implementation. | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 215–229 | I would say "from Begin to End, exclusive of End" — or more colloquially but clearer IMO, "from Begin up to but not including End." | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 223 | Yeah, I guess since End + 1 doesn't work for scoped enums, I'm willing to live with *++result.end(). Or... Maybe iota_range should just take three arguments? Does this work? explicit iota_range_iterator(T Value, bool plus1)
: Value(static_cast<U>(Value) + plus1) {}
explicit iota_range(T Begin, T End, bool inclusive)
: Begin(Begin, false), End(End, inclusive) {
assert(Begin <= End);
}
// seq
return iota_range<T>(Begin, End, false);
// seq_inclusive
return iota_range<T>(Begin, End, true); | |
| llvm/unittests/ADT/SequenceTest.cpp | ||
| 77–84 | I'd also test reverse(seq_inclusive(min, min)). | |
| llvm/unittests/IR/ConstantRangeTest.cpp | ||
| 1554–1556 | You shouldn't need the explicit template argument here. | |
- Address comments
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 223 | Begin and End are not iterators but values (BTW I renamed them to avoid confusion), this is needed because of rbegin()/rend() must be constructed from values. Now this can be handled by adding an additional member but then it gets really confusing when creating the reverse iterators: value_type Begin;
value_type End;
bool Inclusive;
explicit iota_range(T Begin, T End, bool Inclusive)
: Begin(Begin), End(End), Inclusive(Inclusive) {
assert(Begin <= End);
}
auto begin() const { return const_iterator(Begin, false); }
auto end() const { return const_iterator(End, Inclusive); }
auto rbegin() const { return const_reverse_iterator(*--end(), false); }
auto rend() const { return const_reverse_iterator(*--begin(), false); } // <- false here is confusing | |
| llvm/unittests/ADT/SequenceTest.cpp | ||
| 77–84 | Unfortunately I do because + 1 and - 1 trigger integer promotion which makes both arguments of different types. | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 186–230 | @Quuxplusone when using seq_inclusive with End == std::numeric_limits<T>::max() iteration works fine but iterator comparison is broken.
The current implementation is for 1. but I think 2. is more reasonable. | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 192–201 | Hmm. I suggest {
assert(Begin <= End);
// This assertion forbids iota_range(0, UINT_MAX, true)
// which would otherwise be treated as an empty range,
// and iota_range(1, UINT_MAX, true) which would otherwise
// report that s.begin() > s.end().
// iota_range(INT_MIN, INT_MAX, true) will have already caused
// undefined overflow, but this assertion will likely catch it too.
if (Inclusive)
assert(End < PastEndValue);
}FWIW, I like your renaming of Begin to BeginValue etc. | |
| llvm/unittests/ADT/SequenceTest.cpp | ||
| 77–84 | Ah, I see. (FWIW, I'd bikeshed the names to minp1 and maxm1, but whatever.) | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 192–201 |
The assert didn't trigger in the case of iota_range(INT_MIN, INT_MAX, true) so I used numeric_limits. | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 195–196 | I believe teeechnically this is still UB, because if the enumeration has no specified underlying type, and its enumerator values are e.g. enum E { A=1, B=2, C=3 }, then it's UB to evaluate E(4). (It basically works like a bit-field: you can evaluate E(mask) for any mask that doesn't use bits beyond the highest enumerator's bits.) But in practice I think this is fine. if (Inclusive) assert(End != T(std::numeric_limits<typename iterator::underlying_type>::max())); Although now you've got me confused again about why we're storing BeginValue and PastEndValue as value_type (i.e. E) instead of underlying_type. How does your size() method work when value_type is an enum type? I think data "at rest" should always be stored in the underlying integer type; the only time it should ever be converted to value_type (i.e. E) is when the iterator is being dereferenced. | |
- Address comments
- Fix clang-tidy warning
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 195–196 |
Thx! So I'm now moving all the logic in the underlying type world.
Thx a lot for challenging me on this one. You are correct that we can use underlying_type instead of value_type directly into iota_range. | |
I probably shouldn't be the only person to accept this PR, but it looks fine to me at this point.
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 28 | Nits: Please add a newline before struct; and consider renaming/reordering the parameters so that T comes before U. I think that should be easy and mechanical (if tedious), because neither T nor U is deduced nor defaulted. | |
| 199 | Note that BeginValue-1 might be UB; e.g. seq(INT_MIN, 0).rend(). But I think it's fine to ignore this corner case. | |
| 222 | "End is not allowed to be"... what? | |
| llvm/unittests/ADT/SequenceTest.cpp | ||
| 54 | Nit: I don't see why you bother defining LAST when you could just say E below. | |
- Trying a different approach
- rebase and use seq_inclusive in a few places
- rename StorageType to iota_range_iterator_storage_type
- Use std::conditional instead of custom specialization
- Address comments
- Address comments
- Prevent overflow of PastEndValue + tests
- Address comments
- Fix clang-tidy warning
- Fix another clang-tidy warning
- Fix UB in reverse iteration with signed type
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 28 |
clang-format puts it back to the same line :-/
Done. | |
| 199 |
Fixed it. | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 159 | So you can't iota_range and enum with uint8_t underlying type and 256 values ? That seems a bit artificial. | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 224 | Maybe also note that all enum values are generated, even those that do not appear in the enumerator list ? | |
- Added warning on validity of enum range iteration.
- Use uintmax_t/intmax_t types for storage.
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 159 |
Fair enough. I'm slightly concerned about integer promotion/conversion rules but it should be safe (famous last words). | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 159 | Well, now I'm wondering why we don't just use uintptr_t all the time. Why does the storage type ever have to be signed intptr_t? Please do verify that the test failures in the latest run aren't your fault (e.g. by rebasing and letting the tests run again) before committing. | |
| 217–219 | Maybe "seq will generate each consecutive value, even if no enumerator with that value exists." | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 159 |
So there are enums that explicitly use negative values (e.g. FloatingPointMode.h:44) so I think it's reasonable to allow negative values.
Thx a lot for the time you spent on the review. Highly appreciate it.
Yes. | |
| llvm/include/llvm/ADT/Sequence.h | ||
|---|---|---|
| 159 | For the record, I was talking about the storage type (raw_type) being unsigned, not the value type (value_type). I certainly agree that the value_type needs to be able to be signed or unsigned, enum or integral. But the reason you ever need anything besides uintptr_t as the raw_type is much subtler and harder to see. | |
This is currently blocked on failing tests see https://reviews.llvm.org/D83351#2857797 for reference.
| llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp | ||
|---|---|---|
| 87–89 | LGTM, but per @lebedev.ri 's comment in D83351, I think this would benefit from a follow-up patch that turns index_begin and index_end into signed ints. | |
| llvm/unittests/ADT/SequenceTest.cpp | ||
| 118–119 | Pre-existing consistency issue: I see all of the following in llvm/unittests/, and they can't all be right: #if !defined(NDEBUG) && GTEST_HAS_DEATH_TEST // 1 #if defined(GTEST_HAS_DEATH_TEST) && !defined(NDEBUG) // 2 #ifndef NDEBUG // 3 and as a two-liner: #ifdef GTEST_HAS_DEATH_TEST #ifndef NDEBUG // 4 In a vacuum, I'd recommend (2) above, because once the #if condition mentions DEATH, then your comment on line 118 becomes redundant and can be removed. | |
| llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp | ||
|---|---|---|
| 87–89 | This change NACK. for (unsigned SetIdx : seq<int>(AL.index_begin(), AL.index_end())) {or for (unsigned SetIdx : seq((int)AL.index_begin(), (int)AL.index_end())) { | |
| llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp | ||
|---|---|---|
| 87–89 | I approve of seq((int)AL.index_begin(), (int)AL.index_end()) but not seq<int>(AL.index_begin(), AL.index_end()) — I don't want people writing explicit template arguments all over the place. | |
- Fix DEATH_TEST guard
| llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp | ||
|---|---|---|
| 87–89 | Ideally, AttributeList should be changed to preserve iterator semantic, this was the intention of one of the previous author (Read summary of D32811). I went for the very explicit version because using int involves two casts:
See Integer conversions in https://en.cppreference.com/w/c/language/conversion I think it's important to make sure the code is crystal clear and the double cast makes it non trivial to reason about. We can:
WDYT? | |
| llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp | ||
|---|---|---|
| 87–89 | Since I'm blocked on D105485, I'll submit this one using the idiomatic way of iterating through AttributeList. This is consistent with what I did in D105485 as well. % git grep \.index_begin\(
include/llvm/IR/Attributes.h:764: unsigned index_begin() const { return AttributeList::FunctionIndex; }
lib/Bitcode/Writer/BitcodeWriter.cpp:836: for (unsigned i = AL.index_begin(), e = AL.index_end(); i != e; ++i) {
lib/Bitcode/Writer/ValueEnumerator.cpp:1039: for (unsigned i = PAL.index_begin(), e = PAL.index_end(); i != e; ++i) {
lib/IR/Attributes.cpp:1550: for (unsigned i = index_begin(), e = index_end(); i != e; ++i) {
lib/Transforms/Utils/FunctionComparator.cpp:113: for (unsigned i = L.index_begin(), e = L.index_end(); i != e; ++i) { | |
Nits: Please add a newline before struct; and consider renaming/reordering the parameters so that T comes before U. I think that should be easy and mechanical (if tedious), because neither T nor U is deduced nor defaulted.