Page MenuHomePhabricator

[llvm] Add enum iteration to Sequence
ClosedPublic

Authored by gchatelet on Jun 8 2021, 8:30 AM.

Details

Summary

This patch allows iterating typed enum via the ADT/Sequence utility.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Quuxplusone added inline comments.Jun 16 2021, 12:28 PM
llvm/include/llvm/ADT/Sequence.h
223

Yeah, I guess since End + 1 doesn't work for scoped enums, I'm willing to live with *++result.end().

Or... Maybe iota_range should just take three arguments? Does this work?

explicit iota_range_iterator(T Value, bool plus1)
    : Value(static_cast<U>(Value) + plus1) {}

explicit iota_range(T Begin, T End, bool inclusive)
    : Begin(Begin, false), End(End, inclusive) {
  assert(Begin <= End);
}

// seq
return iota_range<T>(Begin, End, false);

// seq_inclusive
return iota_range<T>(Begin, End, true);
llvm/unittests/ADT/SequenceTest.cpp
77–84

I'd also test reverse(seq_inclusive(min, min)).
You shouldn't need the <T> anywhere on lines 72-84.

llvm/unittests/IR/ConstantRangeTest.cpp
1554–1556

You shouldn't need the explicit template argument here.

gchatelet updated this revision to Diff 352655.Jun 17 2021, 2:47 AM
gchatelet marked 4 inline comments as done.
  • Address comments
llvm/include/llvm/ADT/Sequence.h
223

Begin and End are not iterators but values (BTW I renamed them to avoid confusion), this is needed because of rbegin()/rend() must be constructed from values.

Now this can be handled by adding an additional member but then it gets really confusing when creating the reverse iterators:

value_type Begin;
value_type End;
bool Inclusive;

explicit iota_range(T Begin, T End, bool Inclusive)
    : Begin(Begin), End(End), Inclusive(Inclusive) {
  assert(Begin <= End);
}

auto begin() const { return const_iterator(Begin, false); }
auto end() const { return const_iterator(End, Inclusive); }

auto rbegin() const { return const_reverse_iterator(*--end(), false); }
auto rend() const { return const_reverse_iterator(*--begin(), false); } // <- false here is confusing
llvm/unittests/ADT/SequenceTest.cpp
77–84

Unfortunately I do because + 1 and - 1 trigger integer promotion which makes both arguments of different types.
I can introduce variables though.

gchatelet added inline comments.Jun 17 2021, 2:52 AM
llvm/include/llvm/ADT/Sequence.h
186–230

@Quuxplusone when using seq_inclusive with End == std::numeric_limits<T>::max() iteration works fine but iterator comparison is broken.
There are two possibilities here:

  1. Leave it as-is and document the broken behavior while allowing inclusive End being INT_MAX
  2. assert when using seq_inclusive and End == std::numeric_limits<T>::max(), effectively disabling iteration up to - and including - INT_MAX.

The current implementation is for 1. but I think 2. is more reasonable.
What do you think?

Quuxplusone added inline comments.Jun 17 2021, 8:22 AM
llvm/include/llvm/ADT/Sequence.h
192–201

Hmm. I suggest

{
  assert(Begin <= End);
  // This assertion forbids iota_range(0, UINT_MAX, true)
  // which would otherwise be treated as an empty range,
  // and iota_range(1, UINT_MAX, true) which would otherwise
  // report that s.begin() > s.end().
  // iota_range(INT_MIN, INT_MAX, true) will have already caused
  // undefined overflow, but this assertion will likely catch it too.
  if (Inclusive)
    assert(End < PastEndValue);
}

FWIW, I like your renaming of Begin to BeginValue etc.

llvm/unittests/ADT/SequenceTest.cpp
77–84

Ah, I see. (FWIW, I'd bikeshed the names to minp1 and maxm1, but whatever.)

gchatelet updated this revision to Diff 352981.Jun 18 2021, 5:36 AM
  • rebase + prevent overflow of PastEndValue + tests
gchatelet marked 2 inline comments as done.Jun 18 2021, 5:38 AM
gchatelet added inline comments.
llvm/include/llvm/ADT/Sequence.h
192–201

Hmm. I suggest

{
  assert(Begin <= End);
  // This assertion forbids iota_range(0, UINT_MAX, true)
  // which would otherwise be treated as an empty range,
  // and iota_range(1, UINT_MAX, true) which would otherwise
  // report that s.begin() > s.end().
  // iota_range(INT_MIN, INT_MAX, true) will have already caused
  // undefined overflow, but this assertion will likely catch it too.
  if (Inclusive)
    assert(End < PastEndValue);
}

The assert didn't trigger in the case of iota_range(INT_MIN, INT_MAX, true) so I used numeric_limits.
Tests are passing.

Quuxplusone added inline comments.Jun 18 2021, 9:55 AM
llvm/include/llvm/ADT/Sequence.h
195–196

I believe teeechnically this is still UB, because if the enumeration has no specified underlying type, and its enumerator values are e.g. enum E { A=1, B=2, C=3 }, then it's UB to evaluate E(4). (It basically works like a bit-field: you can evaluate E(mask) for any mask that doesn't use bits beyond the highest enumerator's bits.) But in practice I think this is fine.
For readability, I would still prefer to see it with an if:

if (Inclusive)
  assert(End != T(std::numeric_limits<typename iterator::underlying_type>::max()));

Although now you've got me confused again about why we're storing BeginValue and PastEndValue as value_type (i.e. E) instead of underlying_type. How does your size() method work when value_type is an enum type? I think data "at rest" should always be stored in the underlying integer type; the only time it should ever be converted to value_type (i.e. E) is when the iterator is being dereferenced.

gchatelet updated this revision to Diff 353791.Jun 22 2021, 2:27 PM
gchatelet marked an inline comment as done.
  • Address comments
  • Fix clang-tidy warning
llvm/include/llvm/ADT/Sequence.h
195–196

I believe teeechnically this is still UB, because if the enumeration has no specified underlying type, and its enumerator values are e.g. enum E { A=1, B=2, C=3 }, then it's UB to evaluate E(4). (It basically works like a bit-field: you can evaluate E(mask) for any mask that doesn't use bits beyond the highest enumerator's bits.) But in practice I think this is fine.

Thx! So I'm now moving all the logic in the underlying type world.
Technically End + 1 is implementation defined for signed types but it's guarded by the assertion so it's a logical error before being implementation defined.

Although now you've got me confused again about why we're storing BeginValue and PastEndValue as value_type (i.e. E) instead of underlying_type. How does your size() method work when value_type is an enum type? I think data "at rest" should always be stored in the underlying integer type; the only time it should ever be converted to value_type (i.e. E) is when the iterator is being dereferenced.

Thx a lot for challenging me on this one. You are correct that we can use underlying_type instead of value_type directly into iota_range.
I've moved things around: I think it's much better now. Let me know if you think this can be improved further.

gchatelet updated this revision to Diff 353793.Jun 22 2021, 2:28 PM
gchatelet marked an inline comment as done.
  • Fix another clang-tidy warning
Quuxplusone accepted this revision.Jun 22 2021, 2:42 PM

I probably shouldn't be the only person to accept this PR, but it looks fine to me at this point.

llvm/include/llvm/ADT/Sequence.h
28

Nits: Please add a newline before struct; and consider renaming/reordering the parameters so that T comes before U. I think that should be easy and mechanical (if tedious), because neither T nor U is deduced nor defaulted.

199

Note that BeginValue-1 might be UB; e.g. seq(INT_MIN, 0).rend(). But I think it's fine to ignore this corner case.

222

"End is not allowed to be"... what?

llvm/unittests/ADT/SequenceTest.cpp
54

Nit: I don't see why you bother defining LAST when you could just say E below.

This revision is now accepted and ready to land.Jun 22 2021, 2:42 PM
gchatelet updated this revision to Diff 353885.Jun 23 2021, 1:45 AM
gchatelet marked 3 inline comments as done.
  • Trying a different approach
  • rebase and use seq_inclusive in a few places
  • rename StorageType to iota_range_iterator_storage_type
  • Use std::conditional instead of custom specialization
  • Address comments
  • Address comments
  • Prevent overflow of PastEndValue + tests
  • Address comments
  • Fix clang-tidy warning
  • Fix another clang-tidy warning
  • Fix UB in reverse iteration with signed type
gchatelet marked an inline comment as done.Jun 23 2021, 1:46 AM
gchatelet added inline comments.
llvm/include/llvm/ADT/Sequence.h
28

Nits: Please add a newline before struct;

clang-format puts it back to the same line :-/

and consider renaming/reordering the parameters so that T comes before U. I think that should be easy and mechanical (if tedious), because neither T nor U is deduced nor defaulted.

Done.

199

Note that BeginValue-1 might be UB; e.g. seq(INT_MIN, 0).rend(). But I think it's fine to ignore this corner case.

Fixed it.

courbet added inline comments.Jun 23 2021, 2:12 AM
llvm/include/llvm/ADT/Sequence.h
159

So you can't iota_range and enum with uint8_t underlying type and 256 values ? That seems a bit artificial.
Maybe we could always use intmax_t (resp. uintmax_t for unsigned underlying types) as raw_type ?

courbet added inline comments.Jun 23 2021, 2:31 AM
llvm/include/llvm/ADT/Sequence.h
224

Maybe also note that all enum values are generated, even those that do not appear in the enumerator list ?

gchatelet updated this revision to Diff 353930.Jun 23 2021, 4:38 AM
gchatelet marked 2 inline comments as done.
  • Added warning on validity of enum range iteration.
  • Use uintmax_t/intmax_t types for storage.
gchatelet marked an inline comment as done.Jun 23 2021, 4:39 AM
gchatelet added inline comments.
llvm/include/llvm/ADT/Sequence.h
159

So you can't iota_range and enum with uint8_t underlying type and 256 values ? That seems a bit artificial.
Maybe we could always use intmax_t (resp. uintmax_t for unsigned underlying types) as raw_type ?

Fair enough. I'm slightly concerned about integer promotion/conversion rules but it should be safe (famous last words).

gchatelet updated this revision to Diff 353936.Jun 23 2021, 5:36 AM
gchatelet marked an inline comment as done.
  • rebase and squash commits
courbet accepted this revision.Jun 23 2021, 6:59 AM

@Quuxplusone are you still OK with the last changes? If so I think I can submit.

Quuxplusone added inline comments.Jun 23 2021, 4:44 PM
llvm/include/llvm/ADT/Sequence.h
159

Well, now I'm wondering why we don't just use uintptr_t all the time. Why does the storage type ever have to be signed intptr_t?
I think the rationale has to do with ensuring that r.begin() < r.end(); but maybe that just indicates that Op::difference is computing its return value via the wrong expression.
But anyway, I continue to be uninterested in blocking this at this point. :)

Please do verify that the test failures in the latest run aren't your fault (e.g. by rebasing and letting the tests run again) before committing.

217–219

Maybe "seq will generate each consecutive value, even if no enumerator with that value exists."

gchatelet updated this revision to Diff 354239.Jun 24 2021, 6:42 AM
  • Updated enum documentation
gchatelet marked an inline comment as done.Jun 24 2021, 6:42 AM
gchatelet marked an inline comment as done.Jun 24 2021, 6:51 AM
gchatelet added inline comments.
llvm/include/llvm/ADT/Sequence.h
159

Well, now I'm wondering why we don't just use uintptr_t all the time. Why does the storage type ever have to be signed intptr_t?

So there are enums that explicitly use negative values (e.g. FloatingPointMode.h:44) so I think it's reasonable to allow negative values.
Also, I think it's a valid use case to use seq with negative values as well (e.g. seq<int>(-10, 0)).

I think the rationale has to do with ensuring that r.begin() < r.end(); but maybe that just indicates that Op::difference is computing its return value via the wrong expression.
But anyway, I continue to be uninterested in blocking this at this point. :)

Thx a lot for the time you spent on the review. Highly appreciate it.

Please do verify that the test failures in the latest run aren't your fault (e.g. by rebasing and letting the tests run again) before committing.

Yes.

Quuxplusone added inline comments.Jun 24 2021, 9:14 AM
llvm/include/llvm/ADT/Sequence.h
159

For the record, I was talking about the storage type (raw_type) being unsigned, not the value type (value_type). I certainly agree that the value_type needs to be able to be signed or unsigned, enum or integral. But the reason you ever need anything besides uintptr_t as the raw_type is much subtler and harder to see.

gchatelet marked an inline comment as done.

This is currently blocked on failing tests see https://reviews.llvm.org/D83351#2857797 for reference.

gchatelet updated this revision to Diff 356499.Mon, Jul 5, 7:04 AM
  • Fix iteration over AttributeList
Quuxplusone added inline comments.
llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp
87–89

LGTM, but per @lebedev.ri 's comment in D83351, I think this would benefit from a follow-up patch that turns index_begin and index_end into signed ints.
Style nit: Personally I'd rather see "unsigned(-1)" and "-1, 0, 1...", than "unsigned(~0)" and "~0, 0, 1...".

llvm/unittests/ADT/SequenceTest.cpp
118–119

Pre-existing consistency issue: I see all of the following in llvm/unittests/, and they can't all be right:

#if !defined(NDEBUG) && GTEST_HAS_DEATH_TEST  // 1
#if defined(GTEST_HAS_DEATH_TEST) && !defined(NDEBUG)  // 2
#ifndef NDEBUG  // 3

and as a two-liner:

#ifdef GTEST_HAS_DEATH_TEST
#ifndef NDEBUG  // 4

In a vacuum, I'd recommend (2) above, because once the #if condition mentions DEATH, then your comment on line 118 becomes redundant and can be removed.

lebedev.ri added inline comments.Mon, Jul 5, 7:23 AM
llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp
87–89

This change NACK.
It should probably be something like

for (unsigned SetIdx : seq<int>(AL.index_begin(), AL.index_end())) {

or

for (unsigned SetIdx : seq((int)AL.index_begin(), (int)AL.index_end())) {
Quuxplusone added inline comments.Mon, Jul 5, 7:42 AM
llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp
87–89

I approve of seq((int)AL.index_begin(), (int)AL.index_end()) but not seq<int>(AL.index_begin(), AL.index_end()) — I don't want people writing explicit template arguments all over the place.
Regardless, this would still benefit from a followup patch to actually change the type of index_begin so that we don't have to cast it on every use.

gchatelet updated this revision to Diff 356507.Mon, Jul 5, 7:46 AM
gchatelet marked 2 inline comments as done.
  • Fix DEATH_TEST guard
llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp
87–89

Ideally, AttributeList should be changed to preserve iterator semantic, this was the intention of one of the previous author (Read summary of D32811).
Unfortunately it seems a bit more work than just renaming the enum.

I went for the very explicit version because using int involves two casts:

  • two from unsigned to signed which are implementation defined and thus non standard.
  • one from signed to unsigned which is well defined and uses modulo 2 arithmetic.

See Integer conversions in https://en.cppreference.com/w/c/language/conversion

I think it's important to make sure the code is crystal clear and the double cast makes it non trivial to reason about.

We can:

  1. Use for (unsigned SetIdx : seq<int>(AL.index_begin(), AL.index_end())) although I think it's implementation defined and a fix for something that is broken elsewhere.
  2. Write the loop explicitly explaining why we have to do it this way, allegedly not fixing the real problem either.
  3. Invest extra time to fix the issue in AttributeList so it respects begin<=end.

WDYT?

FYI I'm fixing AttributeList indexing in D105485.

gchatelet updated this revision to Diff 358297.Tue, Jul 13, 9:21 AM
  • rebase and use the same iterating scheme as in D105485.
This revision was landed with ongoing or failed builds.Tue, Jul 13, 9:22 AM
This revision was automatically updated to reflect the committed changes.
gchatelet added inline comments.Tue, Jul 13, 9:22 AM
llvm/tools/llvm-reduce/deltas/ReduceAttributes.cpp
87–89

Since I'm blocked on D105485, I'll submit this one using the idiomatic way of iterating through AttributeList. This is consistent with what I did in D105485 as well.

 % git grep \.index_begin\(
include/llvm/IR/Attributes.h:764:  unsigned index_begin() const { return AttributeList::FunctionIndex; }
lib/Bitcode/Writer/BitcodeWriter.cpp:836:    for (unsigned i = AL.index_begin(), e = AL.index_end(); i != e; ++i) {
lib/Bitcode/Writer/ValueEnumerator.cpp:1039:  for (unsigned i = PAL.index_begin(), e = PAL.index_end(); i != e; ++i) {
lib/IR/Attributes.cpp:1550:  for (unsigned i = index_begin(), e = index_end(); i != e; ++i) {
lib/Transforms/Utils/FunctionComparator.cpp:113:  for (unsigned i = L.index_begin(), e = L.index_end(); i != e; ++i) {

This breaks check-llvm on arm macs: http://45.33.8.238/macm1/13855/step_11.txt

Thx for reporting. I'll revert the patch asap.