This is an archive of the discontinued LLVM Phabricator instance.

Make enum iteration with seq safe by default
ClosedPublic

Authored by kuhar on Aug 3 2021, 10:57 AM.

Details

Summary

By default llvm::seq would happily iterate over enums, which may be unsafe if the enum values are not continuous. This patch disable enum iteration with llvm::seq and llvm::seq_inclusive and adds two new functions: enum_seq and enum_seq_inclusive.

To make sure enum iteration is safe, we require users to declare their enum types as iterable by specializing enum_iteration_traits<SomeEnum>. Because it's not always possible to add these traits next to enum definition (e.g., for enums defined in external libraries), we provide an escape hatch to allow iteration on per-callsite basis by passing force_iteration_on_noniterable_enum.

The main benefit of this approach is that these global declarations via traits can appear just next to enum definitions, making easy to spot when enums are miss-labeled, e.g., after introducing new enum values, whereas force_iteration_on_noniterable_enum should stand out and be easy to grep for.

This emerged from a discussion with gchatelet@ about reusing llvm's Sequence.h in lieu of https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/interface/lgc/EnumIterator.h.

Diff Detail

Event Timeline

kuhar created this revision.Aug 3 2021, 10:57 AM
kuhar requested review of this revision.Aug 3 2021, 10:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2021, 10:57 AM
kuhar updated this revision to Diff 363823.Aug 3 2021, 11:37 AM
kuhar updated this revision to Diff 363830.Aug 3 2021, 11:53 AM
kuhar updated this revision to Diff 369372.Aug 29 2021, 10:32 PM
kuhar retitled this revision from [WiP] Make enum iteration with seq safe by default to Make enum iteration with seq safe by default.
kuhar edited the summary of this revision. (Show Details)
kuhar added reviewers: dblaikie, gchatelet, chandlerc.
kuhar set the repository for this revision to rG LLVM Github Monorepo.
kuhar added a subscriber: gchatelet.

Fixes based on the discussion with @gchatelet in https://reviews.llvm.org/D107350.

kuhar updated this revision to Diff 369439.Aug 30 2021, 7:00 AM

Fix formatting

gchatelet added inline comments.Sep 7 2021, 7:24 AM
llvm/include/llvm/ADT/Sequence.h
25

Since the API changed and it's now less easy to iterate over enums can you provide a documentation block explaining the different ways of iterating over numbers and enums?

54–55

A word seems to be missing, also could you rephrase, it's not super clear.

263–266

Either use T everywhere or value_type everywhere (including in the error message).

291

Why is this necessary? Can you add a comment?

292

Maybe just inline

llvm/unittests/ADT/SequenceTest.cpp
235–236

Here and below. in the empty case you can use the gmock matcher IsEmpty (documentation)

EXPECT_THAT(llvm::iota_range(UntypedEnum::A, UntypedEnum::A, false), IsEmpty());

It's not exactly the same semantic as it relies on the empty() member function instead of testing that begin and end are equal, but the test is more symmetric and easier to read.

236

You shouldn't need to qualify the template argument here and below. This would help readability.

Do you/we have a use for this machinery in mind? (especially the extra forge pieces - probably nice to leave that out until/unless it's needed - maybe even if it is needed, might be easier to review separately/in a follow-up patch)

llvm/include/llvm/ADT/Sequence.h
54–60

If the goal is for this to be a compilation error if it's called, then maybe it should be defined as deleted instead of the static_asserts?

template<typename EnumT>
void add_enum_traits(EnumT value) = delete;
dblaikie added inline comments.Sep 7 2021, 9:44 AM
llvm/include/llvm/ADT/Sequence.h
67–83

Why are both of these necessary? Presumably there's a default (which is either iterable or not iterable) and there's only a need to declare the non-default case?

Also it'd be preferable if the syntax for this could be simplified to the point where it didn't need a macro.

I'd have thought something like using classic C++ traits:

struct EnumTraits<MyEnum> : std::true_type { };

Might be adequate.

153

Is this (& similar changes) unrelated to this patch? It should be committed separately if that's the case/if it can be done separately.

kuhar updated this revision to Diff 371161.Sep 7 2021, 1:03 PM
kuhar marked 7 inline comments as done.

Rebase and clean up the patch, as suggested by @gchatelet and @dblaikie.

kuhar marked 2 inline comments as done.Sep 7 2021, 1:07 PM

Do you/we have a use for this machinery in mind? (especially the extra forge pieces - probably nice to leave that out until/unless it's needed - maybe even if it is needed, might be easier to review separately/in a follow-up patch)

This works with the existing code by making iteration over enum values safe by considering enums unsafe to iterate over unless opted in. forge_iterable_enum is used as an escape hatch to opt into unsafe iteration when it makes sense at a given callsite, e.g., when only some subset of values is safe to iterate over, or when it's not possible/easy to provide traits next to the enum definition. I used forge_iterable_enum with X86::CondCode in this patch.

llvm/unittests/ADT/SequenceTest.cpp
236

You shouldn't need to qualify the template argument here and below. This would help readability.

This seems required in c++14. iota_range is a class template, not a function template.

A few more typos and nits.

llvm/include/llvm/ADT/Sequence.h
26

Ultra-nit: Technically there's a trailing space. same below.

38

typo

68

"Use this macro to ...."

kuhar updated this revision to Diff 372483.Sep 14 2021, 7:56 AM
kuhar marked 3 inline comments as done.

Rebase. Fix nits and typos, as pointed out by @gchatelet. Remove detail::always_false which was unused.

gchatelet accepted this revision.Sep 14 2021, 8:01 AM

Please wait for @dblaikie approval as well.

llvm/include/llvm/ADT/Sequence.h
293

I think I already asked the question but can't you inline this? I mean removing the using above,

This revision is now accepted and ready to land.Sep 14 2021, 8:01 AM
kuhar added inline comments.Sep 14 2021, 8:16 AM
llvm/include/llvm/ADT/Sequence.h
293

Previously, I had a local variable which I inlined into the decltype.

I prefer the current version over having everything inside the static_assert. This is because there are 3 failure points here:

  1. add_enum_traits is not defined and fails.
  2. add_enum_traits is defined but incorrectly and does not return a type with ::traits_type.
  3. enum_traits::is_iterable is false.

I think compiler error messages may be confusing if we put all possible 3 failures in one place. With this version, 3. should appear on its own line.

I also think that the local typedef helps with debugging with type traits, in case someone want's to inspect their type with something like std::is_same etc.

What do you think?

gchatelet added inline comments.Sep 15 2021, 2:06 AM
llvm/include/llvm/ADT/Sequence.h
293

Yep sounds good to me. Thx for the explanation.

kuhar marked 2 inline comments as done.Sep 15 2021, 6:34 AM

@jkuhar - is there some data you could include in the patch description/commit message to help explain/motivate this functionality? Common bugs (links to commits that fixed bugs that would've been found earlier if we had this functionality would be good) introduced by not having this safety/protection over iterating over enums?

@dexonsmith @aaron.ballman - curious what you two think of this for general design. Reckon it's worth the macros and novel traits system (using an inline function rather than/in addition to a template specialization) to allow the trait to be specified closer to the type in nested type situations?

llvm/include/llvm/ADT/Sequence.h
11–12

might be good to commit this part separately before the rest, since it's not related to the enum stuff being added in the rest of this patch?

44

Looks like this is unused and could be removed/leave it to users of enum_with_traits to compose this themselves out of enum_type and std::underlying_type_t.

45

Not sure that bundling all these things together's going to be helpful or more complicated. I think there's probably value in this code looking as much as possible like classic C++ traits (with the exception of using an inline function to retrieve the traits if needed/if that's the direction this goes)

So maybe changing "enum_with_traits add_enum_traits(Enum)" to "traits get_enum_traits(Enum)".

This might allow the "escape hatch" part of this patch to go in separately from the first part of the patch (unless existing users need the escape hatch? - though at least even then the escape hatch functionality could be separate/easier to understand/test/implement in isolation from the rest). Since the iota_range<Enum, std::enable_if_t<std::is_enum<Enum>::value>> is separate from the iota_range<enum_with_traits<Enum, Traits>, ...> specialization anyway - I'm not sure the implementation benefits from implementation details that overlap between these two things?

64

I think for usability (error message quality, etc) it might be worth not using forge_iterable_enum here, since if it appears in an error it might be confusing to users since this type isn't being forged, it's using the legitimate entry point for a type truly declared iterable. (so maybe the implementation should be return enum_with_traits<Enum, iterable_enum_traits<Enum>>(Value);)

@jkuhar - is there some data you could include in the patch description/commit message to help explain/motivate this functionality? Common bugs (links to commits that fixed bugs that would've been found earlier if we had this functionality would be good) introduced by not having this safety/protection over iterating over enums?

@dexonsmith @aaron.ballman - curious what you two think of this for general design. Reckon it's worth the macros and novel traits system (using an inline function rather than/in addition to a template specialization) to allow the trait to be specified closer to the type in nested type situations?

I'm also curious about the motivation behind this. I'm all for preventing misuse where someone tries to form a sequence over enumerations, but I'm not convinced that enumerators form the same notional sequence. As you point out, the enumerators don't have to take on contiguous values. Without some motivating use cases, I think it's better to avoid the extra machinery entirely and prevent this use with enumeration types at all, but I could be convinced if there are some good use cases for it.

@jkuhar - is there some data you could include in the patch description/commit message to help explain/motivate this functionality? Common bugs (links to commits that fixed bugs that would've been found earlier if we had this functionality would be good) introduced by not having this safety/protection over iterating over enums?

@dexonsmith @aaron.ballman - curious what you two think of this for general design. Reckon it's worth the macros and novel traits system (using an inline function rather than/in addition to a template specialization) to allow the trait to be specified closer to the type in nested type situations?

I'm also curious about the motivation behind this. I'm all for preventing misuse where someone tries to form a sequence over enumerations, but I'm not convinced that enumerators form the same notional sequence. As you point out, the enumerators don't have to take on contiguous values. Without some motivating use cases, I think it's better to avoid the extra machinery entirely and prevent this use with enumeration types at all, but I could be convinced if there are some good use cases for it.

FWIW I was sort of leaning the other way - that most enums are contiguous and there's probably some benefit to iterating over them all from time to time. But happy for the discussion either way.

@jkuhar - is there some data you could include in the patch description/commit message to help explain/motivate this functionality? Common bugs (links to commits that fixed bugs that would've been found earlier if we had this functionality would be good) introduced by not having this safety/protection over iterating over enums?

@dexonsmith @aaron.ballman - curious what you two think of this for general design. Reckon it's worth the macros and novel traits system (using an inline function rather than/in addition to a template specialization) to allow the trait to be specified closer to the type in nested type situations?

I'm also curious about the motivation behind this. I'm all for preventing misuse where someone tries to form a sequence over enumerations, but I'm not convinced that enumerators form the same notional sequence. As you point out, the enumerators don't have to take on contiguous values. Without some motivating use cases, I think it's better to avoid the extra machinery entirely and prevent this use with enumeration types at all, but I could be convinced if there are some good use cases for it.

FWIW I was sort of leaning the other way - that most enums are contiguous and there's probably some benefit to iterating over them all from time to time. But happy for the discussion either way.

I guess I look at it more as: a sequence is a bounded contiguous range of integer values and wanting to iterate over enumerators is reflection. The enumerators don't have to be contiguous (even if they frequently are), nor do they have a notional relationship to one another (even if they frequently do), so they're a bit different from a sequence even if there's a lot of notional overlap. Because of this, and because reflection provides a whole host of other kinds of sequences (lists of enumerators, lists of field members, lists of member functions, etc), I think it should be a separate interface from sequence of contiguous integers.

But maybe others disagree?

kuhar added a comment.EditedSep 29 2021, 10:21 AM

@jkuhar - is there some data you could include in the patch description/commit message to help explain/motivate this functionality? Common bugs (links to commits that fixed bugs that would've been found earlier if we had this functionality would be good) introduced by not having this safety/protection over iterating over enums?

@dexonsmith @aaron.ballman - curious what you two think of this for general design. Reckon it's worth the macros and novel traits system (using an inline function rather than/in addition to a template specialization) to allow the trait to be specified closer to the type in nested type situations?

I'm also curious about the motivation behind this. I'm all for preventing misuse where someone tries to form a sequence over enumerations, but I'm not convinced that enumerators form the same notional sequence. As you point out, the enumerators don't have to take on contiguous values. Without some motivating use cases, I think it's better to avoid the extra machinery entirely and prevent this use with enumeration types at all, but I could be convinced if there are some good use cases for it.

FWIW I was sort of leaning the other way - that most enums are contiguous and there's probably some benefit to iterating over them all from time to time. But happy for the discussion either way.

I guess I look at it more as: a sequence is a bounded contiguous range of integer values and wanting to iterate over enumerators is reflection. The enumerators don't have to be contiguous (even if they frequently are), nor do they have a notional relationship to one another (even if they frequently do), so they're a bit different from a sequence even if there's a lot of notional overlap. Because of this, and because reflection provides a whole host of other kinds of sequences (lists of enumerators, lists of field members, lists of member functions, etc), I think it should be a separate interface from sequence of contiguous integers.

But maybe others disagree?

Let me start with some brief history:

  • I started something similar for an out-of-tree llvm-based compiler, LLPC: https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/interface/lgc/EnumIterator.h. At the time, the implementation could not use llvm::seq because it did not support enums.
  • A couple of weeks later, I learned that seq was enhanced to support enums. But when I looked at the implementation, I learned that it blindly accepts all enum types, even though it does not make sense to iterate over all of them.

Before this work, the status quo was that people would create hand written for loops like for (MyEnum val = MyEnum::A; val != MyEnum::X, val = static_cast<MyEnum>(static_cast<unsigned>(val) + 1)). This has 3 main issues:

  1. The loop is verbose and ugly.
  2. The underlying type may not be unsigned, and in our codebase nobody used std::underlying_type instead. But that would make it even more verbose. Some places implemented custom operators or helper increment functions instead.
  3. The enum may not be continuous. I don't have links to public bugs in llvm or llpc to point to, but this showed up a few times internally, and I personally messed it up a few times. The issue is that even if the original code had continuous values, modifying the enum definition may invalidate all loops that use this type. This was difficult for me to grep for, given that for loops like this can be written slightly differently, with values hoisted out or custom increment operators, etc. In my opinion, this pattern is very bugprone.

I believe that this patch fixes all 3 issues. The most tricky one to handle is 3.: I found that the most reliable way to ensure that enumeration is safe is to disallow it by default and place this logic/declaration just after enum definition. This way, a person modifying the enum does not have to hunt for all of the uses across the codebase, and is likely to notice potential issues because of the close proximity of the declaration in the source code.

From my perspective, I'm happy with enumRange in LLPC, which is even more restrictive than what is proposed here (it also forces users to specify the first and end enum value). However, I'm concerned that the unsafe default is behavior is allowed with llvm::seq.

If you see this patch as overcomplicated, maybe it would make sense to keep the underlying iteration mechanism of iota_range in place, but disallow enum iteration with seq and implement it separately, with extra safety checks outside of iota_range?

kuhar added inline comments.
llvm/include/llvm/ADT/Sequence.h
11–12

A couple of weeks later, I learned that seq was enhanced to support enums. But when I looked at the implementation, I learned that it blindly accepts all enum types, even though it does not make sense to iterate over all of them.

Do you have a link to the patch that made that change?

Any references to problems created (bugs introduced/patches that fix those bugs, etc) by that more liberal change compared to the direction in this patch?

kuhar added a comment.Sep 30 2021, 2:46 PM

A couple of weeks later, I learned that seq was enhanced to support enums. But when I looked at the implementation, I learned that it blindly accepts all enum types, even though it does not make sense to iterate over all of them.

Do you have a link to the patch that made that change?

I believe it was https://reviews.llvm.org/D106279.

Any references to problems created (bugs introduced/patches that fix those bugs, etc) by that more liberal change compared to the direction in this patch?

This API looks inherently unsafe to me. However, I'm not aware of any existing bugs caused by it in LLVM.

ychen added a subscriber: ychen.Oct 2 2021, 1:17 AM

I think I'm OK with this if it's moved away from the macro - using a template specialization type trait system even if that sacrifices the ability to define the trait next to a nested type. (indeed the couple of examples migrated in this patch do the registration outside the outer class rather than benefiting from this nested ability anyway), if that sounds OK?

llvm/include/llvm/ADT/Sequence.h
64

Ping on this.

Let me start with some brief history:

  • I started something similar for an out-of-tree llvm-based compiler, LLPC: https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/interface/lgc/EnumIterator.h. At the time, the implementation could not use llvm::seq because it did not support enums.
  • A couple of weeks later, I learned that seq was enhanced to support enums. But when I looked at the implementation, I learned that it blindly accepts all enum types, even though it does not make sense to iterate over all of them.

Before this work, the status quo was that people would create hand written for loops like for (MyEnum val = MyEnum::A; val != MyEnum::X, val = static_cast<MyEnum>(static_cast<unsigned>(val) + 1)). This has 3 main issues:

  1. The loop is verbose and ugly.

FWIW, I see this as a benefit. The loop is deeply strange and needs extra attention because of the use of enumerations, so making it stick out like a sore thumb is a good thing IMO.

  1. The underlying type may not be unsigned, and in our codebase nobody used std::underlying_type instead. But that would make it even more verbose. Some places implemented custom operators or helper increment functions instead.
  2. The enum may not be continuous. I don't have links to public bugs in llvm or llpc to point to, but this showed up a few times internally, and I personally messed it up a few times. The issue is that even if the original code had continuous values, modifying the enum definition may invalidate all loops that use this type. This was difficult for me to grep for, given that for loops like this can be written slightly differently, with values hoisted out or custom increment operators, etc. In my opinion, this pattern is very bugprone.

Strongly agreed! That's actually why I'm not super keen on adding facilities to make it easier to iterate over an enumeration. I feel like this need should be somewhat rare in the code base, so it's already not strongly motivated, but the fact that these facilities only make sense on enumerations that are defined in a very particular way makes it feel like blessing a code pattern we don't want people to use in the first place.

I believe that this patch fixes all 3 issues. The most tricky one to handle is 3.: I found that the most reliable way to ensure that enumeration is safe is to disallow it by default and place this logic/declaration just after enum definition. This way, a person modifying the enum does not have to hunt for all of the uses across the codebase, and is likely to notice potential issues because of the close proximity of the declaration in the source code.

I think this approach alleviates my concerns about the danger of using the API (I think the original API was way too dangerous, so fixes here are really appreciated), but it doesn't really alleviate my concerns of "do we want to encourage people to do this by making it easier?".

From my perspective, I'm happy with enumRange in LLPC, which is even more restrictive than what is proposed here (it also forces users to specify the first and end enum value). However, I'm concerned that the unsafe default is behavior is allowed with llvm::seq.

If you see this patch as overcomplicated, maybe it would make sense to keep the underlying iteration mechanism of iota_range in place, but disallow enum iteration with seq and implement it separately, with extra safety checks outside of iota_range?

Personally, I sort of wonder if it makes sense to to disallow enum iteration entirely. Are we convinced we want to encourage people to do this even when the enumeration is held just right?

Personally, I sort of wonder if it makes sense to to disallow enum iteration entirely. Are we convinced we want to encourage people to do this even when the enumeration is held just right?

I don't think we want to encourage this pattern but when we have to use it I think it's better to have a tool to help with everything that can go wrong (integer promotion, non UB wrap around for signed types, reverse iteration, etc...).
Being in a position where people re-implement it with subtle bugs is not great either.

Now if enum iteration usage can be brought to 0 then yes we can certainly disallow it entirely.
If not, maybe we want a separate interface to stress that it's an uncommon pattern (most of the underlying logic is shared with integer iteration so it would be best not to implement it twice).

There are not a lot of places in llvm-project where we make use of enum iteration. MachineValueType is probably the most pervasive. It relies on enum iteration to produce MVT or EVT on the fly.
A quick search through the code base gives the following calls for all_valuetypes(), integer_valuetypes(), fp_valuetypes() and vector_valuetypes.

 % git grep all_valuetypes\(\) | grep -v include/llvm/Support/MachineValueType.h | wc -l
4
 % git grep integer_valuetypes\(\) | grep -v include/llvm/Support/MachineValueType.h | wc -l
33
 % git grep fp_valuetypes\(\) | grep -v include/llvm/Support/MachineValueType.h | wc -l
16
 % git grep vector_valuetypes\(\) | grep -v include/llvm/Support/MachineValueType.h | wc -l
49

If we can afford it, these functions can be implemented with static arrays (either lazily or eagerly initialized). Having real storage will allow reverse iteration which was the main motivation behind me touching this file D102679.
Once these are gone, it may be possible to kill enum iteration entirely if we wish too.

Personally, I sort of wonder if it makes sense to to disallow enum iteration entirely. Are we convinced we want to encourage people to do this even when the enumeration is held just right?

I don't think we want to encourage this pattern but when we have to use it I think it's better to have a tool to help with everything that can go wrong (integer promotion, non UB wrap around for signed types, reverse iteration, etc...).
Being in a position where people re-implement it with subtle bugs is not great either.

Agreed!

Now if enum iteration usage can be brought to 0 then yes we can certainly disallow it entirely.
If not, maybe we want a separate interface to stress that it's an uncommon pattern (most of the underlying logic is shared with integer iteration so it would be best not to implement it twice).

There are not a lot of places in llvm-project where we make use of enum iteration. MachineValueType is probably the most pervasive. It relies on enum iteration to produce MVT or EVT on the fly.
A quick search through the code base gives the following calls for all_valuetypes(), integer_valuetypes(), fp_valuetypes() and vector_valuetypes.

 % git grep all_valuetypes\(\) | grep -v include/llvm/Support/MachineValueType.h | wc -l
4
 % git grep integer_valuetypes\(\) | grep -v include/llvm/Support/MachineValueType.h | wc -l
33
 % git grep fp_valuetypes\(\) | grep -v include/llvm/Support/MachineValueType.h | wc -l
16
 % git grep vector_valuetypes\(\) | grep -v include/llvm/Support/MachineValueType.h | wc -l
49

If we can afford it, these functions can be implemented with static arrays (either lazily or eagerly initialized). Having real storage will allow reverse iteration which was the main motivation behind me touching this file D102679.
Once these are gone, it may be possible to kill enum iteration entirely if we wish too.

Thank you for these details! You've convinced me it's worth having this interface for safety for the existing uses we aren't ready to refactor yet. I hadn't realized this was being used as much as it is in LLVM (I'm far more familiar with the Clang side of things, which I don't think does this often (ever?).) I think the approach you mention here:

If you see this patch as overcomplicated, maybe it would make sense to keep the underlying iteration mechanism of iota_range in place, but disallow enum iteration with seq and implement it separately, with extra safety checks outside of iota_range?

makes sense to me. We make it a bit more clear that we expect enum iteration to be different from seq even if there's a shared underlying implementation.

kuhar added a comment.Oct 19 2021, 8:29 AM

Thanks for the comments. If I understand correctly, we are converging to something like this:

  1. We should keep enum iteration utilities but make them safe by default.
  2. By default, enums shouldn't not be iterable and enum iteration should be opt-in.
    1. @dblaikie prefers the opt-in mechanism to be plain type traits, not the original type traits / ADL / macro mechanism from this patch.
  3. We should rely on iota_range for the implementation, but create a new convenience function instead of reusing seq and seq_inclusive, say enum_seq and enum_seq_inclusive.
    1. To provide an 'escape hatch' to iterate over enums without the new trait, we can use an intentionally ugly traits/tag like forge_iterable_enum(_traits).

If this sounds good to folks, I'll go ahead and prepare a new version of this patch.

Thanks for the comments. If I understand correctly, we are converging to something like this:

  1. We should keep enum iteration utilities but make them safe by default.
  2. By default, enums shouldn't not be iterable and enum iteration should be opt-in.
    1. @dblaikie prefers the opt-in mechanism to be plain type traits, not the original type traits / ADL / macro mechanism from this patch.
  3. We should rely on iota_range for the implementation, but create a new convenience function instead of reusing seq and seq_inclusive, say enum_seq and enum_seq_inclusive.
    1. To provide an 'escape hatch' to iterate over enums without the new trait, we can use an intentionally ugly traits/tag like forge_iterable_enum(_traits).

If this sounds good to folks, I'll go ahead and prepare a new version of this patch.

That sounds like a good plan to me.

Thanks for the comments. If I understand correctly, we are converging to something like this:

  1. We should keep enum iteration utilities but make them safe by default.
  2. By default, enums shouldn't not be iterable and enum iteration should be opt-in.
    1. @dblaikie prefers the opt-in mechanism to be plain type traits, not the original type traits / ADL / macro mechanism from this patch.
  3. We should rely on iota_range for the implementation, but create a new convenience function instead of reusing seq and seq_inclusive, say enum_seq and enum_seq_inclusive.
    1. To provide an 'escape hatch' to iterate over enums without the new trait, we can use an intentionally ugly traits/tag like forge_iterable_enum(_traits).

If this sounds good to folks, I'll go ahead and prepare a new version of this patch.

Yep, sounds about right to me - maybe (1) might not need a trait or tag - could be an extra argument to the enum_seq functions that can only be spelled "force_iteration_on_noniterable_enum" or something like that.

If this sounds good to folks, I'll go ahead and prepare a new version of this patch.

Yes, SGTM.

Yep, sounds about right to me - maybe (1) might not need a trait or tag - could be an extra argument to the enum_seq functions that can only be spelled "force_iteration_on_noniterable_enum" or something like that.

+1 it's easily grepable and long enough to not be tempted to use it more than necessary : )

kuhar updated this revision to Diff 381681.Oct 22 2021, 4:29 PM
kuhar edited the summary of this revision. (Show Details)

Refactor based on the plan from https://reviews.llvm.org/D107378#3073037. Also rebase.

kuhar added inline comments.Oct 22 2021, 4:32 PM
llvm/include/llvm/Support/MachineValueType.h
1408–1410

A enum_iteration_traits definition would have to be provided outside of the class, in the llvm namespace. These could be handled by moving the implementation out of line, but that's kinda of repetitive.

I think the intention behind these helper functions is to define sub-ranges that are fine to iterate over, so it may be better not to make the whole enum iterable.

kuhar added a comment.Oct 29 2021, 7:54 AM

Ping for code review of the new revision.

dblaikie accepted this revision.Nov 1 2021, 4:25 PM

Sounds OK to me - maybe cleanup/improve the force_iteration_on_noniterable_enum to work like other tags (such as llvm::None and std::piecewise_construct).

llvm/include/llvm/Support/MachineValueType.h
1438

Perhaps force_iteration_on_noniterable_enum could be a const variable (like llvm::None) so callers don't have to use {}? (like other tags such as piecewise_construct)

kuhar updated this revision to Diff 383939.Nov 1 2021, 6:37 PM
kuhar marked an inline comment as done.

Use a constant to define force_iteration_on_noniterable_enum. Rebase.

kuhar added a comment.Nov 1 2021, 6:38 PM

@gchatelet and @aaron.ballman could you please confirm if you are OK with the latest revision?

kuhar updated this revision to Diff 383940.Nov 1 2021, 6:41 PM

Update comment.

aaron.ballman added inline comments.Nov 2 2021, 4:16 AM
llvm/include/llvm/ADT/Sequence.h
41–42

I'm pretty sure this is an ODR violation without also declaring it as inline (this definition otherwise has internal linkage).

nikic added a subscriber: nikic.Nov 2 2021, 5:02 AM

For what it's worth, I really don't like this change. Seems like unnecessary complexity for very little gain.

Case in point, while CmpInst::Predicate is marked as safe to iterate by this change, this is not actually correct. It's safe to iterate all FCMP predicates and it's safe to iterate all ICMP predicates, but it's not safe to cross between those -- there is an unassigned range between them.

llvm/include/llvm/IR/InstrTypes.h
22

Should forward declare the template instead?

kuhar added inline comments.Nov 2 2021, 7:13 AM
llvm/include/llvm/ADT/Sequence.h
41–42

What's the proper way to do it in the pre-C++17 world? I saw that both llvm::None and std::piecewise_construct do it the same way.

aaron.ballman added inline comments.Nov 2 2021, 8:28 AM
llvm/include/llvm/ADT/Sequence.h
41–42

You need to use a function to wrap the variables, or you ignore the ODR issue and hope nothing explodes later. Good point on C++17 though, we don't currently support it. So I think this is "fine" for the moment and we can come back and fix it later once we bump to C++17.

aaron.ballman accepted this revision.Nov 2 2021, 8:40 AM

LGTM!

For what it's worth, I really don't like this change. Seems like unnecessary complexity for very little gain.

I think we have a need in this area, but I wish we didn't. So I wish we didn't need this change, but I think it's an improvement over the previous interface because it makes it very explicit that you're forming a sequence from a type that may or may not be a contiguous sequence.

kuhar added a comment.Nov 2 2021, 9:01 AM

For what it's worth, I really don't like this change. Seems like unnecessary complexity for very little gain.

Case in point, while CmpInst::Predicate is marked as safe to iterate by this change, this is not actually correct. It's safe to iterate all FCMP predicates and it's safe to iterate all ICMP predicates, but it's not safe to cross between those -- there is an unassigned range between them.

Good catch, I find things like this easy to miss. This made me start working on safe iteration in the first place. I'll update the patch to not declare it as safe and introduce helper function that return subranges for fcm and icmp instead.
With these in place, do you oppose this patch going in, @nikic?

kuhar updated this revision to Diff 384138.Nov 2 2021, 9:27 AM

Don't declare CmpInst::Predicates as safe to iterate over.

kuhar added inline comments.Nov 2 2021, 9:28 AM
llvm/include/llvm/ADT/Sequence.h
41–42

Since this pattern has been working for years with the existing code (llvm::None, std::piecewise_construct), I added a TODO to revisit this after we update to c++17 .

kuhar marked 2 inline comments as done.Nov 2 2021, 9:29 AM
gchatelet edited the summary of this revision. (Show Details)Nov 3 2021, 2:38 AM
gchatelet accepted this revision.Nov 3 2021, 2:44 AM

LGTM!
Make sure to fix the typo and it's good to go on my side.

llvm/include/llvm/ADT/Sequence.h
21

typo std::is_integral

This revision was automatically updated to reflect the committed changes.
kuhar marked an inline comment as done.Nov 3 2021, 5:57 PM

Hi @kuhar,

looks like these changes are reason of bunch of build warnings on the windows builders (VC). I see more than 1000 additional warnings such as

C:\buildbot\as-builder-1\x-armv7l\llvm-project\llvm\include\llvm/ADT/Sequence.h(119): warning C4309: 'static_cast': truncation of constant value
C:\buildbot\as-builder-1\x-armv7l\llvm-project\llvm\include\llvm/ADT/Sequence.h(120): warning C4309: 'static_cast': truncation of constant value
C:\buildbot\as-builder-1\x-armv7l\llvm-project\llvm\include\llvm/ADT/Sequence.h(119): warning C4309: 'static_cast': truncation of constant value
C:\buildbot\as-builder-1\x-armv7l\llvm-project\llvm\include\llvm/ADT/Sequence.h(120): warning C4309: 'static_cast': truncation of constant value

starting at this build https://lab.llvm.org/buildbot/#/builders/60/builds/5295

would you take care of them?
Thank you.

kuhar added a comment.Nov 5 2021, 2:58 PM

Hi @kuhar,

looks like these changes are reason of bunch of build warnings on the windows builders (VC). I see more than 1000 additional warnings such as

C:\buildbot\as-builder-1\x-armv7l\llvm-project\llvm\include\llvm/ADT/Sequence.h(119): warning C4309: 'static_cast': truncation of constant value
C:\buildbot\as-builder-1\x-armv7l\llvm-project\llvm\include\llvm/ADT/Sequence.h(120): warning C4309: 'static_cast': truncation of constant value
C:\buildbot\as-builder-1\x-armv7l\llvm-project\llvm\include\llvm/ADT/Sequence.h(119): warning C4309: 'static_cast': truncation of constant value
C:\buildbot\as-builder-1\x-armv7l\llvm-project\llvm\include\llvm/ADT/Sequence.h(120): warning C4309: 'static_cast': truncation of constant value

starting at this build https://lab.llvm.org/buildbot/#/builders/60/builds/5295

would you take care of them?
Thank you.

Hi @vvereschaka,

These warnings seem to originate in the existing code, unmodified by this commit (template <typename T, typename U> bool canTypeFitValue(const U Value) {). I think this code was contributed by @gchatelet?
I don't have access to a Windows machine and it's difficult for me to tell how to address these warnings without more details.

understood, thank you!
I'll take a closer look at these warnings to figure out the source.