Page MenuHomePhabricator

Make enum iteration with seq safe by default
AcceptedPublic

Authored by kuhar on Aug 3 2021, 10:57 AM.

Details

Summary

By default llvm::seq would happily iterate over enums, which may be unsafe if the enum values are not continuous. This patch forces the user to declare the enum as iterable before being able to do so (with DECLARE_ITERABLE_ENUM). Similarly, we allow enums to be explicitly marked as unsafe for iteration with DECLARE_NON_ITERABLE_ENUM.

The main benefit of this approach is that these global declarations (DECLARE_ITERABLE_ENUM) appear just next to enum definitions, making easy to spot when enums are miss-labeled, e.g., after introducing new enum values.
Because it's not always possible to add these declarations next to enum definition (e.g., for enums defined in external libraries), we provide an escape hatch to allow iteration on per-callsite basis with forge_iterable_enum.

This emerged from a discussion with gchalet@ about reusing llvm's Sequence.h in lieu of https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/interface/lgc/EnumIterator.h.

Diff Detail

Event Timeline

kuhar created this revision.Aug 3 2021, 10:57 AM
kuhar requested review of this revision.Aug 3 2021, 10:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2021, 10:57 AM
kuhar updated this revision to Diff 363823.Aug 3 2021, 11:37 AM
kuhar updated this revision to Diff 363830.Aug 3 2021, 11:53 AM
kuhar updated this revision to Diff 369372.Aug 29 2021, 10:32 PM
kuhar retitled this revision from [WiP] Make enum iteration with seq safe by default to Make enum iteration with seq safe by default.
kuhar edited the summary of this revision. (Show Details)
kuhar added reviewers: dblaikie, gchatelet, chandlerc.
kuhar set the repository for this revision to rG LLVM Github Monorepo.
kuhar added a subscriber: gchatelet.

Fixes based on the discussion with @gchatelet in https://reviews.llvm.org/D107350.

kuhar updated this revision to Diff 369439.Aug 30 2021, 7:00 AM

Fix formatting

gchatelet added inline comments.Sep 7 2021, 7:24 AM
llvm/include/llvm/ADT/Sequence.h
83

Since the API changed and it's now less easy to iterate over enums can you provide a documentation block explaining the different ways of iterating over numbers and enums?

112–113

A word seems to be missing, also could you rephrase, it's not super clear.

303–306

Either use T everywhere or value_type everywhere (including in the error message).

331

Why is this necessary? Can you add a comment?

332

Maybe just inline

llvm/unittests/ADT/SequenceTest.cpp
236–237

Here and below. in the empty case you can use the gmock matcher IsEmpty (documentation)

EXPECT_THAT(llvm::iota_range(UntypedEnum::A, UntypedEnum::A, false), IsEmpty());

It's not exactly the same semantic as it relies on the empty() member function instead of testing that begin and end are equal, but the test is more symmetric and easier to read.

237

You shouldn't need to qualify the template argument here and below. This would help readability.

Do you/we have a use for this machinery in mind? (especially the extra forge pieces - probably nice to leave that out until/unless it's needed - maybe even if it is needed, might be easier to review separately/in a follow-up patch)

llvm/include/llvm/ADT/Sequence.h
112–118

If the goal is for this to be a compilation error if it's called, then maybe it should be defined as deleted instead of the static_asserts?

template<typename EnumT>
void add_enum_traits(EnumT value) = delete;
125–141

Why are both of these necessary? Presumably there's a default (which is either iterable or not iterable) and there's only a need to declare the non-default case?

Also it'd be preferable if the syntax for this could be simplified to the point where it didn't need a macro.

I'd have thought something like using classic C++ traits:

struct EnumTraits<MyEnum> : std::true_type { };

Might be adequate.

193

Is this (& similar changes) unrelated to this patch? It should be committed separately if that's the case/if it can be done separately.

kuhar updated this revision to Diff 371161.Sep 7 2021, 1:03 PM
kuhar marked 7 inline comments as done.

Rebase and clean up the patch, as suggested by @gchatelet and @dblaikie.

kuhar marked 2 inline comments as done.Sep 7 2021, 1:07 PM

Do you/we have a use for this machinery in mind? (especially the extra forge pieces - probably nice to leave that out until/unless it's needed - maybe even if it is needed, might be easier to review separately/in a follow-up patch)

This works with the existing code by making iteration over enum values safe by considering enums unsafe to iterate over unless opted in. forge_iterable_enum is used as an escape hatch to opt into unsafe iteration when it makes sense at a given callsite, e.g., when only some subset of values is safe to iterate over, or when it's not possible/easy to provide traits next to the enum definition. I used forge_iterable_enum with X86::CondCode in this patch.

llvm/unittests/ADT/SequenceTest.cpp
237

You shouldn't need to qualify the template argument here and below. This would help readability.

This seems required in c++14. iota_range is a class template, not a function template.

A few more typos and nits.

llvm/include/llvm/ADT/Sequence.h
24

Ultra-nit: Technically there's a trailing space. same below.

36

typo

126

"Use this macro to ...."

kuhar updated this revision to Diff 372483.Sep 14 2021, 7:56 AM
kuhar marked 3 inline comments as done.

Rebase. Fix nits and typos, as pointed out by @gchatelet. Remove detail::always_false which was unused.

gchatelet accepted this revision.Sep 14 2021, 8:01 AM

Please wait for @dblaikie approval as well.

llvm/include/llvm/ADT/Sequence.h
333

I think I already asked the question but can't you inline this? I mean removing the using above,

This revision is now accepted and ready to land.Sep 14 2021, 8:01 AM
kuhar added inline comments.Sep 14 2021, 8:16 AM
llvm/include/llvm/ADT/Sequence.h
333

Previously, I had a local variable which I inlined into the decltype.

I prefer the current version over having everything inside the static_assert. This is because there are 3 failure points here:

  1. add_enum_traits is not defined and fails.
  2. add_enum_traits is defined but incorrectly and does not return a type with ::traits_type.
  3. enum_traits::is_iterable is false.

I think compiler error messages may be confusing if we put all possible 3 failures in one place. With this version, 3. should appear on its own line.

I also think that the local typedef helps with debugging with type traits, in case someone want's to inspect their type with something like std::is_same etc.

What do you think?

gchatelet added inline comments.Sep 15 2021, 2:06 AM
llvm/include/llvm/ADT/Sequence.h
333

Yep sounds good to me. Thx for the explanation.

kuhar marked 2 inline comments as done.Sep 15 2021, 6:34 AM

@jkuhar - is there some data you could include in the patch description/commit message to help explain/motivate this functionality? Common bugs (links to commits that fixed bugs that would've been found earlier if we had this functionality would be good) introduced by not having this safety/protection over iterating over enums?

@dexonsmith @aaron.ballman - curious what you two think of this for general design. Reckon it's worth the macros and novel traits system (using an inline function rather than/in addition to a template specialization) to allow the trait to be specified closer to the type in nested type situations?

llvm/include/llvm/ADT/Sequence.h
9–33

might be good to commit this part separately before the rest, since it's not related to the enum stuff being added in the rest of this patch?

102

Looks like this is unused and could be removed/leave it to users of enum_with_traits to compose this themselves out of enum_type and std::underlying_type_t.

103

Not sure that bundling all these things together's going to be helpful or more complicated. I think there's probably value in this code looking as much as possible like classic C++ traits (with the exception of using an inline function to retrieve the traits if needed/if that's the direction this goes)

So maybe changing "enum_with_traits add_enum_traits(Enum)" to "traits get_enum_traits(Enum)".

This might allow the "escape hatch" part of this patch to go in separately from the first part of the patch (unless existing users need the escape hatch? - though at least even then the escape hatch functionality could be separate/easier to understand/test/implement in isolation from the rest). Since the iota_range<Enum, std::enable_if_t<std::is_enum<Enum>::value>> is separate from the iota_range<enum_with_traits<Enum, Traits>, ...> specialization anyway - I'm not sure the implementation benefits from implementation details that overlap between these two things?

122

I think for usability (error message quality, etc) it might be worth not using forge_iterable_enum here, since if it appears in an error it might be confusing to users since this type isn't being forged, it's using the legitimate entry point for a type truly declared iterable. (so maybe the implementation should be return enum_with_traits<Enum, iterable_enum_traits<Enum>>(Value);)

@jkuhar - is there some data you could include in the patch description/commit message to help explain/motivate this functionality? Common bugs (links to commits that fixed bugs that would've been found earlier if we had this functionality would be good) introduced by not having this safety/protection over iterating over enums?

@dexonsmith @aaron.ballman - curious what you two think of this for general design. Reckon it's worth the macros and novel traits system (using an inline function rather than/in addition to a template specialization) to allow the trait to be specified closer to the type in nested type situations?

I'm also curious about the motivation behind this. I'm all for preventing misuse where someone tries to form a sequence over enumerations, but I'm not convinced that enumerators form the same notional sequence. As you point out, the enumerators don't have to take on contiguous values. Without some motivating use cases, I think it's better to avoid the extra machinery entirely and prevent this use with enumeration types at all, but I could be convinced if there are some good use cases for it.

@jkuhar - is there some data you could include in the patch description/commit message to help explain/motivate this functionality? Common bugs (links to commits that fixed bugs that would've been found earlier if we had this functionality would be good) introduced by not having this safety/protection over iterating over enums?

@dexonsmith @aaron.ballman - curious what you two think of this for general design. Reckon it's worth the macros and novel traits system (using an inline function rather than/in addition to a template specialization) to allow the trait to be specified closer to the type in nested type situations?

I'm also curious about the motivation behind this. I'm all for preventing misuse where someone tries to form a sequence over enumerations, but I'm not convinced that enumerators form the same notional sequence. As you point out, the enumerators don't have to take on contiguous values. Without some motivating use cases, I think it's better to avoid the extra machinery entirely and prevent this use with enumeration types at all, but I could be convinced if there are some good use cases for it.

FWIW I was sort of leaning the other way - that most enums are contiguous and there's probably some benefit to iterating over them all from time to time. But happy for the discussion either way.

@jkuhar - is there some data you could include in the patch description/commit message to help explain/motivate this functionality? Common bugs (links to commits that fixed bugs that would've been found earlier if we had this functionality would be good) introduced by not having this safety/protection over iterating over enums?

@dexonsmith @aaron.ballman - curious what you two think of this for general design. Reckon it's worth the macros and novel traits system (using an inline function rather than/in addition to a template specialization) to allow the trait to be specified closer to the type in nested type situations?

I'm also curious about the motivation behind this. I'm all for preventing misuse where someone tries to form a sequence over enumerations, but I'm not convinced that enumerators form the same notional sequence. As you point out, the enumerators don't have to take on contiguous values. Without some motivating use cases, I think it's better to avoid the extra machinery entirely and prevent this use with enumeration types at all, but I could be convinced if there are some good use cases for it.

FWIW I was sort of leaning the other way - that most enums are contiguous and there's probably some benefit to iterating over them all from time to time. But happy for the discussion either way.

I guess I look at it more as: a sequence is a bounded contiguous range of integer values and wanting to iterate over enumerators is reflection. The enumerators don't have to be contiguous (even if they frequently are), nor do they have a notional relationship to one another (even if they frequently do), so they're a bit different from a sequence even if there's a lot of notional overlap. Because of this, and because reflection provides a whole host of other kinds of sequences (lists of enumerators, lists of field members, lists of member functions, etc), I think it should be a separate interface from sequence of contiguous integers.

But maybe others disagree?

kuhar added a comment.EditedWed, Sep 29, 10:21 AM

@jkuhar - is there some data you could include in the patch description/commit message to help explain/motivate this functionality? Common bugs (links to commits that fixed bugs that would've been found earlier if we had this functionality would be good) introduced by not having this safety/protection over iterating over enums?

@dexonsmith @aaron.ballman - curious what you two think of this for general design. Reckon it's worth the macros and novel traits system (using an inline function rather than/in addition to a template specialization) to allow the trait to be specified closer to the type in nested type situations?

I'm also curious about the motivation behind this. I'm all for preventing misuse where someone tries to form a sequence over enumerations, but I'm not convinced that enumerators form the same notional sequence. As you point out, the enumerators don't have to take on contiguous values. Without some motivating use cases, I think it's better to avoid the extra machinery entirely and prevent this use with enumeration types at all, but I could be convinced if there are some good use cases for it.

FWIW I was sort of leaning the other way - that most enums are contiguous and there's probably some benefit to iterating over them all from time to time. But happy for the discussion either way.

I guess I look at it more as: a sequence is a bounded contiguous range of integer values and wanting to iterate over enumerators is reflection. The enumerators don't have to be contiguous (even if they frequently are), nor do they have a notional relationship to one another (even if they frequently do), so they're a bit different from a sequence even if there's a lot of notional overlap. Because of this, and because reflection provides a whole host of other kinds of sequences (lists of enumerators, lists of field members, lists of member functions, etc), I think it should be a separate interface from sequence of contiguous integers.

But maybe others disagree?

Let me start with some brief history:

  • I started something similar for an out-of-tree llvm-based compiler, LLPC: https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/interface/lgc/EnumIterator.h. At the time, the implementation could not use llvm::seq because it did not support enums.
  • A couple of weeks later, I learned that seq was enhanced to support enums. But when I looked at the implementation, I learned that it blindly accepts all enum types, even though it does not make sense to iterate over all of them.

Before this work, the status quo was that people would create hand written for loops like for (MyEnum val = MyEnum::A; val != MyEnum::X, val = static_cast<MyEnum>(static_cast<unsigned>(val) + 1)). This has 3 main issues:

  1. The loop is verbose and ugly.
  2. The underlying type may not be unsigned, and in our codebase nobody used std::underlying_type instead. But that would make it even more verbose. Some places implemented custom operators or helper increment functions instead.
  3. The enum may not be continuous. I don't have links to public bugs in llvm or llpc to point to, but this showed up a few times internally, and I personally messed it up a few times. The issue is that even if the original code had continuous values, modifying the enum definition may invalidate all loops that use this type. This was difficult for me to grep for, given that for loops like this can be written slightly differently, with values hoisted out or custom increment operators, etc. In my opinion, this pattern is very bugprone.

I believe that this patch fixes all 3 issues. The most tricky one to handle is 3.: I found that the most reliable way to ensure that enumeration is safe is to disallow it by default and place this logic/declaration just after enum definition. This way, a person modifying the enum does not have to hunt for all of the uses across the codebase, and is likely to notice potential issues because of the close proximity of the declaration in the source code.

From my perspective, I'm happy with enumRange in LLPC, which is even more restrictive than what is proposed here (it also forces users to specify the first and end enum value). However, I'm concerned that the unsafe default is behavior is allowed with llvm::seq.

If you see this patch as overcomplicated, maybe it would make sense to keep the underlying iteration mechanism of iota_range in place, but disallow enum iteration with seq and implement it separately, with extra safety checks outside of iota_range?

kuhar added inline comments.
llvm/include/llvm/ADT/Sequence.h
9–33

A couple of weeks later, I learned that seq was enhanced to support enums. But when I looked at the implementation, I learned that it blindly accepts all enum types, even though it does not make sense to iterate over all of them.

Do you have a link to the patch that made that change?

Any references to problems created (bugs introduced/patches that fix those bugs, etc) by that more liberal change compared to the direction in this patch?

kuhar added a comment.Thu, Sep 30, 2:46 PM

A couple of weeks later, I learned that seq was enhanced to support enums. But when I looked at the implementation, I learned that it blindly accepts all enum types, even though it does not make sense to iterate over all of them.

Do you have a link to the patch that made that change?

I believe it was https://reviews.llvm.org/D106279.

Any references to problems created (bugs introduced/patches that fix those bugs, etc) by that more liberal change compared to the direction in this patch?

This API looks inherently unsafe to me. However, I'm not aware of any existing bugs caused by it in LLVM.

ychen added a subscriber: ychen.Sat, Oct 2, 1:17 AM

I think I'm OK with this if it's moved away from the macro - using a template specialization type trait system even if that sacrifices the ability to define the trait next to a nested type. (indeed the couple of examples migrated in this patch do the registration outside the outer class rather than benefiting from this nested ability anyway), if that sounds OK?

llvm/include/llvm/ADT/Sequence.h
122

Ping on this.