This is an archive of the discontinued LLVM Phabricator instance.

[clang] adds `__type_pack_index` so we can get a type's parameter pack index
Needs ReviewPublic

Authored by cjdb on Jun 1 2023, 5:27 PM.

Details

Summary

Similarly to __type_pack_element, C++ doesn't offer a way to extract a
the index of the first occurrence of a type in a parameter pack. This
means that we need to build up context for something that the compiler
already has, and then compute the value in a much less efficient manner
than the compiler.

Clang can compute this information without needing to use extra
resources.

Diff Detail

Event Timeline

cjdb created this revision.Jun 1 2023, 5:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2023, 5:27 PM
cjdb requested review of this revision.Jun 1 2023, 5:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2023, 5:27 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
cjdb updated this revision to Diff 527681.Jun 1 2023, 5:48 PM

fixes some code I wrote when debugging

cjdb added inline comments.Jun 1 2023, 6:14 PM
clang/include/clang/AST/Stmt.h
796–802

These numbers feel very low for how this modification is using them. Perhaps they should both be bumped to 32?

dblaikie added inline comments.Jun 1 2023, 10:02 PM
clang/include/clang/AST/Stmt.h
796–802

clang does, I think, have specific implementation limits for this sort of thing -

Well, maybe we don't actually enforce a limit on number of template arguments... https://godbolt.org/z/accncYson compiles successfully, and has 2^18 parameters, beyond the 2^16 suggested here.

But maybe 2^16 is just what we test for/somewhat guarantee.

But if the Value is going to be the index into the args, shouldn't it be the same size as NumArgs, at least? (& the comment says 16, so 16 for both Value and NumArgs would seem suitably symmetric, and no larger than the current situation - since it'd just be splitting NumArgs in half, while still meeting the comment's suggestion/requirements?)

An assert, at least, when populating NumArgs to ensure it's no larger might not hurt... (which might be reachable from code/not checked prior - so it could be a hard compilation error, I guess, but I wouldn't feel super strongly about it)

Needs a release note.

Also, is this something that has been requested by library? I'd hope this is something that will be used, so I'd like evidence of that.

clang/include/clang/AST/Stmt.h
796–802

Can you explain the math here of how you chose to change this to 16? I see that you removed 7 bits, but took away 16. I'm not sure what NumExprBits is doing though, I got lost a little in that, so if you can calculate that to make sure this needs to be done, it would be appreciated.

Additionally, a static_assert after this to ensure the size isn't changing would also be appreciated.

clang/include/clang/Basic/DiagnosticSemaKinds.td
2960
clang/lib/Sema/SemaExprCXX.cpp
5562

Oh, please don't do this.

5591

What do you mean by usually deliberate here? This is a message users will see, so telling them an assertion is deliberate seems incorrect?

5620

I realize this is the 1st, but this seems like it is going to be a maintenance pain. Can you at least split this into a static-function of IsIntegralTypeTrait that we can use later with table-gen if this gets out of hand?

5625

For the purposes of making the two the same, can you integrate this into EvaluateBooleanTypeTrait? That would be something like:

if (IsIntegralTypeTrait(Kind))
  return EvaluateIntegralTypeTrait(...);
return EvaluateBooleanTypeTrait(...);
dblaikie added inline comments.Jun 2 2023, 8:39 AM
clang/lib/Sema/SemaExprCXX.cpp
5562

perhaps another way to do this if you want to avoid repeating the return expression would be a small lambda that contains the return expression and uses a simple name - so the returns can just be return ret(); ? (or I guess nest the switch in an if, or in another function that uses ref parameters to populate the result)

Though in this case it's only twice, so I'd probably repeat the expression?

5591

I second the question - though I'd push back on the "This is a message users will see" - unreachables won't be seen by users, they get lowered to UB in release builds - the text isn't in the program anymore. The text is only for Clang developers (and/or clang-as-library users).

cjdb added inline comments.Jun 2 2023, 9:43 AM
clang/include/clang/AST/Stmt.h
796–802

But if the Value is going to be the index into the args, shouldn't it be the same size as NumArgs, at least? (& the comment says 16, so 16 for both Value and NumArgs would seem suitably symmetric, and no larger than the current situation - since it'd just be splitting NumArgs in half, while still meeting the comment's suggestion/requirements?)

Someone independently confirmed that you can have 200k types in a template, so we should probably be able to count at least that high (or alternatively, we should possibly consider not allowing more than 2^16 template parameters).

Can you explain the math here of how you chose to change this to 16? I see that you removed 7 bits, but took away 16. I'm not sure what NumExprBits is doing though, I got lost a little in that, so if you can calculate that to make sure this needs to be done, it would be appreciated.

The value of NumArgs hasn't changed: I've just codified it. I can't work out why we need NumExprBits, but it's apparently required padding (when I removed it, a bunch of tests failed). Its value is 18, courtesy of clangd in VS Code.

Additionally, a static_assert after this to ensure the size isn't changing would also be appreciated.

There's a static_assert in one of the Stmt constructors, which doesn't want more than eight bytes (we apparently need 16 if we're to change this).

erichkeane added inline comments.Jun 2 2023, 10:12 AM
clang/include/clang/AST/Stmt.h
796–802

Got it, thanks! We need the NumExprBits because this gets cast to that ExprBitFields (since this inherits from Expr). So thats the 'base type' bits.

18 + 8 + 8 (for the NumExprbits and Kind and Value fields) is 34, so that leaves plenty of extra bits here, right? Usually in these cases we leave things 'as big as we can without growing the thing', then comment where extra bits can be stolen in the future.

So I would just make this 'the rest' of our size, since we're already over 4 bytes, mind as well use them as long as we can.

cjdb added inline comments.Jun 2 2023, 2:27 PM
clang/lib/Sema/SemaExprCXX.cpp
5620

Done, but in D152034. It's a change that should happen independently of this one IMO.

erichkeane added inline comments.Jun 2 2023, 2:29 PM
clang/lib/Sema/SemaExprCXX.cpp
5620

Thanks! THat looks good.

rsmith added a subscriber: rsmith.Jun 2 2023, 4:41 PM

Taking a step back for a moment: what is the intended use case for this? My concern is that most of the time you're going to want to make O(N) such queries into a pack of N elements, resulting in O(N^2) compile time, much like we regrettably get from uses of __type_pack_element. If we can avoid the regret in this case by providing a different intrinsic, perhaps we should do so. (For example, we could take two packs of types, and produce a sequence of integers giving the indexes of each of the first types in the second list, in O(N) time. And similarly we could add a __type_pack_elements that takes a pack of types and a pack of indexes and performs N indexing operations at once, in O(N) time, rather than having the program do it in O(N^2) time.)

cjdb added a comment.EditedJun 5 2023, 10:33 AM

Taking a step back for a moment: what is the intended use case for this? My concern is that most of the time you're going to want to make O(N) such queries into a pack of N elements, resulting in O(N^2) compile time, much like we regrettably get from uses of __type_pack_element. If we can avoid the regret in this case by providing a different intrinsic, perhaps we should do so. (For example, we could take two packs of types, and produce a sequence of integers giving the indexes of each of the first types in the second list, in O(N) time. And similarly we could add a __type_pack_elements that takes a pack of types and a pack of indexes and performs N indexing operations at once, in O(N) time, rather than having the program do it in O(N^2) time.)

When I wrote this I narrowly had std::get<T>(tuple) and std::get<T>(variant) in mind, both of which are linear in nature. Now that you raise this, I can see the described problem and will implement said feature (maybe there exists a world where std::visit can use this too?). Since we're doing this, it may also be worth checking that T1s... are unique in T2s..., which I'd intended to be a separate feature.

Can we get away with modifying __type_pack_element so that we don't need to introduce a new keyword, or should we have two for each?

I am considering making use of this feature in a real-world project. I have a few short observations after rebasing this diff and playing with it for a bit:

  • It seems to me that __has_builtin(__type_pack_index) currently evaluates to false. Please include a check for the feature detection working correctly in the test.
  • Has there been any coordination with GCC folks on the name/semantics of this builtin?

    GCC trunk has recently gained support for __type_pack_element, and on the feature request for it (PR100157), @jwakely has already expressed interest in implementing the reverse of that, i.e. what this diff proposes. A slight incompatibility has managed to seep in: GCC does not allow the use of __type_pack_element in a position where its name would end up being manged; see the Folly bug report. While this is easy to work around (wrap it in a template), as a user, it would be inconvenient needing to add a GCC/Clang check alongside __has_builtin if the two compilers were to come up with incompatibly behaving implementations for this one.
  • What would be the expected way of gracefully handing the queried type not being present in the list?

    Our std::variant-like type uses a can_contain constraint on its accessors to prevent asking for a type that's not in the type list. We currently achieve this by index_of returning sizeof...(Ts) as a sentinel value if no match was found. libc++'s __find_exactly_one_t uses (size_t)-1 for the same purpose.

    I'm targeting C++20, so requires { __type_pack_index(T, InTypes...); } seems to do just what I want, but std::variant is C++17 and std::tuple is even earlier than that. Of course, in earlier language modes you don't have to worry about it working in concepts, but e.g. libc++ static_asserts that the index is not the sentinel. Not sure about what libstdc++'s requirements would be.

    In any case, could you please test/document the expected way of recovering from the error?

Since we're doing this, it may also be worth checking that T1s... are unique in T2s..., which I'd intended to be a separate feature.

It seems like it would be more efficient to include the uniqueness check in the intrinsic, otherwise std::get<T> and helpers like libstdc++'s variant::__index_of<T> will still need some template metaprogramming to check for uniqueness. Even if there's a separate intrinsic for that, it seems more useful to just have one intrinsic that does both. If you have a use case for finding the index of the first T in a pack, maybe add an additional boolean that requests uniqueness and would cause __type_pack_index to fail if T is not unique.

As @BertalanD also said, it doesn't seem useful to be ill-formed if the type isn't in the pack. That makes it unusable for std::get<T> and std::variant. Libstdc++'s __find_uniq_type_in_pack<T, Types...> returns sizeof...(Types) if T isn't found (either this or -1uz seems fine, you can check the returned index is < sizeof...(Types) to tell if the type was found).

clang/docs/LanguageExtensions.rst
1660

If this intrinsic doesn't (yet?) check for uniqueness, then this get<T> doesn't match std::get<T>. It might be worth mentioning that here.