This is an archive of the discontinued LLVM Phabricator instance.

Support visitor pattern by PointerUnion.
Needs ReviewPublic

Authored by bakhtiyarneyman on Mar 29 2021, 9:15 PM.

Details

Summary

Depends On D99560

Diff Detail

Event Timeline

bakhtiyarneyman requested review of this revision.Mar 29 2021, 9:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2021, 9:15 PM
mehdi_amini added inline comments.
llvm/include/llvm/ADT/PointerUnion.h
263

Other places in ADT are including #include "llvm/Support/ErrorHandling.h" and using llvm_unreachable directly. You should be able to do the same here?

Also, thanks for sending the patch!
Is it possible to write a unit-test for it?

Needs some unit tests.

How's this compare to std::variant's API? It'd probably be a good thing to keep them similar.

Added a unit test.

Thanks for taking a look, Mehdi and David! PTAL.

David, it was a goal to try to keep the API similar to std::variant. The biggest difference is that std::visit is a static function that accepts variable number of variants. I never experienced a need for that and replaced it with a non-static method (less verbose, more idiomatic, less complex). In my mind this API difference is insignificant, and the APIs are similar enough, but maybe when you say similar you mean something different?

ezhulenev added inline comments.
llvm/include/llvm/ADT/PointerUnion.h
302

template <typename... seems to be ~5x times more common in the llvm codebase, though didn't find anything in the style guide (google style guide recommends typename)

llvm/unittests/ADT/PointerUnionTest.cpp
175

I'd add a test case with labmdas returning values?

Replaced class with typename in template definition.

bakhtiyarneyman marked 3 inline comments as done.Mar 30 2021, 3:57 PM
bakhtiyarneyman added inline comments.
llvm/unittests/ADT/PointerUnionTest.cpp
175

Discussed offline: unit test of Visitor in the diffbase tests this.

Thanks for taking a look, Mehdi and David! PTAL.

David, it was a goal to try to keep the API similar to std::variant. The biggest difference is that std::visit is a static function that accepts variable number of variants. I never experienced a need for that and replaced it with a non-static method (less verbose, more idiomatic, less complex). In my mind this API difference is insignificant, and the APIs are similar enough, but maybe when you say similar you mean something different?

I'd be inclined to stick with the same API as std::variant - though I guess that API doesn't directly make sense/isn't especially usable until C++23 with overloadable lambdas, etc. Fair enough, member seems OK, I'd lean slightly towards a bit more std::variant like (eg: llvm::visit(lambda_1, lambda_2, ..., the_variant); - yeah, it does feel a bit wonky, but at least for now I'd favor the consistency even with a future standard API - but not so much that I'll insist on it - up to your judgment).

(+ @lhames too here, in case he's interested in this sort of thing)

llvm/include/llvm/ADT/PointerUnion.h
69–91

sorry - could you summarize what is changing here?

279–299

No chance of reasonably generalizing this? Pity to have such a hardcoded limit.

True llvm::visit API similar to std::visit requires visited type to be a std::variant (llvm::variant?) and implementng it is a lot of code (e.g. implementation in absl). Given that it's unlikely that PointerUnion at any point will become std::variant, I'd say that it will always be a separate API. And for this usability improvement the change is not that large.

True llvm::visit API similar to std::visit requires visited type to be a std::variant (llvm::variant?) and implementng it is a lot of code (e.g. implementation in absl). Given that it's unlikely that PointerUnion at any point will become std::variant, I'd say that it will always be a separate API. And for this usability improvement the change is not that large.

it's not so much that PointerUnion might be replaced by std::variant one day - I agree that that's not expected.

It's that having two similar types with slightly different ways of spelling similar operations would be a hinderance to programmer productivity - having to think about which one to use in the given context. Not that this'd come up all that often, but consistency is nice to have.

bakhtiyarneyman marked an inline comment as done.

Aligned the calling convention of visit to match that of std::visit. Renamed the more convenient version of visit to match to have the best of the two worlds: better ergonomics and not clashing with std::visit.

Aligned the calling convention of visit to match that of std::visit. Renamed the more convenient version of visit to match to have the best of the two worlds: better ergonomics and not clashing with std::visit. This time for real.

Aligned the calling convention of visit to match that of std::visit. Renamed the more convenient version of visit to match to have the best of the two worlds: better ergonomics and not clashing with std::visit.

I'm still inclined towards only exposing the standard phrasing - even with a different name, it'll be a matter of "Oh, what's this match thing?" when it could be "Oh, I've seen this visit thing before" and also the issue of "Oh, I'm used to using x.match(a, b) - ah, bother, this is the standard thing that doesn't support x.match, I have to use visit(a, b, x) instead".

scott.linder added a subscriber: scott.linder.EditedApr 5 2021, 10:39 PM

To chime in on the spelling of things relative to std, I had similar questions for IntrusiveVariant in https://reviews.llvm.org/D98477

A natural question that might help with deciding how to handle visit is: how should we spell alternative type access? std uses the free function std::get. It makes sense, as that same free function is overloaded for e.g. std::tuple, std::pair, std::variant, etc. but we don't currently have anything similar in ADT.

I think my leaning would be to go all-in on the std approach by doing the following:

  • Add an llvm::adt namespace (or something similar) containing things which are "designed to fulfill the same roles as the identically-named member of the std namespace, but for whatever reason we cannot actually use that member directly". Generally the reason will be: that thing is not an extension point, and it is forbidden to re-open std to overload that thing.
  • Add the appropriate llvm::adt::get(llvm::PointerUnion) overloads
  • When adding InrusiveVariant, I can add llvm::adt::get(llvm::IntrusiveVariant) overloads
  • Add llvm::adt::visit and the appropriate llvm::PointerUnion overloads
  • When adding IntrusiveVariant, I can add llvm::adt::visit overloads

Then everything is spelled the same/quacks the same, but names are not ambiguous in the context of using namespace llvm; or using namespace std; or both.

For other things, where we expect to delete our copy entirely once we move the codebase to a certain std version, we can just drop the exact "polyfill" definition with the same name into llvm::, and then delete the code later. An example I plan to post a patch for is in_place{_type,_index}_t.

What are people's thoughts?

Edit: Alternatively the additional namespace I mention could be omitted, and every use of e.g. get would have to be qualified with either std:: or llvm:: if it would be ambiguous. My intention was to try to make it obvious that the adt:: version is definitely not actually the std:: version, even with the most common forms of using namespace.

To chime in on the spelling of things relative to std, I had similar questions for IntrusiveVariant in https://reviews.llvm.org/D98477

A natural question that might help with deciding how to handle visit is: how should we spell alternative type access? std uses the free function std::get. It makes sense, as that same free function is overloaded for e.g. std::tuple, std::pair, std::variant, etc. but we don't currently have anything similar in ADT.

I think my leaning would be to go all-in on the std approach by doing the following:

  • Add an llvm::adt namespace (or something similar) containing things which are "designed to fulfill the same roles as the identically-named member of the std namespace, but for whatever reason we cannot actually use that member directly". Generally the reason will be: that thing is not an extension point, and it is forbidden to re-open std to overload that thing.

Oh - what would happen if the name went in llvm directly and we called it via adl? ie: made the call unqualified & let ADL+overload resolution sort it out?

  • Add the appropriate llvm::adt::get(llvm::PointerUnion) overloads
  • When adding InrusiveVariant, I can add llvm::adt::get(llvm::IntrusiveVariant) overloads
  • Add llvm::adt::visit and the appropriate llvm::PointerUnion overloads
  • When adding IntrusiveVariant, I can add llvm::adt::visit overloads

Then everything is spelled the same/quacks the same, but names are not ambiguous in the context of using namespace llvm; or using namespace std; or both.

Are there cases where they'd be ambiguous, though? Wouldn't template ordering still match them distinctly, unless one or more of these templates were overly general?

For other things, where we expect to delete our copy entirely once we move the codebase to a certain std version, we can just drop the exact "polyfill" definition with the same name into llvm::, and then delete the code later. An example I plan to post a patch for is in_place{_type,_index}_t.

What are people's thoughts?

Edit: Alternatively the additional namespace I mention could be omitted, and every use of e.g. get would have to be qualified with either std:: or llvm:: if it would be ambiguous. My intention was to try to make it obvious that the adt:: version is definitely not actually the std:: version, even with the most common forms of using namespace.

I'm not sure to me it's important that they be obviously distinct - if they do the same thing I'm OK with it being a bit vague. Like 'swap' called via the adl (admittedly that one's very well documented as being adl) - overload resolution'll work itself out and if the behavior's obvious enough I don't think it'll matter to most folks whether it's implemented by the standard or llvm.

also the issue of "Oh, I'm used to using x.match(a, b) - ah, bother, this is the standard thing that doesn't support x.match, I have to use visit(a, b, x) instead".

The same thing could be said about many of our APIs in Support/ADT which we make in a form that we think is more convenient than the standard, like how llvm::sort and all the others we have don't need .begin()/.end() to operate on the entire range. I'm annoyed every time I have to an std:: entry point that forces me in the extra boilerplate, but I wouldn't want to remove these nicer APIs from LLVM for "consistency with the standard".
(don't forget also that the standard often will have constraint around API design we don't have, for example around exception handling).

also the issue of "Oh, I'm used to using x.match(a, b) - ah, bother, this is the standard thing that doesn't support x.match, I have to use visit(a, b, x) instead".

The same thing could be said about many of our APIs in Support/ADT which we make in a form that we think is more convenient than the standard, like how llvm::sort and all the others we have don't need .begin()/.end() to operate on the entire range. I'm annoyed every time I have to an std:: entry point that forces me in the extra boilerplate, but I wouldn't want to remove these nicer APIs from LLVM for "consistency with the standard".
(don't forget also that the standard often will have constraint around API design we don't have, for example around exception handling).

To be sure - though I think the algorithm wrappers we implement (like sort(R) instead of sort(begin, end)) don't produce the same sort of confusion that I think this could. The sort ones work on all containers in either version, so you don't end up reaching for one that only works on LLVM data structures and then find out you had a standard container and can't use it, or the other way around.

Right, if we look at our containers, we also find APIs that don't match exactly the standard (DenseMap::find_as or insert_as for example, or SmallVector::pop_back_val), so yes someone swapping std::vector for SmallVector may be surprised.

To chime in on the spelling of things relative to std, I had similar questions for IntrusiveVariant in https://reviews.llvm.org/D98477

A natural question that might help with deciding how to handle visit is: how should we spell alternative type access? std uses the free function std::get. It makes sense, as that same free function is overloaded for e.g. std::tuple, std::pair, std::variant, etc. but we don't currently have anything similar in ADT.

I think my leaning would be to go all-in on the std approach by doing the following:

  • Add an llvm::adt namespace (or something similar) containing things which are "designed to fulfill the same roles as the identically-named member of the std namespace, but for whatever reason we cannot actually use that member directly". Generally the reason will be: that thing is not an extension point, and it is forbidden to re-open std to overload that thing.

Oh - what would happen if the name went in llvm directly and we called it via adl? ie: made the call unqualified & let ADL+overload resolution sort it out?

I hadn't thought too much about using ADL, as pre-C++20 there is a subtle issue with ADL when the function is a template, see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0846r0.html

I may have dismissed it too quickly, though, as either using llvm::get; or using namespace llvm; or using namespace std; should be enough to get ADL to work fine for bare get<N>() pre-C++20.

Also, in trying to find other examples of people doing this, it seems like C++17 makes get a customization point for at least one case: https://en.cppreference.com/w/cpp/language/structured_binding (under case 2, it reads "Otherwise, get<i>(e), where get is looked up by argument-dependent lookup only, ignoring non-ADL lookup.")

  • Add the appropriate llvm::adt::get(llvm::PointerUnion) overloads
  • When adding InrusiveVariant, I can add llvm::adt::get(llvm::IntrusiveVariant) overloads
  • Add llvm::adt::visit and the appropriate llvm::PointerUnion overloads
  • When adding IntrusiveVariant, I can add llvm::adt::visit overloads

Then everything is spelled the same/quacks the same, but names are not ambiguous in the context of using namespace llvm; or using namespace std; or both.

Are there cases where they'd be ambiguous, though? Wouldn't template ordering still match them distinctly, unless one or more of these templates were overly general?

I think you're right, I had just assumed there would be an issue for some reason.

For other things, where we expect to delete our copy entirely once we move the codebase to a certain std version, we can just drop the exact "polyfill" definition with the same name into llvm::, and then delete the code later. An example I plan to post a patch for is in_place{_type,_index}_t.

What are people's thoughts?

Edit: Alternatively the additional namespace I mention could be omitted, and every use of e.g. get would have to be qualified with either std:: or llvm:: if it would be ambiguous. My intention was to try to make it obvious that the adt:: version is definitely not actually the std:: version, even with the most common forms of using namespace.

I'm not sure to me it's important that they be obviously distinct - if they do the same thing I'm OK with it being a bit vague. Like 'swap' called via the adl (admittedly that one's very well documented as being adl) - overload resolution'll work itself out and if the behavior's obvious enough I don't think it'll matter to most folks whether it's implemented by the standard or llvm.

That seems reasonable to me, I was just cautious to use ADL for an unqualified name found in std when the standard doesn't explicitly "endorse" it, although I'm not sure why; I don't think there is anything forbidding us from doing it.

I think I'm just cautious around ADL as a feature because it can lead to confusion if we aren't careful, and as I linked above it has subtley different behavior for templates pre-C++20. As you mention, our get is faithful to the semantics of std::get, so it seems like a good candidate for ADL.

llvm/include/llvm/ADT/PointerUnion.h
306

I don't know that I'm in favor of having this version at all, but even if we do I don't know that comparing it to Rust's match helps explain it. I see Rust's match as closer to std::visit, in that neither use the postfix-dot notation.

Naming aside, I also just think the bar for ergonomics needs to be higher to justify having more than one way to do something.

scott.linder added inline comments.Apr 7 2021, 9:59 AM
llvm/include/llvm/ADT/PointerUnion.h
306

To be more specific about why I would lean towards not adding it, this match function lets us write:

variant.match(
  [](...) { ... },
  [](...) { ... },
  ...
  [](...) { ... }
);

instead of:

visit(makeVisitor(
  [](...) { ... },
  [](...) { ... },
  ...
  [](...) { ... }
), variant);

Which is a net elimination of ~3 tokens (i.e. remove makeVisitor, (, ) and replace , with .) in an expression that likely has a non-trivial amount of meaningful code in each "..."

Compare that to the sort case, where there can be more boilerplate than actual meaningful code:

sort(foo);
sort(foo.begin(), foo.end());