This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Make llvm::is_contained call member `contains` or `find` when available
ClosedPublic

Authored by kuhar on Mar 14 2023, 9:15 AM.

Details

Summary

This makes it so that calling llvm::is_contained no longer degrades
performance over member contains, even though both have almost identical
names. This would be the case in most set/map classes that can check for
an element being present in O(1) or O(log n) time vs. linear scan with
std::find. For C++17 maps/sets without .contains, use .find when available,
falling back to a linear scan with std::find.

I also considered detecting member contains and triggering a
static_assert instead, but decided against it because it's just as easy
to do the right thing and call .contains. This would also make some code fail
only when compiled in the C++20 mode when more container types come with
.contains member functions.

This was actually already the case with CommandLine.h calling is_contained
on SmallPtrSet and in a recent BOLT patch.

Diff Detail

Event Timeline

kuhar created this revision.Mar 14 2023, 9:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2023, 9:15 AM
Herald added a subscriber: hanchung. · View Herald Transcript
kuhar requested review of this revision.Mar 14 2023, 9:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2023, 9:15 AM
kuhar added a reviewer: beanz.Mar 14 2023, 9:15 AM
kuhar edited the summary of this revision. (Show Details)
dblaikie accepted this revision.Mar 14 2023, 9:47 AM

Looks good

This revision is now accepted and ready to land.Mar 14 2023, 9:47 AM
nikic added a subscriber: nikic.Mar 14 2023, 9:50 AM

I also considered detecting member contains and triggering a
static_assert instead, but decided against it because it's just as easy
to do the right thing and call .contains.

I think I would prefer this option, so we keep the ability to distinguish O(1) contains from O(n) is_contained at a glance.

I also considered detecting member contains and triggering a
static_assert instead, but decided against it because it's just as easy
to do the right thing and call .contains.

I think I would prefer this option, so we keep the ability to distinguish O(1) contains from O(n) is_contained at a glance.

Yeah, that's fair - I'm OK with that too/see the value there. Only issue is in generic code where you might not know the type of a container.

I also considered detecting member contains and triggering a
static_assert instead, but decided against it because it's just as easy
to do the right thing and call .contains.

I think I would prefer this option, so we keep the ability to distinguish O(1) contains from O(n) is_contained at a glance.

Yeah, that's fair - I'm OK with that too/see the value there. Only issue is in generic code where you might not know the type of a container.

I'd certainly discourage uses of is_contained where the type is known to have a contains member function - maybe the generic code cases are so rare they're not worth leaving this generic functionality in where it's more likely to reduce readability in the non-generic cases.

kuhar added a comment.EditedMar 14 2023, 10:19 AM

I also considered detecting member contains and triggering a
static_assert instead, but decided against it because it's just as easy
to do the right thing and call .contains.

I think I would prefer this option, so we keep the ability to distinguish O(1) contains from O(n) is_contained at a glance.

The way I was thinking about this is that usually the C++ standard specifies algorithms in terms of worst-case complexity and anything better is a bonus. In a completely generic context, as a user of is_contained my expectation would be that it's O(n) or better.

This situation is a bit more complicated because C++20 is added member contains functions to (unordered_)set/map, so a static assertion would cause compilation failures with some compiler flags and not the other ones. IMO this is not a huge deal but probably annoying because I'd expect the Hyrum law to apply and API usages of is_contained with these containers to creep in over time. The current patch makes this API instability problem better , as there'll be no need to have ways to essentially do the same thing.

Another thing I intended to change was to make llvm::find call member find when present. This would make it and also llvm::is_contained O(1)/O(log n) even for types that do not expose .contains today. The alternative would be to disallow llvm::find over container with member find as well, but this doesn't seem like the best choice for me.

This is also consistent with how new C++ range functions are defined, e.g., https://en.cppreference.com/w/cpp/ranges/size that calls member .size when available and does iterator subtraction otherwise.

I also considered detecting member contains and triggering a
static_assert instead, but decided against it because it's just as easy
to do the right thing and call .contains.

I think I would prefer this option, so we keep the ability to distinguish O(1) contains from O(n) is_contained at a glance.

The way I was thinking about this is that usually the C++ standard specifies algorithms in terms of worst-case complexity and anything better is a bonus. In a completely generic context, as a user of is_contained my expectation would be that it's O(n) or better.

This situation is a bit more complicated because C++20 is added member contains functions to (unordered_)set/map, so a static assertion would cause compilation failures with some compiler flags and not the other ones. IMO this is not a huge deal but probably annoying because I'd expect the Hyrum law to apply and API usages of is_contained with these containers to creep in over time. The current patch makes this API instability problem better , as there'll be no need to have ways to essentially do the same thing.

Yeah, that'd be unfortunate indeed.

Another thing I intended to change was to make llvm::find call member find when present. This would make it and also llvm::is_contained O(1)/O(log n) even for types that do not expose .contains today. The alternative would be to disallow llvm::find over container with member find as well, but this doesn't seem like the best choice for me.

This one is trickier for me - llvm::is_contained doesn't have a nearby standard equivalent ,so we can choose its behavior more flexibly. But std::find(iter, iter, value) exists in the standard library and there's value in llvm::find(range, value) being pretty close to the same/identical to the standard version - which doesn't optimize to member-find when available.

I would be inclined not to include a member-find optimization in llvm::find for that reason.

I'm undecided about whether that tips in favor of not having member-contains in llvm::is_contained.

This is also consistent with how new C++ range functions are defined, e.g., https://en.cppreference.com/w/cpp/ranges/size that calls member .size when available and does iterator subtraction otherwise.

Fair - that a bunch of free function ranges functions are like this, delegating to members where available maybe tilts things in that direction.

kuhar added a comment.Mar 14 2023, 1:36 PM

This one is trickier for me - llvm::is_contained doesn't have a nearby standard equivalent ,so we can choose its behavior more flexibly. But std::find(iter, iter, value) exists in the standard library and there's value in llvm::find(range, value) being pretty close to the same/identical to the standard version - which doesn't optimize to member-find when available.

I would be inclined not to include a member-find optimization in llvm::find for that reason.

I'm undecided about whether that tips in favor of not having member-contains in llvm::is_contained.

Then I propose we frame it like this: llvm::is_contained is the (llvm-only) canonical/generic way to check if an element is in the range. We make it 'do the right thing' by default and use .contains, .find, and std::find when available, in this order of preference. This will make it work with C++17 containers that do not have .contains, llvm and C++20+ containers with .contains, and any other range via a linear scan. WDYT?

kazu added a comment.Mar 14 2023, 2:05 PM

Then I propose we frame it like this: llvm::is_contained is the (llvm-only) canonical/generic way to check if an element is in the range.

How about something like the (llvm-only) canonical/generic way to check if an element is in the range if a template needs to work with containers that may or may not have the .contains method (like an array and a set).

I'm saying this because I personally find Foo.contains(X) a little more readable than llvm::is_contained(Foo, X), and I would like our code base to look friendly to people new to the LLVM project (with the use of function names present in C++ standards) wherever reasonable to do so. (Sure, std::set::contains is very new, but I expect people to become familiar with this over time.)

kuhar added a comment.Mar 14 2023, 2:59 PM

Then I propose we frame it like this: llvm::is_contained is the (llvm-only) canonical/generic way to check if an element is in the range.

How about something like the (llvm-only) canonical/generic way to check if an element is in the range if a template needs to work with containers that may or may not have the .contains method (like an array and a set).

I'm saying this because I personally find Foo.contains(X) a little more readable than llvm::is_contained(Foo, X), and I would like our code base to look friendly to people new to the LLVM project (with the use of function names present in C++ standards) wherever reasonable to do so. (Sure, std::set::contains is very new, but I expect people to become familiar with this over time.)

Sure, I agree with that. If I have map with a known type like DenseMap I'd rather call .contains(x) directly too. But this is not possible with C++17 containers std::, so I'd argue that until then is_contained(container, x) is more readable than something like container.find(x) != container.end().

nikic added a comment.Mar 14 2023, 3:18 PM

Sure, I agree with that. If I have map with a known type like DenseMap I'd rather call .contains(x) directly too. But this is not possible with C++17 containers std::, so I'd argue that until then is_contained(container, x) is more readable than something like container.find(x) != container.end().

I don't think we should promote usage of is_contained() for anything but the linear scan case. I don't want to be left guessing whether this is_contained() is a linear scan, or just a slightly different way to write container.find().

kuhar added a comment.Mar 14 2023, 3:27 PM

Sure, I agree with that. If I have map with a known type like DenseMap I'd rather call .contains(x) directly too. But this is not possible with C++17 containers std::, so I'd argue that until then is_contained(container, x) is more readable than something like container.find(x) != container.end().

I don't think we should promote usage of is_contained() for anything but the linear scan case. I don't want to be left guessing whether this is_contained() is a linear scan, or just a slightly different way to write container.find().

I can see this being prefered from the perspective of an experience llvm developer, but on the other hand I saw a lot of code that ended up using is_contained and llvm::find with map/set types. I found fixing those inefficiencies and re-explaining this to folks needlessly time consuming. I think defaulting to the more efficient implementation of containment check is the way to go to avoid those non-obvious performance issues, and the member functions we would delegate to are not surprising at all.

kuhar updated this revision to Diff 505300.Mar 14 2023, 3:29 PM

Delegate to .find when .contains is not available. Improve function documentation.

kuhar updated this revision to Diff 505301.Mar 14 2023, 3:31 PM

Fix a typo

kuhar added a comment.Mar 14 2023, 3:40 PM

Here's an example that landed minutes ago in the tree: https://reviews.llvm.org/D145464

Amir added a subscriber: Amir.EditedMar 14 2023, 5:01 PM

I’m fully supportive of this change. BOLT uses this interface which helps maintainabilty as it clearly states the intent and is robust wrt container type change. It’s a clear win to also have the same runtime characteristics as find.

MaskRay accepted this revision.Mar 14 2023, 5:19 PM

Sure, I agree with that. If I have map with a known type like DenseMap I'd rather call .contains(x) directly too. But this is not possible with C++17 containers std::, so I'd argue that until then is_contained(container, x) is more readable than something like container.find(x) != container.end().

I don't think we should promote usage of is_contained() for anything but the linear scan case. I don't want to be left guessing whether this is_contained() is a linear scan, or just a slightly different way to write container.find().

I can see this being prefered from the perspective of an experience llvm developer, but on the other hand I saw a lot of code that ended up using is_contained and llvm::find with map/set types. I found fixing those inefficiencies and re-explaining this to folks needlessly time consuming. I think defaulting to the more efficient implementation of containment check is the way to go to avoid those non-obvious performance issues, and the member functions we would delegate to are not surprising at all.

Agreed with the analysis.

llvm/unittests/ADT/STLExtrasTest.cpp
1039

Add a find to show that contains is preferred.

kuhar updated this revision to Diff 505338.Mar 14 2023, 5:40 PM
kuhar marked an inline comment as done.

Update test to check that .contains(x) is preferred over .find(x) when both are available.
Use std::find instead of llvm::find to make the behavior as clear as possible.

kuhar retitled this revision from [ADT] Make llvm::is_contained call member `contains` when available to [ADT] Make llvm::is_contained call member `contains` or `find` when available.Mar 14 2023, 5:44 PM
kuhar edited the summary of this revision. (Show Details)
nikic added a comment.Mar 15 2023, 1:36 AM

Here's an example that landed minutes ago in the tree: https://reviews.llvm.org/D145464

Oh wow, that's terrible. I guess I withdraw my objection then, I didn't realize that people don't know what is_contained() does.

kazu accepted this revision.Mar 15 2023, 8:54 AM

LGTM. I like the fallback mechanism. We prefer to use X.contains(), but if that's not possible (or people just randomly call llvm::is_contained), llvm::is_contained is there to do the right thing for you.

I ran into a fun thing with this.
In my downstream backend we have a class where we apparently implemented "contains" by calling "llvm::is_contained".
With this patch that's not a very good idea as it then just calls itself over and over again.

Just a heads up in case someone else sees a hanging compiler and bisects back to this patch.

bjope added a subscriber: bjope.Mar 16 2023, 6:58 AM