This is an archive of the discontinued LLVM Phabricator instance.

Go back to using DenseMapInfo<T>::isEqual instead of operator== in SmallSetVector
Needs ReviewPublic

Authored by etcwilde on Jul 21 2023, 4:05 PM.

Details

Summary

SmallSetVector used to use find and insert from from the DenseSet (using DenseMapInfo) to determine equality. Some recent changes (https://reviews.llvm.org/D152497) added a call to std::find, resulting in the use of operator==, which changes semantic behavior of the set. This change changes it back to use DenseMapInfo<value_type>::isEqual to determine equality without requiring operator== be implemented on the type.

Diff Detail

Event Timeline

etcwilde created this revision.Jul 21 2023, 4:05 PM
etcwilde requested review of this revision.Jul 21 2023, 4:05 PM
etcwilde added inline comments.Jul 21 2023, 4:33 PM
llvm/include/llvm/ADT/SetVector.h
276

We can probably reduce this entire function down to a call to contains since that returns a boolean and this only returns 0 or 1. Thoughts?

efriedma requested changes to this revision.Jul 21 2023, 4:42 PM
efriedma added a subscriber: efriedma.

Hash tables don't use hash-value equality; hash collisions are a thing.

This revision now requires changes to proceed.Jul 21 2023, 4:42 PM

Adding onto what @efriedma said, if you do want to use hash-value equality I would suggest having value equality as well, chained using &&. Given that the "small" functionality is meant to prevent us from having to compute a hash value in the first place, I would also suggest you to measure this change on the compile time tracker, although I don't expect it to make the performance better especially because this is quite a performance-sensitive part of code...

llvm/include/llvm/ADT/SetVector.h
276

That makes sense to me, the reason I didn't do that was to keep the diff as insertions-only and the name count also confused me.

Sorry, should have used the the in the DenseMapInfo<T>::isEqual.

So for background, the issue that I'm running into right now is two-fold, this container used to accept types that didn't necessarily have equality (operator==) (except as defined by the DenseMapInfo), which made it very useful for storing unique non-equatable things like non-canonical types in a type system. This change means that that code doesn't compile anymore. Second, the DenseMapInfo<T>::isEqual and DenseMapInfo<T>::getHashValue from the DenseMapInfo used before aren't necessarily the same as T::operator==, which can result in different outcomes.

At least as far as I can tell, DenseMapInfo<T>::isEqual in find_if should result in the same result as before. Is there an issue with using going back to DenseMapInfo<T>::isEqual over T::operator==?

nikic added a comment.Jul 22 2023, 1:01 AM

SetVector is templated over the set type, so using DenseMapInfo is not great.

Sorry, should have used the the in the DenseMapInfo<T>::isEqual.

So for background, the issue that I'm running into right now is two-fold, this container used to accept types that didn't necessarily have equality (operator==) (except as defined by the DenseMapInfo), which made it very useful for storing unique non-equatable things like non-canonical types in a type system. This change means that that code doesn't compile anymore. Second, the DenseMapInfo<T>::isEqual and DenseMapInfo<T>::getHashValue from the DenseMapInfo used before aren't necessarily the same as T::operator==, which can result in different outcomes.

My 2c here is that if you are implementing DenseMapInfo for a type, it needs to match operator== behavior. The only differences it should have from operator== is potentially different handling for empty / tombstone keys.

If you want to use a DenseMap with a different comparison criterion (or over a type that is not inherently comparable), then you should not implement DenseMapInfo for that type, but instead provide a separate DenseMapInfo implementation.

For example, see https://github.com/llvm/llvm-project/blob/c51c607dfc6388050ed1381a32089cd08a36ff9a/clang/include/clang/Sema/Weak.h#L37, which defines DenseMapInfoByAliasOnly for a type that is intentionally not comparable. You can see how it gets use (incidentally inside a SetVector) here: https://github.com/llvm/llvm-project/blob/c51c607dfc6388050ed1381a32089cd08a36ff9a/clang/include/clang/Sema/Sema.h#L1128 This sounds like it would be the proper thing to do in your case.

However, you can also just disable the optimization by passing N = 0 to SetVector (which is the default if you're not going through SmallSetVector). Using SetVector<T, SmallVector<T, N>, SmallDenseSet<T, N>> should give you exactly the same behavior as previously.

etcwilde updated this revision to Diff 543231.Jul 22 2023, 2:56 PM

Updating to use the DenseMapInfo implementation of isEqual instead of the hash value to ensure we're using the same semantics that DenseMap/DenseSet use to determine equality.

etcwilde updated this revision to Diff 543233.Jul 22 2023, 3:54 PM
etcwilde edited the summary of this revision. (Show Details)

Adding test that fails to build with current implementation that used to build with original SetVector insert and contains implementation.

etcwilde retitled this revision from Use `getHashValue` in `SetVector::insert` and `SetVector::contains` to Go back to using DenseMapInfo<T>::isEqual instead of operator== in SmallSetVector.Jul 22 2023, 3:55 PM
etcwilde edited the summary of this revision. (Show Details)
efriedma resigned from this revision.Jul 22 2023, 11:15 PM

Not sure the current version addresses comments from @nikic