This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Fix wrong implementation of CityHash
ClosedPublic

Authored by oToToT on Sep 18 2022, 12:12 AM.

Details

Reviewers
EricWF
ldionne
Group Reviewers
Restricted Project
Commits
rG0da59bb8650a: [libc++] Fix wrong implementation of CityHash
Summary

As PR56606 stated, the current implementation of CityHash in libc++
would drop some bits unintentionally. Cast the 32bit int to the 64bit
int to avoid this happened.

Diff Detail

Event Timeline

oToToT created this revision.Sep 18 2022, 12:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 18 2022, 12:12 AM
oToToT requested review of this revision.Sep 18 2022, 12:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 18 2022, 12:12 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

As mentioned in PR56606, CityHash seems not being actively maintained anymore.
Do we want to change our hash implementation to FarmHash or other hash functions?

Not sure whether this is a good place for discussion, though.

I'm not familiar with this code, but I find it concerning that the change causes no tests to fail.
I leave the review to somebody more familiar with this code.

As mentioned in PR56606, CityHash seems not being actively maintained anymore.
Do we want to change our hash implementation to FarmHash or other hash functions?

I had a short look at FarmHash. Judging by the bugtracker it doesn't seem to be maintained either.

Not sure whether this is a good place for discussion, though.

#libcxx on Discord is the best place.

I'm not familiar with this code, but I find it concerning that the change causes no tests to fail.

For the tests, it's also possible to add some hash quality tests.
To my knowledge, most of the hash algorithms would provide some kind of hash quality test to convince the user of its quality.
For example, CityHash and FarmHash both use SMHasher to ensure quality. (though it's also not actively maintained now)
Therefore, we can add the referenced test once we decided the algorithm.

As mentioned in PR56606, CityHash seems not being actively maintained anymore.
Do we want to change our hash implementation to FarmHash or other hash functions?

Not sure whether this is a good place for discussion, though.

I think this is really tricky to change, since we need to remain ABI compatible with code previously compiled. In other words, let's say an already compiled library creates an unordered_map with the old hash function and passes that to a program compiled with a new libc++ (with the new hashing function): the newly compiled program would have to be able to manipulate the old unordered_map. It being slower is probably reasonable, but it would have to function correctly.

We could introduce a new hashing algorithm in our unstable ABI, but I think we should still fix this issue with our existing CityHash algorithm.

@oToToT are you able to add a test that would fail with our current (buggy) implementation? https://github.com/llvm/llvm-project/issues/56606 seems to have a reproducer.

oToToT updated this revision to Diff 461290.Sep 19 2022, 11:48 AM

Add basic test cases.

Besides using a new hash algorithm, would we want to upgrade our CityHash version to v1.1 [1]?
The v1.1 version claims to have a better hash quality. (I haven't test it though)
And, IIUC, upgrading to v1.1 wouldn't change the ABI.

[1] https://github.com/google/cityhash/commit/0469694b4609253742bcd3008c0b3f3ba9fcd9ba

ldionne accepted this revision.Sep 20 2022, 3:01 PM

Besides using a new hash algorithm, would we want to upgrade our CityHash version to v1.1 [1]?
The v1.1 version claims to have a better hash quality. (I haven't test it though)
And, IIUC, upgrading to v1.1 wouldn't change the ABI.

[1] https://github.com/google/cityhash/commit/0469694b4609253742bcd3008c0b3f3ba9fcd9ba

Yes, that might be interesting, however we'd need to analyze whether we still find elements in an unordered_map when we change the hash in that way. Technically, changing the hash means that we'd violate http://eel.is/c++draft/hash.requirements#2: if one part of the program has the old CityHash implementation inlined and another part of the program has the new CityHash implementation, then all evaluations of h(k) might not yield the same result. We'd have to analyze whether that is benign UB or actual UB that's going to cause crashes and whatnot.

In all cases, we could find the best hash implementation out there and use that for ABI v2.

libcxx/test/libcxx/utilities/utility/__murmur2_or_cityhash.pass.cpp
10

I assume this test fails without your change?

Edit: I checked locally and it does, great.

This revision is now accepted and ready to land.Sep 20 2022, 3:01 PM

Oh, also, please make sure to fix the CI on AIX before committing this.

Thanks for the patch!

oToToT added a comment.EditedSep 21 2022, 8:37 PM

Technically, changing the hash means that we'd violate http://eel.is/c++draft/hash.requirements#2

Yes, but actually we're also now changing the hash value in this patch. Thus, I think upgrading CityHash might not be that harmful if we can do it in the same release (i.e. libc++ 16.0.0).

On the other side, if we have some concerns about changing the hash value, maybe I also need to add some notice for this patch somewhere?

Oh, also, please make sure to fix the CI on AIX before committing this.

Hmm, I'm still thinking why this break the AIX CI. I'll fix it before I commit this.
Upd: Oh, I know the problem.

oToToT updated this revision to Diff 462261.Sep 22 2022, 12:12 PM

Fix the endianess issue of the test.

This revision was automatically updated to reflect the committed changes.

We ran into an issue while deploying this. This is technically an ABI break as we had noted. I realize I had underestimated the potential for this biting users -- we had users where a std::unordered_map was obtained from a different dylib compiled against old libc++ and they were unable to find a key in code built against new libc++ headers.

I think we'll need to guard this under ABI v2, unfortunately.

oToToT added a comment.EditedFeb 9 2023, 12:10 PM

@ldionne Am I right that I should revert this commit now?

I could guard this under ABI v2, too.
However, I think I could also do a little survey on hash functions and send a patch to discuss which one we should choose.
WDYT?

@ldionne Am I right that I should revert this commit now?

No, you're good. I'll handle everything, but thanks a lot for the suggestion.

I could guard this under ABI v2, too.
However, I think I could also do a little survey on hash functions and send a patch to discuss which one we should choose.
WDYT?

I agree, under ABI v2 we can select an arbitrary hash function, so let's select the best one. For the time being that will be the fixed CityHash, but we can use anything else. I suggest you do that on top of my patch that introduces the ABI v2 switch.

oToToT added a comment.Feb 9 2023, 1:07 PM

Got it. Thanks for your work!