As PR56606 stated, the current implementation of CityHash in libc++
would drop some bits unintentionally. Cast the 32bit int to the 64bit
int to avoid this happened.
Details
- Reviewers
EricWF ldionne - Group Reviewers
Restricted Project - Commits
- rG0da59bb8650a: [libc++] Fix wrong implementation of CityHash
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
As mentioned in PR56606, CityHash seems not being actively maintained anymore.
Do we want to change our hash implementation to FarmHash or other hash functions?
Not sure whether this is a good place for discussion, though.
I'm not familiar with this code, but I find it concerning that the change causes no tests to fail.
I leave the review to somebody more familiar with this code.
I had a short look at FarmHash. Judging by the bugtracker it doesn't seem to be maintained either.
Not sure whether this is a good place for discussion, though.
#libcxx on Discord is the best place.
I'm not familiar with this code, but I find it concerning that the change causes no tests to fail.
For the tests, it's also possible to add some hash quality tests.
To my knowledge, most of the hash algorithms would provide some kind of hash quality test to convince the user of its quality.
For example, CityHash and FarmHash both use SMHasher to ensure quality. (though it's also not actively maintained now)
Therefore, we can add the referenced test once we decided the algorithm.
I think this is really tricky to change, since we need to remain ABI compatible with code previously compiled. In other words, let's say an already compiled library creates an unordered_map with the old hash function and passes that to a program compiled with a new libc++ (with the new hashing function): the newly compiled program would have to be able to manipulate the old unordered_map. It being slower is probably reasonable, but it would have to function correctly.
We could introduce a new hashing algorithm in our unstable ABI, but I think we should still fix this issue with our existing CityHash algorithm.
@oToToT are you able to add a test that would fail with our current (buggy) implementation? https://github.com/llvm/llvm-project/issues/56606 seems to have a reproducer.
Besides using a new hash algorithm, would we want to upgrade our CityHash version to v1.1 [1]?
The v1.1 version claims to have a better hash quality. (I haven't test it though)
And, IIUC, upgrading to v1.1 wouldn't change the ABI.
[1] https://github.com/google/cityhash/commit/0469694b4609253742bcd3008c0b3f3ba9fcd9ba
Yes, that might be interesting, however we'd need to analyze whether we still find elements in an unordered_map when we change the hash in that way. Technically, changing the hash means that we'd violate http://eel.is/c++draft/hash.requirements#2: if one part of the program has the old CityHash implementation inlined and another part of the program has the new CityHash implementation, then all evaluations of h(k) might not yield the same result. We'd have to analyze whether that is benign UB or actual UB that's going to cause crashes and whatnot.
In all cases, we could find the best hash implementation out there and use that for ABI v2.
libcxx/test/libcxx/utilities/utility/__murmur2_or_cityhash.pass.cpp | ||
---|---|---|
10 | I assume this test fails without your change? Edit: I checked locally and it does, great. |
Oh, also, please make sure to fix the CI on AIX before committing this.
Thanks for the patch!
Technically, changing the hash means that we'd violate http://eel.is/c++draft/hash.requirements#2
Yes, but actually we're also now changing the hash value in this patch. Thus, I think upgrading CityHash might not be that harmful if we can do it in the same release (i.e. libc++ 16.0.0).
On the other side, if we have some concerns about changing the hash value, maybe I also need to add some notice for this patch somewhere?
Oh, also, please make sure to fix the CI on AIX before committing this.
Hmm, I'm still thinking why this break the AIX CI. I'll fix it before I commit this.
Upd: Oh, I know the problem.
We ran into an issue while deploying this. This is technically an ABI break as we had noted. I realize I had underestimated the potential for this biting users -- we had users where a std::unordered_map was obtained from a different dylib compiled against old libc++ and they were unable to find a key in code built against new libc++ headers.
I think we'll need to guard this under ABI v2, unfortunately.
@ldionne Am I right that I should revert this commit now?
I could guard this under ABI v2, too.
However, I think I could also do a little survey on hash functions and send a patch to discuss which one we should choose.
WDYT?
No, you're good. I'll handle everything, but thanks a lot for the suggestion.
I could guard this under ABI v2, too.
However, I think I could also do a little survey on hash functions and send a patch to discuss which one we should choose.
WDYT?
I agree, under ABI v2 we can select an arbitrary hash function, so let's select the best one. For the time being that will be the fixed CityHash, but we can use anything else. I suggest you do that on top of my patch that introduces the ABI v2 switch.
I assume this test fails without your change?
Edit: I checked locally and it does, great.