BitVectors and SmallBitVectors with equal contents but different capacities were getting different hashes.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for fixing this!
llvm/include/llvm/ADT/SmallBitVector.h | ||
---|---|---|
671–675 | I didn't want to be too intrusive when I did it. Do you think we could do: ArrayRef<uintptr_t> getData(uintptr_t &Store) const { if (!isSmall()) return getPointer()->getData(); Store = getSmallBits(); return makeArrayRef(Store); } | |
721 | ..and only do V.getData(Store) here with a definition above. | |
llvm/unittests/ADT/BitVectorTest.cpp | ||
1209 | Can you possibly add a non-TYPED_TEST to compare a BitVector and a SmallBitVector, to ensure hashes are the same for a given content? |
I'm unsure about adding a test that BitVector and SmallBitVector hash exactly the same way, because in principle I think this is a consequence of implementation details and not a property that we actually want per se. In future the implementations might diverge such that they must hash differently (or at least, that it would be convenient for them to hash differently), and IMO that would be ok. But I'm willing to add the test if you disagree.
In my sense a hash is a function of the represented data, not of the implementation. I feel that they should both resolve to the same hash, but at the same time I don't have practical example for this. Let's leave this open, until there's a practical application.
Thanks again! LTGM.
[Drive by and meant as a generic comment] If the test states this concern it is still valuable to have it. Once the implementations diverge the test fails, the concern is rediscovered, and the author can decide if the divergence was intentional/good or accidental/bad. In the former case the test might be removed but it at least protects against the accident case.
D77027 made BitVector::operator== return false for BitVectors of different size, one of the argument being that the hashes of different sizes are different, the other being that a vector, like std::vector return false for containing a different number of elements.
With that change, the hash can be different and since its a SmallBitVector I'd suggest to do the same.
@Meinersbur I think you may have misunderstood this change - this change is to make hashing insensitive to capacity, not size. Both were already sensitive to size and still are after this change (and I agree that this is the correct behaviour). Please correct me if I've misunderstood your comment.
You are right, I missed it's to make it capacity-independent.
I am going to commit for you...
I didn't want to be too intrusive when I did it. Do you think we could do: