This is an archive of the discontinued LLVM Phabricator instance.

Fix bug in BitVector and SmallBitVector DenseMap hashing
ClosedPublic

Authored by bmoody on Mar 30 2020, 12:38 AM.

Details

Summary

BitVectors and SmallBitVectors with equal contents but different capacities were getting different hashes.

Diff Detail

Event Timeline

bmoody created this revision.Mar 30 2020, 12:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2020, 12:38 AM
Herald added a subscriber: dexonsmith. · View Herald Transcript
bmoody updated this revision to Diff 257585.Apr 14 2020, 6:47 PM
bmoody retitled this revision from Fix bug in BitVector DenseMap hashing to Fix bug in BitVector and SmallBitVector DenseMap hashing.
bmoody edited the summary of this revision. (Show Details)
bmoody added a reviewer: aganea.

Include fix for SmallBitVector hashing

Thanks for fixing this!

llvm/include/llvm/ADT/SmallBitVector.h
671

I didn't want to be too intrusive when I did it. Do you think we could do:

ArrayRef<uintptr_t> getData(uintptr_t &Store) const {
  if (!isSmall())
    return getPointer()->getData();
  Store = getSmallBits();
  return makeArrayRef(Store);
}
721

..and only do V.getData(Store) here with a definition above.

llvm/unittests/ADT/BitVectorTest.cpp
1234

Can you possibly add a non-TYPED_TEST to compare a BitVector and a SmallBitVector, to ensure hashes are the same for a given content?

bmoody updated this revision to Diff 258466.Apr 17 2020, 6:23 PM

Make suggested changes to SmallBitVector, un-deleting SmallBitVector::getData.

I'm unsure about adding a test that BitVector and SmallBitVector hash exactly the same way, because in principle I think this is a consequence of implementation details and not a property that we actually want per se. In future the implementations might diverge such that they must hash differently (or at least, that it would be convenient for them to hash differently), and IMO that would be ok. But I'm willing to add the test if you disagree.

aganea accepted this revision.Apr 17 2020, 7:07 PM

In my sense a hash is a function of the represented data, not of the implementation. I feel that they should both resolve to the same hash, but at the same time I don't have practical example for this. Let's leave this open, until there's a practical application.

Thanks again! LTGM.

This revision is now accepted and ready to land.Apr 17 2020, 7:07 PM

I'm unsure about adding a test that BitVector and SmallBitVector hash exactly the same way, because in principle I think this is a consequence of implementation details and not a property that we actually want per se. In future the implementations might diverge such that they must hash differently (or at least, that it would be convenient for them to hash differently), and IMO that would be ok. But I'm willing to add the test if you disagree.

[Drive by and meant as a generic comment] If the test states this concern it is still valuable to have it. Once the implementations diverge the test fails, the concern is rediscovered, and the author can decide if the divergence was intentional/good or accidental/bad. In the former case the test might be removed but it at least protects against the accident case.

Great! I don't have commit access so please commit this on my behalf. Thanks!

D77027 made BitVector::operator== return false for BitVectors of different size, one of the argument being that the hashes of different sizes are different, the other being that a vector, like std::vector return false for containing a different number of elements.

With that change, the hash can be different and since its a SmallBitVector I'd suggest to do the same.

@Meinersbur I think you may have misunderstood this change - this change is to make hashing insensitive to capacity, not size. Both were already sensitive to size and still are after this change (and I agree that this is the correct behaviour). Please correct me if I've misunderstood your comment.

You are right, I missed it's to make it capacity-independent.

I am going to commit for you...

This revision was automatically updated to reflect the committed changes.