This is an archive of the discontinued LLVM Phabricator instance.

Make BitVector::operator== return false for different-sized vectors
ClosedPublic

Authored by bmoody on Mar 29 2020, 7:00 PM.

Details

Summary

This behaviour is in line with SmallBitVector and other vector-like types.

Diff Detail

Event Timeline

bmoody created this revision.Mar 29 2020, 7:00 PM
bmoody updated this revision to Diff 253477.Mar 29 2020, 8:12 PM

Run git-clang-format

It's helpful to have one patch per issue (instead of three in a single patch). One of the reasons is that in the commit history, it's unambiguous what diff caused which change.

bmoody updated this revision to Diff 253504.Mar 30 2020, 12:18 AM
bmoody retitled this revision from Fix bugs in BitVector and SmallBitVector to Make BitVector::operator== return false for different-sized vectors.
bmoody edited the summary of this revision. (Show Details)

Change this review to be only for the operator== fix

I'm wondering whether it would make more sense to make this an assertion instead. Comparing bit vectors of different size doesn't seem like a well-defined operation. (We do assert if you compare APInts of different sizes, for example).

Disclaimer: I don't have any particularly familiarity with this piece of code.

Meinersbur added a comment.EditedMar 30 2020, 4:44 AM

Since BitVector is supposed to resemble a std::vector<bool>, this is the behavior I'd expect.

APInts can be interpreted as either signed or unsigned, making comparison of different lengths undefined.

LGTM, but please wait a bit for other opinions before committing.

Agree in theory - but given the behavior has been different for quite a while, I think it might be worth a bit of an inventory of uses of BitVector and of BitVector's op== existing behavior - and maybe a rough explanation for why it's not problematic to change it in this way.

I've reviewed all the uses of BitVector::operator==/!= in llvm and clang by renaming the functions and checking all the code that failed to compile. As far as I can tell, all uses should be either unaffected by the change, or will be more correct with the change applied.


llvm/include/llvm/Analysis/TargetLibraryInfo.h:

bool areInlineCompatible(const TargetLibraryInfo &CalleeTLI,
                         bool AllowCallerSuperset) const {
  if (!AllowCallerSuperset)
    return OverrideAsUnavailable == CalleeTLI.OverrideAsUnavailable;
  BitVector B = OverrideAsUnavailable;
  B |= CalleeTLI.OverrideAsUnavailable;
  // We can inline if the union of the caller and callee's nobuiltin
  // attributes is no stricter than the caller's nobuiltin attributes.
  return B == OverrideAsUnavailable;
}

Not affected by the proposed change - all vectors have constant size NumLibFuncs.


llvm/lib/CodeGen/RegisterClassInfo.cpp:

const BitVector &RR = MF->getRegInfo().getReservedRegs();
if (Reserved.size() != RR.size() || RR != Reserved) {
  Update = true;
  Reserved = RR;
}

Not affected by the proposed change - code has an explicit size check.


llvm/lib/Transforms/Coroutines/CoroFrame.cpp:

Changed |= (S.Kills != SavedKills) || (S.Consumes != SavedConsumes);

if (S.Kills != SavedKills) {
  LLVM_DEBUG(dbgs() << "\nblock " << I << " follower " << SI->getName()
                    << "\n");
  LLVM_DEBUG(dump("S.Kills", S.Kills));
  LLVM_DEBUG(dump("SavedKills", SavedKills));
}
if (S.Consumes != SavedConsumes) {
  LLVM_DEBUG(dbgs() << "\nblock " << I << " follower " << SI << "\n");
  LLVM_DEBUG(dump("S.Consume", S.Consumes));
  LLVM_DEBUG(dump("SavedCons", SavedConsumes));
}

Not affected by the proposed change - all the vectors in this code have the same size. They're initialised above, in the same function:

// Initialize every block so that it consumes itself
for (size_t I = 0; I < N; ++I) {
  auto &B = Block[I];
  B.Consumes.resize(N);
  B.Kills.resize(N);
  B.Consumes.set(I);
}

llvm/tools/llvm-exegesis/lib/MCInstrDescView.cpp:

const BitVector *BitVectorCache::getUnique(BitVector &&BV) const {
  for (const auto &Entry : Cache)
    if (*Entry == BV)
      return Entry.get();
  Cache.push_back(std::make_unique<BitVector>());
  auto &Entry = Cache.back();
  Entry->swap(BV);
  return Entry.get();
}

Looks like the proposed change will fix a (probably latent) bug in this code. Currently it seems like this function could return a pointer to a BitVector with a different size than the one that was passed in, which would be surprising.


llvm/include/llvm/ADT/PackedVector.h:

bool operator==(const PackedVector &RHS) const {
  return Bits == RHS.Bits;
}

bool operator!=(const PackedVector &RHS) const {
  return Bits != RHS.Bits;
}

I think this is another latent bug. PackedVector::operator== doesn't appear to be used anywhere.

Thanks for this!

I've reviewed all the uses of BitVector::operator==/!= in llvm and clang by renaming the functions and checking all the code that failed to compile. As far as I can tell, all uses should be either unaffected by the change, or will be more correct with the change applied.


llvm/include/llvm/Analysis/TargetLibraryInfo.h:

bool areInlineCompatible(const TargetLibraryInfo &CalleeTLI,
                         bool AllowCallerSuperset) const {
  if (!AllowCallerSuperset)
    return OverrideAsUnavailable == CalleeTLI.OverrideAsUnavailable;
  BitVector B = OverrideAsUnavailable;
  B |= CalleeTLI.OverrideAsUnavailable;
  // We can inline if the union of the caller and callee's nobuiltin
  // attributes is no stricter than the caller's nobuiltin attributes.
  return B == OverrideAsUnavailable;
}

Not affected by the proposed change - all vectors have constant size NumLibFuncs.


llvm/lib/CodeGen/RegisterClassInfo.cpp:

const BitVector &RR = MF->getRegInfo().getReservedRegs();
if (Reserved.size() != RR.size() || RR != Reserved) {
  Update = true;
  Reserved = RR;
}

Not affected by the proposed change - code has an explicit size check.

Could you migrate this code to use just the operator in this change since the size check will now be redundant?


llvm/lib/Transforms/Coroutines/CoroFrame.cpp:

Changed |= (S.Kills != SavedKills) || (S.Consumes != SavedConsumes);

if (S.Kills != SavedKills) {
  LLVM_DEBUG(dbgs() << "\nblock " << I << " follower " << SI->getName()
                    << "\n");
  LLVM_DEBUG(dump("S.Kills", S.Kills));
  LLVM_DEBUG(dump("SavedKills", SavedKills));
}
if (S.Consumes != SavedConsumes) {
  LLVM_DEBUG(dbgs() << "\nblock " << I << " follower " << SI << "\n");
  LLVM_DEBUG(dump("S.Consume", S.Consumes));
  LLVM_DEBUG(dump("SavedCons", SavedConsumes));
}

Not affected by the proposed change - all the vectors in this code have the same size. They're initialised above, in the same function:

// Initialize every block so that it consumes itself
for (size_t I = 0; I < N; ++I) {
  auto &B = Block[I];
  B.Consumes.resize(N);
  B.Kills.resize(N);
  B.Consumes.set(I);
}

llvm/tools/llvm-exegesis/lib/MCInstrDescView.cpp:

const BitVector *BitVectorCache::getUnique(BitVector &&BV) const {
  for (const auto &Entry : Cache)
    if (*Entry == BV)
      return Entry.get();
  Cache.push_back(std::make_unique<BitVector>());
  auto &Entry = Cache.back();
  Entry->swap(BV);
  return Entry.get();
}

Looks like the proposed change will fix a (probably latent) bug in this code. Currently it seems like this function could return a pointer to a BitVector with a different size than the one that was passed in, which would be surprising.

I think this code is probably fine - all the BitVectors start by being initialized from "emptyRegisters" (pre-initialized with the necessary size) in exegesis::Instruction::create, I think.


llvm/include/llvm/ADT/PackedVector.h:

bool operator==(const PackedVector &RHS) const {
  return Bits == RHS.Bits;
}

bool operator!=(const PackedVector &RHS) const {
  return Bits != RHS.Bits;
}

I think this is another latent bug. PackedVector::operator== doesn't appear to be used anywhere.

Is it tested? Might be worth leaving a FIXME of "hey, this is untested code".

llvm/include/llvm/ADT/BitVector.h
542

Hmm - looking at this more closely (ie: I'm now actually looking at the code rather than just the commentary/discussion) this code makes me a bit uncomfortable - looks pretty intentionally designed to support potentially equal comparison of differently lengthed BitVectors. Is there any history of how/when this was added/potentially used?

Meinersbur added inline comments.Apr 7 2020, 11:51 AM
llvm/include/llvm/ADT/BitVector.h
542

It was introduced in 2007 by Chris Lattner in r42860 with the commit message "make bitvector &= do the right thing if vectors have mismatched length." and no other changes. Before that, it asserted on mismatching lengths.

By its name, it's a vector and a comparison that does something else than checking the same elements in the same order is confusing.

Meinersbur added inline comments.Apr 7 2020, 11:58 AM
llvm/include/llvm/ADT/BitVector.h
542

Sorry, looked at the wrong operator.

The commit was r42889 also by Chris Lattner in 2007. It's commit message is:

make operator== work with non-equal sized bitvectors, as long as the extra bits are all zeros. This allows "010" and "010000" to be treated as equal.

without any other changes or justifications. Before that, it returned false on different sizes.

Note that "010" and "010000" can be very different depending on its interpretation, e.g. the former represents the decimal number 2, the latter is 16.

bmoody added a comment.Apr 7 2020, 4:29 PM

Regarding PackedVector::operator==, this code is covered by the PackedVector unit test but not thoroughly enough to catch the bug.

And I'd be happy to remove the redundant size check in RegisterClassInfo.cpp - would that go in as part of this change or in a follow-up change?

Also note, that the implementation of BitVector's hash value is:

static unsigned getHashValue(const BitVector &V) {
  return DenseMapInfo<std::pair<unsigned, ArrayRef<uintptr_t>>>::getHashValue(
      std::make_pair(V.size(), V.getData()));
}

that is the current implementation may have different hashes for 'equal' BitVectors.

dblaikie accepted this revision.Apr 9 2020, 12:45 PM
dblaikie added a subscriber: lattner.

Also note, that the implementation of BitVector's hash value is:

static unsigned getHashValue(const BitVector &V) {
  return DenseMapInfo<std::pair<unsigned, ArrayRef<uintptr_t>>>::getHashValue(
      std::make_pair(V.size(), V.getData()));
}

that is the current implementation may have different hashes for 'equal' BitVectors.

Good point!

Yeah, combination of factors makes me feel like this is the best direction. (@lattner in case he has some deep memories of this place - but happy to let that be post-commit review given the current state of BitVector and its usage)

This revision is now accepted and ready to land.Apr 9 2020, 12:45 PM

I agree with this patch, we should definitely return non-equal for different length vectors! Thanks!

bmoody added a comment.Apr 9 2020, 4:11 PM

Great! I don't have commit access so please commit this on my behalf :)

This revision was automatically updated to reflect the committed changes.