This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Add CoalescingBitVector, implemented using IntervalMap [1/3]
ClosedPublic

Authored by vsk on Feb 21 2020, 12:13 PM.

Details

Summary

Add CoalescingBitVector to ADT. This is part 1 of a 3-part series to
address a compile-time explosion issue in LiveDebugValues.


CoalescingBitVector is a bitvector that, under the hood, relies on an
IntervalMap to coalesce elements into intervals.

CoalescingBitVector efficiently represents sets which predominantly
contain contiguous ranges (e.g. the VarLocSets in LiveDebugValues,
which are very long sequences that look like {1, 2, 3, ...}). OTOH,
CoalescingBitVector isn't good at representing sets with lots of gaps
between elements. The first N coalesced intervals of set bits are stored
in-place (in the initial heap allocation).

Compared to SparseBitVector, CoalescingBitVector offers more predictable
performance for non-sequential find() operations. This provides a
crucial speedup in LiveDebugValues.

Diff Detail

Event Timeline

vsk created this revision.Feb 21 2020, 12:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 21 2020, 12:13 PM
vsk added a project: debug-info.
vsk updated this revision to Diff 245967.Feb 21 2020, 12:47 PM

Improve comments.

Nice, I can think of a few other places where this could prove useful (the register coalescing code for variable locations for starters). Some comments inline, plus

  • |= / set union could do with being covered in the unittests too,
  • Maybe some allocations could be avoided by using things like IntervalMap iterator setStartUnchecked and similar, but this looks good regardless.

It looks like setting already-set bits and resetting not-set bits isn't a supported operation. set indicates not to do that in the doc comment, reset even goes as far as to assert-fail if the wrong interval is looked up. These are two things that, without any additional context, I would expect something acting like a bitvector to support, it's quite a common operation on flags integers for example, and these things are supported by BitVector and SparseBitVector. Are there technical reasons for these limits, or is it to discourage CoalescingBitVector from being used in scenarios where it won't be useful, or otherwise?

I see that the implementation of operator|= follows the same code path as plain set, i.e. insert(a, b), making it possible that operator|= will set already-set bits. Is that a supported operation for operator|= but not for set, or am I missing something?

llvm/include/llvm/ADT/CoalescingBitVector.h
30

Could we get a

static_assert(std::is_integral<EltT>::value, "Key must be an integer type.");

Plus documentation of the EltT type parameter -- if I understand correctly, it's the storage type for bitvector elements, that's used as the domain type for the interval map?

55–59

I'm not familiar with LLVMs approach to allocators, but is it alright that the RHS allocator isn't carried over into this? This happens for the rvalue assignment operator below.

81

Nit, can this be a range-based loop?

231–232

Just so that I understand: advanceTo is aiming for a small set of use cases (i.e.: just find), and these assertions are to discourage anyone from using advanceTo for other purposes?

Is there any kind of "These are the ADT types in LLVM and when to use them" .rst document that this should be added to?

llvm/include/llvm/ADT/CoalescingBitVector.h
29

Given that this is a bitvector, I'm kind of wondering what EltT is supposed to be (bool?) Can you add that to the class comment?

38

Ah.. it's the *Index* type? Perhaps that's a better name?

79

Is this consistent with how the other bitvector types are naming this function? Then that's fine. In DenseSet, count(Elt) is a contains(Elt) function and I don't know if we want to be consistent with that.

82

So empty ranges are disallowed? Should we assert that? Or is that too slow?

142

document \return

297

Really? If anything should this be part of the dump method instead?

Is there any kind of "These are the ADT types in LLVM and when to use them" .rst document that this should be added to?

http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task

vsk updated this revision to Diff 246287.Feb 24 2020, 1:09 PM
vsk marked 15 inline comments as done.

Address review feedback:

  • Describe the new container in the programmer's manual.
  • Relax reset. This can now be used to reset an already-unset bit.
  • Relax operator|=. This can now be used to perform set-union when the sets overlap.
  • Add suggested tests and address various other bits of feedback.
vsk added inline comments.Feb 24 2020, 1:11 PM
llvm/include/llvm/ADT/CoalescingBitVector.h
55–59

initFromRHS does a copy into a pre-existing IntervalMap, which has already been initialized with this->Alloc. I don't think changing the Alloc would be correct, as the lifetime of RHS.Alloc may end before the lifetime of this->Alloc.

79

Yeah, SparseBitVector::count() also counts the number of set bits.

81

I don't think so, as we'd be iterating over the unused mapped values in that case.

82

There isn't a way to construct an empty interval, AFAICT.

142

It looks like the return value from intersectWithComplement is generally ignored, so I've simply removed it.

231–232

Yes, exactly. Actually the OffsetIntoMapIterator == kIteratorAtTheEndOffset check is dead code, let me remove that.

297

Yeah, while debugging this p BV.dump() would result in:

(lldb) p BV.dump()
(lldb)1]}

or something. Incidentally, the same thing happens when doing p SM.dump(SourceLoc) in clang. I filed llvm.org/PR45011.

vsk added a comment.Feb 24 2020, 1:19 PM

@jmorse set doesn't support setting already-set bits in order to discourage this sort of usage of the container. The reason for the behavior difference from BitVector/SparseBitVector is that the cost of a second IntervalMap find() operation is relatively high (compared to, say, indexing into an array). I've tried to document this and provide alternatives, like operator|= and the newly-added test_and_set method.

aprantl added inline comments.Feb 24 2020, 4:33 PM
llvm/docs/ProgrammersManual.rst
2347 ↗(On Diff #246287)

Even with the next paragraph I find anything but "three" confusing here given that the headline also mentions three.

llvm/include/llvm/ADT/CoalescingBitVector.h
110

Why not move the implementation of initFromRHS here and call set() everywhere where initFromRHS is called now? It feels weird to refer to an RHS in something that isn't an operator.

vsk updated this revision to Diff 246344.Feb 24 2020, 4:54 PM

Address Adrian's comments.

Looking good; I now see that IntervalMap will fire an assertion if an overlapping insert is made, I'd previously thought it would just be inefficient, this explains why CoalescingBitVector can't do it.

llvm/include/llvm/ADT/CoalescingBitVector.h
81

I wonder if I'm missing something here -- what are the unused mapped values? Isn't everything in the IntervalMap mapped to a value of zero, and so any interval that's present must correspond to an interval of set bits?

186

As this only considers one overlap at a time, won't this run into difficulties with two overlaps in one RHS interval? Such as:

RHS:       ---------------------------------
LHS:           --------         ---------

If I flip the final Union test in the unit tests:

unionIs({2, 3, 4, 6, 7}, {1, 2, 3, 4, 5, 6, 7, 8}, {1, 2, 3, 4, 5, 6, 7, 8});

It coughs up an "Overlapping insert" assertion failure for me. Plus, if the second overlap is skipped, it would block any other overlap being considered, because we would never see the RHS interval it overlaps with and make progress.

(Maybe something less powerful/expressive than set union would be sufficient, if this proves painful to implement).

aprantl added inline comments.Feb 25 2020, 9:14 AM
llvm/include/llvm/ADT/CoalescingBitVector.h
101

What happens if I set an already-set bit? Will the data structure get inconsistent, less efficient, crash? Should there be an assert(!test(Index)) here?

aprantl accepted this revision.Feb 25 2020, 9:15 AM

LGTM with all outstanding comments addressed.

This revision is now accepted and ready to land.Feb 25 2020, 9:15 AM
vsk updated this revision to Diff 246598.Feb 25 2020, 4:17 PM
vsk marked 4 inline comments as done.

Address latest round of review feedback:

  • Fix a bug in operator|=, see inline comment for more discussion.
  • Upgrade testing for set union and set intersection: each test case is now tested two ways ("LHS <op> RHS = Expected" and "RHS <op> LHS = Expected"), as the ops should be symmetric.
llvm/include/llvm/ADT/CoalescingBitVector.h
81

We're using an IntervalMap<IndexT, char>, so the unused mapped values are those chars. Every interval in the map does correspond to a non-empty range of set bits, yes, but writing for (auto It : *Intervals.get()) doesn't work because:

llvm/include/llvm/ADT/CoalescingBitVector.h:95:21: error: member reference base type 'char' is not a structure or union
      Bits += 1 + It.stop() - It.start();
101

IntervalMap asserts if you try to make an overlapping insert. I've added the suggested assert.

186

You're right, we'd need to consider multiple overlaps per interval borrowed from RHS. I've gone ahead and just fixed this for posterity (I introduced a getNonOverlappingParts helper to do this), but was/remain very tempted to just delete the operator|= implementation.

As LiveDebugValues doesn't require a general set union operation, it's tempting to just punt on this, but it feels like that would be cheating others who may be interested in evaluating the new container.

aprantl added inline comments.Feb 27 2020, 8:27 AM
llvm/include/llvm/ADT/CoalescingBitVector.h
186

but it feels like that would be cheating others who may be interested in evaluating the new container.

I think that when we're adding a new data structure to ADT it should be generally useful even if some of the functionality is only use in unit tests.

This revision was automatically updated to reflect the committed changes.