This is an archive of the discontinued LLVM Phabricator instance.

[WIP] Add InsertionOrderSet, with constant-time insertion and removal.
AbandonedPublic

Authored by efriedma on Jun 22 2018, 12:42 PM.

Details

Reviewers
None
Summary

This is useful in places which need a consistent iteration order like a SetVector, but also need constant-time removal. @efriedma quickly hacked this together as a proof of concept for D48372; there's a lot of room for improvement.

Diff Detail

Repository
rL LLVM

Event Timeline

efriedma created this revision.Jun 22 2018, 12:42 PM

Thinking out loud: I wonder how often people ask for insertion order but only actually care about having a consistent order.

If the primary use-case we care about is "I inserted X, Y, Z, A, and removed Z. I expect to iterate over X, Y, and A in some order that's consistent across reruns regardless of what these hashed to/etc, but I don't actually care that X comes before Y and A," then we can probably be clever to save some space:

void remove(ValueT value) {
  auto MapFind = Map.find(value);
  if (MapFind != Map.end()) {
    size_t Index = MapFind->second;
    Map.erase(MapFind);
    if (Index != size() - 1) {
      swap(Vec[Index], Vec.back());
      Map.find(Vec[Index])->second = I;
    }
    Vec.pop_back();
  }
}

...Which would also let us get rid of the Optional.

I could have used DenseMapInfo::getTombstoneKey() here instead of Optional, I think. But making remove() swap might still be worthwhile; it would give random-access iteration, and more flexibility with the keys (with a different underlying hashtable, we could allow keys which don't have tombstones).

labrinea added inline comments.Jun 25 2018, 3:27 AM
include/llvm/ADT/InsertionOrderSet.h
63

Since the iterator does not allow random access of the underlying vector we might want something like this:

iterator at(size_t idx) {
  return MakeSetIterator(Vec.begin()+idx);
}

Another thing to consider maybe is the complexity of iteration over the vector. Imagine the case where we insert N elements and we erase M with M being almost the size of N. The iteration will be O(N) and not O(N-M).

Imagine the case where we insert N elements and we erase M with M being almost the size of N

This is essentially the issue I noted with the comment: "It doesn't clean out dead entries in the vector when the set is rehashed". I mean, I guess it's not exactly the same thing, but I think we end up rehashing in almost all the scenarios where this would matter, and we could add a shrink_to_fit method for other cases.

labrinea retitled this revision from [WIP] Add InsertionOrderSet, with constant-time insertion and removal. to Add OrderedSet, with constant-time insertion and removal, and random access iteration..Jul 6 2018, 8:41 AM
labrinea edited the summary of this revision. (Show Details)
labrinea retitled this revision from Add OrderedSet, with constant-time insertion and removal, and random access iteration. to [WIP] Add InsertionOrderSet, with constant-time insertion and removal..Jul 6 2018, 8:51 AM
labrinea edited the summary of this revision. (Show Details)
labrinea edited subscribers, added: efriedma; removed: dexonsmith.

I've accidentally edited the revision instead of just adding a new diff. Anyway, I've reset it to the original content and created https://reviews.llvm.org/D49030 with my changes for review.

efriedma abandoned this revision.Jul 6 2018, 10:44 AM

Closing in favor of https://reviews.llvm.org/D49030, which is probably closer to what we want in most cases.