This is an archive of the discontinued LLVM Phabricator instance.


Differential D49030

Add OrderedSet, with constant-time insertion and removal, and random access iteration.
Abandoned · Public

Authored by labrinea on Jul 6 2018, 8:44 AM.


Details

Reviewers

llvm-commits
george.burgess.iv
efriedma
chandlerc

Summary

Implements a hash table with constant-time insertion and removal, and iteration in an order that is consistent across runs. Two copies of each element are kept: one is stored in a std::vector for random-access iteration, and the other is used as a key in a DenseMap whose values are indexes into the vector.
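The following is a minimal standalone sketch of the layout described above. It is not the patch itself. It substitutes std::unordered_map for llvm::DenseMap so that it builds without LLVM. The class name OrderedSetSketch and the invariantHolds() helper are hypothetical. The invariant it checks is the one that the two copies exist to maintain: for every position I, the map sends Vec[I] back to I.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch; std::unordered_map stands in for llvm::DenseMap.
template <class T> class OrderedSetSketch {
  std::unordered_map<T, std::size_t> Index; // value -> position in Vec
  std::vector<T> Vec;                       // values in iteration order

public:
  // Returns true if V was not already present and has been appended.
  bool insert(const T &V) {
    if (!Index.emplace(V, Vec.size()).second)
      return false;
    Vec.push_back(V);
    return true;
  }

  bool contains(const T &V) const { return Index.count(V) != 0; }

  // Swap-and-pop: move the last element into the erased slot,
  // update that element's index, then shrink the vector by one.
  bool erase(const T &V) {
    auto It = Index.find(V);
    if (It == Index.end())
      return false;
    std::size_t Pos = It->second;
    assert(Pos < Vec.size() && "index map out of sync with vector");
    Index.erase(It);
    if (Pos != Vec.size() - 1) {
      Vec[Pos] = std::move(Vec.back());
      Index[Vec[Pos]] = Pos;
    }
    Vec.pop_back();
    return true;
  }

  // Checks that every element maps back to its own position.
  bool invariantHolds() const {
    if (Index.size() != Vec.size())
      return false;
    for (std::size_t I = 0; I < Vec.size(); ++I) {
      auto It = Index.find(Vec[I]);
      if (It == Index.end() || It->second != I)
        return false;
    }
    return true;
  }

  const std::vector<T> &values() const { return Vec; }
};
```

Suppose the set is built from 0 through 4 and then 1 is erased. The 4 moves into the vacated slot, so iteration afterwards visits 0, 4, 2, 3.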


Event Timeline

labrinea created this revision. · Jul 6 2018, 8:44 AM

Herald added subscribers: dexonsmith, mgorny. · Jul 6 2018, 8:44 AM

labrinea added a child revision: D48372: [MemorySSAUpdater] Remove deleted trivial Phis from active workset. · Jul 6 2018, 8:58 AM

mgrang added a subscriber: mgrang. · Jul 6 2018, 11:02 AM

Some nits.

include/llvm/ADT/OrderedSet.h
  Line 35: happend -> happened
  Line 40: Space needed around arithmetic operator.
  Line 42: Space needed around arithmetic operator.
  Line 71: Space needed around arithmetic operator.

The extra copy of each value is a bit unfortunate, but it would be a lot of work to fix (you'd basically have to reimplement DenseMap), and it's not a big deal for pointer-sized keys, so it should be fine to leave as-is for now.

I guess this is fine, unless someone else has a better suggestion.

We already have SetVector which is widely used for these patterns. If we need both, we need a clear explanation of the difference and why we need both (IE, why users of one can't use the other).

This revision now requires changes to proceed. · Jul 6 2018, 7:12 PM

In D49030#1155000, @chandlerc wrote:

> We already have SetVector which is widely used for these patterns. If we need both, we need a clear explanation of the difference and why we need both (IE, why users of one can't use the other).

The remove operation on a SetVector can cost linear time and this might be an issue in some cases. Please have a look at the review comments of https://reviews.llvm.org/D48372
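This makes the cost difference concrete. Removing an element from a vector while keeping the remaining elements in order, as SetVector must, means finding the element and then shifting every later element down by one slot. Swap-and-pop moves at most one element; this patch uses it once the DenseMap has supplied the position. The sketch below counts element moves for both approaches on a plain std::vector. eraseShifting and eraseSwapPop are hypothetical helpers, not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Order-preserving removal: a linear search for V, then vector::erase
// shifts every later element down by one. Returns the number of elements
// that had to move (0 if V was not found).
std::size_t eraseShifting(std::vector<int> &Vec, int V) {
  auto It = std::find(Vec.begin(), Vec.end(), V);
  if (It == Vec.end())
    return 0;
  std::size_t Moved = static_cast<std::size_t>(Vec.end() - It) - 1;
  Vec.erase(It);
  return Moved;
}

// Swap-and-pop removal at a known position (which an index map provides
// in constant time): the last element is moved into the hole, so at most
// one element moves, but the order of the remaining elements changes.
std::size_t eraseSwapPop(std::vector<int> &Vec, std::size_t Pos) {
  assert(Pos < Vec.size() && "position out of range");
  std::size_t Moved = 0;
  if (Pos != Vec.size() - 1) {
    Vec[Pos] = std::move(Vec.back());
    Moved = 1;
  }
  Vec.pop_back();
  return Moved;
}
```

Consider removing the first of 1000 elements. The first helper moves 999 elements and the second moves one. The first helper also pays for the linear search.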

In D49030#1155065, @labrinea wrote:

> The remove operation on a SetVector can cost linear time and this might be an issue in some cases. Please have a look at the review comments of https://reviews.llvm.org/D48372

Assuming we end up with two ADTs, we'll need: better names, clear header documentation of what's different, and an explanation in https://llvm.org/docs/ProgrammersManual.html.

Personally, I'd be in favour of changing MapVector to just do this, since it already has an index. I'd also be in favour of adding an index to SetVector to match. There'd be some work to do in auditing/updating users, but I doubt there are many calls to erase() and I'm skeptical that anyone relies on the current order for something other than determinism.

javed.absar added a subscriber: javed.absar. · Jul 9 2018, 3:29 AM

In D49030#1155072, @dexonsmith wrote:

> Personally, I'd be in favour of changing MapVector to just do this, since it already has an index. I'd also be in favour of adding an index to SetVector to match. There'd be some work to do in auditing/updating users, but I doubt there are many calls to erase() and I'm skeptical that anyone relies on the current order for something other than determinism.

I think we should not change MapVector. Its underlying data structure is a vector with <key,value> pairs but we only need to store a single value. Moreover, both MapVector and SetVector provide insertion order iteration and their users might rely on that. I am not sure we should change that. SmallSet might be a good alternative for lower insertion/deletion complexity, but at the moment it does not provide iteration over the underlying data structures. It uses a SmallVector for small data sets and expands it to an std::set if it grows too much. Is there a way to abstract those underlying iterators? If not I think we should keep this new ADT and rename/document it properly.
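The ordering concern can be reproduced with the same operations as the iterate unit test in the diff: insert 0 through 9, then erase 0 through 4. The minimal sketch below uses plain std::vector operations rather than the real SetVector or OrderedSet. The eraseAll helper and its SwapAndPop flag are hypothetical. Order-preserving removal leaves 5, 6, 7, 8, 9, while swap-and-pop leaves 9, 8, 7, 6, 5. Both results are deterministic for a given sequence of operations, but only the first still matches insertion order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Erases each value in ToErase from Vec and returns the result.
// SwapAndPop == false: vector::erase shifts the tail down (order-preserving).
// SwapAndPop == true: the last element is swapped into the hole and popped,
// as OrderedSet::erase does in this patch. The linear std::find stands in
// for the constant-time index lookup the real container would do.
std::vector<int> eraseAll(std::vector<int> Vec, const std::vector<int> &ToErase,
                          bool SwapAndPop) {
  for (int V : ToErase) {
    auto It = std::find(Vec.begin(), Vec.end(), V);
    if (It == Vec.end())
      continue;
    if (SwapAndPop) {
      std::iter_swap(It, Vec.end() - 1);
      Vec.pop_back();
    } else {
      Vec.erase(It);
    }
  }
  return Vec;
}
```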

In D49030#1157060, @labrinea wrote:

> I think we should not change MapVector. Its underlying data structure is a vector with <key,value> pairs but we only need to store a single value. Moreover, both MapVector and SetVector provide insertion order iteration and their users might rely on that. I am not sure we should change that. SmallSet might be a good alternative for lower insertion/deletion complexity, but at the moment it does not provide iteration over the underlying data structures. It uses a SmallVector for small data sets and expands it to an std::set if it grows too much. Is there a way to abstract those underlying iterators? If not I think we should keep this new ADT and rename/document it properly.

I like @dexonsmith's suggestion. @labrinea, you don't have to change MapVector, but I think it's worth changing SetVector to maintain the vector indices in its map and to make erase O(1). Refactoring them to use the same implementation and implementing this optimization for MapVector can be done later.

labrinea abandoned this revision. · May 24 2023, 4:55 AM

Herald added a project: Restricted Project. · May 24 2023, 4:55 AM

Revision Contents

Path                                Size
include/llvm/ADT/OrderedSet.h       85 lines
unittests/ADT/CMakeLists.txt        1 line
unittests/ADT/OrderedSetTest.cpp    95 lines

Diff 154417

include/llvm/ADT/OrderedSet.h

This file was added.

				//===- OrderedSet.h - Hash table --------------------------------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements a hash table with constant-time insertion and removal,
				// and iteration in some consistent order across runs. Two copies of each
				// element are kept, one is stored in a std::vector for random access iteration,
				// and one is used as a key in a DenseMap, which contains indexes to the vector.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_ADT_ORDEREDSET_H
				#define LLVM_ADT_ORDEREDSET_H

				#include "llvm/ADT/DenseMap.h"
				#include <vector>

				namespace llvm {

				template <class ValueT>
				class OrderedSet {
				  DenseMap<ValueT, size_t> Map;
				  std::vector<ValueT> Vec;

				public:
				  using iterator = typename std::vector<ValueT>::iterator;

				  // Insert an element in the OrderedSet unless it's already
				  // present. Return a pair indicating its position in the vector
				  // and the whether the insertion happend or not.
				  std::pair<iterator, bool> insert(ValueT value) {
				    auto SetIns = Map.insert(std::make_pair(value, Vec.size()));
				    if (SetIns.second) {
				      Vec.push_back(value);
				      return { Vec.end()-1, true };
				    }
				    return { Vec.begin()+SetIns.first->second, false };
				  }

				  // Random access iterators.
				  iterator begin() { return Vec.begin(); }
				  iterator end() { return Vec.end(); }

				  // Clear all the entries from the OrderedSet.
				  void clear() {
				    Vec.clear();
				    Map.clear();
				  }

				  // Return the size of the underlying container.
				  size_t size() const { return Vec.size(); }

				  // Is an element present in the OrderedSet?
				  size_t count(ValueT value) const {
				    return Map.find(value) != Map.end() ? 1 : 0;
				  }

				  // Erase an element from the OrderedSet if it's present.
				  // Replace it with the last element of the vector and
				  // update the DenseMap for that entry.
				  bool erase(ValueT value) {
				    auto MapFind = Map.find(value);
				    if (MapFind != Map.end()) {
				      size_t Index = MapFind->second;
				      Map.erase(MapFind);
				      if (Index != Vec.size()-1) {
				        std::swap(Vec[Index], Vec.back());
				        Map.find(Vec[Index])->second = Index;
				      }
				      Vec.pop_back();
				      return true;
				    }
				    return false;
				  }

				};

				} // end namespace llvm

				#endif // LLVM_ADT_ORDEREDSET_H

unittests/ADT/CMakeLists.txt

[30 unchanged lines not shown; enclosing context: add_llvm_unittest(ADTTests]
  IntEqClassesTest.cpp
  IntervalMapTest.cpp
  IntrusiveRefCntPtrTest.cpp
  IteratorTest.cpp
  MakeUniqueTest.cpp
  MappedIteratorTest.cpp
  MapVectorTest.cpp
  OptionalTest.cpp
+ OrderedSetTest.cpp
  PackedVectorTest.cpp
  PointerEmbeddedIntTest.cpp
  PointerIntPairTest.cpp
  PointerSumTypeTest.cpp
  PointerUnionTest.cpp
  PostOrderIteratorTest.cpp
  PriorityWorklistTest.cpp
  RangeAdapterTest.cpp
[25 unchanged lines not shown]

unittests/ADT/OrderedSetTest.cpp

This file was added.

				//===- unittest/ADT/OrderedSetTest.cpp - OrderedSet unit tests --- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/OrderedSet.h"
				#include "gtest/gtest.h"

				using namespace llvm;

				TEST(OrderedSetTest, insert) {
				  OrderedSet<unsigned> OS;
				  std::pair<OrderedSet<unsigned>::iterator, bool> R;

				  // The element was not present upon insertion.
				  for (unsigned i=0; i<10; ++i) {
				    EXPECT_EQ(OS.count(i), 0u);
				    R = OS.insert(i);
				    ASSERT_EQ(R.first, OS.end()-1);
				    EXPECT_EQ(*R.first, i);
				    EXPECT_TRUE(R.second);
				    ASSERT_EQ(OS.size(), i+1);
				  }

				  // The element was already present upon insertion.
				  for (unsigned i=0; i<10; ++i) {
				    EXPECT_EQ(OS.count(i), 1u);
				    R = OS.insert(i);
				    ASSERT_EQ(R.first, OS.begin()+i);
				    EXPECT_EQ(*R.first, i);
				    EXPECT_FALSE(R.second);
				    ASSERT_EQ(OS.size(), 10u);
				  }
				}

				TEST(OrderedSetTest, erase) {
				  OrderedSet<unsigned> OS;

				  for (unsigned i=0; i<10; ++i)
				    OS.insert(i);
				  ASSERT_EQ(OS.size(), 10u);

				  // The element was present upon deletion.
				  for (unsigned i=0; i<10; ++i) {
				    EXPECT_EQ(OS.count(i), 1u);
				    EXPECT_TRUE(OS.erase(i));
				    ASSERT_EQ(OS.size(), (10-i)-1);
				  }

				  // The element was not present upon deletion.
				  for (unsigned i=0; i<10; ++i) {
				    EXPECT_EQ(OS.count(i), 0u);
				    EXPECT_FALSE(OS.erase(i));
				    ASSERT_EQ(OS.size(), 0u);
				  }
				}

				TEST(OrderedSetTest, iterate) {
				  OrderedSet<unsigned> OS;

				  // Insert {0,1,2,3,4,5,6,7,8,9}
				  for (unsigned i=0; i<10; ++i)
				    OS.insert(i);
				  ASSERT_EQ(OS.size(), 10u);

				  // The iteration order is the same as the insertion order
				  // as long as no removals have occurred.
				  unsigned count=0;
				  for (auto I : make_range(OS.begin(), OS.end())) {
				    ASSERT_EQ(I, count);
				    ++count;
				  }

				  // Removing elements from the OrderedSet can
				  // reposition other elements.
				  for (unsigned i=0; i<5; ++i)
				    EXPECT_TRUE(OS.erase(i));
				  ASSERT_EQ(OS.size(), 5u);

				  // The iteration order is now {9,8,7,6,5}
				  count=9;
				  for (auto I : make_range(OS.begin(), OS.end())) {
				    ASSERT_EQ(I, count);
				    --count;
				  }

				  // Erase everything.
				  OS.clear();
				  ASSERT_EQ(OS.size(), 0u);
				  ASSERT_EQ(std::distance(OS.begin(), OS.end()), 0u);
				}