This is an archive of the discontinued LLVM Phabricator instance.

Differential D41483

[clangd] Index symbols share storage within a slab.
ClosedPublic

Authored by sammccall on Dec 21 2017, 2:32 AM.

Details

Reviewers

hokein

Commits

rG6c0d0f5775f5: [clangd] Index symbols share storage within a slab.
rL321272: [clangd] Index symbols share storage within a slab.
rCTE321272: [clangd] Index symbols share storage within a slab.

Summary

Symbols are not self-contained - it's only safe to hand them out if you
guarantee the lifetime of the underlying data.
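
To illustrate the lifetime point, here is a minimal sketch in standard C++ rather than the LLVM types used in the patch. `MiniSymbol`, `StringStore` and `makeSymbol` are hypothetical names invented for this example; `std::string_view` stands in for `llvm::StringRef` and `std::unordered_set<std::string>` stands in for the slab's string table. The symbol's fields stay valid only because the store that owns the characters outlives them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical stand-in for Symbol: the fields are views, not owners.
struct MiniSymbol {
  std::string_view Name;
  std::string_view Scope;
};

// Hypothetical stand-in for the slab's string storage.
class StringStore {
public:
  // Returns a view of a copy of S that is owned by this store.
  // unordered_set is node-based, so the view survives later insertions.
  std::string_view intern(std::string_view S) {
    if (S.empty())
      return {};
    return *Strings.emplace(S).first;
  }

  std::size_t size() const { return Strings.size(); }

private:
  std::unordered_set<std::string> Strings;
};

// Builds a symbol whose fields point into Store instead of the arguments.
MiniSymbol makeSymbol(StringStore &Store, const std::string &Name,
                      const std::string &Scope) {
  return MiniSymbol{Store.intern(Name), Store.intern(Scope)};
}
```

A symbol built this way can outlive the temporary strings it was created from, but not the store. Two symbols with equal names end up pointing at the same bytes.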

Memory usage tests:
I loaded all LLVM project symbols into a single slab.
Using the old implementation, this was 55MB, plus some malloc overhead for out-of-line string allocations that I couldn't easily measure.
Using the new implementation, this was 27MB - we saved half.
(I switched SymbolSlab from DenseMap to vector for both halves of this test to make measurement easier - this is a change @ilya-biryukov suggested would make sense to do for real in a followup patch)
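
As a rough illustration of where such a saving can come from (not a reproduction of the measurement above), the sketch below compares the character bytes needed when every symbol owns private copies of its strings with the bytes needed when each distinct string is stored once. `Row`, `ownedBytes` and `sharedBytes` are hypothetical helpers, and the sketch ignores container and allocator overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical flattened view of the string fields of one symbol.
struct Row {
  std::string Name;
  std::string Scope;
  std::string FilePath;
};

// Character bytes if every symbol keeps its own copy of each string.
std::size_t ownedBytes(const std::vector<Row> &Rows) {
  std::size_t N = 0;
  for (const Row &R : Rows)
    N += R.Name.size() + R.Scope.size() + R.FilePath.size();
  return N;
}

// Character bytes if each distinct string is stored exactly once.
std::size_t sharedBytes(const std::vector<Row> &Rows) {
  std::set<std::string> Unique;
  for (const Row &R : Rows) {
    Unique.insert(R.Name);
    Unique.insert(R.Scope);
    Unique.insert(R.FilePath);
  }
  std::size_t N = 0;
  for (const std::string &S : Unique)
    N += S.size();
  return N;
}
```

Symbols declared in the same file and namespace repeat the same `Scope` and `FilePath`, so the shared count grows much more slowly than the owned count as symbols are added.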

Diff Detail

Repository: rCTE Clang Tools Extra

Event Timeline

sammccall created this revision. Dec 21 2017, 2:32 AM

Herald added subscribers: cfe-commits, ilya-biryukov, klimek. Dec 21 2017, 2:32 AM

Don't intern unless the symbol was actually inserted.
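
A sketch of that ordering, using standard containers instead of `llvm::DenseMap` and `llvm::StringSet`: try to insert the symbol first, and only copy its strings into the table if the insertion succeeded. `Sym` and `MiniSlab` are hypothetical names and the integer ID is a simplification of `SymbolID`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <unordered_set>

// Hypothetical symbol with a non-owning name.
struct Sym {
  int ID;
  std::string_view Name;
};

class MiniSlab {
public:
  // Returns true if S was newly inserted, false if its ID already existed.
  bool insert(const Sym &S) {
    auto ItInserted = Symbols.try_emplace(S.ID, S);
    if (!ItInserted.second)
      return false; // Duplicate ID: leave the string table untouched.
    // Newly inserted: repoint the stored copy at slab-owned characters.
    Sym &Stored = ItInserted.first->second;
    Stored.Name = intern(Stored.Name);
    return true;
  }

  const Sym *find(int ID) const {
    auto It = Symbols.find(ID);
    return It == Symbols.end() ? nullptr : &It->second;
  }

  std::size_t numStrings() const { return Strings.size(); }

private:
  std::string_view intern(std::string_view S) {
    if (S.empty())
      return {};
    return *Strings.emplace(S).first;
  }

  std::unordered_set<std::string> Strings;
  std::unordered_map<int, Sym> Symbols;
};
```

Interning before the insertion check would leave strings in the table that no stored symbol refers to whenever a duplicate ID is rejected.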

ilya-biryukov added inline comments. Dec 21 2017, 6:03 AM

clangd/index/Index.h
134	A comment on why we use `BumpPtrAllocator` here might be useful. I.e., it uses more memory than malloc, but we're getting better data locality. (I hope that I got its intention right)

sammccall edited the summary of this revision. Dec 21 2017, 6:39 AM

Nice, LGTM.

clangd/index/Index.h
106–107	Do you want to remove this `FIXME` now or later?
126	nit: worth a comment here.

This revision is now accepted and ready to land. Dec 21 2017, 6:53 AM

sammccall marked an inline comment as done. Dec 21 2017, 6:54 AM

sammccall added inline comments.

clangd/index/Index.h
134	It's strictly better than malloc for us I think:
- no malloc bookkeeping/padding memory overhead
- we make lots of tiny allocations, they're much cheaper in CPU
- the locality you mentioned
There's no extra memory usage: stringset doesn't use the allocator to allocate the hashtable, just the nodes.
Added a comment.
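
The points in this reply can be pictured with a toy bump allocator. This is a simplified sketch and not LLVM's `BumpPtrAllocator`; `MiniArena`, `save` and `numChunks` are hypothetical names. Each allocation is a pointer increment inside a large chunk, so consecutive strings sit next to each other with no per-allocation header between them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <vector>

// Hypothetical, simplified arena for character data only.
class MiniArena {
public:
  explicit MiniArena(std::size_t ChunkSize = 4096) : ChunkSize(ChunkSize) {}

  // Copies S into the arena and returns a view of the copy.
  std::string_view save(std::string_view S) {
    if (S.empty())
      return {};
    char *Dst = allocate(S.size());
    std::memcpy(Dst, S.data(), S.size());
    return std::string_view(Dst, S.size());
  }

  std::size_t numChunks() const { return Chunks.size(); }

private:
  // Bumps a pointer within the current chunk; starts a new chunk when full.
  char *allocate(std::size_t N) {
    if (N > Capacity - Used) {
      Capacity = N > ChunkSize ? N : ChunkSize;
      Chunks.push_back(std::make_unique<char[]>(Capacity));
      Used = 0;
    }
    char *P = Chunks.back().get() + Used;
    Used += N;
    return P;
  }

  std::size_t ChunkSize;
  std::size_t Capacity = 0; // Size of the current chunk.
  std::size_t Used = 0;     // Bytes handed out from the current chunk.
  std::vector<std::unique_ptr<char[]>> Chunks;
};
```

Nothing is freed individually: all chunks are released together when the arena is destroyed, which matches a slab whose strings live exactly as long as the slab.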

Comment changes requested in review.

Closed by commit rCTE321272: [clangd] Index symbols share storage within a slab. (authored by sammccall). Dec 21 2017, 6:59 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
clangd/index/Index.h	29 lines
clangd/index/Index.cpp	12 lines
unittests/clangd/FileIndexTests.cpp	3 lines
unittests/clangd/IndexTests.cpp	3 lines
unittests/clangd/SymbolCollectorTests.cpp	2 lines

Diff 127879

clangd/index/Index.h

	Show All 9 Lines
	#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_INDEX_H			#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_INDEX_H
	#define LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_INDEX_H			#define LLVM_CLANG_TOOLS_EXTRA_CLANGD_INDEX_INDEX_H

	#include "../Context.h"			#include "../Context.h"
	#include "clang/Index/IndexSymbol.h"			#include "clang/Index/IndexSymbol.h"
	#include "llvm/ADT/DenseMap.h"			#include "llvm/ADT/DenseMap.h"
	#include "llvm/ADT/Hashing.h"			#include "llvm/ADT/Hashing.h"
	#include "llvm/ADT/StringExtras.h"			#include "llvm/ADT/StringExtras.h"
				#include "llvm/ADT/StringSet.h"
	#include <array>			#include <array>
	#include <string>			#include <string>

	namespace clang {			namespace clang {
	namespace clangd {			namespace clangd {

	struct SymbolLocation {			struct SymbolLocation {
	// The absolute path of the source file where a symbol occurs.			// The absolute path of the source file where a symbol occurs.
	std::string FilePath;			llvm::StringRef FilePath;
	// The 0-based offset to the first character of the symbol from the beginning			// The 0-based offset to the first character of the symbol from the beginning
	// of the source file.			// of the source file.
	unsigned StartOffset;			unsigned StartOffset;
	// The 0-based offset to the last character of the symbol from the beginning			// The 0-based offset to the last character of the symbol from the beginning
	// of the source file.			// of the source file.
	unsigned EndOffset;			unsigned EndOffset;
	};			};

	Show All 31 Lines

	// Construct SymbolID from a hex string.			// Construct SymbolID from a hex string.
	// The HexStr is required to be a 40-bytes hex string, which is encoded from the			// The HexStr is required to be a 40-bytes hex string, which is encoded from the
	// "<<" operator.			// "<<" operator.
	void operator>>(llvm::StringRef HexStr, SymbolID &ID);			void operator>>(llvm::StringRef HexStr, SymbolID &ID);

	// The class presents a C++ symbol, e.g. class, function.			// The class presents a C++ symbol, e.g. class, function.
	//			//
	// FIXME: instead of having own copy fields for each symbol, we can share			// WARNING: Symbols do not own much of their underlying data - typically strings
	// storage from SymbolSlab.			// are owned by a SymbolSlab. They should be treated as non-owning references.
				// Copies are shallow.
				// When adding new unowned data fields to Symbol, remember to update
				// SymbolSlab::insert to copy them to the slab's storage.
	struct Symbol {			struct Symbol {
	// The ID of the symbol.			// The ID of the symbol.
	SymbolID ID;			SymbolID ID;
	// The unqualified name of the symbol, e.g. "bar" (for "n1::n2::bar").			// The unqualified name of the symbol, e.g. "bar" (for "n1::n2::bar").
	std::string Name;			llvm::StringRef Name;
	// The scope (e.g. namespace) of the symbol, e.g. "n1::n2" (for			// The scope (e.g. namespace) of the symbol, e.g. "n1::n2" (for
	// "n1::n2::bar").			// "n1::n2::bar").
	std::string Scope;			llvm::StringRef Scope;
	// The symbol information, like symbol kind.			// The symbol information, like symbol kind.
	index::SymbolInfo SymInfo;			index::SymbolInfo SymInfo;
	// The location of the canonical declaration of the symbol.			// The location of the canonical declaration of the symbol.
	//			//
	// A C++ symbol could have multiple declarations and one definition (e.g.			// A C++ symbol could have multiple declarations and one definition (e.g.
	// a function is declared in ".h" file, and is defined in ".cc" file).			// a function is declared in ".h" file, and is defined in ".cc" file).
	// * For classes, the canonical declaration is usually definition.			// * For classes, the canonical declaration is usually definition.
	// * For non-inline functions, the canonical declaration is a declaration			// * For non-inline functions, the canonical declaration is a declaration
	// (not a definition), which is usually declared in ".h" file.			// (not a definition), which is usually declared in ".h" file.
	SymbolLocation CanonicalDeclaration;			SymbolLocation CanonicalDeclaration;

	// FIXME: add definition location of the symbol.			// FIXME: add definition location of the symbol.
	// FIXME: add all occurrences support.			// FIXME: add all occurrences support.
	// FIXME: add extra fields for index scoring signals.			// FIXME: add extra fields for index scoring signals.
	// FIXME: add code completion information.			// FIXME: add code completion information.
	};			};

	// A symbol container that stores a set of symbols. The container will maintain			// A symbol container that stores a set of symbols. The container will maintain
	// the lifetime of the symbols.			// the lifetime of the symbols.
	//
	// FIXME: Use a space-efficient implementation, a lot of Symbol fields could
	// share the same storage.
	class SymbolSlab {			class SymbolSlab {
	public:			public:
	using const_iterator = llvm::DenseMap<SymbolID, Symbol>::const_iterator;			using const_iterator = llvm::DenseMap<SymbolID, Symbol>::const_iterator;

	SymbolSlab() = default;			SymbolSlab() = default;

	const_iterator begin() const;			const_iterator begin() const;
	const_iterator end() const;			const_iterator end() const;
	const_iterator find(const SymbolID &SymID) const;			const_iterator find(const SymbolID &SymID) const;

	// Once called, no more symbols would be added to the SymbolSlab. This			// Once called, no more symbols would be added to the SymbolSlab. This
	// operation is irreversible.			// operation is irreversible.
	void freeze();			void freeze();

	void insert(Symbol S);			// Adds the symbol to this slab.
				// This is a deep copy: underlying strings will be owned by the slab.
				void insert(const Symbol& S);

	private:			private:
				// Replaces S with a reference to the same string, owned by this slab.
				void intern(llvm::StringRef &S) {
				S = S.empty() ? llvm::StringRef() : Strings.insert(S).first->getKey();
				}

	bool Frozen = false;			bool Frozen = false;

				// Intern table for strings. Not StringPool as we don't refcount, just insert.
				// We use BumpPtrAllocator to avoid lots of tiny allocations for nodes.
				llvm::StringSet<llvm::BumpPtrAllocator> Strings;
	llvm::DenseMap<SymbolID, Symbol> Symbols;			llvm::DenseMap<SymbolID, Symbol> Symbols;
	};			};

	struct FuzzyFindRequest {			struct FuzzyFindRequest {
	/// \brief A query string for the fuzzy find. This is matched against symbols'			/// \brief A query string for the fuzzy find. This is matched against symbols'
	/// un-qualified identifiers and should not contain qualifiers like "::".			/// un-qualified identifiers and should not contain qualifiers like "::".
	std::string Query;			std::string Query;
	/// \brief If this is non-empty, symbols must be in at least one of the scopes			/// \brief If this is non-empty, symbols must be in at least one of the scopes
	▲ Show 20 Lines • Show All 57 Lines • Show Last 20 Lines

clangd/index/Index.cpp

	Show All 32 Lines
	SymbolSlab::const_iterator SymbolSlab::end() const { return Symbols.end(); }			SymbolSlab::const_iterator SymbolSlab::end() const { return Symbols.end(); }

	SymbolSlab::const_iterator SymbolSlab::find(const SymbolID &SymID) const {			SymbolSlab::const_iterator SymbolSlab::find(const SymbolID &SymID) const {
	return Symbols.find(SymID);			return Symbols.find(SymID);
	}			}

	void SymbolSlab::freeze() { Frozen = true; }			void SymbolSlab::freeze() { Frozen = true; }

	void SymbolSlab::insert(Symbol S) {			void SymbolSlab::insert(const Symbol &S) {
	assert(!Frozen && "Can't insert a symbol after the slab has been frozen!");			assert(!Frozen && "Can't insert a symbol after the slab has been frozen!");
	Symbols[S.ID] = std::move(S);			auto ItInserted = Symbols.try_emplace(S.ID, S);
				if (!ItInserted.second)
				return;
				auto &Sym = ItInserted.first->second;

				// We inserted a new symbol, so copy the underlying data.
				intern(Sym.Name);
				intern(Sym.Scope);
				intern(Sym.CanonicalDeclaration.FilePath);
	}			}

	} // namespace clangd			} // namespace clangd
	} // namespace clang			} // namespace clang

unittests/clangd/FileIndexTests.cpp

Show First 20 Lines • Show All 87 Lines • ▼ Show 20 Lines	TEST(FileSymbolsTest, SnapshotAliveAfterRemove) {
EXPECT_THAT(getSymbolNames(*Symbols), UnorderedElementsAre("1", "2", "3"));		EXPECT_THAT(getSymbolNames(*Symbols), UnorderedElementsAre("1", "2", "3"));
}		}

std::vector<std::string> match(const SymbolIndex &I,		std::vector<std::string> match(const SymbolIndex &I,
const FuzzyFindRequest &Req) {		const FuzzyFindRequest &Req) {
std::vector<std::string> Matches;		std::vector<std::string> Matches;
auto Ctx = Context::empty();		auto Ctx = Context::empty();
I.fuzzyFind(Ctx, Req, [&](const Symbol &Sym) {		I.fuzzyFind(Ctx, Req, [&](const Symbol &Sym) {
Matches.push_back(Sym.Scope + (Sym.Scope.empty() ? "" : "::") + Sym.Name);		Matches.push_back(
		(Sym.Scope + (Sym.Scope.empty() ? "" : "::") + Sym.Name).str());
});		});
return Matches;		return Matches;
}		}

/// Create an ParsedAST for \p Code. Returns None if \p Code is empty.		/// Create an ParsedAST for \p Code. Returns None if \p Code is empty.
llvm::Optional<ParsedAST> build(std::string Path, llvm::StringRef Code) {		llvm::Optional<ParsedAST> build(std::string Path, llvm::StringRef Code) {
Context Ctx = Context::empty();		Context Ctx = Context::empty();
if (Code.empty())		if (Code.empty())
▲ Show 20 Lines • Show All 92 Lines • Show Last 20 Lines

unittests/clangd/IndexTests.cpp

Show First 20 Lines • Show All 69 Lines • ▼ Show 20 Lines	generateNumSymbols(int Begin, int End,
return generateSymbols(Names, WeakSymbols);		return generateSymbols(Names, WeakSymbols);
}		}

std::vector<std::string> match(const SymbolIndex &I,		std::vector<std::string> match(const SymbolIndex &I,
const FuzzyFindRequest &Req) {		const FuzzyFindRequest &Req) {
std::vector<std::string> Matches;		std::vector<std::string> Matches;
auto Ctx = Context::empty();		auto Ctx = Context::empty();
I.fuzzyFind(Ctx, Req, [&](const Symbol &Sym) {		I.fuzzyFind(Ctx, Req, [&](const Symbol &Sym) {
Matches.push_back(Sym.Scope + (Sym.Scope.empty() ? "" : "::") + Sym.Name);		Matches.push_back(
		(Sym.Scope + (Sym.Scope.empty() ? "" : "::") + Sym.Name).str());
});		});
return Matches;		return Matches;
}		}

TEST(MemIndexTest, MemIndexSymbolsRecycled) {		TEST(MemIndexTest, MemIndexSymbolsRecycled) {
MemIndex I;		MemIndex I;
std::weak_ptr<SlabAndPointers> Symbols;		std::weak_ptr<SlabAndPointers> Symbols;
I.build(generateNumSymbols(0, 10, &Symbols));		I.build(generateNumSymbols(0, 10, &Symbols));
▲ Show 20 Lines • Show All 107 Lines • Show Last 20 Lines

unittests/clangd/SymbolCollectorTests.cpp

	Show All 27 Lines

	using testing::Eq;			using testing::Eq;
	using testing::Field;			using testing::Field;
	using testing::UnorderedElementsAre;			using testing::UnorderedElementsAre;

	// GMock helpers for matching Symbol.			// GMock helpers for matching Symbol.
	MATCHER_P(QName, Name, "") {			MATCHER_P(QName, Name, "") {
	return (arg.second.Scope + (arg.second.Scope.empty() ? "" : "::") +			return (arg.second.Scope + (arg.second.Scope.empty() ? "" : "::") +
	arg.second.Name) == Name;			arg.second.Name).str() == Name;
	}			}

	namespace clang {			namespace clang {
	namespace clangd {			namespace clangd {

	namespace {			namespace {
	class SymbolIndexActionFactory : public tooling::FrontendActionFactory {			class SymbolIndexActionFactory : public tooling::FrontendActionFactory {
	public:			public:
	▲ Show 20 Lines • Show All 116 Lines • Show Last 20 Lines