This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Factor out the data-swapping functionality from MemIndex/DexIndex.
ClosedPublic

Authored by sammccall on Aug 29 2018, 6:29 AM.

Details

Summary

This is now handled by a wrapper class SwapIndex, so MemIndex/DexIndex can be
immutable and focus on their job.

Old and busted:
I have a MemIndex, which holds a shared_ptr<vector<Symbol*>>, which keeps the
symbol slab alive. I update by calling build(shared_ptr<vector<Symbol*>>).

New hotness: I have a SwapIndex, which holds a shared_ptr<SymbolIndex>, which
holds a MemIndex and also keeps any data backing it alive.
I update by building a new MemIndex and calling SwapIndex::reset().

This resulted in a bunch of interface churn (some places previously using
unique_ptr should now be shared_ptr, and some using MemIndex() + MemIndex::build
should now call the static MemIndex::build factory instead).

Diff Detail

Repository
rL LLVM

Event Timeline

sammccall created this revision.Aug 29 2018, 6:29 AM
ioeric added inline comments.Aug 30 2018, 9:23 AM
clangd/index/FileIndex.h
47 ↗(On Diff #163073)

Maybe avoid hardcoding the index name, so that we could potentially switch to use a different index implementation?

We might also want to allow user to specify different index implementations for file index e.g. main file dynamic index might prefer MemIndex while Dex might be a better choice for the preamble index.

clangd/index/MemIndex.h
30 ↗(On Diff #163073)

(It's a bit unfortunate that this has to return shared_ptr now)

kbobyrev added inline comments.Aug 31 2018, 2:59 AM
clangd/index/Index.h
416 ↗(On Diff #163073)

Do we want these functions to be final? Since SwapIndex is a wrapper around an immutable index structure, I believe it would be unlikely that anything would derive from it.

ioeric added inline comments.Aug 31 2018, 6:33 AM
clangd/index/dex/DexIndex.h
42 ↗(On Diff #163073)

Why is this constructor needed? I think this could restrict the flexibility of DexIndex initialization, in case we need to pass in extra parameters here.

sammccall added inline comments.Aug 31 2018, 6:47 AM
clangd/index/Index.h
416 ↗(On Diff #163073)

Maybe unlikely but I don't see a strong reason to enforce one way or the other. (We rarely use final).

One example of where we'll do it: FileIndex inherits SwapIndex, but it should override estimateMemoryUsage once that's fixed to incorporate backing symbols.

clangd/index/MemIndex.h
30 ↗(On Diff #163073)

Since some of the resources it owns has a shared lifetime, this is really just reflecting reality I think. Whether that's visible or invisible seems like a wash to me.

clangd/index/dex/DexIndex.h
42 ↗(On Diff #163073)

I'm not sure exactly what you mean here.

We need a way to specify the symbols to be indexed.
Because these are now immutable, doing that in the constructor if possible is best.

Previously this was a vector<const Symbol*>, but that sometimes required us to construct that big vector, dereference all those pointers, and throw away the vector. This signature is strictly more general (if you have a vector of pointers, you can pass make_pointee_range<const Symbol>)

in case we need to pass in extra parameters here.

What stops us adding more parameters?

ioeric added inline comments.Aug 31 2018, 7:34 AM
clangd/index/MemIndex.h
30 ↗(On Diff #163073)

This is only true when this is used with SwapIndex right? For example, a static Dex/Mem index would probbaly have unique_ptr ownership.

clangd/index/dex/DexIndex.h
42 ↗(On Diff #163073)

What stops us adding more parameters?

I thought this template was added so that we could use it as a drop-in replacement of MemIndex (with the same signature) e.g. in FileIndex? I might have overthought though.

Thanks for the explanation!

sammccall added inline comments.Aug 31 2018, 8:32 AM
clangd/index/FileIndex.h
47 ↗(On Diff #163073)

I don't think "returns an index of unspecified implementation" is the best contract. MemIndex and DexIndex will both continue to exist, and they have different performance characteristics (roughly fast build vs fast query). So it should be up to the caller, no?

clangd/index/MemIndex.h
30 ↗(On Diff #163073)

Dex and MemIndex don't own the symbols.

If S is a SymbolSlab:

  • MemIndex(S) (or make_unique<MemIndex>(S)) is a MemIndex that doesn't know about the storage and doesn't own the symbols
  • MemIndex::build(move(S)) is a shared_ptr<MemIndex> that owns the storage. It's the shared_ptr, not the MemIndex, that manages the storage.

So for a static index, you could either hand off the ownership of the slab by calling build(), or keep it yourself and call the constructor. The former seems more convenient in ClangdMain.

We could write this differently, e.g. as std::shared_ptr<SymbolIndex> = tieStorageToIndex(std::move(S), make_unique<MemIndex>(S));. The API here (MemIndex::build()) is intended to make common use cases easy, and keep it fairly similar to before for migration purposes. Is it too confusing?

clangd/index/dex/DexIndex.h
42 ↗(On Diff #163073)

Sure, Dex and MemIndex had the same constructor signatures as each other before this patch, and they have the same as each other after this patch.

That makes it convenient to use them interchangeably, but there's no hard requirement (no construction from templates etc).

kbobyrev accepted this revision.Sep 3 2018, 5:05 AM

Other than a hard-coded buildMemIndex() in FileIndex, I don't have any concerns. That's just a tiny piece, if @ioeric doesn't have any concerns about that too, I think it's good to land.

The code should look cleaner in multiple places now, thanks for the patch!

clangd/index/FileIndex.h
47 ↗(On Diff #163073)

We could make the index type a template parameter to allow users decide which one they want (also, this could be have a default value, e.g. MemIndex and later DexIndex). Would that be a viable solution?

This revision is now accepted and ready to land.Sep 3 2018, 5:05 AM
sammccall updated this revision to Diff 163714.Sep 3 2018, 7:15 AM

Instead of returning shared_ptr<SymbolIndex>, our implementations that have
backing data gain the ability to own that data (without coupling to its form).
Based on offline discussion with @ioeric.

Rebase to pick up SymbolOccurrence changes. The SymbolOccurrence parts of this
patch aren't terribly clean, but I think that's a problem for another day.

sammccall updated this revision to Diff 163715.Sep 3 2018, 7:27 AM

Remove some more shared_ptr occurrences that aren't needed anymore
Update comments

OK, I think this is good to go again.
(A couple of seemingly-unrelated fixes to SymbolOccurrenceKind that I ran into in tests)

clangd/index/FileIndex.h
47 ↗(On Diff #163073)

I'm not really sure what problem templates solve there?
If a caller wants a Dex index, we can just add a buildDexIndex() function.
Templating and requiring these to always have the same constructor parameters seems fragile and unneccesary unless the dependency from FileIndex->Dex is really bad for some reason.

ioeric accepted this revision.Sep 3 2018, 7:29 AM
ioeric added inline comments.
clangd/index/FileIndex.h
47 ↗(On Diff #163073)

Provide one interface for each index implementation sounds good to me. Thanks for the clarification!

clangd/index/Index.h
15 ↗(On Diff #163714)

nit: no longer needed?

clangd/index/dex/DexIndex.h
42 ↗(On Diff #163073)

Sounds good.

This revision was automatically updated to reflect the committed changes.