This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Load YAML static index asynchronously.
AbandonedPublic

Authored by ioeric on Aug 30 2018, 12:58 AM.

Details

Summary

Loading static index can be slow (>10s for LLVM index). This allows
the clangd server to be initialized before index is loaded.

Event Timeline

ioeric created this revision.Aug 30 2018, 12:58 AM
ioeric retitled this revision from [clangd] Load YAML static index asyncrhonously. to [clangd] Load YAML static index asynchronously..Aug 30 2018, 1:00 AM
ioeric updated this revision to Diff 163276.Aug 30 2018, 1:03 AM
  • revert unintended change
ilya-biryukov added inline comments.Aug 30 2018, 6:06 AM
clangd/tool/ClangdMain.cpp
103

I believe is a data race (multiple threads may run this line concurrently).
You would want some synchronization around this, std::shared_future could be a good fit

Any reason to not just wait for the index to load? Is this a UX concern or a problem when experimenting?

ioeric updated this revision to Diff 163321.Aug 30 2018, 6:52 AM
  • Fix data race.

Any reason to not just wait for the index to load? Is this a UX concern or a problem when experimenting?

The index loading can be slow. When using LLVM YAML index, I need to wait for >10s before clangd starts giving me anything useful. We could potentially speed up loading (e.g. replacing yaml), but the index can be arbitrary large. I think it's an improvement in general to be able to get clangd running before index is loaded.

clangd/tool/ClangdMain.cpp
103

Nice catch.

I am a bit hesitated about using shared_future though. It could potentially get rid of the mutex but would require much more careful use. For example, Index can be set to the same value repeatedly, which makes me a bit nervous.

The index loading can be slow. When using LLVM YAML index, I need to wait for >10s before clangd starts giving me anything useful. We could potentially speed up loading (e.g. replacing yaml), but the index can be arbitrary large. I think it's an improvement in general to be able to get clangd running before index is loaded.

I would trade-off those 10 seconds for giving consistent experience (i.e. avoiding confusing the users with different modes of completion (with and without index, etc.)).
But not terribly opposed to that.

clangd/tool/ClangdMain.cpp
103

The following pattern should give proper results:

class AsyncIndex : public Index {
public:
  AsyncIndex(std::shared_future<Index> IndexFut);
  //....
private;
  const SymbolIndex *index() const {
     if (IndexFut.wait(0) != ready)
       return nullptr;
     return &IndexFut.get();
  }

  std::shared_future<Index>IndexFut;
};


AsyncIndex AI(std::async([]() { /* load the index here */ return Index; });

The index loading can be slow. When using LLVM YAML index, I need to wait for >10s before clangd starts giving me anything useful. We could potentially speed up loading (e.g. replacing yaml), but the index can be arbitrary large. I think it's an improvement in general to be able to get clangd running before index is loaded.

I would trade-off those 10 seconds for giving consistent experience (i.e. avoiding confusing the users with different modes of completion (with and without index, etc.)).
But not terribly opposed to that.

I think the inconsistency is benign and should be worth 10 seconds (which for me personally is loong...). Besides, this is just LLVM index; the loading could be arbitrary long depending on the corpus size.

clangd/tool/ClangdMain.cpp
103

My concern with this approach is that we are still calling wait even though it's not necessary once Index has been loaded.

ilya-biryukov added inline comments.Aug 30 2018, 7:55 AM
clangd/tool/ClangdMain.cpp
103

Maybe not bother before we know it causes actual performance issues? My bet is that it would never become a bottleneck, given the amount of work we're doing in addition to every call here.

I don't have a strong opinion on async vs sync - startup time is important and we shouldn't block simple AST-based functionality on the index, but this introduces some slightly confusing UX for that speed.

However I think this should be based on D51422 which extracts most of what you need out of MemIndex into the new SwapIndex.
So static index would just be initialized as an empty SwapIndex, then spawn a thread that loads the YAML and calls SwapIndex::reset.
This will get avoid adding nontrivial threading stuff to the main file.

(Sorry for catching this earlier, and that the patch isn't landed yet - feel free to pick up the review, else @kbobyrev will take a first pass tomorrow I think)

ioeric added inline comments.Aug 30 2018, 8:12 AM
clangd/tool/ClangdMain.cpp
103

Sure, the performance overhead is small in both cases. But I would still try to avoid wait e.g. OS can decide to yield even if the timeout is 0.

I don't have a strong opinion on async vs sync - startup time is important and we shouldn't block simple AST-based functionality on the index, but this introduces some slightly confusing UX for that speed.

However I think this should be based on D51422 which extracts most of what you need out of MemIndex into the new SwapIndex.
So static index would just be initialized as an empty SwapIndex, then spawn a thread that loads the YAML and calls SwapIndex::reset.
This will get avoid adding nontrivial threading stuff to the main file.

Oops, didn't realize that. That sounds great. Thanks!

kbobyrev added inline comments.Aug 31 2018, 2:43 AM
clangd/tool/ClangdMain.cpp
48

Also, do we want only static index to be built asynchronously? Do we want it to be used only in our Clangd tool driver? Loading dynamic index might also be useful if we have this kind of behavior for the static one. I'm not completely sure about that, though.

78

Shouldn't this index()->estimateMemoryUsage() when the index is built?

102

Do we want to be more explicit about loading index asynchronously? If we're not that fact might be implicit to the user (e.g. "my Clangd is not frozen, but global completion doesn't work"), CLI flag/documentation entry might solve that issue. Introducing tons of not-so-useful flags is not optimal, though, and I'm also not sure about that.

What do you think?

ioeric abandoned this revision.Sep 4 2018, 9:01 AM

Dropping this in favor of D51638