This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Use thread pool for background indexing.
ClosedPublic

Authored by kadircet on Oct 24 2018, 8:43 AM.

Event Timeline

kadircet created this revision.Oct 24 2018, 8:43 AM

It's fine to spend one thread spinning on background tasks, but if we're going to do a threadpool, we should be more careful to not hurt the performance of foreground tasks. To do that, we should at least:

  • share the semaphore for the number of actively running tasks between TUScheduler and BackgroundIndex.
  • prioritize foreground tasks over background tasks.

Ideally, probably outside the scope of this change, we should make the background tasks cancellable (or find another way to keep them from interfering with the foreground tasks) in addition to that and make sure the scheduling is aware of distinction between foreground and background tasks. (@sammcall suggested simply using thread priorities for that).
Another high-level comment is that we should probably use llvm::ThreadPool here, this would have give code reuse and would make BackgroundIndex more focused on the actual indexing part.

clangd/index/Background.h
80

Why not std::vector? Memory allocs won't ever be a bottleneck here.

It's fine to spend one thread spinning on background tasks, but if we're going to do a threadpool, we should be more careful to not hurt the performance of foreground tasks. To do that, we should at least:

  • share the semaphore for the number of actively running tasks between TUScheduler and BackgroundIndex.
  • prioritize foreground tasks over background tasks.

I don't think I agree with this, at least not without evidence. Can we try thread priorities first?

clangd/index/Background.cpp
89

I always forget the details of how these work :-\
Is it possible for the "one" notification to be consumed by a waiter on blockUntilIdleForTest?

In general I'm not sure whether the notify_one optimization is worth the correctness risk as the code evolves.

clangd/index/Background.h
80

ilya was saying nice things about llvm::ThreadPool recently - worth a look?

Moving forward with priorities then.

clangd/index/Background.cpp
89

Ah, you are right, a thread waiting for done might catch that one as well, but I think it only applies to that one. Is there a possibility of blockUntilIdleForTest and enqueue being called from different threads?

There is still the argument of code evolution, but I don't think we should ever end up in a state in which an enqueue and a wait that will not consume that enqueue should occur concurrently.

clangd/index/Background.h
80

going for llvm::ThreadPool

Update from the offline meeting: decided to start with llvm::ThreadPool for background tasks and lower thread priorities for background tasks.

clangd/index/Background.cpp
89

but I don't think we should ever end up in a state in which an enqueue and a wait that will not consume that enqueue should occur concurrently.

We definitely will, +1 to Sam's comment, please call notify_all here. It's too easy to deadlock with notify_one.
E.g. just from the top of my mind, we can easily start adding tasks from a different thread if we start watching for changes (to compile commands, to header dependencies, etc.)

kadircet updated this revision to Diff 171527.Oct 29 2018, 9:55 AM
  • Use notify_all.
  • Use priorities.
sammccall added inline comments.Oct 30 2018, 2:41 AM
clangd/Threading.cpp
102 ↗(On Diff #171527)

don't you also need to actually include it?

clangd/Threading.h
120 ↗(On Diff #171527)

nit: enum class since this is at clangd scope

121 ↗(On Diff #171527)

nit: llvm tends to spell enum values like Low, Normal

124 ↗(On Diff #171527)

This deserves a comment - in particular that it may be a no-op.

clangd/index/Background.cpp
34

nit: clang-format

36

we discussed having different priority tasks, which would require priorities to be set up differently (to avoid workers never waking up to consume high priority tasks).

OK to leave this for now, but deserves a comment.

clangd/index/Background.h
19

depending on TUScheduler doesn't make sense here. Move the function to Threading.h instead?

80

Any problems with llvm::ThreadPool?

ilya-biryukov added inline comments.Oct 30 2018, 2:46 AM
clangd/Threading.cpp
101 ↗(On Diff #171527)

Maybe put this helper into llvm/Support/Threading.h?

clangd/Threading.h
120 ↗(On Diff #171527)

Maybe use enum class?

clangd/index/Background.h
19

Maybe move the getDefaultAsyncThreadsCount into Threading.h?
Having a dependency on TUScheduler in Background indexing is a bit misleading, since we're not actually sharing any threading infrastructure there.

kadircet updated this revision to Diff 171655.Oct 30 2018, 3:32 AM
kadircet marked 14 inline comments as done.
  • Address comments && Use ThreadPool
kadircet added inline comments.Oct 30 2018, 3:33 AM
clangd/Threading.cpp
102 ↗(On Diff #171527)

Turns out it was included transitively.

clangd/index/Background.h
19

actually there is one in llvm with the same semantics we use(never return zero) llvm::hardware_concurrency

kadircet updated this revision to Diff 171658.Oct 30 2018, 3:35 AM
  • Delete outdated comment.
kadircet updated this revision to Diff 171672.Oct 30 2018, 4:37 AM
  • Address comments.
sammccall accepted this revision.Oct 30 2018, 5:08 AM

Nice! Sorry about the back and forth with threadpool...

clangd/index/Background.cpp
34

(need to clang-format again)

This revision is now accepted and ready to land.Oct 30 2018, 5:08 AM
kadircet updated this revision to Diff 171675.Oct 30 2018, 5:13 AM
kadircet marked 2 inline comments as done.
  • Add comment.
clangd/Threading.cpp
101 ↗(On Diff #171527)

We talked with Sam about that one and rather decided to keep it here for a while and put into LLVM later on when we are sure about it is the right use case for everyone in LLVM with the right API.

kadircet updated this revision to Diff 171676.Oct 30 2018, 5:14 AM
kadircet marked an inline comment as done.
  • Format the code.
This revision was automatically updated to reflect the committed changes.
uabelho added a subscriber: uabelho.Nov 7 2018, 1:31 AM

Hi,

I've got a post-review comment about the use of SCHED_IDLE vs the needed gcc version.

clang-tools-extra/trunk/clangd/Threading.cpp
110 ↗(On Diff #171677)

I noticed that when I compile this with gcc 5.4.0, I get a compilation error about SCHED_IDLE not being defined.
Indeed, if I look in sched.h used with gcc 5.4.0, these are the SCHED_* definitions:

/* Scheduling algorithms.  */
#define SCHED_OTHER	0
#define SCHED_FIFO	1
#define SCHED_RR	2
#ifdef __USE_GNU
# define SCHED_BATCH	3
#endif

I suppose SCHED_IDLE was added in some later version...

On
https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-and-standard-library
it says that gcc 4.8 should work, but it doesn't now then?