In the new threading model clangd creates one thread per file to manage
the AST and one thread to process each of the incoming requests.
The number of actively running threads is bounded by the semaphore to
avoid overloading the system.
Details
Diff Detail
- Repository
- rCTE Clang Tools Extra
- Build Status
Buildable 14660 Build 14660: arc lint + arc unit
Event Timeline
clangd/ClangdServer.h | ||
---|---|---|
101 | These fields should probably be grouped into multiple groups:
|
Thanks, this helps me understand where previous patch is coming from!
I have some comments on the ThreadPool part, which basically amount to trying to represent the same abstract structure with fewer pieces.
But I still want to consider whether that's the right structure (specifically: whether the Queue abstraction makes it awkward to express task interactions like diagnostics debouncing).
So please don't do any work based on these comments yet! More thoughts to come...
clangd/ClangdServer.h | ||
---|---|---|
96–97 | So overall here, I think that we can drop ThreadPool without much impact on the design. The advantage is one less layer here to understand, and an attractive nuisance to tweak over time.
| |
130 | As discussed offline, Queue's semantics are very similar to a thread, except:
In the interests of keeping things simple and familiar, I think we should start out by using std::thread. | |
131 | Similarly, the free requests look a lot like standalone threads, with a few enhancements that are implementable but also possible YAGNI.
[1] I see both advantages and disadvantages, happy to discuss more! |
OK, here's a braindump, probably nothing surprising after our offline discussion.
TL;DR: the only design I *know* is extensible in ways we care about is the one that basically shards Scheduler into a bunch of per-TU schedulers.
I think that's probably the way to go, but if you can show how to cleanly extend one of the other options, we should compare those on the merits.
I think if we do adopt and stick with the sharding design, then a public API that reflects it is slightly nicer overall.
But it's not that important to me, tastes may vary, and above all I don't want to block the API patch on coming to a decision on the question above.
So I think we should agree on a name and then land the API patch with the interface it has now.
clangd/ClangdServer.cpp | ||
---|---|---|
192 ↗ | (On Diff #131567) | So... scheduler and the queue abstraction. (Current diagnostics behavior: we compute and send diagnostics for the syntax error introduced by the first keystroke. This is wasted load, and increases latency for diagnostics on the *last* keystroke, since we have to wait for the previous one to finish) So we should consider some other options... |
193 ↗ | (On Diff #131567) | Option 1: Put all timing-related logic in scheduler. I think this probably means Scheduler itself gets a thread whose job it is to wake up at certain times and schedule or cancel tasks on the queues. This gets complicated by reads after updates that haven't finished yet - you need force the update to happen now, so you need to schedule the thing and prevent the alarm from scheduling it. |
194 ↗ | (On Diff #131567) | Option 2: Share timing-related additions between scheduler and queue, but keep the tasks opaque to the queue. This isn't all that different from option 1, but moving some of the pieces out of scheduler might reduce the ball of hair to a manageable size, and add some conceptual clarity. |
195 ↗ | (On Diff #131567) | Option 3: extend the responsibilities of Queue so this is naturally its concern. The easiest way to express this is that Queue owns a thread and implements the run-loop for actions on a single TU, and Scheduler's operations basically delegate to Queue. FWIW, this can be transformed into a version without dedicated threads. First we write the thread version in this form: runLoop() { while (!shutdown) { nextThing = decideWhatToDo(); // either action, or a sleep duration if (nextThing.action) nextThing.action(); else interrupt.wait(nextThing.sleepDuration); // any update(), withAST etc interrupts } } I don't think it'd be hard to write a threadpool-based executor that runs "workloads" defined by their decideWhatToDo() function and their interactions with the interrupt condition variable. (I don't know exactly what a nice API would look like, but details...) That said, I definitely think the thread/runloop based one is an easier place to start, and suspect it would be good enough. |
196 ↗ | (On Diff #131567) | My conclusion (feel free to disagree!)
My preference/advice would be to take option 3 now, I think it's at least as good as the others and doesn't need additional research. Of course I'm not opposed if you want to dig into options 1/2 and show that they can work. (Separately, I'd prefer to initially use std::thread rather than a threadpool abstraction for the reasons laid out in the previous comment. Especially for option 3, where the API for the threadpool equivalent isn't obvious or standard. But *mostly* these questions are orthogonal) |
clangd/threading/Cancellation.h | ||
33 ↗ | (On Diff #131567) | This is a nice little class! I think callsites will look more natural if this is spelled operator bool() - you can call the variable Cancelled and write if (Cancelled) and Cancelled.set() Maybe there's a case for spelling set as operator= too, and asserting that the arg is true, but i'm less sure of that one! |
- An initial version of thread-per-file approach.
This is by no means a final version, we should definitely move things between files, do some renames, etc. before landing the final version.
Some things are not used anymore (e.g. ThreadPool), but are still in the patch, we'll need to remove them too.
The new version does not drop updates that are immidieately followed by other updates, this seems like an oversight and should be fixed too.
Please take a look at the overall design and let me know what you think. Feel free to add suggestions on how we can improve things, too!
Not a complete review, but this looks pretty good!
You probably want to read the last comment first - the gc thread is the threadpooliest bit of code left, and you may (or may not) want to eliminate it.
clangd/TUScheduler.cpp | ||
---|---|---|
317 | I tend to find thread bodies more readable as a member function, up to you | |
364 | in the spirit of "just spawn a thread, and write direct code"... can we just spawn a shared_ptr<std::thread> to do the work here, and replace GCThread with a vector<weak_ptr<std::thread>>. for (const auto &WeakThread : Cleanups) if (auto SharedThread = WeakThread.lock()) // thread may have died already SharedThread->join(); might be simpler? You need to lock the Cleanups vector, but no fiddling with CVs etc. | |
408 | this seems like where shared_ptr might help us out if we're willing to deal with it. So this would look something like:
Is this simpler? I'm not sure. It's maybe more thematically consistent :-) and it solves your "preamble got built" problem. | |
clangd/TUScheduler.h | ||
36 | Probably belongs in Threading.h instead. Maybe just Semaphore? If we have *two* semaphores in clangd, we've jumped the shark! | |
49 | Maybe FileWorker? this has-a thread, rather than is-a | |
58 | AFAICS, we require callers to setDone() before destroying, and the caller always does this immediately before destroying. Why not just make the destructor do this? | |
62 | do we want to move the callback to be a clangd-global thing, rather than a per-update thing, before this patch or later? (And will that change be mirrored here, or will FileASTThread::update still take a callback?) | |
64 | consider mirroring the runWithAST name | |
69 | (no action required, just random thoughts) one day we should really get the threading annotations (REQUIRES_EXCLUSIVE_LOCK etc macros) set up in LLVM - they're *implemented* in llvm for crying out loud... | |
71 | Having trouble seeing which parts of this are really used. I'd probably expect the 'done' state to be redundant with the one here... If it's just a deque + a lock, I'd lean towards making it really thin, maybe even a plain struct, and putting it in this file. But not sure of your thoughts here. |
clangd/TUScheduler.cpp | ||
---|---|---|
364 | The problem is that stored vector will have unbounded growth, and I think it shouldn't. |
clangd/TUScheduler.h | ||
---|---|---|
62 | I think we're fine both ways. | |
69 | Yeah, they're certainly great. The sole reason we don't have them enabled is because LLVM does not use threading that much? | |
71 | I'll probably remove RequestsQueue or turn it into something really simple like you suggest. |
Cleaned up the patch and added the missing bits.
This is in a much better shape now and should be ready for review.
I'll have a go through the existing review comments to make sure all concerns were addressed.
clangd/ClangdServer.h | ||
---|---|---|
96–97 | I tried fiddling with the idea and ended up abandoning the RequestQueue in favor of the std::queue with explicit locks. | |
130 | Done exactly that. There's a new abstraction in the patch that manages creating threads and waiting for them to finish for us. It seems pretty simple, please take a look and let me know what you think. |
clangd/ClangdServer.h | ||
---|---|---|
131 | We're not setting the priorities in the first version, but certainly agree we should add that later. | |
clangd/TUScheduler.cpp | ||
317 | Moved it as a member function and made a (questionable) decision that clients of the class are responsible for running that function on a separate thread themselves. Happy to chat about it. | |
408 | I ran into related problem with completion threads: we'd have to store somewhere in order to wait for there completion. | |
clangd/TUScheduler.h | ||
36 | Renamed to Semaphore and moved to Threading.h. | |
49 | Renamed it to ASTWorker. It doesn't really know anything really caret about files directly, just manages the AST. | |
58 | I still think that setDone() is useful to check the invariant that new requests are not being scheduled after FileData is removed. | |
71 | queue + lock + done is how we do this in the new patch. RequestQueue is removed |
Really coming together!
clangd/ASTWorker.cpp | ||
---|---|---|
1 ↗ | (On Diff #132570) | This file could really use some high-level comments about the scheduling strategy, throughout |
20 ↗ | (On Diff #132570) | ?! |
102 ↗ | (On Diff #132570) | This strategy has some upsides:
And downsides:
What's the goal here - is this a strategy to keep? Or a placeholder? Or trying to maximize compatibility with the previous code? |
clangd/ASTWorker.h | ||
1 ↗ | (On Diff #132570) | ASTWorker is an implementation detail of TUScheduler. Can we move it to TUScheduler.cpp, instead of exposing it, and remove this header? |
19 ↗ | (On Diff #132570) | This seems like an "implementation header" - nobody should depend on ASTWorker AIUI. So InputsAndAST shouldn't really go here, as it's part of the TUScheduler public interface. |
24 ↗ | (On Diff #132570) | (InputsAndPreamble is entirely unused here, I think) |
29 ↗ | (On Diff #132570) | updated -> updates |
38 ↗ | (On Diff #132570) | As discussed offline, this lifecycle is a bit complicated :-) The difficulty seems to be that we want to TUScheduler to be able to discard ASTWorkers instantly, even though we can't interrupt them.
We still have run() and setDone(), but they're private details. |
43 ↗ | (On Diff #132570) | stop()? (or stopSoon/requestStop to be more explicit) |
45 ↗ | (On Diff #132570) | can we reorder/group these by purpose/sequence? //lifecycle run(); setDone(); //write update() //read runWithAST() getPossiblyStalePreamble() //misc getUsedBytes() |
54 ↗ | (On Diff #132570) | I think this might actually be easier to read with just using Request = UniqueFunction<void()> and then spelling out pair<Request, Context>. up to you though. |
59 ↗ | (On Diff #132570) | nit: "combine ... into one class"? I'd hope we won't end up with ASTWorker, CppFile, FileInputs, *and* another thing? |
69 ↗ | (On Diff #132570) | Why? |
clangd/TUScheduler.cpp | ||
336 | destructor will do this | |
clangd/TUScheduler.h | ||
87 | This would be superseded by ASTWorkerHandle, right? | |
100 | inline this? or come find a more descriptive name :-) | |
clangd/Threading.h | ||
60 | nit: "allows to run tasks" --> "Runs tasks" A little more motivation would be useful here: this describes what is done, but not really why. e.g. Objects that need to spawn threads can own an AsyncTaskRunner to ensure they all complete on destruction. | |
63 | comment | |
65 | any need to expose this separately from the destructor? |
- Removed ASTWorker files, moved all the code to TUScheduler.cpp
- Renamed setDone to stop
- Added a comment to TUScheduler.cpp
- Addressed other review comments
clangd/ASTWorker.cpp | ||
---|---|---|
1 ↗ | (On Diff #132570) | I've added the docs to TUScheduler.cpp |
102 ↗ | (On Diff #132570) | Trying to maximize the compatibility with existing code. There are certainly other strategies to schedule updates. |
clangd/ASTWorker.h | ||
45 ↗ | (On Diff #132570) | I used a different grouping: //lifecycle run(); setDone(); //async write update() // async read runWithAST() // sync reads getPossiblyStalePreamble() getUsedBytes() Does that make sense? Or do you feel read/write vs sync/async is a better distinction? |
54 ↗ | (On Diff #132570) | I'd rather keep it as is. I specifically came up with this typedef because I got annoyed of typing pair<Request, Context>. |
59 ↗ | (On Diff #132570) | Removed the FIXME |
clangd/TUScheduler.cpp | ||
336 | It's much safer to call it explicitly to not depend no the order of fields in the class. | |
clangd/TUScheduler.h | ||
87 | It's still there, stores inputs and ASTWorkerHandle. Inputs in the ASTWorker correspond to the latest processed update, we don't expose them in the API. | |
clangd/Threading.h | ||
65 | It's a useful helper to have. |
Very nice! Thanks for adding the docs to TUScheduler implementation, makes a big difference.
Rest is almost all readability/comment bits. Substantive stuff is:
- getUsedBytes() looks racy
- I'm not sure we're choosing the right preamble
My understanding is the threading-related bits of CppFile (locking, and deferRebuild etc) are obsolete after this patch.
It's fine not to delete them here (I guess there could be some fallout), but maybe you want to add a comment in this patch marking them as obsolete?
Testing: I think this is mostly covered by the existing TUScheduler tests. I'd suggest adding a unit test to AsyncTaskRunner though: e.g. start a bunch of threads while holding a mutex that prevents them from making progress (to verify actually async), then have them increment a counter and verify that the counter is N after waitForAll() returns.
clangd/ASTWorker.cpp | ||
---|---|---|
102 ↗ | (On Diff #132570) | This sounds fine - can you add a little bit of this rationale and things we might want to change? Maybe at the end of the header comment? As it stands, it would be very easy for us to land this, move onto other things for a month, and lose the reasoning why the strategy is this way. |
clangd/TUScheduler.cpp | ||
10 | nit: store -> create? storage is a bit more complicated, but doesn't need to clutter the opening sentence. | |
12 | nit: blank line after "...FIFO order"? separating "this is the hierarchy" from "this is how we manage lifetimes". In fact, I'd consider moving the whole lifetime explanation down and merging it with the ASTWorker class comment. It's more detail than strategy. | |
13 | is shared by? | |
14 | I can't really follow these two sentences. I think we may want to give a little motivation (it's not clear why the destructor needs to not block) before introducing ASTWorkerHandle. e.g. The TUScheduler should discard an ASTWorker when remove() is called, but its thread may | |
98 | ah, this is my confusion - i mixed up "current state of File" with "current state of the file", which are almost opposites! | |
100 | Not totally obvious in this class what the mutex is for - consider a comment. | |
101 | processing. | |
152 | no-op. | |
175 | This deserves a comment for why we don't check isCancelled again... (concretely I guess, why we need it to actually get diagnostics). | |
178 | It'd separate the work from the mechanism a bit more to pull out: startTask(task, isUpdate, args...): if runSync task(args...) else grab context, bind args, push request, notify it looks like all cases fit? | |
202 | why under lock? I thought this is the only thread that can access? (if this locking is a no-op pending cleanup, fixme) | |
233 | I don't think this works - other accesses of File aren't guarded by the mutex? One option is to cache this in a member after each rebuild(), and put that member behind the mutex. It won't catch ASTs that grow over time, though :-( | |
252 | So when Done && !Requests.empty(), we complete the queue even though the file is removed. | |
267 | nit: inlining these trivial methods might aid readability | |
312 | any need for this constructor? | |
315 | This could use a comment and/or a more descriptive variable name to distinguish it from Worker.FileInputs. | |
332 | nit: just "Notify all workers that they need to stop", rather than echoing the code | |
336 | Hmm, not sure it's much safer than putting a comment on a member, but up to you. If we're not using the destructor-blocking behavior, remove the destructor or replace with an assert? | |
339–340 | Maybe clearer than STL it->second stuff: FileData &FD = Files[File] if (!FD.Worker) FD.Worker = Create(...); FD.Inputs = move(Inputs) | |
340–350 | this looks like a second copy of Inputs, we should only need one | |
387–388 | why do we want to make this copy rather than *only* getting the preamble within the task?
Ideally I guess we'd prefer V2, falling back to V1 -> V4 -> null in that order. But we have to get really unlucky for V4 to exist, whereas V2 is quite likely to exist I think. This means that only calling getPossiblyStalePreamble() once, in the task, seems more likely to get this right to me. | |
clangd/TUScheduler.h | ||
90 | nit: this name is pervasive, but it stopped me understanding what these are - PCHOps? | |
92 | Can we make this an optional<AsyncTaskRunner>, empty if we're running synchronously? (And Worker::Create() takes a pointer I guess) | |
clangd/Threading.h | ||
27 | hmm, I'm not sure there's enough value in this FIXME to make it ever worth doing. I think it will live forever and we should just drop it :-) | |
65 | Yep, the testing use case makes sense. Let's replace the dtor with an assert though, if we don't think owning this object is a safe way to guard. |
- Renamed File to AST.
- Introduced startTask().
- Moved small methods of ASTWorkerHandle to have inline definitions.
- Removed constructor of FileData.
- Replaced find() with [].
- Removed RunSync, store optional<TasksRunner> instead.
- Added a test for AsyncTasksRunner.
- Documented approach for cancelling updates.
- Addressed other review comments.
clangd/ASTWorker.cpp | ||
---|---|---|
102 ↗ | (On Diff #132570) | Added a section about cancellation to the comments in the .cpp file. |
clangd/TUScheduler.cpp | ||
14 | Added your comment. | |
98 | SG, renamed. | |
202 | That's the only way to access the AST with the current CppFile interface, I'll do a cleanup with the follow-up patch. | |
233 | It work because CppFile is thread-safe at the moment. We don't even need the lock.
That was the plan for the cleanup of CppFile.
We only report estimated memory usage and it's only accurate when there are no active requests. | |
312 | It allowed us to use llvm::make_unique. | |
315 | Added a comment. Worker.FileInputs are not exposed in the public API (for that specific reason), so hopefully shouldn't cause any confusion | |
336 | I had trouble figuring out what went wrong more than once when hitting similar errors. I also think that an extra waitForAll() in destructor doesn't hurt and makes the interface of the class saner, i.e. makes it easier to use such a class as a local variable. | |
339–340 | Thanks. Looks much better. | |
387–388 | Good point. | |
clangd/Threading.h | ||
27 | Oh, I think there is if we ever want to use it as a more general cancellation mechanism. Removed the FIXME, we can discuss that in more detail if CancellationFlag gets more widely used. | |
65 | I don't think owning the object is a safe way to guard, but waiting in destructor still seems like a nicer interface than assert (see the other comment). |
This is really great. Just one test nit.
unittests/clangd/ThreadingTests.cpp | ||
---|---|---|
35 | The current test passes if runAsync is synchronous. |
So overall here, I think that we can drop ThreadPool without much impact on the design.
Queue would stay, as a wrapper around std::thread that lets you add tasks to its runloop. It would be owned by FileData, and shutdown would be triggered by its destructor.
The advantage is one less layer here to understand, and an attractive nuisance to tweak over time.
The downsides I see: