We have a multi-platform thread priority setting function(last piece
landed with D58683), I wanted to make this available to all llvm community,
there seem to be other users of such functionality with portability fixmes:
lib/Support/CrashRecoveryContext.cpp
tools/clang/tools/libclang/CIndex.cpp
Details
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 29082 Build 29081: arc lint + arc unit
Event Timeline
None of these check the return value of the priority calls. What's expected if it fails?
Do these actually give the same semantics? I'm worried that this will get used and on some platforms a thread might stop doing progress at all whereas on other platforms they'll still make progress. What's the intended use of this?
We can change it to return true on success and false on failure.
Do these actually give the same semantics? I'm worried that this will get used and on some platforms a thread might stop doing progress at all whereas on other platforms they'll still make progress. What's the intended use of this?
These are the functions with closest semantics in those three platforms. It tries to lower thread's priority such that it only runs if there's no non-background task running.
That doesn't answer my question though: if someone sets priority and on the platform the test witness acceptable forward progress, will it happen that another platform provide no forward progress for the same work. For example, some implementations don't let a background thread perform any I/O. In such a case, a mere logging statement would halt the background thread's progress.
What threads does LLVM have that require background priority?
Fair enough. But is this a theoretical concern for some abstract platform that might implement this function in future, or is it a concrete concern for one of the implementations in this review?
The desired semantics for our use cases (see below) is to merely lower the CPU and I/O priority, where possible, so that heavy background work does not degrade the interactive user experience serviced by the threads in the same process.
What threads does LLVM have that require background priority?
Source code indexing in libclang and clangd are the two use cases that we are interested in.
lib/Support/Unix/Threading.inc | ||
---|---|---|
222 | Move into the __linux__ section. |
You're adding a "portable" abstraction to different platform APIs. The onus here is on you to make sure that the abstraction is portable, and to point at documentation which says that each platform you're adding support for has the same behavior. Ideally you'd also have a test which shows that a background thread won't indefinitely hang when this is used, but that's not really feasible IMO. If there are caveats between different platforms, then your new API should document said caveat (e.g. "if you do I/O in such and such manner, the thread many not get any progress if the rest of the application is doing such and such other thing").
My worry is that your new API gets used, and eventually we get bug reports about things mysteriously hanging because people added features to a thread, not knowing the semantics that the API gave to that thread.
The desired semantics for our use cases (see below) is to merely lower the CPU and I/O priority, where possible, so that heavy background work does not degrade the interactive user experience serviced by the threads in the same process.
What threads does LLVM have that require background priority?
Source code indexing in libclang and clangd are the two use cases that we are interested in.
This seems like something you'd want to do for an indexing process, not just a thread? Maybe I misunderstand the setup, but if it's for indexing I'd like @arphaman and @jkorous to chime in.
We are already using PRIO_DARWIN_BG in couple other places - libclang and CrashRecoveryContext in llvm. According to the linked man page this priority causes I/O to be "throttled" but not halted on macOS.
I definitely see the appeal of having a test but I'm afraid we'd have to speculate about in which ways are going to be future supported platforms weird (e. g. " a mere logging statement would halt the background thread's progress") in order to write a test that could actually find some issues. I feel like adding regression tests as we go and encounter issues will be more effective.
I also feel that some absolutely generic API for threading is kind of impossible as differences between platforms are real but don't think we really need that level of genericity.
LGTM.
I would port it to NetBSD... but I don't see use-case for it. I think kernel should be allowed to schedule priorities on its own without manual help.
There is a reasonable limit to portability. For example, someone might come along and object to filesystem APIs that traverse directories on the basis that there can be a system that does not support nested directories, or uses a content-addressable filesystem. (Both of those exist in a real world BTW). However, that is not a valid basis for not adding such APIs.
It depends on the indexing architecture. libclang and clangd's background indexing run indexing in process of the language server, while Apple's (to-be-upstreamed) index-while-building uses indexing in-process of the compiler, out-of-process of the language server.
For such scheduling decisions to be globally good, for example, to figure out that such and such thread should make progress to unblock something, to unblock something, to react to the user input, to draw something on the screen, the kernel needs a lot of smart behavior.
We have observed regressions in UI responsiveness on Linux and Windows when clangd is doing background indexing at normal priority.
I'd love for the kernel to figure it out automatically for us, but the experience tells us that we're not there yet.
Sure, but I'm asking fro something simple here, and I think it's reasonable: please do the research required to show that the supported platforms behave the same. Document what the acceptable and unacceptable uses of the new API are. Are there gotchas? What are their symptoms?
I ask because oftentimes if things like these add a bug it'll manifest weeks later, on someone else's machine, because the new API was used in a new and exciting way which the author didn't foresee. You (the person LGTM'ing the code, and the person who wrote it) won't be the ones debugging the new issue. You're passing on a cost to someone else.
So yeah, there's a reasonable limit to portability, but don't foist a cost on others. Do a bit of homework is all I ask.
For example, someone might come along and object to filesystem APIs that traverse directories on the basis that there can be a system that does not support nested directories, or uses a content-addressable filesystem. (Both of those exist in a real world BTW). However, that is not a valid basis for not adding such APIs.
Agreed that your example isn't relevant to this patch.
Gotchas and symptoms are IMO dependent on the use-case. Since this is a generic API I'd say that documenting the behavior is the best we can do here.
So how about @kadircet just adds a note to the header that platform specific details might be important here (depending on the use-case) and that more information is in comments in relevant implementation? Is that what you mean @jfb?
Realistically nobody reads the comments, so I'm OK not having them. I just want to know that @kadircet read the documentation for the platform-specific APIs being introduced, and has reasonable suspicion that no gotcha is being introduced.
Realistically nobody reads the comments, so I'm OK not having them. I just want to know that @kadircet read the documentation for the platform-specific APIs being introduced, and has reasonable suspicion that no gotcha is being introduced.
Have you checked the latest changes in the patch? I've already added comments for those platform specific behaviors as documented in their relevant web pages(because you've requested before). And yes, I had already read the documentation beforehand, AFAICT no gotcha is being introduced.
Please feel free to either just read the parts of the documentation I included in comments or sources themselves in urls and notify if there are some gotchas in any platforms.
We have observed regressions in UI responsiveness on Linux and Windows when clangd is doing background indexing at normal priority.
I'm not refreshed with the Linux status but there are options for scheduling policies for UI responsiveness. Solaris used 'Interactive (IA)' scheduler designed for these purposes.
On NetBSD this trick is not allowed to manually set the priority level, as POSIX does not mandate this option to be supported; thus all the scheduling of SCHED_OTHER goes through automatic adapting of prioritizes by the kernel scheduler.
If there are issues on Linux and Windows, I would specify it explicitly that this is the reason why to implement it for these OSs (and not expected to be added for all OSs).
I had, and it doesn't really answer my questions. Maybe the documentation is lacking. I'm not sure the point I'm trying to make is getting across, so I'll set my own priority to background.
include/llvm/Support/Threading.h | ||
---|---|---|
182 | I'm not a fan of returning bool. Please return a custom enum class. | |
lib/Support/Unix/Threading.inc | ||
234 | The manpage says "On success, these functions return 0; on error, they return a nonzero error number". | |
251 | The manpage says "The setpriority() call returns 0 if there is no error, or -1 if there is". | |
252 | The function doesn't return in all cases. |
Could you help me get your point then? I would like to address everyone's concerns in such a patch, sorry for not getting it the first time :/
I thought you were asking to make sure all of the platform specific functions I've used is behaving ~same, meaning lowering a thread's priority and letting it run in a way that won't affect foreground threads, without any extra gotchas like "doing IO is forbidden".
This was the thing I first checked while coming up with this patch already and per your request I've also put the sources I've checked into the comments with most relevant details inlined.
include/llvm/Support/Threading.h | ||
---|---|---|
182 | I don't think it is feasible for every function that wants to indicate Failure/Success to come up with its own custom enum. Going with it anyway per you request, but wondering if at least llvm::Error would be a better choice or not ? | |
lib/Support/Unix/Threading.inc | ||
234 | hence the not ? | |
251 | hence the not ? | |
252 | Thanks! |
I'm not a fan of returning bool. Please return a custom enum class.