These algorithms are useful in many contexts, not just LLD. In particular, LLDB currently has need for similar functionality and it would be unfortunate to have to roll our own when similar functionality already exists.
I think it woudl be better if improvements were made independently of a code move. Because if you combine improvements and code moves, the improvements kind of get "lost" in the history.
There's a somewhat annoying bug https://sourceware.org/bugzilla/show_bug.cgi?id=19951 present in all versions of glibc, which can lead to crashes if the thread happens to exit while it is being detached. I see no synchronization here which would prevent that, so I guess whether the crash happens depends on the type of work we will do here.
The workaround is to add some synchronization which makes sure these two things don't happen concurrently (adds some overhead, but if done right it can be negligible), or create the threads in a detached state (impossible through the std::thread API).
Ok, I think the situation is not as bad as I first thought. The synchronization in the Latch destructor should make sure this does not happen for the worker threads.
This just leaves the spawner thread as a question. It may be possible it will be slow enough that it will not trigger the bug, but heavy system load can do wonders to thread scheduling...
We have llvm/include/llvm/Support/ThreadPool.h and llvm/include/llvm/Support/thread.h. Did you take a look at these files? It seems parallel_for and parallel_for_each can be implemented on top of them.
I might be willing to take a stab at using llvm/Support/thread.h, but it should be in a separate patch, because I think this should one not be any functional change. Using llvm/Support/thread.h would potentially allow us to get rid of some of the ugly preprocessor defines in Parallel.h though, so I think we should do it.
Using ThreadPool.h seems a little more risky. It's almost certain there are subtle differences in the semantics between the two things being used, so it would need to be done much more carefully.
Yes, that can be done after moving this code. I'm not sure if I'm the right person to LGTM on adding new file to llvm/Support, but at least for moving these files out of LLD, it LGTM. Thank you for doing this!
So, I understand that this is just moving code from LLD to LLVM, but this is pretty complex and subtle code. I think it needs really careful and thorough review. I'm going to try to plan some time for that, but I wonder -- are there any meaningful splits you can make to introduce this more incrementally to LLVM
If we are going to examine this file, I strongly recommend porting the same functionalities (i.e. parallel_for and parallel_for_each) on top of llvm/Support/ThreadPool.h. I don't think that is hard to do as TheadPool.h seems to provide the same functionality as this file. That would be easier than review this file again.
If we are going to do that, I would definitely rather do it in a follow-up :-/
As this code has been used in LLD for over 4 years, it seems at least to pass the stability bar. So if we want to make improvements by porting to llvm::ThreadPool and then re-reviewing it in its entirety, we can do that, but I don't think it should be part of this change.
One idea might be to put this in early as a straight move, then in a follow-up patch split it up so that:
- All of the LLVM_ENABLE_THREADS=0 stuff is remains in the .h file.
- All of the Windows / ConcRT codepath is in Windows/Parallel.inc
- All of the custom implementations are in Posix/Parallel.inc.
This sort of address the concern Chandler had of splitting it up so it's more incremental, but it does so after the fact. I'm a little hesitant to make functional changes at the same time as a code move, since it loses history, so it would be nice if this could be a straight move.
It's been used for many years in LLD, but that doesn't cover all possible use cases. I think this implementation actually has a bug that if you call parallel_for from a function that is called by parallel_for, something goes wrong, IIRC. I didn't take a look into this at that moment as that use case does not exist in LLD, but that's a valid use case.
So, my concern about reviewing this isn't just about the implementation, it's also about the *API*. We need to make sure that the API design for parallel building blocks that we want to be available for the entire LLVM project are right. That's the particular split I was looking for...
There is also the fact that all of the code is currently still in LLD's old coding convention and needs to be updated to LLVM's coding conventions before it moves IMO.
Fair enough, I'll work on fixing up the coding conventions and segregating the Windows/non-Windows codepaths so it's a little less ugly to look at. In the meantime, feel free to either continue reviewing as time permits or wait for an update.
Talked to chandlerc@ offline about this. I'm going to work on splitting it up a bit inside of LLD and trying to get the API to match the C++ 17 Parallel TS as closely as possible. Once that's done, I'll upload a new patch.
@scott.smith , if you want to use this in LLDB in the interim, you can copy this file into LLDB/Utility.h, change up the namespaces, delete the parallel_sort since we don't need it, and remove as much unused code as possible. Then, once we get it into LLVM, it should at least be a straightforward change in LLDB to port it over to the common version.