This is needed for parallelizing of loading modules symbols in LLDB (D122975). Currently LLDB can parallelize indexing symbols when loading a module, but modules are loaded sequentially. If LLDB index cache is enabled, this means that the cache loading is not parallelized, even though it could. However doing that creates a threadpool-within-threadpool situation, so the number of threads would not be properly limited.
This change adds ThreadPoolTaskGroup as a tag type that can be used with ThreadPool calls to put tasks into groups that can be independently waited for (even recursively from within a task) but still run in the same thread pool.
Adding a mention of the concept of groups here would be valuable as well I think.