- Merge parallel_for_each into parallelForEach (this removes 1 Fn(...) call)
- Change parallelForEach to use parallelForEachN
- Move parallelForEachN into Parallel.cpp
My x86-64 lld executable is 100KiB smaller.
No noticeable difference in performance.
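For readers without the diff at hand, the shape of the change summarized above can be sketched roughly as follows. This is a minimal standalone illustration, not the code from the patch: it lives in a hypothetical `sketch` namespace, uses `std::function` and plain `std::thread` in place of LLVM's own utilities and executor, and picks a simple chunking strategy that is purely an assumption. The point is only that the iterator-based template becomes a thin wrapper over one non-template, index-based function that could be defined out of line in a .cpp file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

namespace sketch { // hypothetical namespace, not part of LLVM

// Non-template core. In a real code base this definition would sit in a
// .cpp file, so it is compiled once instead of once per callback type.
inline void parallelForEachN(std::size_t Begin, std::size_t End,
                             const std::function<void(std::size_t)> &Fn) {
  assert(Begin <= End && "invalid index range");
  std::size_t Count = End - Begin;
  if (Count == 0)
    return;

  // Assumed strategy: one contiguous chunk of indices per worker thread.
  std::size_t NumThreads =
      std::max<std::size_t>(1, std::thread::hardware_concurrency());
  NumThreads = std::min(NumThreads, Count);
  std::size_t Chunk = (Count + NumThreads - 1) / NumThreads;

  std::vector<std::thread> Workers;
  for (std::size_t Lo = Begin; Lo < End; Lo += Chunk) {
    std::size_t Hi = std::min(End, Lo + Chunk);
    Workers.emplace_back([Lo, Hi, &Fn] {
      for (std::size_t I = Lo; I < Hi; ++I)
        Fn(I);
    });
  }
  for (std::thread &T : Workers)
    T.join();
}

// Thin template wrapper: the only per-instantiation code is the small lambda
// that maps an index back to an element of a random-access range.
template <class IterTy, class FuncTy>
void parallelForEach(IterTy Begin, IterTy End, FuncTy Fn) {
  parallelForEachN(0, static_cast<std::size_t>(End - Begin),
                   [&](std::size_t I) { Fn(Begin[I]); });
}

} // namespace sketch
```

With this layout, a call such as `sketch::parallelForEach(V.begin(), V.end(), [](int &X) { X *= 2; });` instantiates only the wrapper, which is one plausible reason a change of this kind shrinks the binary.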
Differential D117510
[Support] Simplify parallelForEach{,N}
Authored by MaskRay on Jan 17 2022, 12:27 PM.
Event Timeline
Cool, thank you for working to improve this. As mentioned on https://reviews.llvm.org/D101699, dropping the special case for 1 element will have catastrophic performance impacts for some workloads (e.g. in the MLIR/CIRCT world) because of the problems that Threading.h has with nested parallelism.

Have you tried detemplating this entirely? If something is interesting to parallelize, then the granule of work should not be tiny. I'd consider moving this to take a unique_function<void()>, which would allow moving the implementation details of all of this out of line to a .cpp file.
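To make the concern about the one-element special case concrete, here is a minimal sketch of such a fast path on a detemplated, index-based loop. The name `parallelForEachNGuarded`, the `sketch` namespace, the use of `std::function` rather than an LLVM callable wrapper, and the naive one-thread-per-index dispatch are all assumptions made for illustration; none of this is the code under review.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

namespace sketch { // hypothetical namespace, not part of LLVM

// Type-erased entry point: no template parameters, so the body could be
// defined out of line in a .cpp file.
inline void parallelForEachNGuarded(std::size_t Begin, std::size_t End,
                                    const std::function<void(std::size_t)> &Fn) {
  assert(Fn && "callback must be callable");
  if (Begin >= End)
    return;

  // Special case: a single element runs inline on the calling thread. When
  // this loop is itself called from inside another parallel loop, the inner
  // call then never hands work to other threads or waits on them.
  if (End - Begin == 1) {
    Fn(Begin);
    return;
  }

  // Deliberately naive dispatch (an assumption of this sketch): one thread
  // per index, joined before returning.
  std::vector<std::thread> Workers;
  Workers.reserve(End - Begin);
  for (std::size_t I = Begin; I < End; ++I)
    Workers.emplace_back([I, &Fn] { Fn(I); });
  for (std::thread &T : Workers)
    T.join();
}

} // namespace sketch
```

The design choice being illustrated is only the early return: the cost of the check is negligible, while skipping it forces even trivial inner loops through the parallel machinery.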
Removing
is still correct (my lld executable will be 9KiB smaller on top of the current decrease) but makes the TaskGroup have less parallelism.