This is an archive of the discontinued LLVM Phabricator instance.

[Support] Simplify parallelForEach{,N}
ClosedPublic

Authored by MaskRay on Jan 17 2022, 12:27 PM.

Details

Summary
  • Merge parallel_for_each into parallelForEach (this removes 1 Fn(...) call)
  • Change parallelForEach to use parallelForEachN
  • Move parallelForEachN into Parallel.cpp

My x86-64 lld executable is 100KiB smaller.
No noticeable difference in performance.

Diff Detail

Event Timeline

MaskRay created this revision.Jan 17 2022, 12:27 PM
MaskRay requested review of this revision.Jan 17 2022, 12:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2022, 12:27 PM
MaskRay added inline comments.Jan 17 2022, 12:33 PM
llvm/include/llvm/Support/Parallel.h
246

Removing

for (; Begin != End; ++Begin)
  Fn(Begin);
return;

is still correct (my lld executable will be 9KiB smaller on top of the current decrease) but make the TaskGroup have less parallelism.

Cool, thank you for working to improve this. As mentioned on https://reviews.llvm.org/D101699, dropping the special case for 1 element will have catastrophic performance impacts for some workloads (e.g. in the MLIR/CIRCT world) because of the problems that Threading.h has with nested parallelism.

Have you tried detemplating this entirely? If something is interesting to parallelize, then the granule of work should not be tiny. I'd consider moving this to take a unique_function<void()>, which would allow moving the implementation details of all of this out of line to a .cpp file.

MaskRay updated this revision to Diff 402244.Jan 22 2022, 11:39 AM
MaskRay retitled this revision from [Support] Merge parallel_for_each into parallelForEach to [Support] Simplify parallelForEach{,N}.
MaskRay edited the summary of this revision. (Show Details)

simplify

lattner accepted this revision.Jan 22 2022, 4:59 PM

niiice, thank you!

This revision is now accepted and ready to land.Jan 22 2022, 4:59 PM
This revision was automatically updated to reflect the committed changes.
rnk added inline comments.Jan 24 2022, 8:43 AM
llvm/lib/Support/Parallel.cpp
198 ↗(On Diff #402244)

I believe the old code was templated in an effort to avoid having an indirect call in the inner loop. However, I don't think that actually matters. These constructs have so much overhead that they are only useful for coarse-grained parallelism, not vectorizable inner loops.

lattner added inline comments.Jan 24 2022, 10:45 AM
llvm/lib/Support/Parallel.cpp
198 ↗(On Diff #402244)

Right, I completely agree with you