This is an archive of the discontinued LLVM Phabricator instance.

[OpenMPOpt] Merge parallel regions
ClosedPublic

Authored by ggeorgakoudis on Jul 11 2020, 5:58 PM.

Details

Summary

There are cases that generated OpenMP code consists of multiple,
consecutive OpenMP parallel regions, either due to high-level
programming models, such as RAJA, Kokkos, lowering to OpenMP code, or
simply because the programmer parallelized code this way. This
optimization merges consecutive parallel OpenMP regions to: (1) reduce
the runtime overhead of re-activating a team of threads; (2) enlarge the
scope for other OpenMP optimizations, e.g., runtime call deduplication
and synchronization elimination.

This implementation defensively merges parallel regions, only when they
are within the same BB and any in-between instructions are safe to
execute in parallel.

Diff Detail

Event Timeline

ggeorgakoudis created this revision.Jul 11 2020, 5:58 PM

Note I also update the regression test Transforms/OpenMP/parallel_deletion.ll since the merging optimization, which applies after deletion, changes the output

It's great that you're working on this. It's very important that we allow people to write code, structured and decomposed in a way that makes sense from an engineering and maintenance perspective, and have the compiler combine things later to avoid unnecessary overhead. This is just as much true for expressions of parallelism as it is for other aspects of the code.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
757

Comments in LLVM should be complete sentences and end with appropriate punctuation (here and a few other places).

764

Uneeded {} (here and a few other places).

llvm/test/Transforms/OpenMP/parallel_region_merging.ll
14

Do these need to say 'nowait' in order to actually match the code?

Update for comments

It's great that you're working on this. It's very important that we allow people to write code, structured and decomposed in a way that makes sense from an engineering and maintenance perspective, and have the compiler combine things later to avoid unnecessary overhead. This is just as much true for expressions of parallelism as it is for other aspects of the code.

Thank you, Hal! I fully agree with this motivation.

ggeorgakoudis marked 4 inline comments as done.Jul 14 2020, 12:36 PM
ggeorgakoudis added inline comments.
llvm/test/Transforms/OpenMP/parallel_region_merging.ll
14

No, the code matches the IR generation ('nowait' is available only for worksharing constructs). The merging transformation emits an explicit barrier in place of the implicit barrier after merging. However, I added a TODO because it could make sense to avoid explicit barrier generation, if the 'nowait' clause is present in an enclosing worksharing construct, and the merged parallel regions are independent.

hfinkel added inline comments.Jul 14 2020, 12:41 PM
llvm/test/Transforms/OpenMP/parallel_region_merging.ll
14

Oh, right. Indeed. Thanks.

We should do barrier elimination anyway, so that's certainly a good point that we should make sure that the barrier elimination can apply to these kinds of barriers (or we can not emit them in the first place, although maybe it's easier to do after we have anything together where we can just query the MemorySSA use/def chain or similar to look at underlying dependence information).

I'll have to look at the logic later today. Some initial comments below.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
2156

Style: No else after return.

Please commit this separately, LGTM for this change.

llvm/test/Transforms/OpenMP/parallel_deletion.ll
416 ↗(On Diff #277933)

Can you run the update script on the test and commit the change as an NFC right away. Weird that the test claims it is generated but no check lines are here.

llvm/test/Transforms/OpenMP/parallel_region_merging.ll
14

FWIW, while my proposal for parallel nowait was briefly discussed, I did not spend enough time on it to get it into OpenMP 5.1. The questions how to synchronize with the threads afterwards and what guarantees you have are interesting.

246

Remove all oft the attributes if possible, at least all the string attributes. It is also weird that we have optnone here and actually do something, I guess OpenMP opt doesn't properly call skipSCC/skipFunction? Remove it here (to make it a problem for tomorrow).

ggeorgakoudis marked an inline comment as done.

Update for comments

  • Push preserved analyses fix in another commit
  • Update parallel_deletion.ll in a NFC commit
  • Update regression tests
ggeorgakoudis marked 3 inline comments as done.Jul 15 2020, 6:23 PM

Update diff for linting warning?

Sorry for my slow review. I added a bunch of comments, mostly minor things and requests for some extra tests.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
611

Nit: Initialize them with nullptr and place an assert at the use site, makes it easier to trace a bug if we ever mess up.

644

We should also emit a remark here. And one pointing at each of the merged parallel regions.

651

Nit: For some reason it is "common" to use abc.dce.xyz as naming scheme for blocks (I think). Maybe we should go with something similar to the OMPIRBuilder, i.a., omp.par.expanded. Just a suggestion.

660

I think we need some tests for these at some point. One with proc-binds and one with combinations of cancellable and non-cancellable regions. I think it "should just work" but we need to make sure the resulted expanded parallel region does what we would expect. I would assume cancellable is actually not a problem at all, proc-bind specified for the first of two that are merged might be. I expect we merge and both now have have the proc-bind now for the entire region. This is not ideal. TBH, the real problem is that __kmpc_push_proc_bind exists and the value is not passed to __kmpc_fork_call. On top of that, we only generate the __kmpc_push_proc_bind if we need it. That leaves the (potential) situation in which the call is present but we don't see it. So we need to either use the ICV tracking facility here or change the way we generate code. I guess always emitting __kmpc_push_proc_bind is the easiest way to handle this. Then we can look for the call and verify the binding is the same. WDYT?

710

Do we need to finalize the builder every time here (or at all)?

734

typo

742

Do we want to print this for each use or once at the end?

llvm/test/Transforms/OpenMP/parallel_region_merging.ll
319

The update test script cannot add check lines for new functions yet. Can you manually add some minimal check lines to ensure the new function we create looks as expected, e.g., has regular calls to the former parallel regions?

sstefan1 added inline comments.Jul 18 2020, 12:41 PM
llvm/test/Transforms/OpenMP/parallel_region_merging.ll
319

Maybe I can take a look at that since I'm now somewhat familiar with the script.

jdoerfert added inline comments.
llvm/test/Transforms/OpenMP/parallel_region_merging.ll
319

there is a patch for the _cc_ script already: D83004. I suggested there to make it generic (common.py) for _cc_ and the opt script, maybe we can do it in 2 steps instead. < @greened

Update for comments

ggeorgakoudis marked 12 inline comments as done.Jul 27 2020, 12:15 PM
ggeorgakoudis added inline comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
517

I have intentionally added re-running deduplication before merging. It helps removing emitted calls to __kmpc_global_thread_num, and that enables merging of parallel regions by removing redundant in-between instructions (at least at this point that merging is very defensive)

611

Asserts on uses within BodyGenCB

644

We emit a remark in line 708 that identifies merged parallel regions. What do you have in mind? Is it a single remark that contains identifiers for all merged regions?

660

This is tricky. What if merged parallel regions have different proc_bind clauses? Is it possible to call __kmpc_push_proc_bind within a parallel region, during parallel execution?

710

This finalize call is indeed unnecessary. The only call to finalize needed is when outlining the new, merged parallel region.

742

I have moved debug prints in the loop of the BB2PRMap data structure and made it more informative (add how many parallel mergeable parallel regions have been found)

llvm/test/Transforms/OpenMP/parallel_region_merging.ll
319

I see the patch is accepted but not yet committed. Waiting for it to be upstreamed.

ggeorgakoudis marked 3 inline comments as done.Jul 27 2020, 1:00 PM
ggeorgakoudis marked 3 inline comments as not done.
ggeorgakoudis added inline comments.Jul 27 2020, 1:31 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
660

I had a quick look in the OpenMP runtime implementation to answer my question. The proc_bind setting is handled within the implementation of the fork call at team creation, so calling it within an executing parallel region should have no effect on it. Does it make the most sense to just emit a call to __kmpc_push_proc_bind, falling back to default binding? That will invalidate proc_bind settings on merged parallel regions, is that within OpenMP specification?

jdoerfert added inline comments.Jul 28 2020, 10:39 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
517

looks good.

644

I guess one per parallel region with the first one saying "merged the following parallel region into this one", and the others saying "merged into the parallel region above". The one we have below just says "we merged something" (IIRC).

ggeorgakoudis marked 3 inline comments as done.Sep 18 2020, 3:55 PM

Updates per comments

  • Add option to enable region merging, default is disable
  • Defensively abort merging if there are proc_bind affinities
  • Provide more informative optimization remarks
  • Update tests for cancellable regions, different files per pass manager due to testing incompatibility
ggeorgakoudis marked 3 inline comments as done.Sep 28 2020, 10:01 AM
ggeorgakoudis added inline comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
42

Option to enable parallel region merging

606

Check for explicit proc_bind affinity

653

Updated Remark

744

Updated Remark

Fix for .ll files appearing as binary

ggeorgakoudis marked 3 inline comments as done.Sep 28 2020, 10:26 AM
ggeorgakoudis added inline comments.
llvm/test/Transforms/OpenMP/parallel_region_merging.ll
319

Use the new version of the script that includes generated functions

Fix clang-tidy complaints

jdoerfert accepted this revision.Oct 5 2020, 9:16 PM

A few minor nits, otherwise LGTM.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
524

run this conditional on the result of merge:

if (merge...()) {
 deduplicate..
 Changed = true;
}
692

I think both todos are handled at this point.

728

No need for this. Any pass that might use this will make the connection on its own and treating the two arguments special is weird.

732

Can you use the variables instead of 1 here.

823

I think we have to collect the uses of the barrier as well.

This revision is now accepted and ready to land.Oct 5 2020, 9:16 PM

Update for comments

Update for comments (for real)

ggeorgakoudis marked 5 inline comments as done.Oct 6 2020, 8:21 PM
Harbormaster completed remote builds in B74217: Diff 296586.
ggeorgakoudis retitled this revision from [OpenMPOpt][WIP] Merge parallel regions to [OpenMPOpt] Merge parallel regions.

Update commit message

Amend commit message

This revision was automatically updated to reflect the committed changes.