Page MenuHomePhabricator

[OpenMPOpt][WIP] Expand parallel region merging

Authored by ggeorgakoudis on Nov 6 2020, 1:04 AM.



The existing implementation of parallel region merging applies only to
consecutive parallel regions that have speculatable sequential
instructions in-between. This patch lifts this limitation to expand
merging with any sequential instructions in-between, except calls to
unmergable OpenMP runtime functions. In-between sequential instructions
in the merged region are sequentialized in a "master" region and any
output values are broadcasted to the following parallel regions and the
sequential region continuation of the merged region.

Diff Detail

Unit TestsFailed

60 mslinux > Clang.CodeGen::lto-newpm-pipeline.c
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=full -O0 /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-FULL-O0
360 mslinux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp
680 mslinux > LLVM.Other::new-pass-manager.ll
Script: -- : 'RUN: at line 8'; /mnt/disks/ssd0/agent/llvm-project/build/bin/opt -disable-output -disable-verify -debug-pass-manager -passes=no-op-module /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pass-manager.ll 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pass-manager.ll --check-prefix=CHECK-MODULE-PASS

Event Timeline

ggeorgakoudis created this revision.Nov 6 2020, 1:04 AM
ggeorgakoudis requested review of this revision.Nov 6 2020, 1:04 AM
jdoerfert added inline comments.Nov 6 2020, 9:12 AM

Don't we have to proof the absence of these calls and not "look for them". I mean foobar is not one of them but could call one of them, right?

ggeorgakoudis added inline comments.Nov 6 2020, 12:46 PM

Yup, you're absolutely right. I was short sighted. We either have to check that there is no callpath from an in-between instruction to one of those functions and abort merging the affected regions, or entirely bail out from merging when there is any one of them declared, as we were doing before. I probably need to include omp_proc_bind and look for others too.

Also it got me thinking. What if there's a kpmc_fork_call inside a function call in the sequential region? Can a fork call be called within a master sequential region? Need to find out more.

ggeorgakoudis retitled this revision from [OpenMPOpt] Expand parallel region merging to [OpenMPOpt][WIP] Expand parallel region merging.Nov 6 2020, 1:04 PM
jdoerfert added inline comments.Nov 9 2020, 8:10 AM

You cannot have "any" OpenMP runtime calls in the guarded parts. One easy way is to disallow any call. Next step is to add an Attributor abstract attribute (as part of OpenMPOpt) which will deduce if a function might call an OpenMP runtime function, e.g., if it never calls an external function we don't know it cannot call an OpenMP runtime function. You will also be able to use the omp assume attribtues that will be introduced shortly.

Simplify check for unmergable regions, add flag for extractor sinking in OMPIRBuilder outlining.

ggeorgakoudis marked 2 inline comments as done.Nov 11 2020, 9:58 AM

Update to correctly handle inputs to sequential regions.

Apologies for the delayed review.

Update the commit title and message.

I think we are basically set here. I left some minor comments, all but the Input alloca question and the deleted file are not worth a new round of review, I just want confirmation on those two points.


If this flag is not set there should not be any instructions, right? If so, we should add an assertion for that.

I would not use the iterators as you never actually reach the end() anyway. while (true) { I = BB.front(); or while(BB.size() > 1) {... or something.


Add a comment describing what this is doing on a high-level.


Conversion to pointer happens in the IRBuilder (after D92189 landed). This means we can omit the input alloca stuff below, right?


Range loop :)


We usually recommend to compute the end iterator only once.


Add some comment explaining what this is doing on a high-level.


A lot of heuristics could be used until we have an AAOpenMPRuntimeCalls attribute that determines if a call can result in an OpenMP runtime call.
Lifetime intrinisics is fine, any intrinsic should be OK actually.


If BB->end() stays the same capture it; It = .., End = ...; It != End


Reverse the order ("early exits") to reduce indention:

if terminator
if !call
call handling.

Can we make a helper for this, pass he OMPRTL kind and it does the 3 steps.


Why was this file deleted?

ggeorgakoudis marked 11 inline comments as done.

Updated for comments.


That's correct. Removed.


Makes sense. Updated to accept any intrinsic call as mergable.


There were different test files for the legacy and the new PMs because of them processing changes in the CG in opposite order, so function names for outlined functions wouldn't match. This has been resolved in D90566,

jdoerfert accepted this revision.Jan 6 2021, 10:11 AM

I Left some more comments that can be addressed before the commit. Otherwise LGTM.


If you just want the users of instructions in SeqStartBB outside of that BB we don't need the extractor logic which probably does a lot more than we need. If you agree, maybe something like this:

for (Instruction &I : *SeqStartBB) {
  SmallPtrSet<Instruction *, 4> OutsideUsers;
  for (User &Usr : I.users()) {
    Instruction &UsrI = *cast<Instruction>(Usr);
    if (UsrI.getParent() != SeqStartBB)
  if (OutsideUsers.empty())
  // Do the stuff as below.

Early exit:

if (!isa<Call>)
  return true;

if (!CalledFunction) return false;


const auto &RFI :


Nit: pre-increment everywhere above: ++It

This revision is now accepted and ready to land.Jan 6 2021, 10:11 AM
This revision was landed with ongoing or failed builds.Jan 11 2021, 8:06 AM
This revision was automatically updated to reflect the committed changes.
ggeorgakoudis marked 4 inline comments as done.