Page MenuHomePhabricator

[OpenMPOpt][WIP] Expand parallel region merging
ClosedPublic

Authored by ggeorgakoudis on Nov 6 2020, 1:04 AM.

Details

Summary

The existing implementation of parallel region merging applies only to
consecutive parallel regions that have speculatable sequential
instructions in-between. This patch lifts this limitation to expand
merging with any sequential instructions in-between, except calls to
unmergable OpenMP runtime functions. In-between sequential instructions
in the merged region are sequentialized in a "master" region and any
output values are broadcasted to the following parallel regions and the
sequential region continuation of the merged region.

Diff Detail

Unit TestsFailed

TimeTest
40 mslinux > Clang.CodeGen::lto-newpm-pipeline.c
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=full -O0 /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-FULL-O0
390 mslinux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp
760 mslinux > LLVM.Other::new-pass-manager.ll
Script: -- : 'RUN: at line 8'; /mnt/disks/ssd0/agent/llvm-project/build/bin/opt -disable-output -disable-verify -debug-pass-manager -passes=no-op-module /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pass-manager.ll 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pass-manager.ll --check-prefix=CHECK-MODULE-PASS
200 mswindows > lld.ELF/invalid::symtab-sh-info.s
Script: -- : 'RUN: at line 4'; c:\ws\w64\llvm-project\premerge-checks\build\bin\yaml2obj.exe --docnum=1 C:\ws\w64\llvm-project\premerge-checks\lld\test\ELF\invalid\symtab-sh-info.s -o C:\ws\w64\llvm-project\premerge-checks\build\tools\lld\test\ELF\invalid\Output\symtab-sh-info.s.tmp.o

Event Timeline

ggeorgakoudis created this revision.Nov 6 2020, 1:04 AM
ggeorgakoudis requested review of this revision.Nov 6 2020, 1:04 AM
jdoerfert added inline comments.Nov 6 2020, 9:12 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
613

Don't we have to proof the absence of these calls and not "look for them". I mean foobar is not one of them but could call one of them, right?

ggeorgakoudis added inline comments.Nov 6 2020, 12:46 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
613

Yup, you're absolutely right. I was short sighted. We either have to check that there is no callpath from an in-between instruction to one of those functions and abort merging the affected regions, or entirely bail out from merging when there is any one of them declared, as we were doing before. I probably need to include omp_proc_bind and look for others too.

Also it got me thinking. What if there's a kpmc_fork_call inside a function call in the sequential region? Can a fork call be called within a master sequential region? Need to find out more.

ggeorgakoudis retitled this revision from [OpenMPOpt] Expand parallel region merging to [OpenMPOpt][WIP] Expand parallel region merging.Nov 6 2020, 1:04 PM
jdoerfert added inline comments.Nov 9 2020, 8:10 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
613

You cannot have "any" OpenMP runtime calls in the guarded parts. One easy way is to disallow any call. Next step is to add an Attributor abstract attribute (as part of OpenMPOpt) which will deduce if a function might call an OpenMP runtime function, e.g., if it never calls an external function we don't know it cannot call an OpenMP runtime function. You will also be able to use the omp assume attribtues that will be introduced shortly.

Simplify check for unmergable regions, add flag for extractor sinking in OMPIRBuilder outlining.

ggeorgakoudis marked 2 inline comments as done.Nov 11 2020, 9:58 AM

Update to correctly handle inputs to sequential regions.

Apologies for the delayed review.

Update the commit title and message.

I think we are basically set here. I left some minor comments, all but the Input alloca question and the deleted file are not worth a new round of review, I just want confirmation on those two points.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
188

If this flag is not set there should not be any instructions, right? If so, we should add an assertion for that.

Style:
I would not use the iterators as you never actually reach the end() anyway. while (true) { I = BB.front(); or while(BB.size() > 1) {... or something.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
638

Add a comment describing what this is doing on a high-level.

670

Conversion to pointer happens in the IRBuilder (after D92189 landed). This means we can omit the input alloca stuff below, right?

795

Range loop :)

801

We usually recommend to compute the end iterator only once.

902

Add some comment explaining what this is doing on a high-level.

922

A lot of heuristics could be used until we have an AAOpenMPRuntimeCalls attribute that determines if a call can result in an OpenMP runtime call.
Lifetime intrinisics is fine, any intrinsic should be OK actually.

932

Reverse the order ("early exits") to reduce indention:

`
if terminator
  ...
if !call
  ...
call handling.
934–937

If BB->end() stays the same capture it; It = .., End = ...; It != End

999

Can we make a helper for this, pass he OMPRTL kind and it does the 3 steps.

llvm/test/Transforms/OpenMP/parallel_region_merging_legacy_pm.ll
412

Why was this file deleted?

ggeorgakoudis marked 11 inline comments as done.

Updated for comments.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
670

That's correct. Removed.

922

Makes sense. Updated to accept any intrinsic call as mergable.

llvm/test/Transforms/OpenMP/parallel_region_merging_legacy_pm.ll
412

There were different test files for the legacy and the new PMs because of them processing changes in the CG in opposite order, so function names for outlined functions wouldn't match. This has been resolved in D90566,

jdoerfert accepted this revision.Jan 6 2021, 10:11 AM

I Left some more comments that can be addressed before the commit. Otherwise LGTM.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
688

If you just want the users of instructions in SeqStartBB outside of that BB we don't need the extractor logic which probably does a lot more than we need. If you agree, maybe something like this:

for (Instruction &I : *SeqStartBB) {
  SmallPtrSet<Instruction *, 4> OutsideUsers;
  for (User &Usr : I.users()) {
    Instruction &UsrI = *cast<Instruction>(Usr);
    if (UsrI.getParent() != SeqStartBB)
      OutsideUsers.insert(&UsrI);
  }
  if (OutsideUsers.empty())
    continue;
  // Do the stuff as below.
}
916

Early exit:

if (!isa<Call>)
  return true;
919

if (!CalledFunction) return false;

925

const auto &RFI :

957

Nit: pre-increment everywhere above: ++It

This revision is now accepted and ready to land.Jan 6 2021, 10:11 AM
This revision was landed with ongoing or failed builds.Jan 11 2021, 8:06 AM
This revision was automatically updated to reflect the committed changes.
ggeorgakoudis marked 4 inline comments as done.