The existing implementation of parallel region merging applies only to
consecutive parallel regions that have speculatable sequential
instructions in-between. This patch lifts this limitation to expand
merging with any sequential instructions in-between, except calls to
unmergable OpenMP runtime functions. In-between sequential instructions
in the merged region are sequentialized in a "master" region and any
output values are broadcasted to the following parallel regions and the
sequential region continuation of the merged region.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
613 | Don't we have to proof the absence of these calls and not "look for them". I mean foobar is not one of them but could call one of them, right? |
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
613 | Yup, you're absolutely right. I was short sighted. We either have to check that there is no callpath from an in-between instruction to one of those functions and abort merging the affected regions, or entirely bail out from merging when there is any one of them declared, as we were doing before. I probably need to include omp_proc_bind and look for others too. Also it got me thinking. What if there's a kpmc_fork_call inside a function call in the sequential region? Can a fork call be called within a master sequential region? Need to find out more. |
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
613 | You cannot have "any" OpenMP runtime calls in the guarded parts. One easy way is to disallow any call. Next step is to add an Attributor abstract attribute (as part of OpenMPOpt) which will deduce if a function might call an OpenMP runtime function, e.g., if it never calls an external function we don't know it cannot call an OpenMP runtime function. You will also be able to use the omp assume attribtues that will be introduced shortly. |
Simplify check for unmergable regions, add flag for extractor sinking in OMPIRBuilder outlining.
Apologies for the delayed review.
Update the commit title and message.
I think we are basically set here. I left some minor comments, all but the Input alloca question and the deleted file are not worth a new round of review, I just want confirmation on those two points.
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | ||
---|---|---|
188 | If this flag is not set there should not be any instructions, right? If so, we should add an assertion for that. Style: | |
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
641 | Add a comment describing what this is doing on a high-level. | |
673 | Conversion to pointer happens in the IRBuilder (after D92189 landed). This means we can omit the input alloca stuff below, right? | |
796 | Range loop :) | |
802 | We usually recommend to compute the end iterator only once. | |
903 | Add some comment explaining what this is doing on a high-level. | |
923 | A lot of heuristics could be used until we have an AAOpenMPRuntimeCalls attribute that determines if a call can result in an OpenMP runtime call. | |
925–926 | If BB->end() stays the same capture it; It = .., End = ...; It != End | |
933 | Reverse the order ("early exits") to reduce indention: ` if terminator ... if !call ... call handling. | |
991 | Can we make a helper for this, pass he OMPRTL kind and it does the 3 steps. | |
llvm/test/Transforms/OpenMP/parallel_region_merging_legacy_pm.ll | ||
412 | Why was this file deleted? |
Updated for comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
673 | That's correct. Removed. | |
923 | Makes sense. Updated to accept any intrinsic call as mergable. | |
llvm/test/Transforms/OpenMP/parallel_region_merging_legacy_pm.ll | ||
412 | There were different test files for the legacy and the new PMs because of them processing changes in the CG in opposite order, so function names for outlined functions wouldn't match. This has been resolved in D90566, |
I Left some more comments that can be addressed before the commit. Otherwise LGTM.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
691 | If you just want the users of instructions in SeqStartBB outside of that BB we don't need the extractor logic which probably does a lot more than we need. If you agree, maybe something like this: for (Instruction &I : *SeqStartBB) { SmallPtrSet<Instruction *, 4> OutsideUsers; for (User &Usr : I.users()) { Instruction &UsrI = *cast<Instruction>(Usr); if (UsrI.getParent() != SeqStartBB) OutsideUsers.insert(&UsrI); } if (OutsideUsers.empty()) continue; // Do the stuff as below. } | |
917 | Early exit: if (!isa<Call>) return true; | |
920 | if (!CalledFunction) return false; | |
926 | const auto &RFI : | |
949 | Nit: pre-increment everywhere above: ++It |
If this flag is not set there should not be any instructions, right? If so, we should add an assertion for that.
Style:
I would not use the iterators as you never actually reach the end() anyway. while (true) { I = BB.front(); or while(BB.size() > 1) {... or something.