This is an archive of the discontinued LLVM Phabricator instance.

Differential D83635

[OpenMPOpt] Merge parallel regions
ClosedPublic

Authored by ggeorgakoudis on Jul 11 2020, 5:58 PM.


Details

Reviewers

jdoerfert
sstefan1
baziotis

Commits

rG3a6bfcf2f902: [OpenMPOpt] Merge parallel regions

Summary

There are cases where generated OpenMP code consists of multiple,
consecutive OpenMP parallel regions, either due to high-level
programming models, such as RAJA, Kokkos, lowering to OpenMP code, or
simply because the programmer parallelized code this way. This
optimization merges consecutive parallel OpenMP regions to: (1) reduce
the runtime overhead of re-activating a team of threads; (2) enlarge the
scope for other OpenMP optimizations, e.g., runtime call deduplication
and synchronization elimination.
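
As a source-level illustration (a minimal sketch, not taken from the patch), two consecutive parallel regions and a hand-merged equivalent might look as follows. The function names `fill`, `scale`, `unmerged` and `merged` and the array contents are hypothetical. The explicit `#pragma omp barrier` stands in for the implicit barrier that ended the first region before merging. The code also compiles without OpenMP support, in which case the pragmas are ignored and everything runs serially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical region bodies. Each uses an orphaned worksharing loop, so the
// iterations are divided among whatever team of threads is currently active.
static void fill(std::vector<int> &a) {
  const int n = static_cast<int>(a.size());
#pragma omp for
  for (int i = 0; i < n; ++i)
    a[i] = i;
}

static void scale(const std::vector<int> &a, std::vector<int> &b) {
  const int n = static_cast<int>(a.size());
#pragma omp for
  for (int i = 0; i < n; ++i)
    b[i] = 2 * a[n - 1 - i];
}

// Before: two consecutive parallel regions, i.e. two fork/join cycles.
std::vector<int> unmerged(int n) {
  std::vector<int> a(n), b(n);
#pragma omp parallel
  { fill(a); }
#pragma omp parallel
  { scale(a, b); }
  return b;
}

// After: one parallel region. The explicit barrier keeps the ordering that
// the join of the first region used to guarantee.
std::vector<int> merged(int n) {
  std::vector<int> a(n), b(n);
#pragma omp parallel
  {
    fill(a);
#pragma omp barrier
    scale(a, b);
  }
  return b;
}
```

Both variants are expected to compute the same result; the merged form activates the thread team only once.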

This implementation defensively merges parallel regions, only when they
are within the same BB and any in-between instructions are safe to
execute in parallel.
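
The defensive selection described above can be sketched on a toy model. This is an illustration under stated assumptions, not the pass's code: `Kind` and `findMergeableRuns` are hypothetical names, a basic block is reduced to a list of instruction kinds, and "safe to execute in parallel" is assumed to be decided elsewhere. The sketch also assumes that a run is restarted after an unsafe instruction and that a run still open at the end of the block is considered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction kinds: a fork call (parallel region), an instruction that
// is safe to execute redundantly in parallel, and one that is not.
enum class Kind { ForkCall, Safe, Unsafe };

// Returns the index groups of fork calls that could be merged: runs of at
// least two fork calls within one block, separated only by safe instructions.
std::vector<std::vector<std::size_t>>
findMergeableRuns(const std::vector<Kind> &block) {
  std::vector<std::vector<std::size_t>> runs;
  std::vector<std::size_t> current;

  // Keep the current run only if it contains more than one region.
  auto flush = [&] {
    if (current.size() > 1)
      runs.push_back(current);
    current.clear();
  };

  for (std::size_t i = 0; i < block.size(); ++i) {
    switch (block[i]) {
    case Kind::ForkCall:
      current.push_back(i);
      break;
    case Kind::Safe:
      break; // Does not interrupt a run.
    case Kind::Unsafe:
      flush(); // An unsafe instruction ends the run.
      break;
    }
  }
  flush();
  return runs;
}
```

For example, a block with fork calls at positions 0, 2, 4, 5, 7 and 9 and unsafe instructions at positions 3 and 8 yields the groups {0, 2} and {4, 5, 7}; the lone call at position 9 is left alone.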

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ggeorgakoudis created this revision.Jul 11 2020, 5:58 PM

Herald added a reviewer: jdoerfert.Jul 11 2020, 5:58 PM

Herald added a reviewer: sstefan1.

Herald added a reviewer: baziotis.

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, bbn, aaron.ballman and 5 others.

Note that I also update the regression test Transforms/OpenMP/parallel_deletion.ll, since the merging optimization, which applies after deletion, changes the output.

Harbormaster completed remote builds in B63881: Diff 277269.Jul 11 2020, 6:35 PM

It's great that you're working on this. It's very important that we allow people to write code, structured and decomposed in a way that makes sense from an engineering and maintenance perspective, and have the compiler combine things later to avoid unnecessary overhead. This is just as much true for expressions of parallelism as it is for other aspects of the code.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
757	Comments in LLVM should be complete sentences and end with appropriate punctuation (here and a few other places).
764	Unneeded {} (here and a few other places).
llvm/test/Transforms/OpenMP/parallel_region_merging.ll
14	Do these need to say 'nowait' in order to actually match the code?

Update for comments

In D83635#2146375, @hfinkel wrote:

It's great that you're working on this. It's very important that we allow people to write code, structured and decomposed in a way that makes sense from an engineering and maintenance perspective, and have the compiler combine things later to avoid unnecessary overhead. This is just as much true for expressions of parallelism as it is for other aspects of the code.

Thank you, Hal! I fully agree with this motivation.

ggeorgakoudis marked 4 inline comments as done.Jul 14 2020, 12:36 PM

ggeorgakoudis added inline comments.

llvm/test/Transforms/OpenMP/parallel_region_merging.ll
14	No, the code matches the IR generation ('nowait' is available only for worksharing constructs). The merging transformation emits an explicit barrier in place of the implicit barrier after merging. However, I added a TODO because it could make sense to avoid explicit barrier generation, if the 'nowait' clause is present in an enclosing worksharing construct, and the merged parallel regions are independent.

hfinkel added inline comments.Jul 14 2020, 12:41 PM

llvm/test/Transforms/OpenMP/parallel_region_merging.ll
14	Oh, right. Indeed. Thanks. We should do barrier elimination anyway, so that's certainly a good point that we should make sure that the barrier elimination can apply to these kinds of barriers (or we can not emit them in the first place, although maybe it's easier to do after we have anything together where we can just query the MemorySSA use/def chain or similar to look at underlying dependence information).

Harbormaster completed remote builds in B64207: Diff 277933.Jul 14 2020, 12:41 PM

I'll have to look at the logic later today. Some initial comments below.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
2156	Style: No else after return. Please commit this separately, LGTM for this change.
llvm/test/Transforms/OpenMP/parallel_deletion.ll
416 ↗	(On Diff #277933)	Can you run the update script on the test and commit the change as an NFC right away. Weird that the test claims it is generated but no check lines are here.
llvm/test/Transforms/OpenMP/parallel_region_merging.ll
14	FWIW, while my proposal for `parallel nowait` was briefly discussed, I did not spend enough time on it to get it into OpenMP 5.1. The questions how to synchronize with the threads afterwards and what guarantees you have are interesting.
246	Remove all of the attributes if possible, at least all the string attributes. It is also weird that we have `optnone` here and actually do something, I guess OpenMP opt doesn't properly call skipSCC/skipFunction? Remove it here (to make it a problem for tomorrow).

Update for comments

Push preserved analyses fix in another commit
Update parallel_deletion.ll in a NFC commit
Update regression tests

ggeorgakoudis marked 3 inline comments as done.Jul 15 2020, 6:23 PM

Update diff for linting warning?

Harbormaster failed remote builds in B64446: Diff 278346!Jul 15 2020, 6:58 PM

Harbormaster failed remote builds in B64447: Diff 278347!Jul 15 2020, 7:04 PM

Sorry for my slow review. I added a bunch of comments, mostly minor things and requests for some extra tests.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
611	Nit: Initialize them with `nullptr` and place an assert at the use site, makes it easier to trace a bug if we ever mess up.
644	We should also emit a remark here. And one pointing at each of the merged parallel regions.
651	Nit: For some reason it is "common" to use `abc.dce.xyz` as naming scheme for blocks (I think). Maybe we should go with something similar to the OMPIRBuilder, i.a., `omp.par.expanded`. Just a suggestion.
660	I think we need some tests for these at some point. One with proc-binds and one with combinations of cancellable and non-cancellable regions. I think it "should just work" but we need to make sure the resulting expanded parallel region does what we would expect. I would assume cancellable is actually not a problem at all, but a proc-bind specified for the first of two merged regions might be. I expect we merge and both now have the proc-bind for the entire region. This is not ideal. TBH, the real problem is that `__kmpc_push_proc_bind` exists and the value is not passed to `__kmpc_fork_call`. On top of that, we only generate the `__kmpc_push_proc_bind` if we need it. That leaves the (potential) situation in which the call is present but we don't see it. So we need to either use the ICV tracking facility here or change the way we generate code. I guess always emitting `__kmpc_push_proc_bind` is the easiest way to handle this. Then we can look for the call and verify the binding is the same. WDYT?
710	Do we need to finalize the builder every time here (or at all)?
734	typo
742	Do we want to print this for each use or once at the end?
llvm/test/Transforms/OpenMP/parallel_region_merging.ll
319	The update test script cannot add check lines for new functions yet. Can you manually add some minimal check lines to ensure the new function we create looks as expected, e.g., has regular calls to the former parallel regions?

sstefan1 added inline comments.Jul 18 2020, 12:41 PM

llvm/test/Transforms/OpenMP/parallel_region_merging.ll
319	Maybe I can take a look at that since I'm now somewhat familiar with the script.

jdoerfert added a subscriber: greened.Jul 18 2020, 5:17 PM

jdoerfert added inline comments.

llvm/test/Transforms/OpenMP/parallel_region_merging.ll
319	there is a patch for the _cc_ script already: D83004. I suggested there to make it generic (common.py) for _cc_ and the opt script, maybe we can do it in 2 steps instead. < @greened

jdoerfert mentioned this in D83004: [UpdateCCTestChecks] Include generated functions if asked.Jul 22 2020, 5:27 PM

Update for comments

ggeorgakoudis marked 12 inline comments as done.Jul 27 2020, 12:15 PM

ggeorgakoudis added inline comments.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
517	I have intentionally added re-running deduplication before merging. It helps removing emitted calls to `__kmpc_global_thread_num`, and that enables merging of parallel regions by removing redundant in-between instructions (at least at this point that merging is very defensive)
611	Asserts on uses within BodyGenCB
644	We emit a remark in line 708 that identifies merged parallel regions. What do you have in mind? Is it a single remark that contains identifiers for all merged regions?
660	This is tricky. What if merged parallel regions have different `proc_bind` clauses? Is it possible to call `__kmpc_push_proc_bind` within a parallel region, during parallel execution?
710	This finalize call is indeed unnecessary. The only call to finalize needed is when outlining the new, merged parallel region.
742	I have moved the debug prints into the loop over the BB2PRMap data structure and made them more informative (adding how many mergeable parallel regions have been found).
llvm/test/Transforms/OpenMP/parallel_region_merging.ll
319	I see the patch is accepted but not yet committed. Waiting for it to be upstreamed.

ggeorgakoudis marked 3 inline comments as done.Jul 27 2020, 1:00 PM

ggeorgakoudis marked 3 inline comments as not done.

ggeorgakoudis added inline comments.Jul 27 2020, 1:31 PM

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
660	I had a quick look in the OpenMP runtime implementation to answer my question. The `proc_bind` setting is handled within the implementation of the fork call at team creation, so calling it within an executing parallel region should have no effect on it. Does it make the most sense to just emit a call to `__kmpc_push_proc_bind`, falling back to default binding? That will invalidate `proc_bind` settings on merged parallel regions, is that within OpenMP specification?
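
To make the concern in this thread concrete, here is a minimal source-level sketch (not from the patch; `bindingExample` and the counters are hypothetical) of two adjacent parallel regions that request different thread affinity policies. A single merged region could carry only one binding, so one of the two requests would be lost. The code also compiles without OpenMP support, in which case the pragmas are ignored.

```cpp
#include <cassert>

// Two consecutive parallel regions with different proc_bind clauses.
// Returns 1 if both regions executed at least once.
int bindingExample() {
  int first = 0, second = 0;

  // Region 1 asks for threads to be spread across places.
#pragma omp parallel proc_bind(spread)
  {
#pragma omp atomic
    ++first;
  }

  // Region 2 asks for threads to stay close to the primary thread.
#pragma omp parallel proc_bind(close)
  {
#pragma omp atomic
    ++second;
  }

  return (first > 0 && second > 0) ? 1 : 0;
}
```

The observable result is the same under either binding; only thread placement differs, which is why the conflict is easy to miss in tests.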

Harbormaster failed remote builds in B65891: Diff 281003!Jul 27 2020, 2:11 PM

jdoerfert added inline comments.Jul 28 2020, 10:39 PM

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
517	looks good.
644	I guess one per parallel region with the first one saying "merged the following parallel region into this one", and the others saying "merged into the parallel region above". The one we have below just says "we merged something" (IIRC).

ggeorgakoudis marked 3 inline comments as done.Sep 18 2020, 3:55 PM

Updates per comments

Add option to enable region merging, disabled by default
Defensively abort merging if there are proc_bind affinities
Provide more informative optimization remarks
Update tests for cancellable regions, different files per pass manager due to testing incompatibility

ggeorgakoudis marked 3 inline comments as done.Sep 28 2020, 10:01 AM

ggeorgakoudis added inline comments.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
42	Option to enable parallel region merging
606	Check for explicit proc_bind affinity
653	Updated Remark
744	Updated Remark

Harbormaster completed remote builds in B73187: Diff 294736.Sep 28 2020, 10:01 AM

Fix for .ll files appearing as binary

ggeorgakoudis marked 3 inline comments as done.Sep 28 2020, 10:26 AM

ggeorgakoudis added inline comments.

llvm/test/Transforms/OpenMP/parallel_region_merging.ll
319	Use the new version of the script that includes generated functions

Harbormaster completed remote builds in B73195: Diff 294748.Sep 28 2020, 10:36 AM

Fix clang-tidy complaints

Harbormaster completed remote builds in B73254: Diff 294843.Sep 28 2020, 6:19 PM

A few minor nits, otherwise LGTM.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
524	run this conditional on the result of merge: if (merge...()) { deduplicate.. Changed = true; }
692	I think both todos are handled at this point.
728	No need for this. Any pass that might use this will make the connection on its own and treating the two arguments special is weird.
732	Can you use the variables instead of 1 here.
823	I think we have to collect the uses of the barrier as well.

This revision is now accepted and ready to land.Oct 5 2020, 9:16 PM

Update for comments

Update for comments (for real)

ggeorgakoudis marked 5 inline comments as done.Oct 6 2020, 8:21 PM

Harbormaster completed remote builds in B74216: Diff 296585.Oct 6 2020, 9:02 PM

Harbormaster completed remote builds in B74217: Diff 296586.

Update commit message

Harbormaster completed remote builds in B74225: Diff 296600.Oct 6 2020, 10:25 PM

Amend commit message

Harbormaster completed remote builds in B74533: Diff 297129.Oct 9 2020, 1:27 AM

Closed by commit rG3a6bfcf2f902: [OpenMPOpt] Merge parallel regions (authored by ggeorgakoudis).Oct 9 2020, 9:59 AM

This revision was automatically updated to reflect the committed changes.

ggeorgakoudis added a commit: rG3a6bfcf2f902: [OpenMPOpt] Merge parallel regions.

Revision Contents

Path	Size
llvm/lib/Transforms/IPO/OpenMPOpt.cpp	256 lines
llvm/test/Transforms/OpenMP/parallel_region_merging.ll	412 lines
llvm/test/Transforms/OpenMP/parallel_region_merging_legacy_pm.ll	412 lines

Diff 297270

llvm/lib/Transforms/IPO/OpenMPOpt.cpp

[... 13 lines not shown ...]

#include "llvm/Transforms/IPO/OpenMPOpt.h"		#include "llvm/Transforms/IPO/OpenMPOpt.h"

#include "llvm/ADT/EnumeratedArray.h"		#include "llvm/ADT/EnumeratedArray.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/CallGraph.h"		#include "llvm/Analysis/CallGraph.h"
#include "llvm/Analysis/CallGraphSCCPass.h"		#include "llvm/Analysis/CallGraphSCCPass.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Frontend/OpenMP/OMPConstants.h"		#include "llvm/Frontend/OpenMP/OMPConstants.h"
#include "llvm/Frontend/OpenMP/OMPIRBuilder.h"		#include "llvm/Frontend/OpenMP/OMPIRBuilder.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/Attributor.h"		#include "llvm/Transforms/IPO/Attributor.h"
		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/CallGraphUpdater.h"		#include "llvm/Transforms/Utils/CallGraphUpdater.h"
#include "llvm/Analysis/ValueTracking.h"

using namespace llvm;		using namespace llvm;
using namespace omp;		using namespace omp;

#define DEBUG_TYPE "openmp-opt"		#define DEBUG_TYPE "openmp-opt"

static cl::opt<bool> DisableOpenMPOptimizations(		static cl::opt<bool> DisableOpenMPOptimizations(
"openmp-opt-disable", cl::ZeroOrMore,		"openmp-opt-disable", cl::ZeroOrMore,
cl::desc("Disable OpenMP specific optimizations."), cl::Hidden,		cl::desc("Disable OpenMP specific optimizations."), cl::Hidden,
cl::init(false));		cl::init(false));

		static cl::opt<bool> EnableParallelRegionMerging(
		"openmp-opt-enable-merging", cl::ZeroOrMore,
		cl::desc("Enable the OpenMP region merging optimization."), cl::Hidden,
		cl::init(false));

static cl::opt<bool> PrintICVValues("openmp-print-icv-values", cl::init(false),		static cl::opt<bool> PrintICVValues("openmp-print-icv-values", cl::init(false),
cl::Hidden);		cl::Hidden);
static cl::opt<bool> PrintOpenMPKernels("openmp-print-gpu-kernels",		static cl::opt<bool> PrintOpenMPKernels("openmp-print-gpu-kernels",
cl::init(false), cl::Hidden);		cl::init(false), cl::Hidden);

static cl::opt<bool> HideMemoryTransferLatency(		static cl::opt<bool> HideMemoryTransferLatency(
"openmp-hide-memory-transfer-latency",		"openmp-hide-memory-transfer-latency",
cl::desc("[WIP] Tries to hide the latency of host to device memory"		cl::desc("[WIP] Tries to hide the latency of host to device memory"
[... 9 lines not shown ...]	STATISTIC(NumOpenMPRuntimeFunctionsIdentified,
"Number of OpenMP runtime functions identified");		"Number of OpenMP runtime functions identified");
STATISTIC(NumOpenMPRuntimeFunctionUsesIdentified,		STATISTIC(NumOpenMPRuntimeFunctionUsesIdentified,
"Number of OpenMP runtime function uses identified");		"Number of OpenMP runtime function uses identified");
STATISTIC(NumOpenMPTargetRegionKernels,		STATISTIC(NumOpenMPTargetRegionKernels,
"Number of OpenMP target region entry points (=kernels) identified");		"Number of OpenMP target region entry points (=kernels) identified");
STATISTIC(		STATISTIC(
NumOpenMPParallelRegionsReplacedInGPUStateMachine,		NumOpenMPParallelRegionsReplacedInGPUStateMachine,
"Number of OpenMP parallel regions replaced with ID in GPU state machines");		"Number of OpenMP parallel regions replaced with ID in GPU state machines");
		STATISTIC(NumOpenMPParallelRegionsMerged,
		"Number of OpenMP parallel regions merged");

#if !defined(NDEBUG)		#if !defined(NDEBUG)
static constexpr auto TAG = "[" DEBUG_TYPE "]";		static constexpr auto TAG = "[" DEBUG_TYPE "]";
#endif		#endif

namespace {		namespace {

struct AAICVTracker;		struct AAICVTracker;
[... 426 lines not shown ...]	bool run() {

Changed |= rewriteDeviceCodeStateMachine();		Changed |= rewriteDeviceCodeStateMachine();

Changed |= runAttributor();		Changed |= runAttributor();

// Recollect uses, in case Attributor deleted any.		// Recollect uses, in case Attributor deleted any.
OMPInfoCache.recollectUses();		OMPInfoCache.recollectUses();

Changed |= deduplicateRuntimeCalls();
Changed |= deleteParallelRegions();		Changed |= deleteParallelRegions();
if (HideMemoryTransferLatency)		if (HideMemoryTransferLatency)
Changed |= hideMemTransfersLatency();		Changed |= hideMemTransfersLatency();
if (remarksEnabled())		if (remarksEnabled())
analysisGlobalization();		analysisGlobalization();
		Changed |= deduplicateRuntimeCalls();
		if (EnableParallelRegionMerging) {
		if (mergeParallelRegions()) {
		deduplicateRuntimeCalls();
		Changed = true;
		}
		}

return Changed;		return Changed;
}		}

/// Print initial ICV values for testing.		/// Print initial ICV values for testing.
/// FIXME: This should be done from the Attributor once it is added.		/// FIXME: This should be done from the Attributor once it is added.
void printICVs() const {		void printICVs() const {
InternalControlVar ICVs[] = {ICV_nthreads, ICV_active_levels, ICV_cancel,		InternalControlVar ICVs[] = {ICV_nthreads, ICV_active_levels, ICV_cancel,
[... 48 lines not shown ...]	static CallInst *getCallIfRegularCall(
CallInst *CI = dyn_cast<CallInst>(&V);		CallInst *CI = dyn_cast<CallInst>(&V);
if (CI && !CI->hasOperandBundles() &&		if (CI && !CI->hasOperandBundles() &&
(!RFI || CI->getCalledFunction() == RFI->Declaration))		(!RFI || CI->getCalledFunction() == RFI->Declaration))
return CI;		return CI;
return nullptr;		return nullptr;
}		}

private:		private:
		/// Merge parallel regions when it is safe.
		bool mergeParallelRegions() {
		const unsigned CallbackCalleeOperand = 2;
		const unsigned CallbackFirstArgOperand = 3;
		using InsertPointTy = OpenMPIRBuilder::InsertPointTy;

		// Check if there are any __kmpc_fork_call calls to merge.
		OMPInformationCache::RuntimeFunctionInfo &RFI =
		OMPInfoCache.RFIs[OMPRTL___kmpc_fork_call];

		if (!RFI.Declaration)
		return false;

		// Check if there are any __kmpc_push_proc_bind calls for explicit affinities.
		OMPInformationCache::RuntimeFunctionInfo &ProcBindRFI =
		OMPInfoCache.RFIs[OMPRTL___kmpc_push_proc_bind];

		// Defensively abort if explicit affinities are set.
		// TODO: Track ICV proc_bind to merge when mergable regions have the same
		// affinity.
		if (ProcBindRFI.Declaration)
		return false;

		bool Changed = false;
		LoopInfo *LI = nullptr;
		DominatorTree *DT = nullptr;

		SmallDenseMap<BasicBlock *, SmallPtrSet<Instruction *, 4>> BB2PRMap;

		BasicBlock *StartBB = nullptr, *EndBB = nullptr;
		auto BodyGenCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP,
		BasicBlock &ContinuationIP) {
		BasicBlock *CGStartBB = CodeGenIP.getBlock();
		BasicBlock *CGEndBB =
		SplitBlock(CGStartBB, &*CodeGenIP.getPoint(), DT, LI);
		assert(StartBB != nullptr && "StartBB should not be null");
		CGStartBB->getTerminator()->setSuccessor(0, StartBB);
		assert(EndBB != nullptr && "EndBB should not be null");
		EndBB->getTerminator()->setSuccessor(0, CGEndBB);
		};

		auto PrivCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP,
		Value &VPtr, Value *&ReplacementValue) -> InsertPointTy {
		ReplacementValue = &VPtr;
		return CodeGenIP;
		};

		auto FiniCB = [&](InsertPointTy CodeGenIP) {};

		// Helper to merge the __kmpc_fork_call calls in MergableCIs. They are all
		// contained in BB and only separated by instructions that can be
		// redundantly executed in parallel. The block BB is split before the first
		// call (in MergableCIs) and after the last so the entire region we merge
		// into a single parallel region is contained in a single basic block
		// without any other instructions. We use the OpenMPIRBuilder to outline
		// that block and call the resulting function via __kmpc_fork_call.
		auto Merge = [&](SmallVectorImpl<CallInst *> &MergableCIs, BasicBlock *BB) {
		// TODO: Change the interface to allow single CIs expanded, e.g, to
		// include an outer loop.
		assert(MergableCIs.size() > 1 && "Assumed multiple mergable CIs");

		auto Remark = [&](OptimizationRemark OR) {
		OR << "Parallel region at "
		<< ore::NV("OpenMPParallelMergeFront",
		MergableCIs.front()->getDebugLoc())
		<< " merged with parallel regions at ";
		for (auto *CI :
		llvm::make_range(MergableCIs.begin() + 1, MergableCIs.end())) {
		OR << ore::NV("OpenMPParallelMerge", CI->getDebugLoc());
		if (CI != MergableCIs.back())
		OR << ", ";
		}
		return OR;
		};

		emitRemark<OptimizationRemark>(MergableCIs.front(),
		"OpenMPParallelRegionMerging", Remark);

		Function *OriginalFn = BB->getParent();
		LLVM_DEBUG(dbgs() << TAG << "Merge " << MergableCIs.size()
		<< " parallel regions in " << OriginalFn->getName()
		<< "\n");

		// Isolate the calls to merge in a separate block.
		EndBB = SplitBlock(BB, MergableCIs.back()->getNextNode(), DT, LI);
		BasicBlock *AfterBB =
		SplitBlock(EndBB, &*EndBB->getFirstInsertionPt(), DT, LI);
		StartBB = SplitBlock(BB, MergableCIs.front(), DT, LI, nullptr,
		"omp.par.merged");

		assert(BB->getUniqueSuccessor() == StartBB && "Expected a different CFG");
		const DebugLoc DL = BB->getTerminator()->getDebugLoc();
		BB->getTerminator()->eraseFromParent();

		OpenMPIRBuilder::LocationDescription Loc(InsertPointTy(BB, BB->end()),
		DL);
		IRBuilder<>::InsertPoint AllocaIP(
		&OriginalFn->getEntryBlock(),
		OriginalFn->getEntryBlock().getFirstInsertionPt());
		// Create the merged parallel region with default proc binding, to
		// avoid overriding binding settings, and without explicit cancellation.
		InsertPointTy AfterIP = OMPInfoCache.OMPBuilder.CreateParallel(
		Loc, AllocaIP, BodyGenCB, PrivCB, FiniCB, nullptr, nullptr,
		OMP_PROC_BIND_default, /* IsCancellable */ false);
		BranchInst::Create(AfterBB, AfterIP.getBlock());

		// Perform the actual outlining.
		OMPInfoCache.OMPBuilder.finalize();

		Function *OutlinedFn = MergableCIs.front()->getCaller();

		// Replace the __kmpc_fork_call calls with direct calls to the outlined
		// callbacks.
		SmallVector<Value *, 8> Args;
		for (auto *CI : MergableCIs) {
		Value *Callee =
		CI->getArgOperand(CallbackCalleeOperand)->stripPointerCasts();
		FunctionType *FT =
		cast<FunctionType>(Callee->getType()->getPointerElementType());
		Args.clear();
		Args.push_back(OutlinedFn->getArg(0));
		Args.push_back(OutlinedFn->getArg(1));
		for (unsigned U = CallbackFirstArgOperand, E = CI->getNumArgOperands();
		U < E; ++U)
		Args.push_back(CI->getArgOperand(U));

		CallInst *NewCI = CallInst::Create(FT, Callee, Args, "", CI);
		if (CI->getDebugLoc())
		NewCI->setDebugLoc(CI->getDebugLoc());

		// Forward parameter attributes from the callback to the callee.
		for (unsigned U = CallbackFirstArgOperand, E = CI->getNumArgOperands();
		U < E; ++U)
		for (const Attribute &A : CI->getAttributes().getParamAttributes(U))
		NewCI->addParamAttr(
		U - (CallbackFirstArgOperand - CallbackCalleeOperand), A);

		// Emit an explicit barrier to replace the implicit fork-join barrier.
		if (CI != MergableCIs.back()) {
		// TODO: Remove barrier if the merged parallel region includes the
		// 'nowait' clause.
		OMPInfoCache.OMPBuilder.CreateBarrier(
		InsertPointTy(NewCI->getParent(),
		NewCI->getNextNode()->getIterator()),
		OMPD_parallel);
		}

		auto Remark = [&](OptimizationRemark OR) {
		return OR << "Parallel region at "
		<< ore::NV("OpenMPParallelMerge", CI->getDebugLoc())
		<< " merged with "
		<< ore::NV("OpenMPParallelMergeFront",
		MergableCIs.front()->getDebugLoc());
		};
		if (CI != MergableCIs.front())
		emitRemark<OptimizationRemark>(CI, "OpenMPParallelRegionMerging",
		Remark);

		CI->eraseFromParent();
		}

		assert(OutlinedFn != OriginalFn && "Outlining failed");
		CGUpdater.registerOutlinedFunction(*OutlinedFn);
		CGUpdater.reanalyzeFunction(*OriginalFn);

		NumOpenMPParallelRegionsMerged += MergableCIs.size();
		hfinkel: Comments in LLVM should be complete sentences and end with appropriate punctuation (here and a few other places).

		return true;
		};

		// Helper function that identifies sequences of
		// __kmpc_fork_call uses in a basic block.
		auto DetectPRsCB = [&](Use &U, Function &F) {
		hfinkel: Unneeded {} (here and a few other places).
		CallInst *CI = getCallIfRegularCall(U, &RFI);
		BB2PRMap[CI->getParent()].insert(CI);

		return false;
		};

		BB2PRMap.clear();
		RFI.foreachUse(SCC, DetectPRsCB);
		SmallVector<SmallVector<CallInst *, 4>, 4> MergableCIsVector;
		// Find mergable parallel regions within a basic block that are
		// safe to merge, that is any in-between instructions can safely
		// execute in parallel after merging.
		// TODO: support merging across basic-blocks.
		for (auto &It : BB2PRMap) {
		auto &CIs = It.getSecond();
		if (CIs.size() < 2)
		continue;

		BasicBlock *BB = It.getFirst();
		SmallVector<CallInst *, 4> MergableCIs;

		// Find maximal number of parallel region CIs that are safe to merge.
		for (Instruction &I : *BB) {
		if (CIs.count(&I)) {
		MergableCIs.push_back(cast<CallInst>(&I));
		continue;
		}

		if (isSafeToSpeculativelyExecute(&I, &I, DT))
		continue;

		if (MergableCIs.size() > 1) {
		MergableCIsVector.push_back(MergableCIs);
		LLVM_DEBUG(dbgs() << TAG << "Found " << MergableCIs.size()
		<< " parallel regions in block " << BB->getName()
		<< " of function " << BB->getParent()->getName()
		<< "\n";);
		}

		MergableCIs.clear();
		}

		if (!MergableCIsVector.empty()) {
		Changed = true;

		for (auto &MergableCIs : MergableCIsVector)
		Merge(MergableCIs, BB);
		}
		}

		if (Changed) {
		// Update RFI info to set it up for later passes.
		RFI.clearUsesMap();
		OMPInfoCache.collectUses(RFI, /* CollectStats */ false);

		// Collect uses for the emitted barrier call.
		OMPInformationCache::RuntimeFunctionInfo &BarrierRFI =
		OMPInfoCache.RFIs[OMPRTL___kmpc_barrier];
		BarrierRFI.clearUsesMap();
		jdoerfert: I think we have to collect the uses of the barrier as well.
		OMPInfoCache.collectUses(BarrierRFI, /* CollectStats */ false);
		}

		return Changed;
		}

		/// Try to delete parallel regions if possible.
		bool deleteParallelRegions() {
		const unsigned CallbackCalleeOperand = 2;

		OMPInformationCache::RuntimeFunctionInfo &RFI =
		OMPInfoCache.RFIs[OMPRTL___kmpc_fork_call];

		if (!RFI.Declaration)
		[... 1,310 lines not shown ...]
		PreservedAnalyses OpenMPOptPass::run(LazyCallGraph::SCC &C,

		OpenMPOpt OMPOpt(SCC, CGUpdater, OREGetter, InfoCache, A);
		bool Changed = OMPOpt.run();
		if (Changed)
		return PreservedAnalyses::none();

		return PreservedAnalyses::all();
		}

		jdoerfert: Style: No else after return. Please commit this separately, LGTM for this change.
		namespace {

		struct OpenMPOptLegacyPass : public CallGraphSCCPass {
		CallGraphUpdater CGUpdater;
		OpenMPInModule OMPInModule;
		static char ID;

		OpenMPOptLegacyPass() : CallGraphSCCPass(ID) {
		[... 134 lines not shown ...]
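
For readers skimming the diff, the per-basic-block scan in the merging code (the loop that fills `MergableCIsVector`) can be paraphrased without any LLVM types. The following is a minimal standalone sketch, not the code from the patch: `Kind` and `findMergeableRuns` are hypothetical names, and the three-way classification stands in for `CIs.count(&I)`, `isSafeToSpeculativelyExecute(...)`, and "everything else".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the instructions of one basic block.
//   ForkCall: a __kmpc_fork_call collected for this block
//   Safe:     an instruction considered safe to run inside the merged region
//   Unsafe:   anything else (for example a store, or the block terminator)
enum class Kind { ForkCall, Safe, Unsafe };

// Walk the block in order and collect maximal runs of fork calls that are
// separated only by safe instructions. Runs of length one are dropped,
// since there is nothing to merge them with.
std::vector<std::vector<std::size_t>>
findMergeableRuns(const std::vector<Kind> &Block) {
  std::vector<std::vector<std::size_t>> Runs;
  std::vector<std::size_t> Current;

  auto Flush = [&] {
    if (Current.size() > 1)
      Runs.push_back(Current);
    Current.clear();
  };

  for (std::size_t I = 0; I < Block.size(); ++I) {
    switch (Block[I]) {
    case Kind::ForkCall:
      Current.push_back(I); // Extend the current candidate run.
      break;
    case Kind::Safe:
      break; // Safe in-between instruction: keep the run open.
    case Kind::Unsafe:
      Flush(); // Unsafe instruction: close the run here.
      break;
    }
  }
  Flush(); // Close a run that reaches the end of the sequence.
  return Runs;
}
```

For a block shaped like `merge_some` (a fork call, an unsafe store, then two fork calls with only safe instructions between them), the sketch reports a single run holding the last two calls, which matches the behaviour the test comments describe.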

llvm/test/Transforms/OpenMP/parallel_region_merging.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs
				; RUN: opt -S -passes='attributor,cgscc(openmpopt)' -openmp-opt-enable-merging < %s | FileCheck %s

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

				%struct.ident_t = type { i32, i32, i32, i32, i8* }

				@0 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
				@1 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @0, i32 0, i32 0) }, align 8

				; void merge_all() {
				; int a = 1;
				; #pragma omp parallel
				; {
				hfinkel: Do these need to say 'nowait' in order to actually match the code?
				ggeorgakoudis: No, the code matches the IR generation ('nowait' is available only for worksharing constructs). The merging transformation emits an explicit barrier in place of the implicit barrier after merging. However, I added a TODO because it could make sense to avoid explicit barrier generation, if the 'nowait' clause is present in an enclosing worksharing construct, and the merged parallel regions are independent.
				hfinkel: Oh, right. Indeed. Thanks. We should do barrier elimination anyway, so that's certainly a good point that we should make sure that the barrier elimination can apply to these kinds of barriers (or we can not emit them in the first place, although maybe it's easier to do after we have anything together where we can just query the MemorySSA use/def chain or similar to look at underlying dependence information).
				jdoerfert: FWIW, while my proposal for `parallel nowait` was briefly discussed, I did not spend enough time on it to get it into OpenMP 5.1. The questions how to synchronize with the threads afterwards and what guarantees you have are interesting.
				; a = 2;
				; }
				; #pragma omp parallel
				; {
				; a = 3;
				; }
				; }
				;
				; Merge all parallel regions.
				define dso_local void @merge_all() local_unnamed_addr {
				%1 = alloca i32, align 4
				%2 = bitcast i32* %1 to i8*
				store i32 1, i32* %1, align 4
				%3 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_all..omp_par to void (i32*, i32*, ...)*), i32* nonnull %1)
				%4 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_all..omp_par.1 to void (i32*, i32*, ...)*), i32* nonnull %1)
				ret void
				}

				define internal void @merge_all..omp_par.1(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 3, i32* %2, align 4
				ret void
				}

				define internal void @merge_all..omp_par(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 2, i32* %2, align 4
				ret void
				}


				declare i32 @__kmpc_global_thread_num(%struct.ident_t*) local_unnamed_addr

				declare !callback !1 void @__kmpc_fork_call(%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) local_unnamed_addr

				; void merge_none() {
				; int a = 1;
				; #pragma omp parallel
				; {
				; a = 2;
				; }
				; a = 3;
				; #pragma omp parallel
				; {
				; a = 4;
				; }
				; }
				;
				; Does not merge parallel regions, in-between store
				; instruction is unsafe to execute in parallel.
				define dso_local void @merge_none() local_unnamed_addr {
				%1 = alloca i32, align 4
				%2 = bitcast i32* %1 to i8*
				store i32 1, i32* %1, align 4
				%3 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_none..omp_par to void (i32*, i32*, ...)*), i32* nonnull %1)
				store i32 3, i32* %1, align 4
				%4 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_none..omp_par.2 to void (i32*, i32*, ...)*), i32* nonnull %1)
				ret void
				}

				define internal void @merge_none..omp_par.2(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 4, i32* %2, align 4
				ret void
				}

				define internal void @merge_none..omp_par(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 2, i32* %2, align 4
				ret void
				}

				; void merge_some() {
				; int a = 1;
				; #pragma omp parallel
				; {
				; a = 2;
				; }
				; a = 3;
				; #pragma omp parallel
				; {
				; a = 4;
				; }
				; #pragma omp parallel
				; {
				; a = 5;
				; }
				; }
				;
				; Do not merge first parallel region, due to the
				; unsafe store, but merge the two next parallel
				; regions.
				define dso_local void @merge_some() local_unnamed_addr {
				%1 = alloca i32, align 4
				%2 = bitcast i32* %1 to i8*
				store i32 1, i32* %1, align 4
				%3 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_some..omp_par to void (i32*, i32*, ...)*), i32* nonnull %1)
				store i32 3, i32* %1, align 4
				%4 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_some..omp_par.3 to void (i32*, i32*, ...)*), i32* nonnull %1)
				%5 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_some..omp_par.4 to void (i32*, i32*, ...)*), i32* nonnull %1)
				ret void
				}

				define internal void @merge_some..omp_par.4(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 5, i32* %2, align 4
				ret void
				}

				define internal void @merge_some..omp_par.3(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 4, i32* %2, align 4
				ret void
				}

				define internal void @merge_some..omp_par(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 2, i32* %2, align 4
				ret void
				}

				; void merge_cancellable_regions(int cancel1, int cancel2)
				; {
				; #pragma omp parallel
				; {
				; if(cancel1) {
				; #pragma omp cancel parallel
				; }
				; }
				;
				; #pragma omp parallel
				; {
				; if (cancel2) {
				; #pragma omp cancel parallel
				; }
				; }
				; }
				;
				; Merge correctly cancellable regions.
				define dso_local void @merge_cancellable_regions(i32 %0, i32 %1) local_unnamed_addr {
				%3 = alloca i32, align 4
				%4 = alloca i32, align 4
				store i32 %0, i32* %3, align 4
				store i32 %1, i32* %4, align 4
				%5 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_cancellable_regions..omp_par to void (i32*, i32*, ...)*), i32* nonnull %3)
				%6 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_cancellable_regions..omp_par.5 to void (i32*, i32*, ...)*), i32* nonnull %4)
				ret void
				}

				define internal void @merge_cancellable_regions..omp_par.5(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture readonly %2) {
				%4 = load i32, i32* %2, align 4
				%5 = icmp eq i32 %4, 0
				br i1 %5, label %6, label %7

				6: ; preds = %3
				ret void

				7: ; preds = %3
				%8 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				%9 = call i32 @__kmpc_cancel(%struct.ident_t* nonnull @1, i32 %8, i32 1)
				ret void
				}

				define internal void @merge_cancellable_regions..omp_par(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture readonly %2) {
				%4 = load i32, i32* %2, align 4
				%5 = icmp eq i32 %4, 0
				br i1 %5, label %6, label %7

				6: ; preds = %3
				ret void

				7: ; preds = %3
				%8 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				%9 = call i32 @__kmpc_cancel(%struct.ident_t* nonnull @1, i32 %8, i32 1)
				ret void
				}

				declare i32 @__kmpc_cancel(%struct.ident_t*, i32, i32) local_unnamed_addr


				!llvm.module.flags = !{!0}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!2}
				!2 = !{i64 2, i64 -1, i64 -1, i1 true}
				; CHECK-LABEL: define {{[^@]+}}@merge_all() local_unnamed_addr {
				; CHECK-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1:@.*]])
				; CHECK-NEXT: [[TMP2:%.*]] = alloca i32, align 4
				; CHECK-NEXT: store i32 1, i32* [[TMP2]], align 4
				; CHECK-NEXT: [[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
				; CHECK-NEXT: br label [[OMP_PARALLEL:%.*]]
				; CHECK: omp_parallel:
				; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* [[GLOB1]], i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_all..omp_par.3 to void (i32*, i32*, ...)*), i32* [[TMP2]])
				; CHECK-NEXT: br label [[OMP_PAR_OUTLINED_EXIT:%.*]]
				; CHECK: omp.par.outlined.exit:
				; CHECK-NEXT: br label [[OMP_PAR_EXIT_SPLIT:%.*]]
				; CHECK: omp.par.exit.split:
				; CHECK-NEXT: br label [[DOTSPLIT_SPLIT:%.*]]
				; CHECK: .split.split:
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_all..omp_par.3
				; CHECK-SAME: (i32* noalias [[TID_ADDR:%.*]], i32* noalias [[ZERO_ADDR:%.*]], i32* [[TMP0:%.*]]) [[ATTR0:#.*]] {
				; CHECK-NEXT: omp.par.entry:
				; CHECK-NEXT: [[TID_ADDR_LOCAL:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[TID_ADDR]], align 4
				; CHECK-NEXT: store i32 [[TMP1]], i32* [[TID_ADDR_LOCAL]], align 4
				; CHECK-NEXT: [[TID:%.*]] = load i32, i32* [[TID_ADDR_LOCAL]], align 4
				; CHECK-NEXT: br label [[OMP_PAR_REGION:%.*]]
				; CHECK: omp.par.outlined.exit.exitStub:
				; CHECK-NEXT: ret void
				; CHECK: omp.par.region:
				; CHECK-NEXT: br label [[OMP_PAR_MERGED:%.*]]
				; CHECK: omp.par.merged:
				; CHECK-NEXT: call void @merge_all..omp_par(i32* [[TID_ADDR]], i32* [[ZERO_ADDR]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP0]])
				; CHECK-NEXT: [[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
				; CHECK-NEXT: call void @__kmpc_barrier(%struct.ident_t* [[GLOB2:@.*]], i32 [[OMP_GLOBAL_THREAD_NUM]])
				; CHECK-NEXT: call void @merge_all..omp_par.1(i32* [[TID_ADDR]], i32* [[ZERO_ADDR]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP0]])
				; CHECK-NEXT: br label [[DOTSPLIT:%.*]]
				; CHECK: .split:
				; CHECK-NEXT: br label [[OMP_PAR_REGION_SPLIT:%.*]]
				; CHECK: omp.par.region.split:
				; CHECK-NEXT: br label [[OMP_PAR_PRE_FINALIZE:%.*]]
				; CHECK: omp.par.pre_finalize:
				; CHECK-NEXT: br label [[OMP_PAR_OUTLINED_EXIT_EXITSTUB:%.*]]
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_all..omp_par.1
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1:#.*]] {
				jdoerfert: Remove all of the attributes if possible, at least all the string attributes. It is also weird that we have `optnone` here and actually do something, I guess OpenMP opt doesn't properly call skipSCC/skipFunction? Remove it here (to make it a problem for tomorrow).
				; CHECK-NEXT: store i32 3, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_all..omp_par
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1]] {
				; CHECK-NEXT: store i32 2, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_none() local_unnamed_addr {
				; CHECK-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1]])
				; CHECK-NEXT: [[TMP2:%.*]] = alloca i32, align 4
				; CHECK-NEXT: store i32 1, i32* [[TMP2]], align 4
				; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1]], i32 noundef 1, void (i32*, i32*, ...)* noundef bitcast (void (i32*, i32*, i32*)* @merge_none..omp_par to void (i32*, i32*, ...)*), i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2]])
				; CHECK-NEXT: store i32 3, i32* [[TMP2]], align 4
				; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1]], i32 noundef 1, void (i32*, i32*, ...)* noundef bitcast (void (i32*, i32*, i32*)* @merge_none..omp_par.2 to void (i32*, i32*, ...)*), i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2]])
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_none..omp_par.2
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1]] {
				; CHECK-NEXT: store i32 4, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_none..omp_par
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1]] {
				; CHECK-NEXT: store i32 2, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_some() local_unnamed_addr {
				; CHECK-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1]])
				; CHECK-NEXT: [[TMP2:%.*]] = alloca i32, align 4
				; CHECK-NEXT: store i32 1, i32* [[TMP2]], align 4
				; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1]], i32 noundef 1, void (i32*, i32*, ...)* noundef bitcast (void (i32*, i32*, i32*)* @merge_some..omp_par to void (i32*, i32*, ...)*), i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2]])
				; CHECK-NEXT: store i32 3, i32* [[TMP2]], align 4
				; CHECK-NEXT: [[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
				; CHECK-NEXT: br label [[OMP_PARALLEL:%.*]]
				; CHECK: omp_parallel:
				; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* [[GLOB1]], i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_some..omp_par.2 to void (i32*, i32*, ...)*), i32* [[TMP2]])
				; CHECK-NEXT: br label [[OMP_PAR_OUTLINED_EXIT:%.*]]
				; CHECK: omp.par.outlined.exit:
				; CHECK-NEXT: br label [[OMP_PAR_EXIT_SPLIT:%.*]]
				; CHECK: omp.par.exit.split:
				; CHECK-NEXT: br label [[DOTSPLIT_SPLIT:%.*]]
				; CHECK: .split.split:
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_some..omp_par.2
				; CHECK-SAME: (i32* noalias [[TID_ADDR:%.*]], i32* noalias [[ZERO_ADDR:%.*]], i32* [[TMP0:%.*]]) [[ATTR0]] {
				; CHECK-NEXT: omp.par.entry:
				; CHECK-NEXT: [[TID_ADDR_LOCAL:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[TID_ADDR]], align 4
				; CHECK-NEXT: store i32 [[TMP1]], i32* [[TID_ADDR_LOCAL]], align 4
				; CHECK-NEXT: [[TID:%.*]] = load i32, i32* [[TID_ADDR_LOCAL]], align 4
				; CHECK-NEXT: br label [[OMP_PAR_REGION:%.*]]
				; CHECK: omp.par.outlined.exit.exitStub:
				; CHECK-NEXT: ret void
				; CHECK: omp.par.region:
				; CHECK-NEXT: br label [[OMP_PAR_MERGED:%.*]]
				; CHECK: omp.par.merged:
				; CHECK-NEXT: call void @merge_some..omp_par.3(i32* [[TID_ADDR]], i32* [[ZERO_ADDR]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP0]])
				; CHECK-NEXT: [[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
				; CHECK-NEXT: call void @__kmpc_barrier(%struct.ident_t* [[GLOB2]], i32 [[OMP_GLOBAL_THREAD_NUM]])
				; CHECK-NEXT: call void @merge_some..omp_par.4(i32* [[TID_ADDR]], i32* [[ZERO_ADDR]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP0]])
				; CHECK-NEXT: br label [[DOTSPLIT:%.*]]
				; CHECK: .split:
				; CHECK-NEXT: br label [[OMP_PAR_REGION_SPLIT:%.*]]
				; CHECK: omp.par.region.split:
				; CHECK-NEXT: br label [[OMP_PAR_PRE_FINALIZE:%.*]]
				jdoerfert: The update test script cannot add check lines for new functions yet. Can you manually add some minimal check lines to ensure the new function we create looks as expected, e.g., has regular calls to the former parallel regions?
				sstefan1: Maybe I can take a look at that since I'm now somewhat familiar with the script.
				jdoerfert: There is a patch for the _cc_ script already: D83004. I suggested there to make it generic (common.py) for _cc_ and the opt script, maybe we can do it in 2 steps instead. < @greened
				ggeorgakoudis: I see the patch is accepted but not yet committed. Waiting for it to be upstreamed.
				ggeorgakoudis: Use the new version of the script that includes generated functions
				; CHECK: omp.par.pre_finalize:
				; CHECK-NEXT: br label [[OMP_PAR_OUTLINED_EXIT_EXITSTUB:%.*]]
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_some..omp_par.4
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1]] {
				; CHECK-NEXT: store i32 5, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_some..omp_par.3
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1]] {
				; CHECK-NEXT: store i32 4, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_some..omp_par
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1]] {
				; CHECK-NEXT: store i32 2, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_cancellable_regions
				; CHECK-SAME: (i32 [[TMP0:%.*]], i32 [[TMP1:%.*]]) local_unnamed_addr {
				; CHECK-NEXT: [[TMP3:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1]])
				; CHECK-NEXT: [[TMP4:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[TMP5:%.*]] = alloca i32, align 4
				; CHECK-NEXT: store i32 [[TMP0]], i32* [[TMP4]], align 4
				; CHECK-NEXT: store i32 [[TMP1]], i32* [[TMP5]], align 4
				; CHECK-NEXT: [[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
				; CHECK-NEXT: br label [[OMP_PARALLEL:%.*]]
				; CHECK: omp_parallel:
				; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* [[GLOB1]], i32 2, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*, i32*)* @merge_cancellable_regions..omp_par.1 to void (i32*, i32*, ...)*), i32* [[TMP4]], i32* [[TMP5]])
				; CHECK-NEXT: br label [[OMP_PAR_OUTLINED_EXIT:%.*]]
				; CHECK: omp.par.outlined.exit:
				; CHECK-NEXT: br label [[OMP_PAR_EXIT_SPLIT:%.*]]
				; CHECK: omp.par.exit.split:
				; CHECK-NEXT: br label [[DOTSPLIT_SPLIT:%.*]]
				; CHECK: .split.split:
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_cancellable_regions..omp_par.1
				; CHECK-SAME: (i32* noalias [[TID_ADDR:%.*]], i32* noalias [[ZERO_ADDR:%.*]], i32* [[TMP0:%.*]], i32* [[TMP1:%.*]]) [[ATTR0]] {
				; CHECK-NEXT: omp.par.entry:
				; CHECK-NEXT: [[TID_ADDR_LOCAL:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[TID_ADDR]], align 4
				; CHECK-NEXT: store i32 [[TMP2]], i32* [[TID_ADDR_LOCAL]], align 4
				; CHECK-NEXT: [[TID:%.*]] = load i32, i32* [[TID_ADDR_LOCAL]], align 4
				; CHECK-NEXT: br label [[OMP_PAR_REGION:%.*]]
				; CHECK: omp.par.outlined.exit.exitStub:
				; CHECK-NEXT: ret void
				; CHECK: omp.par.region:
				; CHECK-NEXT: br label [[OMP_PAR_MERGED:%.*]]
				; CHECK: omp.par.merged:
				; CHECK-NEXT: call void @merge_cancellable_regions..omp_par(i32* [[TID_ADDR]], i32* [[ZERO_ADDR]], i32* nocapture noundef nonnull readonly align 4 dereferenceable(4) [[TMP0]])
				; CHECK-NEXT: [[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
				; CHECK-NEXT: call void @__kmpc_barrier(%struct.ident_t* [[GLOB2]], i32 [[OMP_GLOBAL_THREAD_NUM]])
				; CHECK-NEXT: call void @merge_cancellable_regions..omp_par.5(i32* [[TID_ADDR]], i32* [[ZERO_ADDR]], i32* nocapture noundef nonnull readonly align 4 dereferenceable(4) [[TMP1]])
				; CHECK-NEXT: br label [[DOTSPLIT:%.*]]
				; CHECK: .split:
				; CHECK-NEXT: br label [[OMP_PAR_REGION_SPLIT:%.*]]
				; CHECK: omp.par.region.split:
				; CHECK-NEXT: br label [[OMP_PAR_PRE_FINALIZE:%.*]]
				; CHECK: omp.par.pre_finalize:
				; CHECK-NEXT: br label [[OMP_PAR_OUTLINED_EXIT_EXITSTUB:%.*]]
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_cancellable_regions..omp_par.5
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture noundef nonnull readonly align 4 dereferenceable(4) [[TMP2:%.*]]) {
				; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]], align 4
				; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[TMP4]], 0
				; CHECK-NEXT: br i1 [[TMP5]], label [[TMP6:%.*]], label [[TMP7:%.*]]
				; CHECK: 6:
				; CHECK-NEXT: ret void
				; CHECK: 7:
				; CHECK-NEXT: [[TMP8:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* noundef nonnull [[GLOB1]])
				; CHECK-NEXT: [[TMP9:%.*]] = call i32 @__kmpc_cancel(%struct.ident_t* noundef nonnull [[GLOB1]], i32 [[TMP8]], i32 noundef 1)
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_cancellable_regions..omp_par
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture noundef nonnull readonly align 4 dereferenceable(4) [[TMP2:%.*]]) {
				; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]], align 4
				; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[TMP4]], 0
				; CHECK-NEXT: br i1 [[TMP5]], label [[TMP6:%.*]], label [[TMP7:%.*]]
				; CHECK: 6:
				; CHECK-NEXT: ret void
				; CHECK: 7:
				; CHECK-NEXT: [[TMP8:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* noundef nonnull [[GLOB1]])
				; CHECK-NEXT: [[TMP9:%.*]] = call i32 @__kmpc_cancel(%struct.ident_t* noundef nonnull [[GLOB1]], i32 [[TMP8]], i32 noundef 1)
				; CHECK-NEXT: ret void
				;
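
At the source level, the effect that the `merge_all` checks verify corresponds roughly to the hand-written rewrite below. This is an illustrative sketch only, not output of the pass: the function names `run_separate` and `run_merged` are made up, and the regions write per-element data through `omp for nowait` loops (rather than the shared scalar used in the test) so the example stays free of data races. Built without OpenMP support, the pragmas are ignored and both functions simply run serially.

```cpp
#include <cassert>
#include <vector>

// Two consecutive parallel regions: the thread team is forked and joined
// twice, and each region ends with an implicit barrier.
std::vector<int> run_separate(int n) {
  std::vector<int> v(n, 1);
#pragma omp parallel
  {
#pragma omp for nowait
    for (int i = 0; i < n; ++i)
      v[i] = 2;
  }
#pragma omp parallel
  {
#pragma omp for nowait
    for (int i = 0; i < n; ++i)
      v[i] += 1;
  }
  return v;
}

// Hand-merged equivalent: one fork/join, with an explicit barrier standing
// in for the implicit barrier that used to end the first region.
std::vector<int> run_merged(int n) {
  std::vector<int> v(n, 1);
#pragma omp parallel
  {
#pragma omp for nowait
    for (int i = 0; i < n; ++i)
      v[i] = 2;
#pragma omp barrier
#pragma omp for nowait
    for (int i = 0; i < n; ++i)
      v[i] += 1;
  }
  return v;
}
```

The explicit barrier is what keeps the second loop from reading elements that another thread has not yet written, which is the role of the `__kmpc_barrier` call between the two former regions in the merged outlined function.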

llvm/test/Transforms/OpenMP/parallel_region_merging_legacy_pm.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs
				; RUN: opt -S -attributor -openmpopt -openmp-opt-enable-merging < %s | FileCheck %s

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

				%struct.ident_t = type { i32, i32, i32, i32, i8* }

				@0 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
				@1 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @0, i32 0, i32 0) }, align 8

				; void merge_all() {
				; int a = 1;
				; #pragma omp parallel
				; {
				; a = 2;
				; }
				; #pragma omp parallel
				; {
				; a = 3;
				; }
				; }
				;
				; Merge all parallel regions.
				define dso_local void @merge_all() local_unnamed_addr {
				%1 = alloca i32, align 4
				%2 = bitcast i32* %1 to i8*
				store i32 1, i32* %1, align 4
				%3 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_all..omp_par to void (i32*, i32*, ...)*), i32* nonnull %1)
				%4 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_all..omp_par.1 to void (i32*, i32*, ...)*), i32* nonnull %1)
				ret void
				}

				define internal void @merge_all..omp_par.1(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 3, i32* %2, align 4
				ret void
				}

				define internal void @merge_all..omp_par(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 2, i32* %2, align 4
				ret void
				}


				declare i32 @__kmpc_global_thread_num(%struct.ident_t*) local_unnamed_addr

				declare !callback !1 void @__kmpc_fork_call(%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) local_unnamed_addr

				; void merge_none() {
				; int a = 1;
				; #pragma omp parallel
				; {
				; a = 2;
				; }
				; a = 3;
				; #pragma omp parallel
				; {
				; a = 4;
				; }
				; }
				;
				; Does not merge parallel regions, in-between store
				; instruction is unsafe to execute in parallel.
				define dso_local void @merge_none() local_unnamed_addr {
				%1 = alloca i32, align 4
				%2 = bitcast i32* %1 to i8*
				store i32 1, i32* %1, align 4
				%3 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_none..omp_par to void (i32*, i32*, ...)*), i32* nonnull %1)
				store i32 3, i32* %1, align 4
				%4 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_none..omp_par.2 to void (i32*, i32*, ...)*), i32* nonnull %1)
				ret void
				}

				define internal void @merge_none..omp_par.2(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 4, i32* %2, align 4
				ret void
				}

				define internal void @merge_none..omp_par(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 2, i32* %2, align 4
				ret void
				}

				; void merge_some() {
				; int a = 1;
				; #pragma omp parallel
				; {
				; a = 2;
				; }
				; a = 3;
				; #pragma omp parallel
				; {
				; a = 4;
				; }
				; #pragma omp parallel
				; {
				; a = 5;
				; }
				; }
				;
				; Do not merge first parallel region, due to the
				; unsafe store, but merge the two next parallel
				; regions.
				define dso_local void @merge_some() local_unnamed_addr {
				%1 = alloca i32, align 4
				%2 = bitcast i32* %1 to i8*
				store i32 1, i32* %1, align 4
				%3 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_some..omp_par to void (i32*, i32*, ...)*), i32* nonnull %1)
				store i32 3, i32* %1, align 4
				%4 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_some..omp_par.3 to void (i32*, i32*, ...)*), i32* nonnull %1)
				%5 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_some..omp_par.4 to void (i32*, i32*, ...)*), i32* nonnull %1)
				ret void
				}

				define internal void @merge_some..omp_par.4(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 5, i32* %2, align 4
				ret void
				}

				define internal void @merge_some..omp_par.3(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 4, i32* %2, align 4
				ret void
				}

				define internal void @merge_some..omp_par(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture %2) {
				store i32 2, i32* %2, align 4
				ret void
				}

				; void merge_cancellable_regions(int cancel1, int cancel2)
				; {
				; #pragma omp parallel
				; {
				; if(cancel1) {
				; #pragma omp cancel parallel
				; }
				; }
				;
				; #pragma omp parallel
				; {
				; if (cancel2) {
				; #pragma omp cancel parallel
				; }
				; }
				; }
				;
				; Merge correctly cancellable regions.
				define dso_local void @merge_cancellable_regions(i32 %0, i32 %1) local_unnamed_addr {
				%3 = alloca i32, align 4
				%4 = alloca i32, align 4
				store i32 %0, i32* %3, align 4
				store i32 %1, i32* %4, align 4
				%5 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_cancellable_regions..omp_par to void (i32*, i32*, ...)*), i32* nonnull %3)
				%6 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* nonnull @1, i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_cancellable_regions..omp_par.5 to void (i32*, i32*, ...)*), i32* nonnull %4)
				ret void
				}

				define internal void @merge_cancellable_regions..omp_par.5(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture readonly %2) {
				%4 = load i32, i32* %2, align 4
				%5 = icmp eq i32 %4, 0
				br i1 %5, label %6, label %7

				6: ; preds = %3
				ret void

				7: ; preds = %3
				%8 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				%9 = call i32 @__kmpc_cancel(%struct.ident_t* nonnull @1, i32 %8, i32 1)
				ret void
				}

				define internal void @merge_cancellable_regions..omp_par(i32* noalias nocapture readnone %0, i32* noalias nocapture readnone %1, i32* nocapture readonly %2) {
				%4 = load i32, i32* %2, align 4
				%5 = icmp eq i32 %4, 0
				br i1 %5, label %6, label %7

				6: ; preds = %3
				ret void

				7: ; preds = %3
				%8 = call i32 @__kmpc_global_thread_num(%struct.ident_t* nonnull @1)
				%9 = call i32 @__kmpc_cancel(%struct.ident_t* nonnull @1, i32 %8, i32 1)
				ret void
				}

				declare i32 @__kmpc_cancel(%struct.ident_t*, i32, i32) local_unnamed_addr


				!llvm.module.flags = !{!0}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!2}
				!2 = !{i64 2, i64 -1, i64 -1, i1 true}
				; CHECK-LABEL: define {{[^@]+}}@merge_all() local_unnamed_addr {
				; CHECK-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1:@.*]])
				; CHECK-NEXT: [[TMP2:%.*]] = alloca i32, align 4
				; CHECK-NEXT: store i32 1, i32* [[TMP2]], align 4
				; CHECK-NEXT: [[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
				; CHECK-NEXT: br label [[OMP_PARALLEL:%.*]]
				; CHECK: omp_parallel:
				; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* [[GLOB1]], i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_all..omp_par.2 to void (i32*, i32*, ...)*), i32* [[TMP2]])
				; CHECK-NEXT: br label [[OMP_PAR_OUTLINED_EXIT:%.*]]
				; CHECK: omp.par.outlined.exit:
				; CHECK-NEXT: br label [[OMP_PAR_EXIT_SPLIT:%.*]]
				; CHECK: omp.par.exit.split:
				; CHECK-NEXT: br label [[DOTSPLIT_SPLIT:%.*]]
				; CHECK: .split.split:
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_all..omp_par.2
				; CHECK-SAME: (i32* noalias [[TID_ADDR:%.*]], i32* noalias [[ZERO_ADDR:%.*]], i32* [[TMP0:%.*]]) [[ATTR0:#.*]] {
				; CHECK-NEXT: omp.par.entry:
				; CHECK-NEXT: [[TID_ADDR_LOCAL:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[TID_ADDR]], align 4
				; CHECK-NEXT: store i32 [[TMP1]], i32* [[TID_ADDR_LOCAL]], align 4
				; CHECK-NEXT: [[TID:%.*]] = load i32, i32* [[TID_ADDR_LOCAL]], align 4
				; CHECK-NEXT: br label [[OMP_PAR_REGION:%.*]]
				; CHECK: omp.par.outlined.exit.exitStub:
				; CHECK-NEXT: ret void
				; CHECK: omp.par.region:
				; CHECK-NEXT: br label [[OMP_PAR_MERGED:%.*]]
				; CHECK: omp.par.merged:
				; CHECK-NEXT: call void @merge_all..omp_par(i32* [[TID_ADDR]], i32* [[ZERO_ADDR]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP0]])
				; CHECK-NEXT: [[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
				; CHECK-NEXT: call void @__kmpc_barrier(%struct.ident_t* [[GLOB2:@.*]], i32 [[OMP_GLOBAL_THREAD_NUM]])
				; CHECK-NEXT: call void @merge_all..omp_par.1(i32* [[TID_ADDR]], i32* [[ZERO_ADDR]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP0]])
				; CHECK-NEXT: br label [[DOTSPLIT:%.*]]
				; CHECK: .split:
				; CHECK-NEXT: br label [[OMP_PAR_REGION_SPLIT:%.*]]
				; CHECK: omp.par.region.split:
				; CHECK-NEXT: br label [[OMP_PAR_PRE_FINALIZE:%.*]]
				; CHECK: omp.par.pre_finalize:
				; CHECK-NEXT: br label [[OMP_PAR_OUTLINED_EXIT_EXITSTUB:%.*]]
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_all..omp_par.1
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1:#.*]] {
				; CHECK-NEXT: store i32 3, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_all..omp_par
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1]] {
				; CHECK-NEXT: store i32 2, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_none() local_unnamed_addr {
				; CHECK-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1]])
				; CHECK-NEXT: [[TMP2:%.*]] = alloca i32, align 4
				; CHECK-NEXT: store i32 1, i32* [[TMP2]], align 4
				; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1]], i32 noundef 1, void (i32*, i32*, ...)* noundef bitcast (void (i32*, i32*, i32*)* @merge_none..omp_par to void (i32*, i32*, ...)*), i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2]])
				; CHECK-NEXT: store i32 3, i32* [[TMP2]], align 4
				; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1]], i32 noundef 1, void (i32*, i32*, ...)* noundef bitcast (void (i32*, i32*, i32*)* @merge_none..omp_par.2 to void (i32*, i32*, ...)*), i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2]])
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_none..omp_par.2
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1]] {
				; CHECK-NEXT: store i32 4, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_none..omp_par
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1]] {
				; CHECK-NEXT: store i32 2, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_some() local_unnamed_addr {
				; CHECK-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1]])
				; CHECK-NEXT: [[TMP2:%.*]] = alloca i32, align 4
				; CHECK-NEXT: store i32 1, i32* [[TMP2]], align 4
				; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1]], i32 noundef 1, void (i32*, i32*, ...)* noundef bitcast (void (i32*, i32*, i32*)* @merge_some..omp_par to void (i32*, i32*, ...)*), i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2]])
				; CHECK-NEXT: store i32 3, i32* [[TMP2]], align 4
				; CHECK-NEXT: [[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
				; CHECK-NEXT: br label [[OMP_PARALLEL:%.*]]
				; CHECK: omp_parallel:
				; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* [[GLOB1]], i32 1, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*)* @merge_some..omp_par.5 to void (i32*, i32*, ...)*), i32* [[TMP2]])
				; CHECK-NEXT: br label [[OMP_PAR_OUTLINED_EXIT:%.*]]
				; CHECK: omp.par.outlined.exit:
				; CHECK-NEXT: br label [[OMP_PAR_EXIT_SPLIT:%.*]]
				; CHECK: omp.par.exit.split:
				; CHECK-NEXT: br label [[DOTSPLIT_SPLIT:%.*]]
				; CHECK: .split.split:
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_some..omp_par.5
				; CHECK-SAME: (i32* noalias [[TID_ADDR:%.*]], i32* noalias [[ZERO_ADDR:%.*]], i32* [[TMP0:%.*]]) [[ATTR0]] {
				; CHECK-NEXT: omp.par.entry:
				; CHECK-NEXT: [[TID_ADDR_LOCAL:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[TID_ADDR]], align 4
				; CHECK-NEXT: store i32 [[TMP1]], i32* [[TID_ADDR_LOCAL]], align 4
				; CHECK-NEXT: [[TID:%.*]] = load i32, i32* [[TID_ADDR_LOCAL]], align 4
				; CHECK-NEXT: br label [[OMP_PAR_REGION:%.*]]
				; CHECK: omp.par.outlined.exit.exitStub:
				; CHECK-NEXT: ret void
				; CHECK: omp.par.region:
				; CHECK-NEXT: br label [[OMP_PAR_MERGED:%.*]]
				; CHECK: omp.par.merged:
				; CHECK-NEXT: call void @merge_some..omp_par.3(i32* [[TID_ADDR]], i32* [[ZERO_ADDR]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP0]])
				; CHECK-NEXT: [[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
				; CHECK-NEXT: call void @__kmpc_barrier(%struct.ident_t* [[GLOB2]], i32 [[OMP_GLOBAL_THREAD_NUM]])
				; CHECK-NEXT: call void @merge_some..omp_par.4(i32* [[TID_ADDR]], i32* [[ZERO_ADDR]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP0]])
				; CHECK-NEXT: br label [[DOTSPLIT:%.*]]
				; CHECK: .split:
				; CHECK-NEXT: br label [[OMP_PAR_REGION_SPLIT:%.*]]
				; CHECK: omp.par.region.split:
				; CHECK-NEXT: br label [[OMP_PAR_PRE_FINALIZE:%.*]]
				; CHECK: omp.par.pre_finalize:
				; CHECK-NEXT: br label [[OMP_PAR_OUTLINED_EXIT_EXITSTUB:%.*]]
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_some..omp_par.4
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1]] {
				; CHECK-NEXT: store i32 5, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_some..omp_par.3
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1]] {
				; CHECK-NEXT: store i32 4, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_some..omp_par
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[TMP2:%.*]]) [[ATTR1]] {
				; CHECK-NEXT: store i32 2, i32* [[TMP2]], align 4
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_cancellable_regions
				; CHECK-SAME: (i32 [[TMP0:%.*]], i32 [[TMP1:%.*]]) local_unnamed_addr {
				; CHECK-NEXT: [[TMP3:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* noundef nonnull align 8 dereferenceable(24) [[GLOB1]])
				; CHECK-NEXT: [[TMP4:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[TMP5:%.*]] = alloca i32, align 4
				; CHECK-NEXT: store i32 [[TMP0]], i32* [[TMP4]], align 4
				; CHECK-NEXT: store i32 [[TMP1]], i32* [[TMP5]], align 4
				; CHECK-NEXT: [[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
				; CHECK-NEXT: br label [[OMP_PARALLEL:%.*]]
				; CHECK: omp_parallel:
				; CHECK-NEXT: call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* [[GLOB1]], i32 2, void (i32*, i32*, ...)* bitcast (void (i32*, i32*, i32*, i32*)* @merge_cancellable_regions..omp_par.6 to void (i32*, i32*, ...)*), i32* [[TMP4]], i32* [[TMP5]])
				; CHECK-NEXT: br label [[OMP_PAR_OUTLINED_EXIT:%.*]]
				; CHECK: omp.par.outlined.exit:
				; CHECK-NEXT: br label [[OMP_PAR_EXIT_SPLIT:%.*]]
				; CHECK: omp.par.exit.split:
				; CHECK-NEXT: br label [[DOTSPLIT_SPLIT:%.*]]
				; CHECK: .split.split:
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_cancellable_regions..omp_par.6
				; CHECK-SAME: (i32* noalias [[TID_ADDR:%.*]], i32* noalias [[ZERO_ADDR:%.*]], i32* [[TMP0:%.*]], i32* [[TMP1:%.*]]) [[ATTR0]] {
				; CHECK-NEXT: omp.par.entry:
				; CHECK-NEXT: [[TID_ADDR_LOCAL:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[TID_ADDR]], align 4
				; CHECK-NEXT: store i32 [[TMP2]], i32* [[TID_ADDR_LOCAL]], align 4
				; CHECK-NEXT: [[TID:%.*]] = load i32, i32* [[TID_ADDR_LOCAL]], align 4
				; CHECK-NEXT: br label [[OMP_PAR_REGION:%.*]]
				; CHECK: omp.par.outlined.exit.exitStub:
				; CHECK-NEXT: ret void
				; CHECK: omp.par.region:
				; CHECK-NEXT: br label [[OMP_PAR_MERGED:%.*]]
				; CHECK: omp.par.merged:
				; CHECK-NEXT: call void @merge_cancellable_regions..omp_par(i32* [[TID_ADDR]], i32* [[ZERO_ADDR]], i32* nocapture noundef nonnull readonly align 4 dereferenceable(4) [[TMP0]])
				; CHECK-NEXT: [[OMP_GLOBAL_THREAD_NUM:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* [[GLOB1]])
				; CHECK-NEXT: call void @__kmpc_barrier(%struct.ident_t* [[GLOB2]], i32 [[OMP_GLOBAL_THREAD_NUM]])
				; CHECK-NEXT: call void @merge_cancellable_regions..omp_par.5(i32* [[TID_ADDR]], i32* [[ZERO_ADDR]], i32* nocapture noundef nonnull readonly align 4 dereferenceable(4) [[TMP1]])
				; CHECK-NEXT: br label [[DOTSPLIT:%.*]]
				; CHECK: .split:
				; CHECK-NEXT: br label [[OMP_PAR_REGION_SPLIT:%.*]]
				; CHECK: omp.par.region.split:
				; CHECK-NEXT: br label [[OMP_PAR_PRE_FINALIZE:%.*]]
				; CHECK: omp.par.pre_finalize:
				; CHECK-NEXT: br label [[OMP_PAR_OUTLINED_EXIT_EXITSTUB:%.*]]
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_cancellable_regions..omp_par.5
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture noundef nonnull readonly align 4 dereferenceable(4) [[TMP2:%.*]]) {
				; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]], align 4
				; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[TMP4]], 0
				; CHECK-NEXT: br i1 [[TMP5]], label [[TMP6:%.*]], label [[TMP7:%.*]]
				; CHECK: 6:
				; CHECK-NEXT: ret void
				; CHECK: 7:
				; CHECK-NEXT: [[TMP8:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* noundef nonnull [[GLOB1]])
				; CHECK-NEXT: [[TMP9:%.*]] = call i32 @__kmpc_cancel(%struct.ident_t* noundef nonnull [[GLOB1]], i32 [[TMP8]], i32 noundef 1)
				; CHECK-NEXT: ret void
				;
				;
				; CHECK-LABEL: define {{[^@]+}}@merge_cancellable_regions..omp_par
				; CHECK-SAME: (i32* noalias nocapture nofree readnone [[TMP0:%.*]], i32* noalias nocapture nofree readnone [[TMP1:%.*]], i32* nocapture noundef nonnull readonly align 4 dereferenceable(4) [[TMP2:%.*]]) {
				; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]], align 4
				; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[TMP4]], 0
				; CHECK-NEXT: br i1 [[TMP5]], label [[TMP6:%.*]], label [[TMP7:%.*]]
				; CHECK: 6:
				; CHECK-NEXT: ret void
				; CHECK: 7:
				; CHECK-NEXT: [[TMP8:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* noundef nonnull [[GLOB1]])
				; CHECK-NEXT: [[TMP9:%.*]] = call i32 @__kmpc_cancel(%struct.ident_t* noundef nonnull [[GLOB1]], i32 [[TMP8]], i32 noundef 1)
				; CHECK-NEXT: ret void
				;
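
For readers who prefer the source-level view, the `merge_all` check lines above can be read roughly as follows: the two outlined bodies are called from one new outlined function, separated by a `__kmpc_barrier` call. The C sketch below is only an illustration of that shape, not output of the pass. The function names `merge_all_unmerged` and `merge_all_merged` are hypothetical, the `int` return value is added so the result can be checked, and the `atomic write` directives are an assumption added to keep the sketch free of data races (the test input uses plain stores).

```c
/* Hypothetical illustration: two consecutive parallel regions, as in the
 * merge_all() test input. Each region forks and joins its own team. */
int merge_all_unmerged(void) {
  int a = 1;
#pragma omp parallel shared(a)
  {
#pragma omp atomic write
    a = 2;
  }
#pragma omp parallel shared(a)
  {
#pragma omp atomic write
    a = 3;
  }
  return a;
}

/* Hypothetical illustration: the same work expressed as one parallel region.
 * The explicit barrier stands in for the implicit barrier that ended the
 * first region, matching the __kmpc_barrier call in the check lines. */
int merge_all_merged(void) {
  int a = 1;
#pragma omp parallel shared(a)
  {
    /* Body of the first original region. */
#pragma omp atomic write
    a = 2;
#pragma omp barrier
    /* Body of the second original region. */
#pragma omp atomic write
    a = 3;
  }
  return a;
}
```

Both functions leave `a` equal to 3. The merged form forks the thread team once instead of twice, which is the overhead the summary says this optimization targets. Built without OpenMP support, the pragmas are ignored and both functions run serially with the same result.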

Revision Contents

Diff 297270

llvm/lib/Transforms/IPO/OpenMPOpt.cpp

llvm/test/Transforms/OpenMP/parallel_region_merging.ll

llvm/test/Transforms/OpenMP/parallel_region_merging_legacy_pm.ll
