This is an archive of the discontinued LLVM Phabricator instance.


Differential D88706

[OpenMP][MLIR] WIP : Fix for AllocaIP
Abandoned · Public

Authored by kiranchandramohan on Oct 1 2020, 3:26 PM.


Details

Reviewers

SouraVX
jdoerfert
kiranktp
fghanim
ftynse

Summary

Fix for nested parallel regions. The patch does the following:

Switch to the OpenMPIRBuilder version that maintains allocaIP (an initial version of https://reviews.llvm.org/D82470).
Create the jump to the continuation block by converting the terminator.

Note:

This patch should be applied after reverting 19756ef53a498b7aa1fbac9e3a7cd3aa8e110fad.
This fix is required in the lowering of Master Op (https://reviews.llvm.org/D87247)

Diff Detail

Event Timeline

kiranchandramohan created this revision. Oct 1 2020, 3:26 PM

Herald added a reviewer: ftynse. Oct 1 2020, 3:26 PM

Herald added projects: Restricted Project, Restricted Project.

Herald added subscribers: llvm-commits, tatianashp, msifontes and 16 others.

kiranchandramohan requested review of this revision. Oct 1 2020, 3:26 PM

Herald added subscribers: sstefan1, stephenneuendorffer, nicolasvasilache. Oct 1 2020, 3:26 PM

After reading the patch for master you referred to, I don't understand why we need the OMPBuilder to maintain the insertion point. As far as master is concerned, we will emit any allocas contained inside its region into the entry block of the enclosing outlined region (e.g. the innermost parallel).
FWIW, the master directive in clang already uses the OMPBuilder and just relies on clang to handle the insertion of any non-OMP code (including allocas). Is there a reason why a similar approach wouldn't work here?

If this is indeed needed for master, then please don't create extra IRBuilders needlessly. As you mentioned, D82470 had this exact approach implemented, and we ended up not going through with it.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
724–727	These are unrelated to maintaining alloca insertion point. I don't recall back then whether these were removed completely, or added as part of other (2?) patches that were split off of this one.

Thanks @fghanim for the review. While debugging the nested parallel region issues, we saw some difference between where the allocas are placed by the OpenMP IRBuilder in the clang usage and the MLIR usage. Moving to the version where the OpenMP IRBuilder maintains allocaIP fixed the difference. In MLIR module translation there is no alloca insertion point. So what we can provide as allocaIP is the current insertionPoint. Is that OK? What is the allocaIP used for? Why do we need a separate allocaIP, why cannot we treat it like a normal instruction? Is it because all alloca instructions should be together at the top of the function? The langref for alloca did not have such a requirement.

I have created another review without the alloca changes and that also works correctly for the nested parallel case.
https://reviews.llvm.org/D88720

@fghanim I just saw your following comment in https://reviews.llvm.org/D87247.
"All llvm passes expect Allocas to be in the entry block of the function. In this case, the soon-to-be-outlined region. is builder's current insertion point in the entry block? Also, is it guaranteed to not be empty?"

I believe all allocas in the LLVM dialect will also be in the entry block of the OpenMP operation. But these will be added only in the bodygen callback. So they will be added to omp.par.region (actually omp.par.region1, since a dummy branch is created). But all these are trivial (unconditional) branches; can't they be inlined into the entry block if required? See the example below:

llvm.func @test_omp_parallel_4() -> () {
  omp.parallel {
    omp.barrier
    omp.terminator
  }
  llvm.return
}

define internal void @test_omp_parallel_4..omp_par(i32* noalias %tid.addr, i32* noalias %zero.addr) #0 !dbg !10 {
omp.par.entry:
  %tid.addr.local = alloca i32, align 4
  %0 = load i32, i32* %tid.addr, align 4
  store i32 %0, i32* %tid.addr.local, align 4
  %tid = load i32, i32* %tid.addr.local, align 4
  br label %omp.par.region

omp.par.outlined.exit.exitStub:                   ; preds = %omp.par.pre_finalize
  ret void

omp.par.region:                                   ; preds = %omp.par.entry
  br label %omp.par.region1

omp.par.region1:                                  ; preds = %omp.par.region
  ; Allocas in the OpenMP parallel region will appear here.
  %omp_global_thread_num2 = call i32 @__kmpc_global_thread_num(%struct.ident_t* @4)
  call void @__kmpc_barrier(%struct.ident_t* @3, i32 %omp_global_thread_num2)
  br label %omp.par.pre_finalize, !dbg !11

omp.par.pre_finalize:                             ; preds = %omp.par.region1
  br label %omp.par.outlined.exit.exitStub
}

Found that they should be in the entry block for optimizations. @fghanim is this what you are suggesting?
http://llvm.org/docs/Frontend/PerformanceTips.html#use-of-allocas

While debugging the nested parallel region issues, we saw some difference between where the allocas are placed by the OpenMP IRBuilder in the clang usage and the MLIR usage. Moving to the version where the OpenMP IRBuilder maintains allocaIP fixed the difference. In MLIR module translation there is no alloca insertion point. So what we can provide as allocaIP is the current insertionPoint.

Yes, this is the expected behavior. In clang we follow the convention of setting the insertion point for allocas to the proper entry block, we let clang handle all of the non-OMP code generation, and we let clang pass to us where it wants us to put its alloca instructions. Here, you are passing the current insertion point of your IRBuilder at the time the bodyCB is called, and all allocas are output there.

Is that OK?

Depends on your goal. Is it illegal? No, it is not; you can put allocas wherever you want.
Does it adversely affect performance, as you point out in a later comment? Absolutely.
Is it possible for it to have an effect on correctness? Possibly. Please check below for why I believe it may.

What is the allocaIP used for?

To specify where you want all allocas to go.

Why do we need a separate allocaIP, why cannot we treat it like a normal instruction?

Two reasons:

There are allocas that go inside the outlined region and allocas that go outside of it, and we need to be able to choose when to codegen each.
We try to generate allocas into the entry block of the relevant function, and treating an alloca like a normal instruction means we cannot.
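
As a rough illustration of the two reasons above, the following toy C++ sketch models code generation that receives two insertion points, one for allocas and one for ordinary code. ToyBlock, ToyInsertPoint, insertAt and emitPrivateCopy are hypothetical names invented for this sketch; they are not part of LLVM or the OpenMPIRBuilder, and instructions are represented as plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for IR concepts; none of these are LLVM types.
struct ToyBlock {
  std::string name;
  std::vector<std::string> insts;
};

// An insertion point: a block plus an index into its instruction list.
struct ToyInsertPoint {
  ToyBlock *block;
  std::size_t index;
};

// Insert `inst` at `ip` and advance the point so later inserts keep order.
inline void insertAt(ToyInsertPoint &ip, const std::string &inst) {
  auto pos = ip.block->insts.begin() + static_cast<std::ptrdiff_t>(ip.index);
  ip.block->insts.insert(pos, inst);
  ++ip.index;
}

// Emit a private copy of a variable: the stack slot goes to the alloca
// point (entry block), the store goes to the code-generation point (body).
inline void emitPrivateCopy(ToyInsertPoint &allocaIP, ToyInsertPoint &codeGenIP,
                            const std::string &var) {
  insertAt(allocaIP, "%" + var + ".copy = alloca i32");
  insertAt(codeGenIP, "store i32 %" + var + ", i32* %" + var + ".copy");
}
```

With allocaIP pointing at the start of an entry block and codeGenIP at the start of a body block, the stack slot lands in the entry block while the store stays in the body. The sketch assumes the two points refer to different blocks; with a single shared point, every alloca would simply be emitted wherever the body happens to be.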

Is it because all alloca instructions should be together at the top of the function?

Yes, but as I mention above, not just that.

The langref for alloca did not have such a requirement.

It is a requirement for optimization and possibly the backend, not for correctness of the IR, which is what the langref is concerned with.

I have created another review without the alloca changes and that also works correctly for the nested parallel case.
https://reviews.llvm.org/D88720

I saw that this patch resolved the problem you were seeing. I am glad it did. It worked, because, as I and JDoerfert pointed out in the original patch, your problem had nothing to do with alloca locations but has everything to do with your outlined region not being completely contained within the entry and exit blocks of the parallel region. D88720 seems to have fixed that.

I believe all allocas in the LLVM dialect will also be in the entry block of the OpenMP operation. But these will be added only in the bodygen callback. So they will be added to omp.par.region (actually omp.par.region1, since a dummy branch is created). But all these are trivial (unconditional) branches; can't they be inlined into the entry block if required? See the example below:
{ ... code ...}

Sure, but what happens when you generate Copyin, for example? The allocas in the body are very likely going to end up after the copyin CFG structure, which means they are not going to be in the entry block. (To see what I mean, check the createcopyinblocks in the OMPBuilder.)
Furthermore, while there is the CFGSimplify pass, which should remove all extra branches, are you guaranteed to run it every time? What happens when you run the frontend with -O0?

Here is a question regarding allocas in the llvm dialect; by the time they are translated to llvm ir proper, are they already located in the entry block in the dialect itself, or are they located in different places but you have something like an AllocaIP where you move allocas to entry block during the translation to proper llvm IR?
What is FIR planning to do about their stack allocations? if they are not guaranteeing it's in the entry BB then that's an issue.
Is OMP-IR meant to also work alongside a future C/C++ dialect? If yes, then clang people are very unlikely to be OK with not having allocas in the entry block, and OMP-IR needs to follow the conventions of the relevant frontend when it comes to variable declarations, etc., regardless of what that frontend is.

Found that they should be in the entry block for optimizations. @fghanim is this what you are suggesting?
http://llvm.org/docs/Frontend/PerformanceTips.html#use-of-allocas

The mem2reg pass will look for all allocas in a function and move them into the entry block. However, when you run a frontend with -O0, we don't run any optimization passes. I remember working with LLVM passes that ignored allocas not in the entry block. So yes, it is about optimizations, but not only that. Many of the various backends take advantage of the fact that allocas are in the entry block to reserve stack space upon entry into a function. Unless you can guarantee that every one of those has a way to handle not having all allocas in the entry block, we should guarantee that for them, at which point it's a correctness issue.
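
The entry-block convention discussed here can be checked mechanically. The sketch below is a toy model, assuming a function is just an ordered list of blocks whose first element is the entry block and whose instructions are plain text; SketchBlock, isAllocaText and countAllocasOutsideEntry are hypothetical names, not LLVM API. For what it is worth, the Performance Tips page linked above states that SROA and mem2reg only attempt to eliminate allocas that are in the entry basic block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy basic block: a label and its instructions as text. Not an LLVM type.
struct SketchBlock {
  std::string label;
  std::vector<std::string> insts;
};

// True if the textual instruction looks like an alloca ("... = alloca ...").
inline bool isAllocaText(const std::string &inst) {
  return inst.find("= alloca ") != std::string::npos;
}

// Count allocas in any block other than the first (entry) block.
inline std::size_t
countAllocasOutsideEntry(const std::vector<SketchBlock> &fn) {
  std::size_t count = 0;
  for (std::size_t i = 1; i < fn.size(); ++i)
    for (const std::string &inst : fn[i].insts)
      if (isAllocaText(inst))
        ++count;
  return count;
}
```

Applied to a function shaped like the outlined example earlier in the thread, an alloca emitted into omp.par.region1 would be counted, while %tid.addr.local in omp.par.entry would not. A frontend that wants the guarantee at -O0 could run a check of this kind in its own tests.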

To be clear, I am NOT against keeping track of alloca insertion points in the OMPBuilder if there is a reason to, as can be seen in D82470, where I suggested multiple other ways to do so. However, I have a huge problem with creating a special IRBuilder just to do so.
I do have a small preference towards not keeping state of anything that we don't need, just out of consistency with other IRBuilders (i.e. you tell an IRBuilder what you want it to do and where to do it, rather than it knowing that).

SouraVX added inline comments. Oct 4 2020, 11:02 PM

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
724	This must be unintentional! This is what asserted in the first place and is present in trunk.

In D88706#2309408, @fghanim wrote:

Here is a question regarding allocas in the llvm dialect; by the time they are translated to llvm ir proper, are they already located in the entry block in the dialect itself, or are they located in different places but you have something like an AllocaIP where you move allocas to entry block during the translation to proper llvm IR?
What is FIR planning to do about their stack allocations? if they are not guaranteeing it's in the entry BB then that's an issue.

AFAIK, in both of these cases we end up having the allocas in the entry block.
@schweitz can confirm this WRT FIR.

Overall I think this might be a phase ordering problem, i.e. we are outlining late (everything is already at the LLVM IR dialect level), so we've lost the insertionPoint abstraction.
Clang handles all this differently.

Thanks @fghanim for the detailed response. Very helpful.

As @SouraVX is suggesting, when FIR is created I believe there are entry blocks where allocas are placed. These would be converted to LLVM dialect allocas in MLIR. Since blocks are translated in topological order, the entry block would have been processed (converted to LLVM IR allocas) by the time we reach an OpenMP operation. So I believe a location in the entry block (last alloca, or first alloca) can be passed as the outer allocaIP. We can also structure the region inside the OpenMP operation to have an entry block and pass it to the OpenMP IRBuilder as the inner allocaIP.

fir.alloca ops should be hoisted to the entry block. Because Fortran is pass-by-reference, correctness will often simply require stack allocations. However, that said, in cases where alloca ops can be promoted to registers, they will be, although that is disabled at the moment.

In D88706#2313091, @schweitz wrote:

fir.alloca ops should be hoisted to the entry block. Because Fortran is pass-by-reference, correctness will often simply require stack allocations. However, that said, in cases where alloca ops can be promoted to registers, they will be, although that is disabled at the moment.

I guess this is "often" correct, but that is beside the point. The OpenMP-IR-Builder introduces the allocas in question and they cannot go into the function entry block. That is simply not sound.
The allocas need to be placed at the last alloca point provided by the OpenMP-IR-Builder, since those points will become the entry blocks of new functions, and those entry blocks might be executed by multiple threads, making "pass-by-reference" reuse unsound.

That said, I doubt that the allocas caused by "user Fortran code" can *always* go into the function entry either, though that is a discussion for another day.
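
The point about nested regions can be illustrated with a toy model in which every region that will be outlined has its own alloca point, and the innermost active one receives new allocas. This loosely mirrors the stack-like use of insertion point guards described in the patch comments, but RegionAllocas, AllocaPointStack and Scope are hypothetical names for this sketch only; they are not OpenMPIRBuilder API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: each region that will be outlined owns the list of allocas that
// belong in its future entry block. Not an LLVM or OpenMPIRBuilder type.
struct RegionAllocas {
  std::string region;
  std::vector<std::string> allocas;
};

class AllocaPointStack {
public:
  // RAII guard: pushes a region's alloca point, pops it when the scope ends.
  class Scope {
  public:
    Scope(AllocaPointStack &s, RegionAllocas &r) : stack(s) {
      stack.points.push_back(&r);
    }
    ~Scope() { stack.points.pop_back(); }
    Scope(const Scope &) = delete;
    Scope &operator=(const Scope &) = delete;

  private:
    AllocaPointStack &stack;
  };

  // Record an alloca at the innermost active alloca point.
  void emitAlloca(const std::string &name) {
    assert(!points.empty() && "no active alloca point");
    points.back()->allocas.push_back(name);
  }

private:
  std::vector<RegionAllocas *> points;
};
```

Opening a Scope for the enclosing function, then for an outer parallel region, then for an inner one routes each alloca to the region that is innermost at the time it is emitted, and leaving a scope restores the previous point. In this model an alloca emitted inside a parallel region never ends up in the enclosing function's list, which is the property the comment above asks for.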

kiranchandramohan retitled this revision from [OpenMP][MLIR] WIP : Fix for nested parallel region to [OpenMP][MLIR] WIP : Fix for AllocaIP. Nov 20 2020, 7:28 AM

Herald added subscribers: teijeong, rdzhabarov. Nov 20 2020, 7:28 AM

kiranchandramohan mentioned this in D87247: [MLIR,OpenMP] Added support for lowering MasterOp to LLVMIR. Nov 20 2020, 8:36 AM

ftynse resigned from this revision. Aug 27 2021, 12:17 AM

Herald added subscribers: wrengr, Chia-hungDuan, dcaballe, cota. Aug 27 2021, 12:17 AM

No plans to pursue this.

Herald added a project: Restricted Project. Mar 17 2022, 5:18 AM

Herald added subscribers: awarzynski, sdasgup3, wenzhicui.

Revision Contents

Path                                                   Size
llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h       6 lines
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp              39 lines
llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp        198 lines
mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h    4 lines
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp           33 lines
mlir/test/Target/openmp-llvm.mlir                      23 lines

Diff 295680

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h

... 22 unchanged lines omitted ...

 /// An interface to create LLVM-IR for OpenMP directives.
 ///
 /// Each OpenMP directive has a corresponding public generator method.
 class OpenMPIRBuilder {
 public:
   /// Create a new OpenMPIRBuilder operating on the given module \p M. This will
   /// not have an effect on \p M (see initialize).
-  OpenMPIRBuilder(Module &M) : M(M), Builder(M.getContext()) {}
+  OpenMPIRBuilder(Module &M)
+      : M(M), Builder(M.getContext()), AllocaBuilder(M.getContext()) {}

   /// Initialize the internal state, this will put structures types and
   /// potentially other helpers into the underlying module. Must be called
   /// before any other method and only once!
   void initialize();

   /// Finalize the underlying module, e.g., by outlining regions.
   void finalize();

... 238 unchanged lines omitted (context: public:) ...

   Value *getOrCreateThreadID(Value *Ident);

   /// The underlying LLVM-IR module
   Module &M;

   /// The LLVM-IR Builder used to create IR.
   IRBuilder<> Builder;

+  /// The LLVM-IR Builder used to create alloca instructions.
+  IRBuilder<> AllocaBuilder;
+
   /// Map to remember source location strings
   StringMap<Constant *> SrcLocStrMap;

   /// Map to remember existing ident_t*.
   DenseMap<std::pair<Constant *, uint64_t>, Value *> IdentMap;

   /// Helper that contains information about regions we need to outline
   /// during finalization.

... 236 unchanged lines omitted ...

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

... 443 unchanged lines omitted (context: IRBuilder<>::InsertPoint OpenMPIRBuilder::CreateParallel) ...

   BasicBlock *InsertBB = Builder.GetInsertBlock();
   Function *OuterFn = InsertBB->getParent();

   // Vector to remember instructions we used only during the modeling but which
   // we want to delete at the end.
   SmallVector<Instruction *, 4> ToBeDeleted;

-  Builder.SetInsertPoint(OuterFn->getEntryBlock().getFirstNonPHI());
-  AllocaInst *TIDAddr = Builder.CreateAlloca(Int32, nullptr, "tid.addr");
-  AllocaInst *ZeroAddr = Builder.CreateAlloca(Int32, nullptr, "zero.addr");
+  // The alloca builder is managed internally basically like a stack. The
+  // insertion point guards keep the old top value alive while we update it for
+  // the body.
+  //
+  // TODO: We now have an internal AllocaBuilder and the AllocaIP in the
+  // callback, one might suffice.
+  IRBuilder<>::InsertPointGuard AIPG(AllocaBuilder);
+
+  // For the first outermost region we need to initialize the alloca builder.
+  if (!AllocaBuilder.GetInsertBlock())
+    AllocaBuilder.SetInsertPoint(OuterFn->getEntryBlock().getFirstNonPHI());
+
+  // Use the debug location of the pragma for alloca related code as well.
+  AllocaBuilder.SetCurrentDebugLocation(Loc.DL);
+
+  AllocaInst *TIDAddr = AllocaBuilder.CreateAlloca(Int32, nullptr, "tid.addr");
+  AllocaInst *ZeroAddr =
+      AllocaBuilder.CreateAlloca(Int32, nullptr, "zero.addr");

   // If there is an if condition we actually use the TIDAddr and ZeroAddr in the
   // program, otherwise we only need them for modeling purposes to get the
   // associated arguments in the outlined function. In the former case,
   // initialize the allocas properly, in the latter case, delete them later.
   if (IfCondition) {
-    Builder.CreateStore(Constant::getNullValue(Int32), TIDAddr);
-    Builder.CreateStore(Constant::getNullValue(Int32), ZeroAddr);
+    AllocaBuilder.CreateStore(Constant::getNullValue(Int32), TIDAddr);
+    AllocaBuilder.CreateStore(Constant::getNullValue(Int32), ZeroAddr);
   } else {
     ToBeDeleted.push_back(TIDAddr);
     ToBeDeleted.push_back(ZeroAddr);
   }

   // Create an artificial insertion point that will also ensure the blocks we
   // are about to split are not degenerated.
   auto *UI = new UnreachableInst(Builder.getContext(), InsertBB);

... 27 unchanged lines omitted (context: IRBuilder<>::InsertPoint OpenMPIRBuilder::CreateParallel) ...

   };

   FinalizationStack.push_back({FiniCBWrapper, OMPD_parallel, IsCancellable});

   // Generate the privatization allocas in the block that will become the entry
   // of the outlined function.
   InsertPointTy AllocaIP(PRegEntryBB,
                          PRegEntryBB->getTerminator()->getIterator());
-  Builder.restoreIP(AllocaIP);
+  AllocaBuilder.restoreIP(AllocaIP);
   AllocaInst *PrivTIDAddr =
-      Builder.CreateAlloca(Int32, nullptr, "tid.addr.local");
-  Instruction *PrivTID = Builder.CreateLoad(PrivTIDAddr, "tid");
+      AllocaBuilder.CreateAlloca(Int32, nullptr, "tid.addr.local");
+  Instruction *PrivTID = AllocaBuilder.CreateLoad(PrivTIDAddr, "tid");

   // Add some fake uses for OpenMP provided arguments.
-  ToBeDeleted.push_back(Builder.CreateLoad(TIDAddr, "tid.addr.use"));
-  ToBeDeleted.push_back(Builder.CreateLoad(ZeroAddr, "zero.addr.use"));
+  ToBeDeleted.push_back(AllocaBuilder.CreateLoad(TIDAddr, "tid.addr.use"));
+  ToBeDeleted.push_back(AllocaBuilder.CreateLoad(ZeroAddr, "zero.addr.use"));

   //         ThenBB
   //           |
   //           V
   // PRegionEntryBB         <- Privatization allocas are placed here.
   //           |
   //           V
   // PRegionBodyBB          <- BodeGen is invoked here.

... 179 unchanged lines omitted (context: auto PrivHelper = [&](Value &V) {) ...

     for (Use *UPtr : Uses)
       UPtr->set(ReplacementValue);
   };

   for (Value *Input : Inputs) {
     LLVM_DEBUG(dbgs() << "Captured input: " << *Input << "\n");
     PrivHelper(*Input);
   }
+  LLVM_DEBUG({
+    for (Value *Output : Outputs)
+      LLVM_DEBUG(dbgs() << "Captured output: " << *Output << "\n");
+  });
   assert(Outputs.empty() &&
          "OpenMP outlining should not produce live-out values!");

   LLVM_DEBUG(dbgs() << "After privatization: " << *OuterFn << "\n");
   LLVM_DEBUG({
     for (auto *BB : Blocks)
       dbgs() << " PBR: " << BB->getName() << "\n";
   });

... 488 unchanged lines omitted ...

llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp

//===- llvm/unittest/IR/OpenMPIRBuilderTest.cpp - OpenMPIRBuilder tests ---===//		//===- llvm/unittest/IR/OpenMPIRBuilderTest.cpp - OpenMPIRBuilder tests ---===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		#include "llvm/Frontend/OpenMP/OMPConstants.h"
#include "llvm/Frontend/OpenMP/OMPIRBuilder.h"		#include "llvm/Frontend/OpenMP/OMPIRBuilder.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
		#include "llvm/IR/InstIterator.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/Frontend/OpenMP/OMPConstants.h"
#include "llvm/IR/Verifier.h"		#include "llvm/IR/Verifier.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "gtest/gtest.h"		#include "gtest/gtest.h"

using namespace llvm;		using namespace llvm;
using namespace omp;		using namespace omp;

namespace {		namespace {
▲ Show 20 Lines • Show All 372 Lines • ▼ Show 20 Lines	TEST_F(OpenMPIRBuilderTest, ParallelSimple) {
EXPECT_EQ(ForkCI->getNumArgOperands(), 4U);		EXPECT_EQ(ForkCI->getNumArgOperands(), 4U);
EXPECT_TRUE(isa<GlobalVariable>(ForkCI->getArgOperand(0)));		EXPECT_TRUE(isa<GlobalVariable>(ForkCI->getArgOperand(0)));
EXPECT_EQ(ForkCI->getArgOperand(1),		EXPECT_EQ(ForkCI->getArgOperand(1),
ConstantInt::get(Type::getInt32Ty(Ctx), 1U));		ConstantInt::get(Type::getInt32Ty(Ctx), 1U));
EXPECT_EQ(ForkCI->getArgOperand(2), Usr);		EXPECT_EQ(ForkCI->getArgOperand(2), Usr);
EXPECT_EQ(ForkCI->getArgOperand(3), F->arg_begin());		EXPECT_EQ(ForkCI->getArgOperand(3), F->arg_begin());
}		}

		TEST_F(OpenMPIRBuilderTest, ParallelNested) {
		using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
		OpenMPIRBuilder OMPBuilder(*M);
		OMPBuilder.initialize();
		F->setName("func");
		IRBuilder<> Builder(BB);

		OpenMPIRBuilder::LocationDescription Loc({Builder.saveIP(), DL});

		unsigned NumInnerBodiesGenerated = 0;
		unsigned NumOuterBodiesGenerated = 0;
		unsigned NumFinalizationPoints = 0;

		auto InnerBodyGenCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP,
		BasicBlock &ContinuationIP) {
		++NumInnerBodiesGenerated;
		};

		auto PrivCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP,
		Value &VPtr, Value *&ReplacementValue) -> InsertPointTy {
		// Trivial copy (=firstprivate).
		Builder.restoreIP(AllocaIP);
		Type *VTy = VPtr.getType()->getPointerElementType();
		Value *V = Builder.CreateLoad(VTy, &VPtr, VPtr.getName() + ".reload");
		ReplacementValue = Builder.CreateAlloca(VTy, 0, VPtr.getName() + ".copy");
		Builder.restoreIP(CodeGenIP);
		Builder.CreateStore(V, ReplacementValue);
		return CodeGenIP;
		};

		auto FiniCB = [&](InsertPointTy CodeGenIP) { ++NumFinalizationPoints; };

		auto OuterBodyGenCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP,
		BasicBlock &ContinuationIP) {
		++NumOuterBodiesGenerated;
		Builder.restoreIP(CodeGenIP);
		BasicBlock *CGBB = CodeGenIP.getBlock();
		BasicBlock NewBB = SplitBlock(CGBB, &CodeGenIP.getPoint());
		CGBB->getTerminator()->eraseFromParent();
		;

		IRBuilder<>::InsertPoint AfterIP = OMPBuilder.CreateParallel(
		InsertPointTy(CGBB, CGBB->end()), InnerBodyGenCB, PrivCB, FiniCB,
		nullptr, nullptr, OMP_PROC_BIND_default, false);

		Builder.restoreIP(AfterIP);
		Builder.CreateBr(NewBB);
		};

		IRBuilder<>::InsertPoint AfterIP =
		OMPBuilder.CreateParallel(Loc, OuterBodyGenCB, PrivCB, FiniCB, nullptr,
		nullptr, OMP_PROC_BIND_default, false);

		EXPECT_EQ(NumInnerBodiesGenerated, 1U);
		EXPECT_EQ(NumOuterBodiesGenerated, 1U);
		EXPECT_EQ(NumFinalizationPoints, 2U);

		Builder.restoreIP(AfterIP);
		Builder.CreateRetVoid();

		OMPBuilder.finalize();

		EXPECT_EQ(M->size(), 5U);
		for (Function &OutlinedFn : *M) {
		if (F == &OutlinedFn \|\| OutlinedFn.isDeclaration())
		continue;
		EXPECT_FALSE(verifyModule(*M, &errs()));
		EXPECT_TRUE(OutlinedFn.hasFnAttribute(Attribute::NoUnwind));
		EXPECT_TRUE(OutlinedFn.hasFnAttribute(Attribute::NoRecurse));
		EXPECT_TRUE(OutlinedFn.hasParamAttribute(0, Attribute::NoAlias));
		EXPECT_TRUE(OutlinedFn.hasParamAttribute(1, Attribute::NoAlias));

		EXPECT_TRUE(OutlinedFn.hasInternalLinkage());
		EXPECT_EQ(OutlinedFn.arg_size(), 2U);

		EXPECT_EQ(OutlinedFn.getNumUses(), 1U);
		User *Usr = OutlinedFn.user_back();
		ASSERT_TRUE(isa<ConstantExpr>(Usr));
		CallInst *ForkCI = dyn_cast<CallInst>(Usr->user_back());
		ASSERT_NE(ForkCI, nullptr);

		EXPECT_EQ(ForkCI->getCalledFunction()->getName(), "__kmpc_fork_call");
		EXPECT_EQ(ForkCI->getNumArgOperands(), 3U);
		EXPECT_TRUE(isa<GlobalVariable>(ForkCI->getArgOperand(0)));
		EXPECT_EQ(ForkCI->getArgOperand(1),
		ConstantInt::get(Type::getInt32Ty(Ctx), 0U));
		EXPECT_EQ(ForkCI->getArgOperand(2), Usr);
		}
		}

		TEST_F(OpenMPIRBuilderTest, ParallelNested2Inner) {
		using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
		OpenMPIRBuilder OMPBuilder(*M);
		OMPBuilder.initialize();
		F->setName("func");
		IRBuilder<> Builder(BB);

		OpenMPIRBuilder::LocationDescription Loc({Builder.saveIP(), DL});

		unsigned NumInnerBodiesGenerated = 0;
		unsigned NumOuterBodiesGenerated = 0;
		unsigned NumFinalizationPoints = 0;

		auto InnerBodyGenCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP,
		BasicBlock &ContinuationIP) {
		++NumInnerBodiesGenerated;
		};

		auto PrivCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP,
		Value &VPtr, Value *&ReplacementValue) -> InsertPointTy {
		// Trivial copy (=firstprivate).
		Builder.restoreIP(AllocaIP);
		Type *VTy = VPtr.getType()->getPointerElementType();
		Value *V = Builder.CreateLoad(VTy, &VPtr, VPtr.getName() + ".reload");
		ReplacementValue = Builder.CreateAlloca(VTy, 0, VPtr.getName() + ".copy");
		Builder.restoreIP(CodeGenIP);
		Builder.CreateStore(V, ReplacementValue);
		return CodeGenIP;
		};

		auto FiniCB = [&](InsertPointTy CodeGenIP) { ++NumFinalizationPoints; };

		auto OuterBodyGenCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP,
		BasicBlock &ContinuationIP) {
		++NumOuterBodiesGenerated;
		Builder.restoreIP(CodeGenIP);
		BasicBlock *CGBB = CodeGenIP.getBlock();
		BasicBlock NewBB1 = SplitBlock(CGBB, &CodeGenIP.getPoint());
		BasicBlock NewBB2 = SplitBlock(NewBB1, &NewBB1->getFirstInsertionPt());
		CGBB->getTerminator()->eraseFromParent();
		;
		NewBB1->getTerminator()->eraseFromParent();
		;

		IRBuilder<>::InsertPoint AfterIP1 = OMPBuilder.CreateParallel(
		InsertPointTy(CGBB, CGBB->end()), InnerBodyGenCB, PrivCB, FiniCB,
		nullptr, nullptr, OMP_PROC_BIND_default, false);

		Builder.restoreIP(AfterIP1);
		Builder.CreateBr(NewBB1);

		IRBuilder<>::InsertPoint AfterIP2 = OMPBuilder.CreateParallel(
		InsertPointTy(NewBB1, NewBB1->end()), InnerBodyGenCB, PrivCB, FiniCB,
		nullptr, nullptr, OMP_PROC_BIND_default, false);

		Builder.restoreIP(AfterIP2);
		Builder.CreateBr(NewBB2);
		};

		IRBuilder<>::InsertPoint AfterIP =
		OMPBuilder.CreateParallel(Loc, OuterBodyGenCB, PrivCB, FiniCB, nullptr,
		nullptr, OMP_PROC_BIND_default, false);

		EXPECT_EQ(NumInnerBodiesGenerated, 2U);
		EXPECT_EQ(NumOuterBodiesGenerated, 1U);
		EXPECT_EQ(NumFinalizationPoints, 3U);

		Builder.restoreIP(AfterIP);
		Builder.CreateRetVoid();

		OMPBuilder.finalize();

		EXPECT_EQ(M->size(), 6U);
		for (Function &OutlinedFn : *M) {
		if (F == &OutlinedFn \|\| OutlinedFn.isDeclaration())
		continue;
		EXPECT_FALSE(verifyModule(*M, &errs()));
		EXPECT_TRUE(OutlinedFn.hasFnAttribute(Attribute::NoUnwind));
		EXPECT_TRUE(OutlinedFn.hasFnAttribute(Attribute::NoRecurse));
		EXPECT_TRUE(OutlinedFn.hasParamAttribute(0, Attribute::NoAlias));
		EXPECT_TRUE(OutlinedFn.hasParamAttribute(1, Attribute::NoAlias));

		EXPECT_TRUE(OutlinedFn.hasInternalLinkage());
		EXPECT_EQ(OutlinedFn.arg_size(), 2U);

		unsigned NumAllocas = 0;
		for (Instruction &I : instructions(OutlinedFn))
		NumAllocas += isa<AllocaInst>(I);
		EXPECT_EQ(NumAllocas, 1U);

		EXPECT_EQ(OutlinedFn.getNumUses(), 1U);
		User *Usr = OutlinedFn.user_back();
		ASSERT_TRUE(isa<ConstantExpr>(Usr));
		CallInst *ForkCI = dyn_cast<CallInst>(Usr->user_back());
		ASSERT_NE(ForkCI, nullptr);

		EXPECT_EQ(ForkCI->getCalledFunction()->getName(), "__kmpc_fork_call");
		EXPECT_EQ(ForkCI->getNumArgOperands(), 3U);
		EXPECT_TRUE(isa<GlobalVariable>(ForkCI->getArgOperand(0)));
		EXPECT_EQ(ForkCI->getArgOperand(1),
		ConstantInt::get(Type::getInt32Ty(Ctx), 0U));
		EXPECT_EQ(ForkCI->getArgOperand(2), Usr);
		}
		}

TEST_F(OpenMPIRBuilderTest, ParallelIfCond) {		TEST_F(OpenMPIRBuilderTest, ParallelIfCond) {
using InsertPointTy = OpenMPIRBuilder::InsertPointTy;		using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
OpenMPIRBuilder OMPBuilder(*M);		OpenMPIRBuilder OMPBuilder(*M);
OMPBuilder.initialize();		OMPBuilder.initialize();
F->setName("func");		F->setName("func");
IRBuilder<> Builder(BB);		IRBuilder<> Builder(BB);

OpenMPIRBuilder::LocationDescription Loc({Builder.saveIP(), DL});		OpenMPIRBuilder::LocationDescription Loc({Builder.saveIP(), DL});
[485 lines hidden]

mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h

[121 lines hidden]	private:
std::unique_ptr<detail::DebugTranslation> debugTranslation;		std::unique_ptr<detail::DebugTranslation> debugTranslation;

/// Builder for LLVM IR generation of OpenMP constructs.		/// Builder for LLVM IR generation of OpenMP constructs.
std::unique_ptr<llvm::OpenMPIRBuilder> ompBuilder;		std::unique_ptr<llvm::OpenMPIRBuilder> ompBuilder;
/// Precomputed pointer to OpenMP dialect. Note this can be nullptr if the		/// Precomputed pointer to OpenMP dialect. Note this can be nullptr if the
/// OpenMP dialect hasn't been loaded (it is always loaded if there are OpenMP		/// OpenMP dialect hasn't been loaded (it is always loaded if there are OpenMP
/// operations in the module though).		/// operations in the module though).
const Dialect *ompDialect;		const Dialect *ompDialect;
		/// Stack which stores the target block to which a branch must be added when
		/// a terminator is seen. A stack is required to handle nested OpenMP parallel
		/// regions.
		llvm::SmallVector<llvm::BasicBlock *, 4> ompContinuationIPStack;

/// Mappings between llvm.mlir.global definitions and corresponding globals.		/// Mappings between llvm.mlir.global definitions and corresponding globals.
DenseMap<Operation *, llvm::GlobalValue *> globalsMapping;		DenseMap<Operation *, llvm::GlobalValue *> globalsMapping;

/// A stateful object used to translate types.		/// A stateful object used to translate types.
TypeToLLVMIRTranslator typeTranslator;		TypeToLLVMIRTranslator typeTranslator;

protected:		protected:
[10 lines hidden]
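
The `ompContinuationIPStack` comment in this header describes a stack of branch targets, one per enclosing parallel region. As a rough illustration only (not code from this patch), the sketch below models that idea with standard-library types. `Op`, `Translator`, and the `contN` labels are hypothetical stand-ins, and strings are recorded instead of building LLVM IR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an MLIR operation: a barrier, a terminator, or a
// parallel region that owns a list of nested operations.
struct Op {
  enum Kind { Barrier, Terminator, Parallel } kind;
  std::vector<Op> body; // only meaningful when kind == Parallel
};

// Hypothetical translator that records what it would emit as plain strings.
struct Translator {
  // Plays the role of ompContinuationIPStack: the innermost region's
  // continuation target is always at the back.
  std::vector<std::string> continuationStack;
  std::vector<std::string> emitted;
  int nextId = 0;

  void convertParallel(const Op &op) {
    // Each region gets its own continuation target (an assumed naming scheme).
    continuationStack.push_back("cont" + std::to_string(nextId++));
    for (const Op &inner : op.body)
      convert(inner);
    // Leaving the region restores the enclosing region's target.
    continuationStack.pop_back();
  }

  void convert(const Op &op) {
    switch (op.kind) {
    case Op::Barrier:
      emitted.push_back("barrier");
      break;
    case Op::Terminator:
      // A terminator branches to the continuation of the innermost region.
      assert(!continuationStack.empty() && "terminator outside a region");
      emitted.push_back("br " + continuationStack.back());
      break;
    case Op::Parallel:
      convertParallel(op);
      break;
    }
  }
};
```

For a nesting shaped like `test_omp_parallel_4` (barrier, inner parallel, barrier, terminator), calling `convert` on the outer op records the inner terminator as `br cont1` and the outer one as `br cont0`. A single saved target would not be enough here, because the outer target must be available again once the inner region has been left.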

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp

[391 lines hidden]	ModuleTranslation::convertOmpParallel(Operation &opInst,
using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;		using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;

auto bodyGenCB = [&](InsertPointTy allocaIP, InsertPointTy codeGenIP,		auto bodyGenCB = [&](InsertPointTy allocaIP, InsertPointTy codeGenIP,
llvm::BasicBlock &continuationIP) {		llvm::BasicBlock &continuationIP) {
llvm::LLVMContext &llvmContext = llvmModule->getContext();		llvm::LLVMContext &llvmContext = llvmModule->getContext();

llvm::BasicBlock *codeGenIPBB = codeGenIP.getBlock();		llvm::BasicBlock *codeGenIPBB = codeGenIP.getBlock();
llvm::Instruction *codeGenIPBBTI = codeGenIPBB->getTerminator();		llvm::Instruction *codeGenIPBBTI = codeGenIPBB->getTerminator();
		ompContinuationIPStack.push_back(&continuationIP);

builder.SetInsertPoint(codeGenIPBB);
// ParallelOp has only `1` region associated with it.		// ParallelOp has only `1` region associated with it.
auto &region = cast<omp::ParallelOp>(opInst).getRegion();		auto &region = cast<omp::ParallelOp>(opInst).getRegion();
for (auto &bb : region) {		for (auto &bb : region) {
auto *llvmBB = llvm::BasicBlock::Create(		auto *llvmBB = llvm::BasicBlock::Create(
llvmContext, "omp.par.region", codeGenIP.getBlock()->getParent());		llvmContext, "omp.par.region", codeGenIP.getBlock()->getParent());
blockMapping[&bb] = llvmBB;		blockMapping[&bb] = llvmBB;
}		}

// Then, convert blocks one by one in topological order to ensure		// Then, convert blocks one by one in topological order to ensure
// defs are converted before uses.		// defs are converted before uses.
llvm::SetVector<Block *> blocks = topologicalSort(region);		llvm::SetVector<Block *> blocks = topologicalSort(region);
for (auto indexedBB : llvm::enumerate(blocks)) {		for (auto indexedBB : llvm::enumerate(blocks)) {
Block *bb = indexedBB.value();		Block *bb = indexedBB.value();
llvm::BasicBlock *curLLVMBB = blockMapping[bb];		llvm::BasicBlock *curLLVMBB = blockMapping[bb];
if (bb->isEntryBlock())		if (bb->isEntryBlock()) {
		assert(codeGenIPBBTI->getNumSuccessors() == 1 &&
		"OpenMPIRBuilder provided entry block has multiple successors");
codeGenIPBBTI->setSuccessor(0, curLLVMBB);		codeGenIPBBTI->setSuccessor(0, curLLVMBB);
		}

// TODO: Error not returned up the hierarchy		// TODO: Error not returned up the hierarchy
if (failed(convertBlock(bb, /*ignoreArguments=*/indexedBB.index() == 0)))		if (failed(convertBlock(bb, /*ignoreArguments=*/indexedBB.index() == 0)))
return;		return;

// If this block has the terminator then add a jump to
// continuation bb
for (auto &op : *bb) {
if (isa<omp::TerminatorOp>(op)) {
builder.SetInsertPoint(curLLVMBB);
builder.CreateBr(&continuationIP);
}
}
}		}

		ompContinuationIPStack.pop_back();

// Finally, after all blocks have been traversed and values mapped,		// Finally, after all blocks have been traversed and values mapped,
// connect the PHI nodes to the results of preceding blocks.		// connect the PHI nodes to the results of preceding blocks.
connectPHINodes(region, valueMapping, blockMapping);		connectPHINodes(region, valueMapping, blockMapping);
};		};

// TODO: Perform appropriate actions according to the data-sharing		// TODO: Perform appropriate actions according to the data-sharing
// attribute (shared, private, firstprivate, ...) of variables.		// attribute (shared, private, firstprivate, ...) of variables.
// Currently defaults to shared.		// Currently defaults to shared.
[18 lines hidden]	ModuleTranslation::convertOmpParallel(Operation &opInst,
llvm::omp::ProcBindKind pbKind = llvm::omp::OMP_PROC_BIND_default;		llvm::omp::ProcBindKind pbKind = llvm::omp::OMP_PROC_BIND_default;
if (auto bind = cast<omp::ParallelOp>(opInst).proc_bind_val())		if (auto bind = cast<omp::ParallelOp>(opInst).proc_bind_val())
pbKind = llvm::omp::getProcBindKind(bind.getValue());		pbKind = llvm::omp::getProcBindKind(bind.getValue());
// TODO: Is the Parallel construct cancellable?		// TODO: Is the Parallel construct cancellable?
bool isCancellable = false;		bool isCancellable = false;
// TODO: Determine the actual alloca insertion point, e.g., the function		// TODO: Determine the actual alloca insertion point, e.g., the function
// entry or the alloca insertion point as provided by the body callback		// entry or the alloca insertion point as provided by the body callback
// above.		// above.
llvm::OpenMPIRBuilder::InsertPointTy allocaIP(builder.saveIP());		// llvm::OpenMPIRBuilder::InsertPointTy allocaIP(builder.saveIP());
builder.restoreIP(		builder.restoreIP(ompBuilder->CreateParallel(builder, bodyGenCB, privCB,
ompBuilder->CreateParallel(builder, allocaIP, bodyGenCB, privCB, finiCB,		finiCB, ifCond, numThreads,
ifCond, numThreads, pbKind, isCancellable));		pbKind, isCancellable));
return success();		return success();
}		}

/// Given an OpenMP MLIR operation, create the corresponding LLVM IR		/// Given an OpenMP MLIR operation, create the corresponding LLVM IR
/// (including OpenMP runtime calls).		/// (including OpenMP runtime calls).
LogicalResult		LogicalResult
ModuleTranslation::convertOmpOperation(Operation &opInst,		ModuleTranslation::convertOmpOperation(Operation &opInst,
llvm::IRBuilder<> &builder) {		llvm::IRBuilder<> &builder) {
[21 lines hidden]	return llvm::TypeSwitch<Operation *, LogicalResult>(&opInst)
// "An implementation may implement a flush with a list by ignoring		// "An implementation may implement a flush with a list by ignoring
// the list, and treating it the same as a flush without a list."		// the list, and treating it the same as a flush without a list."
//		//
// The argument list is discarded so that, flush with a list is treated		// The argument list is discarded so that, flush with a list is treated
// same as a flush without a list.		// same as a flush without a list.
ompBuilder->CreateFlush(builder.saveIP());		ompBuilder->CreateFlush(builder.saveIP());
return success();		return success();
})		})
.Case([&](omp::TerminatorOp) { return success(); })		.Case([&](omp::TerminatorOp) {
		llvm::BranchInst::Create(ompContinuationIPStack.back(),
		builder.GetInsertBlock());
		return success();
		})
.Case(		.Case(
[&](omp::ParallelOp) { return convertOmpParallel(opInst, builder); })		[&](omp::ParallelOp) { return convertOmpParallel(opInst, builder); })
.Default([&](Operation *inst) {		.Default([&](Operation *inst) {
return inst->emitError("unsupported OpenMP operation: ")		return inst->emitError("unsupported OpenMP operation: ")
<< inst->getName();		<< inst->getName();
});		});
}		}
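
The comment in `convertOmpParallel` says blocks are converted in topological order so that definitions are converted before their uses. One common way to obtain such an order is a reverse post-order walk from the entry block. The sketch below shows that technique on a plain adjacency list. It is an assumption for illustration and is not claimed to match what `topologicalSort` does in MLIR; `Cfg` and `reversePostOrder` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical CFG: cfg[i] lists the successor block indices of block i.
using Cfg = std::vector<std::vector<int>>;

// Depth-first walk that appends a block only after all of its unvisited
// successors have been appended (post-order).
static void postOrder(const Cfg &cfg, int block, std::vector<bool> &seen,
                      std::vector<int> &out) {
  seen[block] = true;
  for (int succ : cfg[block])
    if (!seen[succ])
      postOrder(cfg, succ, seen, out);
  out.push_back(block);
}

// Reversing the post-order puts the entry block first. In an acyclic CFG
// every block then appears after all of its predecessors.
std::vector<int> reversePostOrder(const Cfg &cfg, int entry) {
  std::vector<bool> seen(cfg.size(), false);
  std::vector<int> order;
  postOrder(cfg, entry, seen, order);
  std::reverse(order.begin(), order.end());
  return order;
}
```

On a diamond-shaped CFG (`0 -> {1, 2}`, `1 -> 3`, `2 -> 3`) the entry block comes first and the join block last, so a value defined in block 0 is already mapped by the time any block that uses it is converted.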

[445 lines hidden]

mlir/test/Target/openmp-llvm.mlir

[208 lines hidden]	llvm.func @test_omp_parallel_3() -> () {
}		}

llvm.return		llvm.return
}		}

// CHECK: define internal void @[[OMP_OUTLINED_FN_3_3]]		// CHECK: define internal void @[[OMP_OUTLINED_FN_3_3]]
// CHECK: define internal void @[[OMP_OUTLINED_FN_3_2]]		// CHECK: define internal void @[[OMP_OUTLINED_FN_3_2]]
// CHECK: define internal void @[[OMP_OUTLINED_FN_3_1]]		// CHECK: define internal void @[[OMP_OUTLINED_FN_3_1]]

		// CHECK-LABEL: define void @test_omp_parallel_4()
		llvm.func @test_omp_parallel_4() -> () {
		// CHECK: call void {{.*}}@__kmpc_fork_call{{.*}} @[[OMP_OUTLINED_FN_4_1:.*]] to
		// CHECK: define internal void @[[OMP_OUTLINED_FN_4_1]]
		// CHECK: call void @__kmpc_barrier
		// CHECK: call void {{.*}}@__kmpc_fork_call{{.*}} @[[OMP_OUTLINED_FN_4_1_1:.*]] to
		// CHECK: call void @__kmpc_barrier
		omp.parallel {
		omp.barrier

		// CHECK: define internal void @[[OMP_OUTLINED_FN_4_1_1]]
		// CHECK: call void @__kmpc_barrier
		omp.parallel {
		omp.barrier
		omp.terminator
		}

		omp.barrier
		omp.terminator
		}
		llvm.return
		}


Revision Contents

Diff 295680
