This is an archive of the discontinued LLVM Phabricator instance.

Differential D101307

[mlir] OpenMP-to-LLVM: properly set outer alloca insertion point
ClosedPublic

Authored by ftynse on Apr 26 2021, 10:00 AM.

Details

Reviewers

jdoerfert
kiranchandramohan
wsmoses
chelini
kumasento

Commits

rG72d013dd73f4: [mlir] OpenMP-to-LLVM: properly set outer alloca insertion point

Summary

Previously, the OpenMP to LLVM IR conversion set the alloca insertion
point to the same position as the main computation when converting OpenMP
parallel operations. This is problematic if, for example, the parallel
operation is placed inside a loop: it would keep allocating on the stack on
each iteration, leading to stack overflow.
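
To illustrate the problem described in the summary, here is a minimal sketch in plain C++ (not LLVM code). `ToyFrame`, `stackUseAllocaInLoop` and `stackUseAllocaInEntry` are hypothetical names. The model assumes, as in LLVM IR, that memory obtained with `alloca` is released only when the function returns. The two slots stand for the `tid.addr` and `zero.addr` allocas that the test in this revision checks for.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model, not LLVM: the frame grows whenever an "alloca"
// executes and is only released when the function returns.
struct ToyFrame {
  std::size_t bytesAllocated = 0;
  void allocaI32() { bytesAllocated += 4; } // reserve one i32 slot
};

// Allocas emitted next to the parallel region: they sit in the loop body,
// so they execute again on every iteration.
std::size_t stackUseAllocaInLoop(std::size_t iterations) {
  ToyFrame frame;
  for (std::size_t i = 0; i < iterations; ++i) {
    frame.allocaI32(); // stands for tid.addr
    frame.allocaI32(); // stands for zero.addr
  }
  return frame.bytesAllocated;
}

// Allocas hoisted to the function entry block: they execute once and the
// loop body reuses the same two slots.
std::size_t stackUseAllocaInEntry(std::size_t iterations) {
  ToyFrame frame;
  frame.allocaI32(); // stands for tid.addr
  frame.allocaI32(); // stands for zero.addr
  for (std::size_t i = 0; i < iterations; ++i) {
    (void)i; // the body only reads and writes the existing slots
  }
  assert(frame.bytesAllocated == 8 && "entry-block allocas run once");
  return frame.bytesAllocated;
}
```

In this toy model, 1000 iterations reserve 8000 bytes when the allocas sit in the loop and 8 bytes when they sit in the entry block.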

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ftynse created this revision. Apr 26 2021, 10:00 AM

Herald added subscribers: dcaballe, cota, teijeong and 18 others. Apr 26 2021, 10:00 AM

ftynse requested review of this revision. Apr 26 2021, 10:00 AM

Herald added a reviewer: jdoerfert. Apr 26 2021, 10:00 AM

Herald added a project: Restricted Project.

Herald added subscribers: sstefan1, stephenneuendorffer, nicolasvasilache.

ftynse added a reviewer: kiranchandramohan. Apr 26 2021, 10:01 AM

ftynse added reviewers: wsmoses, chelini, kumasento.

Thanks @ftynse for this patch.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
130	I was planning to ask a few questions in this area. Will this getParent work correctly with nested parallelism when the if clause is on the nested parallel operation? Should we be marking the omp.parallel operation with some of the function attributes like Function-Like, AutomaticAllocationScope?
	llvm.func fn() {
	  omp.parallel {
	    omp.parallel if() {
	    }
	  }
	}
mlir/test/Target/LLVMIR/openmp-llvm.mlir
155	Nit: funciton -> function
158–159	Is this enough to check that it is in the entry? Or should there be a check for a subsequent block name or number?

kiranchandramohan added inline comments. Apr 26 2021, 10:47 AM

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
130	And if the above does work correctly, consider the following.
	llvm.func fn() {
	  omp.parallel {
	    omp.master { // <- The region of this operation will not be outlined.
	      omp.parallel if() {
	      }
	    }
	  }
	}

kumasento added inline comments. Apr 27 2021, 2:45 PM

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
130	Not familiar with this field, but I was thinking about a case similar to (or not) above: what would happen if `getParent()` doesn't give a function?
140	It seems to me that if `bodyGenStatus` was failed earlier than `restoreIP`, the old logic will just abort but this new version will still do the restore. Would this cause any issue, or was the old logic actually problematic?

I skimmed this and if I get it right this won't work reliably.
The OMPIRBuilder provides an alloca insertion point as part of the body code generation callback, that is the one to use.
The user needs to keep a stack for all alloca insertion points and push/pop them as the body callbacks are entered/left.
The same has to happen for other outlined operators, I'm imagining fir.concurrent.do or something like that.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
92	^^ allocaIP is not used but should be.
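
The push/pop discipline suggested above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not the code that landed: `InsertPoint` is a hypothetical stand-in for `llvm::OpenMPIRBuilder::InsertPointTy`, and `AllocaIPStack`, `AllocaIPScope`, `lowerNestedParallel` and the block names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::OpenMPIRBuilder::InsertPointTy.
struct InsertPoint {
  std::string block;
};

// Stack of alloca insertion points; the top entry is where allocas go.
class AllocaIPStack {
public:
  void push(InsertPoint ip) { stack.push_back(std::move(ip)); }
  void pop() {
    assert(!stack.empty() && "pop without matching push");
    stack.pop_back();
  }
  const InsertPoint &current() const {
    assert(!stack.empty() && "no alloca insertion point available");
    return stack.back();
  }
  std::size_t depth() const { return stack.size(); }

private:
  std::vector<InsertPoint> stack;
};

// RAII helper: pushes on entry to a function or body callback and pops
// when that scope is left.
class AllocaIPScope {
public:
  AllocaIPScope(AllocaIPStack &s, InsertPoint ip) : stack(s) {
    stack.push(std::move(ip));
  }
  ~AllocaIPScope() { stack.pop(); }
  AllocaIPScope(const AllocaIPScope &) = delete;
  AllocaIPScope &operator=(const AllocaIPScope &) = delete;

private:
  AllocaIPStack &stack;
};

// Simulates lowering a function that contains a parallel region with a
// nested parallel region, recording which insertion point each one sees.
std::vector<std::string> lowerNestedParallel(AllocaIPStack &stack) {
  std::vector<std::string> seen;
  AllocaIPScope fnScope(stack, InsertPoint{"fn.entry"});
  seen.push_back(stack.current().block); // outermost parallel: function entry
  {
    // Entering the body callback: the builder hands us a new alloca IP.
    AllocaIPScope bodyScope(stack, InsertPoint{"outlined.entry"});
    seen.push_back(stack.current().block); // nested parallel: body's alloca IP
  }
  seen.push_back(stack.current().block); // after the body: function entry again
  return seen;
}
```

Calling `lowerNestedParallel` on an empty stack records `fn.entry`, `outlined.entry`, `fn.entry` and leaves the stack empty again.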

In D101307#2721716, @jdoerfert wrote:

The OMPIRBuilder provides an alloca insertion point as part of the body code generation callback, that is the one to use.

Let me clarify, the alloca insertion point given to the body generation callback should be used for further calls _inside_ the body, right? The outermost, e.g. createParallel, still needs an alloca insertion point, which should normally be the function entry block.

The user needs to keep a stack for all alloca insertion points and push/pop them as the body callbacks are entered/left.

In MLIR, this will be the actual call stack of nested calls to convertOmpParallel, but yes, it needs to carry the insertion point as extra argument.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
130	@kiranchandramohan have we added the support for nested `omp.parallel` yet? Where should we place allocations in the nested case, in the AllocaIP of the body of the surrounding parallel operation? I can just condition this new behavior to only work for the outermost `omp.parallel` because that's the case I am hitting right now...
	> Should we be marking the omp.parallel operation with some of the function attributes like Function-Like, AutomaticAllocationScope?
	Why? It's definitely not `FunctionLike` (it isn't a symbol, there are no arguments and results, etc.). `AutomaticAllocationScope` is useless at this point, we don't have any special handling for such operations. @shabalin was trying to add more generic allocation scoping, but was blocked on discussions.
	> Not familiar with this field, but I was thinking about a case similar to (or not) above: what would happen if getParent() doesn't give a function?
	@kumasento, it's LLVM IR, the parent of a block is always a function.
140	It was wrong before. `bodyGenStatus` can only be modified inside `createParallel` that actually calls `bodyGenCB`, but the code was checking it _before_ and ignoring it after the modification.
mlir/test/Target/LLVMIR/openmp-llvm.mlir
158–159	We are checking that these are placed before the `icmp` which is the first instruction in the function. Should be enough, and there is no block number of the entry block and no block separation after this.
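
The `bodyGenStatus` ordering issue discussed for line 140 can be reproduced in a few lines of plain C++. This is a hedged sketch: `createParallelLike`, `convertOldOrder` and `convertNewOrder` are hypothetical names, and a `bool` stands in for `LogicalResult`. The only property taken from the discussion is that the status can change solely while the callback runs.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for createParallel: it is the only place that
// invokes the body generation callback.
void createParallelLike(const std::function<void()> &bodyGenCB) {
  assert(bodyGenCB && "callback must be set");
  bodyGenCB();
}

// Old ordering: the captured status is inspected before the callback has
// had a chance to modify it, and is ignored afterwards.
bool convertOldOrder(bool bodyFails) {
  bool bodyGenStatus = true;
  auto bodyGenCB = [&] {
    if (bodyFails)
      bodyGenStatus = false;
  };
  if (!bodyGenStatus) // never taken: nothing has run yet
    return false;
  createParallelLike(bodyGenCB);
  return true; // a failure inside the callback is lost
}

// New ordering: run the callback first, then report what it recorded.
bool convertNewOrder(bool bodyFails) {
  bool bodyGenStatus = true;
  auto bodyGenCB = [&] {
    if (bodyFails)
      bodyGenStatus = false;
  };
  createParallelLike(bodyGenCB);
  return bodyGenStatus;
}
```

With a failing body, `convertOldOrder(true)` still reports success while `convertNewOrder(true)` reports the failure.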

kiranchandramohan added inline comments. Apr 28 2021, 7:24 AM

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
130	Yes, nested parallel operations work. We have tests also. But the allocaIP issue is not solved. https://github.com/llvm/llvm-project/blob/3d974ac9fc489ac3fec194f324be55e42d1ea4fa/mlir/test/Target/LLVMIR/openmp-llvm.mlir#L230
	> I can just condition this new behavior to only work for the outermost omp.parallel because that's the case I am hitting right now...
	This is OK with me for now.
	> Where should we place allocations in the nested case, in the AllocaIP of the body of the surrounding parallel operation?
	> Why? It's definitely not FunctionLike (it isn't a symbol, there are no arguments and results, etc.). AutomaticAllocationScope is useless at this point, we don't have any special handling for such operations. @shabalin was trying to add more generic allocation scoping, but was blocked on discussions.
	In the OpenMP dialect (and possibly others like FIR) we have to distinguish operations with regions which will be outlined (like parallel, task) and which will not (like master, single) be outlined. The operations which will be outlined have some properties similar to functions, like having an entry-block for allocas and finally existing as a function in LLVM IR. Ideally, I want a function (getAllocaBlock) which, when called, will traverse up through the nested operations and get me the entry-block of the operation which will be outlined or, if there is no operation like that, the function entry-block.

In D101307#2722384, @ftynse wrote:

In D101307#2721716, @jdoerfert wrote:

The OMPIRBuilder provides an alloca insertion point as part of the body code generation callback, that is the one to use.

Let me clarify, the alloca insertion point given to the body generation callback should be used for further calls _inside_ the body, right? The outermost, e.g. createParallel, still needs an alloca insertion point, which should normally be the function entry block.

The user needs to keep a stack for all alloca insertion points and push/pop them as the body callbacks are entered/left.

In MLIR, this will be the actual call stack of nested calls to convertOmpParallel, but yes, it needs to carry the insertion point as extra argument.

If I remember correctly there are two allocaIPs, outerAllocaIP which is passed to the createParallel function of the OpenMPIRBuilder and the innerAllocaIP which is the allocaIP of the outlined region which the OpenMPIRBuilder creates. The BodyGenCallback function is called with the innerAllocaIP as argument.
You are correct that for the outermost OpenMP Operation it is the function entry block that should be the allocaIP. But for any nested Operations it should be the allocaIP provided by the BodyGenCallback that should be the AllocaIP.

We also need some changes so that there is only a single allocaBlock. I believe we need to do one of the following:

- Set the block corresponding to the entry-block of the omp.parallel operation as the innerAllocaIP provided by the OpenMPIRBuilder in the bodyGenCallBack. https://github.com/llvm/llvm-project/blob/ab5823867c4aee7f3e02ddfaa217905c87471bf9/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp#L40
- Reserve the entry-block of the omp.parallel operation in MLIR for allocas and pass it to the OpenMPIRBuilder in the createParallel function, and change the OpenMPIRBuilder to work with this as the inner-alloca entry block of the outlined region.

In D101307#2722384, @ftynse wrote:

In D101307#2721716, @jdoerfert wrote:

The OMPIRBuilder provides an alloca insertion point as part of the body code generation callback, that is the one to use.

Let me clarify, the alloca insertion point given to the body generation callback should be used for further calls _inside_ the body, right? The outermost, e.g. createParallel, still needs an alloca insertion point, which should normally be the function entry block.

Right. I was expecting that lowering would keep a stack: each new function pushes the entry block, and each time an operator gives you a new alloca-IP you push/pop it appropriately.

The user needs to keep a stack for all alloca insertion points and push/pop them as the body callbacks are entered/left.

In MLIR, this will be the actual call stack of nested calls to convertOmpParallel, but yes, it needs to carry the insertion point as extra argument.

I assume some extra state is preferable as there might be various places, called in arbitrary nesting, that need to push/pop an alloca-IP. If you prefer an argument passed through all of lowering instead of a "global stack", that should work too. However, the alloca-IP will probably influence, and be influenced by, non-OpenMP operators too.

I can just condition this new behavior to only work for the outermost omp.parallel because that's the case I am hitting right now...

This is OK with me for now.

Just re-iterating that special casing this only for the outermost omp.parallel is fine with me.

This revision now requires changes to proceed. May 5 2021, 6:12 AM

Handle nested scopes.

In D101307#2723445, @jdoerfert wrote:

In D101307#2722384, @ftynse wrote:

In D101307#2721716, @jdoerfert wrote:

The OMPIRBuilder provides an alloca insertion point as part of the body code generation callback, that is the one to use.

Let me clarify, the alloca insertion point given to the body generation callback should be used for further calls _inside_ the body, right? The outermost, e.g. createParallel, still needs an alloca insertion point, which should normally be the function entry block.

Right. I was expecting that lowering would keep a stack: each new function pushes the entry block, and each time an operator gives you a new alloca-IP you push/pop it appropriately.

The user needs to keep a stack for all alloca insertion points and push/pop them as the body callbacks are entered/left.

In MLIR, this will be the actual call stack of nested calls to convertOmpParallel, but yes, it needs to carry the insertion point as extra argument.

I assume some extra state is preferable as there might be various places, called in arbitrary nesting, that need to push/pop an alloca-IP. If you prefer an argument passed through all of lowering instead of a "global stack", that should work too. However, the alloca-IP will probably influence, and be influenced by, non-OpenMP operators too.

Yeah, the trick is to do this in a sufficiently modular way for MLIR. That is, the main translation engine must not care about OpenMP specifically. And the rest of the dialects are not concerned with regions (FWIW, I would have required regions to be flattened and the OpenMP dialect to be translated to the LLVM dialect had there not been the argument of code reuse with OpenMPIRBuilder) or alloca insertion points; all that should happen before translation. I defined a globally visible stack. It is currently only used by the OpenMP dialect, but if others need it, we can factor out the OpenMP stack frame class into a common alloca insertion point stack frame class.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
130	> Not familiar with this field, but I was thinking about a case similar to (or not) above: what would happen if `getParent()` doesn't give a function?
	It's LLVM IR, getParent on a Block is always a function.
130	> Yes, nested parallel operations work. We have tests also. But the allocaIP issue is not solved.
	Now it is.
	> In the OpenMP dialect (and possibly others like FIR) we have to distinguish operations with regions which will be outlined (like parallel, task) and which will not (like master, single) be outlined. The operations which will be outlined have some properties similar to functions, like having an entry-block for allocas and finally existing as a function in LLVM IR.
	This sounds like you want an interface for outlinable OpenMP dialect ops. That is perfectly doable, it just isn't FunctionLike but a different interface.
	> Ideally, I want a function (getAllocaBlock) which if I call will traverse up through the nested operations and get me the entry-block of the operation which will be outlined or if there is no operation like that then the function entry-block.
	This can be done with the interface above. Just go up the region tree until you hit an op that is either OutlineableOpenMPOp or an LLVMFuncOp, and take the first block of the first region. For translation purposes, this isn't necessary though. We can keep the stack as @jdoerfert suggested.
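
The upward walk described for `getAllocaBlock` can be sketched on a toy operation tree. This is a simplified model in plain C++ rather than MLIR: `ToyOp`, `OpKind`, `getAllocaScope` and `allocaScopeName` are hypothetical, with `OpKind::Outlineable` standing for an op implementing the proposed outlineable interface and `OpKind::Func` for an `LLVMFuncOp`.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of operations in a toy region tree.
enum class OpKind { Func, Outlineable, Other };

struct ToyOp {
  std::string name;
  OpKind kind;
  const ToyOp *parent; // enclosing operation, nullptr at the top level
};

// Walks up from the parent of `op` until it reaches an operation that owns
// an alloca entry block: either an outlineable op or the function itself.
const ToyOp *getAllocaScope(const ToyOp &op) {
  for (const ToyOp *cur = op.parent; cur != nullptr; cur = cur->parent)
    if (cur->kind == OpKind::Func || cur->kind == OpKind::Outlineable)
      return cur;
  return nullptr; // `op` is not nested in any function
}

// Convenience wrapper for ops that are known to live inside a function.
std::string allocaScopeName(const ToyOp &op) {
  const ToyOp *scope = getAllocaScope(op);
  assert(scope != nullptr && "operation must be nested in a function");
  return scope->name;
}
```

For the nesting from the earlier inline comment (`llvm.func` containing `omp.parallel`, containing `omp.master`, containing a second `omp.parallel`), the inner parallel resolves to the outer parallel, skipping `omp.master`, and the outer parallel resolves to the function.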

In D101307#2738760, @kiranchandramohan wrote:

I can just condition this new behavior to only work for the outermost omp.parallel because that's the case I am hitting right now...

This is OK with me for now.

Just re-iterating that special casing this only for the outermost omp.parallel is fine with me.

Sorry, I was busy with other things.

Nits.

Harbormaster completed remote builds in B103187: Diff 343663. May 7 2021, 6:48 AM

LGTM. This looks great. More general than what I thought it would be.

mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
194 ↗	(On Diff #343663)	Nit: spelling operations, ModuleTranslation.

This revision is now accepted and ready to land. May 9 2021, 2:22 PM

Address review.

This revision was landed with ongoing or failed builds. May 10 2021, 1:05 AM

Closed by commit rG72d013dd73f4: [mlir] OpenMP-to-LLVM: properly set outer alloca insertion point (authored by ftynse).

This revision was automatically updated to reflect the committed changes.

ftynse added a commit: rG72d013dd73f4: [mlir] OpenMP-to-LLVM: properly set outer alloca insertion point.

Harbormaster completed remote builds in B103442: Diff 343990. May 10 2021, 1:35 AM

Revision Contents

Path	Size
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp	16 lines
mlir/test/Target/LLVMIR/openmp-llvm.mlir	7 lines

Diff 340573

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp

[83 lines not shown]
 static LogicalResult
 convertOmpParallel(Operation &opInst, llvm::IRBuilderBase &builder,
                    LLVM::ModuleTranslation &moduleTranslation) {
   using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;
   // TODO: support error propagation in OpenMPIRBuilder and use it instead of
   // relying on captured variables.
   LogicalResult bodyGenStatus = success();
 
   auto bodyGenCB = [&](InsertPointTy allocaIP, InsertPointTy codeGenIP,
                        llvm::BasicBlock &continuationBlock) {
     // ParallelOp has only one region associated with it.
     auto &region = cast<omp::ParallelOp>(opInst).getRegion();
     convertOmpOpRegions(region, "omp.par.region", *codeGenIP.getBlock(),
                         continuationBlock, builder, moduleTranslation,
                         bodyGenStatus);
   };
 
[18 lines not shown, in convertOmpParallel]
   llvm::Value *numThreads = nullptr;
   if (auto numThreadsVar = cast<omp::ParallelOp>(opInst).num_threads_var())
     numThreads = moduleTranslation.lookupValue(numThreadsVar);
   llvm::omp::ProcBindKind pbKind = llvm::omp::OMP_PROC_BIND_default;
   if (auto bind = cast<omp::ParallelOp>(opInst).proc_bind_val())
     pbKind = llvm::omp::getProcBindKind(bind.getValue());
   // TODO: Is the Parallel construct cancellable?
   bool isCancellable = false;
-  // TODO: Determine the actual alloca insertion point, e.g., the function
-  // entry or the alloca insertion point as provided by the body callback
-  // above.
-  llvm::OpenMPIRBuilder::InsertPointTy allocaIP(builder.saveIP());
-  if (failed(bodyGenStatus))
-    return failure();
+  // Insert allocas at the entry block of the current function.
+  llvm::BasicBlock &funcEntryBlock =
+      builder.GetInsertBlock()->getParent()->getEntryBlock();
+  llvm::OpenMPIRBuilder::InsertPointTy allocaIP(
+      &funcEntryBlock, funcEntryBlock.getFirstInsertionPt());
 
   llvm::OpenMPIRBuilder::LocationDescription ompLoc(
       builder.saveIP(), builder.getCurrentDebugLocation());
   builder.restoreIP(moduleTranslation.getOpenMPBuilder()->createParallel(
       ompLoc, allocaIP, bodyGenCB, privCB, finiCB, ifCond, numThreads, pbKind,
       isCancellable));
-  return success();
+  return bodyGenStatus;
 }
 
 /// Converts an OpenMP 'master' operation into LLVM IR using OpenMPIRBuilder.
 static LogicalResult
 convertOmpMaster(Operation &opInst, llvm::IRBuilderBase &builder,
                  LLVM::ModuleTranslation &moduleTranslation) {
   using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;
   // TODO: support error propagation in OpenMPIRBuilder and use it instead of
[203 lines not shown]

mlir/test/Target/LLVMIR/openmp-llvm.mlir

[145 lines not shown, up to: // CHECK: define internal void @[[OMP_OUTLINED_FN_NUM_THREADS_3_2]]]
 // CHECK: call void @__kmpc_barrier
 
 // CHECK: define internal void @[[OMP_OUTLINED_FN_NUM_THREADS_3_1]]
 // CHECK: call void @__kmpc_barrier
 
 // CHECK: define void @test_omp_parallel_if_1(i32 %[[IF_VAR_1:.*]])
 llvm.func @test_omp_parallel_if_1(%arg0: i32) -> () {
 
+// Check that the allocas are emitted by the OpenMPIRBuilder at the top of the
+// funciton, before the condition. Allocas are only emitted by the builder when
+// the `if` clause is present. We match specific SSA value names since LLVM
+// actually produces those names.
+// CHECK: %tid.addr{{.*}} = alloca i32
+// CHECK: %zero.addr{{.*}} = alloca i32
+
 // CHECK: %[[IF_COND_VAR_1:.*]] = icmp slt i32 %[[IF_VAR_1]], 0
 %0 = llvm.mlir.constant(0 : index) : i32
 %1 = llvm.icmp "slt" %arg0, %0 : i32
 
 // CHECK: %[[GTN_IF_1:.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @[[SI_VAR_IF_1:.*]])
 // CHECK: br i1 %[[IF_COND_VAR_1]], label %[[IF_COND_TRUE_BLOCK_1:.*]], label %[[IF_COND_FALSE_BLOCK_1:.*]]
 // CHECK: [[IF_COND_TRUE_BLOCK_1]]:
 // CHECK: br label %[[OUTLINED_CALL_IF_BLOCK_1:.*]]
[196 lines not shown]