This is an archive of the discontinued LLVM Phabricator instance.

Differential D83268

[OpenMP][NFC] Remove unused (always fixed) arguments
ClosedPublic

Authored by jdoerfert on Jul 6 2020, 6:05 PM.

Details

Reviewers

jhuber6
fghanim
JonChesterfield
grokos
AndreyChurbanov
ye-luo
tianshilei1992
ggeorgakoudis

Commits

rGc98699582a63: [OpenMP][NFC] Remove unused (always fixed) arguments

Summary

There are various runtime calls in the device runtime with unused, or
always fixed, arguments. This is bad for all sorts of reasons. Clean up
two of them first, since we now match them in OpenMPOpt.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jdoerfert created this revision. Jul 6 2020, 6:05 PM

Herald added projects: Restricted Project, Restricted Project, Restricted Project. Jul 6 2020, 6:05 PM

Herald added subscribers: llvm-commits, cfe-commits, sstefan1 and 4 others.

Harbormaster completed remote builds in B63106: Diff 275871. Jul 6 2020, 7:02 PM

This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

In D83268#2135081, @Hahnfeld wrote:

This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

+1. Better to introduce new entry points and mark these ones as deprecated.
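
A minimal sketch of what this suggestion could look like, assuming hypothetical names: `example_kernel_parallel`, `example_kernel_parallel_v2` and `ExamplePendingWork` are stand-ins, not the real device runtime entry points. The old signature stays as a shim marked deprecated by comment (as the thread says existing calls are) and forwards to a new entry point without the always-fixed flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the runtime's pending work item.
static void *ExamplePendingWork = nullptr;

// New entry point: the always-fixed flag is gone.
bool example_kernel_parallel_v2(void **WorkFn) {
  *WorkFn = ExamplePendingWork;
  return ExamplePendingWork != nullptr;
}

// Deprecated: kept only so callers built against the old signature still
// link. The flag is accepted and ignored.
bool example_kernel_parallel(void **WorkFn,
                             int16_t /*IsOMPRuntimeInitialized*/) {
  return example_kernel_parallel_v2(WorkFn);
}
```

The cost raised later in the thread is visible even here: both symbols have to be maintained for as long as old callers are supported.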

In D83268#2135081, @Hahnfeld wrote:

This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

This is the device RTL. I am not aware we (want to) keep the API stable. If we are, I'm not sure why:

Dynamic linking (among other things) is not really an option so people that link against the device runtime (should) do so statically.
Linking against an old device runtime with a new clang seems unreasonable to me. If you replace clang you must replace the static runtime as the new clang might use new functions.

In D83268#2135655, @ABataev wrote:

In D83268#2135081, @Hahnfeld wrote:

This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

+1. Better to introduce new entry points and mark these ones as deprecated.

Same response as above. What is the use case here which we want to continue to support?

In D83268#2135724, @jdoerfert wrote:

In D83268#2135081, @Hahnfeld wrote:

This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

This is the device RTL. I am not aware we (want to) keep the API stable. If we are, I'm not sure why:

Dynamic linking (among other things) is not really an option so people that link against the device runtime (should) do so statically.

Linking against an old device runtime with a new clang seems unreasonable to me. If you replace clang you must replace the static runtime as the new clang might use new functions.

In D83268#2135655, @ABataev wrote:

In D83268#2135081, @Hahnfeld wrote:

This is definitely not NFC and breaks API compatibility (but apparently nobody cares anymore?).

+1. Better to introduce new entry points and mark these ones as deprecated.

Same response as above. What is the use case here which we want to continue to support?

Use of the new library with the previous version of the compiler.

ABataev added inline comments. Jul 7 2020, 5:56 AM

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
42	I think, instead the optimizer can try to detect if the runtime library is used by the kernel and switch this flag to `0` if no runtime calls are used in the kernel. For non-SPMD mode in most cases, the runtime is required, but in some cases, it can be disabled.

I'm not sure we have a consensus on api stability. Usually llvm allows mixing libraries and compilers from different sources, e.g. the various libunwind or compiler-rt vs libgcc. Libomptarget in general appears to be considered fixed and has external users (intel, maybe gcc).

The device runtime would be ill served by this default. This is the only openmp device runtime library which works with llvm. It's statically linked, usually as bitcode when performance is important. The code used to handle target offloading for nvptx is spread across the compiler and the runtime, probably not optimally.

I'm not familiar with the gcc-nvptx-openmp implementation. Reading through https://gcc.gnu.org/wiki/Offloading strongly suggests a totally independent implementation to this one. I don't think gcc can be using this runtime library for nvptx. It definitely doesn't for amdgcn. Proprietary compilers might be using this code, but we can have no duty of care to toolchains that use this code without telling us they're doing so.

Therefore the only backwards/forwards compatibility we can strive for is between different versions of clang and the device runtime. That seems potentially useful - one could use a release clang binary while working on the deviceRTL or vice versa. It's an expensive developer convenience though.

We would pay with things like the API rot fixed above. Introducing a faster lowering for an openmp construct would mean a redundant path through clang and some version checking to guess which runtime library we're targeting, which is not presently versioned. Likewise moving code between compiler and runtime becomes much more expensive to implement. Getting it wrong is inevitable given our test coverage.

I think the project is much better served by assuming that the runtime library used by clang is the one from the same hash in the monorepo. That leaves us free to fix debt and improve performance, at the price of needing to build clang from (near to) trunk while developing the rtl.

Perhaps we can embrace API stability later on, once we have things like versioning and a solid optimisation pipeline in place, especially if gcc wants to use the deviceRTL for nvptx. Now is too early.

Aside from the API stability concern this looks uncontentious. Removes dead arguments, generally makes things simpler. Thus LGTM.

@Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost?

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
42	If we can detect that no runtime calls are used, we should be able to do better than passing a different argument. E.g. delete some setup calls. Failing that, if we want to pass an argument which says 'actually don't do any work', it shouldn't be the same argument used to check whether the runtime has been initialised.

This revision is now accepted and ready to land. Jul 7 2020, 6:25 AM

In D83268#2135930, @JonChesterfield wrote:

Aside from the API stability concern this looks uncontentious. Removes dead arguments, generally makes things simpler. Thus LGTM.

@Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost?

No, I'm not. Long before that, we relied on the API stability and already have some runtime calls marked as deprecated. Especially taking into account, that libomp can be built separately.

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
42	No, I don't think you can do this in all cases

I don't think gcc can be using this runtime library for nvptx.

Yes, and: We are (going to) use clang specific intrinsics to avoid CUDA (soon).

Use of the new library with the previous version of the compiler.

Except that you cannot generally expect this to work. In our supported use case the library is kept as bitcode (LLVM-IR). Bitcode is not backward compatible. An old toolchain (clang, llvm-link, ...) cannot be fed new IR and be expected to work. So, we are already not able to give a stability guarantee here, why pretend we do. The bitcode runtime has to be kept in-sync with the toolchain.

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
42	We can detect all we want but switching it does not have any effect.

In D83268#2135938, @ABataev wrote:

@Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost?

No, I'm not. Long before that, we relied on the API stability and already have some runtime calls marked as deprecated. Especially taking into account, that libomp can be built separately.

Yes, the existing v# naming and deprecated comments should also go.

What can libomp be built by separately? Nvcc and gcc don't use this runtime. That leaves us with downstream proprietary compilers derived from clang that are already stuck carrying extensive compatibility patches and usually ship as one large toolchain blob which only needs to be internally self consistent.

Especially taking into account, that libomp can be built separately.

This is *not* affecting libomp in any way.

In D83268#2135955, @jdoerfert wrote:

Especially taking into account, that libomp can be built separately.

This is *not* affecting libomp in any way.

libomptarget and device runtimes are part of libomp. If you're going to remove some params, you'll need to modify the runtime functions too, no?

In D83268#2135954, @JonChesterfield wrote:

In D83268#2135938, @ABataev wrote:

@Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost?

No, I'm not. Long before that, we relied on the API stability and already have some runtime calls marked as deprecated. Especially taking into account, that libomp can be built separately.

Yes, the existing v# naming and deprecated comments should also go.

What can libomp be built by separately? Nvcc and gcc don't use this runtime. That leaves us with downstream proprietary compilers derived from clang that are already stuck carrying extensive compatibility patches and usually ship as one large toolchain blob which only needs to be internally self consistent.

Answered already: the previous version of the compiler with the new version of the runtime.

In D83268#2135989, @ABataev wrote:

In D83268#2135954, @JonChesterfield wrote:

What can libomp be built by separately? Nvcc and gcc don't use this runtime. That leaves us with downstream proprietary compilers derived from clang that are already stuck carrying extensive compatibility patches and usually ship as one large toolchain blob which only needs to be internally self consistent.

Answered already: the previous version of the compiler with the new version of the runtime.

Still cannot be expected to work: https://reviews.llvm.org/D83268#2135951
Are there other use cases?

In D83268#2135988, @ABataev wrote:

In D83268#2135955, @jdoerfert wrote:

Especially taking into account, that libomp can be built separately.

This is *not* affecting libomp in any way.

libomptarget and device runtimes are part of libomp. If you're going to remove some params, you'll need to modify the runtime functions too, no?

No they are not.

In D83268#2136021, @jdoerfert wrote:

In D83268#2135988, @ABataev wrote:

In D83268#2135955, @jdoerfert wrote:

Especially taking into account, that libomp can be built separately.

This is *not* affecting libomp in any way.

libomptarget and device runtimes are part of libomp. If you're going to remove some params, you'll need to modify the runtime functions too, no?

No they are not.

llvm-project/openmp/libomptarget

In D83268#2135951, @jdoerfert wrote:

I don't think gcc can be using this runtime library for nvptx.

Yes, and: We are (going to) use clang specific intrinsics to avoid CUDA (soon).

Use of the new library with the previous version of the compiler.

Except that you cannot generally expect this to work. In our supported use case the library is kept as bitcode (LLVM-IR). Bitcode is not backward compatible. An old toolchain (clang, llvm-link, ...) cannot be fed new IR and be expected to work. So, we are already not able to give a stability guarantee here, why pretend we do. The bitcode runtime has to be kept in-sync with the toolchain.

There is still compatibility between clang10 and clang11. Or they are incompatible in LLVM IR level? Also, there was a mode (I don't remember if it was removed or not) where the runtime library could be linked as .a library, without LLVM IR inlining.

In D83268#2136031, @ABataev wrote:

There is still compatibility between clang10 and clang11. Or they are incompatible in LLVM IR level?

That is the point. They might or might not be, right? There is no guarantee they are.

Also, there was a mode (I don't remember if it was removed or not) where the runtime library could be linked as .a library, without LLVM IR inlining.

That mode is deprecated.

In D83268#2136029, @ABataev wrote:

llvm-project/openmp/libomptarget

Please use more words.

In D83268#2136054, @jdoerfert wrote:

In D83268#2136029, @ABataev wrote:

llvm-project/openmp/libomptarget

Please use more words.

libomptarget is part of libomp

In D83268#2136049, @jdoerfert wrote:

In D83268#2136031, @ABataev wrote:

There is still compatibility between clang10 and clang11. Or they are incompatible in LLVM IR level?

That is the point. They might or might not be, right? There is no guarantee they are.

Better to ask the users. Maybe, send an RFC to openmp-devs?

Also, there was a mode (I don't remember if it was removed or not) where the runtime library could be linked as .a library, without LLVM IR inlining.

That mode is deprecated.

In D83268#2136055, @ABataev wrote:

In D83268#2136054, @jdoerfert wrote:

In D83268#2136029, @ABataev wrote:

llvm-project/openmp/libomptarget

Please use more words.

libomptarget is part of libomp

As mentioned before, no it is not.

Despite the similarity in name, libomp and libomptarget are distinct libraries, this was a conscious design choice.
FWIW, this patch does *not* modify libomptarget either. This modifies *the device runtime*, aka. libomptarget-nvptx-sm_XXX.bc.

In D83268#2135930, @JonChesterfield wrote:

Aside from the API stability concern this looks uncontentious. Removes dead arguments, generally makes things simpler. Thus LGTM.

@Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current interface is not worth the development cost?

I'm neither, and I've long argued that being able to build the OpenMP runtime(s) without Clang trunk is an important use case. These arguments have gone largely unheard, so I'll not join this discussion once more.

In D83268#2136060, @ABataev wrote:

Better to ask the users. Maybe, send an RFC to openmp-devs?

Sure: http://lists.llvm.org/pipermail/openmp-dev/2020-July/003531.html

__kmpc_spmd_kernel_init is always called with RequiresDataSharing == 0
Specifically, it's only called from clang, and emitSPMDEntryHeader unconditionally passes zero to it

I.e. I think there's more stuff that can be cleaned up in the theme of the above, suggest in later patches
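
A sketch of what such a follow-up cleanup might look like, under stated assumptions: `ExampleKernelState`, `exampleSpmdInit` and `exampleSpmdInitSimplified` are hypothetical, and the body is not the runtime's actual code. It only shows that a parameter every caller sets to zero leaves a dead branch, which can be dropped together with the parameter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical state record; the real runtime keeps different bookkeeping.
struct ExampleKernelState {
  int ThreadLimit = 0;
  bool RuntimeInitialized = false;
  bool DataSharingInitialized = false;
};

// Before: the third parameter exists, but every caller passes 0.
inline ExampleKernelState exampleSpmdInit(int ThreadLimit,
                                          int16_t RequiresOMPRuntime,
                                          int16_t RequiresDataSharing) {
  ExampleKernelState S;
  S.ThreadLimit = ThreadLimit;
  S.RuntimeInitialized = RequiresOMPRuntime != 0;
  if (RequiresDataSharing) // never taken when all callers pass 0
    S.DataSharingInitialized = true;
  return S;
}

// After: the parameter and the dead branch are removed.
inline ExampleKernelState exampleSpmdInitSimplified(int ThreadLimit,
                                                    int16_t RequiresOMPRuntime) {
  ExampleKernelState S;
  S.ThreadLimit = ThreadLimit;
  S.RuntimeInitialized = RequiresOMPRuntime != 0;
  return S;
}
```

For the only argument value callers ever pass, both versions produce the same state.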

Closed by commit rGc98699582a63: [OpenMP][NFC] Remove unused (always fixed) arguments (authored by jdoerfert). Jul 10 2020, 10:53 PM

This revision was automatically updated to reflect the committed changes.

This patch breaks compilation of previously working code.

I added the following to openmp/libomptarget/test/offloading/offloading_success.c:

+// RUN: %libomptarget-compile-run-and-check-nvptx64-nvidia-cuda

which results in

# command stderr:
ptxas offloading_success-openmp-nvptx64-nvidia-cuda.s, line 173; error   : Call has wrong number of parameters
ptxas fatal   : Ptx assembly aborted due to errors
clang-12: error: ptxas command failed with exit code 255 (use -v to see invocation)

The file offloading_success-openmp-nvptx64-nvidia-cuda.s contains:

.func  (.param .b32 func_retval0) __kmpc_kernel_parallel
(
        .param .b64 __kmpc_kernel_parallel_param_0,
        .param .b32 __kmpc_kernel_parallel_param_1
)
;

The same file then calls the function with one argument:

call.uni (retval0), 
__kmpc_kernel_parallel, 
(
param0
);

For the clang 11 release, we should either fix the codegen or revert this patch.
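
The mismatch reported above can be modelled as a simple arity check. This is only a sketch: `ExampleDecl`, `ExampleCall` and `findArityMismatches` are hypothetical, and ptxas's real validation is not shown. The figures come from the PTX listing: the declaration has two parameters and the call passes one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a function declaration and a call site.
struct ExampleDecl {
  std::string Name;
  unsigned NumParams;
};
struct ExampleCall {
  std::string Callee;
  unsigned NumArgs;
};

// Returns the calls whose argument count disagrees with the declaration
// of the same name.
inline std::vector<ExampleCall>
findArityMismatches(const std::vector<ExampleDecl> &Decls,
                    const std::vector<ExampleCall> &Calls) {
  std::vector<ExampleCall> Bad;
  for (const ExampleCall &C : Calls)
    for (const ExampleDecl &D : Decls)
      if (D.Name == C.Callee && D.NumParams != C.NumArgs)
        Bad.push_back(C);
  return Bad;
}
```

With a two-parameter declaration of `__kmpc_kernel_parallel` and a one-argument call, the check flags the call; once the declaration also has one parameter, it passes.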

I carefully made sure, that the freshly built clang was used to execute the test. I opened https://bugs.llvm.org/show_bug.cgi?id=46836 to track the issue and made it release blocker.

This revision is now accepted and ready to land. Jul 24 2020, 10:18 AM

In D83268#2172748, @protze.joachim wrote:

I carefully made sure, that the freshly built clang was used to execute the test. I opened https://bugs.llvm.org/show_bug.cgi?id=46836 to track the issue and made it release blocker.

The question is whether you picked up a freshly built device runtime as well.

jdoerfert closed this revision. Jul 27 2020, 5:48 AM

Revision Contents

Path                                                          Size
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp                    20 lines
clang/test/OpenMP/nvptx_data_sharing.cpp                      4 lines
clang/test/OpenMP/nvptx_parallel_codegen.cpp                  10 lines
clang/test/OpenMP/nvptx_target_codegen.cpp                    2 lines
clang/test/OpenMP/nvptx_target_teams_codegen.cpp              4 lines
clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp   2 lines
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def                5 lines
openmp/libomptarget/deviceRTLs/common/src/parallel.cu         9 lines
openmp/libomptarget/deviceRTLs/interface.h                    6 lines

Diff 277218

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

[32 unchanged lines hidden] enum OpenMPRTLFunctionNVPTX {
 /// Call to void __kmpc_kernel_deinit(int16_t IsOMPRuntimeInitialized);
 OMPRTL_NVPTX__kmpc_kernel_deinit,
 /// Call to void __kmpc_spmd_kernel_init(kmp_int32 thread_limit,
 /// int16_t RequiresOMPRuntime, int16_t RequiresDataSharing);
 OMPRTL_NVPTX__kmpc_spmd_kernel_init,
 /// Call to void __kmpc_spmd_kernel_deinit_v2(int16_t RequiresOMPRuntime);
 OMPRTL_NVPTX__kmpc_spmd_kernel_deinit_v2,
 /// Call to void __kmpc_kernel_prepare_parallel(void
-/// *outlined_function, int16_t
-/// IsOMPRuntimeInitialized);
+/// *outlined_function);
 OMPRTL_NVPTX__kmpc_kernel_prepare_parallel,
-/// Call to bool __kmpc_kernel_parallel(void **outlined_function,
-/// int16_t IsOMPRuntimeInitialized);
+/// Call to bool __kmpc_kernel_parallel(void **outlined_function);
 OMPRTL_NVPTX__kmpc_kernel_parallel,
 /// Call to void __kmpc_kernel_end_parallel();
 OMPRTL_NVPTX__kmpc_kernel_end_parallel,
 /// Call to void __kmpc_serialized_parallel(ident_t *loc, kmp_int32
 /// global_tid);
 OMPRTL_NVPTX__kmpc_serialized_parallel,
 /// Call to void __kmpc_end_serialized_parallel(ident_t *loc, kmp_int32
 /// global_tid);
[1,407 unchanged lines hidden] void CGOpenMPRuntimeNVPTX::emitWorkerLoop(CodeGenFunction &CGF,
 Address WorkFn =
 CGF.CreateDefaultAlignTempAlloca(CGF.Int8PtrTy, /*Name=*/"work_fn");
 Address ExecStatus =
 CGF.CreateDefaultAlignTempAlloca(CGF.Int8Ty, /*Name=*/"exec_status");
 CGF.InitTempAlloca(ExecStatus, Bld.getInt8(/*C=*/0));
 CGF.InitTempAlloca(WorkFn, llvm::Constant::getNullValue(CGF.Int8PtrTy));

 // TODO: Optimize runtime initialization and pass in correct value.
-llvm::Value *Args[] = {WorkFn.getPointer(),
-/*RequiresOMPRuntime=*/Bld.getInt16(1)};
+llvm::Value *Args[] = {WorkFn.getPointer()};
 llvm::Value *Ret = CGF.EmitRuntimeCall(
 createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_kernel_parallel), Args);
 Bld.CreateStore(Bld.CreateZExt(Ret, CGF.Int8Ty), ExecStatus);

 // On termination condition (workid == 0), exit loop.
 llvm::Value *WorkID = Bld.CreateLoad(WorkFn);
 llvm::Value *ShouldTerminate = Bld.CreateIsNull(WorkID, "should_terminate");
 Bld.CreateCondBr(ShouldTerminate, ExitBB, SelectWorkersBB);
[111 unchanged lines hidden] case OMPRTL_NVPTX__kmpc_spmd_kernel_deinit_v2: {
 llvm::Type *TypeParams[] = {CGM.Int16Ty};
 auto *FnTy =
 llvm::FunctionType::get(CGM.VoidTy, TypeParams, /*isVarArg*/ false);
 RTLFn = CGM.CreateRuntimeFunction(FnTy, "__kmpc_spmd_kernel_deinit_v2");
 break;
 }
 case OMPRTL_NVPTX__kmpc_kernel_prepare_parallel: {
 /// Build void __kmpc_kernel_prepare_parallel(
-/// void *outlined_function, int16_t IsOMPRuntimeInitialized);
-llvm::Type *TypeParams[] = {CGM.Int8PtrTy, CGM.Int16Ty};
+/// void *outlined_function);
+llvm::Type *TypeParams[] = {CGM.Int8PtrTy};
 auto *FnTy =
 llvm::FunctionType::get(CGM.VoidTy, TypeParams, /*isVarArg*/ false);
 RTLFn = CGM.CreateRuntimeFunction(FnTy, "__kmpc_kernel_prepare_parallel");
 break;
 }
 case OMPRTL_NVPTX__kmpc_kernel_parallel: {
-/// Build bool __kmpc_kernel_parallel(void **outlined_function,
-/// int16_t IsOMPRuntimeInitialized);
-llvm::Type *TypeParams[] = {CGM.Int8PtrPtrTy, CGM.Int16Ty};
+/// Build bool __kmpc_kernel_parallel(void **outlined_function);
+llvm::Type *TypeParams[] = {CGM.Int8PtrPtrTy};
 llvm::Type *RetTy = CGM.getTypes().ConvertType(CGM.getContext().BoolTy);
 auto *FnTy =
 llvm::FunctionType::get(RetTy, TypeParams, /*isVarArg*/ false);
 RTLFn = CGM.CreateRuntimeFunction(FnTy, "__kmpc_kernel_parallel");
 break;
 }
 case OMPRTL_NVPTX__kmpc_kernel_end_parallel: {
 /// Build void __kmpc_kernel_end_parallel();
[947 unchanged lines hidden] void CGOpenMPRuntimeNVPTX::emitNonSPMDParallelCall(
 auto &&L0ParallelGen = [this, CapturedVars, Fn](CodeGenFunction &CGF,
 PrePostActionTy &Action) {
 CGBuilderTy &Bld = CGF.Builder;
 llvm::Function *WFn = WrapperFunctionsMap[Fn];
 assert(WFn && "Wrapper function does not exist!");
 llvm::Value *ID = Bld.CreateBitOrPointerCast(WFn, CGM.Int8PtrTy);

 // Prepare for parallel region. Indicate the outlined function.
-llvm::Value *Args[] = {ID, /*RequiresOMPRuntime=*/Bld.getInt16(1)};
+llvm::Value *Args[] = {ID};
 CGF.EmitRuntimeCall(
 createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_kernel_prepare_parallel),
 Args);

 // Create a private scope that will globalize the arguments
 // passed from the outside of the target region.
 CodeGenFunction::OMPPrivateScope PrivateArgScope(CGF);

[2,671 unchanged lines hidden]

clang/test/OpenMP/nvptx_data_sharing.cpp

[49 unchanged lines hidden]
 // SEQ: call void @__kmpc_get_team_static_memory(i16 0, i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([[MEM_TY]], [[MEM_TY]] addrspace(3)* [[SHARED_GLOBAL_RD]], i32 0, i32 0, i32 0) to i8*), i64 [[SIZE]], i16 [[SHARED_MEM_FLAG]], i8** addrspacecast (i8* addrspace(3)* [[KERNEL_PTR]] to i8**))
 // SEQ: [[KERNEL_RD:%.+]] = load i8*, i8* addrspace(3)* [[KERNEL_PTR]],
 // SEQ: [[GLOBALSTACK:%.+]] = getelementptr inbounds i8, i8* [[KERNEL_RD]], i64 0
 // PAR: [[GLOBALSTACK:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 8, i16 1)
 // CK1: [[GLOBALSTACK2:%.+]] = bitcast i8* [[GLOBALSTACK]] to %struct._globalized_locals_ty*
 // CK1: [[A:%.+]] = getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[GLOBALSTACK2]], i32 0, i32 0
 // CK1: [[B:%.+]] = getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[GLOBALSTACK2]], i32 0, i32 1
 // CK1: store i32 10, i32* [[A]]
-// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}}, i16 1)
+// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}})
 // CK1: call void @__kmpc_begin_sharing_variables(i8*** [[SHAREDARGS1]], i64 1)
 // CK1: [[SHARGSTMP1:%.+]] = load i8**, i8*** [[SHAREDARGS1]]
 // CK1: [[SHARGSTMP2:%.+]] = getelementptr inbounds i8*, i8** [[SHARGSTMP1]], i64 0
 // CK1: [[SHAREDVAR:%.+]] = bitcast i32* [[A]] to i8*
 // CK1: store i8* [[SHAREDVAR]], i8** [[SHARGSTMP2]]
 // CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
 // CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
 // CK1: call void @__kmpc_end_sharing_variables()
 // CK1: store i32 100, i32* [[B]]
-// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}}, i16 1)
+// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}})
 // CK1: call void @__kmpc_begin_sharing_variables(i8*** [[SHAREDARGS2]], i64 2)
 // CK1: [[SHARGSTMP3:%.+]] = load i8**, i8*** [[SHAREDARGS2]]
 // CK1: [[SHARGSTMP4:%.+]] = getelementptr inbounds i8*, i8** [[SHARGSTMP3]], i64 0
 // CK1: [[SHAREDVAR1:%.+]] = bitcast i32* [[B]] to i8*
 // CK1: store i8* [[SHAREDVAR1]], i8** [[SHARGSTMP4]]
 // CK1: [[SHARGSTMP12:%.+]] = getelementptr inbounds i8*, i8** [[SHARGSTMP3]], i64 1
 // CK1: [[SHAREDVAR2:%.+]] = bitcast i32* [[A]] to i8*
 // CK1: store i8* [[SHAREDVAR2]], i8** [[SHARGSTMP12]]
[43 unchanged lines hidden]

clang/test/OpenMP/nvptx_parallel_codegen.cpp

	Show First 20 Lines • Show All 86 Lines • ▼ Show 20 Lines
	// CHECK-DAG: [[OMP_EXEC_STATUS:%.+]] = alloca i8,			// CHECK-DAG: [[OMP_EXEC_STATUS:%.+]] = alloca i8,
	// CHECK-DAG: [[OMP_WORK_FN:%.+]] = alloca i8*,			// CHECK-DAG: [[OMP_WORK_FN:%.+]] = alloca i8*,
	// CHECK: store i8* null, i8** [[OMP_WORK_FN]],			// CHECK: store i8* null, i8** [[OMP_WORK_FN]],
	// CHECK: store i8 0, i8* [[OMP_EXEC_STATUS]],			// CHECK: store i8 0, i8* [[OMP_EXEC_STATUS]],
	// CHECK: br label {{%?}}[[AWAIT_WORK:.+]]			// CHECK: br label {{%?}}[[AWAIT_WORK:.+]]
	//			//
	// CHECK: [[AWAIT_WORK]]			// CHECK: [[AWAIT_WORK]]
	// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) #[[#CONVERGENT:]]			// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0) #[[#CONVERGENT:]]
	// CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]]			// CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]])
	// CHECK: [[KPRB:%.+]] = zext i1 [[KPR]] to i8			// CHECK: [[KPRB:%.+]] = zext i1 [[KPR]] to i8
	// store i8 [[KPRB]], i8* [[OMP_EXEC_STATUS]], align 1			// store i8 [[KPRB]], i8* [[OMP_EXEC_STATUS]], align 1
	// CHECK: [[WORK:%.+]] = load i8*, i8** [[OMP_WORK_FN]],			// CHECK: [[WORK:%.+]] = load i8*, i8** [[OMP_WORK_FN]],
	// CHECK: [[SHOULD_EXIT:%.+]] = icmp eq i8* [[WORK]], null			// CHECK: [[SHOULD_EXIT:%.+]] = icmp eq i8* [[WORK]], null
	// CHECK: br i1 [[SHOULD_EXIT]], label {{%?}}[[EXIT:.+]], label {{%?}}[[SEL_WORKERS:.+]]			// CHECK: br i1 [[SHOULD_EXIT]], label {{%?}}[[EXIT:.+]], label {{%?}}[[SEL_WORKERS:.+]]
	//			//
	// CHECK: [[SEL_WORKERS]]			// CHECK: [[SEL_WORKERS]]
	// CHECK: [[ST:%.+]] = load i8, i8* [[OMP_EXEC_STATUS]]			// CHECK: [[ST:%.+]] = load i8, i8* [[OMP_EXEC_STATUS]]
	[57 lines not shown]
	// CHECK: [[IS_MASTER:%.+]] = icmp eq i32 [[CMTID]],			// CHECK: [[IS_MASTER:%.+]] = icmp eq i32 [[CMTID]],
	// CHECK: br i1 [[IS_MASTER]], label {{%?}}[[MASTER:.+]], label {{%?}}[[EXIT]]			// CHECK: br i1 [[IS_MASTER]], label {{%?}}[[MASTER:.+]], label {{%?}}[[EXIT]]
	//			//
	// CHECK: [[MASTER]]			// CHECK: [[MASTER]]
	// CHECK-DAG: [[MNTH:%.+]] = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()			// CHECK-DAG: [[MNTH:%.+]] = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
	// CHECK-DAG: [[MWS:%.+]] = call i32 @llvm.nvvm.read.ptx.sreg.warpsize()			// CHECK-DAG: [[MWS:%.+]] = call i32 @llvm.nvvm.read.ptx.sreg.warpsize()
	// CHECK: [[MTMP1:%.+]] = sub nuw i32 [[MNTH]], [[MWS]]			// CHECK: [[MTMP1:%.+]] = sub nuw i32 [[MNTH]], [[MWS]]
	// CHECK: call void @__kmpc_kernel_init(i32 [[MTMP1]]			// CHECK: call void @__kmpc_kernel_init(i32 [[MTMP1]]
	// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* [[PARALLEL_FN1]]_wrapper to i8*),			// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* [[PARALLEL_FN1]]_wrapper to i8*))
	// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CHECK: call void @__kmpc_serialized_parallel(			// CHECK: call void @__kmpc_serialized_parallel(
	// CHECK: {{call|invoke}} void [[PARALLEL_FN3:@.+]](			// CHECK: {{call|invoke}} void [[PARALLEL_FN3:@.+]](
	// CHECK: call void @__kmpc_end_serialized_parallel(			// CHECK: call void @__kmpc_end_serialized_parallel(
	// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* [[PARALLEL_FN2]]_wrapper to i8*),			// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* [[PARALLEL_FN2]]_wrapper to i8*))
	// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CHECK-64-DAG: load i32, i32* [[REF_A]]			// CHECK-64-DAG: load i32, i32* [[REF_A]]
	// CHECK-32-DAG: load i32, i32* [[LOCAL_A]]			// CHECK-32-DAG: load i32, i32* [[LOCAL_A]]
	// CHECK: br label {{%?}}[[TERMINATE:.+]]			// CHECK: br label {{%?}}[[TERMINATE:.+]]
	//			//
	// CHECK: [[TERMINATE]]			// CHECK: [[TERMINATE]]
	// CHECK: call void @__kmpc_kernel_deinit(			// CHECK: call void @__kmpc_kernel_deinit(
	[22 lines not shown]
	// CHECK-DAG: [[OMP_EXEC_STATUS:%.+]] = alloca i8,			// CHECK-DAG: [[OMP_EXEC_STATUS:%.+]] = alloca i8,
	// CHECK-DAG: [[OMP_WORK_FN:%.+]] = alloca i8*,			// CHECK-DAG: [[OMP_WORK_FN:%.+]] = alloca i8*,
	// CHECK: store i8* null, i8** [[OMP_WORK_FN]],			// CHECK: store i8* null, i8** [[OMP_WORK_FN]],
	// CHECK: store i8 0, i8* [[OMP_EXEC_STATUS]],			// CHECK: store i8 0, i8* [[OMP_EXEC_STATUS]],
	// CHECK: br label {{%?}}[[AWAIT_WORK:.+]]			// CHECK: br label {{%?}}[[AWAIT_WORK:.+]]
	//			//
	// CHECK: [[AWAIT_WORK]]			// CHECK: [[AWAIT_WORK]]
	// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]],			// CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]])
	// CHECK: [[KPRB:%.+]] = zext i1 [[KPR]] to i8			// CHECK: [[KPRB:%.+]] = zext i1 [[KPR]] to i8
	// store i8 [[KPRB]], i8* [[OMP_EXEC_STATUS]], align 1			// store i8 [[KPRB]], i8* [[OMP_EXEC_STATUS]], align 1
	// CHECK: [[WORK:%.+]] = load i8*, i8** [[OMP_WORK_FN]],			// CHECK: [[WORK:%.+]] = load i8*, i8** [[OMP_WORK_FN]],
	// CHECK: [[SHOULD_EXIT:%.+]] = icmp eq i8* [[WORK]], null			// CHECK: [[SHOULD_EXIT:%.+]] = icmp eq i8* [[WORK]], null
	// CHECK: br i1 [[SHOULD_EXIT]], label {{%?}}[[EXIT:.+]], label {{%?}}[[SEL_WORKERS:.+]]			// CHECK: br i1 [[SHOULD_EXIT]], label {{%?}}[[EXIT:.+]], label {{%?}}[[SEL_WORKERS:.+]]
	//			//
	// CHECK: [[SEL_WORKERS]]			// CHECK: [[SEL_WORKERS]]
	// CHECK: [[ST:%.+]] = load i8, i8* [[OMP_EXEC_STATUS]]			// CHECK: [[ST:%.+]] = load i8, i8* [[OMP_EXEC_STATUS]]
	[63 lines not shown]
	// CHECK: [[MTMP1:%.+]] = sub nuw i32 [[MNTH]], [[MWS]]			// CHECK: [[MTMP1:%.+]] = sub nuw i32 [[MNTH]], [[MWS]]
	// CHECK: call void @__kmpc_kernel_init(i32 [[MTMP1]]			// CHECK: call void @__kmpc_kernel_init(i32 [[MTMP1]]
	// CHECK-64: [[N:%.+]] = load i32, i32* [[REF_N]],			// CHECK-64: [[N:%.+]] = load i32, i32* [[REF_N]],
	// CHECK-32: [[N:%.+]] = load i32, i32* [[LOCAL_N]],			// CHECK-32: [[N:%.+]] = load i32, i32* [[LOCAL_N]],
	// CHECK: [[CMP:%.+]] = icmp sgt i32 [[N]], 1000			// CHECK: [[CMP:%.+]] = icmp sgt i32 [[N]], 1000
	// CHECK: br i1 [[CMP]], label {{%?}}[[IF_THEN:.+]], label {{%?}}[[IF_ELSE:.+]]			// CHECK: br i1 [[CMP]], label {{%?}}[[IF_THEN:.+]], label {{%?}}[[IF_ELSE:.+]]
	//			//
	// CHECK: [[IF_THEN]]			// CHECK: [[IF_THEN]]
	// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* [[PARALLEL_FN4]]_wrapper to i8*),			// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* [[PARALLEL_FN4]]_wrapper to i8*))
	// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CHECK: br label {{%?}}[[IF_END:.+]]			// CHECK: br label {{%?}}[[IF_END:.+]]
	//			//
	// CHECK: [[IF_ELSE]]			// CHECK: [[IF_ELSE]]
	// CHECK: call void @__kmpc_serialized_parallel(			// CHECK: call void @__kmpc_serialized_parallel(
	// CHECK: {{call|invoke}} void [[PARALLEL_FN4]](			// CHECK: {{call|invoke}} void [[PARALLEL_FN4]](
	// CHECK: call void @__kmpc_end_serialized_parallel(			// CHECK: call void @__kmpc_end_serialized_parallel(
	[80 lines not shown]

clang/test/OpenMP/nvptx_target_codegen.cpp

	[606 lines not shown]
	// CHECK: icmp ne i16 [[RES]], 0			// CHECK: icmp ne i16 [[RES]], 0
	// CHECK: br i1			// CHECK: br i1

	// CHECK: call void @__kmpc_serialized_parallel(%struct.ident_t* [[UNKNOWN]], i32 [[GTID]])			// CHECK: call void @__kmpc_serialized_parallel(%struct.ident_t* [[UNKNOWN]], i32 [[GTID]])
	// CHECK: call void [[OUTLINED:@.+]](i32* [[ZERO_ADDR]], i32* [[BND_ZERO_ADDR]], i32* [[F_PTR]], double* %{{.+}})			// CHECK: call void [[OUTLINED:@.+]](i32* [[ZERO_ADDR]], i32* [[BND_ZERO_ADDR]], i32* [[F_PTR]], double* %{{.+}})
	// CHECK: call void @__kmpc_end_serialized_parallel(%struct.ident_t* [[UNKNOWN]], i32 [[GTID]])			// CHECK: call void @__kmpc_end_serialized_parallel(%struct.ident_t* [[UNKNOWN]], i32 [[GTID]])
	// CHECK: br label			// CHECK: br label

	// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* @{{.+}} to i8*), i16 1)			// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* @{{.+}} to i8*))
	// CHECK: call void @__kmpc_begin_sharing_variables(i8*** [[SHARED_PTR:%.+]], i{{64|32}} 2)			// CHECK: call void @__kmpc_begin_sharing_variables(i8*** [[SHARED_PTR:%.+]], i{{64|32}} 2)
	// CHECK: [[SHARED:%.+]] = load i8**, i8*** [[SHARED_PTR]],			// CHECK: [[SHARED:%.+]] = load i8**, i8*** [[SHARED_PTR]],
	// CHECK: [[REF:%.+]] = getelementptr inbounds i8*, i8** [[SHARED]], i{{64|32}} 0			// CHECK: [[REF:%.+]] = getelementptr inbounds i8*, i8** [[SHARED]], i{{64|32}} 0
	// CHECK: [[F_REF:%.+]] = bitcast i32* [[F_PTR]] to i8*			// CHECK: [[F_REF:%.+]] = bitcast i32* [[F_PTR]] to i8*
	// CHECK: store i8* [[F_REF]], i8** [[REF]],			// CHECK: store i8* [[F_REF]], i8** [[REF]],
	// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CHECK: call void @__kmpc_end_sharing_variables()			// CHECK: call void @__kmpc_end_sharing_variables()
	[95 lines not shown]

clang/test/OpenMP/nvptx_target_teams_codegen.cpp

[62 lines not shown]	int bar(int n){
// CHECK-DAG: [[OMP_EXEC_STATUS:%.+]] = alloca i8,		// CHECK-DAG: [[OMP_EXEC_STATUS:%.+]] = alloca i8,
// CHECK-DAG: [[OMP_WORK_FN:%.+]] = alloca i8*,		// CHECK-DAG: [[OMP_WORK_FN:%.+]] = alloca i8*,
// CHECK: store i8* null, i8** [[OMP_WORK_FN]],		// CHECK: store i8* null, i8** [[OMP_WORK_FN]],
// CHECK: store i8 0, i8* [[OMP_EXEC_STATUS]],		// CHECK: store i8 0, i8* [[OMP_EXEC_STATUS]],
// CHECK: br label {{%?}}[[AWAIT_WORK:.+]]		// CHECK: br label {{%?}}[[AWAIT_WORK:.+]]
//		//
// CHECK: [[AWAIT_WORK]]		// CHECK: [[AWAIT_WORK]]
// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)		// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
// CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]], i16 1)		// CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]])
// CHECK: [[KPRB:%.+]] = zext i1 [[KPR]] to i8		// CHECK: [[KPRB:%.+]] = zext i1 [[KPR]] to i8
// store i8 [[KPRB]], i8* [[OMP_EXEC_STATUS]], align 1		// store i8 [[KPRB]], i8* [[OMP_EXEC_STATUS]], align 1
// CHECK: [[WORK:%.+]] = load i8*, i8** [[OMP_WORK_FN]],		// CHECK: [[WORK:%.+]] = load i8*, i8** [[OMP_WORK_FN]],
// CHECK: [[SHOULD_EXIT:%.+]] = icmp eq i8* [[WORK]], null		// CHECK: [[SHOULD_EXIT:%.+]] = icmp eq i8* [[WORK]], null
// CHECK: br i1 [[SHOULD_EXIT]], label {{%?}}[[EXIT:.+]], label {{%?}}[[SEL_WORKERS:.+]]		// CHECK: br i1 [[SHOULD_EXIT]], label {{%?}}[[EXIT:.+]], label {{%?}}[[SEL_WORKERS:.+]]
//		//
// CHECK: [[SEL_WORKERS]]		// CHECK: [[SEL_WORKERS]]
// CHECK: [[ST:%.+]] = load i8, i8* [[OMP_EXEC_STATUS]]		// CHECK: [[ST:%.+]] = load i8, i8* [[OMP_EXEC_STATUS]]
[69 lines not shown]	int bar(int n){
// CHECK-DAG: [[OMP_EXEC_STATUS:%.+]] = alloca i8,		// CHECK-DAG: [[OMP_EXEC_STATUS:%.+]] = alloca i8,
// CHECK-DAG: [[OMP_WORK_FN:%.+]] = alloca i8*,		// CHECK-DAG: [[OMP_WORK_FN:%.+]] = alloca i8*,
// CHECK: store i8* null, i8** [[OMP_WORK_FN]],		// CHECK: store i8* null, i8** [[OMP_WORK_FN]],
// CHECK: store i8 0, i8* [[OMP_EXEC_STATUS]],		// CHECK: store i8 0, i8* [[OMP_EXEC_STATUS]],
// CHECK: br label {{%?}}[[AWAIT_WORK:.+]]		// CHECK: br label {{%?}}[[AWAIT_WORK:.+]]
//		//
// CHECK: [[AWAIT_WORK]]		// CHECK: [[AWAIT_WORK]]
// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)		// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
// CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]], i16 1)		// CHECK: [[KPR:%.+]] = call i1 @__kmpc_kernel_parallel(i8** [[OMP_WORK_FN]])
// CHECK: [[KPRB:%.+]] = zext i1 [[KPR]] to i8		// CHECK: [[KPRB:%.+]] = zext i1 [[KPR]] to i8
// store i8 [[KPRB]], i8* [[OMP_EXEC_STATUS]], align 1		// store i8 [[KPRB]], i8* [[OMP_EXEC_STATUS]], align 1
// CHECK: [[WORK:%.+]] = load i8*, i8** [[OMP_WORK_FN]],		// CHECK: [[WORK:%.+]] = load i8*, i8** [[OMP_WORK_FN]],
// CHECK: [[SHOULD_EXIT:%.+]] = icmp eq i8* [[WORK]], null		// CHECK: [[SHOULD_EXIT:%.+]] = icmp eq i8* [[WORK]], null
// CHECK: br i1 [[SHOULD_EXIT]], label {{%?}}[[EXIT:.+]], label {{%?}}[[SEL_WORKERS:.+]]		// CHECK: br i1 [[SHOULD_EXIT]], label {{%?}}[[EXIT:.+]], label {{%?}}[[SEL_WORKERS:.+]]
//		//
// CHECK: [[SEL_WORKERS]]		// CHECK: [[SEL_WORKERS]]
// CHECK: [[ST:%.+]] = load i8, i8* [[OMP_EXEC_STATUS]]		// CHECK: [[ST:%.+]] = load i8, i8* [[OMP_EXEC_STATUS]]
[94 lines not shown]

clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp

[82 lines not shown]	int bar(int n){
// SEQ: call void @__kmpc_get_team_static_memory(i16 0, i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([[MEM_TY]], [[MEM_TY]] addrspace(3)* @{{.+}}, i32 0, i32 0, i32 0) to i8*), i{{64|32}} [[SIZE]], i16 [[SHARED]], i8** addrspacecast (i8* addrspace(3)* [[BUF:@.+]] to i8**))		// SEQ: call void @__kmpc_get_team_static_memory(i16 0, i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([[MEM_TY]], [[MEM_TY]] addrspace(3)* @{{.+}}, i32 0, i32 0, i32 0) to i8*), i{{64|32}} [[SIZE]], i16 [[SHARED]], i8** addrspacecast (i8* addrspace(3)* [[BUF:@.+]] to i8**))
// SEQ: [[PTR:%.+]] = load i8*, i8* addrspace(3)* [[BUF]],		// SEQ: [[PTR:%.+]] = load i8*, i8* addrspace(3)* [[BUF]],
// SEQ: [[ADDR:%.+]] = getelementptr inbounds i8, i8* [[PTR]], i{{64|32}} 0		// SEQ: [[ADDR:%.+]] = getelementptr inbounds i8, i8* [[PTR]], i{{64|32}} 0
// PAR: [[ADDR:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 4, i16 1)		// PAR: [[ADDR:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 4, i16 1)
// CHECK: [[RD:%.+]] = bitcast i8* [[ADDR]] to [[GLOB_TY:%.+]]*		// CHECK: [[RD:%.+]] = bitcast i8* [[ADDR]] to [[GLOB_TY:%.+]]*
// CHECK: [[I_ADDR:%.+]] = getelementptr inbounds [[GLOB_TY]], [[GLOB_TY]]* [[RD]], i32 0, i32 0		// CHECK: [[I_ADDR:%.+]] = getelementptr inbounds [[GLOB_TY]], [[GLOB_TY]]* [[RD]], i32 0, i32 0
//		//
// CHECK: call void @__kmpc_for_static_init_4(		// CHECK: call void @__kmpc_for_static_init_4(
// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* @{{.+}} to i8*), i16 1)		// CHECK: call void @__kmpc_kernel_prepare_parallel(i8* bitcast (void (i16, i32)* @{{.+}} to i8*))
// CHECK: call void @__kmpc_begin_sharing_variables(i8*** [[SHARED_VARS_PTR:%.+]], i{{64|32}} 1)		// CHECK: call void @__kmpc_begin_sharing_variables(i8*** [[SHARED_VARS_PTR:%.+]], i{{64|32}} 1)
// CHECK: [[SHARED_VARS_BUF:%.+]] = load i8**, i8*** [[SHARED_VARS_PTR]],		// CHECK: [[SHARED_VARS_BUF:%.+]] = load i8**, i8*** [[SHARED_VARS_PTR]],
// CHECK: [[VARS_BUF:%.+]] = getelementptr inbounds i8*, i8** [[SHARED_VARS_BUF]], i{{64|32}} 0		// CHECK: [[VARS_BUF:%.+]] = getelementptr inbounds i8*, i8** [[SHARED_VARS_BUF]], i{{64|32}} 0
// CHECK: [[I_ADDR_BC:%.+]] = bitcast i32* [[I_ADDR]] to i8*		// CHECK: [[I_ADDR_BC:%.+]] = bitcast i32* [[I_ADDR]] to i8*
// CHECK: store i8* [[I_ADDR_BC]], i8** [[VARS_BUF]],		// CHECK: store i8* [[I_ADDR_BC]], i8** [[VARS_BUF]],
// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)		// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)		// CHECK: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
// CHECK: call void @__kmpc_end_sharing_variables()		// CHECK: call void @__kmpc_end_sharing_variables()
// CHECK: call void @__kmpc_for_static_fini(		// CHECK: call void @__kmpc_for_static_fini(
#endif		#endif

llvm/include/llvm/Frontend/OpenMP/OMPKinds.def

	[578 lines not shown]
	__OMP_RTL(__tgt_target_data_update_nowait, false, Void, Int64, Int32,			__OMP_RTL(__tgt_target_data_update_nowait, false, Void, Int64, Int32,
	VoidPtrPtr, VoidPtrPtr, Int64Ptr, Int64Ptr)			VoidPtrPtr, VoidPtrPtr, Int64Ptr, Int64Ptr)
	__OMP_RTL(__tgt_mapper_num_components, false, Int64, VoidPtr)			__OMP_RTL(__tgt_mapper_num_components, false, Int64, VoidPtr)
	__OMP_RTL(__tgt_push_mapper_component, false, Void, VoidPtr, VoidPtr, VoidPtr,			__OMP_RTL(__tgt_push_mapper_component, false, Void, VoidPtr, VoidPtr, VoidPtr,
	Int64, Int64)			Int64, Int64)
	__OMP_RTL(__kmpc_task_allow_completion_event, false, VoidPtr, IdentPtr,			__OMP_RTL(__kmpc_task_allow_completion_event, false, VoidPtr, IdentPtr,
	/* Int */ Int32, /* kmp_task_t */ VoidPtr)			/* Int */ Int32, /* kmp_task_t */ VoidPtr)

				/// Note that device runtime functions (in the following) do not necessarily
				/// need attributes as we expect to see the definitions.
				__OMP_RTL(__kmpc_kernel_parallel, false, Int1, VoidPtrPtr)
				__OMP_RTL(__kmpc_kernel_prepare_parallel, false, Void, VoidPtr)

	__OMP_RTL(__last, false, Void, )			__OMP_RTL(__last, false, Void, )

	#undef __OMP_RTL			#undef __OMP_RTL
	#undef OMP_RTL			#undef OMP_RTL

	#define EnumAttr(Kind) Attribute::get(Ctx, Attribute::AttrKind::Kind)			#define EnumAttr(Kind) Attribute::get(Ctx, Attribute::AttrKind::Kind)
	#define AttributeSet(...) \			#define AttributeSet(...) \
	AttributeSet::get(Ctx, ArrayRef<Attribute>({__VA_ARGS__}))			AttributeSet::get(Ctx, ArrayRef<Attribute>({__VA_ARGS__}))
	[555 lines not shown]
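
The two `__OMP_RTL` entries added above record each device runtime function as a name, a var-arg flag, a return type, and its argument types, now with a single argument each. As a rough illustration of how such an X-macro list can be expanded into a lookup table, here is a minimal self-contained sketch. It is not how LLVM consumes `OMPKinds.def`; `MODEL_RTL_LIST`, `ModelTy`, `ModelRTL`, and `model_lookup` are hypothetical names invented for this example, and the sketch assumes exactly one argument per entry.

```cpp
#include <cstring>

// Hypothetical two-entry list mirroring the entries added in this diff.
// The real list lives in OMPKinds.def and uses the __OMP_RTL macro.
#define MODEL_RTL_LIST(X)                                                      \
  X(__kmpc_kernel_parallel, false, Int1, VoidPtrPtr)                           \
  X(__kmpc_kernel_prepare_parallel, false, Void, VoidPtr)

// Hypothetical type tags standing in for the LLVM type names used in the list.
enum class ModelTy { Void, Int1, VoidPtr, VoidPtrPtr };

struct ModelRTL {
  const char *Name; // runtime function name as a string
  bool IsVarArg;    // second field of the entry
  ModelTy Ret;      // return type
  ModelTy Arg;      // the single remaining argument type
};

// Expand every list entry into one table row.
#define MODEL_ENTRY(Name, IsVarArg, Ret, Arg)                                  \
  {#Name, IsVarArg, ModelTy::Ret, ModelTy::Arg},
static const ModelRTL ModelTable[] = {MODEL_RTL_LIST(MODEL_ENTRY)};
#undef MODEL_ENTRY

// Linear search by name; returns nullptr when the function is not listed.
const ModelRTL *model_lookup(const char *Name) {
  for (const ModelRTL &E : ModelTable)
    if (std::strcmp(E.Name, Name) == 0)
      return &E;
  return nullptr;
}
```

Calling `model_lookup("__kmpc_kernel_parallel")` in this sketch yields a row whose return tag is `ModelTy::Int1` and whose argument tag is `ModelTy::VoidPtrPtr`, matching the signature recorded in the entry above.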

openmp/libomptarget/deviceRTLs/common/src/parallel.cu

[66 lines not shown]	if (NumThreads < WARPSIZE) {
NumThreads = (NumThreads & ~((uint16_t)WARPSIZE - 1));		NumThreads = (NumThreads & ~((uint16_t)WARPSIZE - 1));
}		}
#endif		#endif

return NumThreads;		return NumThreads;
}		}

// This routine is always called by the team master..		// This routine is always called by the team master..
EXTERN void __kmpc_kernel_prepare_parallel(void *WorkFn,		EXTERN void __kmpc_kernel_prepare_parallel(void *WorkFn) {
int16_t IsOMPRuntimeInitialized) {
PRINT0(LD_IO, "call to __kmpc_kernel_prepare_parallel\n");		PRINT0(LD_IO, "call to __kmpc_kernel_prepare_parallel\n");
ASSERT0(LT_FUSSY, IsOMPRuntimeInitialized, "Expected initialized runtime.");

omptarget_nvptx_workFn = WorkFn;		omptarget_nvptx_workFn = WorkFn;

// This routine is only called by the team master. The team master is		// This routine is only called by the team master. The team master is
// the first thread of the last warp. It always has the logical thread		// the first thread of the last warp. It always has the logical thread
// id of 0 (since it is a shadow for the first worker thread).		// id of 0 (since it is a shadow for the first worker thread).
const int threadId = 0;		const int threadId = 0;
omptarget_nvptx_TaskDescr *currTaskDescr =		omptarget_nvptx_TaskDescr *currTaskDescr =
[28 lines not shown]	EXTERN void __kmpc_kernel_prepare_parallel(void *WorkFn) {
threadsInTeam = NumThreads;		threadsInTeam = NumThreads;
}		}

// All workers call this function. Deactivate those not needed.		// All workers call this function. Deactivate those not needed.
// Fn - the outlined work function to execute.		// Fn - the outlined work function to execute.
// returns True if this thread is active, else False.		// returns True if this thread is active, else False.
//		//
// Only the worker threads call this routine.		// Only the worker threads call this routine.
EXTERN bool __kmpc_kernel_parallel(void **WorkFn,		EXTERN bool __kmpc_kernel_parallel(void **WorkFn) {
int16_t IsOMPRuntimeInitialized) {
PRINT0(LD_IO | LD_PAR, "call to __kmpc_kernel_parallel\n");		PRINT0(LD_IO | LD_PAR, "call to __kmpc_kernel_parallel\n");

ASSERT0(LT_FUSSY, IsOMPRuntimeInitialized, "Expected initialized runtime.");

// Work function and arguments for L1 parallel region.		// Work function and arguments for L1 parallel region.
*WorkFn = omptarget_nvptx_workFn;		*WorkFn = omptarget_nvptx_workFn;

// If this is the termination signal from the master, quit early.		// If this is the termination signal from the master, quit early.
if (!*WorkFn) {		if (!*WorkFn) {
PRINT0(LD_IO | LD_PAR, "call to __kmpc_kernel_parallel finished\n");		PRINT0(LD_IO | LD_PAR, "call to __kmpc_kernel_parallel finished\n");
return false;		return false;
}		}
[171 lines not shown]
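
The comments in `parallel.cu` describe a handshake that this change leaves intact: the team master publishes the outlined work function through `__kmpc_kernel_prepare_parallel`, each worker fetches it through `__kmpc_kernel_parallel`, and a null function pointer is the termination signal. The following host-side C++ sketch models only that protocol, with the `IsOMPRuntimeInitialized` parameter already dropped. It assumes a single global slot in place of `omptarget_nvptx_workFn` and ignores thread counts, barriers, and task descriptors; every `model_` name is hypothetical and not part of the device runtime.

```cpp
// Hypothetical stand-in for the device-side omptarget_nvptx_workFn slot.
static void *model_work_fn = nullptr;

// Models __kmpc_kernel_prepare_parallel(void *WorkFn) after this change:
// the master only publishes the work function, with no extra flag argument.
void model_prepare_parallel(void *WorkFn) { model_work_fn = WorkFn; }

// Models __kmpc_kernel_parallel(void **WorkFn): a worker reads the published
// function. This sketch returns false only for the null termination signal;
// the real routine also deactivates workers that are not needed.
bool model_kernel_parallel(void **WorkFn) {
  *WorkFn = model_work_fn;
  return *WorkFn != nullptr;
}

// Hypothetical outlined parallel region; counts how often it was run.
static int model_calls = 0;
void model_outlined_region() { ++model_calls; }

// One iteration of a worker's await-work loop: fetch the work function and
// run it, or report that the master asked the worker to exit.
bool model_worker_step() {
  void *Fn = nullptr;
  if (!model_kernel_parallel(&Fn))
    return false;
  // Converting between void* and a function pointer is only conditionally
  // supported in C++; it works on mainstream platforms and mirrors the i8*
  // bitcast seen in the generated IR.
  reinterpret_cast<void (*)()>(Fn)();
  return true;
}
```

In this model, a call to `model_prepare_parallel(reinterpret_cast<void *>(&model_outlined_region))` followed by `model_worker_step()` runs the region once, and publishing `nullptr` makes the next `model_worker_step()` return `false`.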

openmp/libomptarget/deviceRTLs/interface.h

[418 lines not shown]	EXTERN int32_t __kmpc_cancel(kmp_Ident *loc, int32_t global_tid,
int32_t cancelVal);		int32_t cancelVal);

// non standard		// non standard
EXTERN void __kmpc_kernel_init(int ThreadLimit, int16_t RequiresOMPRuntime);		EXTERN void __kmpc_kernel_init(int ThreadLimit, int16_t RequiresOMPRuntime);
EXTERN void __kmpc_kernel_deinit(int16_t IsOMPRuntimeInitialized);		EXTERN void __kmpc_kernel_deinit(int16_t IsOMPRuntimeInitialized);
EXTERN void __kmpc_spmd_kernel_init(int ThreadLimit, int16_t RequiresOMPRuntime,		EXTERN void __kmpc_spmd_kernel_init(int ThreadLimit, int16_t RequiresOMPRuntime,
int16_t RequiresDataSharing);		int16_t RequiresDataSharing);
EXTERN void __kmpc_spmd_kernel_deinit_v2(int16_t RequiresOMPRuntime);		EXTERN void __kmpc_spmd_kernel_deinit_v2(int16_t RequiresOMPRuntime);
EXTERN void __kmpc_kernel_prepare_parallel(void *WorkFn,		EXTERN void __kmpc_kernel_prepare_parallel(void *WorkFn);
int16_t IsOMPRuntimeInitialized);		EXTERN bool __kmpc_kernel_parallel(void **WorkFn);
EXTERN bool __kmpc_kernel_parallel(void **WorkFn,
int16_t IsOMPRuntimeInitialized);
EXTERN void __kmpc_kernel_end_parallel();		EXTERN void __kmpc_kernel_end_parallel();

EXTERN void __kmpc_data_sharing_init_stack();		EXTERN void __kmpc_data_sharing_init_stack();
EXTERN void __kmpc_data_sharing_init_stack_spmd();		EXTERN void __kmpc_data_sharing_init_stack_spmd();
EXTERN void *__kmpc_data_sharing_coalesced_push_stack(size_t size,		EXTERN void *__kmpc_data_sharing_coalesced_push_stack(size_t size,
int16_t UseSharedMemory);		int16_t UseSharedMemory);
EXTERN void *__kmpc_data_sharing_push_stack(size_t size, int16_t UseSharedMemory);		EXTERN void *__kmpc_data_sharing_push_stack(size_t size, int16_t UseSharedMemory);
EXTERN void __kmpc_data_sharing_pop_stack(void *a);		EXTERN void __kmpc_data_sharing_pop_stack(void *a);
[26 lines not shown]