This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1
ClosedPublic

Authored by saiislam on May 11 2020, 5:13 PM.


Details

Reviewers

ronlieb
yaxunl
b-sumner
scchan
JonChesterfield
jdoerfert
sameerds
msearles
hliao
arsenm

Commits

rG602d9b0afc77: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

Summary

Allow AMDGCN as a GPU offloading target for OpenMP during compiler
invocation and allow setting CUDAMode for it.
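
As a rough illustration of the two frontend decisions this patch touches, the sketch below models them as pure functions. It is a simplified stand-in and not clang's CompilerInvocation code: `OffloadArch`, `isOffloadGPU`, `cudaModeEnabled` and `exceptionsAllowed` are hypothetical names introduced here, and the conditions are taken from the CompilerInvocation.cpp hunk in the diff further down.

```cpp
// Hypothetical model of the target architectures relevant to this patch.
enum class OffloadArch { X86_64, NVPTX64, AMDGCN };

// Assumption: mirrors the (T.isNVPTX() || T.isAMDGCN()) check in the diff.
constexpr bool isOffloadGPU(OffloadArch A) {
  return A == OffloadArch::NVPTX64 || A == OffloadArch::AMDGCN;
}

// CUDA mode is only honoured for device compilation on one of the two GPU
// targets, and only when -fopenmp-cuda-mode was passed.
constexpr bool cudaModeEnabled(bool OpenMPIsDevice, OffloadArch A,
                               bool HasCudaModeFlag) {
  return OpenMPIsDevice && isOffloadGPU(A) && HasCudaModeFlag;
}

// Exceptions are switched off for OpenMP device code on these GPUs and for
// C++ for OpenCL; otherwise the requested setting is kept.
constexpr bool exceptionsAllowed(bool Requested, bool OpenMPIsDevice,
                                 OffloadArch A, bool OpenCLCPlusPlus) {
  return ((OpenMPIsDevice && isOffloadGPU(A)) || OpenCLCPlusPlus) ? false
                                                                  : Requested;
}
```

Under this model, a host compilation never gets CUDA mode even if the flag is present, which matches the `Opts.OpenMPIsDevice &&` guard in the diff.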

Originally authored by Greg Rodgers (@gregrodgers).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

saiislam created this revision.May 11 2020, 5:13 PM

Herald added a reviewer: jdoerfert.May 11 2020, 5:13 PM

Herald added projects: Restricted Project, Restricted Project.

Herald added subscribers: llvm-commits, cfe-commits, dexonsmith, guansong.

Fixed lint errors.

Test?

clang/lib/AST/Decl.cpp
3227	This is also identical to the cuda handling above, can you merge them

saiislam added reviewers: sameerds, msearles, hliao.May 11 2020, 6:07 PM

jdoerfert added inline comments.May 11 2020, 6:43 PM

clang/lib/AST/Decl.cpp
3227	Yes, wasn't there an idea to have isGPU()?
clang/lib/Frontend/CompilerInvocation.cpp
3172	Can we please not call it CUDA mode for non-CUDA targets? Or do you expect the compilation to "identify" as CUDA compilation?
llvm/include/llvm/ADT/Triple.h
700	What's the difference between an AMDGPU and AMDGCN?

sameerds added inline comments.May 11 2020, 6:45 PM

clang/lib/AST/Decl.cpp
3227	It's not identical, because CUDA is filtering out host code and its check is target independent. But then, Saiyed, it seems that the new patch disallows library builtins on all languages that reach this point, both on host and device code. Is that the intention? Does this point in the flow preclude any side-effects outside of "OpenMP on AMDGCN"?
llvm/include/llvm/ADT/Triple.h
696	Why not just isAMDGPU()? I myself don't have an opinion either way, but still curious.

Harbormaster failed remote builds in B56379: Diff 263313!May 11 2020, 6:55 PM

sameerds added inline comments.May 11 2020, 7:21 PM

clang/lib/Frontend/CompilerInvocation.cpp
3172	I suspect it's just a lot of water under the bridge. All over Clang, HIP has traditionally co-opted CUDA code without introducing new identifiers, like "-fcuda-is-device". It won't be easy to redo that now, and definitely looks out of scope for this change. A typical HIP compilation usually does identify itself as a HIP compilation like setting the isHIP flag. This allows the frontend to distinguish between CUDA and HIP when it matters.

Harbormaster completed remote builds in B56380: Diff 263314.May 11 2020, 7:27 PM

yaxunl added inline comments.May 11 2020, 9:30 PM

llvm/include/llvm/ADT/Triple.h
700	AMDGPU includes r600 and amdgcn. r600 covers the old AMD GPUs. They do not support a flat address space and therefore cannot support CUDA or HIP.
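
The distinction can be pictured with a small standalone model. This is a sketch and not the llvm::Triple API: the `GpuArch` enum and both predicates are defined here for illustration only, and the flat-address-space remark is the one made in the comment above.

```cpp
// Illustrative architecture enum; not llvm::Triple::ArchType.
enum class GpuArch { R600, AMDGCN, NVPTX64 };

// "AMDGPU" is the umbrella term covering both AMD GPU families.
constexpr bool modelIsAMDGPU(GpuArch A) {
  return A == GpuArch::R600 || A == GpuArch::AMDGCN;
}

// "AMDGCN" excludes r600, which per the review lacks a flat address space
// and so cannot host CUDA or HIP style code.
constexpr bool modelIsAMDGCN(GpuArch A) { return A == GpuArch::AMDGCN; }
```

A predicate used to gate offloading would therefore want the narrower check, since the broader one also accepts r600.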

sameerds added inline comments.May 11 2020, 10:03 PM

llvm/include/llvm/ADT/Triple.h
700	As a separate topic, does that mean that Clang should reject r600 at an earlier entry point itself?

sameerds added inline comments.May 11 2020, 10:27 PM

clang/lib/AST/Decl.cpp
3230	This needs a test.
clang/lib/Frontend/CompilerInvocation.cpp
3112–3113	Needs a test.
3147	This is probably too fundamental to need a test on its own.

saiislam marked 5 inline comments as done.May 14 2020, 4:55 AM

saiislam added inline comments.

clang/lib/AST/Decl.cpp
3227	@sameerds This function returns a value indicating whether the input function corresponds to a builtin function (returns BuiltinID) or not (returns 0), i.e. all conditions returning 0 are the exceptions where a valid BuiltinID can't be returned. The new condition only applies to non-OpenCL (line 3213) builtin functions which are not printf and malloc, and have been targeted for amdgcn.
3227	@jdoerfert Can you please elaborate on the isGPU() idea? I am not aware of it. Guessing by the name, I have added isGPU() as a wrapper over isNVPTX() and isAMDGCN() in the next revision.
3228	This condition is not required because it has already been checked earlier, in line #3200.
clang/lib/Frontend/CompilerInvocation.cpp
3172	@jdoerfert The OpenMP support document of clang (https://clang.llvm.org/docs/OpenMPSupport.html#data-sharing-modes) defines two data sharing modes for CUDA devices: CUDA and Generic mode. NVPTX and AMDGCN are both CUDA devices. Doesn't it make sense not to refactor CUDA mode for now?

Added test cases. Added a wrapper isGPU() for isNVPTX()/isAMDGCN().

saiislam marked 7 inline comments as done and 2 inline comments as done.May 14 2020, 5:00 AM

yaxunl added inline comments.May 14 2020, 5:30 AM

llvm/include/llvm/ADT/Triple.h
700	I think HIP checks the triple and rejects any triple other than amdgcn.

Harbormaster completed remote builds in B56723: Diff 263972.May 14 2020, 7:32 AM

saiislam added a reviewer: arsenm.May 14 2020, 8:59 AM

Herald added a subscriber: wdng.May 14 2020, 8:59 AM

arsenm added inline comments.May 14 2020, 9:31 AM

llvm/include/llvm/ADT/Triple.h
699	This is skipping out r600, and probably should leave this in clang?

Moved isGPU() from llvm's Triple.h to clang's TargetInfo. Renamed it to isOpenMPGPU()
to represent target's compatibility with OpenMP offloading and reduce its scope.

Harbormaster completed remote builds in B56836: Diff 264192.May 15 2020, 5:13 AM

sameerds added inline comments.May 15 2020, 9:15 AM

clang/include/clang/Basic/TargetInfo.h
1261 ↗	(On Diff #264192)	How is "OpenMP-compatible GPU" defined? I think it's too early to start designing predicates about whether a target is a GPU and whether it supports OpenMP.
clang/lib/AST/Decl.cpp
3227	This seems awkward to me. Why mix it up with only CUDA and HIP? The earlier factoring is better, where CUDA/HIP took care of their own business, and the catch-all case of AMDGCN was a separate clause by itself. It doesn't matter that the builtins being checked for AMDGCN on OpenMP are currently identical to CUDA/HIP. When this situation later changes (I am sure OpenMP will support more builtins), we will have to split it out again anyway.
clang/lib/Frontend/CompilerInvocation.cpp
3110	I am not particularly in favour of introducing a variable just because it looks smaller than a call at each appropriate location. If you really want it this way, at least make it a const.
3112–3113	Looking at the comment before this line, the correct predicate would be "target supports exceptions with OpenMP". Is it always true that every GPU that supports OpenMP will not support exception handling? I would recommend just checking individual targets for now instead of inventing predicates.
3166	Is there any reason to believe that every future GPU added to this predicate will also want the CUDA mode set? I would recommend using individual targets for now instead of inventing a new predicate.
3171	Same doubt about this use of an artificial predicate as commented earlier.

sameerds added inline comments.May 16 2020, 7:51 PM

clang/lib/AST/Decl.cpp
3227	This clause seems to assume that AMDGCN automatically means OpenMP whenever it is not HIP. That does not sound right. Is there a Context.getLangOpts().OpenMP flag? If not, why not? There also seems to be an Opts.OpenMPIsDevice ... perhaps that could be used here to write a separate OpenMP clause?

Removed isOpenMPGPU() to avoid defining OpenMP compatibility of an architecture.
Reverted to explicitly checking the NVPTX and AMDGCN architectures. Also split
the NVPTX and AMDGCN handling in getBuiltinID. For AMDGCN it now uses the
OpenMPIsDevice LangOpt and returns 0 for every device library function, except for
printf and malloc.
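
The AMDGCN rule described in this update can be sketched as a standalone function. This is a minimal model under stated assumptions, not the code in Decl.cpp: the `ModelBuiltin` IDs, `CompileContext`, `modelIsPredefinedLibFunction` and `filterBuiltinID` are hypothetical names, and the model treats every non-zero ID as a predefined library function.

```cpp
// Hypothetical stand-ins for clang's builtin IDs; 0 means "not a builtin".
enum ModelBuiltin : unsigned { NotBuiltin = 0, BIprintf, BImalloc, BIstrlen };

// The two facts the new clause consults.
struct CompileContext {
  bool TargetIsAMDGCN;  // models getTriple().isAMDGCN()
  bool OpenMPIsDevice;  // models getLangOpts().OpenMPIsDevice
};

// Assumption: in this sketch every real builtin is a library function.
constexpr bool modelIsPredefinedLibFunction(unsigned ID) {
  return ID != NotBuiltin;
}

// Returns 0 for library functions on an AMDGCN OpenMP device, except for
// printf and malloc; every other case passes the ID through unchanged.
constexpr unsigned filterBuiltinID(const CompileContext &Ctx, unsigned ID) {
  return (Ctx.TargetIsAMDGCN && Ctx.OpenMPIsDevice &&
          modelIsPredefinedLibFunction(ID) &&
          !(ID == BIprintf || ID == BImalloc))
             ? 0u
             : ID;
}
```

Keeping the target and language-option checks in one clause is what lets other targets, or host compilation for AMDGCN, fall through untouched.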

sameerds added inline comments.May 17 2020, 10:35 PM

clang/lib/AST/Decl.cpp
3230	This still needs a test. One that shows that specific builtins are allowed and others are not.
clang/test/Driver/openmp-offload-gpu.c
257	Is there a functional reason to move these lines? Changes to existing files should be minimized to show only functional changes. Any NFC rearrangement should be a separate patch.

sameerds added inline comments.May 17 2020, 10:51 PM

clang/lib/AST/Decl.cpp
3227	This should say "OpenMP does not have ..."?

Harbormaster completed remote builds in B57011: Diff 264529.May 17 2020, 11:57 PM

sameerds added inline comments.May 18 2020, 8:27 AM

clang/lib/AST/Decl.cpp
3229	Why is the check for AMDGCN required here? Doesn't the language define the set of supported builtins in a language-independent way? At least that seems to be true for OpenCL and CUDA above.

Fixed typo. Added test case to check for device side functions.

Herald added a subscriber: jvesely.May 18 2020, 6:52 PM

Harbormaster completed remote builds in B57145: Diff 264766.May 18 2020, 8:04 PM

Added test case to show treatment of specific functions as builtins or functions on the device

Herald added a subscriber: sstefan1.May 22 2020, 5:51 AM

saiislam marked 19 inline comments as done.May 22 2020, 6:00 AM

saiislam added inline comments.

clang/lib/AST/Decl.cpp
3229	The AMDGCN check is required because this block deals with the OpenMP target when it is amdgcn. A future or existing target device might decide to deal with library calls in its own way.
3230	Added test case to show difference in treatment of printf() and other predefined library functions, on the device.

Harbormaster completed remote builds in B57639: Diff 265716.May 22 2020, 7:29 AM

Shifted test cases in openmp-offload-gpu.c for better visual segmentation.
Updated device function test case to be more accurate.

Harbormaster completed remote builds in B57866: Diff 266171.May 26 2020, 7:34 AM

I'm generally fine with this, don't wait for my approval.

clang/test/OpenMP/amdgcn_device_function_call.cpp
28	Not for this patch: FWIW, we will need to make math functions work inside target regions. The way aomp does it is, afaik, different from the way we do it. We can, however, adopt our way for this target. Feel free to ping me on this later.

saiislam marked an inline comment as done.May 26 2020, 10:34 AM

saiislam added inline comments.

clang/test/OpenMP/amdgcn_device_function_call.cpp
28	Yes, sure. I will ping you as we move towards it. Thanks.

LGTM!

This revision is now accepted and ready to land.May 27 2020, 12:24 AM

Closed by commit rG602d9b0afc77: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1 (authored by saiislam).May 27 2020, 1:02 AM

This revision was automatically updated to reflect the committed changes.

saiislam added a child revision: D80917: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 2.Jun 1 2020, 4:13 AM

Revision Contents

Path                                                  Size

clang/lib/AST/Decl.cpp                                9 lines
clang/lib/Frontend/CompilerInvocation.cpp             12 lines
clang/test/Driver/openmp-offload-gpu.c                21 lines
clang/test/OpenMP/amdgcn_device_function_call.cpp     27 lines
clang/test/OpenMP/target_parallel_no_exceptions.cpp   1 line
llvm/include/llvm/ADT/Triple.h                        3 lines

Diff 266447

clang/lib/AST/Decl.cpp

 unsigned FunctionDecl::getBuiltinID(bool ConsiderWrapperFunctions) const {
[...]
   // CUDA does not have device-side standard library. printf and malloc are the
   // only special cases that are supported by device-side runtime.
   if (Context.getLangOpts().CUDA && hasAttr<CUDADeviceAttr>() &&
       !hasAttr<CUDAHostAttr>() &&
       !(BuiltinID == Builtin::BIprintf || BuiltinID == Builtin::BImalloc))
     return 0;
 
+  // As AMDGCN implementation of OpenMP does not have a device-side standard
> arsenm (Done): This is also identical to the cuda handling above, can you merge them
> sameerds (Done): It's not identical, because CUDA is filtering out host code and its check is target independent. But then, Saiyed, it seems that the new patch disallows library builtins on all languages that reach this point, both on host and device code. Is that the intention? Does this point in the flow preclude any side-effects outside of "OpenMP on AMDGCN"?
> saiislam (Author, Done): @sameerds This function returns a value indicating whether the input function corresponds to a builtin function (returns BuiltinID) or not (returns 0), i.e. all conditions returning 0 are the exceptions where a valid BuiltinID can't be returned. The new condition only applies to non-OpenCL (line 3213) builtin functions which are not printf and malloc, and have been targeted for amdgcn.
> jdoerfert (Done): Yes, wasn't there an idea to have isGPU()?
> saiislam (Author, Done): @jdoerfert Can you please elaborate on the isGPU() idea? I am not aware of it. Guessing by the name, I have added isGPU() as a wrapper over isNVPTX() and isAMDGCN() in the next revision.
> sameerds (Done): This seems awkward to me. Why mix it up with only CUDA and HIP? The earlier factoring is better, where CUDA/HIP took care of their own business, and the catch-all case of AMDGCN was a separate clause by itself. It doesn't matter that the builtins being checked for AMDGCN on OpenMP are currently identical to CUDA/HIP. When this situation later changes (I am sure OpenMP will support more builtins), we will have to split it out again anyway.
> sameerds (Done): This clause seems to assume that AMDGCN automatically means OpenMP whenever it is not HIP. That does not sound right. Is there a Context.getLangOpts().OpenMP flag? If not, why not? There also seems to be an Opts.OpenMPIsDevice ... perhaps that could be used here to write a separate OpenMP clause?
> sameerds (Done): This should say "OpenMP does not have ..."?
+  // library, none of the predefined library functions except printf and malloc
> saiislam (Author, Done): This condition is not required because it has already been checked earlier, in line #3200.
+  // should be treated as a builtin i.e. 0 should be returned for them.
> sameerds (Done): Why is the check for AMDGCN required here? Doesn't the language define the set of supported builtins in a language-independent way? At least that seems to be true for OpenCL and CUDA above.
> saiislam (Author, Done): The AMDGCN check is required because this block deals with the OpenMP target when it is amdgcn. A future or existing target device might decide to deal with library calls in its own way.
+  if (Context.getTargetInfo().getTriple().isAMDGCN() &&
> sameerds (Done): This needs a test.
> sameerds (Done): This still needs a test. One that shows that specific builtins are allowed and others are not.
> saiislam (Author, Done): Added test case to show difference in treatment of printf() and other predefined library functions, on the device.
+      Context.getLangOpts().OpenMPIsDevice &&
+      Context.BuiltinInfo.isPredefinedLibFunction(BuiltinID) &&
+      !(BuiltinID == Builtin::BIprintf || BuiltinID == Builtin::BImalloc))
+    return 0;
+
   return BuiltinID;
 }
 
 /// getNumParams - Return the number of parameters this function must have
 /// based on its FunctionType. This is the length of the ParamInfo array
 /// after it has been created.
 unsigned FunctionDecl::getNumParams() const {
   const auto *FPT = getType()->getAs<FunctionProtoType>();
[...]

clang/lib/Frontend/CompilerInvocation.cpp

     if (!Opts.OpenMPIsDevice) {
[...]
       case llvm::Triple::nvptx64:
         Diags.Report(diag::err_drv_omp_host_target_not_supported)
             << TargetOpts.Triple;
         break;
       }
     }
   }
 
   // Set the flag to prevent the implementation from emitting device exception
> sameerds (Done): I am not particularly in favour of introducing a variable just because it looks smaller than a call at each appropriate location. If you really want it this way, at least make it a const.
   // handling code for those requiring so.
-  if ((Opts.OpenMPIsDevice && T.isNVPTX()) || Opts.OpenCLCPlusPlus) {
+  if ((Opts.OpenMPIsDevice && (T.isNVPTX() || T.isAMDGCN())) ||
+      Opts.OpenCLCPlusPlus) {
> sameerds (Done): Needs a test.
> sameerds (Done): Looking at the comment before this line, the correct predicate would be "target supports exceptions with OpenMP". Is it always true that every GPU that supports OpenMP will not support exception handling? I would recommend just checking individual targets for now instead of inventing predicates.
     Opts.Exceptions = 0;
     Opts.CXXExceptions = 0;
   }
   if (Opts.OpenMPIsDevice && T.isNVPTX()) {
     Opts.OpenMPCUDANumSMs =
         getLastArgIntValue(Args, options::OPT_fopenmp_cuda_number_of_sm_EQ,
                            Opts.OpenMPCUDANumSMs, Diags);
     Opts.OpenMPCUDABlocksPerSM =
[...]
     for (unsigned i = 0; i < A->getNumValues(); ++i) {
[...]
       if (TT.getArch() == llvm::Triple::UnknownArch ||
           !(TT.getArch() == llvm::Triple::aarch64 ||
             TT.getArch() == llvm::Triple::ppc ||
             TT.getArch() == llvm::Triple::ppc64 ||
             TT.getArch() == llvm::Triple::ppc64le ||
             TT.getArch() == llvm::Triple::nvptx ||
             TT.getArch() == llvm::Triple::nvptx64 ||
+            TT.getArch() == llvm::Triple::amdgcn ||
> sameerds (Done): This is probably too fundamental to need a test on its own.
             TT.getArch() == llvm::Triple::x86 ||
             TT.getArch() == llvm::Triple::x86_64))
         Diags.Report(diag::err_drv_invalid_omp_target) << A->getValue(i);
       else
         Opts.OMPTargetTriples.push_back(TT);
     }
   }
 
   // Get OpenMP host file path if any and report if a non existent file is
   // found
   if (Arg *A = Args.getLastArg(options::OPT_fopenmp_host_ir_file_path)) {
     Opts.OMPHostIRFile = A->getValue();
     if (!llvm::sys::fs::exists(Opts.OMPHostIRFile))
       Diags.Report(diag::err_drv_omp_host_ir_file_not_found)
           << Opts.OMPHostIRFile;
   }
 
-  // Set CUDA mode for OpenMP target NVPTX if specified in options
-  Opts.OpenMPCUDAMode = Opts.OpenMPIsDevice && T.isNVPTX() &&
+  // Set CUDA mode for OpenMP target NVPTX/AMDGCN if specified in options
+  Opts.OpenMPCUDAMode = Opts.OpenMPIsDevice && (T.isNVPTX() || T.isAMDGCN()) &&
> sameerds (Done): Is there any reason to believe that every future GPU added to this predicate will also want the CUDA mode set? I would recommend using individual targets for now instead of inventing a new predicate.
                         Args.hasArg(options::OPT_fopenmp_cuda_mode);
 
-  // Set CUDA mode for OpenMP target NVPTX if specified in options
+  // Set CUDA mode for OpenMP target NVPTX/AMDGCN if specified in options
   Opts.OpenMPCUDAForceFullRuntime =
-      Opts.OpenMPIsDevice && T.isNVPTX() &&
+      Opts.OpenMPIsDevice && (T.isNVPTX() || T.isAMDGCN()) &&
> sameerds (Done): Same doubt about this use of an artificial predicate as commented earlier.
       Args.hasArg(options::OPT_fopenmp_cuda_force_full_runtime);
> jdoerfert (Done): Can we please not call it CUDA mode for non-CUDA targets? Or do you expect the compilation to "identify" as CUDA compilation?
> sameerds (Done): I suspect it's just a lot of water under the bridge. All over Clang, HIP has traditionally co-opted CUDA code without introducing new identifiers, like "-fcuda-is-device". It won't be easy to redo that now, and definitely looks out of scope for this change. A typical HIP compilation usually does identify itself as a HIP compilation like setting the isHIP flag. This allows the frontend to distinguish between CUDA and HIP when it matters.
> saiislam (Author, Done): @jdoerfert The OpenMP support document of clang (https://clang.llvm.org/docs/OpenMPSupport.html#data-sharing-modes) defines two data sharing modes for CUDA devices: CUDA and Generic mode. NVPTX and AMDGCN are both CUDA devices. Doesn't it make sense not to refactor CUDA mode for now?
 
   // Record whether the __DEPRECATED define was requested.
   Opts.Deprecated = Args.hasFlag(OPT_fdeprecated_macro,
                                  OPT_fno_deprecated_macro,
                                  Opts.Deprecated);
 
   // FIXME: Eliminate this dependency.
   unsigned Opt = getOptimizationLevel(Args, IK, Diags),
[...]

clang/test/Driver/openmp-offload-gpu.c

 ///
 /// Perform several driver tests for OpenMP offloading
 ///
 
 // REQUIRES: clang-driver
 // REQUIRES: x86-registered-target
 // REQUIRES: powerpc-registered-target
 // REQUIRES: nvptx-registered-target
+// REQUIRES: amdgpu-registered-target
 
 /// ###########################################################################
 
 /// Check -Xopenmp-target uses one of the archs provided when several archs are used.
 // RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \
 // RUN: -Xopenmp-target -march=sm_35 -Xopenmp-target -march=sm_60 %s 2>&1 \
 // RUN: | FileCheck -check-prefix=CHK-FOPENMP-TARGET-ARCHS %s
 
[...]
 // HAS_DEBUG-SAME: "--return-at-end"
 // HAS_DEBUG: nvlink
 // HAS_DEBUG-SAME: "-g"
 
 // RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_60 %s -fopenmp-cuda-mode 2>&1 \
 // RUN: | FileCheck -check-prefix=CUDA_MODE %s
 // RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_60 %s -fno-openmp-cuda-mode -fopenmp-cuda-mode 2>&1 \
 // RUN: | FileCheck -check-prefix=CUDA_MODE %s
-// CUDA_MODE: clang{{.*}}"-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"
> sameerds (Not Done): Is there a functional reason to move these lines? Changes to existing files should be minimized to show only functional changes. Any NFC rearrangement should be a separate patch.
+// RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target -march=gfx906 %s -fopenmp-cuda-mode 2>&1 \
+// RUN: | FileCheck -check-prefix=CUDA_MODE %s
+// RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target -march=gfx906 %s -fno-openmp-cuda-mode -fopenmp-cuda-mode 2>&1 \
+// RUN: | FileCheck -check-prefix=CUDA_MODE %s
+// CUDA_MODE: clang{{.*}}"-cc1"{{.*}}"-triple" "{{nvptx64-nvidia-cuda|amdgcn-amd-amdhsa}}"
 // CUDA_MODE-SAME: "-fopenmp-cuda-mode"
 // RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_60 %s -fno-openmp-cuda-mode 2>&1 \
 // RUN: | FileCheck -check-prefix=NO_CUDA_MODE %s
 // RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_60 %s -fopenmp-cuda-mode -fno-openmp-cuda-mode 2>&1 \
 // RUN: | FileCheck -check-prefix=NO_CUDA_MODE %s
+// RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target -march=gfx906 %s -fno-openmp-cuda-mode 2>&1 \
+// RUN: | FileCheck -check-prefix=NO_CUDA_MODE %s
+// RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target -march=gfx906 %s -fopenmp-cuda-mode -fno-openmp-cuda-mode 2>&1 \
+// RUN: | FileCheck -check-prefix=NO_CUDA_MODE %s
 // NO_CUDA_MODE-NOT: "-{{fno-|f}}openmp-cuda-mode"
 
 // RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_60 %s -fopenmp-cuda-force-full-runtime 2>&1 \
 // RUN: | FileCheck -check-prefix=FULL_RUNTIME %s
 // RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_60 %s -fno-openmp-cuda-force-full-runtime -fopenmp-cuda-force-full-runtime 2>&1 \
 // RUN: | FileCheck -check-prefix=FULL_RUNTIME %s
-// FULL_RUNTIME: clang{{.*}}"-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"
+// RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target -march=gfx906 %s -fopenmp-cuda-force-full-runtime 2>&1 \
+// RUN: | FileCheck -check-prefix=FULL_RUNTIME %s
+// RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target -march=gfx906 %s -fno-openmp-cuda-force-full-runtime -fopenmp-cuda-force-full-runtime 2>&1 \
+// RUN: | FileCheck -check-prefix=FULL_RUNTIME %s
+// FULL_RUNTIME: clang{{.*}}"-cc1"{{.*}}"-triple" "{{nvptx64-nvidia-cuda|amdgcn-amd-amdhsa}}"
 // FULL_RUNTIME-SAME: "-fopenmp-cuda-force-full-runtime"
 // RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_60 %s -fno-openmp-cuda-force-full-runtime 2>&1 \
 // RUN: | FileCheck -check-prefix=NO_FULL_RUNTIME %s
 // RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_60 %s -fopenmp-cuda-force-full-runtime -fno-openmp-cuda-force-full-runtime 2>&1 \
 // RUN: | FileCheck -check-prefix=NO_FULL_RUNTIME %s
+// RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target -march=gfx906 %s -fno-openmp-cuda-force-full-runtime 2>&1 \
+// RUN: | FileCheck -check-prefix=NO_FULL_RUNTIME %s
+// RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target -march=gfx906 %s -fopenmp-cuda-force-full-runtime -fno-openmp-cuda-force-full-runtime 2>&1 \
+// RUN: | FileCheck -check-prefix=NO_FULL_RUNTIME %s
 // NO_FULL_RUNTIME-NOT: "-{{fno-|f}}openmp-cuda-force-full-runtime"
 
 // RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_60 %s -fopenmp-cuda-teams-reduction-recs-num=2048 2>&1 \
 // RUN: | FileCheck -check-prefix=CUDA_RED_RECS %s
 // CUDA_RED_RECS: clang{{.*}}"-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"
 // CUDA_RED_RECS-SAME: "-fopenmp-cuda-teams-reduction-recs-num=2048"
 
 // RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda %s 2>&1 \
 // RUN: | FileCheck -check-prefix=OPENMP_NVPTX_WRAPPERS %s
 // OPENMP_NVPTX_WRAPPERS: clang{{.*}}"-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"
	// OPENMP_NVPTX_WRAPPERS-SAME: "-internal-isystem" "{{.*}}openmp_wrappers"			// OPENMP_NVPTX_WRAPPERS-SAME: "-internal-isystem" "{{.*}}openmp_wrappers"
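
The FULL_RUNTIME and NO_FULL_RUNTIME run lines above exercise a "last flag wins" rule: when both `-fopenmp-cuda-force-full-runtime` and `-fno-openmp-cuda-force-full-runtime` are given, only the later one counts, and the negative form is never forwarded to cc1. The sketch below models that behaviour in isolation. It is not clang's driver code; `lastFlagWins` and `forwardedFlags` are hypothetical helpers, and the default of "off" is an assumption inferred from the NO_FULL_RUNTIME-NOT check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of clang): scans the arguments left to right
// and lets the last occurrence of either spelling decide the result.
bool lastFlagWins(const std::vector<std::string> &Args, const std::string &Pos,
                  const std::string &Neg, bool Default = false) {
  assert(Pos != Neg && "positive and negative spellings must differ");
  bool Enabled = Default;
  for (const std::string &A : Args) {
    if (A == Pos)
      Enabled = true;
    else if (A == Neg)
      Enabled = false;
  }
  return Enabled;
}

// Hypothetical model of what reaches the device-side cc1 invocation: the
// positive flag is forwarded when enabled, the negative flag is never forwarded.
std::vector<std::string> forwardedFlags(const std::vector<std::string> &Args) {
  const std::string Pos = "-fopenmp-cuda-force-full-runtime";
  const std::string Neg = "-fno-openmp-cuda-force-full-runtime";
  std::vector<std::string> Out;
  if (lastFlagWins(Args, Pos, Neg))
    Out.push_back(Pos);
  return Out;
}
```

In this model the target triple plays no role in the decision, which is consistent with the diff reusing the same FULL_RUNTIME and NO_FULL_RUNTIME prefixes for both nvptx64-nvidia-cuda and amdgcn-amd-amdhsa.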

clang/test/OpenMP/amdgcn_device_function_call.cpp

This file was added.

				// REQUIRES: amdgpu-registered-target

				// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
				// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - \| FileCheck %s
				// RUN: llvm-dis < %t-ppc-host.bc \| FileCheck %s -check-prefix=HOST

				// device side declarations
				#pragma omp declare target
				extern "C" float cosf(float __x);
				#pragma omp end declare target

				// host side declaration
				extern "C" float cosf(float __x);

				void test_amdgcn_openmp_device(float __x) {
				// the default case where predefined library functions are treated as
				// builtins on the host
				// HOST: call float @llvm.cos.f32(float
				__x = cosf(__x);

				#pragma omp target
				{
				// cosf should not be treated as builtin on device
				// CHECK-NOT: call float @llvm.cos.f32(float
				__x = cosf(__x);
				}
				}
				jdoerfert: Not for this patch: FWIW, we will need to make math functions work inside target regions. The way aomp does it is afaik different from the way we do it. We can however adopt our way for this target though. Feel free to ping me on this later.
				saiislam (author): Yes, sure. I will ping you as we move towards it. Thanks.

clang/test/OpenMP/target_parallel_no_exceptions.cpp

	/// Make sure no exception messages are inclided in the llvm output.			/// Make sure no exception messages are inclided in the llvm output.
	// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc			// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
	// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - \| FileCheck %s --check-prefix CHK-EXCEPTION			// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - \| FileCheck %s --check-prefix CHK-EXCEPTION
				// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - \| FileCheck %s --check-prefix CHK-EXCEPTION

	void test_increment() {			void test_increment() {
	#pragma omp target			#pragma omp target
	{			{
	[]() { return; }();			[]() { return; }();
	}			}
	}			}

	int main() {			int main() {
	test_increment();			test_increment();
	return 0;			return 0;
	}			}

	//CHK-EXCEPTION-NOT: invoke			//CHK-EXCEPTION-NOT: invoke

llvm/include/llvm/ADT/Triple.h

bool isSPIR() const {
return getArch() == Triple::spir \|\| getArch() == Triple::spir64;		return getArch() == Triple::spir \|\| getArch() == Triple::spir64;
}		}

/// Tests whether the target is NVPTX (32- or 64-bit).		/// Tests whether the target is NVPTX (32- or 64-bit).
bool isNVPTX() const {		bool isNVPTX() const {
return getArch() == Triple::nvptx \|\| getArch() == Triple::nvptx64;		return getArch() == Triple::nvptx \|\| getArch() == Triple::nvptx64;
}		}

		/// Tests whether the target is AMDGCN
		bool isAMDGCN() const { return getArch() == Triple::amdgcn; }
		sameerds: Why not just isAMDGPU()? I myself don't have an opinion either way, but still curious.

bool isAMDGPU() const {		bool isAMDGPU() const {
return getArch() == Triple::r600 \|\| getArch() == Triple::amdgcn;		return getArch() == Triple::r600 \|\| getArch() == Triple::amdgcn;
		arsenm: This is skipping out r600, and probably should leave this in clang?
}		}
		jdoerfert: What's the difference between an AMDGPU and AMDGCN?
		yaxunl: AMDGPU includes r600 and amdgcn. r600 are old AMD GPUs. They do not support flat address space and therefore cannot support CUDA or HIP.
		sameerds: As a separate topic, does that mean that Clang should reject r600 at an earlier entry point itself?
		yaxunl: I think HIP checks the triple and rejects triples other than amdgcn.
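
To make the distinction in this thread concrete, here is a small standalone model of the two predicates. It does not include LLVM headers; `Arch` is a hypothetical stand-in for `llvm::Triple::ArchType` reduced to three values, and `checkPredicates` is an illustrative helper that is not part of the patch. The predicate bodies mirror the ones shown in the diff above.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::Triple::ArchType, reduced to three values.
enum class Arch { r600, amdgcn, nvptx64 };

// Mirrors the predicate added by this diff: true for GCN only.
inline bool isAMDGCN(Arch A) { return A == Arch::amdgcn; }

// Mirrors the existing predicate: true for legacy r600 as well as GCN.
inline bool isAMDGPU(Arch A) { return A == Arch::r600 || A == Arch::amdgcn; }

// Illustrative helper: spells out that isAMDGCN is a strict subset of isAMDGPU.
inline void checkPredicates() {
  assert(isAMDGPU(Arch::amdgcn) && isAMDGCN(Arch::amdgcn));
  assert(isAMDGPU(Arch::r600) && !isAMDGCN(Arch::r600)); // the case that differs
  assert(!isAMDGPU(Arch::nvptx64) && !isAMDGCN(Arch::nvptx64));
}
```

The only input on which the two predicates disagree is r600, which is the case yaxunl describes as lacking a flat address space.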

/// Tests whether the target is Thumb (little and big endian).		/// Tests whether the target is Thumb (little and big endian).
bool isThumb() const {		bool isThumb() const {
return getArch() == Triple::thumb \|\| getArch() == Triple::thumbeb;		return getArch() == Triple::thumb \|\| getArch() == Triple::thumbeb;
}		}

/// Tests whether the target is ARM (little and big endian).		/// Tests whether the target is ARM (little and big endian).
bool isARM() const {		bool isARM() const {

Revision Contents

Diff 266447

clang/lib/AST/Decl.cpp

clang/lib/Frontend/CompilerInvocation.cpp

clang/test/Driver/openmp-offload-gpu.c

clang/test/OpenMP/amdgcn_device_function_call.cpp

clang/test/OpenMP/target_parallel_no_exceptions.cpp

llvm/include/llvm/ADT/Triple.h
