This is an archive of the discontinued LLVM Phabricator instance.


Differential D142393

[OpenMP] Add 'amdgpu-flat-work-group-size' to OpenMP kernels
Needs Revision · Public

Authored by jhuber6 on Jan 23 2023, 11:26 AM.


Details

Reviewers

JonChesterfield
arsenm
tra
yaxunl
jdoerfert

Summary

This patch adds the "amdgpu-flat-work-group-size"="1,1024" attribute to
OpenMP kernels targeting AMDGPU. It also lets OpenMP use
--gpu-max-threads-per-block to change the upper bound, loosening that
option from being HIP-only.
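The attribute value is a "min,max" pair. Here is a minimal sketch of how that string is assembled, assuming the minimum stays fixed at 1 as in the patch. `flatWorkGroupSizeValue` is a hypothetical name and not a Clang function; the patch builds the string inline in TargetInfo.cpp with `llvm::utostr`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not Clang's API: builds the value of the
// "amdgpu-flat-work-group-size" attribute in the "min,max" form used by
// the patch. The minimum is fixed at 1; the maximum comes from
// --gpu-max-threads-per-block, whose default in the patch is 1024.
std::string flatWorkGroupSizeValue(unsigned MaxThreadsPerBlock = 1024) {
  assert(MaxThreadsPerBlock >= 1 && "a work-group needs at least one thread");
  return "1," + std::to_string(MaxThreadsPerBlock);
}
```

With no flag this yields "1,1024", and with --gpu-max-threads-per-block=512 it yields "1,512". These match the DEFAULT and MAX-THREADS check lines in amdgcn-attributes.cpp.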

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	3,440 ms	libcxx CI Modules > llvm-libc++-shared-cfg-in.libcxx/algorithms/specialized_algorithms/special_mem_concepts::nothrow_sentinel_for.compile.pass.cpp

Event Timeline

jhuber6 created this revision. Jan 23 2023, 11:26 AM

Herald added a project: Restricted Project. Jan 23 2023, 11:26 AM

Herald added subscribers: kosarev, kerbowa, guansong and 4 others.

jhuber6 requested review of this revision. Jan 23 2023, 11:26 AM

Herald added a project: Restricted Project. Jan 23 2023, 11:26 AM

Herald added subscribers: cfe-commits, sstefan1, MaskRay, wdng.

arsenm added inline comments. Jan 23 2023, 11:32 AM

clang/lib/CodeGen/TargetInfo.cpp
Line 9552: Probably shouldn’t check the language, just whether it’s a kernel. Also shouldn’t emit this if it’s the default 1024. I’ve been trying to cut down on the superfluous attribute spam.
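Below is a rough sketch of what this comment asks for. It assumes a plain boolean `IsKernel` stands in for the calling-convention check, and that 1024 is the value the backend would assume anyway, as the comment states. `flatWorkGroupSizeAttr` is hypothetical and is not code from the patch. The default is a parameter because yaxunl notes later in the thread that OpenCL uses 256.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Sketch of the reviewer's suggestion (hypothetical helper, not Clang code):
// decide purely on whether the function is a kernel, and emit nothing when
// the requested maximum equals the default, avoiding redundant attributes.
std::optional<std::string>
flatWorkGroupSizeAttr(bool IsKernel, unsigned MaxThreadsPerBlock,
                      unsigned DefaultMaxThreads = 1024) {
  assert(MaxThreadsPerBlock >= 1 && "max threads must be positive");
  if (!IsKernel || MaxThreadsPerBlock == DefaultMaxThreads)
    return std::nullopt;
  return "1," + std::to_string(MaxThreadsPerBlock);
}
```

Under this scheme the DEFAULT check lines in the test would not gain the attribute, and only an explicit non-default bound would appear in the IR.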

jhuber6 added inline comments. Jan 23 2023, 11:36 AM

clang/lib/CodeGen/TargetInfo.cpp
Line 9552: There's a section for HIP above that does the same. We could probably consolidate here for all "AMDGPU" kernels and get rid of the redundant attribute. Maybe in a separate patch?

arsenm added inline comments. Jan 23 2023, 11:38 AM

clang/lib/CodeGen/TargetInfo.cpp
Line 9552: All the isCUDA || HIP || OpenMP checks scattered around are driving me crazy. A bunch of the out-of-tree divergent patches are just adding to them. We should just purge everything that checks languages in favor of the actual features, and stop putting language names in things.

jhuber6 added inline comments. Jan 23 2023, 11:40 AM

clang/lib/CodeGen/TargetInfo.cpp
Line 9552: OpenCL is the odd one out as far as I know; HIP and OpenMP are mostly equivalent as far as attributes go.

Harbormaster completed remote builds in B209444: Diff 491465. Jan 23 2023, 1:13 PM

yaxunl added inline comments. Jan 23 2023, 1:47 PM

clang/lib/CodeGen/TargetInfo.cpp
Line 9552: OpenCL uses 256 as the default max block size. This is to avoid performance regressions for existing apps. HIP uses 1024 by default.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
Line 61: We should keep the default value in Options.td instead of having multiple copies in different places. Same as below.
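As written, both driver hunks (AMDGPUOpenMP.cpp and HIPAMD.cpp) check `hasArg` before calling `getLastArgValue(..., "1024")`. Because of that check, the "1024" fallback is never used and only duplicates the `MarshallingInfoInt` default in Options.td. Below is a minimal model of the split the comment suggests. It uses a plain `std::vector<std::string>` in place of the driver's argument list, and every name in it is hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the review suggestion; not Clang's driver API.
// The default lives in exactly one place, mirroring the
// MarshallingInfoInt<..., "1024"> entry in Options.td.
constexpr unsigned kDefaultMaxThreadsPerBlock = 1024;

// Driver side: forward the flag only if the user wrote it, passing the
// user's value through unchanged and never restating the default.
void forwardMaxThreads(const std::optional<std::string> &UserValue,
                       std::vector<std::string> &CC1Args) {
  if (UserValue)
    CC1Args.push_back("--gpu-max-threads-per-block=" + *UserValue);
}

// Frontend side: fall back to the single default when nothing was forwarded.
unsigned effectiveMaxThreads(const std::optional<unsigned> &Parsed) {
  assert((!Parsed || *Parsed >= 1) && "max threads must be positive");
  return Parsed.value_or(kDefaultMaxThreadsPerBlock);
}
```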

jdoerfert added inline comments. Jan 23 2023, 4:02 PM

clang/lib/Driver/ToolChains/AMDGPU.cpp
Line 791: ^ [anchored on the comment line where this diff changes "device" to the typo "ndevice"]

arsenm added inline comments. Feb 2 2023, 6:31 PM

clang/include/clang/Basic/LangOptions.def
Line 271: probably should drop the language and describe what it is

arsenm requested changes to this revision. Apr 26 2023, 4:56 AM

This revision now requires changes to proceed. Apr 26 2023, 4:56 AM

Herald added subscribers: jplehr, sunshaoce. Apr 26 2023, 4:56 AM

yaxunl added inline comments. Apr 26 2023, 7:36 AM

clang/include/clang/Basic/LangOptions.def
Line 271: CUDA does not use it. Dropping the language may cause confusion.

arsenm added inline comments. Apr 26 2023, 10:43 AM

clang/include/clang/Basic/LangOptions.def
Line 271: If CUDA doesn't respect the flag, that's just broken. The concept is common among all the languages, and all the unnecessary language qualifications are a plague. Controls should be expressed as a generic concept that languages selectively enable.

Revision Contents

	Path	Size
	clang/include/clang/Basic/LangOptions.def	2 lines
	clang/include/clang/Driver/Options.td	5 lines
	clang/lib/CodeGen/TargetInfo.cpp	4 lines
	clang/lib/Driver/ToolChains/AMDGPU.cpp	2 lines
	clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp	6 lines
	clang/lib/Driver/ToolChains/HIPAMD.cpp	12 lines
	clang/lib/Frontend/CompilerInvocation.cpp	3 lines
	clang/test/Driver/openmp-offload-gpu.c	10 lines
	clang/test/OpenMP/amdgcn-attributes.cpp	11 lines

Diff 491465

clang/include/clang/Basic/LangOptions.def

@@ first 262 lines not shown @@
 ENUM_LANGOPT(HLSLVersion, HLSLLangStd, 16, HLSL_Unset, "HLSL Version")

 LANGOPT(CUDAIsDevice , 1, 0, "compiling for CUDA device")
 LANGOPT(CUDAAllowVariadicFunctions, 1, 0, "allowing variadic functions in CUDA device code")
 LANGOPT(CUDAHostDeviceConstexpr, 1, 1, "treating unattributed constexpr functions as __host__ __device__")
 LANGOPT(CUDADeviceApproxTranscendentals, 1, 0, "using approximate transcendental functions")
 LANGOPT(GPURelocatableDeviceCode, 1, 0, "generate relocatable device code")
 LANGOPT(GPUAllowDeviceInit, 1, 0, "allowing device side global init functions for HIP")
-LANGOPT(GPUMaxThreadsPerBlock, 32, 1024, "default max threads per block for kernel launch bounds for HIP")
+LANGOPT(GPUMaxThreadsPerBlock, 32, 1024, "default max threads per block for kernel launch bounds for OpenMP/HIP")
 LANGOPT(GPUDeferDiag, 1, 0, "defer host/device related diagnostic messages for CUDA/HIP")
 LANGOPT(GPUExcludeWrongSideOverloads, 1, 0, "always exclude wrong side overloads in overloading resolution for CUDA/HIP")
 LANGOPT(OffloadingNewDriver, 1, 0, "use the new driver for generating offloading code.")

 LANGOPT(SYCLIsDevice , 1, 0, "Generate code for SYCL device")
 LANGOPT(SYCLIsHost , 1, 0, "SYCL host compilation")
 ENUM_LANGOPT(SYCLVersion , SYCLMajorVersion, 2, SYCL_None, "Version of the SYCL standard used")

clang/include/clang/Driver/Options.td

@@ first 1,041 lines not shown @@ defm gpu_defer_diag : BoolFOption<"gpu-defer-diag",
   BothFlags<[], " host/device related diagnostic messages for CUDA/HIP">>;
 defm gpu_exclude_wrong_side_overloads : BoolFOption<"gpu-exclude-wrong-side-overloads",
   LangOpts<"GPUExcludeWrongSideOverloads">, DefaultFalse,
   PosFlag<SetTrue, [CC1Option], "Always exclude wrong side overloads">,
   NegFlag<SetFalse, [], "Exclude wrong side overloads only if there are same side overloads">,
   BothFlags<[HelpHidden], " in overloading resolution for CUDA/HIP">>;
 def gpu_max_threads_per_block_EQ : Joined<["--"], "gpu-max-threads-per-block=">,
   Flags<[CC1Option]>,
-  HelpText<"Default max threads per block for kernel launch bounds for HIP">,
-  MarshallingInfoInt<LangOpts<"GPUMaxThreadsPerBlock">, "1024">,
-  ShouldParseIf<hip.KeyPath>;
+  HelpText<"Default max threads per block for kernel launch bounds for OpenMP/HIP">,
+  MarshallingInfoInt<LangOpts<"GPUMaxThreadsPerBlock">, "1024">;
 def fgpu_inline_threshold_EQ : Joined<["-"], "fgpu-inline-threshold=">,
   Flags<[HelpHidden]>,
   HelpText<"Inline threshold for device compilation for CUDA/HIP">;
 def gpu_instrument_lib_EQ : Joined<["--"], "gpu-instrument-lib=">,
   HelpText<"Instrument device library for HIP, which is a LLVM bitcode containing "
   "__cyg_profile_func_enter and __cyg_profile_func_exit">;
 def fgpu_sanitize : Flag<["-"], "fgpu-sanitize">, Group<f_Group>,
   HelpText<"Enable sanitizer for AMDGPU target">;

clang/lib/CodeGen/TargetInfo.cpp

@@ first 9,543 lines not shown @@ const bool IsHIPKernel =
       M.getLangOpts().HIP && FD && FD->hasAttr<CUDAGlobalAttr>();
   const bool IsOpenMPkernel =
       M.getLangOpts().OpenMPIsDevice &&
       (F->getCallingConv() == llvm::CallingConv::AMDGPU_KERNEL);

   // TODO: This should be moved to language specific attributes instead.
   if (IsHIPKernel || IsOpenMPkernel)
     F->addFnAttr("uniform-work-group-size", "true");
+  if (IsOpenMPkernel)
+    F->addFnAttr("amdgpu-flat-work-group-size",
+                 std::string("1,") +
+                     llvm::utostr(M.getLangOpts().GPUMaxThreadsPerBlock));

   if (M.getContext().getTargetInfo().allowAMDGPUUnsafeFPAtomics())
     F->addFnAttr("amdgpu-unsafe-fp-atomics", "true");

   if (!getABIInfo().getCodeGenOpts().EmitIEEENaNCompliantInsts)
     F->addFnAttr("amdgpu-ieee", "false");
 }

clang/lib/Driver/ToolChains/AMDGPU.cpp

@@ first 782 lines not shown @@ void ROCMToolChain::addClangTargetOptions(
   // disable bitcode linking.
   if (DeviceOffloadingKind == Action::OFK_None &&
       DriverArgs.hasArg(options::OPT_nostdlib))
     return;

   if (DriverArgs.hasArg(options::OPT_nogpulib))
     return;

-  // Get the device name and canonicalize it
+  // Get the ndevice name and canonicalize it
   const StringRef GpuArch = getGPUArch(DriverArgs);
   auto Kind = llvm::AMDGPU::parseArchAMDGCN(GpuArch);
   const StringRef CanonArch = llvm::AMDGPU::getArchNameAMDGCN(Kind);
   StringRef LibDeviceFile = RocmInstallation.getLibDeviceFile(CanonArch);
   auto ABIVer = DeviceLibABIVersion::fromCodeObjectVersion(
       getAMDGPUCodeObjectVersion(getDriver(), DriverArgs));
   if (!RocmInstallation.checkCommonBitcodeLibs(CanonArch, LibDeviceFile,
                                                ABIVer))

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

@@ first 48 lines not shown @@ void AMDGPUOpenMPToolChain::addClangTargetOptions(

   assert(DeviceOffloadingKind == Action::OFK_OpenMP &&
          "Only OpenMP offloading kinds are supported.");

   CC1Args.push_back("-target-cpu");
   CC1Args.push_back(DriverArgs.MakeArgStringRef(GPUArch));
   CC1Args.push_back("-fcuda-is-device");

+  if (DriverArgs.hasArg(options::OPT_gpu_max_threads_per_block_EQ))
+    CC1Args.push_back(DriverArgs.MakeArgString(
+        "--gpu-max-threads-per-block=" +
+        DriverArgs.getLastArgValue(options::OPT_gpu_max_threads_per_block_EQ,
+                                   "1024")));
+
   if (DriverArgs.hasArg(options::OPT_nogpulib))
     return;

   for (auto BCFile : getDeviceLibs(DriverArgs)) {
     CC1Args.push_back(BCFile.ShouldInternalize ? "-mlink-builtin-bitcode"
                                                : "-mlink-bitcode-file");
     CC1Args.push_back(DriverArgs.MakeArgString(BCFile.Path));
   }

clang/lib/Driver/ToolChains/HIPAMD.cpp

@@ first 227 lines not shown @@ void HIPAMDToolChain::addClangTargetOptions(
   if (DriverArgs.hasFlag(options::OPT_fcuda_approx_transcendentals,
                          options::OPT_fno_cuda_approx_transcendentals, false))
     CC1Args.push_back("-fcuda-approx-transcendentals");

   if (!DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
                           false))
     CC1Args.append({"-mllvm", "-amdgpu-internalize-symbols"});

-  StringRef MaxThreadsPerBlock =
-      DriverArgs.getLastArgValue(options::OPT_gpu_max_threads_per_block_EQ);
-  if (!MaxThreadsPerBlock.empty()) {
-    std::string ArgStr =
-        (Twine("--gpu-max-threads-per-block=") + MaxThreadsPerBlock).str();
-    CC1Args.push_back(DriverArgs.MakeArgStringRef(ArgStr));
-  }
+  if (DriverArgs.hasArg(options::OPT_gpu_max_threads_per_block_EQ))
+    CC1Args.push_back(DriverArgs.MakeArgString(
+        "--gpu-max-threads-per-block=" +
+        DriverArgs.getLastArgValue(options::OPT_gpu_max_threads_per_block_EQ,
+                                   "1024")));

   CC1Args.push_back("-fcuda-allow-variadic-functions");

   // Default to "hidden" visibility, as object level linking will not be
   // supported for the foreseeable future.
   if (!DriverArgs.hasArg(options::OPT_fvisibility_EQ,
                          options::OPT_fvisibility_ms_compat)) {
     CC1Args.append({"-fvisibility=hidden"});

clang/lib/Frontend/CompilerInvocation.cpp

@@ first 512 lines not shown @@ static bool FixupInvocation(CompilerInvocation &Invocation,
   if (Args.hasArg(OPT_hlsl_entrypoint) && !LangOpts.HLSL)
     Diags.Report(diag::err_drv_argument_not_allowed_with)
         << "-hlsl-entry" << GetInputKindName(IK);

   if (Args.hasArg(OPT_fgpu_allow_device_init) && !LangOpts.HIP)
     Diags.Report(diag::warn_ignored_hip_only_option)
         << Args.getLastArg(OPT_fgpu_allow_device_init)->getAsString(Args);

-  if (Args.hasArg(OPT_gpu_max_threads_per_block_EQ) && !LangOpts.HIP)
+  if (Args.hasArg(OPT_gpu_max_threads_per_block_EQ) && !LangOpts.HIP &&
+      !LangOpts.OpenMPIsDevice)
     Diags.Report(diag::warn_ignored_hip_only_option)
         << Args.getLastArg(OPT_gpu_max_threads_per_block_EQ)->getAsString(Args);

   // When these options are used, the compiler is allowed to apply
   // optimizations that may affect the final result. For example
   // (x+y)+z is transformed to x+(y+z) but may not give the same
   // final result; it's not value safe.
   // Another example can be to simplify x/x to 1.0 but x could be 0.0, INF
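The only behavioral change in FixupInvocation is the extra `!LangOpts.OpenMPIsDevice` term. Below, the old and new conditions are restated as standalone predicates so the difference is easy to see. Both names are hypothetical and are not Clang code. One thing worth checking is that the reused diagnostic is still `warn_ignored_hip_only_option`, so its wording presumably still describes the option as HIP-only.

```cpp
// Hypothetical restatement of the patched condition in FixupInvocation (not
// Clang code): the "ignored option" warning fires only when the flag was
// passed and the compilation is neither HIP nor an OpenMP device compilation.
constexpr bool warnMaxThreadsIgnored(bool HasFlag, bool IsHIP,
                                     bool IsOpenMPDevice) {
  return HasFlag && !IsHIP && !IsOpenMPDevice;
}

// The condition before the patch: every non-HIP compilation warned,
// including OpenMP device compilations.
constexpr bool warnMaxThreadsIgnoredBefore(bool HasFlag, bool IsHIP) {
  return HasFlag && !IsHIP;
}
```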

clang/test/Driver/openmp-offload-gpu.c

@@ first 367 lines not shown @@
 //
 // Check that `-Xarch_device` works for OpenMP offloading.
 //
 // RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp \
 // RUN: -fopenmp-targets=nvptx64-nvidia-cuda -Xarch_device -O3 %s 2>&1 \
 // RUN: | FileCheck --check-prefix=XARCH-DEVICE %s
 // XARCH-DEVICE: "-cc1" "-triple" "nvptx64-nvidia-cuda"{{.*}}"-O3"
 // XARCH-DEVICE-NOT: "-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-O3"
+
+//
+// Check that `--gpu-max-threads-per-block` works for AMDGPU OpenMP offloading.
+//
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp \
+// RUN: -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx1030 \
+// RUN: -nogpulib --gpu-max-threads-per-block=512 %s 2>&1 \
+// RUN: | FileCheck --check-prefix=AMD-MAX-THREADS %s
+// AMD-MAX-THREADS: "-cc1" {{.*}} "--gpu-max-threads-per-block=512"
+// AMD-MAX-THREADS-SAME: "-fopenmp-is-device"

clang/test/OpenMP/amdgcn-attributes.cpp

 // REQUIRES: amdgpu-registered-target

 // RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
 // RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=DEFAULT,ALL %s
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s --gpu-max-threads-per-block=512 -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=MAX-THREADS %s
 // RUN: %clang_cc1 -target-cpu gfx900 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=CPU,ALL %s

 // RUN: %clang_cc1 -menable-no-nans -mno-amdgpu-ieee -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=NOIEEE,ALL %s
 // RUN: %clang_cc1 -munsafe-fp-atomics -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck -check-prefixes=UNSAFEATOMIC,ALL %s

 // expected-no-diagnostics

 #define N 100
@@ 14 lines not shown @@ #pragma omp target
   return arr[0];
 }

 int callable(int x) {
   // ALL-LABEL: @_Z8callablei(i32 noundef %x) #1
   return x + 1;
 }

-// DEFAULT: attributes #0 = { convergent noinline norecurse nounwind optnone "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
-// CPU: attributes #0 = { convergent noinline norecurse nounwind optnone "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" "uniform-work-group-size"="true" }
-// NOIEEE: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-ieee"="false" "kernel" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
-// UNSAFEATOMIC: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-unsafe-fp-atomics"="true" "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+// DEFAULT: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+// MAX-THREADS: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,512" "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+// CPU: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" "uniform-work-group-size"="true" }
+// NOIEEE: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "amdgpu-ieee"="false" "kernel" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+// UNSAFEATOMIC: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "amdgpu-unsafe-fp-atomics"="true" "kernel" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }

 // DEFAULT: attributes #1 = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// MAX-THREADS: attributes #1 = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // CPU: attributes #1 = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" }
 // NOIEEE: attributes #1 = { convergent mustprogress noinline nounwind optnone "amdgpu-ieee"="false" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // UNSAFEATOMIC: attributes #1 = { convergent mustprogress noinline nounwind optnone "amdgpu-unsafe-fp-atomics"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
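The MAX-THREADS run line checks for "amdgpu-flat-work-group-size"="1,512". The parser below is one way to validate such "min,max" values outside FileCheck. It is a hypothetical helper, not the AMDGPU backend's own validation. It enforces only 1 <= min <= max and assumes a 32-bit `unsigned`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Parses a non-empty run of decimal digits. Signs, spaces and anything else
// are rejected; at most 9 digits are accepted so the value always fits in a
// 32-bit unsigned (an assumption about the platform).
static std::optional<unsigned> parseDecimal(const std::string &S) {
  if (S.empty() || S.size() > 9)
    return std::nullopt;
  unsigned V = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return std::nullopt;
    V = V * 10 + static_cast<unsigned>(C - '0');
  }
  return V;
}

// Hypothetical helper, not LLVM's parser: splits a "min,max" attribute value
// such as "1,1024" and enforces 1 <= min <= max.
std::optional<std::pair<unsigned, unsigned>>
parseFlatWorkGroupSize(const std::string &Value) {
  const std::size_t Comma = Value.find(',');
  if (Comma == std::string::npos)
    return std::nullopt;
  const std::optional<unsigned> Min = parseDecimal(Value.substr(0, Comma));
  const std::optional<unsigned> Max = parseDecimal(Value.substr(Comma + 1));
  if (!Min || !Max || *Min == 0 || *Min > *Max)
    return std::nullopt;
  return std::make_pair(*Min, *Max);
}
```

Inputs like "1,2,3" are rejected because everything after the first comma must be digits only.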
