This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Added --[no-]cuda-include-ptx=sm_XX|all option.
ClosedPublic

Authored by tra on Apr 10 2018, 10:55 AM.

Details

Reviewers

Commits

rGdde3dc27ee71: [CUDA] Added --[no-]cuda-include-ptx=sm_XX|all option.
rL329737: [CUDA] Added --[no-]cuda-include-ptx=sm_XX|all option.
rC329737: [CUDA] Added --[no-]cuda-include-ptx=sm_XX|all option.

Summary

Currently we always include PTX in the fatbin along
with the GPU code. This roughly doubles the size of the GPU binary
that we need to carry in the executable. These options allow control
over the inclusion of PTX in the GPU binary.

This patch does not change the defaults, though we may consider
making no-PTX the default in the future.
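
As a rough illustration of how repeated options might combine, here is a minimal standalone sketch. It does not use the clang driver API: `PtxArg` and `shouldIncludePtx` are hypothetical names chosen for this example, and the sketch assumes that arguments are scanned left to right with later ones overriding earlier ones, which is what test cases d) and e) in this revision suggest.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one parsed driver argument (not a clang type):
//   first  = true for --cuda-include-ptx=, false for --no-cuda-include-ptx=
//   second = the option value, either "all" or an architecture like "sm_35"
using PtxArg = std::pair<bool, std::string>;

// Returns whether PTX should be embedded for gpuArch.
// Arguments are scanned in order; the last one naming "all" or gpuArch
// decides. With no matching argument, the default (include PTX) applies.
bool shouldIncludePtx(const std::vector<PtxArg> &args,
                      const std::string &gpuArch) {
  assert(!gpuArch.empty() && "expected an architecture such as sm_35");
  bool includePtx = true; // default: PTX is included
  for (const PtxArg &arg : args) {
    if (arg.second == "all" || arg.second == gpuArch)
      includePtx = arg.first;
  }
  return includePtx;
}
```

Under these assumptions, `--no-cuda-include-ptx=all --cuda-include-ptx=sm_35` would keep PTX for sm_35 only, because the later, more specific option overrides the earlier blanket one for that architecture.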

Diff Detail

Build Status

Buildable 16941
Build 16941: arc lint + arc unit

Event Timeline

tra created this revision. Apr 10 2018, 10:55 AM

Herald added a subscriber: sanjoy. Apr 10 2018, 10:55 AM

Harbormaster completed remote builds in B16941: Diff 141877. Apr 10 2018, 10:55 AM

Where do we document the default values for these flags? Like, how is a user supposed to figure out whether the default is to include ptx or not?

This revision is now accepted and ready to land. Apr 10 2018, 11:06 AM

Documented new options in ClangCommandLineReference.rst

In D45495#1063370, @jlebar wrote:

Where do we document the default values for these flags? Like, how is a user supposed to figure out whether the default is to include ptx or not?

I've updated docs/ClangCommandLineReference.rst.

lgtm!

Closed by commit rC329737: [CUDA] Added --[no-]cuda-include-ptx=sm_XX|all option. (authored by tra). Apr 10 2018, 11:41 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                     Size
clang/include/clang/Driver/Options.td    4 lines
clang/lib/Driver/ToolChains/Cuda.cpp     19 lines
clang/test/Driver/cuda-options.cu        51 lines

Diff 141877

clang/include/clang/Driver/Options.td

 ...
 def cuda_device_only : Flag<["--"], "cuda-device-only">,
   HelpText<"Compile CUDA code for device only">;
 def cuda_host_only : Flag<["--"], "cuda-host-only">,
   HelpText<"Compile CUDA code for host only. Has no effect on non-CUDA "
            "compilations.">;
 def cuda_compile_host_device : Flag<["--"], "cuda-compile-host-device">,
   HelpText<"Compile CUDA code for both host and device (default). Has no "
            "effect on non-CUDA compilations.">;
+def cuda_include_ptx_EQ : Joined<["--"], "cuda-include-ptx=">, Flags<[DriverOption]>,
+  HelpText<"Include PTX for the following GPU architecture (e.g. sm_35) or 'all'. May be specified more than once.">;
+def no_cuda_include_ptx_EQ : Joined<["--"], "no-cuda-include-ptx=">, Flags<[DriverOption]>,
+  HelpText<"Do not include PTX for the following GPU architecture (e.g. sm_35) or 'all'. May be specified more than once.">;
 def cuda_gpu_arch_EQ : Joined<["--"], "cuda-gpu-arch=">, Flags<[DriverOption]>,
   HelpText<"CUDA GPU architecture (e.g. sm_35). May be specified more than once.">;
 def no_cuda_gpu_arch_EQ : Joined<["--"], "no-cuda-gpu-arch=">, Flags<[DriverOption]>,
   HelpText<"Remove GPU architecture (e.g. sm_35) from the list of GPUs to compile for. "
            "'all' resets the list to its default value.">;
 def cuda_noopt_device_debug : Flag<["--"], "cuda-noopt-device-debug">,
   HelpText<"Enable device-side debug info generation. Disables ptxas optimizations.">;
 def no_cuda_version_check : Flag<["--"], "no-cuda-version-check">,
 ...

clang/lib/Driver/ToolChains/Cuda.cpp

 ...
 void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
 ...
   const char *Exec;
   if (Arg *A = Args.getLastArg(options::OPT_ptxas_path_EQ))
     Exec = A->getValue();
   else
     Exec = Args.MakeArgString(TC.GetProgramPath("ptxas"));
   C.addCommand(llvm::make_unique<Command>(JA, *this, Exec, CmdArgs, Inputs));
 }

+static bool shouldIncludePTX(const ArgList &Args, const char *gpu_arch) {
+  bool includePTX = true;
+  for (Arg *A : Args) {
+    if (!(A->getOption().matches(options::OPT_cuda_include_ptx_EQ) ||
+          A->getOption().matches(options::OPT_no_cuda_include_ptx_EQ)))
+      continue;
+    A->claim();
+    const StringRef ArchStr = A->getValue();
+    if (ArchStr == "all" || ArchStr == gpu_arch) {
+      includePTX = A->getOption().matches(options::OPT_cuda_include_ptx_EQ);
+      continue;
+    }
+  }
+  return includePTX;
+}
+
 // All inputs to this linker must be from CudaDeviceActions, as we need to look
 // at the Inputs' Actions in order to figure out which GPU architecture they
 // correspond to.
 void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
                                  const InputInfo &Output,
                                  const InputInfoList &Inputs,
                                  const ArgList &Args,
                                  const char *LinkingOutput) const {
 ...
   for (const auto& II : Inputs) {
     auto *A = II.getAction();
     assert(A->getInputs().size() == 1 &&
            "Device offload action is expected to have a single input");
     const char *gpu_arch_str = A->getOffloadingArch();
     assert(gpu_arch_str &&
            "Device action expected to have associated a GPU architecture!");
     CudaArch gpu_arch = StringToCudaArch(gpu_arch_str);

+    if (II.getType() == types::TY_PP_Asm &&
+        !shouldIncludePTX(Args, gpu_arch_str))
+      continue;
     // We need to pass an Arch of the form "sm_XX" for cubin files and
     // "compute_XX" for ptx.
     const char *Arch =
         (II.getType() == types::TY_PP_Asm)
             ? CudaVirtualArchToString(VirtualArchForCudaArch(gpu_arch))
             : gpu_arch_str;
     CmdArgs.push_back(Args.MakeArgString(llvm::Twine("--image=profile=") +
                                          Arch + ",file=" + II.getFilename()));
 ...

clang/test/Driver/cuda-options.cu

 ...
 // RUN: | FileCheck -check-prefixes NOARCH-SM20,NOARCH-SM30,ARCH-SM35 %s

 // g) There's no --cuda-gpu-arch=all
 // RUN: %clang -### -target x86_64-linux-gnu --cuda-device-only \
 // RUN: --cuda-gpu-arch=all \
 // RUN: -c %s 2>&1 \
 // RUN: | FileCheck -check-prefix ARCHALLERROR %s

+
+// Verify that --[no-]cuda-include-ptx arguments are handled correctly.
+// a) by default we're including PTX for all GPUs.
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN: --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_30 \
+// RUN: -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes FATBIN-COMMON,PTX-SM35,PTX-SM30 %s
+
+// b) --no-cuda-include-ptx=all disables PTX inclusion for all GPUs
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN: --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_30 \
+// RUN: --no-cuda-include-ptx=all \
+// RUN: -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes FATBIN-COMMON,NOPTX-SM35,NOPTX-SM30 %s
+
+// c) --no-cuda-include-ptx=sm_XX disables PTX inclusion for that GPU only.
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN: --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_30 \
+// RUN: --no-cuda-include-ptx=sm_35 \
+// RUN: -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes FATBIN-COMMON,NOPTX-SM35,PTX-SM30 %s
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN: --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_30 \
+// RUN: --no-cuda-include-ptx=sm_30 \
+// RUN: -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes FATBIN-COMMON,PTX-SM35,NOPTX-SM30 %s
+
+// d) --cuda-include-ptx=all overrides preceding --no-cuda-include-ptx=all
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN: --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_30 \
+// RUN: --no-cuda-include-ptx=all --cuda-include-ptx=all \
+// RUN: -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes FATBIN-COMMON,PTX-SM35,PTX-SM30 %s
+
+// e) --cuda-include-ptx=all overrides preceding --no-cuda-include-ptx=sm_XX
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN: --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_30 \
+// RUN: --no-cuda-include-ptx=sm_30 --cuda-include-ptx=all \
+// RUN: -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes FATBIN-COMMON,PTX-SM35,PTX-SM30 %s
+
+
 // ARCH-SM20: "-cc1"{{.*}}"-target-cpu" "sm_20"
 // NOARCH-SM20-NOT: "-cc1"{{.*}}"-target-cpu" "sm_20"
 // ARCH-SM30: "-cc1"{{.*}}"-target-cpu" "sm_30"
 // NOARCH-SM30-NOT: "-cc1"{{.*}}"-target-cpu" "sm_30"
 // ARCH-SM35: "-cc1"{{.*}}"-target-cpu" "sm_35"
 // NOARCH-SM35-NOT: "-cc1"{{.*}}"-target-cpu" "sm_35"
 // ARCHALLERROR: error: Unsupported CUDA gpu architecture: all

 ...
 // NOHOST-NOT: "-x" "cuda"

 // Match linker.
 // LINK: "{{.*}}{{ld|link}}{{(.exe)?}}"
 // LINK-SAME: "[[HOSTOUTPUT]]"

 // Match no linker.
 // NOLINK-NOT: "{{.*}}{{ld|link}}{{(.exe)?}}"
+
+// FATBIN-COMMON:fatbinary
+// FATBIN-COMMON: "--create" "[[FATBINARY:[^"]*]]"
+// FATBIN-COMMON: "--image=profile=sm_30,file=
+// PTX-SM30: "--image=profile=compute_30,file=
+// NOPTX-SM30-NOT: "--image=profile=compute_30,file=
+// FATBIN-COMMON: "--image=profile=sm_35,file=
+// PTX-SM35: "--image=profile=compute_35,file=
+// NOPTX-SM35-NOT: "--image=profile=compute_35,file=
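
The FATBIN-COMMON, PTX and NOPTX checks above look for `--image=profile=...,file=...` arguments on the fatbinary command line: cubin files use an `sm_XX` profile and PTX files use `compute_XX`. The following is a minimal standalone sketch of how such an argument list could be assembled when PTX is excluded for some architectures. `DeviceInput`, `virtualArch` and `imageArgs` are hypothetical names, and the `sm_` to `compute_` prefix swap is a simplifying assumption that happens to match sm_30 and sm_35; the driver itself uses a lookup (`VirtualArchForCudaArch`) rather than string manipulation.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one device-side input handed to fatbinary.
struct DeviceInput {
  std::string gpuArch; // real architecture, e.g. "sm_35"
  std::string file;    // path to the cubin or PTX file
  bool isPtx;          // true for PTX, false for a cubin
};

// ASSUMPTION: the virtual architecture is obtained by swapping the "sm_"
// prefix for "compute_". This is a shortcut for illustration only.
std::string virtualArch(const std::string &gpuArch) {
  assert(gpuArch.compare(0, 3, "sm_") == 0 && "expected sm_XX");
  return "compute_" + gpuArch.substr(3);
}

// Builds the --image arguments, skipping PTX inputs whose architecture
// appears in noPtxArchs. Cubin inputs are always kept.
std::vector<std::string> imageArgs(const std::vector<DeviceInput> &inputs,
                                   const std::set<std::string> &noPtxArchs) {
  std::vector<std::string> args;
  for (const DeviceInput &in : inputs) {
    if (in.isPtx && noPtxArchs.count(in.gpuArch))
      continue; // PTX excluded for this architecture
    const std::string profile =
        in.isPtx ? virtualArch(in.gpuArch) : in.gpuArch;
    args.push_back("--image=profile=" + profile + ",file=" + in.file);
  }
  return args;
}
```

With sm_35 in the exclusion set, the sketch emits the sm_30 cubin, the compute_30 PTX and the sm_35 cubin, but no compute_35 entry, which is the shape that the NOPTX-SM35 and PTX-SM30 prefixes check for.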
