This is an archive of the discontinued LLVM Phabricator instance.


Differential D141051

[CUDA][HIP] Add support for `--offload-arch=native` to CUDA and refactor
Closed, Public

Authored by jhuber6 on Jan 5 2023, 6:35 AM.


Details

Reviewers

jdoerfert
tra
yaxunl
MaskRay

Commits

rG56ebfca4bcc0: [CUDA][HIP] Add support for `--offload-arch=native` to CUDA and refactor

Summary

This patch adds basic support for --offload-arch=native to CUDA. This
is done using the nvptx-arch tool that was introduced previously. Some
of the logic for handling executing these tools was factored into a
common helper as well. This patch does not add support for OpenMP or the
"new" driver. That will be done later.
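As a rough illustration of the detection step, the sketch below turns a detection tool's stdout (one architecture per line, as the `nvptx_arch_sm_70` test input prints) into a list of architecture names, skipping blank lines. It uses only the C++ standard library. `parseArchList` is a hypothetical name that does not appear in the patch; the patch itself uses `llvm::split` on the tool's output buffer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the patch): split a tool's stdout on
// newlines and keep every non-empty line as one GPU architecture name.
std::vector<std::string> parseArchList(const std::string &ToolStdout) {
  std::vector<std::string> Archs;
  std::istringstream Stream(ToolStdout);
  std::string Line;
  while (std::getline(Stream, Line))
    if (!Line.empty())
      Archs.push_back(Line);
  return Archs;
}
```

An empty result here corresponds to the case the patch reports as "No NVIDIA GPU detected in the system".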

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	1,790 ms	libcxx CI C++03 > llvm-libc++-shared-cfg-in.libcxx::clang_query.sh.cpp
	4,940 ms	libcxx CI C++2b > llvm-libc++-shared-cfg-in.libcxx::clang_query.sh.cpp
	3,570 ms	libcxx CI Modules > llvm-libc++-shared-cfg-in.libcxx::clang_query.sh.cpp

Event Timeline

jhuber6 created this revision. Jan 5 2023, 6:35 AM

Herald added a project: Restricted Project. Jan 5 2023, 6:35 AM

Herald added subscribers: kosarev, mattd, carlosgalvezp and 4 others.

jhuber6 requested review of this revision. Jan 5 2023, 6:35 AM

Herald added a project: Restricted Project. Jan 5 2023, 6:35 AM

Herald added subscribers: cfe-commits, sstefan1.

jhuber6 retitled this revision from "[CUDA]HIP] Add support for `--offload-arch=native` to CUDA and refactor" to "[CUDA][HIP] Add support for `--offload-arch=native` to CUDA and refactor". Jan 5 2023, 7:17 AM

Harbormaster completed remote builds in B205892: Diff 486551. Jan 5 2023, 8:03 AM

jhuber6 added a child revision: D141078: [CUDA][HIP] Support '--offload-arch=native' for the new driver. Jan 5 2023, 10:45 AM

Change error to print canonical arch name from Triple.

Harbormaster completed remote builds in B206029: Diff 486739. Jan 5 2023, 8:18 PM

yaxunl added inline comments. Jan 9 2023, 10:40 AM

clang/test/Driver/amdgpu-hip-system-arch.c
25	comment incorrect?

jhuber6 added inline comments. Jan 9 2023, 10:40 AM

clang/test/Driver/amdgpu-hip-system-arch.c
25	Yes, thanks for catching that. I'll fix it.

Fix typo.

Harbormaster completed remote builds in B206589: Diff 487503. Jan 9 2023, 4:15 PM

LGTM. Thanks.

This revision is now accepted and ready to land. Jan 10 2023, 10:32 AM

This revision was landed with ongoing or failed builds. Jan 11 2023, 8:30 AM

Closed by commit rG56ebfca4bcc0: [CUDA][HIP] Add support for `--offload-arch=native` to CUDA and refactor (authored by jhuber6).

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG56ebfca4bcc0: [CUDA][HIP] Add support for `--offload-arch=native` to CUDA and refactor.

For reasons that aren't yet clear to me, this change is failing to compile when using gcc-7 and targeting 32-bit targets; the error is of the form

AMDGPU.cpp:773:10: error: could not convert ‘GPUArchs’ from ‘llvm::SmallVector<std::__cxx11::basic_string<char>, 1>’ to ‘llvm::Expected<llvm::SmallVector<std::__cxx11::basic_string<char> > >’
   return GPUArchs;

it's not (yet) clear to me whether this is specific to gcc-7 (which I realize is fairly old -- is it still supported?) or what -- investigating further.

EDIT: I am (so far) unable to repro the failure using the gcc-12 installed on my local Linux machine, so it may be specific to that compiler version -- I am going to try to install gcc-7 on my local machine to see if there's a reasonable workaround.

In D141051#4048456, @srj wrote:
For reasons that aren't yet clear to me, this change is failing to compile when using gcc-7 and targeting 32-bit targets; the error is of the form
AMDGPU.cpp:773:10: error: could not convert ‘GPUArchs’ from ‘llvm::SmallVector<std::__cxx11::basic_string<char>, 1>’ to ‘llvm::Expected<llvm::SmallVector<std::__cxx11::basic_string<char> > >’
   return GPUArchs;

Probably the older GCC doesn't handle the implicit copy elision to the expected type well and thinks that it's copied. I'll put an explicit move on it.
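The workaround mentioned above can be sketched with a toy wrapper. `Result<T>` below is a hypothetical stand-in for `llvm::Expected<T>`, reduced to a single converting constructor; it is not LLVM's implementation, and `detectArchs` is an illustrative name. Writing `return std::move(Archs);` passes the local to the converting constructor as an rvalue explicitly, so the conversion does not rely on the compiler treating the returned local as movable when the return type differs from the local's type.

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Expected<T>; it holds only a value.
template <typename T> class Result {
  T Value;

public:
  // Converting constructor: accepts anything convertible to T.
  template <typename U,
            typename = std::enable_if_t<std::is_convertible<U, T>::value>>
  Result(U &&Val) : Value(std::forward<U>(Val)) {}

  T &get() { return Value; }
};

Result<std::vector<std::string>> detectArchs() {
  std::vector<std::string> Archs;
  Archs.push_back("sm_70");
  // Explicit move: the local reaches the converting constructor as an rvalue.
  return std::move(Archs);
}
```

Newer compilers may flag the explicit move as redundant in simple cases like this one; the sketch only shows the shape of the workaround, not whether a given compiler needs it.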

srj mentioned this in D139752: cmake: Enable 64bit off_t on 32bit glibc systems. Jan 12 2023, 12:36 PM

In D141051#4048456, @srj wrote:
For reasons that aren't yet clear to me, this change is failing to compile when using gcc-7 and targeting 32-bit targets; the error is of the form
AMDGPU.cpp:773:10: error: could not convert ‘GPUArchs’ from ‘llvm::SmallVector<std::__cxx11::basic_string<char>, 1>’ to ‘llvm::Expected<llvm::SmallVector<std::__cxx11::basic_string<char> > >’
   return GPUArchs;
it's not (yet) clear to me whether this is specific to gcc-7 (which I realize is fairly old -- is it still supported?) or what -- investigating further.

Let me know if rG26d62674cf50 solved it. I can't reproduce it on my system.

Revision Contents

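Path	Size
clang/include/clang/Basic/DiagnosticDriverKinds.td	6 lines
clang/include/clang/Driver/Options.td	2 lines
clang/include/clang/Driver/ToolChain.h	8 lines
clang/lib/Driver/Driver.cpp	24 lines
clang/lib/Driver/ToolChain.cpp	33 lines
clang/lib/Driver/ToolChains/AMDGPU.h	9 lines
clang/lib/Driver/ToolChains/AMDGPU.cpp	75 lines
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp	17 lines
clang/lib/Driver/ToolChains/Cuda.h	5 lines
clang/lib/Driver/ToolChains/Cuda.cpp	25 lines
clang/test/Driver/Inputs/nvptx-arch/nvptx_arch_fail	2 lines
clang/test/Driver/Inputs/nvptx-arch/nvptx_arch_sm_70	3 lines
clang/test/Driver/amdgpu-hip-system-arch.c	27 lines
clang/test/Driver/amdgpu-openmp-system-arch-fail.c	6 lines
clang/test/Driver/nvptx-cuda-system-arch.c	27 lines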

Diff 487503

clang/include/clang/Basic/DiagnosticDriverKinds.td

	def err_drv_no_hipspv_device_lib : Error<			def err_drv_no_hipspv_device_lib : Error<
	"cannot find HIP device library%select{| for %1}0; provide its path via "			"cannot find HIP device library%select{| for %1}0; provide its path via "
	"'--hip-path' or '--hip-device-lib-path', or pass '-nogpulib' to build "			"'--hip-path' or '--hip-device-lib-path', or pass '-nogpulib' to build "
	"without HIP device library">;			"without HIP device library">;
	def err_drv_hipspv_no_hip_path : Error<			def err_drv_hipspv_no_hip_path : Error<
	"'--hip-path' must be specified when offloading to "			"'--hip-path' must be specified when offloading to "
	"SPIR-V%select{| unless %1 is given}0.">;			"SPIR-V%select{| unless %1 is given}0.">;

	def err_drv_undetermined_amdgpu_arch : Error<			def err_drv_undetermined_gpu_arch : Error<
	"cannot determine AMDGPU architecture: %0; consider passing it via "			"cannot determine %0 architecture: %1; consider passing it via "
	"'--march'">;			"'%2'">;
	def err_drv_cuda_version_unsupported : Error<			def err_drv_cuda_version_unsupported : Error<
	"GPU arch %0 is supported by CUDA versions between %1 and %2 (inclusive), "			"GPU arch %0 is supported by CUDA versions between %1 and %2 (inclusive), "
	"but installation at %3 is %4; use '--cuda-path' to specify a different CUDA "			"but installation at %3 is %4; use '--cuda-path' to specify a different CUDA "
	"install, pass a different GPU arch with '--cuda-gpu-arch', or pass "			"install, pass a different GPU arch with '--cuda-gpu-arch', or pass "
	"'--no-cuda-version-check'">;			"'--no-cuda-version-check'">;
	def warn_drv_new_cuda_version: Warning<			def warn_drv_new_cuda_version: Warning<
	"CUDA version%0 is newer than the latest%select{| partially}1 supported version %2">,			"CUDA version%0 is newer than the latest%select{| partially}1 supported version %2">,
	InGroup<CudaUnknownVersion>;			InGroup<CudaUnknownVersion>;

clang/include/clang/Driver/Options.td


Show First 20 Lines • Show All 994 Lines • ▼ Show 20 Lines	def fgpu_default_stream_EQ : Joined<["-"], "fgpu-default-stream=">,
NormalizedValues<["Legacy", "PerThread"]>,		NormalizedValues<["Legacy", "PerThread"]>,
MarshallingInfoEnum<LangOpts<"GPUDefaultStream">, "Legacy">;		MarshallingInfoEnum<LangOpts<"GPUDefaultStream">, "Legacy">;
def rocm_path_EQ : Joined<["--"], "rocm-path=">, Group<i_Group>,		def rocm_path_EQ : Joined<["--"], "rocm-path=">, Group<i_Group>,
HelpText<"ROCm installation path, used for finding and automatically linking required bitcode libraries.">;		HelpText<"ROCm installation path, used for finding and automatically linking required bitcode libraries.">;
def hip_path_EQ : Joined<["--"], "hip-path=">, Group<i_Group>,		def hip_path_EQ : Joined<["--"], "hip-path=">, Group<i_Group>,
HelpText<"HIP runtime installation path, used for finding HIP version and adding HIP include path.">;		HelpText<"HIP runtime installation path, used for finding HIP version and adding HIP include path.">;
def amdgpu_arch_tool_EQ : Joined<["--"], "amdgpu-arch-tool=">, Group<i_Group>,		def amdgpu_arch_tool_EQ : Joined<["--"], "amdgpu-arch-tool=">, Group<i_Group>,
HelpText<"Tool used for detecting AMD GPU arch in the system.">;		HelpText<"Tool used for detecting AMD GPU arch in the system.">;
		def nvptx_arch_tool_EQ : Joined<["--"], "nvptx-arch-tool=">, Group<i_Group>,
		HelpText<"Tool used for detecting NVIDIA GPU arch in the system.">;
def rocm_device_lib_path_EQ : Joined<["--"], "rocm-device-lib-path=">, Group<Link_Group>,		def rocm_device_lib_path_EQ : Joined<["--"], "rocm-device-lib-path=">, Group<Link_Group>,
HelpText<"ROCm device library path. Alternative to rocm-path.">;		HelpText<"ROCm device library path. Alternative to rocm-path.">;
def : Joined<["--"], "hip-device-lib-path=">, Alias<rocm_device_lib_path_EQ>;		def : Joined<["--"], "hip-device-lib-path=">, Alias<rocm_device_lib_path_EQ>;
def hip_device_lib_EQ : Joined<["--"], "hip-device-lib=">, Group<Link_Group>,		def hip_device_lib_EQ : Joined<["--"], "hip-device-lib=">, Group<Link_Group>,
HelpText<"HIP device library">;		HelpText<"HIP device library">;
def hip_version_EQ : Joined<["--"], "hip-version=">,		def hip_version_EQ : Joined<["--"], "hip-version=">,
HelpText<"HIP version in the format of major.minor.patch">;		HelpText<"HIP version in the format of major.minor.patch">;
def fhip_dump_offload_linker_script : Flag<["-"], "fhip-dump-offload-linker-script">,		def fhip_dump_offload_linker_script : Flag<["-"], "fhip-dump-offload-linker-script">,

clang/include/clang/Driver/ToolChain.h


protected:		protected:
MultilibSet Multilibs;		MultilibSet Multilibs;
Multilib SelectedMultilib;		Multilib SelectedMultilib;

ToolChain(const Driver &D, const llvm::Triple &T,		ToolChain(const Driver &D, const llvm::Triple &T,
const llvm::opt::ArgList &Args);		const llvm::opt::ArgList &Args);

		/// Executes the given \p Executable and returns the stdout.
		llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
		executeToolChainProgram(StringRef Executable) const;

void setTripleEnvironment(llvm::Triple::EnvironmentType Env);		void setTripleEnvironment(llvm::Triple::EnvironmentType Env);

virtual Tool *buildAssembler() const;		virtual Tool *buildAssembler() const;
virtual Tool *buildLinker() const;		virtual Tool *buildLinker() const;
virtual Tool *buildStaticLibTool() const;		virtual Tool *buildStaticLibTool() const;
virtual Tool *getTool(Action::ActionClass AC) const;		virtual Tool *getTool(Action::ActionClass AC) const;

virtual std::string buildCompilerRTBasename(const llvm::opt::ArgList &Args,		virtual std::string buildCompilerRTBasename(const llvm::opt::ArgList &Args,
▲ Show 20 Lines • Show All 496 Lines • ▼ Show 20 Lines	public:

/// AddFastMathRuntimeIfAvailable - If a runtime library exists that sets		/// AddFastMathRuntimeIfAvailable - If a runtime library exists that sets
/// global flags for unsafe floating point math, add it and return true.		/// global flags for unsafe floating point math, add it and return true.
///		///
/// This checks for presence of the -Ofast, -ffast-math or -funsafe-math flags.		/// This checks for presence of the -Ofast, -ffast-math or -funsafe-math flags.
bool addFastMathRuntimeIfAvailable(		bool addFastMathRuntimeIfAvailable(
const llvm::opt::ArgList &Args, llvm::opt::ArgStringList &CmdArgs) const;		const llvm::opt::ArgList &Args, llvm::opt::ArgStringList &CmdArgs) const;

		/// getSystemGPUArchs - Use a tool to detect the user's availible GPUs.
		virtual Expected<SmallVector<std::string>>
		getSystemGPUArchs(const llvm::opt::ArgList &Args) const;

/// addProfileRTLibs - When -fprofile-instr-profile is specified, try to pass		/// addProfileRTLibs - When -fprofile-instr-profile is specified, try to pass
/// a suitable profile runtime library to the linker.		/// a suitable profile runtime library to the linker.
virtual void addProfileRTLibs(const llvm::opt::ArgList &Args,		virtual void addProfileRTLibs(const llvm::opt::ArgList &Args,
llvm::opt::ArgStringList &CmdArgs) const;		llvm::opt::ArgStringList &CmdArgs) const;

/// Add arguments to use system-specific CUDA includes.		/// Add arguments to use system-specific CUDA includes.
virtual void AddCudaIncludeArgs(const llvm::opt::ArgList &DriverArgs,		virtual void AddCudaIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const;		llvm::opt::ArgStringList &CC1Args) const;

clang/lib/Driver/Driver.cpp

Show First 20 Lines • Show All 3,072 Lines • ▼ Show 20 Lines	bool initialize() override {
A->getOption().matches(options::OPT_no_offload_arch_EQ)))		A->getOption().matches(options::OPT_no_offload_arch_EQ)))
continue;		continue;
A->claim();		A->claim();

for (StringRef ArchStr : llvm::split(A->getValue(), ",")) {		for (StringRef ArchStr : llvm::split(A->getValue(), ",")) {
if (A->getOption().matches(options::OPT_no_offload_arch_EQ) &&		if (A->getOption().matches(options::OPT_no_offload_arch_EQ) &&
ArchStr == "all") {		ArchStr == "all") {
GpuArchs.clear();		GpuArchs.clear();
} else if (ArchStr == "native" &&		} else if (ArchStr == "native") {
ToolChains.front()->getTriple().isAMDGPU()) {		const ToolChain &TC = *ToolChains.front();
auto *TC = static_cast<const toolchains::HIPAMDToolChain *>(		auto GPUsOrErr = ToolChains.front()->getSystemGPUArchs(Args);
ToolChains.front());		if (!GPUsOrErr) {
SmallVector<std::string, 1> GPUs;		TC.getDriver().Diag(diag::err_drv_undetermined_gpu_arch)
auto Err = TC->detectSystemGPUs(Args, GPUs);		<< llvm::Triple::getArchTypeName(TC.getArch())
if (!Err) {		<< llvm::toString(GPUsOrErr.takeError()) << "--offload-arch";
for (auto GPU : GPUs)		continue;
		}

		for (auto GPU : *GPUsOrErr) {
GpuArchs.insert(Args.MakeArgString(GPU));		GpuArchs.insert(Args.MakeArgString(GPU));
} else		}
llvm::consumeError(std::move(Err));
} else {		} else {
ArchStr = getCanonicalOffloadArch(ArchStr);		ArchStr = getCanonicalOffloadArch(ArchStr);
if (ArchStr.empty()) {		if (ArchStr.empty()) {
Error = true;		Error = true;
} else if (A->getOption().matches(options::OPT_offload_arch_EQ))		} else if (A->getOption().matches(options::OPT_offload_arch_EQ))
GpuArchs.insert(ArchStr);		GpuArchs.insert(ArchStr);
else if (A->getOption().matches(options::OPT_no_offload_arch_EQ))		else if (A->getOption().matches(options::OPT_no_offload_arch_EQ))
GpuArchs.erase(ArchStr);		GpuArchs.erase(ArchStr);

clang/lib/Driver/ToolChain.cpp

#include "llvm/MC/MCTargetOptions.h"		#include "llvm/MC/MCTargetOptions.h"
#include "llvm/MC/TargetRegistry.h"		#include "llvm/MC/TargetRegistry.h"
#include "llvm/Option/Arg.h"		#include "llvm/Option/Arg.h"
#include "llvm/Option/ArgList.h"		#include "llvm/Option/ArgList.h"
#include "llvm/Option/OptTable.h"		#include "llvm/Option/OptTable.h"
#include "llvm/Option/Option.h"		#include "llvm/Option/Option.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
		#include "llvm/Support/FileUtilities.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
#include "llvm/Support/TargetParser.h"		#include "llvm/Support/TargetParser.h"
#include "llvm/Support/VersionTuple.h"		#include "llvm/Support/VersionTuple.h"
#include "llvm/Support/VirtualFileSystem.h"		#include "llvm/Support/VirtualFileSystem.h"
#include <cassert>		#include <cassert>
#include <cstddef>		#include <cstddef>
#include <cstring>		#include <cstring>
#include <string>		#include <string>
Show All 36 Lines	ToolChain::ToolChain(const Driver &D, const llvm::Triple &T,

for (const auto &Path : getRuntimePaths())		for (const auto &Path : getRuntimePaths())
addIfExists(getLibraryPaths(), Path);		addIfExists(getLibraryPaths(), Path);
for (const auto &Path : getStdlibPaths())		for (const auto &Path : getStdlibPaths())
addIfExists(getFilePaths(), Path);		addIfExists(getFilePaths(), Path);
addIfExists(getFilePaths(), getArchSpecificLibPath());		addIfExists(getFilePaths(), getArchSpecificLibPath());
}		}

		llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
		ToolChain::executeToolChainProgram(StringRef Executable) const {
		llvm::SmallString<64> OutputFile;
		llvm::sys::fs::createTemporaryFile("toolchain-program", "txt", OutputFile);
		llvm::FileRemover OutputRemover(OutputFile.c_str());
		std::optional<llvm::StringRef> Redirects[] = {
		{""},
		OutputFile.str(),
		{""},
		};

		std::string ErrorMessage;
		if (llvm::sys::ExecuteAndWait(Executable, {}, {}, Redirects,
		/* SecondsToWait */ 0,
		/*MemoryLimit*/ 0, &ErrorMessage))
		return llvm::createStringError(std::error_code(),
		Executable + ": " + ErrorMessage);

		llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> OutputBuf =
		llvm::MemoryBuffer::getFile(OutputFile.c_str());
		if (!OutputBuf)
		return llvm::createStringError(OutputBuf.getError(),
		"Failed to read stdout of " + Executable +
		": " + OutputBuf.getError().message());
		return std::move(*OutputBuf);
		}

void ToolChain::setTripleEnvironment(llvm::Triple::EnvironmentType Env) {		void ToolChain::setTripleEnvironment(llvm::Triple::EnvironmentType Env) {
Triple.setEnvironment(Env);		Triple.setEnvironment(Env);
if (EffectiveTriple != llvm::Triple())		if (EffectiveTriple != llvm::Triple())
EffectiveTriple.setEnvironment(Env);		EffectiveTriple.setEnvironment(Env);
}		}

ToolChain::~ToolChain() = default;		ToolChain::~ToolChain() = default;

▲ Show 20 Lines • Show All 982 Lines • ▼ Show 20 Lines	bool ToolChain::addFastMathRuntimeIfAvailable(const ArgList &Args,
if (isFastMathRuntimeAvailable(Args, Path)) {		if (isFastMathRuntimeAvailable(Args, Path)) {
CmdArgs.push_back(Args.MakeArgString(Path));		CmdArgs.push_back(Args.MakeArgString(Path));
return true;		return true;
}		}

return false;		return false;
}		}

		Expected<SmallVector<std::string>>
		ToolChain::getSystemGPUArchs(const llvm::opt::ArgList &Args) const {
		return SmallVector<std::string>();
		}

SanitizerMask ToolChain::getSupportedSanitizers() const {		SanitizerMask ToolChain::getSupportedSanitizers() const {
// Return sanitizers which don't require runtime support and are not		// Return sanitizers which don't require runtime support and are not
// platform dependent.		// platform dependent.

SanitizerMask Res =		SanitizerMask Res =
(SanitizerKind::Undefined & ~SanitizerKind::Vptr &		(SanitizerKind::Undefined & ~SanitizerKind::Vptr &
~SanitizerKind::Function) |		~SanitizerKind::Function) |
(SanitizerKind::CFI & ~SanitizerKind::CFIICall) |		(SanitizerKind::CFI & ~SanitizerKind::CFIICall) |

clang/lib/Driver/ToolChains/AMDGPU.h

Show First 20 Lines • Show All 94 Lines • ▼ Show 20 Lines	public:
}		}

/// Needed for translating LTO options.		/// Needed for translating LTO options.
const char *getDefaultLinker() const override { return "ld.lld"; }		const char *getDefaultLinker() const override { return "ld.lld"; }

/// Should skip argument.		/// Should skip argument.
bool shouldSkipArgument(const llvm::opt::Arg *Arg) const;		bool shouldSkipArgument(const llvm::opt::Arg *Arg) const;

/// Uses amdgpu_arch tool to get arch of the system GPU. Will return error		/// Uses amdgpu-arch tool to get arch of the system GPU. Will return error
/// if unable to find one.		/// if unable to find one.
llvm::Error getSystemGPUArch(const llvm::opt::ArgList &Args,		virtual Expected<SmallVector<std::string>>
std::string &GPUArch) const;		getSystemGPUArchs(const llvm::opt::ArgList &Args) const override;

llvm::Error detectSystemGPUs(const llvm::opt::ArgList &Args,
SmallVector<std::string, 1> &GPUArchs) const;

protected:		protected:
/// Check and diagnose invalid target ID specified by -mcpu.		/// Check and diagnose invalid target ID specified by -mcpu.
virtual void checkTargetID(const llvm::opt::ArgList &DriverArgs) const;		virtual void checkTargetID(const llvm::opt::ArgList &DriverArgs) const;

/// The struct type returned by getParsedTargetID.		/// The struct type returned by getParsedTargetID.
struct ParsedTargetIDType {		struct ParsedTargetIDType {
std::optional<std::string> OptionalTargetID;		std::optional<std::string> OptionalTargetID;

clang/lib/Driver/ToolChains/AMDGPU.cpp

#include "clang/Basic/TargetID.h"		#include "clang/Basic/TargetID.h"
#include "clang/Driver/Compilation.h"		#include "clang/Driver/Compilation.h"
#include "clang/Driver/Distro.h"		#include "clang/Driver/Distro.h"
#include "clang/Driver/DriverDiagnostic.h"		#include "clang/Driver/DriverDiagnostic.h"
#include "clang/Driver/InputInfo.h"		#include "clang/Driver/InputInfo.h"
#include "clang/Driver/Options.h"		#include "clang/Driver/Options.h"
#include "llvm/Option/ArgList.h"		#include "llvm/Option/ArgList.h"
#include "llvm/Support/Error.h"		#include "llvm/Support/Error.h"
#include "llvm/Support/FileUtilities.h"
#include "llvm/Support/Host.h"		#include "llvm/Support/Host.h"
#include "llvm/Support/LineIterator.h"		#include "llvm/Support/LineIterator.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
#include "llvm/Support/Process.h"		#include "llvm/Support/Process.h"
#include "llvm/Support/VirtualFileSystem.h"		#include "llvm/Support/VirtualFileSystem.h"
#include <optional>		#include <optional>
#include <system_error>		#include <system_error>

#define AMDGPU_ARCH_PROGRAM_NAME "amdgpu-arch"

using namespace clang::driver;		using namespace clang::driver;
using namespace clang::driver::tools;		using namespace clang::driver::tools;
using namespace clang::driver::toolchains;		using namespace clang::driver::toolchains;
using namespace clang;		using namespace clang;
using namespace llvm::opt;		using namespace llvm::opt;

// Look for sub-directory starts with PackageName under ROCm candidate path.		// Look for sub-directory starts with PackageName under ROCm candidate path.
// If there is one and only one matching sub-directory found, append the		// If there is one and only one matching sub-directory found, append the
▲ Show 20 Lines • Show All 724 Lines • ▼ Show 20 Lines	void AMDGPUToolChain::checkTargetID(
const llvm::opt::ArgList &DriverArgs) const {		const llvm::opt::ArgList &DriverArgs) const {
auto PTID = getParsedTargetID(DriverArgs);		auto PTID = getParsedTargetID(DriverArgs);
if (PTID.OptionalTargetID && !PTID.OptionalGPUArch) {		if (PTID.OptionalTargetID && !PTID.OptionalGPUArch) {
getDriver().Diag(clang::diag::err_drv_bad_target_id)		getDriver().Diag(clang::diag::err_drv_bad_target_id)
<< *PTID.OptionalTargetID;		<< *PTID.OptionalTargetID;
}		}
}		}

llvm::Error		Expected<SmallVector<std::string>>
AMDGPUToolChain::detectSystemGPUs(const ArgList &Args,		AMDGPUToolChain::getSystemGPUArchs(const ArgList &Args) const {
SmallVector<std::string, 1> &GPUArchs) const {		// Detect AMD GPUs availible on the system.
std::string Program;		std::string Program;
if (Arg *A = Args.getLastArg(options::OPT_amdgpu_arch_tool_EQ))		if (Arg *A = Args.getLastArg(options::OPT_amdgpu_arch_tool_EQ))
Program = A->getValue();		Program = A->getValue();
else		else
Program = GetProgramPath(AMDGPU_ARCH_PROGRAM_NAME);		Program = GetProgramPath("amdgpu-arch");
llvm::SmallString<64> OutputFile;
llvm::sys::fs::createTemporaryFile("print-system-gpus", "" /* No Suffix */,
OutputFile);
llvm::FileRemover OutputRemover(OutputFile.c_str());
std::optional<llvm::StringRef> Redirects[] = {
{""},
OutputFile.str(),
{""},
};

std::string ErrorMessage;
if (int Result = llvm::sys::ExecuteAndWait(
Program, {}, {}, Redirects, /* SecondsToWait */ 0,
/*MemoryLimit*/ 0, &ErrorMessage)) {
if (Result > 0) {
ErrorMessage = "Exited with error code " + std::to_string(Result);
} else if (Result == -1) {
ErrorMessage = "Execute failed: " + ErrorMessage;
} else {
ErrorMessage = "Crashed: " + ErrorMessage;
}

return llvm::createStringError(std::error_code(),		auto StdoutOrErr = executeToolChainProgram(Program);
Program + ": " + ErrorMessage);		if (!StdoutOrErr)
}		return StdoutOrErr.takeError();

llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> OutputBuf =
llvm::MemoryBuffer::getFile(OutputFile.c_str());
if (!OutputBuf) {
return llvm::createStringError(OutputBuf.getError(),
"Failed to read stdout of " + Program +
": " + OutputBuf.getError().message());
}

for (llvm::line_iterator LineIt(**OutputBuf); !LineIt.is_at_end(); ++LineIt) {
GPUArchs.push_back(LineIt->str());
}
return llvm::Error::success();
}

llvm::Error AMDGPUToolChain::getSystemGPUArch(const ArgList &Args,
std::string &GPUArch) const {
// detect the AMDGPU installed in system
SmallVector<std::string, 1> GPUArchs;		SmallVector<std::string, 1> GPUArchs;
auto Err = detectSystemGPUs(Args, GPUArchs);		for (StringRef Arch : llvm::split((*StdoutOrErr)->getBuffer(), "\n"))
if (Err) {		if (!Arch.empty())
return Err;		GPUArchs.push_back(Arch.str());
}
if (GPUArchs.empty()) {		if (GPUArchs.empty())
return llvm::createStringError(std::error_code(),		return llvm::createStringError(std::error_code(),
"No AMD GPU detected in the system");		"No AMD GPU detected in the system");
}
GPUArch = GPUArchs[0];		return GPUArchs;
if (GPUArchs.size() > 1) {
if (!llvm::all_equal(GPUArchs))
return llvm::createStringError(
std::error_code(), "Multiple AMD GPUs found with different archs");
}
return llvm::Error::success();
}		}

void ROCMToolChain::addClangTargetOptions(		void ROCMToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,		const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
AMDGPUToolChain::addClangTargetOptions(DriverArgs, CC1Args,		AMDGPUToolChain::addClangTargetOptions(DriverArgs, CC1Args,
DeviceOffloadingKind);		DeviceOffloadingKind);


clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

	using namespace clang::driver::tools;			using namespace clang::driver::tools;
	using namespace clang;			using namespace clang;
	using namespace llvm::opt;			using namespace llvm::opt;

	namespace {			namespace {

	static bool checkSystemForAMDGPU(const ArgList &Args, const AMDGPUToolChain &TC,			static bool checkSystemForAMDGPU(const ArgList &Args, const AMDGPUToolChain &TC,
	std::string &GPUArch) {			std::string &GPUArch) {
	if (auto Err = TC.getSystemGPUArch(Args, GPUArch)) {			auto CheckError = [&](llvm::Error Err) -> bool {
	std::string ErrMsg =			std::string ErrMsg =
	llvm::formatv("{0}", llvm::fmt_consume(std::move(Err)));			llvm::formatv("{0}", llvm::fmt_consume(std::move(Err)));
	TC.getDriver().Diag(diag::err_drv_undetermined_amdgpu_arch) << ErrMsg;			TC.getDriver().Diag(diag::err_drv_undetermined_gpu_arch)
				<< llvm::Triple::getArchTypeName(TC.getArch()) << ErrMsg << "-march";
	return false;			return false;
	}			};

				auto ArchsOrErr = TC.getSystemGPUArchs(Args);
				if (!ArchsOrErr)
				return CheckError(ArchsOrErr.takeError());

				if (ArchsOrErr->size() > 1)
				if (!llvm::all_equal(*ArchsOrErr))
				return CheckError(llvm::createStringError(
				std::error_code(), "Multiple AMD GPUs found with different archs"));

				GPUArch = ArchsOrErr->front();
	return true;			return true;
	}			}
	} // namespace			} // namespace

	AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain(const Driver &D,			AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain(const Driver &D,
	const llvm::Triple &Triple,			const llvm::Triple &Triple,
	const ToolChain &HostTC,			const ToolChain &HostTC,
	const ArgList &Args)			const ArgList &Args)

clang/lib/Driver/ToolChains/Cuda.h

Show First 20 Lines • Show All 177 Lines • ▼ Show 20 Lines	public:

unsigned GetDefaultDwarfVersion() const override { return 2; }		unsigned GetDefaultDwarfVersion() const override { return 2; }
// NVPTX supports only DWARF2.		// NVPTX supports only DWARF2.
unsigned getMaxDwarfVersion() const override { return 2; }		unsigned getMaxDwarfVersion() const override { return 2; }

const ToolChain &HostTC;		const ToolChain &HostTC;
CudaInstallationDetector CudaInstallation;		CudaInstallationDetector CudaInstallation;

		/// Uses nvptx-arch tool to get arch of the system GPU. Will return error
		/// if unable to find one.
		virtual Expected<SmallVector<std::string>>
		getSystemGPUArchs(const llvm::opt::ArgList &Args) const override;

protected:		protected:
Tool *buildAssembler() const override; // ptxas		Tool *buildAssembler() const override; // ptxas
Tool *buildLinker() const override; // fatbinary (ok, not really a linker)		Tool *buildLinker() const override; // fatbinary (ok, not really a linker)
};		};

} // end namespace toolchains		} // end namespace toolchains
} // end namespace driver		} // end namespace driver
} // end namespace clang		} // end namespace clang

#endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_CUDA_H		#endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_CUDA_H

clang/lib/Driver/ToolChains/Cuda.cpp

Show First 20 Lines • Show All 779 Lines • ▼ Show 20 Lines	CudaToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,

if (!BoundArch.empty()) {		if (!BoundArch.empty()) {
DAL->eraseArg(options::OPT_march_EQ);		DAL->eraseArg(options::OPT_march_EQ);
DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), BoundArch);		DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), BoundArch);
}		}
return DAL;		return DAL;
}		}

		Expected<SmallVector<std::string>>
		CudaToolChain::getSystemGPUArchs(const ArgList &Args) const {
		// Detect NVIDIA GPUs availible on the system.
		std::string Program;
		if (Arg *A = Args.getLastArg(options::OPT_nvptx_arch_tool_EQ))
		Program = A->getValue();
		else
		Program = GetProgramPath("nvptx-arch");

		auto StdoutOrErr = executeToolChainProgram(Program);
		if (!StdoutOrErr)
		return StdoutOrErr.takeError();

		SmallVector<std::string, 1> GPUArchs;
		for (StringRef Arch : llvm::split((*StdoutOrErr)->getBuffer(), "\n"))
		if (!Arch.empty())
		GPUArchs.push_back(Arch.str());

		if (GPUArchs.empty())
		return llvm::createStringError(std::error_code(),
		"No NVIDIA GPU detected in the system");

		return GPUArchs;
		}

Tool *CudaToolChain::buildAssembler() const {		Tool *CudaToolChain::buildAssembler() const {
return new tools::NVPTX::Assembler(*this);		return new tools::NVPTX::Assembler(*this);
}		}

Tool *CudaToolChain::buildLinker() const {		Tool *CudaToolChain::buildLinker() const {
return new tools::NVPTX::Linker(*this);		return new tools::NVPTX::Linker(*this);
}		}


clang/test/Driver/Inputs/nvptx-arch/nvptx_arch_fail

This file was added.

Property	Old Value	New Value
File Mode	null	100755

				#!/bin/sh
				exit 1

clang/test/Driver/Inputs/nvptx-arch/nvptx_arch_sm_70

This file was added.

Property	Old Value	New Value
File Mode	null	100755

				#!/bin/sh
				echo sm_70
				exit 0

clang/test/Driver/amdgpu-hip-system-arch.c

This file was added.

				// REQUIRES: system-linux
				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target
				// REQUIRES: shell

				// RUN: mkdir -p %t
				// RUN: cp %S/Inputs/amdgpu-arch/amdgpu_arch_fail %t/
				// RUN: cp %S/Inputs/amdgpu-arch/amdgpu_arch_gfx906 %t/
				// RUN: echo '#!/bin/sh' > %t/amdgpu_arch_empty
				// RUN: chmod +x %t/amdgpu_arch_fail
				// RUN: chmod +x %t/amdgpu_arch_gfx906
				// RUN: chmod +x %t/amdgpu_arch_empty

				// case when amdgpu-arch returns nothing or fails
				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpulib --offload-arch=native --amdgpu-arch-tool=%t/amdgpu_arch_fail -x hip %s 2>&1 \
				// RUN: \| FileCheck %s --check-prefix=NO-OUTPUT-ERROR
				// NO-OUTPUT-ERROR: error: cannot determine amdgcn architecture{{.*}}; consider passing it via '--offload-arch'

				// case when amdgpu-arch does not return anything with successful execution
				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpulib --offload-arch=native --amdgpu-arch-tool=%t/amdgpu_arch_empty -x hip %s 2>&1 \
				// RUN: \| FileCheck %s --check-prefix=EMPTY-OUTPUT
				// EMPTY-OUTPUT: error: cannot determine amdgcn architecture: No AMD GPU detected in the system; consider passing it via '--offload-arch'

				// case when amdgpu-arch returns a gfx906 GPU.
				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpulib --offload-arch=native --amdgpu-arch-tool=%t/amdgpu_arch_gfx906 -x hip %s 2>&1 \
				yaxunl: comment incorrect?
				jhuber6 (author): Yes, thanks for catching that. I'll fix it.
				// RUN: \| FileCheck %s --check-prefix=ARCH-GFX906
				// ARCH-GFX906: "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx906"

clang/test/Driver/amdgpu-openmp-system-arch-fail.c

	// RUN: echo '#!/bin/sh' > %t/amdgpu_arch_empty			// RUN: echo '#!/bin/sh' > %t/amdgpu_arch_empty
	// RUN: chmod +x %t/amdgpu_arch_fail			// RUN: chmod +x %t/amdgpu_arch_fail
	// RUN: chmod +x %t/amdgpu_arch_different			// RUN: chmod +x %t/amdgpu_arch_different
	// RUN: chmod +x %t/amdgpu_arch_empty			// RUN: chmod +x %t/amdgpu_arch_empty

	// case when amdgpu_arch returns nothing or fails			// case when amdgpu_arch returns nothing or fails
	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_fail %s 2>&1 \			// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_fail %s 2>&1 \
	// RUN: \| FileCheck %s --check-prefix=NO-OUTPUT-ERROR			// RUN: \| FileCheck %s --check-prefix=NO-OUTPUT-ERROR
	// NO-OUTPUT-ERROR: error: cannot determine AMDGPU architecture{{.*}}Exited with error code 1; consider passing it via '--march'			// NO-OUTPUT-ERROR: error: cannot determine amdgcn architecture{{.*}}; consider passing it via '-march'

	// case when amdgpu_arch returns multiple gpus but all are different			// case when amdgpu_arch returns multiple gpus but all are different
	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_different %s 2>&1 \			// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_different %s 2>&1 \
	// RUN: \| FileCheck %s --check-prefix=MULTIPLE-OUTPUT-ERROR			// RUN: \| FileCheck %s --check-prefix=MULTIPLE-OUTPUT-ERROR
	// MULTIPLE-OUTPUT-ERROR: error: cannot determine AMDGPU architecture: Multiple AMD GPUs found with different archs; consider passing it via '--march'			// MULTIPLE-OUTPUT-ERROR: error: cannot determine amdgcn architecture: Multiple AMD GPUs found with different archs; consider passing it via '-march'

	// case when amdgpu_arch does not return anything with successful execution			// case when amdgpu_arch does not return anything with successful execution
	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_empty %s 2>&1 \			// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp=libomp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_empty %s 2>&1 \
	// RUN: \| FileCheck %s --check-prefix=EMPTY-OUTPUT			// RUN: \| FileCheck %s --check-prefix=EMPTY-OUTPUT
	// EMPTY-OUTPUT: error: cannot determine AMDGPU architecture: No AMD GPU detected in the system; consider passing it via '--march'			// EMPTY-OUTPUT: error: cannot determine amdgcn architecture: No AMD GPU detected in the system; consider passing it via '-march'

clang/test/Driver/nvptx-cuda-system-arch.c

This file was added.

				// REQUIRES: system-linux
				// REQUIRES: x86-registered-target
				// REQUIRES: nvptx-registered-target
				// REQUIRES: shell

				// RUN: mkdir -p %t
				// RUN: cp %S/Inputs/nvptx-arch/nvptx_arch_fail %t/
				// RUN: cp %S/Inputs/nvptx-arch/nvptx_arch_sm_70 %t/
				// RUN: echo '#!/bin/sh' > %t/nvptx_arch_empty
				// RUN: chmod +x %t/nvptx_arch_fail
				// RUN: chmod +x %t/nvptx_arch_sm_70
				// RUN: chmod +x %t/nvptx_arch_empty

				// case when nvptx-arch returns nothing or fails
				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpulib --offload-arch=native --nvptx-arch-tool=%t/nvptx_arch_fail -x cuda %s 2>&1 \
				// RUN: \| FileCheck %s --check-prefix=NO-OUTPUT-ERROR
				// NO-OUTPUT-ERROR: error: cannot determine nvptx64 architecture{{.*}}; consider passing it via '--offload-arch'

				// case when nvptx-arch does not return anything with successful execution
				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpulib --offload-arch=native --nvptx-arch-tool=%t/nvptx_arch_empty -x cuda %s 2>&1 \
				// RUN: \| FileCheck %s --check-prefix=EMPTY-OUTPUT
				// EMPTY-OUTPUT: error: cannot determine nvptx64 architecture: No NVIDIA GPU detected in the system; consider passing it via '--offload-arch'

				// case when nvptx-arch returns an sm_70 GPU.
				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpulib --offload-arch=native --nvptx-arch-tool=%t/nvptx_arch_sm_70 -x cuda %s 2>&1 \
				// RUN: \| FileCheck %s --check-prefix=ARCH-sm_70
				// ARCH-sm_70: "-cc1" "-triple" "nvptx64-nvidia-cuda"{{.*}}"-target-cpu" "sm_70"
