This is an archive of the discontinued LLVM Phabricator instance.


Differential D106870

[OpenMP] Multi architecture compilation support
Needs Review (Public)

Authored by saiislam on Jul 27 2021, 6:00 AM.


Details

Reviewers

jdoerfert
yaxunl
JonChesterfield
RaviNarayanaswamy
ye-luo
ronlieb
pdhaliwal

Summary

Multiple offloading targets can now be specified on the command
line. A toolchain instance is created for each unique
combination of target triple and target GPU. The device runtime has
been modified to support binaries containing multiple images,
each built for a different target.
The data structure "__tgt_image_info", defined in
"llvm-project/openmp/libomptarget/include/omptarget.h", is used
to pass the requirements of each image. For example, the GPU name
(gfx906, sm_35, etc.) is a requirement of the image; this information
is produced by clang-offload-wrapper and read by the device RTL.
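To make the "requirements" idea concrete, here is a minimal sketch of run-time image selection. It assumes each embedded image carries a single required GPU name and that the runtime picks the first image whose name matches the GPU it detected. ImageInfo, DeviceImage and selectImage are hypothetical names; they are not the actual layout of __tgt_image_info or part of the libomptarget API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for per-image requirements; the real
// __tgt_image_info in omptarget.h may be laid out differently.
struct ImageInfo {
  std::string Arch;  // required GPU, e.g. "gfx906" or "sm_35"
};

struct DeviceImage {
  ImageInfo Info;
  std::string Label;  // illustration only
};

// Returns the index of the first image whose required arch equals the GPU
// detected at run time, or std::nullopt when no embedded image fits.
std::optional<std::size_t> selectImage(const std::vector<DeviceImage> &Images,
                                       const std::string &DetectedArch) {
  for (std::size_t I = 0; I < Images.size(); ++I)
    if (Images[I].Info.Arch == DetectedArch)
      return I;
  return std::nullopt;
}
```

Returning an index rather than a copy keeps the sketch close to how a runtime would refer back to an entry in its image table.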

Example:

clang -O2 -target x86_64-pc-linux-gnu -fopenmp \
  -fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa \
  -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 \
  -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908 \
  helloworld.c -o helloworld
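The example passes two -Xopenmp-target/-march pairs for the same triple. Below is a simplified, standalone sketch of how such options could be reduced to one key per unique (triple, arch) pair, using the "triple^arch" key format that GetTargetInfoFromMArch builds in this patch. XOpenMPTargetArg and collectOffloadKeys are hypothetical. The real driver walks an llvm::opt::ArgList, normalizes the triple, and rejects unknown archs via StringToCudaArch; all three steps are omitted here.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one "-Xopenmp-target=<triple> <option>" pair.
struct XOpenMPTargetArg {
  std::string Triple;  // e.g. "amdgcn-amd-amdhsa"
  std::string Option;  // e.g. "-march=gfx906"
};

static bool startsWith(const std::string &S, const std::string &Prefix) {
  return S.rfind(Prefix, 0) == 0;
}

// Collects one "triple^arch" key per unique (triple, arch) pair; std::set
// drops repeats. Triple normalization and arch validation are omitted.
std::set<std::string>
collectOffloadKeys(const std::vector<XOpenMPTargetArg> &Args) {
  std::set<std::string> Keys;
  for (const XOpenMPTargetArg &A : Args) {
    if (!startsWith(A.Option, "-march=") && !startsWith(A.Option, "--march="))
      continue;
    std::string Arch = A.Option.substr(A.Option.find('=') + 1);
    if (!A.Triple.empty() && !Arch.empty())
      Keys.insert(A.Triple + "^" + Arch);
  }
  return Keys;
}
```

Because the key set is ordered and deduplicated, repeating an identical -Xopenmp-target/-march pair on the command line does not create a second toolchain.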

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	180 ms	x64 debian > Clang.Driver::amdgpu-openmp-system-arch.c
	230 ms	x64 debian > Clang.Driver::cuda-unused-arg-warning.cu
	130 ms	x64 debian > Clang.Driver::openmp-offload-gpu.c
	1,690 ms	x64 debian > Clang.Driver::openmp-offload-multi.c
	100 ms	x64 debian > Clang.Driver::openmp-offload.c
	27 tests failed in total.

Event Timeline

saiislam created this revision. Jul 27 2021, 6:00 AM

Herald added subscribers: kerbowa, pengfei, guansong and 2 others. Jul 27 2021, 6:00 AM

saiislam requested review of this revision. Jul 27 2021, 6:00 AM

Herald added projects: Restricted Project, Restricted Project. Jul 27 2021, 6:00 AM

Herald added subscribers: openmp-commits, cfe-commits, sstefan1.

Harbormaster completed remote builds in B116409: Diff 362002. Jul 27 2021, 6:41 AM

-fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa seems burdensome. Could you just count how many -Xopenmp-target=amdgcn-amd-amdhsa there are on the command line and then count the unique ones?

To me, -fopenmp-targets=amdgcn-amd-amdhsa,nvptx64 makes sense.
-fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa is not optimal.

In D106870#2907252, @ye-luo wrote:

-fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa seems burdensome. Could you just count how many -Xopenmp-target=amdgcn-amd-amdhsa there are on the command line and then count the unique ones?

I have a patch in the pipeline that will eliminate the need for -fopenmp-targets, -Xopenmp-target, and -march altogether. Users will be able to compile with just "--offload-arch=gfx906" instead of the other three flags.
It is working in our downstream AOMP compiler, but I haven't posted a Phabricator review yet.

saiislam added a reviewer: ye-luo. Jul 27 2021, 6:53 AM

saiislam added a subscriber: ronlieb.

saiislam added inline comments. Jul 27 2021, 6:59 AM

openmp/libomptarget/src/rtl.cpp
306	The call to the amdgpu-arch binary is going to be replaced with a call to a new library named OffloadArch. It will return the current GPU name along with the enabled GPU features (i.e. requirements) in a platform-independent way. As the library and its functionality are self-contained, I decided to post it as a separate review and use amdgpu-arch here for demonstration. I will post the Phabricator review for the library soon.
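The inline comment says libomptarget currently shells out to amdgpu-arch to learn which GPU is installed. A minimal sketch of that pattern follows. It assumes (this review does not confirm it) that the tool prints one GPU name such as gfx906 per line. runCommand relies on POSIX popen and is POSIX-only. Both parseArchList and runCommand are hypothetical helpers, not code from the patch.

```cpp
#include <cassert>
#include <cstdio>
#include <set>
#include <sstream>
#include <string>

// Collects the distinct GPU names from text listing one name per line.
std::set<std::string> parseArchList(const std::string &Output) {
  std::set<std::string> Archs;
  std::istringstream In(Output);
  std::string Line;
  while (std::getline(In, Line)) {
    // Trim trailing whitespace such as '\r' from Windows-style line ends.
    while (!Line.empty() &&
           (Line.back() == '\r' || Line.back() == ' ' || Line.back() == '\t'))
      Line.pop_back();
    if (!Line.empty())
      Archs.insert(Line);
  }
  return Archs;
}

// POSIX-only: runs a shell command and captures its standard output.
std::string runCommand(const char *Cmd) {
  std::string Out;
  FILE *Pipe = popen(Cmd, "r");
  if (!Pipe)
    return Out;
  char Buf[256];
  while (fgets(Buf, sizeof Buf, Pipe))
    Out += Buf;
  pclose(Pipe);
  return Out;
}
```

Under the output-format assumption above, parseArchList(runCommand("amdgpu-arch")) would yield the distinct GPU names on the machine. Keeping the parser separate from the process call also makes it testable without a GPU.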

In D106870#2907257, @saiislam wrote:

In D106870#2907252, @ye-luo wrote:

-fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa seems burdensome. Could you just count how many -Xopenmp-target=amdgcn-amd-amdhsa there are on the command line and then count the unique ones?

I have a patch in the pipeline that will eliminate the need for -fopenmp-targets, -Xopenmp-target, and -march altogether. Users will be able to compile with just "--offload-arch=gfx906" instead of the other three flags.
It is working in our downstream AOMP compiler, but I haven't posted a Phabricator review yet.

That is just a convenience option and a separate topic. I'm commenting on the current generic option you are fiddling with.

There seem to be several different things in this patch.

There's some driver plumbing to compile for more than one arch (presumably by calling the target compiler N times). That's a great feature: I want to build an application that can run on nvptx or amdgpu. It probably needs a test case showing that combination.

Then there's a bunch of stuff to do with 'requirements', but it's not clear what that is.

Finally, there's some stuff where libomptarget dlopens itself and then spawns amdgpu-arch. I can't tell why we would want to do that.

My guess was that each arch would get its own section in the host executable containing a code object, and each host plugin would be responsible for indicating whether it could do anything with a given code object. That should work out of the box for machines with only one offloading arch available, and would need some work around device_id to handle multiple ones.
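The plugin-based scheme suggested in this comment can be sketched as follows. Each embedded code object is offered to every plugin in turn, and the first plugin that accepts it owns it. Plugin, CodeObject and assignImages are hypothetical names for illustration. In a real runtime the CanRun predicate would be a plugin entry point that inspects the code object itself rather than a pre-parsed triple and arch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct CodeObject {
  std::string Triple;  // e.g. "amdgcn-amd-amdhsa"
  std::string Arch;    // e.g. "gfx906"
};

// Hypothetical plugin: a name plus a predicate saying whether it can load
// and run a given code object on this machine.
struct Plugin {
  std::string Name;
  std::function<bool(const CodeObject &)> CanRun;
};

// For each code object, returns the index of the first plugin that accepts
// it, or -1 if no plugin on this machine can use it.
std::vector<int> assignImages(const std::vector<CodeObject> &Objects,
                              const std::vector<Plugin> &Plugins) {
  std::vector<int> Owner;
  Owner.reserve(Objects.size());
  for (const CodeObject &Obj : Objects) {
    int Chosen = -1;
    for (std::size_t P = 0; P < Plugins.size(); ++P) {
      if (Plugins[P].CanRun(Obj)) {
        Chosen = static_cast<int>(P);
        break;
      }
    }
    Owner.push_back(Chosen);
  }
  return Owner;
}
```

A -1 entry corresponds to an image for hardware that is not present. Such an image is simply never loaded, which is what lets one binary carry code for several architectures.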

saiislam added inline comments. Jul 28 2021, 8:01 AM

openmp/libomptarget/src/rtl.cpp
306	Here is the patch for the OffloadArch library: D106960

sandoval added a subscriber: sandoval. Jul 28 2021, 8:56 PM

saiislam mentioned this in D93525: [clang-offload-bundler] Add unbundling of archives containing bundled object files into device specific archives. Aug 18 2021, 5:46 AM

I think this patch needs to be split up into a large number of much smaller pieces.

Spent some time reading through this. I think the idea is to create a host binary that contains code objects for multiple variants of amdgpu - e.g. one that runs on gfx906 and another on gfx908, or one that runs on gfx906-xnack+ and another on gfx906-xnack-.
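Clang's AMDGPU target-ID syntax spells such variants as a processor name followed by colon-separated feature settings, e.g. gfx906:xnack+ or gfx906:xnack-. Below is a minimal sketch of parsing that form and checking whether an image suits a device. The compatibility rule is a simplifying assumption for illustration: every feature the image pins down must match the device, and features the image leaves out match anything. parseTargetID and isCompatible are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Processor plus explicit feature settings: "gfx906:xnack+" gives
// Processor "gfx906" and Features {"xnack": true}.
struct ParsedTargetID {
  std::string Processor;
  std::map<std::string, bool> Features;
};

// Parses "processor(:feature[+-])*". Returns std::nullopt for an empty
// processor, a feature without a trailing '+' or '-', or a repeated feature.
std::optional<ParsedTargetID> parseTargetID(const std::string &ID) {
  ParsedTargetID Result;
  std::size_t Pos = ID.find(':');
  Result.Processor = ID.substr(0, Pos);
  if (Result.Processor.empty())
    return std::nullopt;
  while (Pos != std::string::npos) {
    std::size_t Next = ID.find(':', Pos + 1);
    std::string Feature = Next == std::string::npos
                              ? ID.substr(Pos + 1)
                              : ID.substr(Pos + 1, Next - Pos - 1);
    if (Feature.size() < 2)
      return std::nullopt;
    char Sign = Feature.back();
    if (Sign != '+' && Sign != '-')
      return std::nullopt;
    Feature.pop_back();
    if (!Result.Features.emplace(Feature, Sign == '+').second)
      return std::nullopt;  // the same feature was given twice
    Pos = Next;
  }
  return Result;
}

// Simplified rule: processors must match, and every feature the image pins
// down must be reported by the device with the same setting.
bool isCompatible(const ParsedTargetID &Image, const ParsedTargetID &Device) {
  if (Image.Processor != Device.Processor)
    return false;
  for (const auto &[Name, Enabled] : Image.Features) {
    auto It = Device.Features.find(Name);
    if (It == Device.Features.end() || It->second != Enabled)
      return false;
  }
  return true;
}
```

Under this rule an image built for plain gfx906 runs on either xnack mode, while gfx906:xnack+ and gfx906:xnack- images are mutually exclusive. That distinction is why a single binary may need both variants embedded.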

That's close to the long-running feature request to compile a program to a binary that can run on totally different architectures, e.g. nvptx + amdgpu + vgpu + remote. Probably in the first instance making one binary that can run on whatever is available, and then extending it to run on a system that has multiple targets available. I've got an nvptx/amdgpu box here that would be well suited to testing that. Tagging Ron and Pushpinder, who may be interested in this.

Can you document the device binary embedding scheme for multiple GPUs in the clang documentation? This will help tool developers write tools that extract device binaries from executables or shared libraries. It may also help interoperability with other offloading language modes, in case multiple offloading modes need to be supported in one executable or shared library in the future.

saiislam mentioned this in D110083: [clang-offload-bundler][docs][NFC] Add archive unbundling documentation. Sep 20 2021, 12:20 PM

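Revision Contents

Path	Size
clang/include/clang/Basic/DiagnosticDriverKinds.td	3 lines
clang/include/clang/Driver/ToolChain.h	9 lines
clang/lib/Driver/Action.cpp	18 lines
clang/lib/Driver/Driver.cpp	330 lines
clang/lib/Driver/ToolChains/AMDGPUOpenMP.h	4 lines
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp	15 lines
clang/lib/Driver/ToolChains/Clang.cpp	101 lines
clang/lib/Driver/ToolChains/Cuda.h	4 lines
clang/lib/Driver/ToolChains/Cuda.cpp	25 lines
clang/test/Driver/amdgpu-openmp-system-arch-fail.c	9 lines
clang/test/Driver/amdgpu-openmp-toolchain.c	22 lines
clang/test/Driver/hip-rdc-device-only.hip	8 lines
clang/test/Driver/hip-toolchain-rdc-separate.hip	12 lines
clang/test/Driver/openmp-offload-multi.c	34 lines
clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp	93 lines
openmp/libomptarget/include/omptarget.h	45 lines
openmp/libomptarget/src/exports	1 line
openmp/libomptarget/src/interface.cpp	28 lines
openmp/libomptarget/src/rtl.cpp	152 lines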

Diff 362002

clang/include/clang/Basic/DiagnosticDriverKinds.td

@@ ... @@
 def err_drv_command_failure : Error<
 "unable to execute command: %0">;
 def err_drv_invalid_darwin_version : Error<
 "invalid Darwin version number: %0">;
 def err_drv_invalid_diagnotics_hotness_threshold : Error<
 "invalid argument in '%0', only integer or 'auto' is supported">;
 def err_drv_missing_argument : Error<
 "argument to '%0' is missing (expected %1 value%s1)">;
+def err_drv_missing_Xopenmptarget_or_march: Error<
+"The option -fopenmp-targets= requires additional options -Xopenmp-target= and -march= .">,
+DefaultFatal;
 def err_drv_invalid_Xarch_argument_with_args : Error<
 "invalid Xarch argument: '%0', options requiring arguments are unsupported">;
 def err_drv_Xopenmp_target_missing_triple : Error<
 "cannot deduce implicit triple value for -Xopenmp-target, specify triple using -Xopenmp-target=<triple>">;
 def err_drv_invalid_Xopenmp_target_with_args : Error<
 "invalid -Xopenmp-target argument: '%0', options requiring arguments are unsupported">;
 def err_drv_argument_only_allowed_with : Error<
 "invalid argument '%0' only allowed with '%1'">;
@@ ... @@
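The new err_drv_missing_Xopenmptarget_or_march diagnostic is emitted from Driver::CreateOffloadingDeviceToolChains in the Driver.cpp part of this diff. Below is a standalone, simplified model of that check. Unless the only -fopenmp-targets value starts with the host architecture name, the last -Xopenmp-target= argument must carry a -march= or --march= value. The function and parameter names are hypothetical; the real code operates on llvm::opt::Arg objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Case-insensitive prefix test, similar in spirit to
// StringRef::startswith_insensitive.
static bool startsWithInsensitive(const std::string &S,
                                  const std::string &Prefix) {
  if (S.size() < Prefix.size())
    return false;
  return std::equal(Prefix.begin(), Prefix.end(), S.begin(),
                    [](char A, char B) {
                      return std::tolower(static_cast<unsigned char>(A)) ==
                             std::tolower(static_cast<unsigned char>(B));
                    });
}

static bool isMarch(const std::string &V) {
  return V.rfind("-march=", 0) == 0 || V.rfind("--march=", 0) == 0;
}

// Returns true when the driver would report
// err_drv_missing_Xopenmptarget_or_march. LastXOpenMPTargetValues holds the
// values of the last -Xopenmp-target= argument (empty if there is none).
bool missingXopenmpTargetMarch(
    const std::vector<std::string> &OpenMPTargets,
    const std::string &HostArchName,
    const std::vector<std::string> &LastXOpenMPTargetValues) {
  bool IsHostOffloading =
      OpenMPTargets.size() == 1 &&
      startsWithInsensitive(OpenMPTargets[0], HostArchName);
  if (IsHostOffloading)
    return false;
  return std::none_of(LastXOpenMPTargetValues.begin(),
                      LastXOpenMPTargetValues.end(), isMarch);
}
```

As in the patch, only the last -Xopenmp-target= argument is inspected, so an earlier -march= followed by a later unrelated -Xopenmp-target= option would still trigger the error.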

clang/include/clang/Driver/ToolChain.h

@@ ... @@ private:
 void setEffectiveTriple(llvm::Triple ET) const {
 EffectiveTriple = std::move(ET);
 }
 
 mutable llvm::Optional<CXXStdlibType> cxxStdlibType;
 mutable llvm::Optional<RuntimeLibType> runtimeLibType;
 mutable llvm::Optional<UnwindLibType> unwindLibType;
 
+// OpenMP creates a toolchain for each target arch. eg - gfx908
+std::string OffloadArch;
+
 protected:
 MultilibSet Multilibs;
 Multilib SelectedMultilib;
 
 ToolChain(const Driver &D, const llvm::Triple &T,
 const llvm::opt::ArgList &Args);
 
 void setTripleEnvironment(llvm::Triple::EnvironmentType Env);
@@ ... @@ public:
 }
 
 /// Get the toolchain's effective clang triple.
 const llvm::Triple &getEffectiveTriple() const {
 assert(!EffectiveTriple.getTriple().empty() && "No effective triple");
 return EffectiveTriple;
 }
 
+const std::string getOffloadArch() const { return OffloadArch; }
+
+void setOffloadArch(std::string OffloadArch) {
+this->OffloadArch = std::move(OffloadArch);
+}
+
 path_list &getLibraryPaths() { return LibraryPaths; }
 const path_list &getLibraryPaths() const { return LibraryPaths; }
 
 path_list &getFilePaths() { return FilePaths; }
 const path_list &getFilePaths() const { return FilePaths; }
 
 path_list &getProgramPaths() { return ProgramPaths; }
 const path_list &getProgramPaths() const { return ProgramPaths; }
@@ ... @@

clang/lib/Driver/Action.cpp

@@ ... @@ OffloadAction::OffloadAction(const DeviceDependences &DDeps, types::ID Ty)
 for (unsigned i = 0, e = getInputs().size(); i != e; ++i)
 getInputs()[i]->propagateDeviceOffloadInfo(OKinds[i], BArchs[i]);
 }
 
 OffloadAction::OffloadAction(const HostDependence &HDep,
 const DeviceDependences &DDeps)
 : Action(OffloadClass, HDep.getAction()), HostTC(HDep.getToolChain()),
 DevToolChains(DDeps.getToolChains()) {
+auto &OKinds = DDeps.getOffloadKinds();
+auto &BArchs = DDeps.getBoundArchs();
+
+// If all inputs agree on the same kind, use it also for this action.
+if (llvm::all_of(OKinds, [&](OffloadKind K) { return K == OKinds.front(); }))
+OffloadingDeviceKind = OKinds.front();
+
+// If we have a single dependency, inherit the architecture from it.
+if (OKinds.size() == 1)
+OffloadingArch = BArchs.front();
+else
 // We use the kinds of the host dependence for this action.
 OffloadingArch = HDep.getBoundArch();
 
 ActiveOffloadKindMask = HDep.getOffloadKinds();
 HDep.getAction()->propagateHostOffloadInfo(HDep.getOffloadKinds(),
-HDep.getBoundArch());
+OffloadingArch);
 
 // Add device inputs and propagate info to the device actions. Do work only if
 // we have dependencies.
 for (unsigned i = 0, e = DDeps.getActions().size(); i != e; ++i)
 if (auto *A = DDeps.getActions()[i]) {
 getInputs().push_back(A);
 A->propagateDeviceOffloadInfo(DDeps.getOffloadKinds()[i],
 DDeps.getBoundArchs()[i]);
@@ ... @@
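After this change, the OffloadAction(HostDependence, DeviceDependences) constructor derives its device kind and bound arch from the device dependences. A simplified standalone model of that rule follows; OffloadKind, OffloadSummary and summarize are hypothetical names. The sketch adds an explicit empty-list check. all_of over an empty list is vacuously true, so without the check front() would be called on an empty container.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for Action::OffloadKind.
enum class OffloadKind { None, Host, OpenMP, HIP };

struct OffloadSummary {
  OffloadKind DeviceKind = OffloadKind::None;
  std::string BoundArch;
};

// Adopts the device kind if every device dependence agrees on it; inherits
// the bound arch from a single device dependence, otherwise falls back to
// the host dependence's bound arch.
OffloadSummary summarize(const std::vector<OffloadKind> &Kinds,
                         const std::vector<std::string> &BoundArchs,
                         const std::string &HostBoundArch) {
  OffloadSummary S;
  if (!Kinds.empty() &&
      std::all_of(Kinds.begin(), Kinds.end(),
                  [&](OffloadKind K) { return K == Kinds.front(); }))
    S.DeviceKind = Kinds.front();
  S.BoundArch = Kinds.size() == 1 ? BoundArchs.front() : HostBoundArch;
  return S;
}
```

With a single OpenMP dependence bound to gfx906, the action itself becomes an OpenMP action bound to gfx906. That arch is what the changed propagateHostOffloadInfo call then forwards.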

clang/lib/Driver/Driver.cpp

@@ ... @@ if (RT == OMPRT_Unknown) {
 else
 // FIXME: We could use a nicer diagnostic here.
 Diag(diag::err_drv_unsupported_opt) << "-fopenmp";
 }
 
 return RT;
 }
 
+bool GetTargetInfoFromMArch(Compilation &C,
+std::set<std::string> &OffloadArchs) {
+StringRef OpenMPTargetArch;
+for (Arg *A : C.getInputArgs()) {
+if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
+for (auto *V : A->getValues()) {
+StringRef VStr = StringRef(V);
+if (VStr.startswith("-march=") || VStr.startswith("--march=")) {
+OpenMPTargetArch = VStr.split('=').second;
+CudaArch Arch = StringToCudaArch(StringRef(OpenMPTargetArch));
+if (Arch == CudaArch::UNKNOWN) {
+C.getDriver().Diag(clang::diag::err_drv_cuda_bad_gpu_arch)
+<< OpenMPTargetArch;
+C.setContainsError();
+return false;
+}
+StringRef OpenMPTargetTriple = StringRef(A->getValue(0));
+llvm::Triple TargetTriple(OpenMPTargetTriple);
+
+// Append Triple and Arch to form a unique key for each instance of
+// the ToolChain
+if (!OpenMPTargetTriple.empty() && !OpenMPTargetArch.empty())
+OffloadArchs.insert(TargetTriple.normalize().append("^").append(
+OpenMPTargetArch.str()));
+}
+A->claim();
+}
+}
+}
+return true;
+}
+
 void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
 InputList &Inputs) {
 
 //
 // CUDA/HIP
 //
 // We need to generate a CUDA/HIP toolchain if any of the inputs has a CUDA
 // or HIP type. However, mixed CUDA/HIP compilation is not supported.
@@ ... @@ if (IsCuda) {
 // Use the HIP and host triples as the key into the ToolChains map,
 // because the device toolchain we create depends on both.
 auto &HIPTC = ToolChains[HIPTriple.str() + "/" + HostTriple.str()];
 if (!HIPTC) {
 HIPTC = std::make_unique<toolchains::HIPToolChain>(
 this, HIPTriple, HostTC, C.getInputArgs());
 }
 C.addOffloadDeviceToolChain(HIPTC.get(), OFK);
-}
+} else {
 
 //
 // OpenMP
 //
-// We need to generate an OpenMP toolchain if the user specified targets with
-// the -fopenmp-targets option.
+std::set<std::string> OffloadArchs;
 
 if (Arg *OpenMPTargets =
 C.getInputArgs().getLastArg(options::OPT_fopenmp_targets_EQ)) {
-if (OpenMPTargets->getNumValues()) {
-// We expect that -fopenmp-targets is always used in conjunction with the
+if (!OpenMPTargets->getNumValues()) {
+Diag(clang::diag::warn_drv_empty_joined_argument)
+<< OpenMPTargets->getAsString(C.getInputArgs());
+return;
+}
+
+// First, handle errors in command line for OpenMP target offload
+bool is_host_offloading =
+(OpenMPTargets->getNumValues() == 1) &&
+StringRef(OpenMPTargets->getValue())
+.startswith_insensitive(
+C.getSingleOffloadToolChain<Action::OFK_Host>()
+->getTriple()
+.getArchName());
+if (!is_host_offloading) {
+// Ensure at least one -Xopenm-target exists with a gpu -march
+if (Arg *XOpenMPTargets =
+C.getInputArgs().getLastArg(options::OPT_Xopenmp_target_EQ)) {
+bool has_valid_march = false;
+for (auto *V : XOpenMPTargets->getValues())
+if (StringRef(V).startswith("-march=") ||
+StringRef(V).startswith("--march="))
+has_valid_march = true;
+if (!has_valid_march) {
+Diag(diag::err_drv_missing_Xopenmptarget_or_march);
+return;
+}
+} else {
+Diag(diag::err_drv_missing_Xopenmptarget_or_march);
+return;
+}
+}
+
+// process legacy option -fopenmp-targets -Xopenmp-target and -march
+auto status = GetTargetInfoFromMArch(C, OffloadArchs);
+if (!status)
+return;
+}
+
+if (!OffloadArchs.empty()) {
+
+// We expect that an offload target is always used in conjunction with
 // option -fopenmp specifying a valid runtime with offloading support,
 // i.e. libomp or libiomp.
 bool HasValidOpenMPRuntime = C.getInputArgs().hasFlag(
 options::OPT_fopenmp, options::OPT_fopenmp_EQ,
 options::OPT_fno_openmp, false);
 if (HasValidOpenMPRuntime) {
 OpenMPRuntimeKind OpenMPKind = getOpenMPRuntime(C.getInputArgs());
 HasValidOpenMPRuntime =
 OpenMPKind == OMPRT_OMP || OpenMPKind == OMPRT_IOMP5;
 }
+if (!HasValidOpenMPRuntime) {
+Diag(clang::diag::err_drv_expecting_fopenmp_with_fopenmp_targets);
+return;
+}
 
-if (HasValidOpenMPRuntime) {
 llvm::StringMap<const char *> FoundNormalizedTriples;
-for (const char *Val : OpenMPTargets->getValues()) {
-llvm::Triple TT(Val);
-std::string NormalizedName = TT.normalize();
+for (auto &Target : OffloadArchs) {
+size_t Loc = Target.find('^');
+std::string TripleStr = Target.substr(0, Loc);
+std::string OpenMPTargetArch = Target.substr(Loc + 1);
+llvm::Triple TT(TripleStr);
+std::string NormalizedName = Target;
 
 // Make sure we don't have a duplicate triple.
 auto Duplicate = FoundNormalizedTriples.find(NormalizedName);
 if (Duplicate != FoundNormalizedTriples.end()) {
 Diag(clang::diag::warn_drv_omp_offload_target_duplicate)
-<< Val << Duplicate->second;
+<< NormalizedName << Duplicate->second;
 continue;
 }
 
 // Store the current triple so that we can check for duplicates in the
 // following iterations.
-FoundNormalizedTriples[NormalizedName] = Val;
+FoundNormalizedTriples[NormalizedName] = NormalizedName.c_str();
 
 // If the specified target is invalid, emit a diagnostic.
-if (TT.getArch() == llvm::Triple::UnknownArch)
-Diag(clang::diag::err_drv_invalid_omp_target) << Val;
-else {
+if (TT.getArch() == llvm::Triple::UnknownArch) {
+Diag(clang::diag::err_drv_invalid_omp_target) << NormalizedName;
+return;
+}
 
 const ToolChain *TC;
 // Device toolchains have to be selected differently. They pair host
 // and device in their implementation.
 if (TT.isNVPTX() || TT.isAMDGCN()) {
 const ToolChain *HostTC =
 C.getSingleOffloadToolChain<Action::OFK_Host>();
 assert(HostTC && "Host toolchain should be always defined.");
-auto &DeviceTC =
-ToolChains[TT.str() + "/" + HostTC->getTriple().normalize()];
+auto &DeviceTC = ToolChains[NormalizedName + "/" +
+HostTC->getTriple().normalize()];
 if (!DeviceTC) {
 if (TT.isNVPTX())
 DeviceTC = std::make_unique<toolchains::CudaToolChain>(
-this, TT, HostTC, C.getInputArgs(), Action::OFK_OpenMP);
+this, TT, HostTC, C.getInputArgs(), Action::OFK_OpenMP,
+OpenMPTargetArch);
 else if (TT.isAMDGCN())
-DeviceTC =
-std::make_unique<toolchains::AMDGPUOpenMPToolChain>(
-this, TT, HostTC, C.getInputArgs());
+DeviceTC = std::make_unique<toolchains::AMDGPUOpenMPToolChain>(
+this, TT, HostTC, C.getInputArgs(), OpenMPTargetArch);
 else
 assert(DeviceTC && "Device toolchain not defined.");
 }
 
 TC = DeviceTC.get();
-} else
+} else {
 TC = &getToolChain(C.getInputArgs(), TT);
-C.addOffloadDeviceToolChain(TC, Action::OFK_OpenMP);
 }
-}
-} else
-Diag(clang::diag::err_drv_expecting_fopenmp_with_fopenmp_targets);
-} else
-Diag(clang::diag::warn_drv_empty_joined_argument)
-<< OpenMPTargets->getAsString(C.getInputArgs());
+// Each value of -fopenmp-targets gets instance of offload toolchain
+C.addOffloadDeviceToolChain(TC, Action::OFK_OpenMP);
+} // end foreach openmp target
+} // end has openmp offload targets
 }
 
 //
 // TODO: Add support for other offloading programming models here.
 //
 }
 
 /// Looks the given directories for the specified file.
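The driver code in this hunk packs each offload target into a "triple^arch" string and later splits it with find('^') and substr. A minimal sketch of that round trip follows, with one defensive difference. If no '^' is present, Target.substr(Loc + 1) with Loc == npos wraps around to substr(0) and returns the whole key as the arch; splitOffloadKey returns an empty arch instead. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Splits a "triple^arch" key into {triple, arch}. A key without '^' yields
// an empty arch rather than repeating the whole key.
std::pair<std::string, std::string> splitOffloadKey(const std::string &Key) {
  std::size_t Loc = Key.find('^');
  if (Loc == std::string::npos)
    return {Key, std::string()};
  return {Key.substr(0, Loc), Key.substr(Loc + 1)};
}

// Toolchain-cache key as formed in the patch: the full offload key, '/',
// then the normalized host triple, so each (triple, arch) pair gets its own
// toolchain entry.
std::string toolChainCacheKey(const std::string &OffloadKey,
                              const std::string &HostTriple) {
  return OffloadKey + "/" + HostTriple;
}
```

Keeping the arch inside the cache key is what makes gfx906 and gfx908 map to two distinct AMDGPUOpenMPToolChain instances instead of sharing one keyed only by triple.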
▲ Show 20 Lines • Show All 1,585 Lines • ▼ Show 20 Lines	enum ActionBuilderReturnCode {
ABRT_Success,		ABRT_Success,
// The builder didn't have to act on the current action.		// The builder didn't have to act on the current action.
ABRT_Inactive,		ABRT_Inactive,
// The builder was successful and requested the host action to not be		// The builder was successful and requested the host action to not be
// generated.		// generated.
ABRT_Ignore_Host,		ABRT_Ignore_Host,
};		};

		/// ID to identify each device compilation. For CUDA it is simply the
		/// GPU arch string. For HIP it is either the GPU arch string or GPU
		/// arch string plus feature strings delimited by a plus sign, e.g.
		/// gfx906+xnack.
		struct TargetID {
		/// Target ID string which is persistent throughout the compilation.
		const char *ID;
		TargetID(CudaArch Arch) { ID = CudaArchToString(Arch); }
		TargetID(const char *ID) : ID(ID) {}
		operator const char *() { return ID; }
		operator StringRef() { return StringRef(ID); }
		};

protected:		protected:
/// Compilation associated with this builder.		/// Compilation associated with this builder.
Compilation &C;		Compilation &C;

/// Tool chains associated with this builder. The same programming		/// Tool chains associated with this builder. The same programming
/// model may have associated one or more tool chains.		/// model may have associated one or more tool chains.
SmallVector<const ToolChain *, 2> ToolChains;		SmallVector<const ToolChain *, 2> ToolChains;

▲ Show 20 Lines • Show All 65 Lines • ▼ Show 20 Lines	class OffloadingActionBuilder final {
protected:		protected:
/// Flags to signal if the user requested host-only or device-only		/// Flags to signal if the user requested host-only or device-only
/// compilation.		/// compilation.
bool CompileHostOnly = false;		bool CompileHostOnly = false;
bool CompileDeviceOnly = false;		bool CompileDeviceOnly = false;
bool EmitLLVM = false;		bool EmitLLVM = false;
bool EmitAsm = false;		bool EmitAsm = false;

/// ID to identify each device compilation. For CUDA it is simply the
/// GPU arch string. For HIP it is either the GPU arch string or GPU
/// arch string plus feature strings delimited by a plus sign, e.g.
/// gfx906+xnack.
struct TargetID {
/// Target ID string which is persistent throughout the compilation.
const char *ID;
TargetID(CudaArch Arch) { ID = CudaArchToString(Arch); }
TargetID(const char *ID) : ID(ID) {}
operator const char *() { return ID; }
operator StringRef() { return StringRef(ID); }
};
/// List of GPU architectures to use in this compilation.		/// List of GPU architectures to use in this compilation.
SmallVector<TargetID, 4> GpuArchList;		SmallVector<TargetID, 4> GpuArchList;

/// The CUDA actions for the current input.		/// The CUDA actions for the current input.
ActionList CudaDeviceActions;		ActionList CudaDeviceActions;

/// The CUDA fat binary if it was generated for the current input.		/// The CUDA fat binary if it was generated for the current input.
Action *CudaFatBinary = nullptr;		Action *CudaFatBinary = nullptr;
▲ Show 20 Lines • Show All 606 Lines • ▼ Show 20 Lines	class OffloadingActionBuilder final {
};		};

/// OpenMP action builder. The host bitcode is passed to the device frontend		/// OpenMP action builder. The host bitcode is passed to the device frontend
/// and all the device linked images are passed to the host link phase.		/// and all the device linked images are passed to the host link phase.
class OpenMPActionBuilder final : public DeviceActionBuilder {		class OpenMPActionBuilder final : public DeviceActionBuilder {
/// The OpenMP actions for the current input.		/// The OpenMP actions for the current input.
ActionList OpenMPDeviceActions;		ActionList OpenMPDeviceActions;

		bool CompileHostOnly = false;
		Lint: Pre-merge checks Inline Actions clang-tidy: warning: private field 'CompileHostOnly' is not used [clang-diagnostic-unused-private-field] not useful Lint: Pre-merge checks: clang-tidy: warning: private field 'CompileHostOnly' is not used [clang-diagnostic-unused…
		bool CompileDeviceOnly = false;

		/// List of GPU architectures to use in this compilation.
		SmallVector<TargetID, 4> GpuArchList;

/// The linker inputs obtained for each toolchain.		/// The linker inputs obtained for each toolchain.
SmallVector<ActionList, 8> DeviceLinkerInputs;		SmallVector<ActionList, 8> DeviceLinkerInputs;

public:		public:
OpenMPActionBuilder(Compilation &C, DerivedArgList &Args,		OpenMPActionBuilder(Compilation &C, DerivedArgList &Args,
const Driver::InputList &Inputs)		const Driver::InputList &Inputs)
: DeviceActionBuilder(C, Args, Inputs, Action::OFK_OpenMP) {}		: DeviceActionBuilder(C, Args, Inputs, Action::OFK_OpenMP) {}

Show All 17 Lines	getDeviceDependences(OffloadAction::DeviceDependences &DA,
for (auto *A : OpenMPDeviceActions) {		for (auto *A : OpenMPDeviceActions) {
LI->push_back(A);		LI->push_back(A);
++LI;		++LI;
}		}

// We passed the device action as a host dependence, so we don't need to		// We passed the device action as a host dependence, so we don't need to
// do anything else with them.		// do anything else with them.
OpenMPDeviceActions.clear();		OpenMPDeviceActions.clear();
return ABRT_Success;		return CompileDeviceOnly ? ABRT_Ignore_Host : ABRT_Success;
		;
}		}

		bool LastActionIsCompile = false;
// By default, we produce an action for each device arch.		// By default, we produce an action for each device arch.
for (Action *&A : OpenMPDeviceActions)		for (unsigned I = 0; I < ToolChains.size(); ++I) {
A = C.getDriver().ConstructPhaseAction(C, Args, CurPhase, A);		Action *&A = OpenMPDeviceActions[I];
		// AMDGPU does not support linking of object files, so we skip
return ABRT_Success;		// assemble and backend actions to produce LLVM IR.
		if (ToolChains[I]->getTriple().isAMDGCN() &&
		(CurPhase == phases::Assemble \|\| CurPhase == phases::Backend))
		continue;
		A = C.getDriver().ConstructPhaseAction(C, Args, CurPhase, A,
		Action::OFK_OpenMP);
		LastActionIsCompile =
		(A->getKind() == Action::ActionClass::CompileJobClass);
		}
		return (CompileDeviceOnly && LastActionIsCompile) ? ABRT_Ignore_Host
		: ABRT_Success;
}		}

ActionBuilderReturnCode addDeviceDepences(Action *HostAction) override {		ActionBuilderReturnCode addDeviceDepences(Action *HostAction) override {

// If this is an input action replicate it for each OpenMP toolchain.		// If this is an input action replicate it for each OpenMP toolchain.
if (auto *IA = dyn_cast<InputAction>(HostAction)) {		if (auto *IA = dyn_cast<InputAction>(HostAction)) {
OpenMPDeviceActions.clear();		OpenMPDeviceActions.clear();
for (unsigned I = 0; I < ToolChains.size(); ++I)		// Only process input actions for files that have extensions
OpenMPDeviceActions.push_back(		std::string FileName = IA->getInputArg().getAsString(Args);
C.MakeAction<InputAction>(IA->getInputArg(), IA->getType()));		if (!llvm::sys::path::has_extension(FileName)) {
		return ABRT_Inactive;
		}
		for (unsigned I = 0; I < ToolChains.size(); ++I) {
		OpenMPDeviceActions.push_back(C.MakeAction<InputAction>(
		IA->getInputArg(), IA->getType(), GpuArchList[I].ID));
		}
return ABRT_Success;		return ABRT_Success;
}		}

// If this is an unbundling action use it as is for each OpenMP toolchain.		// If this is an unbundling action use it as is for each OpenMP toolchain.
if (auto *UA = dyn_cast<OffloadUnbundlingJobAction>(HostAction)) {		if (auto *UA = dyn_cast<OffloadUnbundlingJobAction>(HostAction)) {
OpenMPDeviceActions.clear();		OpenMPDeviceActions.clear();
auto *IA = cast<InputAction>(UA->getInputs().back());		auto *IA = cast<InputAction>(UA->getInputs().back());
std::string FileName = IA->getInputArg().getAsString(Args);		std::string FileName = IA->getInputArg().getAsString(Args);
// Check if the type of the file is the same as the action. Do not		// Check if the type of the file is the same as the action. Do not
// unbundle it if it is not. Do not unbundle .so files, for example,		// unbundle it if it is not. Do not unbundle .so files, for example,
// which are not object files.		// which are not object files.
if (IA->getType() == types::TY_Object &&		if (IA->getType() == types::TY_Object &&
(!llvm::sys::path::has_extension(FileName) ||		(!llvm::sys::path::has_extension(FileName) ||
types::lookupTypeForExtension(		types::lookupTypeForExtension(
llvm::sys::path::extension(FileName).drop_front()) !=		llvm::sys::path::extension(FileName).drop_front()) !=
types::TY_Object))		types::TY_Object))
return ABRT_Inactive;		return ABRT_Inactive;
for (unsigned I = 0; I < ToolChains.size(); ++I) {		for (unsigned I = 0; I < ToolChains.size(); ++I) {
OpenMPDeviceActions.push_back(UA);		OpenMPDeviceActions.push_back(UA);
UA->registerDependentActionInfo(		UA->registerDependentActionInfo(ToolChains[I],
ToolChains[I], /*BoundArch=*/StringRef(), Action::OFK_OpenMP);		/*BoundArch=*/GpuArchList[I].ID,
		Action::OFK_OpenMP);
}		}
return ABRT_Success;		return ABRT_Success;
}		}
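Two filters decide whether a host input yields per-device actions. Input actions are replicated only for files that have an extension. Object-typed inputs are unbundled only when their extension maps to an object file. The sketch below makes two assumptions: `std::filesystem` stands in for `llvm::sys::path`, and the hypothetical `isObjectExtension` replaces `types::lookupTypeForExtension`.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical stand-in for lookupTypeForExtension(...) == TY_Object; only a
// couple of object-file extensions are recognized in this sketch.
bool isObjectExtension(const std::string &Ext) {
  return Ext == "o" || Ext == "obj";
}

// Input actions are replicated per device toolchain only when the file
// name has an extension.
bool replicateInputForDevices(const std::string &FileName) {
  return std::filesystem::path(FileName).has_extension();
}

// An object-typed input is unbundled only if it has an extension that maps
// to an object file; ".so" inputs, for example, are left alone.
bool unbundleObjectInput(const std::string &FileName) {
  std::filesystem::path P(FileName);
  if (!P.has_extension())
    return false;
  std::string Ext = P.extension().string(); // includes the leading '.'
  return isObjectExtension(Ext.substr(1));  // drop '.', like drop_front()
}
```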

// When generating code for OpenMP we use the host compile phase result as		// When generating code for OpenMP we use the host compile phase result as
// a dependence to the device compile phase so that it can learn what		// a dependence to the device compile phase so that it can learn what
// declarations should be emitted. However, this is not the only use for		// declarations should be emitted. However, this is not the only use for
// the host action, so we prevent it from being collapsed.		// the host action, so we prevent it from being collapsed.
if (isa<CompileJobAction>(HostAction)) {		if (isa<CompileJobAction>(HostAction)) {
HostAction->setCannotBeCollapsedWithNextDependentAction();		HostAction->setCannotBeCollapsedWithNextDependentAction();
assert(ToolChains.size() == OpenMPDeviceActions.size() &&		assert(ToolChains.size() == OpenMPDeviceActions.size() &&
"Toolchains and device action sizes do not match.");		"Toolchains and device action sizes do not match.");
OffloadAction::HostDependence HDep(		OffloadAction::HostDependence HDep(
HostAction, C.getSingleOffloadToolChain<Action::OFK_Host>(),		HostAction, C.getSingleOffloadToolChain<Action::OFK_Host>(),
/*BoundArch=*/nullptr, Action::OFK_OpenMP);		/*BoundArch=*/nullptr, Action::OFK_OpenMP);
auto TC = ToolChains.begin();		auto TC = ToolChains.begin();
		unsigned arch_count = 0;
for (Action *&A : OpenMPDeviceActions) {		for (Action *&A : OpenMPDeviceActions) {
assert(isa<CompileJobAction>(A));		assert(isa<CompileJobAction>(A));
OffloadAction::DeviceDependences DDep;		OffloadAction::DeviceDependences DDep;
DDep.add(*A, **TC, /*BoundArch=*/nullptr, Action::OFK_OpenMP);		DDep.add(*A, **TC, GpuArchList[arch_count++].ID, Action::OFK_OpenMP);
A = C.MakeAction<OffloadAction>(HDep, DDep);		A = C.MakeAction<OffloadAction>(HDep, DDep);
++TC;		++TC;
}		}
}		}
return ABRT_Success;		return ABRT_Success;
}		}

void appendTopLevelActions(ActionList &AL) override {		void appendTopLevelActions(ActionList &AL) override {
if (OpenMPDeviceActions.empty())		if (OpenMPDeviceActions.empty())
return;		return;

// We should always have an action for each input.		// We should always have an action for each input.
assert(OpenMPDeviceActions.size() == ToolChains.size() &&		assert(OpenMPDeviceActions.size() == ToolChains.size() &&
"Number of OpenMP actions and toolchains do not match.");		"Number of OpenMP actions and toolchains do not match.");

		unsigned arch_count = 0;
// Append all device actions followed by the proper offload action.		// Append all device actions followed by the proper offload action.
auto TI = ToolChains.begin();		auto TI = ToolChains.begin();
for (auto *A : OpenMPDeviceActions) {		for (auto *A : OpenMPDeviceActions) {
OffloadAction::DeviceDependences Dep;		OffloadAction::DeviceDependences Dep;
Dep.add(*A, **TI, /*BoundArch=*/nullptr, Action::OFK_OpenMP);		Dep.add(*A, **TI, /*BoundArch=*/GpuArchList[arch_count++].ID,
		Action::OFK_OpenMP);
AL.push_back(C.MakeAction<OffloadAction>(Dep, A->getType()));		AL.push_back(C.MakeAction<OffloadAction>(Dep, A->getType()));
++TI;		++TI;
}		}
// We no longer need the action stored in this builder.		// We no longer need the action stored in this builder.
OpenMPDeviceActions.clear();		OpenMPDeviceActions.clear();
}		}

void appendLinkDeviceActions(ActionList &AL) override {		void appendLinkDeviceActions(ActionList &AL) override {
assert(ToolChains.size() == DeviceLinkerInputs.size() &&		assert(ToolChains.size() == DeviceLinkerInputs.size() &&
"Toolchains and linker inputs sizes do not match.");		"Toolchains and linker inputs sizes do not match.");

// Append a new link action for each device.		// Append a new link action for each device.
auto TC = ToolChains.begin();		auto TC = ToolChains.begin();
		unsigned arch_count = 0;
for (auto &LI : DeviceLinkerInputs) {		for (auto &LI : DeviceLinkerInputs) {
auto *DeviceLinkAction =		auto *DeviceLinkAction =
C.MakeAction<LinkJobAction>(LI, types::TY_Image);		C.MakeAction<LinkJobAction>(LI, types::TY_Image);
OffloadAction::DeviceDependences DeviceLinkDeps;		OffloadAction::DeviceDependences DeviceLinkDeps;
DeviceLinkDeps.add(*DeviceLinkAction, **TC, /*BoundArch=*/nullptr,		DeviceLinkDeps.add(*DeviceLinkAction, **TC,
Action::OFK_OpenMP);		GpuArchList[arch_count++].ID, Action::OFK_OpenMP);
AL.push_back(C.MakeAction<OffloadAction>(DeviceLinkDeps,		AL.push_back(C.MakeAction<OffloadAction>(DeviceLinkDeps,
DeviceLinkAction->getType()));		DeviceLinkAction->getType()));
++TC;		++TC;
}		}
DeviceLinkerInputs.clear();		DeviceLinkerInputs.clear();
}		}

Action* appendLinkHostActions(ActionList &AL) override {		Action* appendLinkHostActions(ActionList &AL) override {
// Create wrapper bitcode from the result of device link actions and compile		// Create wrapper bitcode from the result of device link actions and compile
// it to an object which will be added to the host link command.		// it to an object which will be added to the host link command.
auto *BC = C.MakeAction<OffloadWrapperJobAction>(AL, types::TY_LLVM_BC);		auto *BC = C.MakeAction<OffloadWrapperJobAction>(AL, types::TY_LLVM_BC);
auto *ASM = C.MakeAction<BackendJobAction>(BC, types::TY_PP_Asm);		auto *ASM = C.MakeAction<BackendJobAction>(BC, types::TY_PP_Asm);
return C.MakeAction<AssembleJobAction>(ASM, types::TY_Object);		return C.MakeAction<AssembleJobAction>(ASM, types::TY_Object);
}		}

void appendLinkDependences(OffloadAction::DeviceDependences &DA) override {}		void appendLinkDependences(OffloadAction::DeviceDependences &DA) override {}

bool initialize() override {		bool initialize() override {
		if (Arg *cu_dev_only =
		C.getInputArgs().getLastArg(options::OPT_cuda_device_only)) {
		cu_dev_only->claim();
		CompileDeviceOnly = true;
		// TODO: Check emitting IR for OpenMP when cuda-device-only is set
		}
// Get the OpenMP toolchains. If we don't get any, the action builder will		// Get the OpenMP toolchains. If we don't get any, the action builder will
// know there is nothing to do related to OpenMP offloading.		// know there is nothing to do related to OpenMP offloading.
auto OpenMPTCRange = C.getOffloadToolChains<Action::OFK_OpenMP>();		auto OpenMPTCRange = C.getOffloadToolChains<Action::OFK_OpenMP>();
for (auto TI = OpenMPTCRange.first, TE = OpenMPTCRange.second; TI != TE;		for (auto TI = OpenMPTCRange.first, TE = OpenMPTCRange.second; TI != TE;
++TI)		++TI) {
		GpuArchList.push_back(
		TI->second->getTriple().getEnvironmentName().data());
ToolChains.push_back(TI->second);		ToolChains.push_back(TI->second);
		}

DeviceLinkerInputs.resize(ToolChains.size());		DeviceLinkerInputs.resize(ToolChains.size());
return false;		return false;
}		}

bool canUseBundlerUnbundler() const override {		bool canUseBundlerUnbundler() const override {
// OpenMP should use bundled files whenever possible.		// OpenMP should use bundled files whenever possible.
return true;		return true;
[...]	if (const OffloadAction *OA = dyn_cast<OffloadAction>(A)) {

// If 'Action 2' is host, we generate jobs for the device dependences and		// If 'Action 2' is host, we generate jobs for the device dependences and
// override the current action with the host dependence. Otherwise, we		// override the current action with the host dependence. Otherwise, we
// generate the host dependences and override the action with the device		// generate the host dependences and override the action with the device
// dependence. The dependences can't therefore be a top-level action.		// dependence. The dependences can't therefore be a top-level action.
OA->doOnEachDependence(		OA->doOnEachDependence(
/*IsHostDependence=*/BuildingForOffloadDevice,		/*IsHostDependence=*/BuildingForOffloadDevice,
[&](Action *DepA, const ToolChain *DepTC, const char *DepBoundArch) {		[&](Action *DepA, const ToolChain *DepTC, const char *DepBoundArch) {

OffloadDependencesInputInfo.push_back(BuildJobsForAction(		OffloadDependencesInputInfo.push_back(BuildJobsForAction(
C, DepA, DepTC, DepBoundArch, /*AtTopLevel=*/false,		C, DepA, DepTC, DepBoundArch, /*AtTopLevel=*/false,
/*MultipleArchs*/ !!DepBoundArch, LinkingOutput, CachedResults,		/*MultipleArchs*/ !!DepBoundArch, LinkingOutput, CachedResults,
DepA->getOffloadingDeviceKind()));		DepA->getOffloadingDeviceKind()));
});		});

A = BuildingForOffloadDevice		A = BuildingForOffloadDevice
? OA->getSingleDeviceDependence(/*DoNotConsiderHostActions=*/true)		? OA->getSingleDeviceDependence(/*DoNotConsiderHostActions=*/true)
[...]	InputInfo Driver::BuildJobsForActionNoCache(

ToolSelector TS(JA, *TC, C, isSaveTempsEnabled(),		ToolSelector TS(JA, *TC, C, isSaveTempsEnabled(),
embedBitcodeInObject() && !isUsingLTO());		embedBitcodeInObject() && !isUsingLTO());
const Tool *T = TS.getTool(Inputs, CollapsedOffloadActions);		const Tool *T = TS.getTool(Inputs, CollapsedOffloadActions);

if (!T)		if (!T)
return InputInfo();		return InputInfo();

if (BuildingForOffloadDevice &&
A->getOffloadingDeviceKind() == Action::OFK_OpenMP) {
if (TC->getTriple().isAMDGCN()) {
// AMDGCN treats backend and assemble actions as no-op because
// linker does not support object files.
if (const BackendJobAction *BA = dyn_cast<BackendJobAction>(A)) {
return BuildJobsForAction(C, *BA->input_begin(), TC, BoundArch,
AtTopLevel, MultipleArchs, LinkingOutput,
CachedResults, TargetDeviceOffloadKind);
}

if (const AssembleJobAction *AA = dyn_cast<AssembleJobAction>(A)) {
return BuildJobsForAction(C, *AA->input_begin(), TC, BoundArch,
AtTopLevel, MultipleArchs, LinkingOutput,
CachedResults, TargetDeviceOffloadKind);
}
}
}

// If we've collapsed action list that contained OffloadAction we		// If we've collapsed action list that contained OffloadAction we
// need to build jobs for host/device-side inputs it may have held.		// need to build jobs for host/device-side inputs it may have held.
for (const auto *OA : CollapsedOffloadActions)		for (const auto *OA : CollapsedOffloadActions)
cast<OffloadAction>(OA)->doOnEachDependence(		cast<OffloadAction>(OA)->doOnEachDependence(
/*IsHostDependence=*/BuildingForOffloadDevice,		/*IsHostDependence=*/BuildingForOffloadDevice,
[&](Action *DepA, const ToolChain *DepTC, const char *DepBoundArch) {		[&](Action *DepA, const ToolChain *DepTC, const char *DepBoundArch) {
OffloadDependencesInputInfo.push_back(BuildJobsForAction(		OffloadDependencesInputInfo.push_back(BuildJobsForAction(
C, DepA, DepTC, DepBoundArch, /* AtTopLevel */ false,		C, DepA, DepTC, DepBoundArch, /* AtTopLevel */ false,
[...]	for (auto &UI : UA->getDependentActionsInfo()) {
auto CurI = InputInfo(		auto CurI = InputInfo(
UA,		UA,
GetNamedOutputPath(C, *UA, BaseInput, UI.DependentBoundArch,		GetNamedOutputPath(C, *UA, BaseInput, UI.DependentBoundArch,
/*AtTopLevel=*/false,		/*AtTopLevel=*/false,
MultipleArchs ||		MultipleArchs ||
UI.DependentOffloadKind == Action::OFK_HIP,		UI.DependentOffloadKind == Action::OFK_HIP,
OffloadingPrefix),		OffloadingPrefix),
BaseInput);		BaseInput);
		if (UI.DependentOffloadKind == Action::OFK_Host &&
		llvm::sys::path::extension(InputInfos[0].getFilename()) == ".a")
		CurI = InputInfos[0];
// Save the unbundling result.		// Save the unbundling result.
UnbundlingResults.push_back(CurI);		UnbundlingResults.push_back(CurI);

// Get the unique string identifier for this dependence and cache the		// Get the unique string identifier for this dependence and cache the
// result.		// result.
StringRef Arch;		StringRef Arch;
if (TargetDeviceOffloadKind == Action::OFK_HIP) {		if (TargetDeviceOffloadKind == Action::OFK_HIP ||
		TargetDeviceOffloadKind == Action::OFK_OpenMP) {
if (UI.DependentOffloadKind == Action::OFK_Host)		if (UI.DependentOffloadKind == Action::OFK_Host)
Arch = StringRef();		Arch = StringRef();
else		else if (TargetDeviceOffloadKind == Action::OFK_HIP)
Arch = UI.DependentBoundArch;		Arch = UI.DependentBoundArch;
		else if (TargetDeviceOffloadKind == Action::OFK_OpenMP)
		Arch = UI.DependentToolChain->getOffloadArch();
} else		} else
Arch = BoundArch;		Arch = BoundArch;

CachedResults[{A, GetTriplePlusArchString(UI.DependentToolChain, Arch,		CachedResults[{A, GetTriplePlusArchString(UI.DependentToolChain, Arch,
UI.DependentOffloadKind)}] =		UI.DependentOffloadKind)}] =
CurI;		CurI;
}		}
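For the unbundling cache key, the new column applies the HIP rule to OpenMP as well. OpenMP, however, takes its arch from the dependent toolchain (`getOffloadArch()`) rather than from `UI.DependentBoundArch`. The selection is sketched below. The `OffloadKind` enum is hypothetical, and plain strings stand in for the driver's types.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of Action::OffloadKind, reduced to the kinds used here.
enum class OffloadKind { Host, OpenMP, Cuda, HIP };

// Picks the arch string used in the unbundling cache key.
//   TargetKind    - TargetDeviceOffloadKind of the job being built
//   DependentKind - UI.DependentOffloadKind
//   BoundArch     - arch bound to the current job
//   DependentArch - UI.DependentBoundArch
//   ToolChainArch - UI.DependentToolChain->getOffloadArch()
std::string selectCacheKeyArch(OffloadKind TargetKind,
                               OffloadKind DependentKind,
                               const std::string &BoundArch,
                               const std::string &DependentArch,
                               const std::string &ToolChainArch) {
  if (TargetKind == OffloadKind::HIP || TargetKind == OffloadKind::OpenMP) {
    if (DependentKind == OffloadKind::Host)
      return std::string(); // host dependences carry no device arch
    if (TargetKind == OffloadKind::HIP)
      return DependentArch;
    return ToolChainArch; // OpenMP: the arch lives on the toolchain
  }
  return BoundArch;
}
```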

[...]	std::string OffloadingPrefix = Action::GetOffloadingFileNamePrefix(
A->getOffloadingDeviceKind(), TC->getTriple().normalize(),		A->getOffloadingDeviceKind(), TC->getTriple().normalize(),
/*CreatePrefixForHost=*/!!A->getOffloadingHostActiveKinds() &&		/*CreatePrefixForHost=*/!!A->getOffloadingHostActiveKinds() &&
!AtTopLevel);		!AtTopLevel);
if (isa<OffloadWrapperJobAction>(JA)) {		if (isa<OffloadWrapperJobAction>(JA)) {
if (Arg *FinalOutput = C.getArgs().getLastArg(options::OPT_o))		if (Arg *FinalOutput = C.getArgs().getLastArg(options::OPT_o))
BaseInput = FinalOutput->getValue();		BaseInput = FinalOutput->getValue();
else		else
BaseInput = getDefaultImageName();		BaseInput = getDefaultImageName();
BaseInput =		std::string BaseNm = std::string(BaseInput);
C.getArgs().MakeArgString(std::string(BaseInput) + "-wrapper");		std::replace(BaseNm.begin(), BaseNm.end(), '.', '_');
		BaseInput = C.getArgs().MakeArgString(BaseNm + "-wrapper");
}		}
Result = InputInfo(A, GetNamedOutputPath(C, *JA, BaseInput, BoundArch,		Result = InputInfo(A, GetNamedOutputPath(C, *JA, BaseInput, BoundArch,
AtTopLevel, MultipleArchs,		AtTopLevel, MultipleArchs,
OffloadingPrefix),		OffloadingPrefix),
BaseInput);		BaseInput);
}		}

if (CCCPrintBindings && !CCGenDiagnostics) {		if (CCCPrintBindings && !CCGenDiagnostics) {
[...]
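In the new column, every '.' in the offload-wrapper base name is replaced by '_' before "-wrapper" is appended. The string transformation is sketched below. The function name `wrapperBaseName` is hypothetical, and its argument stands in for the `-o` value or the default image name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Derives the offload-wrapper base name: every '.' becomes '_' and
// "-wrapper" is appended. FinalOutput stands in for the -o value.
std::string wrapperBaseName(std::string FinalOutput) {
  std::replace(FinalOutput.begin(), FinalOutput.end(), '.', '_');
  return FinalOutput + "-wrapper";
}
```

Because `std::replace` rewrites every '.', dots in directory components of the `-o` path are replaced too. For example, `out.dir/app.bin` becomes `out_dir/app_bin-wrapper`.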

clang/lib/Driver/ToolChains/AMDGPUOpenMP.h

[...]

	class LLVM_LIBRARY_VISIBILITY AMDGPUOpenMPToolChain final			class LLVM_LIBRARY_VISIBILITY AMDGPUOpenMPToolChain final
	: public ROCMToolChain {			: public ROCMToolChain {
	public:			public:
	AMDGPUOpenMPToolChain(const Driver &D, const llvm::Triple &Triple,			AMDGPUOpenMPToolChain(const Driver &D, const llvm::Triple &Triple,
	const ToolChain &HostTC,			const ToolChain &HostTC,
	const llvm::opt::ArgList &Args);			const llvm::opt::ArgList &Args);

				AMDGPUOpenMPToolChain(const Driver &D, const llvm::Triple &Triple,
				const ToolChain &HostTC, const llvm::opt::ArgList &Args,
				const std::string OffloadArch);

	const llvm::Triple *getAuxTriple() const override {			const llvm::Triple *getAuxTriple() const override {
	return &HostTC.getTriple();			return &HostTC.getTriple();
	}			}

	llvm::opt::DerivedArgList *			llvm::opt::DerivedArgList *
	TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,			TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
	Action::OffloadKind DeviceOffloadKind) const override;			Action::OffloadKind DeviceOffloadKind) const override;
	void			void
[...]

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

[...]	void AMDGCN::OpenMPLinker::ConstructJob(Compilation &C, const JobAction &JA,
const ArgList &Args,		const ArgList &Args,
const char *LinkingOutput) const {		const char *LinkingOutput) const {
const ToolChain &TC = getToolChain();		const ToolChain &TC = getToolChain();
assert(getToolChain().getTriple().isAMDGCN() && "Unsupported target");		assert(getToolChain().getTriple().isAMDGCN() && "Unsupported target");

const toolchains::AMDGPUOpenMPToolChain &AMDGPUOpenMPTC =		const toolchains::AMDGPUOpenMPToolChain &AMDGPUOpenMPTC =
static_cast<const toolchains::AMDGPUOpenMPToolChain &>(TC);		static_cast<const toolchains::AMDGPUOpenMPToolChain &>(TC);

std::string GPUArch = Args.getLastArgValue(options::OPT_march_EQ).str();		std::string GPUArch = AMDGPUOpenMPTC.getOffloadArch();
if (GPUArch.empty()) {		if (GPUArch.empty()) {
if (!checkSystemForAMDGPU(Args, AMDGPUOpenMPTC, GPUArch))		if (!checkSystemForAMDGPU(Args, AMDGPUOpenMPTC, GPUArch))
return;		return;
}		}

// Prefix for temporary file name.		// Prefix for temporary file name.
std::string Prefix;		std::string Prefix;
for (const auto &II : Inputs)		for (const auto &II : Inputs)
[...]	AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain(const Driver &D,
const ToolChain &HostTC,		const ToolChain &HostTC,
const ArgList &Args)		const ArgList &Args)
: ROCMToolChain(D, Triple, Args), HostTC(HostTC) {		: ROCMToolChain(D, Triple, Args), HostTC(HostTC) {
// Lookup binaries into the driver directory, this is used to		// Lookup binaries into the driver directory, this is used to
// discover the clang-offload-bundler executable.		// discover the clang-offload-bundler executable.
getProgramPaths().push_back(getDriver().Dir);		getProgramPaths().push_back(getDriver().Dir);
}		}

		AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain(const Driver &D,
		const llvm::Triple &Triple,
		const ToolChain &HostTC,
		const ArgList &Args,
		const std::string OffloadArch)
		: ROCMToolChain(D, Triple, Args), HostTC(HostTC) {
		getProgramPaths().push_back(getDriver().Dir);
		setOffloadArch(OffloadArch);
		}

void AMDGPUOpenMPToolChain::addClangTargetOptions(		void AMDGPUOpenMPToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,		const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);		HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);
		std::string GPUArch = getOffloadArch();
std::string GPUArch = DriverArgs.getLastArgValue(options::OPT_march_EQ).str();
if (GPUArch.empty()) {		if (GPUArch.empty()) {
if (!checkSystemForAMDGPU(DriverArgs, *this, GPUArch))		if (!checkSystemForAMDGPU(DriverArgs, *this, GPUArch))
return;		return;
}		}

assert(DeviceOffloadingKind == Action::OFK_OpenMP &&		assert(DeviceOffloadingKind == Action::OFK_OpenMP &&
"Only OpenMP offloading kinds are supported.");		"Only OpenMP offloading kinds are supported.");

[...]
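In the new column, `AMDGCN::OpenMPLinker::ConstructJob` and `addClangTargetOptions` both take the GPU arch from the toolchain instance (`getOffloadArch()`) instead of `-march=`. They fall back to probing the system only when that value is empty. The resolution order is sketched below. `Detect` is a hypothetical callback standing in for `checkSystemForAMDGPU`, and `std::nullopt` plays the role of a failed check, after which the callers return early.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Resolution order in the new column:
//   1. the arch bound to the toolchain instance;
//   2. otherwise whatever the (hypothetical) system probe reports.
// std::nullopt models a failed probe.
std::optional<std::string>
resolveAmdgpuArch(const std::string &ToolChainArch,
                  const std::function<std::optional<std::string>()> &Detect) {
  if (!ToolChainArch.empty())
    return ToolChainArch;
  return Detect();
}
```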

clang/lib/Driver/ToolChains/Clang.cpp

[...]	if (Triple.isAMDGPU()) {
handleAMDGPUCodeObjectVersionOptions(D, Args, CmdArgs);		handleAMDGPUCodeObjectVersionOptions(D, Args, CmdArgs);

if (Args.hasFlag(options::OPT_munsafe_fp_atomics,		if (Args.hasFlag(options::OPT_munsafe_fp_atomics,
options::OPT_mno_unsafe_fp_atomics, /*Default=*/false))		options::OPT_mno_unsafe_fp_atomics, /*Default=*/false))
CmdArgs.push_back("-munsafe-fp-atomics");		CmdArgs.push_back("-munsafe-fp-atomics");
}		}

// For all the host OpenMP offloading compile jobs we need to pass the targets		// For all the host OpenMP offloading compile jobs we need to pass the targets
// information using -fopenmp-targets= option.		// information using `-fopenmp-targets=` option.
if (JA.isHostOffloading(Action::OFK_OpenMP)) {		if (JA.isHostOffloading(Action::OFK_OpenMP)) {
SmallString<128> TargetInfo("-fopenmp-targets=");		SmallString<128> TargetInfo("-fopenmp-targets=");

Arg *Tgts = Args.getLastArg(options::OPT_fopenmp_targets_EQ);		Arg *Tgts = Args.getLastArg(options::OPT_fopenmp_targets_EQ);
assert(Tgts && Tgts->getNumValues() &&		// Get list of device Toolchains
"OpenMP offloading has to have targets specified.");		auto OpenMPTCRange = C.getOffloadToolChains<Action::OFK_OpenMP>();

		if (Tgts && Tgts->getNumValues()) {
for (unsigned i = 0; i < Tgts->getNumValues(); ++i) {		for (unsigned i = 0; i < Tgts->getNumValues(); ++i) {
if (i)		if (i)
TargetInfo += ',';		TargetInfo += ',';
// We need to get the string from the triple because it may be not exactly		// We need to get the string from the triple because it may be not
// the same as the one we get directly from the arguments.		// exactly the same as the one we get directly from the arguments.
llvm::Triple T(Tgts->getValue(i));		llvm::Triple T(Tgts->getValue(i));
TargetInfo += T.getTriple();		TargetInfo += T.getTriple();
}		}
		} else if (OpenMPTCRange.first != OpenMPTCRange.second) {
		for (auto TI = OpenMPTCRange.first, TE = OpenMPTCRange.second; TI != TE;
		++TI) {
		auto *deviceTC = TI->second;
		TargetInfo += deviceTC->getTriple().str();
		}
		} else {
		assert("OpenMP offloading requires target devices use \
		`-fopenmp-targets=`");
		}
CmdArgs.push_back(Args.MakeArgString(TargetInfo.str()));		CmdArgs.push_back(Args.MakeArgString(TargetInfo.str()));
}		}
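The host `-fopenmp-targets=` value is a comma-separated list of triples. The joining step is sketched below, assuming already-normalized triple strings; the `llvm::Triple` round trip is not modeled, and `buildOpenMPTargetsArg` is a hypothetical name. The sketch uses the same ',' separator as the `Tgts` loop. Note that the fallback loop over `OpenMPTCRange` in the new column appends triples without a separator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds the host-side -fopenmp-targets= argument from a list of triples,
// separating entries with ','.
std::string buildOpenMPTargetsArg(const std::vector<std::string> &Triples) {
  assert(!Triples.empty() && "OpenMP offloading requires at least one target");
  std::string Arg = "-fopenmp-targets=";
  for (std::size_t I = 0; I < Triples.size(); ++I) {
    if (I)
      Arg += ',';
    Arg += Triples[I];
  }
  return Arg;
}
```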

bool VirtualFunctionElimination =		bool VirtualFunctionElimination =
Args.hasFlag(options::OPT_fvirtual_function_elimination,		Args.hasFlag(options::OPT_fvirtual_function_elimination,
options::OPT_fno_virtual_function_elimination, false);		options::OPT_fno_virtual_function_elimination, false);
if (VirtualFunctionElimination) {		if (VirtualFunctionElimination) {
// VFE requires full LTO (currently, this might be relaxed to allow ThinLTO		// VFE requires full LTO (currently, this might be relaxed to allow ThinLTO
[...]	if (const auto *OA = dyn_cast<OffloadAction>(CurDep)) {
CurTC = nullptr;		CurTC = nullptr;
OA->doOnEachDependence([&](Action *A, const ToolChain *TC, const char *) {		OA->doOnEachDependence([&](Action *A, const ToolChain *TC, const char *) {
assert(CurTC == nullptr && "Expected one dependence!");		assert(CurTC == nullptr && "Expected one dependence!");
CurKind = A->getOffloadingDeviceKind();		CurKind = A->getOffloadingDeviceKind();
CurTC = TC;		CurTC = TC;
});		});
}		}
Triples += Action::GetOffloadKindName(CurKind);		Triples += Action::GetOffloadKindName(CurKind);
Triples += "-";		Triples += '-';
std::string NormalizedTriple = CurTC->getTriple().normalize();		Triples += CurTC->getTriple().normalize();
Triples += NormalizedTriple;		if ((CurKind == Action::OFK_HIP || CurKind == Action::OFK_Cuda) &&
		CurDep->getOffloadingArch()) {
if (CurDep->getOffloadingArch() != nullptr) {		Triples += '-';
// If OffloadArch is present it can only appear as the 6th hyphen
// separated field of Bundle Entry ID. So, pad required number of
// hyphens in Triple.
for (int i = 4 - StringRef(NormalizedTriple).count("-"); i > 0; i--)
Triples += "-";
Triples += CurDep->getOffloadingArch();		Triples += CurDep->getOffloadingArch();
}		}
		if (CurKind == Action::OFK_OpenMP && !CurTC->getOffloadArch().empty()) {
		Triples += '-';
		Triples += CurTC->getOffloadArch();
		}
}		}
CmdArgs.push_back(TCArgs.MakeArgString(Triples));		CmdArgs.push_back(TCArgs.MakeArgString(Triples));

// Get bundled file command.		// Get bundled file command.
CmdArgs.push_back(		CmdArgs.push_back(
TCArgs.MakeArgString(Twine("-outputs=") + Output.getFilename()));		TCArgs.MakeArgString(Twine("-outputs=") + Output.getFilename()));

// Get unbundled files command.		// Get unbundled files command.
[...]	for (unsigned I = 0; I < Inputs.size(); ++I) {
UB += CurTC->getInputFilename(Inputs[I]);		UB += CurTC->getInputFilename(Inputs[I]);
}		}
CmdArgs.push_back(TCArgs.MakeArgString(UB));		CmdArgs.push_back(TCArgs.MakeArgString(UB));

// All the inputs are encoded as commands.		// All the inputs are encoded as commands.
C.addCommand(std::make_unique<Command>(		C.addCommand(std::make_unique<Command>(
JA, *this, ResponseFileSupport::None(),		JA, *this, ResponseFileSupport::None(),
TCArgs.MakeArgString(getToolChain().GetProgramPath(getShortName())),		TCArgs.MakeArgString(getToolChain().GetProgramPath(getShortName())),
CmdArgs, None, Output));		CmdArgs, Inputs, Output));
}		}

void OffloadBundler::ConstructJobMultipleOutputs(		void OffloadBundler::ConstructJobMultipleOutputs(
Compilation &C, const JobAction &JA, const InputInfoList &Outputs,		Compilation &C, const JobAction &JA, const InputInfoList &Outputs,
const InputInfoList &Inputs, const llvm::opt::ArgList &TCArgs,		const InputInfoList &Inputs, const llvm::opt::ArgList &TCArgs,
const char *LinkingOutput) const {		const char *LinkingOutput) const {
// The version with multiple outputs is expected to refer to a unbundling job.		// The version with multiple outputs is expected to refer to a unbundling job.
auto &UA = cast<OffloadUnbundlingJobAction>(JA);		auto &UA = cast<OffloadUnbundlingJobAction>(JA);
[...]	void OffloadBundler::ConstructJobMultipleOutputs(
SmallString<128> Triples;		SmallString<128> Triples;
Triples += "-targets=";		Triples += "-targets=";
auto DepInfo = UA.getDependentActionsInfo();		auto DepInfo = UA.getDependentActionsInfo();
for (unsigned I = 0; I < DepInfo.size(); ++I) {		for (unsigned I = 0; I < DepInfo.size(); ++I) {
if (I)		if (I)
Triples += ',';		Triples += ',';

auto &Dep = DepInfo[I];		auto &Dep = DepInfo[I];
Triples += Action::GetOffloadKindName(Dep.DependentOffloadKind);		auto OffloadKind = Dep.DependentOffloadKind;
Triples += "-";		Triples += Action::GetOffloadKindName(OffloadKind);
std::string NormalizedTriple =		Triples += '-';
Dep.DependentToolChain->getTriple().normalize();		Triples += Dep.DependentToolChain->getTriple().normalize();
Triples += NormalizedTriple;		if ((Dep.DependentOffloadKind == Action::OFK_HIP ||
		Dep.DependentOffloadKind == Action::OFK_Cuda) &&
if (!Dep.DependentBoundArch.empty()) {		!Dep.DependentBoundArch.empty()) {
// If OffloadArch is present it can only appear as the 6th hyphen		Triples += '-';
// separated field of Bundle Entry ID. So, pad required number of
// hyphens in Triple.
for (int i = 4 - StringRef(NormalizedTriple).count("-"); i > 0; i--)
Triples += "-";
Triples += Dep.DependentBoundArch;		Triples += Dep.DependentBoundArch;
}		}
		if (OffloadKind == Action::OFK_OpenMP &&
		!Dep.DependentToolChain->getOffloadArch().empty()) {
		Triples += '-';
		Triples += Dep.DependentToolChain->getOffloadArch();
		}
}		}

CmdArgs.push_back(TCArgs.MakeArgString(Triples));		CmdArgs.push_back(TCArgs.MakeArgString(Triples));

// Get bundled file command.		// Get bundled file command.
CmdArgs.push_back(		CmdArgs.push_back(
TCArgs.MakeArgString(Twine("-inputs=") + Input.getFilename()));		TCArgs.MakeArgString(Twine("-inputs=") + Input.getFilename()));
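The two columns attach the arch to the `-targets=` entries differently. The old code pads with hyphens so the arch always sits in the sixth hyphen-separated field. The new code appends a single '-': HIP/CUDA use the bound arch, and OpenMP uses the toolchain's offload arch. Both string shapes are sketched below, ignoring the per-kind conditions. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Old column: pad with '-' so that the arch always lands in the sixth
// hyphen-separated field, i.e. <kind>-<arch>-<vendor>-<os>-<env>-<offload arch>.
std::string oldBundleEntryId(const std::string &Kind,
                             const std::string &NormalizedTriple,
                             const std::string &OffloadArch) {
  std::string Id = Kind + "-" + NormalizedTriple;
  if (!OffloadArch.empty()) {
    auto Hyphens =
        std::count(NormalizedTriple.begin(), NormalizedTriple.end(), '-');
    for (auto I = 4 - Hyphens; I > 0; --I)
      Id += '-';
    Id += OffloadArch;
  }
  return Id;
}

// New column: a single '-' followed by the arch, when one is present.
std::string newBundleEntryId(const std::string &Kind,
                             const std::string &NormalizedTriple,
                             const std::string &OffloadArch) {
  std::string Id = Kind + "-" + NormalizedTriple;
  if (!OffloadArch.empty())
    Id += "-" + OffloadArch;
  return Id;
}
```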

[...]	void OffloadWrapper::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs.push_back("-target");		CmdArgs.push_back("-target");
CmdArgs.push_back(Args.MakeArgString(Triple.getTriple()));		CmdArgs.push_back(Args.MakeArgString(Triple.getTriple()));

// Add the output file name.		// Add the output file name.
assert(Output.isFilename() && "Invalid output.");		assert(Output.isFilename() && "Invalid output.");
CmdArgs.push_back("-o");		CmdArgs.push_back("-o");
CmdArgs.push_back(Output.getFilename());		CmdArgs.push_back(Output.getFilename());

// Add inputs.		auto TCs = C.getOffloadToolChains<Action::OFK_OpenMP>();

		// Add runtime requirements on each image which includes the offload-arch
		auto II = TCs.first;
for (const InputInfo &I : Inputs) {		for (const InputInfo &I : Inputs) {
assert(I.isFilename() && "Invalid input.");		assert(I.isFilename() && "Invalid input.");
		if (I.getAction()) {
		auto TC = II->second;
		II++;
		std::string requirements("--requirements=");
		requirements.append(TC->getOffloadArch());
		// targetid could have user specified features such as :xnack-:sramecc+
		// so replace ":" with "__" in requirements used for
		// clang-offload-wrapper.
		size_t start_pos = 0;
		while ((start_pos = requirements.find(":", start_pos)) !=
		std::string::npos) {
		requirements.replace(start_pos, 1, "__");
		start_pos += 2;
		}

		// FIXME: Add other architecture requirements here
		CmdArgs.push_back(Args.MakeArgString(requirements.c_str()));
		}
CmdArgs.push_back(I.getFilename());		CmdArgs.push_back(I.getFilename());
}		}

C.addCommand(std::make_unique<Command>(		C.addCommand(std::make_unique<Command>(
JA, *this, ResponseFileSupport::None(),		JA, *this, ResponseFileSupport::None(),
Args.MakeArgString(getToolChain().GetProgramPath(getShortName())),		Args.MakeArgString(getToolChain().GetProgramPath(getShortName())),
CmdArgs, Inputs, Output));		CmdArgs, Inputs, Output));
}		}
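Each wrapper input now gets a `--requirements=` argument carrying the toolchain's offload arch. Target-ID feature separators in that value are rewritten from ':' to "__". The rewrite is sketched below; `makeRequirementsArg` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Builds the per-image --requirements= argument, replacing every ':' in the
// target ID with "__" as the loop in OffloadWrapper::ConstructJob does.
std::string makeRequirementsArg(const std::string &OffloadArch) {
  std::string Req = "--requirements=" + OffloadArch;
  std::string::size_type Pos = 0;
  while ((Pos = Req.find(':', Pos)) != std::string::npos) {
    Req.replace(Pos, 1, "__");
    Pos += 2; // continue searching after the inserted "__"
  }
  return Req;
}
```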

clang/lib/Driver/ToolChains/Cuda.h

[...]
	namespace toolchains {			namespace toolchains {

	class LLVM_LIBRARY_VISIBILITY CudaToolChain : public ToolChain {			class LLVM_LIBRARY_VISIBILITY CudaToolChain : public ToolChain {
	public:			public:
	CudaToolChain(const Driver &D, const llvm::Triple &Triple,			CudaToolChain(const Driver &D, const llvm::Triple &Triple,
	const ToolChain &HostTC, const llvm::opt::ArgList &Args,			const ToolChain &HostTC, const llvm::opt::ArgList &Args,
	const Action::OffloadKind OK);			const Action::OffloadKind OK);

				CudaToolChain(const Driver &D, const llvm::Triple &Triple,
				const ToolChain &HostTC, const llvm::opt::ArgList &Args,
				const Action::OffloadKind OK, const std::string OffloadArch);

	const llvm::Triple *getAuxTriple() const override {			const llvm::Triple *getAuxTriple() const override {
	return &HostTC.getTriple();			return &HostTC.getTriple();
	}			}

	std::string getInputFilename(const InputInfo &Input) const override;			std::string getInputFilename(const InputInfo &Input) const override;

	llvm::opt::DerivedArgList *			llvm::opt::DerivedArgList *
	TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,			TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
[...]

clang/lib/Driver/ToolChains/Cuda.cpp

[...]	void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
assert(TC.getTriple().isNVPTX() && "Wrong platform");		assert(TC.getTriple().isNVPTX() && "Wrong platform");

StringRef GPUArchName;		StringRef GPUArchName;
// If this is an OpenMP action we need to extract the device architecture		// If this is an OpenMP action we need to extract the device architecture
// from the -march=arch option. This option may come from -Xopenmp-target		// from the -march=arch option. This option may come from -Xopenmp-target
// flag or the default value.		// flag or the default value.
if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {		if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
GPUArchName = Args.getLastArgValue(options::OPT_march_EQ);		GPUArchName = Args.getLastArgValue(options::OPT_march_EQ);
		if (GPUArchName.empty())
		GPUArchName = TC.getOffloadArch();
assert(!GPUArchName.empty() && "Must have an architecture passed in.");		assert(!GPUArchName.empty() && "Must have an architecture passed in.");
} else		} else
GPUArchName = JA.getOffloadingArch();		GPUArchName = JA.getOffloadingArch();

// Obtain architecture from the action.		// Obtain architecture from the action.
CudaArch gpu_arch = StringToCudaArch(GPUArchName);		CudaArch gpu_arch = StringToCudaArch(GPUArchName);
assert(gpu_arch != CudaArch::UNKNOWN &&		assert(gpu_arch != CudaArch::UNKNOWN &&
"Device action expected to have an architecture.");		"Device action expected to have an architecture.");
… void NVPTX::OpenMPLinker::ConstructJob(Compilation &C, const JobAction &JA,
if (mustEmitDebugInfo(Args) == EmitSameDebugInfoAsHost)		if (mustEmitDebugInfo(Args) == EmitSameDebugInfoAsHost)
CmdArgs.push_back("-g");		CmdArgs.push_back("-g");

if (Args.hasArg(options::OPT_v))		if (Args.hasArg(options::OPT_v))
CmdArgs.push_back("-v");		CmdArgs.push_back("-v");

StringRef GPUArch =		StringRef GPUArch =
Args.getLastArgValue(options::OPT_march_EQ);		Args.getLastArgValue(options::OPT_march_EQ);
		if (GPUArch.empty())
		GPUArch = getToolChain().getOffloadArch();

assert(!GPUArch.empty() && "At least one GPU Arch required for ptxas.");		assert(!GPUArch.empty() && "At least one GPU Arch required for ptxas.");

CmdArgs.push_back("-arch");		CmdArgs.push_back("-arch");
CmdArgs.push_back(Args.MakeArgString(GPUArch));		CmdArgs.push_back(Args.MakeArgString(GPUArch));

// Add paths specified in LIBRARY_PATH environment variable as -L options.		// Add paths specified in LIBRARY_PATH environment variable as -L options.
addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");		addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");

… if (CudaInstallation.isValid()) {
CudaInstallation.WarnIfUnsupportedVersion();		CudaInstallation.WarnIfUnsupportedVersion();
getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));		getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));
}		}
// Lookup binaries into the driver directory, this is used to		// Lookup binaries into the driver directory, this is used to
// discover the clang-offload-bundler executable.		// discover the clang-offload-bundler executable.
getProgramPaths().push_back(getDriver().Dir);		getProgramPaths().push_back(getDriver().Dir);
}		}

		CudaToolChain::CudaToolChain(const Driver &D, const llvm::Triple &Triple,
		const ToolChain &HostTC, const ArgList &Args,
		const Action::OffloadKind OK,
		const std::string OffloadArch)
		: ToolChain(D, Triple, Args), HostTC(HostTC),
		CudaInstallation(D, HostTC.getTriple(), Args), OK(OK) {
		if (CudaInstallation.isValid()) {
		CudaInstallation.WarnIfUnsupportedVersion();
		getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));
		}
		// Lookup binaries into the driver directory, this is used to
		// discover the clang-offload-bundler executable.
		getProgramPaths().push_back(getDriver().Dir);
		setOffloadArch(OffloadArch);
		}

std::string CudaToolChain::getInputFilename(const InputInfo &Input) const {		std::string CudaToolChain::getInputFilename(const InputInfo &Input) const {
// Only object files are changed, for example assembly files keep their .s		// Only object files are changed, for example assembly files keep their .s
// extensions. CUDA also continues to use .o as they don't use nvlink but		// extensions. CUDA also continues to use .o as they don't use nvlink but
// fatbinary.		// fatbinary.
if (!(OK == Action::OFK_OpenMP && Input.getType() == types::TY_Object))		if (!(OK == Action::OFK_OpenMP && Input.getType() == types::TY_Object))
return ToolChain::getInputFilename(Input);		return ToolChain::getInputFilename(Input);

// Replace extension for object files with cubin because nvlink relies on		// Replace extension for object files with cubin because nvlink relies on
// these particular file names.		// these particular file names.
SmallString<256> Filename(ToolChain::getInputFilename(Input));		SmallString<256> Filename(ToolChain::getInputFilename(Input));
llvm::sys::path::replace_extension(Filename, "cubin");		llvm::sys::path::replace_extension(Filename, "cubin");
return std::string(Filename.str());		return std::string(Filename.str());
}		}

void CudaToolChain::addClangTargetOptions(		void CudaToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs,		const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,		llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);		HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);

StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);		StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
		if (GpuArch.empty())
		GpuArch = getOffloadArch();
assert(!GpuArch.empty() && "Must have an explicit GPU arch.");		assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
assert((DeviceOffloadingKind == Action::OFK_OpenMP ||		assert((DeviceOffloadingKind == Action::OFK_OpenMP ||
DeviceOffloadingKind == Action::OFK_Cuda) &&		DeviceOffloadingKind == Action::OFK_Cuda) &&
"Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs.");		"Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs.");

if (DeviceOffloadingKind == Action::OFK_Cuda) {		if (DeviceOffloadingKind == Action::OFK_Cuda) {
CC1Args.push_back("-fcuda-is-device");		CC1Args.push_back("-fcuda-is-device");

… for (Arg *A : Args) {
}		}
}		}
if (!IsDuplicate)		if (!IsDuplicate)
DAL->append(A);		DAL->append(A);
}		}

StringRef Arch = DAL->getLastArgValue(options::OPT_march_EQ);		StringRef Arch = DAL->getLastArgValue(options::OPT_march_EQ);
if (Arch.empty())		if (Arch.empty())
		Arch = getOffloadArch();
		if (Arch.empty())
DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),		DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
CLANG_OPENMP_NVPTX_DEFAULT_ARCH);		CLANG_OPENMP_NVPTX_DEFAULT_ARCH);

return DAL;		return DAL;
}		}

for (Arg *A : Args) {		for (Arg *A : Args) {
DAL->append(A);		DAL->append(A);
…
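Across the Cuda.cpp hunks the GPU architecture is resolved in the same order: an explicit -march= value wins, otherwise the OffloadArch bound to the toolchain is used, and only TranslateArgs falls back to CLANG_OPENMP_NVPTX_DEFAULT_ARCH. The job-construction paths assert instead of applying a default. The following is a sketch of that precedence as a standalone function. resolveOffloadArch is a hypothetical helper, and the sm_* strings in the test are only example values.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the fallback order in the Cuda.cpp hunks:
//   1. an explicit -march= value (for example forwarded via -Xopenmp-target=),
//   2. the OffloadArch bound to this toolchain instance,
//   3. a build-time default (only TranslateArgs applies
//      CLANG_OPENMP_NVPTX_DEFAULT_ARCH; pass "" to model the other paths).
// Returns an empty string if nothing is available; the patch asserts on that.
std::string resolveOffloadArch(const std::string &MarchValue,
                               const std::string &BoundOffloadArch,
                               const std::string &DefaultArch) {
  if (!MarchValue.empty())
    return MarchValue;
  if (!BoundOffloadArch.empty())
    return BoundOffloadArch;
  return DefaultArch;
}
```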

clang/test/Driver/amdgpu-openmp-system-arch-fail.c

	…
	// RUN: echo '#!/bin/sh' > %t/amdgpu_arch_empty			// RUN: echo '#!/bin/sh' > %t/amdgpu_arch_empty
	// RUN: chmod +x %t/amdgpu_arch_fail			// RUN: chmod +x %t/amdgpu_arch_fail
	// RUN: chmod +x %t/amdgpu_arch_different			// RUN: chmod +x %t/amdgpu_arch_different
	// RUN: chmod +x %t/amdgpu_arch_empty			// RUN: chmod +x %t/amdgpu_arch_empty

	// case when amdgpu_arch returns nothing or fails			// case when amdgpu_arch returns nothing or fails
	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_fail %s 2>&1 \			// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_fail %s 2>&1 \
	// RUN: | FileCheck %s --check-prefix=NO-OUTPUT-ERROR			// RUN: | FileCheck %s --check-prefix=NO-OUTPUT-ERROR
	// NO-OUTPUT-ERROR: error: Cannot determine AMDGPU architecture{{.*}}Exited with error code 1. Consider passing it via --march			// NO-OUTPUT-ERROR: fatal error: The option -fopenmp-targets= requires additional options -Xopenmp-target= and -march=

	// case when amdgpu_arch returns multiple gpus but all are different
	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_different %s 2>&1 \
	// RUN: | FileCheck %s --check-prefix=MULTIPLE-OUTPUT-ERROR
	// MULTIPLE-OUTPUT-ERROR: error: Cannot determine AMDGPU architecture: Multiple AMD GPUs found with different archs. Consider passing it via --march

	// case when amdgpu_arch does not return anything with successful execution			// case when amdgpu_arch does not return anything with successful execution
	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_empty %s 2>&1 \			// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_empty %s 2>&1 \
	// RUN: | FileCheck %s --check-prefix=EMPTY-OUTPUT			// RUN: | FileCheck %s --check-prefix=EMPTY-OUTPUT
	// EMPTY-OUTPUT: error: Cannot determine AMDGPU architecture: No AMD GPU detected in the system. Consider passing it via --march			// EMPTY-OUTPUT: fatal error: The option -fopenmp-targets= requires additional options -Xopenmp-target= and -march=
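The old-column diagnostics distinguish three outcomes of running the amdgpu_arch tool: the tool fails, it reports no GPU, or it reports several GPUs with different architectures. In the new column these cases report the -Xopenmp-target=/-march= requirement instead. The following is a sketch of the last two checks, assuming the tool prints one architecture name per line. probeSingleArch and ArchProbeResult are hypothetical names, and a non-zero exit status (the amdgpu_arch_fail case) would be handled before this step.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical outcomes matching the old-column diagnostics
// ("No AMD GPU detected" / "Multiple AMD GPUs found with different archs").
enum class ArchProbeResult { Ok, NoGpu, MultipleDifferentArchs };

// Lines holds the tool's stdout split into lines, one GPU name per line.
// On success the single architecture is stored in ArchOut.
ArchProbeResult probeSingleArch(const std::vector<std::string> &Lines,
                                std::string &ArchOut) {
  std::set<std::string> Unique;
  for (const std::string &Line : Lines)
    if (!Line.empty())
      Unique.insert(Line);
  if (Unique.empty())
    return ArchProbeResult::NoGpu;
  if (Unique.size() > 1)
    return ArchProbeResult::MultipleDifferentArchs;
  ArchOut = *Unique.begin();
  return ArchProbeResult::Ok;
}
```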

clang/test/Driver/amdgpu-openmp-toolchain.c

	// REQUIRES: x86-registered-target			// REQUIRES: x86-registered-target
	// REQUIRES: amdgpu-registered-target			// REQUIRES: amdgpu-registered-target
	// RUN: env LIBRARY_PATH=%S/Inputs/hip_dev_lib %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \			// RUN: env LIBRARY_PATH=%S/Inputs/hip_dev_lib %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \
	// RUN: | FileCheck %s			// RUN: | FileCheck %s

	// verify the tools invocations			// verify the tools invocations
	// CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-x" "c"{{.*}}			// CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-x" "c"{{.*}}
	// CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-x" "ir"{{.*}}			// CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-x" "ir"{{.*}}
	// CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-emit-llvm-bc"{{.*}}"-target-cpu" "gfx906" "-fcuda-is-device"{{.*}}"-mlink-builtin-bitcode"{{.*}}libomptarget-amdgcn-gfx906.bc"{{.*}}			// CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-emit-llvm-bc"{{.*}}"-target-cpu" "gfx906" "-fcuda-is-device"{{.*}}"-mlink-builtin-bitcode"{{.*}}libomptarget-amdgcn-gfx906.bc"{{.*}}
	// CHECK: llvm-link{{.*}}"-o" "{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-linked-{{.*}}.bc"			// CHECK: llvm-link{{.*}}"-o" "{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-linked-{{.*}}.bc"
	// CHECK: llc{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-linked-{{.*}}.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=gfx906" "-filetype=obj" "-o"{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-{{.*}}.o"			// CHECK: llc{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-linked-{{.*}}.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=gfx906" "-filetype=obj" "-o"{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-{{.*}}.o"
	// CHECK: lld{{.*}}"-flavor" "gnu" "--no-undefined" "-shared" "-o"{{.*}}amdgpu-openmp-toolchain-{{.*}}.out" "{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-{{.*}}.o"			// CHECK: lld{{.*}}"-flavor" "gnu" "--no-undefined" "-shared" "-o"{{.*}}amdgpu-openmp-toolchain-{{.*}}.out" "{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-{{.*}}.o"
	// CHECK: clang-offload-wrapper{{.*}}"-target" "x86_64-unknown-linux-gnu" "-o" "{{.*}}a-{{.*}}.bc" {{.*}}amdgpu-openmp-toolchain-{{.*}}.out"			// CHECK: clang-offload-wrapper{{.*}}" "-target" "x86_64-unknown-linux-gnu" "-o" "{{.*}}a_{{.*}}.bc" "--requirements=gfx906" "{{.*}}amdgpu-openmp-toolchain-{{.*}}.out"
	// CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-o" "{{.*}}a-{{.*}}.o" "-x" "ir" "{{.*}}a-{{.*}}.bc"			// CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-o" "{{.*}}a_{{.*}}.o" "-x" "ir" "{{.*}}a_{{.*}}.bc"
	// CHECK: ld{{.*}}"-o" "a.out"{{.*}}"{{.*}}amdgpu-openmp-toolchain-{{.*}}.o" "{{.*}}a-{{.*}}.o" "-lomp" "-lomptarget"			// CHECK: ld{{.*}}"-o" "a.out"{{.*}}"{{.*}}amdgpu-openmp-toolchain-{{.*}}.o" "{{.*}}a_{{.*}}.o" "-lomp" "-lomptarget"

	// RUN: %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \			// RUN: %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \
	// RUN: | FileCheck --check-prefix=CHECK-PHASES %s			// RUN: | FileCheck --check-prefix=CHECK-PHASES %s
	// phases			// phases
	// CHECK-PHASES: 0: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (host-openmp)			// CHECK-PHASES: 0: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (host-openmp)
	// CHECK-PHASES: 1: preprocessor, {0}, cpp-output, (host-openmp)			// CHECK-PHASES: 1: preprocessor, {0}, cpp-output, (host-openmp)
	// CHECK-PHASES: 2: compiler, {1}, ir, (host-openmp)			// CHECK-PHASES: 2: compiler, {1}, ir, (host-openmp)
	// CHECK-PHASES: 3: backend, {2}, assembler, (host-openmp)			// CHECK-PHASES: 3: backend, {2}, assembler, (host-openmp)
	// CHECK-PHASES: 4: assembler, {3}, object, (host-openmp)			// CHECK-PHASES: 4: assembler, {3}, object, (host-openmp)
	// CHECK-PHASES: 5: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (device-openmp)			// CHECK-PHASES: 5: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (device-openmp)
	// CHECK-PHASES: 6: preprocessor, {5}, cpp-output, (device-openmp)			// CHECK-PHASES: 6: preprocessor, {5}, cpp-output, (device-openmp)
	// CHECK-PHASES: 7: compiler, {6}, ir, (device-openmp)			// CHECK-PHASES: 7: compiler, {6}, ir, (device-openmp)
	// CHECK-PHASES: 8: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (amdgcn-amd-amdhsa)" {7}, ir			// CHECK-PHASES: 8: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (amdgcn-amd-amdhsa)" {7}, ir
	// CHECK-PHASES: 9: backend, {8}, assembler, (device-openmp)			// CHECK-PHASES: 9: linker, {8}, image, (device-openmp)
	// CHECK-PHASES: 10: assembler, {9}, object, (device-openmp)			// CHECK-PHASES: 10: offload, "device-openmp (amdgcn-amd-amdhsa)" {9}, image
	// CHECK-PHASES: 11: linker, {10}, image, (device-openmp)			// CHECK-PHASES: 11: clang-offload-wrapper, {10}, ir, (host-openmp)
	// CHECK-PHASES: 12: offload, "device-openmp (amdgcn-amd-amdhsa)" {11}, image			// CHECK-PHASES: 12: backend, {11}, assembler, (host-openmp)
	// CHECK-PHASES: 13: clang-offload-wrapper, {12}, ir, (host-openmp)			// CHECK-PHASES: 13: assembler, {12}, object, (host-openmp)
	// CHECK-PHASES: 14: backend, {13}, assembler, (host-openmp)			// CHECK-PHASES: 14: linker, {4, 13}, image, (host-openmp)
	// CHECK-PHASES: 15: assembler, {14}, object, (host-openmp)
	// CHECK-PHASES: 16: linker, {4, 15}, image, (host-openmp)

	// handling of --libomptarget-amdgcn-bc-path			// handling of --libomptarget-amdgcn-bc-path
	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 --libomptarget-amdgcn-bc-path=%S/Inputs/hip_dev_lib/libomptarget-amdgcn-gfx803.bc %s 2>&1 | FileCheck %s --check-prefix=CHECK-LIBOMPTARGET			// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 --libomptarget-amdgcn-bc-path=%S/Inputs/hip_dev_lib/libomptarget-amdgcn-gfx803.bc %s 2>&1 | FileCheck %s --check-prefix=CHECK-LIBOMPTARGET
	// CHECK-LIBOMPTARGET: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx803" "-fcuda-is-device" "-mlink-builtin-bitcode"{{.*}}Inputs/hip_dev_lib/libomptarget-amdgcn-gfx803.bc"{{.*}}			// CHECK-LIBOMPTARGET: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx803" "-fcuda-is-device" "-mlink-builtin-bitcode"{{.*}}Inputs/hip_dev_lib/libomptarget-amdgcn-gfx803.bc"{{.*}}

	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-NOGPULIB			// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-NOGPULIB
	// CHECK-NOGPULIB-NOT: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx803" "-fcuda-is-device" "-mlink-builtin-bitcode"{{.*}}libomptarget-amdgcn-gfx803.bc"{{.*}}			// CHECK-NOGPULIB-NOT: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx803" "-fcuda-is-device" "-mlink-builtin-bitcode"{{.*}}libomptarget-amdgcn-gfx803.bc"{{.*}}

	…
	// CHECK-C: "x86_64-unknown-linux-gnu" - "clang",{{.*}}output: "[[HOST_BC:.*]]"			// CHECK-C: "x86_64-unknown-linux-gnu" - "clang",{{.*}}output: "[[HOST_BC:.*]]"
	// CHECK-C: "amdgcn-amd-amdhsa" - "clang",{{.*}}output: "[[DEVICE_I:.*]]"			// CHECK-C: "amdgcn-amd-amdhsa" - "clang",{{.*}}output: "[[DEVICE_I:.*]]"
	// CHECK-C: "amdgcn-amd-amdhsa" - "clang", inputs: ["[[DEVICE_I]]", "[[HOST_BC]]"]			// CHECK-C: "amdgcn-amd-amdhsa" - "clang", inputs: ["[[DEVICE_I]]", "[[HOST_BC]]"]
	// CHECK-C: "x86_64-unknown-linux-gnu" - "clang"			// CHECK-C: "x86_64-unknown-linux-gnu" - "clang"
	// CHECK-C: "x86_64-unknown-linux-gnu" - "clang::as"			// CHECK-C: "x86_64-unknown-linux-gnu" - "clang::as"
	// CHECK-C: "x86_64-unknown-linux-gnu" - "offload bundler"			// CHECK-C: "x86_64-unknown-linux-gnu" - "offload bundler"

	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR			// RUN: %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
	// CHECK-EMIT-LLVM-IR: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-emit-llvm"			// CHECK-EMIT-LLVM-IR: clang{{.*}}" "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-emit-llvm"
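With multiple architectures, the new clang-offload-wrapper CHECK line shows the wrapper receiving --requirements=<arch> for each image. According to the summary, the device RTL reads these requirements to decide which image fits the detected GPU. The following is a simplified sketch of that selection, assuming exact string matching on the architecture name. A real runtime may need more elaborate compatibility checks. DeviceImageSketch and selectImageForDevice are hypothetical names and do not reflect the layout of __tgt_image_info.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of one image in a multi-image offload binary.
// This is NOT the layout of __tgt_image_info from omptarget.h.
struct DeviceImageSketch {
  std::string Requirements;         // e.g. "gfx906", as given via --requirements=
  const void *ImageStart = nullptr; // image bytes (unused in this sketch)
};

// Returns the index of the first image whose requirement string equals the
// architecture reported for the detected device, or std::nullopt if no image
// was built for it.
std::optional<std::size_t>
selectImageForDevice(const std::vector<DeviceImageSketch> &Images,
                     const std::string &DeviceArch) {
  for (std::size_t I = 0; I < Images.size(); ++I)
    if (Images[I].Requirements == DeviceArch)
      return I;
  return std::nullopt;
}
```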

clang/test/Driver/hip-rdc-device-only.hip

	…
	// COMMON-SAME: "-fapply-global-visibility-to-externs"			// COMMON-SAME: "-fapply-global-visibility-to-externs"
	// COMMON-SAME: "-target-cpu" "gfx900"			// COMMON-SAME: "-target-cpu" "gfx900"
	// COMMON-SAME: "-fgpu-rdc"			// COMMON-SAME: "-fgpu-rdc"
	// EMITBC-SAME: {{.*}} "-o" {{".*a.*bc"}} "-x" "hip"			// EMITBC-SAME: {{.*}} "-o" {{".*a.*bc"}} "-x" "hip"
	// EMITLL-SAME: {{.*}} "-o" {{".*a.*ll"}} "-x" "hip"			// EMITLL-SAME: {{.*}} "-o" {{".*a.*ll"}} "-x" "hip"
	// COMMON-SAME: {{.*}} {{".*a.cu"}}			// COMMON-SAME: {{.*}} {{".*a.cu"}}

	// COMMON: "{{.*}}clang-offload-bundler" "-type={{(bc|ll)}}"			// COMMON: "{{.*}}clang-offload-bundler" "-type={{(bc|ll)}}"
	// COMMON-SAME: "-targets=hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900"			// COMMON-SAME: "-targets=hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900"
	// COMMON-SAME: "-outputs=a-hip-amdgcn-amd-amdhsa.{{(bc|ll)}}"			// COMMON-SAME: "-outputs=a-hip-amdgcn-amd-amdhsa.{{(bc|ll)}}"

	// COMMON: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"			// COMMON: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
	// COMMON-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"			// COMMON-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
	// EMITBC-SAME: "-emit-llvm-bc"			// EMITBC-SAME: "-emit-llvm-bc"
	// EMITLL-SAME: "-emit-llvm"			// EMITLL-SAME: "-emit-llvm"
	// COMMON-SAME: {{.*}} "-main-file-name" "b.hip"			// COMMON-SAME: {{.*}} "-main-file-name" "b.hip"
	// COMMON-SAME: "-fcuda-is-device" "-fcuda-allow-variadic-functions" "-fvisibility" "hidden"			// COMMON-SAME: "-fcuda-is-device" "-fcuda-allow-variadic-functions" "-fvisibility" "hidden"
	…
	// COMMON-SAME: "-fapply-global-visibility-to-externs"			// COMMON-SAME: "-fapply-global-visibility-to-externs"
	// COMMON-SAME: "-target-cpu" "gfx900"			// COMMON-SAME: "-target-cpu" "gfx900"
	// COMMON-SAME: "-fgpu-rdc"			// COMMON-SAME: "-fgpu-rdc"
	// EMITBC-SAME: {{.*}} "-o" {{".*b.*bc"}} "-x" "hip"			// EMITBC-SAME: {{.*}} "-o" {{".*b.*bc"}} "-x" "hip"
	// EMITLL-SAME: {{.*}} "-o" {{".*b.*ll"}} "-x" "hip"			// EMITLL-SAME: {{.*}} "-o" {{".*b.*ll"}} "-x" "hip"
	// COMMON-SAME: {{.*}} {{".*b.hip"}}			// COMMON-SAME: {{.*}} {{".*b.hip"}}

	// COMMON: "{{.*}}clang-offload-bundler" "-type={{(bc|ll)}}"			// COMMON: "{{.*}}clang-offload-bundler" "-type={{(bc|ll)}}"
	// COMMON-SAME: "-targets=hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900"			// COMMON-SAME: "-targets=hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900"
	// COMMON-SAME: "-outputs=b-hip-amdgcn-amd-amdhsa.{{(bc|ll)}}"			// COMMON-SAME: "-outputs=b-hip-amdgcn-amd-amdhsa.{{(bc|ll)}}"

	// SAVETEMP: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"			// SAVETEMP: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"
	// SAVETEMP-SAME: "-E"			// SAVETEMP-SAME: "-E"
	// SAVETEMP-SAME: {{.*}} "-main-file-name" "a.cu" {{.*}} "-target-cpu" "gfx803"			// SAVETEMP-SAME: {{.*}} "-main-file-name" "a.cu" {{.*}} "-target-cpu" "gfx803"
	// SAVETEMP-SAME: {{.*}} "-o" [[A_GFX803_CUI:"a.cui"]] "-x" "hip" {{".*a.cu"}}			// SAVETEMP-SAME: {{.*}} "-o" [[A_GFX803_CUI:"a.cui"]] "-x" "hip" {{".*a.cu"}}
	// SAVETEMP-NEXT: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"			// SAVETEMP-NEXT: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"
	// SAVETEMP-SAME: "-emit-llvm-bc"			// SAVETEMP-SAME: "-emit-llvm-bc"
	…
	// SAVETEMP-SAME: {{.*}} "-main-file-name" "a.cu" {{.*}} "-target-cpu" "gfx900"			// SAVETEMP-SAME: {{.*}} "-main-file-name" "a.cu" {{.*}} "-target-cpu" "gfx900"
	// SAVETEMP-SAME: {{.*}} "-o" [[A_GFX900_TMP_BC:"a.tmp.bc"]] "-x" "hip-cpp-output" [[A_GFX900_CUI]]			// SAVETEMP-SAME: {{.*}} "-o" [[A_GFX900_TMP_BC:"a.tmp.bc"]] "-x" "hip-cpp-output" [[A_GFX900_CUI]]
	// SAVETEMP-NEXT: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"			// SAVETEMP-NEXT: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"
	// SAVETEMP-SAME: "-emit-llvm"			// SAVETEMP-SAME: "-emit-llvm"
	// SAVETEMP-SAME: {{.*}} "-main-file-name" "a.cu" {{.*}} "-target-cpu" "gfx900"			// SAVETEMP-SAME: {{.*}} "-main-file-name" "a.cu" {{.*}} "-target-cpu" "gfx900"
	// SAVETEMP-SAME: {{.*}} "-o" {{"a.*.ll"}} "-x" "ir" [[A_GFX900_TMP_BC]]			// SAVETEMP-SAME: {{.*}} "-o" {{"a.*.ll"}} "-x" "ir" [[A_GFX900_TMP_BC]]

	// SAVETEMP: "{{.*}}clang-offload-bundler" "-type=ll"			// SAVETEMP: "{{.*}}clang-offload-bundler" "-type=ll"
	// SAVETEMP-SAME: "-targets=hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900"			// SAVETEMP-SAME: "-targets=hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900"
	// SAVETEMP-SAME: "-outputs=a-hip-amdgcn-amd-amdhsa.ll"			// SAVETEMP-SAME: "-outputs=a-hip-amdgcn-amd-amdhsa.ll"

	// SAVETEMP: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"			// SAVETEMP: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"
	// SAVETEMP-SAME: "-E"			// SAVETEMP-SAME: "-E"
	// SAVETEMP-SAME: {{.*}} "-main-file-name" "b.hip" {{.*}} "-target-cpu" "gfx803"			// SAVETEMP-SAME: {{.*}} "-main-file-name" "b.hip" {{.*}} "-target-cpu" "gfx803"
	// SAVETEMP-SAME: {{.*}} "-o" [[B_GFX803_CUI:"b.cui"]] "-x" "hip" {{".*b.hip"}}			// SAVETEMP-SAME: {{.*}} "-o" [[B_GFX803_CUI:"b.cui"]] "-x" "hip" {{".*b.hip"}}
	// SAVETEMP-NEXT: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"			// SAVETEMP-NEXT: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"
	// SAVETEMP-SAME: "-emit-llvm-bc"			// SAVETEMP-SAME: "-emit-llvm-bc"
	…
	// SAVETEMP-SAME: {{.*}} "-main-file-name" "b.hip" {{.*}} "-target-cpu" "gfx900"			// SAVETEMP-SAME: {{.*}} "-main-file-name" "b.hip" {{.*}} "-target-cpu" "gfx900"
	// SAVETEMP-SAME: {{.*}} "-o" [[B_GFX900_TMP_BC:"b.tmp.bc"]] "-x" "hip-cpp-output" [[B_GFX900_CUI]]			// SAVETEMP-SAME: {{.*}} "-o" [[B_GFX900_TMP_BC:"b.tmp.bc"]] "-x" "hip-cpp-output" [[B_GFX900_CUI]]
	// SAVETEMP-NEXT: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"			// SAVETEMP-NEXT: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"
	// SAVETEMP-SAME: "-emit-llvm"			// SAVETEMP-SAME: "-emit-llvm"
	// SAVETEMP-SAME: {{.*}} "-main-file-name" "b.hip" {{.*}} "-target-cpu" "gfx900"			// SAVETEMP-SAME: {{.*}} "-main-file-name" "b.hip" {{.*}} "-target-cpu" "gfx900"
	// SAVETEMP-SAME: {{.*}} "-o" {{"b.*.ll"}} "-x" "ir" [[B_GFX900_TMP_BC]]			// SAVETEMP-SAME: {{.*}} "-o" {{"b.*.ll"}} "-x" "ir" [[B_GFX900_TMP_BC]]

	// SAVETEMP: "{{.*}}clang-offload-bundler" "-type=ll"			// SAVETEMP: "{{.*}}clang-offload-bundler" "-type=ll"
	// SAVETEMP-SAME: "-targets=hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900"			// SAVETEMP-SAME: "-targets=hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900"
	// SAVETEMP-SAME: "-outputs=b-hip-amdgcn-amd-amdhsa.ll"			// SAVETEMP-SAME: "-outputs=b-hip-amdgcn-amd-amdhsa.ll"

	// FAIL: error: cannot specify -o when generating multiple output files			// FAIL: error: cannot specify -o when generating multiple output files
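In hip-rdc-device-only.hip (and likewise in hip-toolchain-rdc-separate.hip), the visible hunks only change the spelling of the bundle entry IDs: hip-amdgcn-amd-amdhsa--gfx803 becomes hip-amdgcn-amd-amdhsa-gfx803. The following sketch produces both spellings, assuming the old "--" comes from printing the triple with an empty fourth (environment) component. makeBundleEntryId is a hypothetical helper, not clang-offload-bundler code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper that builds bundle entry IDs of the shape seen in the
// -targets= lists: <offload-kind>-<triple>[-<gpu-arch>].
// With LegacyEmptyEnv set, an empty fourth (environment) triple component is
// emitted as well, which yields the "--" of the old column.
std::string makeBundleEntryId(const std::string &Kind,
                              const std::string &Triple,
                              const std::string &Arch, bool LegacyEmptyEnv) {
  std::string Id = Kind + "-" + Triple;
  if (LegacyEmptyEnv)
    Id += "-"; // separator before the empty environment component
  if (!Arch.empty())
    Id += "-" + Arch;
  return Id;
}
```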

clang/test/Driver/hip-toolchain-rdc-separate.hip

	…
	// CHECK-SAME: "-aux-triple" "amdgcn-amd-amdhsa"			// CHECK-SAME: "-aux-triple" "amdgcn-amd-amdhsa"
	// CHECK-SAME: "-emit-obj"			// CHECK-SAME: "-emit-obj"
	// CHECK-SAME: {{.*}} "-main-file-name" "a.cu"			// CHECK-SAME: {{.*}} "-main-file-name" "a.cu"
	// CHECK-SAME: "-fgpu-rdc"			// CHECK-SAME: "-fgpu-rdc"
	// CHECK-SAME: {{.*}} "-o" "[[A_OBJ_HOST:.*o]]" "-x" "hip"			// CHECK-SAME: {{.*}} "-o" "[[A_OBJ_HOST:.*o]]" "-x" "hip"
	// CHECK-SAME: {{.*}} [[A_SRC]]			// CHECK-SAME: {{.*}} [[A_SRC]]

	// CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"			// CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
	// CHECK-SAME: "-targets=hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900,host-x86_64-unknown-linux-gnu"			// CHECK-SAME: "-targets=hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900,host-x86_64-unknown-linux-gnu"
	// CHECK-SAME: "-outputs=[[A_O:.*a.o]]" "-inputs=[[A_BC1]],[[A_BC2]],[[A_OBJ_HOST]]"			// CHECK-SAME: "-outputs=[[A_O:.*a.o]]" "-inputs=[[A_BC1]],[[A_BC2]],[[A_OBJ_HOST]]"

	// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"			// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
	// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"			// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
	// CHECK-SAME: "-emit-llvm-bc"			// CHECK-SAME: "-emit-llvm-bc"
	// CHECK-SAME: {{.*}} "-main-file-name" "b.hip"			// CHECK-SAME: {{.*}} "-main-file-name" "b.hip"
	// CHECK-SAME: "-fcuda-is-device" "-fcuda-allow-variadic-functions" "-fvisibility" "hidden"			// CHECK-SAME: "-fcuda-is-device" "-fcuda-allow-variadic-functions" "-fvisibility" "hidden"
	// CHECK-SAME: "-fapply-global-visibility-to-externs"			// CHECK-SAME: "-fapply-global-visibility-to-externs"
	…
	// CHECK-SAME: "-aux-triple" "amdgcn-amd-amdhsa"			// CHECK-SAME: "-aux-triple" "amdgcn-amd-amdhsa"
	// CHECK-SAME: "-emit-obj"			// CHECK-SAME: "-emit-obj"
	// CHECK-SAME: {{.*}} "-main-file-name" "b.hip"			// CHECK-SAME: {{.*}} "-main-file-name" "b.hip"
	// CHECK-SAME: "-fgpu-rdc"			// CHECK-SAME: "-fgpu-rdc"
	// CHECK-SAME: {{.*}} "-o" "[[B_OBJ_HOST:.*o]]" "-x" "hip"			// CHECK-SAME: {{.*}} "-o" "[[B_OBJ_HOST:.*o]]" "-x" "hip"
	// CHECK-SAME: {{.*}} [[B_SRC]]			// CHECK-SAME: {{.*}} [[B_SRC]]

	// CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"			// CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
	// CHECK-SAME: "-targets=hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900,host-x86_64-unknown-linux-gnu"			// CHECK-SAME: "-targets=hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900,host-x86_64-unknown-linux-gnu"
	// CHECK-SAME: "-outputs=[[B_O:.*b.o]]" "-inputs=[[B_BC1]],[[B_BC2]],[[B_OBJ_HOST]]"			// CHECK-SAME: "-outputs=[[B_O:.*b.o]]" "-inputs=[[B_BC1]],[[B_BC2]],[[B_OBJ_HOST]]"

	// RUN: touch %T/a.o			// RUN: touch %T/a.o
	// RUN: touch %T/b.o			// RUN: touch %T/b.o
	// RUN: %clang --hip-link -### -target x86_64-linux-gnu \			// RUN: %clang --hip-link -### -target x86_64-linux-gnu \
	// RUN: --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \			// RUN: --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
	// RUN: -fuse-ld=lld -fgpu-rdc -nogpuinc \			// RUN: -fuse-ld=lld -fgpu-rdc -nogpuinc \
	// RUN: %T/a.o %T/b.o \			// RUN: %T/a.o %T/b.o \
	// RUN: 2>&1 | FileCheck -check-prefix=LINK %s			// RUN: 2>&1 | FileCheck -check-prefix=LINK %s

	// LINK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"			// LINK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
	// LINK-SAME: "-targets=host-x86_64-unknown-linux-gnu,hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900"			// LINK-SAME: "-targets=host-x86_64-unknown-linux-gnu,hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900"
	// LINK-SAME: "-inputs=[[A_O:.*a.o]]" "-outputs=[[A_OBJ_HOST:.*o]],{{.*o}},{{.*o}}"			// LINK-SAME: "-inputs=[[A_O:.*a.o]]" "-outputs=[[A_OBJ_HOST:.*o]],{{.*o}},{{.*o}}"
	// LINK: "-unbundle" "-allow-missing-bundles"			// LINK: "-unbundle" "-allow-missing-bundles"

	// LINK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"			// LINK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
	// LINK-SAME: "-targets=host-x86_64-unknown-linux-gnu,hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900"			// LINK-SAME: "-targets=host-x86_64-unknown-linux-gnu,hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900"
	// LINK-SAME: "-inputs=[[B_O:.*b.o]]" "-outputs=[[B_OBJ_HOST:.*o]],{{.*o}},{{.*o}}"			// LINK-SAME: "-inputs=[[B_O:.*b.o]]" "-outputs=[[B_OBJ_HOST:.*o]],{{.*o}},{{.*o}}"
	// LINK: "-unbundle" "-allow-missing-bundles"			// LINK: "-unbundle" "-allow-missing-bundles"

	// LINK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"			// LINK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
	// LINK-SAME: "-targets=host-x86_64-unknown-linux-gnu,hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900"			// LINK-SAME: "-targets=host-x86_64-unknown-linux-gnu,hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900"
	// LINK-SAME: "-inputs=[[A_O]]" "-outputs={{.*o}},[[A_BC1:.*o]],[[A_BC2:.*o]]"			// LINK-SAME: "-inputs=[[A_O]]" "-outputs={{.*o}},[[A_BC1:.*o]],[[A_BC2:.*o]]"
	// LINK: "-unbundle" "-allow-missing-bundles"			// LINK: "-unbundle" "-allow-missing-bundles"

	// LINK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"			// LINK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
	// LINK-SAME: "-targets=host-x86_64-unknown-linux-gnu,hip-amdgcn-amd-amdhsa--gfx803,hip-amdgcn-amd-amdhsa--gfx900"			// LINK-SAME: "-targets=host-x86_64-unknown-linux-gnu,hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900"
	// LINK-SAME: "-inputs=[[B_O]]" "-outputs={{.*o}},[[B_BC1:.*o]],[[B_BC2:.*o]]"			// LINK-SAME: "-inputs=[[B_O]]" "-outputs={{.*o}},[[B_BC1:.*o]],[[B_BC2:.*o]]"
	// LINK: "-unbundle" "-allow-missing-bundles"			// LINK: "-unbundle" "-allow-missing-bundles"

	// LINK-NOT: "*.llvm-link"			// LINK-NOT: "*.llvm-link"
	// LINK-NOT: ".*opt"			// LINK-NOT: ".*opt"
	// LINK-NOT: ".*llc"			// LINK-NOT: ".*llc"
	// LINK: {{".*lld.*"}} {{.*}} "-plugin-opt=-amdgpu-internalize-symbols"			// LINK: {{".*lld.*"}} {{.*}} "-plugin-opt=-amdgpu-internalize-symbols"
	// LINK: "-plugin-opt=mcpu=gfx803"			// LINK: "-plugin-opt=mcpu=gfx803"
	…

clang/test/Driver/openmp-offload-multi.c

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				//
				// Legacy mode (-fopenmp-targets,-Xopenmp-target,-march) tests for
				// multi arch compilation
				//
				// RUN: %clang -### -target x86_64-linux-gnu -fopenmp \
				// RUN: -fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa \
				// RUN: -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 \
				// RUN: -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908 \
				// RUN: %s 2>&1 | FileCheck %s

				// CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-x" "c"{{.*}}
				// CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-o" "[[HOSTOBJ:.*.o]]" "-x" "ir"{{.*}}

				// compilation for offload target 1 : gfx906
				// CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-emit-llvm-bc"{{.*}}"-target-cpu" "gfx906" "-fcuda-is-device"{{.*}}"-o" "{{.*}}.bc" "-x" "c"{{.*}}.c
				// CHECK: llvm-link"{{.*}}openmp-offload-multi-{{.*}}.bc"{{.*}}"-o" "{{.*}}openmp-offload-multi-{{.*}}-gfx906-linked-{{.*}}.bc"
				// CHECK: llc{{.*}}openmp-offload-multi-{{.*}}-gfx906-linked-{{.*}}.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=gfx906" "-filetype=obj"{{.*}}"-o"{{.*}}openmp-offload-multi-{{.*}}-gfx906-{{.*}}.o"
				// CHECK: lld{{.*}}"-flavor" "gnu" "--no-undefined" "-shared" "-o" "[[GFX906OUT:.*.out]]" "{{.*}}openmp-offload-multi-{{.*}}-gfx906-{{.*}}.o"

				// compilation for offload target 2 : gfx908
				// CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-emit-llvm-bc"{{.*}}"-target-cpu" "gfx908" "-fcuda-is-device"{{.*}}"-o" "{{.*}}.bc" "-x" "c"{{.*}}.c
				// CHECK: llvm-link"{{.*}}openmp-offload-multi-{{.*}}.bc"{{.*}}"-o" "{{.*}}openmp-offload-multi-{{.*}}-gfx908-linked-{{.*}}.bc"
				// CHECK: llc{{.*}}openmp-offload-multi-{{.*}}-gfx908-linked-{{.*}}.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=gfx908" "-filetype=obj"{{.*}}"-o"{{.*}}openmp-offload-multi-{{.*}}-gfx908-{{.*}}.o"
				// CHECK: lld{{.*}}"-flavor" "gnu" "--no-undefined" "-shared" "-o" "[[GFX908OUT:.*.out]]" "{{.*}}openmp-offload-multi-{{.*}}-gfx908-{{.*}}.o"

				// Combining device images for offload targets
				// CHECK: clang-offload-wrapper"{{.*}}" "-o" "[[COMBINEDIR:.*.bc]]" "--requirements=gfx906" "[[GFX906OUT]]" "--requirements=gfx908" "[[GFX908OUT]]"

				// CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}} "-fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa"{{.*}}"-o" "[[COMBINEDOBJ:.*.o]]" "-x" "ir" "[[COMBINEDIR]]"
				// CHECK: ld.lld"{{.*}}" "-o" "a.out{{.*}}[[HOSTOBJ]]" "[[COMBINEDOBJ]]{{.*}}" "-lomp{{.*}}-lomptarget"

clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp

[... 54 lines not shown]
static cl::list<std::string> Inputs(cl::Positional, cl::OneOrMore,		static cl::list<std::string> Inputs(cl::Positional, cl::OneOrMore,
cl::desc("<input files>"),		cl::desc("<input files>"),
cl::cat(ClangOffloadWrapperCategory));		cl::cat(ClangOffloadWrapperCategory));

static cl::opt<std::string>		static cl::opt<std::string>
Target("target", cl::Required,		Target("target", cl::Required,
cl::desc("Target triple for the output module"),		cl::desc("Target triple for the output module"),
cl::value_desc("triple"), cl::cat(ClangOffloadWrapperCategory));		cl::value_desc("triple"), cl::cat(ClangOffloadWrapperCategory));

		static cl::list<std::string>
		OffloadArchs("requirements", cl::desc("requirements contains offload-arch"),
		cl::value_desc("requirements"),
		cl::cat(ClangOffloadWrapperCategory));

namespace {		namespace {

class BinaryWrapper {		class BinaryWrapper {
LLVMContext C;		LLVMContext C;
Module M;		Module M;

StructType *EntryTy = nullptr;		StructType *EntryTy = nullptr;
StructType *ImageTy = nullptr;		StructType *ImageTy = nullptr;
StructType *DescTy = nullptr;		StructType *DescTy = nullptr;
		StructType *ImageInfoTy = nullptr;

private:		private:
IntegerType *getSizeTTy() {		IntegerType *getSizeTTy() {
switch (M.getDataLayout().getPointerTypeSize(Type::getInt8PtrTy(C))) {		switch (M.getDataLayout().getPointerTypeSize(Type::getInt8PtrTy(C))) {
case 4u:		case 4u:
return Type::getInt32Ty(C);		return Type::getInt32Ty(C);
case 8u:		case 8u:
return Type::getInt64Ty(C);		return Type::getInt64Ty(C);
[... 49 lines not shown; context: if (!DescTy)]
getEntryPtrTy());		getEntryPtrTy());
return DescTy;		return DescTy;
}		}

PointerType *getBinDescPtrTy() {		PointerType *getBinDescPtrTy() {
return PointerType::getUnqual(getBinDescTy());		return PointerType::getUnqual(getBinDescTy());
}		}

		// This matches the runtime struct definition of __tgt_image_info
		// declared in openmp/libomptarget/include/omptarget.h:
		// struct __tgt_image_info {
		// int32_t version;
		// int32_t image_number;
		// int32_t number_images;
		// char* requirements;
		// char* target_compile_opts;
		// };
		StructType *getImageInfoTy() {
		if (!ImageInfoTy)
		ImageInfoTy = StructType::create(
		"__tgt_image_info", Type::getInt32Ty(C), Type::getInt32Ty(C),
		Type::getInt32Ty(C), Type::getInt8PtrTy(C), Type::getInt8PtrTy(C));
		return ImageInfoTy;
		}

		PointerType *getImageInfoPtrTy() {
		return PointerType::getUnqual(getImageInfoTy());
		}

/// Creates binary descriptor for the given device images. Binary descriptor		/// Creates binary descriptor for the given device images. Binary descriptor
/// is an object that is passed to the offloading runtime at program startup		/// is an object that is passed to the offloading runtime at program startup
/// and it describes all device images available in the executable or shared		/// and it describes all device images available in the executable or shared
/// library. It is defined as follows		/// library. It is defined as follows
///		///
/// __attribute__((visibility("hidden")))		/// __attribute__((visibility("hidden")))
/// extern __tgt_offload_entry *__start_omp_offloading_entries;		/// extern __tgt_offload_entry *__start_omp_offloading_entries;
/// __attribute__((visibility("hidden")))		/// __attribute__((visibility("hidden")))
[... 95 lines not shown; context: auto *DescInit = ConstantStruct::get(]
ConstantInt::get(Type::getInt32Ty(C), ImagesInits.size()), ImagesB,		ConstantInt::get(Type::getInt32Ty(C), ImagesInits.size()), ImagesB,
EntriesB, EntriesE);		EntriesB, EntriesE);

return new GlobalVariable(M, DescInit->getType(), /*isConstant*/ true,		return new GlobalVariable(M, DescInit->getType(), /*isConstant*/ true,
GlobalValue::InternalLinkage, DescInit,		GlobalValue::InternalLinkage, DescInit,
".omp_offloading.descriptor");		".omp_offloading.descriptor");
}		}

void createRegisterFunction(GlobalVariable *BinDesc) {		void createRegisterFunction(GlobalVariable *BinDesc,
		ArrayRef<ArrayRef<char>> Requirements) {

auto FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);		auto FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage,		auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage,
".omp_offloading.descriptor_reg", &M);		".omp_offloading.descriptor_reg", &M);
Func->setSection(".text.startup");		Func->setSection(".text.startup");

// Get __tgt_register_lib function declaration.		// Get __tgt_register_lib function declaration.
auto *RegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(),		auto *RegFuncTy = FunctionType::get(Type::getVoidTy(C), getBinDescPtrTy(),
/*isVarArg*/ false);		/*isVarArg*/ false);
FunctionCallee RegFuncC =		FunctionCallee RegFuncC =
M.getOrInsertFunction("__tgt_register_lib", RegFuncTy);		M.getOrInsertFunction("__tgt_register_lib", RegFuncTy);

// Construct function body		// Construct function body
IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));		IRBuilder<> Builder(BasicBlock::Create(C, "entry", Func));

		// Create calls to __tgt_register_image_info for each image
		auto *NullPtr = llvm::ConstantPointerNull::get(Builder.getInt8PtrTy());
		auto *Zero = ConstantInt::get(getSizeTTy(), 0u);
		auto *RegInfoFuncTy =
		FunctionType::get(Type::getVoidTy(C), getImageInfoPtrTy(), false);
		FunctionCallee RegInfoFuncC =
		M.getOrInsertFunction("__tgt_register_image_info", RegInfoFuncTy);
		unsigned int img_count = 0;
		for (ArrayRef<char> Requirement : Requirements) {
		Constant *RequirementV = ConstantDataArray::get(C, Requirement);
		auto *GV =
		new GlobalVariable(M, RequirementV->getType(), /*isConstant*/ true,
		GlobalValue::InternalLinkage, RequirementV,
		Twine("__offload_arch_" + Twine(img_count)));
		GV->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);

		// store value of these variables (i.e. offload archs) into a custom
		// section which will be used by "offload-arch -f". It won't be
		// removed during binary stripping.
		GV->setSection(".offload_arch_list");

		auto *RequirementVPtr =
		ConstantExpr::getGetElementPtr(GV->getValueType(), GV, Zero);
		RequirementVPtr =
		ConstantExpr::getBitCast(RequirementVPtr, Type::getInt8PtrTy(C));
		auto *InfoInit = ConstantStruct::get(
		getImageInfoTy(), ConstantInt::get(Type::getInt32Ty(C), 1),
		ConstantInt::get(Type::getInt32Ty(C), img_count),
		ConstantInt::get(Type::getInt32Ty(C), (uint32_t)Requirements.size()),
		RequirementVPtr,
		NullPtr // TODO: capture target-compile-opts from clang driver
		);
		auto *ImageInfoGV = new GlobalVariable(
		M, InfoInit->getType(),
		/*isConstant*/ true, GlobalValue::InternalLinkage, InfoInit,
		Twine(".offload_image_info_" + Twine(img_count++)));
		ImageInfoGV->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
		Builder.CreateCall(RegInfoFuncC, ImageInfoGV);
		}

Builder.CreateCall(RegFuncC, BinDesc);		Builder.CreateCall(RegFuncC, BinDesc);
Builder.CreateRetVoid();		Builder.CreateRetVoid();

// Add this function to constructors.		// Add this function to constructors.
// Set priority to 1 so that __tgt_register_lib is executed AFTER		// Set priority to 1 so that __tgt_register_lib is executed AFTER
// __tgt_register_requires (we want to know what requirements have been		// __tgt_register_requires (we want to know what requirements have been
// asked for before we load a libomptarget plugin so that by the time the		// asked for before we load a libomptarget plugin so that by the time the
// plugin is loaded it can report how many devices there are which can		// plugin is loaded it can report how many devices there are which can
[... 23 lines not shown; context: void createUnregisterFunction(GlobalVariable *BinDesc) {]
appendToGlobalDtors(M, Func, /*Priority*/ 1);		appendToGlobalDtors(M, Func, /*Priority*/ 1);
}		}

public:		public:
BinaryWrapper(StringRef Target) : M("offload.wrapper.object", C) {		BinaryWrapper(StringRef Target) : M("offload.wrapper.object", C) {
M.setTargetTriple(Target);		M.setTargetTriple(Target);
}		}

const Module &wrapBinaries(ArrayRef<ArrayRef<char>> Binaries) {		const Module &wrapBinaries(ArrayRef<ArrayRef<char>> Binaries,
		ArrayRef<ArrayRef<char>> Requirements) {
GlobalVariable *Desc = createBinDesc(Binaries);		GlobalVariable *Desc = createBinDesc(Binaries);
assert(Desc && "no binary descriptor");		assert(Desc && "no binary descriptor");
createRegisterFunction(Desc);		createRegisterFunction(Desc, Requirements);
createUnregisterFunction(Desc);		createUnregisterFunction(Desc);
return M;		return M;
}		}
};		};

} // anonymous namespace		} // anonymous namespace
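For readers less familiar with IR-building code: the constructor that createRegisterFunction emits (.omp_offloading.descriptor_reg) behaves roughly like the hand-written C++ below. This is a sketch, not wrapper output. ImageInfo and BinDesc are simplified local mirrors of __tgt_image_info and __tgt_bin_desc, and registerImageInfoStub / registerLibStub are hypothetical stand-ins that only record call order. The sketch shows the ordering the patch relies on: one __tgt_register_image_info call per image (version 1, 0-based image number, total image count, no compile options), followed by a single __tgt_register_lib call.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified mirrors of the runtime structs (only what this sketch needs).
struct ImageInfo {
  int32_t version;
  int32_t image_number;
  int32_t number_images;
  const char *requirements;
  const char *compile_opts;
};
struct BinDesc {
  int32_t NumDeviceImages;
};

// Hypothetical stand-ins for the libomptarget entry points; they only log.
std::vector<std::string> CallLog;
void registerImageInfoStub(const ImageInfo *Info) {
  CallLog.push_back(std::string("info:") + Info->requirements);
}
void registerLibStub(const BinDesc *) { CallLog.push_back("lib"); }

// One NUL-terminated requirement string per image, as passed via
// --requirements=gfx906 --requirements=gfx908 in the driver test.
static const char Arch0[] = "gfx906";
static const char Arch1[] = "gfx908";

// version = 1, image_number = index, number_images = total, no compile opts.
static const ImageInfo Infos[] = {
    {1, 0, 2, Arch0, nullptr},
    {1, 1, 2, Arch1, nullptr},
};
static const BinDesc Desc = {2};

// Sketch of what the generated startup constructor does.
void descriptorReg() {
  for (const ImageInfo &Info : Infos)
    registerImageInfoStub(&Info);
  registerLibStub(&Desc);
}
```

Calling descriptorReg() logs the two image-info registrations before the library registration, which is the order the runtime needs so that image information is available before images are loaded.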

int main(int argc, const char **argv) {		int main(int argc, const char **argv) {
[... 45 lines not shown; context: int main(int argc, const char **argv) {]
// Create the output file to write the resulting bitcode to.		// Create the output file to write the resulting bitcode to.
std::error_code EC;		std::error_code EC;
ToolOutputFile Out(Output, EC, sys::fs::OF_None);		ToolOutputFile Out(Output, EC, sys::fs::OF_None);
if (EC) {		if (EC) {
reportError(createFileError(Output, EC));		reportError(createFileError(Output, EC));
return 1;		return 1;
}		}

		SmallVector<ArrayRef<char>, 4u> Requirements;
		Requirements.reserve(OffloadArchs.size());
		for (unsigned i = 0; i != OffloadArchs.size(); ++i) {
		OffloadArchs[i].append("\0");
		Requirements.emplace_back(OffloadArchs[i].data(),
		OffloadArchs[i].size() + 1);
		}

// Create a wrapper for device binaries and write its bitcode to the file.		// Create a wrapper for device binaries and write its bitcode to the file.
WriteBitcodeToFile(BinaryWrapper(Target).wrapBinaries(		WriteBitcodeToFile(
makeArrayRef(Images.data(), Images.size())),		BinaryWrapper(Target).wrapBinaries(
		makeArrayRef(Images.data(), Images.size()),
		makeArrayRef(Requirements.data(), Requirements.size())),
Out.os());		Out.os());
if (Out.os().has_error()) {		if (Out.os().has_error()) {
reportError(createFileError(Output, Out.os().error()));		reportError(createFileError(Output, Out.os().error()));
return 1;		return 1;
}		}

// Success.		// Success.
Out.keep();		Out.keep();
return 0;		return 0;
}		}
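A note on the Requirements loop in main: std::string::append(const char *) copies characters up to the first NUL, so OffloadArchs[i].append("\0") appends nothing. The loop still yields terminated buffers because, since C++11, std::string guarantees a '\0' at data()[size()], and the code takes size() + 1 characters. The sketch below makes the terminator explicit and copies into owned storage rather than pointing into the cl::list elements; makeRequirementBuffers is a hypothetical helper, not part of the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds one NUL-terminated byte buffer per offload arch string.
std::vector<std::vector<char>>
makeRequirementBuffers(const std::vector<std::string> &Archs) {
  std::vector<std::vector<char>> Buffers;
  Buffers.reserve(Archs.size());
  for (const std::string &Arch : Archs) {
    // c_str() points at size() characters followed by '\0'; copy both.
    const char *Begin = Arch.c_str();
    Buffers.emplace_back(Begin, Begin + Arch.size() + 1);
  }
  return Buffers;
}
```

Each buffer could then be viewed as an ArrayRef<char> when calling wrapBinaries, provided the buffers outlive the module construction.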

openmp/libomptarget/include/omptarget.h

	[... 114 lines not shown]
	/// target.			/// target.
	struct __tgt_bin_desc {			struct __tgt_bin_desc {
	int32_t NumDeviceImages; // Number of device types supported			int32_t NumDeviceImages; // Number of device types supported
	__tgt_device_image *DeviceImages; // Array of device images (1 per dev. type)			__tgt_device_image *DeviceImages; // Array of device images (1 per dev. type)
	__tgt_offload_entry *HostEntriesBegin; // Begin of table with all host entries			__tgt_offload_entry *HostEntriesBegin; // Begin of table with all host entries
	__tgt_offload_entry *HostEntriesEnd; // End of table (non inclusive)			__tgt_offload_entry *HostEntriesEnd; // End of table (non inclusive)
	};			};

				/// __tgt_image_info:
				///
				/// The information in this struct is provided in clang-offload-wrapper
				/// as a call to __tgt_register_image_info for each image in the library
				/// of images also created by clang-offload-wrapper.
				/// __tgt_register_image_info is called for each image BEFORE the single
				/// call to __tgt_register_lib so that image information is available
				/// before the images are loaded. clang-offload-wrapper gets this image information
				/// from command line arguments provided by the clang driver when it creates
				/// the call to the clang-offload-wrapper command.
				/// This architecture allows the binary image (pointed to by ImageStart and
				/// ImageEnd in __tgt_device_image) to remain architecture independent.
				/// That is, the architecture independent part of the libomptarget runtime
				/// does not need to peer inside the image to determine if it is loadable
				/// even though in most cases the image is an ELF object.
				/// There is one __tgt_image_info for each __tgt_device_image. For backward
				/// compatibility, no changes are allowed to either __tgt_device_image or
				/// __tgt_bin_desc. The absence of __tgt_image_info is the indication that
				/// the runtime is being used on a binary created by an old version of
				/// the compiler.
				///
				struct __tgt_image_info {
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for class '__tgt_image_info' [readability-identifier-naming]
				int32_t version; // The version of this struct
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for member 'version' [readability-identifier-naming]
				int32_t image_number; // Image number in image library starting from 0
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for member 'image_number' [readability-identifier-naming]
				int32_t number_images; // Number of images, used for initial allocation
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for member 'number_images' [readability-identifier-naming]
				char *requirements; // e.g. sm_30, sm_70, gfx906, includes features
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for member 'requirements' [readability-identifier-naming]
				char *compile_opts; // reserved for future use
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for member 'compile_opts' [readability-identifier-naming]
				};

				/// __tgt_active_offload_env
				///
				/// This structure is created by __tgt_get_active_offload_env and is used
				/// to determine compatibility of the images with the current environment
				/// that is "in play".
				struct __tgt_active_offload_env {
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for class '__tgt_active_offload_env' [readability-identifier-naming]
				char *capabilities; // string returned by offload-arch -r
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for member 'capabilities' [readability-identifier-naming]
				};
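getImageInfoTy() in the wrapper builds a non-packed LLVM struct with fields i32, i32, i32, i8*, i8*, which has to line up with the __tgt_image_info declaration above. One way to see the expected layout is to mirror the struct and check offsets at compile time. This sketch assumes a typical LP64 target such as x86_64 (8-byte pointers with 8-byte alignment) and that the LLVM data layout uses the same alignment rules as the host C++ compiler; ImageInfoMirror is a local copy, not the header's type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local copy of __tgt_image_info's field order from this patch.
struct ImageInfoMirror {
  int32_t version;
  int32_t image_number;
  int32_t number_images;
  char *requirements;
  char *compile_opts;
};

// The three int32_t fields end at byte 12; the first pointer is aligned up.
constexpr bool IsLP64 = sizeof(void *) == 8 && alignof(void *) == 8;
static_assert(offsetof(ImageInfoMirror, number_images) == 8,
              "int32_t fields are laid out back to back");
static_assert(!IsLP64 || offsetof(ImageInfoMirror, requirements) == 16,
              "4 bytes of padding precede the first pointer on LP64");
static_assert(!IsLP64 || sizeof(ImageInfoMirror) == 32,
              "total size on LP64");
```

If a field were reordered on only one side (wrapper or header), checks like these would be a cheap early warning; the field names themselves do not affect layout.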

	/// This struct contains the offload entries identified by the target runtime			/// This struct contains the offload entries identified by the target runtime
	struct __tgt_target_table {			struct __tgt_target_table {
	__tgt_offload_entry *EntriesBegin; // Begin of the table with all the entries			__tgt_offload_entry *EntriesBegin; // Begin of the table with all the entries
	__tgt_offload_entry			__tgt_offload_entry
	*EntriesEnd; // End of the table with all the entries (non inclusive)			*EntriesEnd; // End of the table with all the entries (non inclusive)
	};			};

	// clang-format on			// clang-format on
	[... 74 lines not shown]
	void *llvm_omp_target_alloc_shared(size_t size, int device_num);			void *llvm_omp_target_alloc_shared(size_t size, int device_num);

	/// add the clauses of the requires directives in a given file			/// add the clauses of the requires directives in a given file
	void __tgt_register_requires(int64_t flags);			void __tgt_register_requires(int64_t flags);

	/// adds a target shared library to the target execution image			/// adds a target shared library to the target execution image
	void __tgt_register_lib(__tgt_bin_desc *desc);			void __tgt_register_lib(__tgt_bin_desc *desc);

				/// adds an image information struct, called for each image
				void __tgt_register_image_info(__tgt_image_info *imageInfo);
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_register_image_info' [readability-identifier-naming]; clang-tidy: warning: invalid case style for parameter 'imageInfo' [readability-identifier-naming]

				/// gets pointer to image information for specified image number
				/// Returns nullptr for apps built with old version of compiler
				__tgt_image_info *__tgt_get_image_info(uint32_t image_num);
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_get_image_info' [readability-identifier-naming]; clang-tidy: warning: invalid case style for parameter 'image_num' [readability-identifier-naming]

	/// removes a target shared library from the target execution image			/// removes a target shared library from the target execution image
	void __tgt_unregister_lib(__tgt_bin_desc *desc);			void __tgt_unregister_lib(__tgt_bin_desc *desc);

	// creates the host to target data mapping, stores it in the			// creates the host to target data mapping, stores it in the
	// libomptarget.so internal structure (an entry in a stack of data maps) and			// libomptarget.so internal structure (an entry in a stack of data maps) and
	// passes the data to the device;			// passes the data to the device;
	void __tgt_target_data_begin(int64_t device_id, int32_t arg_num,			void __tgt_target_data_begin(int64_t device_id, int32_t arg_num,
	void **args_base, void **args, int64_t *arg_sizes,			void **args_base, void **args, int64_t *arg_sizes,
	[... 126 lines not shown]

openmp/libomptarget/src/exports

	VERS1.0 {			VERS1.0 {
	global:			global:
	__tgt_register_requires;			__tgt_register_requires;
	__tgt_register_lib;			__tgt_register_lib;
				__tgt_register_image_info;
	__tgt_unregister_lib;			__tgt_unregister_lib;
	__tgt_target_data_begin;			__tgt_target_data_begin;
	__tgt_target_data_end;			__tgt_target_data_end;
	__tgt_target_data_update;			__tgt_target_data_update;
	__tgt_target;			__tgt_target;
	__tgt_target_teams;			__tgt_target_teams;
	__tgt_target_data_begin_nowait;			__tgt_target_data_begin_nowait;
	__tgt_target_data_end_nowait;			__tgt_target_data_end_nowait;
	[... 34 lines not shown]

openmp/libomptarget/src/interface.cpp

[... 37 lines not shown; context: if (RTL.register_lib) {]
if ((*RTL.register_lib)(desc) != OFFLOAD_SUCCESS) {		if ((*RTL.register_lib)(desc) != OFFLOAD_SUCCESS) {
DP("Could not register library with %s", RTL.RTLName.c_str());		DP("Could not register library with %s", RTL.RTLName.c_str());
}		}
}		}
}		}
PM->RTLs.RegisterLib(desc);		PM->RTLs.RegisterLib(desc);
}		}

		static __tgt_image_info **__tgt_AllImageInfos;
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable '__tgt_AllImageInfos' [readability-identifier-naming]
		static int __tgt_num_registered_images = 0;
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable '__tgt_num_registered_images' [readability-identifier-naming]
		EXTERN void __tgt_register_image_info(__tgt_image_info *imageInfo) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameter 'imageInfo' [readability-identifier-naming]

		DP(" register_image_info image %d of %d requirements:%s VERSION:%d\n",
		imageInfo->image_number, imageInfo->number_images, imageInfo->requirements,
		imageInfo->version);

		if (!__tgt_AllImageInfos)
		__tgt_AllImageInfos = (__tgt_image_info **)malloc(
		sizeof(__tgt_image_info *) * imageInfo->number_images);
		__tgt_AllImageInfos[imageInfo->image_number] = imageInfo;
		__tgt_num_registered_images = imageInfo->number_images;
		}

		////////////////////////////////////////////////////////////////////////////////
		/// Return pointer to image information if it was registered
		EXTERN __tgt_image_info *__tgt_get_image_info(unsigned image_number) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameter 'image_number' [readability-identifier-naming]
		if (__tgt_num_registered_images)
		return __tgt_AllImageInfos[image_number];
		else
		Lint: Pre-merge checks: clang-tidy: warning: do not use 'else' after 'return' [llvm-else-after-return]
		return nullptr;
		}

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
/// unloads a target shared library		/// unloads a target shared library
EXTERN void __tgt_unregister_lib(__tgt_bin_desc *desc) {		EXTERN void __tgt_unregister_lib(__tgt_bin_desc *desc) {
TIMESCOPE();		TIMESCOPE();
PM->RTLs.UnregisterLib(desc);		PM->RTLs.UnregisterLib(desc);
for (auto &RTL : PM->RTLs.UsedRTLs) {		for (auto &RTL : PM->RTLs.UsedRTLs) {
if (RTL->unregister_lib) {		if (RTL->unregister_lib) {
if ((*RTL->unregister_lib)(desc) != OFFLOAD_SUCCESS) {		if ((*RTL->unregister_lib)(desc) != OFFLOAD_SUCCESS) {
DP("Could not register library with %s", RTL->RTLName.c_str());		DP("Could not register library with %s", RTL->RTLName.c_str());
}		}
}		}
}		}
		if (__tgt_num_registered_images) {
		free(__tgt_AllImageInfos);
		__tgt_num_registered_images = 0;
		}
}		}
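The registration functions above keep raw pointers in a malloc'd array that is sized by the number_images of the first call and indexed by image_number. A sketch of the same bookkeeping with range checks, using std::vector so that clearing and re-registering need no manual free, might look like this. ImageInfoRegistry is a hypothetical name and this is not libomptarget's implementation; lookups for unknown image numbers return nullptr, in line with the documented behavior of __tgt_get_image_info for binaries built without image info.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified mirror of __tgt_image_info.
struct ImageInfo {
  int32_t version;
  int32_t image_number;
  int32_t number_images;
  const char *requirements;
  const char *compile_opts;
};

class ImageInfoRegistry {
  std::vector<const ImageInfo *> Infos; // indexed by image_number

public:
  // Stores Info; rejects null pointers and out-of-range image numbers.
  bool registerInfo(const ImageInfo *Info) {
    if (!Info || Info->image_number < 0 ||
        Info->image_number >= Info->number_images)
      return false;
    std::size_t Needed = static_cast<std::size_t>(Info->number_images);
    if (Infos.size() < Needed)
      Infos.resize(Needed, nullptr);
    Infos[static_cast<std::size_t>(Info->image_number)] = Info;
    return true;
  }

  // Returns nullptr if nothing was registered for ImageNumber.
  const ImageInfo *get(uint32_t ImageNumber) const {
    return ImageNumber < Infos.size() ? Infos[ImageNumber] : nullptr;
  }

  // Drops all entries; later registrations start from an empty table.
  void clear() { Infos.clear(); }
};
```

A real runtime would also need to decide how entries from several libraries (each with its own image numbering) coexist and whether registration must be thread-safe; the sketch leaves both out.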

/// creates host-to-target data mapping, stores it in the		/// creates host-to-target data mapping, stores it in the
/// libomptarget.so internal structure (an entry in a stack of data maps)		/// libomptarget.so internal structure (an entry in a stack of data maps)
/// and passes the data to the device.		/// and passes the data to the device.
EXTERN void __tgt_target_data_begin(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_begin(int64_t device_id, int32_t arg_num,
void **args_base, void **args,		void **args_base, void **args,
int64_t *arg_sizes, int64_t *arg_types) {		int64_t *arg_sizes, int64_t *arg_types) {
[... 403 lines not shown]

openmp/libomptarget/src/rtl.cpp

[... 14 lines not shown]
#include "private.h"		#include "private.h"

#include <cassert>		#include <cassert>
#include <cstdlib>		#include <cstdlib>
#include <cstring>		#include <cstring>
#include <dlfcn.h>		#include <dlfcn.h>
#include <mutex>		#include <mutex>
#include <string>		#include <string>
		#include <sys/stat.h>

// List of all plugins that can support offloading.		// List of all plugins that can support offloading.
static const char *RTLNames[] = {		static const char *RTLNames[] = {
/* PowerPC target */ "libomptarget.rtl.ppc64.so",		/* PowerPC target */ "libomptarget.rtl.ppc64.so",
/* x86_64 target */ "libomptarget.rtl.x86_64.so",		/* x86_64 target */ "libomptarget.rtl.x86_64.so",
/* CUDA target */ "libomptarget.rtl.cuda.so",		/* CUDA target */ "libomptarget.rtl.cuda.so",
/* AArch64 target */ "libomptarget.rtl.aarch64.so",		/* AArch64 target */ "libomptarget.rtl.aarch64.so",
/* SX-Aurora VE target */ "libomptarget.rtl.ve.so",		/* SX-Aurora VE target */ "libomptarget.rtl.ve.so",
[... 252 lines not shown; context: void RTLsTy::RegisterRequires(int64_t flags) {]
}		}

// TODO: insert any other missing checks		// TODO: insert any other missing checks

DP("New requires flags %" PRId64 " compatible with existing %" PRId64 "!\n",		DP("New requires flags %" PRId64 " compatible with existing %" PRId64 "!\n",
flags, RequiresFlags);		flags, RequiresFlags);
}		}

		/// Query runtime capabilities of this system by calling offload-arch -c.
		/// offload_arch_output_buffer is persistent storage filled in by
		/// __tgt_get_active_offload_env and exposed through active_env->capabilities.
		static void
		__tgt_get_active_offload_env(__tgt_active_offload_env *active_env,
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_get_active_offload_env' [readability-identifier-naming]; clang-tidy: warning: invalid case style for parameter 'active_env' [readability-identifier-naming]
		char *offload_arch_output_buffer,
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameter 'offload_arch_output_buffer' [readability-identifier-naming]
		size_t offload_arch_output_buffer_size) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameter 'offload_arch_output_buffer_size' [readability-identifier-naming]
		void *handle = dlopen("libomptarget.so", RTLD_NOW);
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'handle' [readability-identifier-naming]
		if (!handle)
		DP("dlopen() failed: %s\n", dlerror());
		char *libomptarget_dir_name = new char[PATH_MAX];
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'libomptarget_dir_name' [readability-identifier-naming]
		if (dlinfo(handle, RTLD_DI_ORIGIN, libomptarget_dir_name) == -1)
		DP("RTLD_DI_ORIGIN failed: %s\n", dlerror());
		std::string cmd_bin;
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'cmd_bin' [readability-identifier-naming]
		cmd_bin.assign(libomptarget_dir_name).append("/../bin/amdgpu-arch");
		saiislam (Author): Call to amdgpu-arch binary is going to be replaced with a call to a new library named OffloadArch. It will return the current GPU name along with enabled GPU features (i.e. requirements) in a platform-independent way. As the library and its various functionalities are self-contained, I decided to post it as a separate review and use amdgpu-arch here for demonstration. I will be posting the phab review for the library soon.
		saiislam (Author): Here is the patch for the OffloadArch library: D106960
		struct stat stat_buffer;
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'stat_buffer' [readability-identifier-naming]
		if (stat(cmd_bin.c_str(), &stat_buffer)) {
		DP("Missing offload-arch command at %s \n", cmd_bin.c_str());
		} else {
		// Add option to print capabilities of current system
		// cmd_bin.append(" -c");
		FILE *stream = popen(cmd_bin.c_str(), "r");
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'stream' [readability-identifier-naming]
		while (fgets(offload_arch_output_buffer, offload_arch_output_buffer_size,
		stream) != NULL)
		;
		pclose(stream);
		active_env->capabilities = offload_arch_output_buffer;
		size_t slen = strlen(active_env->capabilities);
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'slen' [readability-identifier-naming]
		offload_arch_output_buffer[slen - 1] =
		'\0'; // terminate string before line feed
		offload_arch_output_buffer +=
		slen; // To store next value in offload_arch_output_buffer, not likely
		}
		delete[] libomptarget_dir_name;
		}
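The capture step above runs `offload-arch` through `popen` and keeps only the last line it prints. A minimal standalone sketch of the same pattern, assuming a POSIX system that provides `popen`/`pclose`; the helper name `CaptureLastLine` is hypothetical and not part of libomptarget:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of libomptarget): runs `cmd` through the shell
// and returns the last chunk read by fgets, minus a trailing '\n'. For output
// lines shorter than the 1024-byte buffer this is simply the last line.
// Returns an empty string if the command cannot be started or prints nothing.
std::string CaptureLastLine(const std::string &cmd) {
  assert(!cmd.empty() && "command must not be empty");
  FILE *stream = popen(cmd.c_str(), "r");
  if (stream == nullptr)
    return "";
  char line[1024];
  std::string last;
  // fgets stops after each '\n', so `last` ends up holding the final line.
  while (fgets(line, sizeof(line), stream) != nullptr)
    last = line;
  pclose(stream);
  // Strip the line feed only if it is there, instead of always
  // overwriting the last character.
  if (!last.empty() && last.back() == '\n')
    last.pop_back();
  return last;
}
```

For example, `CaptureLastLine("offload-arch")` would return the string that the patch stores in `active_env->capabilities`, assuming the binary is on `PATH`. Returning a `std::string` avoids sharing a fixed-size `malloc` buffer between caller and callee.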

std::vector<std::string> _splitstrings(const char *input, const char *sep) {
  std::vector<std::string> split_strings;
  std::string s(input);
  std::string delimiter(sep);
  size_t pos = 0;
  while ((pos = s.find(delimiter)) != std::string::npos) {
    if (pos != 0)
      split_strings.push_back(s.substr(0, pos));
    s.erase(0, pos + delimiter.length());
  }
  if (!s.empty())
    split_strings.push_back(s);
  return split_strings;
}
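`_splitstrings` tokenizes both the image requirements (separated by `__`) and the runtime capabilities (separated by spaces), dropping empty tokens. A standalone sketch of the same tokenizing rule, assuming C++11 or later; the name `SplitNonEmpty` is hypothetical. It advances a start offset instead of erasing the consumed prefix, which avoids shifting the remaining string on every token:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on every occurrence of `delimiter`
// and returns the non-empty pieces in order.
std::vector<std::string> SplitNonEmpty(const std::string &input,
                                       const std::string &delimiter) {
  assert(!delimiter.empty() && "an empty delimiter would never advance");
  std::vector<std::string> pieces;
  std::size_t start = 0;
  while (true) {
    std::size_t pos = input.find(delimiter, start);
    std::size_t end = (pos == std::string::npos) ? input.size() : pos;
    if (end > start) // skip empty tokens, e.g. from doubled delimiters
      pieces.push_back(input.substr(start, end - start));
    if (pos == std::string::npos)
      break;
    start = pos + delimiter.size();
  }
  return pieces;
}
```

With this rule, a requirement string such as `gfx906__xnack+` yields `gfx906` and `xnack+`, and a name containing a single underscore such as `sm_70` stays one token when the separator is `__`.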

static bool _ImageIsCompatibleWithEnv(__tgt_image_info *img_info,
                                      __tgt_active_offload_env *active_env) {
  // __tgt_get_image_info returns null if no image information was registered.
  // If there is no image information, assume the application was built with
  // an older compiler and check each image.
  if (!img_info)
    return true;

  // Each runtime requirement of the compiled image is stored in the
  // img_info->requirements string, separated by "__".
  // Each runtime capability obtained from "offload-arch -c" is stored in
  // active_env->capabilities, separated by spaces.
  // If every requirement has a matching capability, the image is
  // compatible with the active environment.

  std::vector<std::string> reqs = _splitstrings(img_info->requirements, "__");
  std::vector<std::string> caps = _splitstrings(active_env->capabilities, " ");

  bool is_compatible = true;
  for (const auto &req : reqs) {
    bool missing_capability = true;
    for (const auto &capability : caps)
      if (capability == req)
        missing_capability = false;
    if (missing_capability) {
      DP("Image requires %s but runtime capability %s is missing.\n",
         img_info->requirements, req.c_str());
      is_compatible = false;
    }
  }
  return is_compatible;
}
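`_ImageIsCompatibleWithEnv` accepts an image only when every requirement token also appears among the capability tokens, i.e. the requirements must be a subset of the capabilities. A minimal sketch of that subset test, assuming the tokens have already been split; the function name `RequirementsSatisfied` and the use of `std::unordered_set` are choices made for this example, not taken from the patch:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: true if every requirement is present in `caps`.
// An image with no requirements is treated as compatible.
bool RequirementsSatisfied(const std::vector<std::string> &reqs,
                           const std::vector<std::string> &caps) {
  // Hashing the capabilities gives average O(1) lookups instead of
  // rescanning the whole capability list for every requirement.
  const std::unordered_set<std::string> available(caps.begin(), caps.end());
  return std::all_of(reqs.begin(), reqs.end(), [&](const std::string &req) {
    return available.count(req) != 0;
  });
}
```

`std::all_of` stops at the first missing requirement. The patch instead keeps looping so that it can log every missing capability with `DP`, which is more useful for debugging a rejected image.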

#define MAX_CAPS_STR_SIZE 1024
void RTLsTy::RegisterLib(__tgt_bin_desc *desc) {

  // Get the current active offload environment
  __tgt_active_offload_env offload_env;
  // Need a buffer to hold results of offload-arch -c command
  size_t offload_arch_output_buffer_size = MAX_CAPS_STR_SIZE;
  char *offload_arch_output_buffer =
      (char *)malloc(offload_arch_output_buffer_size);
  // Start from an empty capability string so it stays well defined even when
  // offload-arch is missing or cannot be run.
  offload_arch_output_buffer[0] = '\0';
  offload_env.capabilities = offload_arch_output_buffer;
  __tgt_get_active_offload_env(&offload_env, offload_arch_output_buffer,
                               offload_arch_output_buffer_size);

  bool requires_usm = (bool)(RequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY);
  bool has_xnack = (std::string(offload_env.capabilities).find("xnack+") !=
                    std::string::npos);
  bool is_amd = (std::string(offload_env.capabilities).find("gfx") == 0);
  if (is_amd && requires_usm && !has_xnack) {
    fprintf(stderr, "WARNING: USM SET WITHOUT XNACK ENABLED.\n");
    fprintf(stderr, "         THIS WILL BECOME A FATAL ERROR IN THE FUTURE.\n");
  }
#if 0
  FATAL_MESSAGE0(1, "'#pragma omp requires unified_shared_memory' requires "
                    "environment with xnack+ capability!");
#endif

  RTLInfoTy *FoundRTL = NULL;
  PM->RTLsMtx.lock();
  // Register the images with the RTLs that understand them, if any.
  for (int32_t i = 0; i < desc->NumDeviceImages; ++i) {
    // Obtain the image.
    __tgt_device_image *img = &desc->DeviceImages[i];

    // Get corresponding image info requirements and check with runtime
    __tgt_image_info *img_info = __tgt_get_image_info(i);
    if (!_ImageIsCompatibleWithEnv(img_info, &offload_env))
      continue;
    FoundRTL = NULL;
    // Scan the RTLs that have associated images until we find one that supports
    // the current image.
    for (auto &R : AllRTLs) {

      if (!R.is_valid_binary(img)) {
        DP("Image " DPxMOD " is NOT compatible with RTL %s!\n",
           DPxPTR(img->ImageStart), R.RTLName.c_str());
        continue;
      }

      DP("Image " DPxMOD " is compatible with RTL %s!\n",
         DPxPTR(img->ImageStart), R.RTLName.c_str());
      // ... (52 unchanged lines not shown) ...
    }

    if (!FoundRTL) {
      DP("No RTL found for image " DPxMOD "!\n", DPxPTR(img->ImageStart));
    }
  }
  PM->RTLsMtx.unlock();

  if (!FoundRTL) {
    if (PM->TargetOffloadPolicy == tgt_mandatory)
      fprintf(stderr,
              "ERROR: Runtime capabilities do NOT meet any offload image "
              "requirements\n"
              "and the OMP_TARGET_OFFLOAD policy is mandatory. Terminating!\n"
              "Runtime capabilities : %s\n",
              offload_env.capabilities);
    else if (PM->TargetOffloadPolicy == tgt_disabled)
      fprintf(stderr, "WARNING: Offloading is disabled.\n");
    else
      fprintf(stderr,
              "WARNING: Runtime capabilities do NOT meet any image "
              "requirements.\n"
              "So device offloading is now disabled.\n"
              "Runtime capabilities : %s\n",
              offload_env.capabilities);
    if (PM->TargetOffloadPolicy != tgt_disabled) {
      for (int32_t i = 0; i < desc->NumDeviceImages; ++i) {
        __tgt_image_info *img_info = __tgt_get_image_info(i);
        if (img_info)
          fprintf(stderr, "Image %d requirements : %s\n", i,
                  img_info->requirements);
        else
          fprintf(stderr,
                  "Image %d has no requirements. Could be from older compiler\n",
                  i);
      }
    }
    if (PM->TargetOffloadPolicy == tgt_mandatory)
      exit(1);
  }

  DP("Done registering entries!\n");
  free(offload_arch_output_buffer);
}
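`RegisterLib` makes two decisions from the capability string: whether to warn that unified shared memory was requested on an AMD GPU without `xnack+`, and what to do when no image could be registered, depending on the `OMP_TARGET_OFFLOAD` policy. A compact sketch of both decisions, assuming capability strings such as `gfx908 xnack-`; the enums `OffloadPolicy` and `FailureAction` and both function names are hypothetical stand-ins for `tgt_mandatory`/`tgt_disabled` and the `fprintf`/`exit` calls in the patch:

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the policy values tested in RegisterLib.
enum class OffloadPolicy { Default, Mandatory, Disabled };

// What RegisterLib does when no device image could be registered.
enum class FailureAction {
  ReportAndExit,        // mandatory: print capabilities/requirements, exit(1)
  WarnDisabled,         // disabled: print "Offloading is disabled."
  WarnAndDisableOffload // otherwise: warn that device offloading is disabled
};

// Same test as the patch: an AMD GPU is recognized by a capability string
// starting with "gfx", and "xnack+" may appear anywhere in it.
bool NeedsXnackWarning(const std::string &capabilities, bool requires_usm) {
  // rfind(prefix, 0) == 0 is a starts-with check that does not scan the
  // whole string the way find(prefix) == 0 can.
  bool is_amd = capabilities.rfind("gfx", 0) == 0;
  bool has_xnack = capabilities.find("xnack+") != std::string::npos;
  return is_amd && requires_usm && !has_xnack;
}

FailureAction OnNoImageRegistered(OffloadPolicy policy) {
  switch (policy) {
  case OffloadPolicy::Mandatory:
    return FailureAction::ReportAndExit;
  case OffloadPolicy::Disabled:
    return FailureAction::WarnDisabled;
  default:
    return FailureAction::WarnAndDisableOffload;
  }
}
```

Separating the decision from the printing keeps the policy table testable without capturing `stderr` or calling `exit`.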

void RTLsTy::UnregisterLib(__tgt_bin_desc *desc) {
  DP("Unloading target library!\n");

  PM->RTLsMtx.lock();
  // Find which RTL understands each image, if any.
  for (int32_t i = 0; i < desc->NumDeviceImages; ++i) {
    // ... (88 unchanged lines not shown) ...

Diff 362002
