This is an archive of the discontinued LLVM Phabricator instance.

Differential D124721

[OpenMP] Allow compiling multiple target architectures with OpenMP
Closed · Public

Authored by jhuber6 on Apr 30 2022, 4:04 PM.

Details

Reviewers

jdoerfert
JonChesterfield
yaxunl
saiislam
tianshilei1992
tra

Commits

rG8477a0d769a0: [OpenMP] Allow compiling multiple target architectures with OpenMP

Summary

This patch adds support for OpenMP to use the --offload-arch and
--no-offload-arch options. Traditionally, OpenMP has only supported
compiling for a single architecture via the -Xopenmp-target option.
Now we can pass in a bound architecture and use that if given; otherwise
we default to the value of the -march option as before.

Note that this only adds the basic support: the OpenMP target runtime
does not yet know how to choose between multiple architectures.
Additionally, other parts of the offloading toolchain (e.g. LTO) still
require the -march option; these should be worked out later.
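
The option semantics can be pictured with a small standalone model. This is only a sketch, not the driver code: `OffloadArg` and `collectOffloadArchs` are hypothetical names, and plain `std::string` stands in for the driver's argument types. It assumes the options are processed in command-line order, that `--no-offload-arch=<arch>` removes a previously requested architecture, and that `--no-offload-arch=all` clears the set, mirroring the loop in `getOffloadArchs` in this revision's diff.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one parsed driver option.
struct OffloadArg {
  bool Negated;      // true for --no-offload-arch=, false for --offload-arch=
  std::string Value; // e.g. "gfx906", "sm_70", or "all"
};

// Walk the options in command-line order and build the set of requested
// architectures. An empty result models "no bound architecture", in which
// case the caller would fall back to -march.
std::set<std::string> collectOffloadArchs(const std::vector<OffloadArg> &Args) {
  std::set<std::string> Archs;
  for (const OffloadArg &A : Args) {
    assert(!A.Value.empty() && "option value must not be empty");
    if (!A.Negated)
      Archs.insert(A.Value);
    else if (A.Value == "all")
      Archs.clear();
    else
      Archs.erase(A.Value);
  }
  return Archs;
}
```

In this model, `--offload-arch=gfx906 --offload-arch=gfx908 --no-offload-arch=gfx906` leaves only `gfx908` in the set.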

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. Apr 30 2022, 4:04 PM

Herald added a project: Restricted Project. Apr 30 2022, 4:04 PM

Herald added subscribers: kerbowa, guansong, jvesely.

jhuber6 requested review of this revision. Apr 30 2022, 4:04 PM

Herald added subscribers: cfe-commits, sstefan1, MaskRay.

Harbormaster completed remote builds in B162124: Diff 426254. Apr 30 2022, 4:37 PM

saiislam added inline comments. May 1 2022, 10:41 PM

clang/test/Driver/amdgpu-openmp-toolchain-new.c, line 6
Wouldn't it be better if the user is not required to specify the triple in this shorthand version? We can infer the triple from the GPUArch. We have this support in our downstream branch:

clang --target=x86_64-unknown-linux-gnu -fopenmp --offload-arch=gfx906 helloworld.c -o helloworld

jhuber6 added inline comments. May 2 2022, 5:15 AM

clang/test/Driver/amdgpu-openmp-toolchain-new.c, line 6
We could, HIP and CUDA both use some kind of `getAMDOffloadTargetTriple`. I guess in this case we would consider OpenMP offloading active if the user specified `-fopenmp` and `--offload-arch`? I could do this in a separate patch.

saiislam added inline comments. May 2 2022, 5:20 AM

clang/test/Driver/amdgpu-openmp-toolchain-new.c, line 6
Yes, exactly. OpenMP offloading should be active when `-fopenmp` and `--offload-arch` are both present. Thank you!

saiislam added inline comments. May 2 2022, 5:28 AM

clang/test/Driver/amdgpu-openmp-toolchain-new.c, line 6
The following code might be useful for your patch (it assumes that OffloadArch is associated with each device toolchain so that multiple archs of the same triple can be compiled together):
- GetTargetInfoFromOffloadArch()
- Driver::GetTargetInfoFromMarch()
- Driver::GetTargetInfoFromOffloadArchOpts()
- modified definition of Driver::CreateOffloadingDeviceToolChains()

jhuber6 added inline comments. May 2 2022, 5:41 AM

clang/test/Driver/amdgpu-openmp-toolchain-new.c, line 6
I'll look into it, I was thinking of a good way to specify architectures per triple. So we could theoretically have `--offload-arch=sm_70` and `--offload-arch=gfx908` work in unison and it might just be easy to group the triples from the architecture.

saiislam added inline comments. May 2 2022, 6:44 AM

clang/test/Driver/amdgpu-openmp-toolchain-new.c, line 6
Along with this, we would also like to support --offload-arch=gfx906 and --offload-arch=gfx908 in the same command.

jhuber6 added inline comments. May 2 2022, 6:54 AM

clang/test/Driver/amdgpu-openmp-toolchain-new.c, line 6
This patch already supports that, we'll compile for all the architectures and they'll all end up linked in the linker wrapper. What's missing is the changes to select an appropriate image in the `libomptarget` runtime.

Changing this slightly: I'm now using getArgsForToolChain to get only the --offload-arch options for that toolchain. This lets us qualify them with options like -Xopenmp-target=, so we can now specify architectures per toolchain without causing an error. For example, the following should work:

clang input.c -fopenmp -fopenmp-targets=nvptx64,amdgcn -Xopenmp-target=amdgcn --offload-arch=gfx803 -Xopenmp-target=nvptx64 --offload-arch=sm_70 -c
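
As a rough picture of what this enables: each toolchain sees only the architectures requested for it, and the driver then builds one device compilation per (toolchain, architecture) pair. The snippet below is an illustrative model, not the driver implementation. `ArchsByToolChain` and `buildDeviceJobs` are hypothetical names, and an empty architecture string is used, as in the patch, to mean "no bound architecture".

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: toolchain triple -> architectures requested for it.
using ArchsByToolChain = std::map<std::string, std::vector<std::string>>;

// Form the product of toolchains and their architectures. A toolchain with
// no requested architecture still yields one job, with an empty arch string.
std::vector<std::pair<std::string, std::string>>
buildDeviceJobs(const ArchsByToolChain &Requested) {
  std::vector<std::pair<std::string, std::string>> Jobs;
  for (const auto &Entry : Requested) {
    assert(!Entry.first.empty() && "toolchain triple must be set");
    if (Entry.second.empty()) {
      Jobs.emplace_back(Entry.first, std::string());
      continue;
    }
    for (const std::string &Arch : Entry.second)
      Jobs.emplace_back(Entry.first, Arch);
  }
  return Jobs;
}
```

For the command above, this model would produce two jobs: one for `amdgcn` with `gfx803` and one for `nvptx64` with `sm_70`.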

clang/test/Driver/openmp-offload-gpu-new.c, line 56
You may want to add a test that when no `--offload-arch=` is specified, the driver makes a reasonable default choice for both nvptx and amdgpu.

Harbormaster completed remote builds in B162329: Diff 426521. May 2 2022, 3:43 PM

jhuber6 added inline comments. May 2 2022, 3:43 PM

clang/test/Driver/openmp-offload-gpu-new.c, line 56
OpenMP uses a different default from CUDA / HIP. If no architecture is specified, it uses the `CLANG_OPENMP_NVPTX_DEFAULT_ARCH` definition (which should be set automatically when CMake configures Clang) or the `amdgpu-arch` tool, and passes the result in via the `-march=` option. This is indicated by passing an empty StringRef as the bound architecture, whereas CUDA / HIP use some default. Elsewhere, if we see that the bound architecture is empty, we just take the value of the `-march` option instead. This should be covered by some other tests, and this patch doesn't change the default behaviour.
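
The selection order described in this comment can be summarized in a few lines. This is a simplified sketch rather than the actual `TranslateArgs` code: `resolveDeviceArch` is a hypothetical helper, `DefaultArch` stands in for either `CLANG_OPENMP_NVPTX_DEFAULT_ARCH` or the result of the `amdgpu-arch` tool, and the precedence (explicit `-march`, then the bound architecture, then the default) is read off the diff in this revision. An absent option is modelled as an empty string, which is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper modelling the precedence used when translating
// arguments for an OpenMP device toolchain.
//   March       - value of an explicit -march= option, empty if absent
//   BoundArch   - architecture bound via --offload-arch, empty if none
//   DefaultArch - configured or detected default for the toolchain
std::string resolveDeviceArch(const std::string &March,
                              const std::string &BoundArch,
                              const std::string &DefaultArch) {
  // Assumption of this sketch: a default is always available.
  assert(!DefaultArch.empty() && "a default architecture is required");
  if (!March.empty())
    return March;
  return !BoundArch.empty() ? BoundArch : DefaultArch;
}
```

With no `-march` and no `--offload-arch`, the helper returns the default, which matches the statement that the default behaviour is unchanged.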

jhuber6 mentioned this in D125050: [OpenMP] Try to Infer target triples using the offloading architecture. May 5 2022, 3:15 PM

ping

LGTM in general.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp, line 313
I'd change `checkSystemForAMDGPU` to return the Arch or empty string.

I'd also simplify the code to something like this:

std::string Arch = DAL->getLastArgValue(options::OPT_march_EQ).str();
if (Arch.empty()) {
  Arch = !BoundArch.empty() ? BoundArch :  checkSystemForAMDGPU(Args, *this);
  DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), Arch);
}

This revision is now accepted and ready to land. May 5 2022, 3:47 PM

jhuber6 added a child revision: D125050: [OpenMP] Try to Infer target triples using the offloading architecture. May 5 2022, 4:41 PM

jhuber6 added inline comments. May 5 2022, 4:46 PM

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp, line 313
Having `checkSystemForAMDGPU` cause errors when trying to test this was a pain. It'll take a bit more plumbing to address that so I think I'll leave this as-is for this patch and address it in a follow-up.

Closed by commit rG8477a0d769a0: [OpenMP] Allow compiling multiple target architectures with OpenMP (authored by jhuber6). May 6 2022, 1:57 PM

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG8477a0d769a0: [OpenMP] Allow compiling multiple target architectures with OpenMP.

jhuber6 mentioned this in rG509b631f84e9: [OpenMP] Try to Infer target triples using the offloading architecture.

Revision Contents

Path                                              Size
clang/lib/Driver/Driver.cpp                       38 lines
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp      7 lines
clang/lib/Driver/ToolChains/Cuda.cpp              6 lines
clang/test/Driver/amdgpu-openmp-toolchain-new.c   4 lines
clang/test/Driver/openmp-offload-gpu-new.c        25 lines

Diff 427734

clang/lib/Driver/Driver.cpp

@@ 4,208 lines not shown @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
   // passed to non-CUDA compilations and should not trigger warnings there.
   Args.ClaimAllArgs(options::OPT_offload_host_only);
   Args.ClaimAllArgs(options::OPT_offload_host_device);
 }

 /// Returns the canonical name for the offloading architecture when using HIP or
 /// CUDA.
 static StringRef getCanonicalArchString(Compilation &C,
-                                        llvm::opt::DerivedArgList &Args,
+                                        const llvm::opt::DerivedArgList &Args,
                                         StringRef ArchStr,
-                                        Action::OffloadKind Kind) {
-  if (Kind == Action::OFK_Cuda) {
+                                        Action::OffloadKind Kind,
+                                        const ToolChain *TC) {
+  if (Kind == Action::OFK_Cuda ||
+      (Kind == Action::OFK_OpenMP && TC->getTriple().isNVPTX())) {
     CudaArch Arch = StringToCudaArch(ArchStr);
     if (Arch == CudaArch::UNKNOWN || !IsNVIDIAGpuArch(Arch)) {
       C.getDriver().Diag(clang::diag::err_drv_cuda_bad_gpu_arch) << ArchStr;
       return StringRef();
     }
     return Args.MakeArgStringRef(CudaArchToString(Arch));
-  } else if (Kind == Action::OFK_HIP) {
+  } else if (Kind == Action::OFK_HIP ||
+             (Kind == Action::OFK_OpenMP && TC->getTriple().isAMDGPU())) {
     llvm::StringMap<bool> Features;
     // getHIPOffloadTargetTriple() is known to return valid value as it has
     // been called successfully in the CreateOffloadingDeviceToolChains().
     auto Arch = parseTargetID(
         *getHIPOffloadTargetTriple(C.getDriver(), C.getInputArgs()), ArchStr,
         &Features);
     if (!Arch) {
       C.getDriver().Diag(clang::diag::err_drv_bad_target_id) << ArchStr;
       C.setContainsError();
       return StringRef();
     }
     return Args.MakeArgStringRef(
         getCanonicalTargetID(Arch.getValue(), Features));
   }
-  return StringRef();
+  // If the input isn't CUDA or HIP just return the architecture.
+  return ArchStr;
 }

 /// Checks if the set offloading architectures does not conflict. Returns the
 /// incompatible pair if a conflict occurs.
 static llvm::Optional<std::pair<llvm::StringRef, llvm::StringRef>>
 getConflictOffloadArchCombination(const llvm::DenseSet<StringRef> &Archs,
                                   Action::OffloadKind Kind) {
   if (Kind != Action::OFK_HIP)
     return None;

   std::set<StringRef> ArchSet;
   llvm::copy(Archs, std::inserter(ArchSet, ArchSet.begin()));
   return getConflictTargetIDCombination(ArchSet);
 }

 /// Returns the set of bound architectures active for this compilation kind.
 /// This function returns a set of bound architectures, if there are no bound
 /// architctures we return a set containing only the empty string.
 static llvm::DenseSet<StringRef>
-getOffloadArchs(Compilation &C, llvm::opt::DerivedArgList &Args,
-                Action::OffloadKind Kind) {
+getOffloadArchs(Compilation &C, const llvm::opt::DerivedArgList &Args,
+                Action::OffloadKind Kind, const ToolChain *TC) {

-  // If this is OpenMP offloading we don't use a bound architecture.
-  if (Kind == Action::OFK_OpenMP)
-    return llvm::DenseSet<StringRef>{StringRef()};
-
   // --offload and --offload-arch options are mutually exclusive.
   if (Args.hasArgNoClaim(options::OPT_offload_EQ) &&
       Args.hasArgNoClaim(options::OPT_offload_arch_EQ,
                          options::OPT_no_offload_arch_EQ)) {
     C.getDriver().Diag(diag::err_opt_not_valid_with_opt)
         << "--offload"
         << (Args.hasArgNoClaim(options::OPT_offload_arch_EQ)
                 ? "--offload-arch"
                 : "--no-offload-arch");
   }

   llvm::DenseSet<StringRef> Archs;
   for (auto &Arg : Args) {
     if (Arg->getOption().matches(options::OPT_offload_arch_EQ)) {
-      Archs.insert(getCanonicalArchString(C, Args, Arg->getValue(), Kind));
+      Archs.insert(getCanonicalArchString(C, Args, Arg->getValue(), Kind, TC));
     } else if (Arg->getOption().matches(options::OPT_no_offload_arch_EQ)) {
       if (Arg->getValue() == StringRef("all"))
         Archs.clear();
       else
-        Archs.erase(getCanonicalArchString(C, Args, Arg->getValue(), Kind));
+        Archs.erase(getCanonicalArchString(C, Args, Arg->getValue(), Kind, TC));
     }
   }

   if (auto ConflictingArchs = getConflictOffloadArchCombination(Archs, Kind)) {
     C.getDriver().Diag(clang::diag::err_drv_bad_offload_arch_combo)
         << ConflictingArchs.getValue().first
         << ConflictingArchs.getValue().second;
     C.setContainsError();
   }

   if (Archs.empty()) {
     if (Kind == Action::OFK_Cuda)
       Archs.insert(CudaArchToString(CudaArch::CudaDefault));
     else if (Kind == Action::OFK_HIP)
       Archs.insert(CudaArchToString(CudaArch::HIPDefault));
+    else if (Kind == Action::OFK_OpenMP)
+      Archs.insert(StringRef());
+  } else {
+    Args.ClaimAllArgs(options::OPT_offload_arch_EQ);
+    Args.ClaimAllArgs(options::OPT_no_offload_arch_EQ);
   }

   return Archs;
 }

 Action *Driver::BuildOffloadingActions(Compilation &C,
                                        llvm::opt::DerivedArgList &Args,
                                        const InputTy &Input,
@@ 29 lines not shown @@ if (ToolChains.empty())
       continue;

     types::ID InputType = Input.first;
     const Arg *InputArg = Input.second;

     // Get the product of all bound architectures and toolchains.
     SmallVector<std::pair<const ToolChain *, StringRef>> TCAndArchs;
     for (const ToolChain *TC : ToolChains)
-      for (StringRef Arch : getOffloadArchs(C, Args, Kind))
+      for (StringRef Arch : getOffloadArchs(
+               C, C.getArgsForToolChain(TC, "generic", Kind), Kind, TC))
         TCAndArchs.push_back(std::make_pair(TC, Arch));

     for (unsigned I = 0, E = TCAndArchs.size(); I != E; ++I)
       DeviceActions.push_back(C.MakeAction<InputAction>(*InputArg, InputType));

     if (DeviceActions.empty())
       return HostAction;

@@ 12 lines not shown @@ for (phases::ID Phase : PL) {
         if (isa<CompileJobAction>(A) && isa<CompileJobAction>(HostAction) &&
             Kind == Action::OFK_OpenMP) {
           // OpenMP offloading has a dependency on the host compile action to
           // identify which declarations need to be emitted. This shouldn't be
           // collapsed with any other actions so we can use it in the device.
           HostAction->setCannotBeCollapsedWithNextDependentAction();
           OffloadAction::HostDependence HDep(
               *HostAction, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
-              /*BoundArch=*/nullptr, Kind);
+              TCAndArch->second.data(), Kind);
           OffloadAction::DeviceDependences DDep;
-          DDep.add(*A, *TCAndArch->first, /*BoundArch=*/nullptr, Kind);
+          DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
           A = C.MakeAction<OffloadAction>(HDep, DDep);
         } else if (isa<AssembleJobAction>(A) && Kind == Action::OFK_Cuda) {
           // The Cuda toolchain uses fatbinary as the linker phase to bundle the
           // PTX and Cubin output.
           ActionList FatbinActions;
           for (Action *A : {A, A->getInputs()[0]}) {
             OffloadAction::DeviceDependences DDep;
             DDep.add(*A, *TCAndArch->first, TCAndArch->second.data(), Kind);
@@ 1,813 lines not shown @@

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

@@ 301 lines not shown @@ llvm::opt::DerivedArgList *AMDGPUOpenMPToolChain::TranslateArgs(

   const OptTable &Opts = getDriver().getOpts();

   if (DeviceOffloadKind == Action::OFK_OpenMP) {
     for (Arg *A : Args)
       if (!llvm::is_contained(*DAL, A))
         DAL->append(A);

-    std::string Arch = DAL->getLastArgValue(options::OPT_march_EQ).str();
-    if (Arch.empty()) {
-      checkSystemForAMDGPU(Args, *this, Arch);
+    if (!DAL->hasArg(options::OPT_march_EQ)) {
+      std::string Arch = BoundArch.str();
+      if (BoundArch.empty())
+        checkSystemForAMDGPU(Args, *this, Arch);
       DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), Arch);
     }

     return DAL;
   }

   for (Arg *A : Args) {
     DAL->append(A);
@@ 54 lines not shown @@

clang/lib/Driver/ToolChains/Cuda.cpp

@@ 841 lines not shown @@ CudaToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
   // For OpenMP device offloading, append derived arguments. Make sure
   // flags are not duplicated.
   // Also append the compute capability.
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
     for (Arg *A : Args)
       if (!llvm::is_contained(*DAL, A))
         DAL->append(A);

-    StringRef Arch = DAL->getLastArgValue(options::OPT_march_EQ);
-    if (Arch.empty())
+    if (!DAL->hasArg(options::OPT_march_EQ))
       DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
-                        CLANG_OPENMP_NVPTX_DEFAULT_ARCH);
+                        !BoundArch.empty() ? BoundArch
+                                           : CLANG_OPENMP_NVPTX_DEFAULT_ARCH);

     return DAL;
   }

   for (Arg *A : Args) {
     DAL->append(A);
   }

@@ 63 lines not shown @@

clang/test/Driver/amdgpu-openmp-toolchain-new.c

 // REQUIRES: x86-registered-target
 // REQUIRES: amdgpu-registered-target
 // RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
 // RUN: -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 --libomptarget-amdgpu-bc-path=%S/Inputs/hip_dev_lib %s 2>&1 \
 // RUN: | FileCheck %s
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
+// RUN: --offload-arch=gfx906 --libomptarget-amdgpu-bc-path=%S/Inputs/hip_dev_lib %s 2>&1 \
+// RUN: | FileCheck %s

 // verify the tools invocations
 // CHECK: "-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-emit-llvm-bc"{{.*}}"-x" "c"
 // CHECK: "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu"{{.*}}"-target-cpu" "gfx906"{{.*}}"-fcuda-is-device"{{.*}}"-mlink-builtin-bitcode" "{{.*}}libomptarget-amdgpu-gfx906.bc"
 // CHECK: "-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-emit-obj"
 // CHECK: clang-linker-wrapper{{.*}}"--"{{.*}} "-o" "a.out"

 // RUN: %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \
@@ 15 lines not shown @@
 // handling of --libomptarget-amdgpu-bc-path
 // RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 --libomptarget-amdgpu-bc-path=%S/Inputs/hip_dev_lib/libomptarget-amdgpu-gfx803.bc %s 2>&1 | FileCheck %s --check-prefix=CHECK-LIBOMPTARGET
 // CHECK-LIBOMPTARGET: "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx803" "-fcuda-is-device" "-mlink-builtin-bitcode"{{.*}}Inputs/hip_dev_lib/libomptarget-amdgpu-gfx803.bc"{{.*}}

 // RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-NOGPULIB
 // CHECK-NOGPULIB-NOT: "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx803" "-fcuda-is-device" "-mlink-builtin-bitcode"{{.*}}libomptarget-amdgpu-gfx803.bc"{{.*}}

 // RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-BINDINGS
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa --offload-arch=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-BINDINGS
 // CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.*]]"], output: "[[HOST_BC:.*]]"
 // CHECK-BINDINGS: "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[DEVICE_BC:.*]]"
 // CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[DEVICE_BC]]"], output: "[[HOST_OBJ:.*]]"
 // CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"

 // RUN: %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
 // CHECK-EMIT-LLVM-IR: "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-emit-llvm"

 // RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -lm --rocm-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode -fopenmp-new-driver %s 2>&1 | FileCheck %s --check-prefix=CHECK-LIB-DEVICE-NEW
 // CHECK-LIB-DEVICE-NEW: {{.*}}clang-linker-wrapper{{.*}}-target-library=openmp-amdgcn-amd-amdhsa-gfx803={{.*}}ocml.bc"{{.*}}ockl.bc"{{.*}}oclc_daz_opt_on.bc"{{.*}}oclc_unsafe_math_off.bc"{{.*}}oclc_finite_only_off.bc"{{.*}}oclc_correctly_rounded_sqrt_on.bc"{{.*}}oclc_wavefrontsize64_on.bc"{{.*}}oclc_isa_version_803.bc"

clang/test/Driver/openmp-offload-gpu-new.c

	///			///
	/// Perform several driver tests for OpenMP offloading			/// Perform several driver tests for OpenMP offloading
	///			///

	// REQUIRES: x86-registered-target			// REQUIRES: x86-registered-target
	// REQUIRES: nvptx-registered-target			// REQUIRES: nvptx-registered-target
	// REQUIRES: amdgpu-registered-target			// REQUIRES: amdgpu-registered-target

	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \			// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
	// RUN: -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 \			// RUN: -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 \
	// RUN: --libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-nvptx-test.bc %s 2>&1 \			// RUN: --libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-nvptx-test.bc %s 2>&1 \
	// RUN: | FileCheck %s			// RUN: | FileCheck %s
				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
				// RUN: --offload-arch=sm_52 \
				// RUN: --libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-nvptx-test.bc %s 2>&1 \
				// RUN: | FileCheck %s

	// verify the tools invocations			// verify the tools invocations
	// CHECK: "-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-emit-llvm-bc"{{.*}}"-x" "c"			// CHECK: "-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-emit-llvm-bc"{{.*}}"-x" "c"
	// CHECK: "-cc1" "-triple" "nvptx64-nvidia-cuda" "-aux-triple" "x86_64-unknown-linux-gnu"{{.*}}"-target-cpu" "sm_52"			// CHECK: "-cc1" "-triple" "nvptx64-nvidia-cuda" "-aux-triple" "x86_64-unknown-linux-gnu"{{.*}}"-target-cpu" "sm_52"
	// CHECK: "-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-emit-obj"			// CHECK: "-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-emit-obj"
	// CHECK: clang-linker-wrapper{{.*}}"--"{{.*}} "-o" "a.out"			// CHECK: clang-linker-wrapper{{.*}}"--"{{.*}} "-o" "a.out"

	// RUN: %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 %s 2>&1 \			// RUN: %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 %s 2>&1 \
	(14 lines not shown)

	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-BINDINGS			// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-BINDINGS
	// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.*]]"], output: "[[HOST_BC:.*]]"			// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.*]]"], output: "[[HOST_BC:.*]]"
	// CHECK-BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[DEVICE_BC:.*]]"			// CHECK-BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[DEVICE_BC:.*]]"
	// CHECK-BINDINGS: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[DEVICE_BC]]"], output: "[[DEVICE_OBJ:.*]]"			// CHECK-BINDINGS: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[DEVICE_BC]]"], output: "[[DEVICE_OBJ:.*]]"
	// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[DEVICE_OBJ]]"], output: "[[HOST_OBJ:.*]]"			// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[DEVICE_OBJ]]"], output: "[[HOST_OBJ:.*]]"
	// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"			// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"

				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda --offload-arch=sm_52 --offload-arch=sm_70 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-ARCH-BINDINGS
				// CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.*]]"], output: "[[HOST_BC:.*]]"
				// CHECK-ARCH-BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[DEVICE_BC_SM_52:.*]]"
				// CHECK-ARCH-BINDINGS: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[DEVICE_BC_SM_52]]"], output: "[[DEVICE_OBJ_SM_52:.*]]"
				// CHECK-ARCH-BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[DEVICE_BC_SM_70:.*]]"
				// CHECK-ARCH-BINDINGS: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[DEVICE_BC_SM_70]]"], output: "[[DEVICE_OBJ_SM_70:.*]]"
				// CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[DEVICE_OBJ_SM_52]]", "[[DEVICE_OBJ_SM_70]]"], output: "[[HOST_OBJ:.*]]"
				// CHECK-ARCH-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"

				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp \
				tra: You may want to add a test checking that, when no `--offload-arch=` is specified, the driver makes a reasonable default choice for both nvptx and amdgpu.
				jhuber6 (author): OpenMP uses a different default from CUDA / HIP. If not specified, it will use the `CLANG_OPENMP_NVPTX_DEFAULT_ARCH` definition (should be set automatically when CMake configures Clang) or the `amdgpu-arch` tool and pass it in via the `-march=` option. This is indicated by passing an empty StringRef as the bound architecture, while CUDA / HIP use some default. Elsewhere, if we see the BoundArchitecture is empty we just get the value of the `-march` option instead. This should be covered by some other tests, and this doesn't change the default behaviour.
				// RUN: -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_70 \
				// RUN: -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx908 \
				// RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-NVIDIA-AMDGPU

				// CHECK-NVIDIA-AMDGPU: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]"
				// CHECK-NVIDIA-AMDGPU: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[NVIDIA_PTX:.+]]"
				// CHECK-NVIDIA-AMDGPU: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[NVIDIA_PTX]]"], output: "[[NVIDIA_CUBIN:.+]]"
				// CHECK-NVIDIA-AMDGPU: "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[AMD_BC:.+]]"
				// CHECK-NVIDIA-AMDGPU: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[NVIDIA_CUBIN]]", "[[AMD_BC]]"], output: "[[HOST_OBJ:.+]]"
				// CHECK-NVIDIA-AMDGPU: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
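
The fallback jhuber6 describes in the inline reply (an empty bound architecture means "use the value of `-march`") can be sketched roughly as below. This is only an illustration of the described behaviour, not the code in `Driver.cpp` or the toolchains: the helper name `resolveOffloadArch` and its parameters are hypothetical, and the real driver works with `StringRef` and its own argument lists rather than `std::string_view`.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper for illustration only; this is not the Clang driver's
// actual code, and the name and signature are assumptions.
//
// boundArch:  architecture bound to the offload job (for example from
//             --offload-arch=sm_70); empty when none was given.
// marchValue: value that reaches the device toolchain through -march=
//             (for example via -Xopenmp-target, or a toolchain default).
std::string resolveOffloadArch(std::string_view boundArch,
                               std::string_view marchValue) {
  // A non-empty bound architecture takes precedence.
  if (!boundArch.empty())
    return std::string(boundArch);
  // Otherwise fall back to the -march value, matching the pre-patch behaviour.
  return std::string(marchValue);
}
```

Under these assumptions, `resolveOffloadArch("sm_70", "sm_52")` yields `sm_70`, while `resolveOffloadArch("", "sm_52")` yields `sm_52`.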

	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR			// RUN: %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_52 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
	// CHECK-EMIT-LLVM-IR: "-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"{{.*}}"-emit-llvm"			// CHECK-EMIT-LLVM-IR: "-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"{{.*}}"-emit-llvm"

	// RUN: %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvida-cuda -march=sm_70 \			// RUN: %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvida-cuda -march=sm_70 \
	// RUN: --libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-new-nvptx-test.bc \			// RUN: --libomptarget-nvptx-bc-path=%S/Inputs/libomptarget/libomptarget-new-nvptx-test.bc \
	// RUN: -nogpulib %s -o openmp-offload-gpu 2>&1 \			// RUN: -nogpulib %s -o openmp-offload-gpu 2>&1 \
	// RUN: | FileCheck -check-prefix=DRIVER_EMBEDDING %s			// RUN: | FileCheck -check-prefix=DRIVER_EMBEDDING %s

	(16 lines not shown)

Revision Contents

Diff 427734