This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
lib/Driver/
-
Driver/
-
CMakeLists.txt
3/9
Driver.cpp
-
ToolChains/
-
AMDGPUOpenMP.h
2/6
AMDGPUOpenMP.cpp
-
test/Driver/
-
Driver/
1/3
amdgpu-openmp-toolchain.c

Differential D94961

[OpenMP] Add OpenMP offloading toolchain for AMDGPU
ClosedPublic

Authored by pdhaliwal on Jan 19 2021, 3:45 AM.

Download Raw Diff

Details

Reviewers

jdoerfert
grokos
JonChesterfield
ronlieb
ABataev
yaxunl
saiislam

Commits

rGfcf03e728007: [OpenMP] Add OpenMP offloading toolchain for AMDGPU

Summary

This patch adds AMDGPUOpenMPToolChain for supporting OpenMP
offloading to AMD GPU's.

Originally authored by Greg Rodgers

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	450 ms	x64 debian > Clang.Driver::openmp-offload-gpu.c
	1,000 ms	x64 debian > Clang.Driver::openmp-offload.c
	650 ms	x64 windows > Clang.Driver::amdgpu-openmp-toolchain.c
	630 ms	x64 windows > Clang.Driver::openmp-offload-gpu.c
	3,900 ms	x64 windows > Clang.Driver::openmp-offload.c

Event Timeline

pdhaliwal created this revision.Jan 19 2021, 3:45 AM

Herald added subscribers: kerbowa, guansong, t-tye and 6 others. · View Herald TranscriptJan 19 2021, 3:45 AM

pdhaliwal requested review of this revision.Jan 19 2021, 3:45 AM

Herald added a project: Restricted Project. · View Herald TranscriptJan 19 2021, 3:45 AM

Herald added subscribers: cfe-commits, sstefan1, wdng. · View Herald Transcript

This patch was written, roughly, by:

copying the known-working openmp driver from rocm into the trunk source tree
deleting lots of stuff that didn't look necessary
deleting some stuff that is broadly necessary, but the specifics are up for debate

The idea is to move language-independent but amdgcn-specific code into ROCMToolChain. Some has already gone in, others (like computeMSVCVersion) will likely move too.

Regarding the rest of the end to end stack:

host plugin works, same code in trunk / rocm / aomp
device plugin will work once it's building as openmp, modulo printf and malloc
compiler backend will work for spmd kernels today, will work for generic kernels after D94648 or equivalent lands

Regarding tests (which need the unimplemented bit filled in with the next patch):

Runtime tests (for spmd kernels) are working locally with a jury rigged devicertl
Codegen tests are proving awkward to update. Dropping line number would help, but there's still a difference in addrspace cast distribution. I'm hoping the scripts involved in generating the nvptx cases can be adapted.

Harbormaster completed remote builds in B85692: Diff 317514.Jan 19 2021, 4:31 AM

Fix clang-tidy error

Won't this just prevent us from building clang due to the missing cmake changes? We need somewhat testable chunks.

In D94961#2506460, @JonChesterfield wrote:

This patch was written, roughly, by:

copying the known-working openmp driver from rocm into the trunk source tree

deleting lots of stuff that didn't look necessary

deleting some stuff that is broadly necessary, but the specifics are up for debate

The idea is to move language-independent but amdgcn-specific code into ROCMToolChain. Some has already gone in, others (like computeMSVCVersion) will likely move too.

Regarding the rest of the end to end stack:

host plugin works, same code in trunk / rocm / aomp

device plugin will work once it's building as openmp, modulo printf and malloc

compiler backend will work for spmd kernels today, will work for generic kernels after D94648 or equivalent lands

Regarding tests (which need the unimplemented bit filled in with the next patch):

Runtime tests (for spmd kernels) are working locally with a jury rigged devicertl

Codegen tests are proving awkward to update. Dropping line number would help, but there's still a difference in addrspace cast distribution. I'm hoping the scripts involved in generating the nvptx cases can be adapted.

Could you add the tests for the tools invocation to check that the newly added classes/functions correctly translate options/flags?

Harbormaster completed remote builds in B85714: Diff 317553.Jan 19 2021, 7:43 AM

Won't this just prevent us from building clang due to the missing cmake changes?

It compiles and builds fine, however, I wasn't actually aware such sanity checking being present. It turns out
the unknown files inside llvm/ will lead cmake to report error but such reporting will not happen inside clang. Maybe such checks
were not enabled inside clang. Anyways thanks for pointing out. I will keep that in mind in future.

The idea for this patch was basically to introduce AMDGPUToolChain classes without much of the functionality in order
to keep its size in check. And the second patch would have integrated the toolchain with driver along with testing.
But during the intermediate time of the two patches, bare files would have existed (never built and tested).

I have updated this patch to now include somewhat functional driver along with tests.

Herald added a subscriber: mgorny. · View Herald TranscriptJan 20 2021, 1:49 AM

pdhaliwal retitled this revision from [OpenMP] Add OpenMP offloading toolchain skeleton for AMDGPU to [OpenMP] Add OpenMP offloading toolchain for AMDGPU.Jan 20 2021, 2:16 AM

pdhaliwal edited the summary of this revision. (Show Details)

Harbormaster completed remote builds in B85856: Diff 317810.Jan 20 2021, 2:22 AM

Fixed failing debian tests

Harbormaster completed remote builds in B85860: Diff 317831.Jan 20 2021, 4:41 AM

Moved common methods of HIP and OpenMP to base AMDGPUToolChain
Removed unnecessary asserts

pdhaliwal added a reviewer: yaxunl.Jan 20 2021, 11:53 PM

Harbormaster completed remote builds in B86047: Diff 318120.Jan 21 2021, 1:17 AM

The deviceRTL is changing from cuda/hip to openmp at present. It would be good to be able to compile that as openmp for amdgpu, which needs a patch roughly like this and probably some follow up.

It's plausible that the contents of this file are only really of interest to AMD people, in which case we should probably land it in order to iterate in tree instead of downstream. It now has the cmake hook and a proof of concept test case.

@ABataev @grokos @jdoerfert your guidance would be very welcome, but equally if you'd prefer leave us to sink or swim that's completely reasonable too.

Thanks!

pdhaliwal added a reviewer: saiislam.Jan 27 2021, 6:42 AM

Ping!

LGTM. But, please wait from someone outside AMD to accept it.

Overall, looks reasonable. I'm not a fan of the way things are hooked up but that is also true for the NVPTX toolchain and for now unavoidable I guess. Some nits below.

clang/lib/Driver/Driver.cpp
2999	Why the `substr` copy?
clang/lib/Driver/ToolChains/AMDGPU.h
72 ↗	(On Diff #318120)	What is the coding style here? Pick upper case or lower case, form the above I assume upper case.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
66	To verify, O0 is not mapped to O2, correct?

In D94961#2533848, @jdoerfert wrote:

Overall, looks reasonable. I'm not a fan of the way things are hooked up but that is also true for the NVPTX toolchain and for now unavoidable I guess.

This is loosely based on the nvptx and hip toolchain files, but looking at others in clang the style seems similar.

clang/lib/Driver/Driver.cpp
2999	As in `VStr.startswith("-march=gfx")`? Slightly prettier. Not sure ad hoc arg parsing like this is ever good though
clang/lib/Driver/ToolChains/AMDGPU.h
72 ↗	(On Diff #318120)	Yeah, should be `supportsProfiling()` as it's a function. Should probably propose that as a minimal patch to the existing code. There may be other casing mistakes. ConstructJob looks suspect, but all the files call it that instead of constructJob, so maybe I'm missing a part of the style guide.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
66	I think `opt -O0` is an error, though it does look like this will rewrite it to O2 instead. Which seems bad.

JonChesterfield added inline comments.Feb 1 2021, 8:21 AM

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
66	Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are passed to clang as a single invocation, but I don't see why we need an opt invocation here. Passing the llvm-link'ed code to llc as-is, without this implicit lto style opt invocation, is probably a better default.

I believe this is 'safe', in the sense that it can't break anything other than AMDGPU's openmp implementation, which doesn't work yet anyway. The change to non-amd code is to OpenMPActionBuilder, and while parsing -march=gfx isn't pretty, it won't fire on nvptx. I'm not particularly confident it is correct, but it's hard to do comprehensive testing before there's an end to end spike. I'd like to iterate in tree, with an expectation that parts of this will need to change - @jdoerfert does the OpenMP patch look adequate for now?

clang/lib/Driver/Driver.cpp
755	This follows the same logic as nvptx so is likely to be correct

ronlieb mentioned this in D95752: [OpenMP][DeviceRTL] Extract shuffle idiom and port it to declare variant.Feb 1 2021, 5:01 PM

Generally OK, the more I look at it the more things pop up. Other than my comments, I'm fine with this going in.

clang/lib/Driver/Driver.cpp
767	what if we say `TT.isNVPTX() \|\| TT.isAMDGCN()` in line 745, rename the Cuda things to "device" and the toolchain stuff is in a conditional? I see the triple is hardcoded here for some reason and I don't know why not to use TT. Neverhteless, it seems less copying is better, we will eventually have more architectures and 3 * 13 lines means 3 times the bugs and 3 times the required changes in the future.
3004	This does not look right. IsAMDGCN is not a question that should be answered by a `-Xopenmp-target` flag. In general, why skip these passes here. Isn't what you really want to do, pass `-emit-llvm[-bc]` to the device compilation job?
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
67	Just to make it clear, we should default to O0.
clang/test/Driver/amdgpu-openmp-toolchain.c
3	Would this at all work without the march flag?

Addressed review comments.

Combined the toolchain creation logic for nvptx and amdgcn
Replaced -Xopenmp-target with -emit-llvm-bc inside AMDGPUOpenMP.cpp
Removed opt from pipeline

After addressing the review comments, I have internally verified changes on few simple test programs. They seem to be working fine.

clang/lib/Driver/Driver.cpp
3004	I have removed the logic from here instead now I have added -emit-llvm-bc using addClangTargetOptions in AMDGPUOpenMP.
clang/test/Driver/amdgpu-openmp-toolchain.c
3	This would not work without -march flag as amdgcn is not forward/backward compatible and there is currently no way to detect system gpu arch.

JonChesterfield added inline comments.Feb 2 2021, 3:03 AM

clang/test/Driver/amdgpu-openmp-toolchain.c
3	The command line invocation openmp needs `-fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906` could be friendlier. Auto-detecting march is convenient in general but probably bad for clang tests as we want them to run on systems that don't have an amdgpu installed.

Looks much simpler, thanks!

clang/lib/Driver/Driver.cpp
758	Minor suggestion, if (TT.isNVPTX() { ... } else { assert(TT.isAMDGCN()); ... } ? Semantically equivalent, but `if () else if ()` looks like there is a missing else clause.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
44	Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably the same logic in another toolchain
65	Default("0") and update the comment 'we map anything else'

Harbormaster completed remote builds in B87497: Diff 320722.Feb 2 2021, 3:53 AM

Use 0 for default -O option
Rename addOptLevelArgs to addLLCOptArg

clang/lib/Driver/Driver.cpp
758	I think, having assert in last else is bit cleaner. What do you think?

JonChesterfield added inline comments.Feb 2 2021, 4:02 AM

clang/lib/Driver/Driver.cpp
758	I also prefer the if () else {assert()} construct.

Harbormaster completed remote builds in B87505: Diff 320740.Feb 2 2021, 4:45 AM

fine with me, assuming this doesn't disturb anything outside the AMD pipeline.

This revision is now accepted and ready to land.Feb 2 2021, 10:50 AM

Closed by commit rGfcf03e728007: [OpenMP] Add OpenMP offloading toolchain for AMDGPU (authored by pdhaliwal). · Explain WhyFeb 2 2021, 9:43 PM

This revision was automatically updated to reflect the committed changes.

pdhaliwal added a commit: rGfcf03e728007: [OpenMP] Add OpenMP offloading toolchain for AMDGPU.

Revision Contents

Path

Size

clang/

lib/

Driver/

CMakeLists.txt

1 line

Driver.cpp

23 lines

ToolChains/

AMDGPUOpenMP.h

123 lines

AMDGPUOpenMP.cpp

302 lines

test/

Driver/

amdgpu-openmp-toolchain.c

35 lines

Diff 317810

clang/lib/Driver/CMakeLists.txt

Show All 30 Lines	add_clang_library(clangDriver
ToolChains/Arch/RISCV.cpp		ToolChains/Arch/RISCV.cpp
ToolChains/Arch/Sparc.cpp		ToolChains/Arch/Sparc.cpp
ToolChains/Arch/SystemZ.cpp		ToolChains/Arch/SystemZ.cpp
ToolChains/Arch/VE.cpp		ToolChains/Arch/VE.cpp
ToolChains/Arch/X86.cpp		ToolChains/Arch/X86.cpp
ToolChains/AIX.cpp		ToolChains/AIX.cpp
ToolChains/Ananas.cpp		ToolChains/Ananas.cpp
ToolChains/AMDGPU.cpp		ToolChains/AMDGPU.cpp
		ToolChains/AMDGPUOpenMP.cpp
ToolChains/AVR.cpp		ToolChains/AVR.cpp
ToolChains/BareMetal.cpp		ToolChains/BareMetal.cpp
ToolChains/Clang.cpp		ToolChains/Clang.cpp
ToolChains/CloudABI.cpp		ToolChains/CloudABI.cpp
ToolChains/CommonArgs.cpp		ToolChains/CommonArgs.cpp
ToolChains/Contiki.cpp		ToolChains/Contiki.cpp
ToolChains/CrossWindows.cpp		ToolChains/CrossWindows.cpp
ToolChains/Cuda.cpp		ToolChains/Cuda.cpp
Show All 40 Lines

clang/lib/Driver/Driver.cpp

//===--- Driver.cpp - Clang GCC Compatible Driver -------------------------===//		//===--- Driver.cpp - Clang GCC Compatible Driver -------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang/Driver/Driver.h"		#include "clang/Driver/Driver.h"
#include "InputInfo.h"		#include "InputInfo.h"
#include "ToolChains/AIX.h"		#include "ToolChains/AIX.h"
#include "ToolChains/AMDGPU.h"		#include "ToolChains/AMDGPU.h"
		#include "ToolChains/AMDGPUOpenMP.h"
#include "ToolChains/AVR.h"		#include "ToolChains/AVR.h"
#include "ToolChains/Ananas.h"		#include "ToolChains/Ananas.h"
#include "ToolChains/BareMetal.h"		#include "ToolChains/BareMetal.h"
#include "ToolChains/Clang.h"		#include "ToolChains/Clang.h"
#include "ToolChains/CloudABI.h"		#include "ToolChains/CloudABI.h"
#include "ToolChains/Contiki.h"		#include "ToolChains/Contiki.h"
#include "ToolChains/CrossWindows.h"		#include "ToolChains/CrossWindows.h"
#include "ToolChains/Cuda.h"		#include "ToolChains/Cuda.h"
▲ Show 20 Lines • Show All 725 Lines • ▼ Show 20 Lines	if (OpenMPTargets->getNumValues()) {
C.getSingleOffloadToolChain<Action::OFK_Host>();		C.getSingleOffloadToolChain<Action::OFK_Host>();
assert(HostTC && "Host toolchain should be always defined.");		assert(HostTC && "Host toolchain should be always defined.");
auto &CudaTC =		auto &CudaTC =
ToolChains[TT.str() + "/" + HostTC->getTriple().normalize()];		ToolChains[TT.str() + "/" + HostTC->getTriple().normalize()];
if (!CudaTC)		if (!CudaTC)
CudaTC = std::make_unique<toolchains::CudaToolChain>(		CudaTC = std::make_unique<toolchains::CudaToolChain>(
this, TT, HostTC, C.getInputArgs(), Action::OFK_OpenMP);		this, TT, HostTC, C.getInputArgs(), Action::OFK_OpenMP);
TC = CudaTC.get();		TC = CudaTC.get();
		} else if (TT.isAMDGCN()) {
		JonChesterfieldUnsubmitted Not Done Reply Inline Actions This follows the same logic as nvptx so is likely to be correct JonChesterfield: This follows the same logic as nvptx so is likely to be correct
		const ToolChain *HostTC =
		C.getSingleOffloadToolChain<Action::OFK_Host>();
		const llvm::Triple &HostTriple = HostTC->getTriple();
		JonChesterfieldUnsubmitted Not Done Reply Inline Actions Minor suggestion, if (TT.isNVPTX() { ... } else { assert(TT.isAMDGCN()); ... } ? Semantically equivalent, but `if () else if ()` looks like there is a missing else clause. JonChesterfield: Minor suggestion, ``` if (TT.isNVPTX() { ... } else { assert(TT.isAMDGCN()); ... }``` ?
		pdhaliwalAuthorUnsubmitted Done Reply Inline Actions I think, having assert in last else is bit cleaner. What do you think? pdhaliwal: I think, having assert in last else is bit cleaner. What do you think?
		JonChesterfieldUnsubmitted Not Done Reply Inline Actions I also prefer the if () else {assert()} construct. JonChesterfield: I also prefer the if () else {assert()} construct.
		llvm::Triple AMDGPUTriple("amdgcn-amd-amdhsa");
		auto &AMDGPUOpenMPTC =
		ToolChains[AMDGPUTriple.str() + "/" + HostTriple.str()];
		if (!AMDGPUOpenMPTC) {
		AMDGPUOpenMPTC =
		std::make_unique<toolchains::AMDGPUOpenMPToolChain>(
		this, AMDGPUTriple, HostTC, C.getInputArgs());
		}
		TC = AMDGPUOpenMPTC.get();
		jdoerfertUnsubmitted Done Reply Inline Actions what if we say `TT.isNVPTX() \|\| TT.isAMDGCN()` in line 745, rename the Cuda things to "device" and the toolchain stuff is in a conditional? I see the triple is hardcoded here for some reason and I don't know why not to use TT. Neverhteless, it seems less copying is better, we will eventually have more architectures and 3 * 13 lines means 3 times the bugs and 3 times the required changes in the future. jdoerfert: what if we say `TT.isNVPTX() \|\| TT.isAMDGCN()` in line 745, rename the Cuda things to "device"…
} else		} else
TC = &getToolChain(C.getInputArgs(), TT);		TC = &getToolChain(C.getInputArgs(), TT);
C.addOffloadDeviceToolChain(TC, Action::OFK_OpenMP);		C.addOffloadDeviceToolChain(TC, Action::OFK_OpenMP);
}		}
}		}
} else		} else
Diag(clang::diag::err_drv_expecting_fopenmp_with_fopenmp_targets);		Diag(clang::diag::err_drv_expecting_fopenmp_with_fopenmp_targets);
} else		} else
▲ Show 20 Lines • Show All 2,209 Lines • ▼ Show 20 Lines	getDeviceDependences(OffloadAction::DeviceDependences &DA,
PhasesTy &Phases) override {		PhasesTy &Phases) override {
if (OpenMPDeviceActions.empty())		if (OpenMPDeviceActions.empty())
return ABRT_Inactive;		return ABRT_Inactive;

// We should always have an action for each input.		// We should always have an action for each input.
assert(OpenMPDeviceActions.size() == ToolChains.size() &&		assert(OpenMPDeviceActions.size() == ToolChains.size() &&
"Number of OpenMP actions and toolchains do not match.");		"Number of OpenMP actions and toolchains do not match.");

		const ToolChain *OpenMPTC =
		C.getSingleOffloadToolChain<Action::OFK_OpenMP>();

		// amdgcn does not support linking of object files, therefore we skip
		// backend and assemble phases to output LLVM IR.
		if (OpenMPTC->getTriple().isAMDGCN() &&
		(CurPhase == phases::Backend \|\| CurPhase == phases::Assemble))
		jdoerfertUnsubmitted Not Done Reply Inline Actions Why the `substr` copy? jdoerfert: Why the `substr` copy?
		JonChesterfieldUnsubmitted Not Done Reply Inline Actions As in `VStr.startswith("-march=gfx")`? Slightly prettier. Not sure ad hoc arg parsing like this is ever good though JonChesterfield: As in `VStr.startswith("-march=gfx")`? Slightly prettier. Not sure ad hoc arg parsing like this…
		return ABRT_Success;

// The host only depends on device action in the linking phase, when all		// The host only depends on device action in the linking phase, when all
// the device images have to be embedded in the host image.		// the device images have to be embedded in the host image.
if (CurPhase == phases::Link) {		if (CurPhase == phases::Link) {
		jdoerfertUnsubmitted Not Done Reply Inline Actions This does not look right. IsAMDGCN is not a question that should be answered by a `-Xopenmp-target` flag. In general, why skip these passes here. Isn't what you really want to do, pass `-emit-llvm[-bc]` to the device compilation job? jdoerfert: This does not look right. IsAMDGCN is not a question that should be answered by a `-Xopenmp…
		pdhaliwalAuthorUnsubmitted Done Reply Inline Actions I have removed the logic from here instead now I have added -emit-llvm-bc using addClangTargetOptions in AMDGPUOpenMP. pdhaliwal: I have removed the logic from here instead now I have added -emit-llvm-bc using…
assert(ToolChains.size() == DeviceLinkerInputs.size() &&		assert(ToolChains.size() == DeviceLinkerInputs.size() &&
"Toolchains and linker inputs sizes do not match.");		"Toolchains and linker inputs sizes do not match.");
auto LI = DeviceLinkerInputs.begin();		auto LI = DeviceLinkerInputs.begin();
for (auto *A : OpenMPDeviceActions) {		for (auto *A : OpenMPDeviceActions) {
LI->push_back(A);		LI->push_back(A);
++LI;		++LI;
}		}

▲ Show 20 Lines • Show All 2,400 Lines • Show Last 20 Lines

clang/lib/Driver/ToolChains/AMDGPUOpenMP.h

This file was added.

				//===- AMDGPUOpenMP.h - AMDGPUOpenMP ToolChain Implementation -- C++ ----===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H
				#define LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H

				#include "AMDGPU.h"
				#include "clang/Driver/Tool.h"
				#include "clang/Driver/ToolChain.h"

				namespace clang {
				namespace driver {

				namespace tools {

				namespace AMDGCN {
				// Runs llvm-link/opt/llc/lld, which links multiple LLVM bitcode, together with
				// device library, then compiles it to ISA in a shared object.
				class LLVM_LIBRARY_VISIBILITY OpenMPLinker : public Tool {
				public:
				OpenMPLinker(const ToolChain &TC)
				: Tool("AMDGCN::OpenMPLinker", "amdgcn-link", TC) {}

				bool hasIntegratedCPP() const override { return false; }

				void ConstructJob(Compilation &C, const JobAction &JA,
				const InputInfo &Output, const InputInfoList &Inputs,
				const llvm::opt::ArgList &TCArgs,
				const char *LinkingOutput) const override;

				private:
				/// \return llvm-link output file name.
				const char *constructLLVMLinkCommand(Compilation &C, const JobAction &JA,
				const InputInfoList &Inputs,
				const llvm::opt::ArgList &Args,
				llvm::StringRef SubArchName,
				llvm::StringRef OutputFilePrefix) const;

				/// \return opt output file name.
				const char *constructOptCommand(Compilation &C, const JobAction &JA,
				const InputInfoList &Inputs,
				const llvm::opt::ArgList &Args,
				llvm::StringRef SubArchName,
				llvm::StringRef OutputFilePrefix,
				const char *InputFileName) const;

				/// \return llc output file name.
				const char *constructLlcCommand(Compilation &C, const JobAction &JA,
				const InputInfoList &Inputs,
				const llvm::opt::ArgList &Args,
				llvm::StringRef SubArchName,
				llvm::StringRef OutputFilePrefix,
				const char *InputFileName,
				bool OutputIsAsm = false) const;

				void constructLldCommand(Compilation &C, const JobAction &JA,
				const InputInfoList &Inputs, const InputInfo &Output,
				const llvm::opt::ArgList &Args,
				const char *InputFileName) const;
				};

				} // end namespace AMDGCN
				} // end namespace tools

				namespace toolchains {

				class LLVM_LIBRARY_VISIBILITY AMDGPUOpenMPToolChain final
				: public ROCMToolChain {
				public:
				AMDGPUOpenMPToolChain(const Driver &D, const llvm::Triple &Triple,
				const ToolChain &HostTC,
				const llvm::opt::ArgList &Args);

				const llvm::Triple *getAuxTriple() const override {
				return &HostTC.getTriple();
				}

				llvm::opt::DerivedArgList *
				TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
				Action::OffloadKind DeviceOffloadKind) const override;
				void
				addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
				llvm::opt::ArgStringList &CC1Args,
				Action::OffloadKind DeviceOffloadKind) const override;

				bool useIntegratedAs() const override { return true; }
				bool isCrossCompiling() const override { return true; }
				bool isPICDefault() const override { return false; }
				bool isPIEDefault() const override { return false; }
				bool isPICDefaultForced() const override { return false; }
				bool SupportsProfiling() const override { return false; }
				bool IsMathErrnoDefault() const override { return false; }

				void addClangWarningOptions(llvm::opt::ArgStringList &CC1Args) const override;
				CXXStdlibType GetCXXStdlibType(const llvm::opt::ArgList &Args) const override;
				void
				AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs,
				llvm::opt::ArgStringList &CC1Args) const override;
				void AddIAMCUIncludeArgs(const llvm::opt::ArgList &DriverArgs,
				llvm::opt::ArgStringList &CC1Args) const override;

				SanitizerMask getSupportedSanitizers() const override;

				VersionTuple
				computeMSVCVersion(const Driver *D,
				const llvm::opt::ArgList &Args) const override;

				const ToolChain &HostTC;

				protected:
				Tool *buildLinker() const override;
				};

				} // end namespace toolchains
				} // end namespace driver
				} // end namespace clang

				#endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

This file was added.

				//===- AMDGPUOpenMP.cpp - AMDGPUOpenMP ToolChain Implementation -- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "AMDGPUOpenMP.h"
				#include "AMDGPU.h"
				#include "CommonArgs.h"
				#include "InputInfo.h"
				#include "clang/Driver/Compilation.h"
				#include "clang/Driver/Driver.h"
				#include "clang/Driver/DriverDiagnostic.h"
				#include "clang/Driver/Options.h"
				#include "llvm/Support/FileSystem.h"
				#include "llvm/Support/Path.h"

				using namespace clang::driver;
				using namespace clang::driver::toolchains;
				using namespace clang::driver::tools;
				using namespace clang;
				using namespace llvm::opt;

				namespace {

				static const char *getOutputFileName(Compilation &C, StringRef Base,
				const char *Postfix,
				const char *Extension) {
				const char *OutputFileName;
				if (C.getDriver().isSaveTempsEnabled()) {
				OutputFileName =
				C.getArgs().MakeArgString(Base.str() + Postfix + "." + Extension);
				} else {
				std::string TmpName =
				C.getDriver().GetTemporaryPath(Base.str() + Postfix, Extension);
				OutputFileName = C.addTempFile(C.getArgs().MakeArgString(TmpName));
				}
				return OutputFileName;
				}

				static void addOptLevelArgs(const llvm::opt::ArgList &Args,
				llvm::opt::ArgStringList &CmdArgs,
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably the same logic in another toolchain JonChesterfield: Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably…
				bool IsLlc = false) {
				if (Arg *A = Args.getLastArg(options::OPT_O_Group)) {
				StringRef OOpt = "3";
				if (A->getOption().matches(options::OPT_O4) \|\|
				A->getOption().matches(options::OPT_Ofast))
				OOpt = "3";
				else if (A->getOption().matches(options::OPT_O0))
				OOpt = "0";
				else if (A->getOption().matches(options::OPT_O)) {
				// Clang and opt support -Os/-Oz; llc only supports -O0, -O1, -O2 and -O3
				// so we map -Os/-Oz to -O2.
				// Only clang supports -Og, and maps it to -O1.
				// We map anything else to -O2.
				OOpt = llvm::StringSwitch<const char *>(A->getValue())
				.Case("1", "1")
				.Case("2", "2")
				.Case("3", "3")
				.Case("s", IsLlc ? "2" : "s")
				.Case("z", IsLlc ? "2" : "z")
				.Case("g", "1")
				.Default("2");
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions Default("0") and update the comment 'we map anything else' JonChesterfield: Default("0") and update the comment 'we map anything else'
				}
				jdoerfertUnsubmitted Not Done Reply Inline Actions To verify, O0 is not mapped to O2, correct? jdoerfert: To verify, O0 is not mapped to O2, correct?
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions I think `opt -O0` is an error, though it does look like this will rewrite it to O2 instead. Which seems bad. JonChesterfield: I think `opt -O0` is an error, though it does look like this will rewrite it to O2 instead.
				JonChesterfieldUnsubmitted Done Reply Inline Actions Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are passed to clang as a single invocation, but I don't see why we need an opt invocation here. Passing the llvm-link'ed code to llc as-is, without this implicit lto style opt invocation, is probably a better default. JonChesterfield: Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are…
				CmdArgs.push_back(Args.MakeArgString("-O" + OOpt));
				jdoerfertUnsubmitted Done Reply Inline Actions Just to make it clear, we should default to O0. jdoerfert: Just to make it clear, we should default to O0.
				}
				}
				} // namespace

				const char *AMDGCN::OpenMPLinker::constructLLVMLinkCommand(
				Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
				const ArgList &Args, StringRef SubArchName,
				StringRef OutputFilePrefix) const {
				ArgStringList CmdArgs;

				for (const auto &II : Inputs)
				if (II.isFilename())
				CmdArgs.push_back(II.getFilename());
				// Add an intermediate output file.
				CmdArgs.push_back("-o");
				const char *OutputFileName =
				getOutputFileName(C, OutputFilePrefix, "-linked", "bc");
				CmdArgs.push_back(OutputFileName);
				const char *Exec =
				Args.MakeArgString(getToolChain().GetProgramPath("llvm-link"));
				C.addCommand(std::make_unique<Command>(
				JA, *this, ResponseFileSupport::AtFileCurCP(), Exec, CmdArgs, Inputs,
				InputInfo(&JA, Args.MakeArgString(OutputFileName))));
				return OutputFileName;
				}

				const char *AMDGCN::OpenMPLinker::constructOptCommand(
				Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
				const llvm::opt::ArgList &Args, llvm::StringRef SubArchName,
				llvm::StringRef OutputFilePrefix, const char *InputFileName) const {
				// Construct opt command.
				ArgStringList OptArgs;
				// The input to opt is the output from llvm-link.
				OptArgs.push_back(InputFileName);
				// Pass optimization arg to opt.
				addOptLevelArgs(Args, OptArgs);
				OptArgs.push_back("-mtriple=amdgcn-amd-amdhsa");
				OptArgs.push_back(Args.MakeArgString("-mcpu=" + SubArchName));

				for (const Arg *A : Args.filtered(options::OPT_mllvm)) {
				OptArgs.push_back(A->getValue(0));
				}

				OptArgs.push_back("-o");
				const char *OutputFileName =
				getOutputFileName(C, OutputFilePrefix, "-optimized", "bc");
				OptArgs.push_back(OutputFileName);
				const char *OptExec =
				Args.MakeArgString(getToolChain().GetProgramPath("opt"));
				C.addCommand(std::make_unique<Command>(
				JA, *this, ResponseFileSupport::AtFileCurCP(), OptExec, OptArgs, Inputs,
				InputInfo(&JA, Args.MakeArgString(OutputFileName))));
				return OutputFileName;
				}

				const char *AMDGCN::OpenMPLinker::constructLlcCommand(
				Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
				const llvm::opt::ArgList &Args, llvm::StringRef SubArchName,
				llvm::StringRef OutputFilePrefix, const char *InputFileName,
				bool OutputIsAsm) const {
				// Construct llc command.
				ArgStringList LlcArgs;
				// The input to llc is the output from opt.
				LlcArgs.push_back(InputFileName);
				// Pass optimization arg to llc.
				addOptLevelArgs(Args, LlcArgs, /IsLlc=/true);
				LlcArgs.push_back("-mtriple=amdgcn-amd-amdhsa");
				LlcArgs.push_back(Args.MakeArgString("-mcpu=" + SubArchName));
				LlcArgs.push_back(
				Args.MakeArgString(Twine("-filetype=") + (OutputIsAsm ? "asm" : "obj")));

				for (const Arg *A : Args.filtered(options::OPT_mllvm)) {
				LlcArgs.push_back(A->getValue(0));
				}

				// Add output filename
				LlcArgs.push_back("-o");
				const char *LlcOutputFile =
				getOutputFileName(C, OutputFilePrefix, "", OutputIsAsm ? "s" : "o");
				LlcArgs.push_back(LlcOutputFile);
				const char *Llc = Args.MakeArgString(getToolChain().GetProgramPath("llc"));
				C.addCommand(std::make_unique<Command>(
				JA, *this, ResponseFileSupport::AtFileCurCP(), Llc, LlcArgs, Inputs,
				InputInfo(&JA, Args.MakeArgString(LlcOutputFile))));
				return LlcOutputFile;
				}

				void AMDGCN::OpenMPLinker::constructLldCommand(
				Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
				const InputInfo &Output, const llvm::opt::ArgList &Args,
				const char *InputFileName) const {
				// Construct lld command.
				// The output from ld.lld is an HSA code object file.
				ArgStringList LldArgs{"-flavor", "gnu", "--no-undefined",
				"-shared", "-o", Output.getFilename(),
				InputFileName};

				const char *Lld = Args.MakeArgString(getToolChain().GetProgramPath("lld"));
				C.addCommand(std::make_unique<Command>(
				JA, *this, ResponseFileSupport::AtFileCurCP(), Lld, LldArgs, Inputs,
				InputInfo(&JA, Args.MakeArgString(Output.getFilename()))));
				}

				// For amdgcn the inputs of the linker job are device bitcode and output is
				// object file. It calls llvm-link, opt, llc, then lld steps.
				void AMDGCN::OpenMPLinker::ConstructJob(Compilation &C, const JobAction &JA,
				const InputInfo &Output,
				const InputInfoList &Inputs,
				const ArgList &Args,
				const char *LinkingOutput) const {
				assert(getToolChain().getTriple().isAMDGCN() && "Unsupported target");

				StringRef GPUArch = Args.getLastArgValue(options::OPT_march_EQ);
				assert(GPUArch.startswith("gfx") && "Unsupported sub arch");

				// Prefix for temporary file name.
				std::string Prefix;
				for (const auto &II : Inputs)
				if (II.isFilename())
				Prefix =
				llvm::sys::path::stem(II.getFilename()).str() + "-" + GPUArch.str();
				assert(Prefix.length() && "no linker inputs are files ");

				// Each command outputs different files.
				const char *LLVMLinkCommand =
				constructLLVMLinkCommand(C, JA, Inputs, Args, GPUArch, Prefix);
				const char *OptCommand = constructOptCommand(C, JA, Inputs, Args, GPUArch,
				Prefix, LLVMLinkCommand);
				const char *LlcCommand =
				constructLlcCommand(C, JA, Inputs, Args, GPUArch, Prefix, OptCommand);
				constructLldCommand(C, JA, Inputs, Output, Args, LlcCommand);
				}

				AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain(const Driver &D,
				const llvm::Triple &Triple,
				const ToolChain &HostTC,
				const ArgList &Args)
				: ROCMToolChain(D, Triple, Args), HostTC(HostTC) {
				// Lookup binaries into the driver directory, this is used to
				// discover the clang-offload-bundler executable.
				getProgramPaths().push_back(getDriver().Dir);
				}

				void AMDGPUOpenMPToolChain::addClangTargetOptions(
				const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,
				Action::OffloadKind DeviceOffloadingKind) const {
				HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);

				StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
				assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
				assert(DeviceOffloadingKind == Action::OFK_OpenMP &&
				"Only OpenMP offloading kinds are supported.");

				CC1Args.push_back("-target-cpu");
				CC1Args.push_back(DriverArgs.MakeArgStringRef(GpuArch));
				CC1Args.push_back("-fcuda-is-device");

				// Default to "hidden" visibility, as object level linking will not be
				// supported for the foreseeable future.
				if (!DriverArgs.hasArg(options::OPT_fvisibility_EQ,
				options::OPT_fvisibility_ms_compat) &&
				DeviceOffloadingKind != Action::OFK_OpenMP) {
				CC1Args.append({"-fvisibility", "hidden"});
				CC1Args.push_back("-fapply-global-visibility-to-externs");
				}
				}

				llvm::opt::DerivedArgList *AMDGPUOpenMPToolChain::TranslateArgs(
				const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
				Action::OffloadKind DeviceOffloadKind) const {
				DerivedArgList *DAL =
				HostTC.TranslateArgs(Args, BoundArch, DeviceOffloadKind);
				if (!DAL)
				DAL = new DerivedArgList(Args.getBaseArgs());

				const OptTable &Opts = getDriver().getOpts();

				if (DeviceOffloadKind != Action::OFK_OpenMP) {
				for (Arg *A : Args) {
				DAL->append(A);
				}
				}

				if (!BoundArch.empty()) {
				DAL->eraseArg(options::OPT_march_EQ);
				DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
				BoundArch);
				}

				return DAL;
				}

				Tool *AMDGPUOpenMPToolChain::buildLinker() const {
				assert(getTriple().isAMDGCN());
				return new tools::AMDGCN::OpenMPLinker(*this);
				}

				void AMDGPUOpenMPToolChain::addClangWarningOptions(
				ArgStringList &CC1Args) const {
				HostTC.addClangWarningOptions(CC1Args);
				}

				ToolChain::CXXStdlibType
				AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList &Args) const {
				return HostTC.GetCXXStdlibType(Args);
				}

				void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs(
				const ArgList &DriverArgs, ArgStringList &CC1Args) const {
				HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args);
				}

				void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList &Args,
				ArgStringList &CC1Args) const {
				HostTC.AddIAMCUIncludeArgs(Args, CC1Args);
				}

				SanitizerMask AMDGPUOpenMPToolChain::getSupportedSanitizers() const {
				// The AMDGPUOpenMPToolChain only supports sanitizers in the sense that it
				// allows sanitizer arguments on the command line if they are supported by the
				// host toolchain. The AMDGPUOpenMPToolChain will actually ignore any command
				// line arguments for any of these "supported" sanitizers. That means that no
				// sanitization of device code is actually supported at this time.
				//
				// This behavior is necessary because the host and device toolchains
				// invocations often share the command line, so the device toolchain must
				// tolerate flags meant only for the host toolchain.
				return HostTC.getSupportedSanitizers();
				}

				VersionTuple
				AMDGPUOpenMPToolChain::computeMSVCVersion(const Driver *D,
				const ArgList &Args) const {
				return HostTC.computeMSVCVersion(D, Args);
				}

clang/test/Driver/amdgpu-openmp-toolchain.c

This file was added.

				// REQUIRES: amdgpu-registered-target
				// RUN: %clang -### -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target -march=gfx906 %s 2>&1 \
				// RUN: \| FileCheck %s
				jdoerfertUnsubmitted Not Done Reply Inline Actions Would this at all work without the march flag? jdoerfert: Would this at all work without the march flag?
				pdhaliwalAuthorUnsubmitted Done Reply Inline Actions This would not work without -march flag as amdgcn is not forward/backward compatible and there is currently no way to detect system gpu arch. pdhaliwal: This would not work without -march flag as amdgcn is not forward/backward compatible and there…
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions The command line invocation openmp needs `-fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906` could be friendlier. Auto-detecting march is convenient in general but probably bad for clang tests as we want them to run on systems that don't have an amdgpu installed. JonChesterfield: The command line invocation openmp needs `-fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp…

				// verify the tools invocations
				// CHECK: clang{{.}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.}}"-x" "c"{{.*}}
				// CHECK: clang{{.}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.}}"-x" "ir"{{.*}}
				// CHECK: clang{{.}}"-cc1"{{.}}"-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu" "-emit-llvm-bc" "-emit-llvm-uselists"{{.}}"-target-cpu" "gfx906" "-fcuda-is-device"{{.}}"-fopenmp" "-fopenmp-cuda-parallel-target-regions"{{.}}"-fopenmp-is-device"{{.}}"-o" {{.}}amdgpu-openmp-toolchain-{{.}}.bc{{.}}"-x" "c"{{.}}amdgpu-openmp-toolchain.c{{.*}}
				// CHECK: llvm-link{{.}}amdgpu-openmp-toolchain-{{.}}.bc" "-o" "{{.}}amdgpu-openmp-toolchain-{{.}}-gfx906-linked-{{.*}}.bc"
				// CHECK: opt{{.}}amdgpu-openmp-toolchain-{{.}}-gfx906-linked-{{.}} "-mtriple=amdgcn-amd-amdhsa" "-mcpu=gfx906" "-o"{{.}}amdgpu-openmp-toolchain-{{.}}-gfx906-optimized-{{.}}.bc"
				// CHECK: llc{{.}}amdgpu-openmp-toolchain-{{.}}-gfx906-optimized-{{.}}.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=gfx906" "-filetype=obj" "-o"{{.}}amdgpu-openmp-toolchain-{{.}}-gfx906-{{.}}.o"
				// CHECK: lld{{.}}"-flavor" "gnu" "--no-undefined" "-shared" "-o"{{.}}amdgpu-openmp-toolchain-{{.}}.out" "{{.}}amdgpu-openmp-toolchain-{{.}}-gfx906-{{.}}.o"
				// CHECK: clang-offload-wrapper{{.}}"-target" "x86_64-unknown-linux-gnu" "-o" "{{.}}a-{{.}}.bc" {{.}}amdgpu-openmp-toolchain-{{.*}}.out"
				// CHECK: clang{{.}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.}}"-o" "{{.}}a-{{.}}.o" "-x" "ir" "{{.}}a-{{.}}.bc"
				// CHECK: ld{{.}}"-o" "a.out"{{.}}"{{.}}amdgpu-openmp-toolchain-{{.}}.o" "{{.}}a-{{.}}.o" "-lomp" "-lomptarget"

				// RUN: %clang -ccc-print-phases -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target -march=gfx906 %s 2>&1 \
				// RUN: \| FileCheck --check-prefix=CHECK-PHASES %s
				// phases
				// CHECK-PHASES: 0: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (host-openmp)
				// CHECK-PHASES: 1: preprocessor, {0}, cpp-output, (host-openmp)
				// CHECK-PHASES: 2: compiler, {1}, ir, (host-openmp)
				// CHECK-PHASES: 3: backend, {2}, assembler, (host-openmp)
				// CHECK-PHASES: 4: assembler, {3}, object, (host-openmp)
				// CHECK-PHASES: 5: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (device-openmp)
				// CHECK-PHASES: 6: preprocessor, {5}, cpp-output, (device-openmp)
				// CHECK-PHASES: 7: compiler, {6}, ir, (device-openmp)
				// CHECK-PHASES: 8: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (amdgcn-amd-amdhsa)" {7}, ir
				// CHECK-PHASES: 9: linker, {8}, image, (device-openmp)
				// CHECK-PHASES: 10: offload, "device-openmp (amdgcn-amd-amdhsa)" {9}, image
				// CHECK-PHASES: 11: clang-offload-wrapper, {10}, ir, (host-openmp)
				// CHECK-PHASES: 12: backend, {11}, assembler, (host-openmp)
				// CHECK-PHASES: 13: assembler, {12}, object, (host-openmp)
				// CHECK-PHASES: 14: linker, {4, 13}, image, (host-openmp)