
Details

Reviewers

jdoerfert
JonChesterfield
ronlieb
arsenm
yaxunl
tianshilei1992
ye-luo

Commits

rG194ec844f5c6: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU

Summary

Previously, we linked in the ROCm device libraries, which provide math
and other utility functions, late. This is not strictly correct, as these
libraries contain several flags that are only set per-TU, such as fast
math or denormalization. This patch instead passes the bitcode libraries
per-TU, using the same method we use for the CUDA libraries. This has
the advantage that attributes are propagated correctly, making the
implementation more correct. Additionally, many unused functions were
not being fully removed during LTO, which led to erroneous warning
messages and remarks on unused functions.

I am not sure whether failing to find these libraries should be a hard
error. Let me know if it should be demoted to a warning saying that some
device utilities will not work without them.
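
The per-TU scheme described above amounts to emitting one cc1 flag pair per device library for every translation unit. The following is a minimal, self-contained sketch of that idea, not the patch itself: `BitcodeLib` and `deviceLibArgs` are hypothetical stand-ins for clang's `BitCodeLibraryInfo` and the driver code, and only the two flag names are taken from the diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for clang's ToolChain::BitCodeLibraryInfo.
struct BitcodeLib {
  std::string Path;
  bool ShouldInternalize;
};

// Build the per-TU cc1 arguments that link each device library into the
// translation unit being compiled. Returns nothing when -nogpulib is set.
std::vector<std::string> deviceLibArgs(const std::vector<BitcodeLib> &Libs,
                                       bool NoGpuLib) {
  std::vector<std::string> Args;
  if (NoGpuLib)
    return Args;
  for (const BitcodeLib &Lib : Libs) {
    assert(!Lib.Path.empty() && "device library path must be set");
    // Internalized libraries use -mlink-builtin-bitcode; others are linked
    // as ordinary bitcode files.
    Args.push_back(Lib.ShouldInternalize ? "-mlink-builtin-bitcode"
                                         : "-mlink-bitcode-file");
    Args.push_back(Lib.Path);
  }
  return Args;
}
```

Because the flags are attached to each compile job, per-TU settings such as fast math are in effect when the library code is merged, which is the property the summary is after.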

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. · Sep 12 2022, 2:25 PM

Herald added a project: Restricted Project. · Sep 12 2022, 2:25 PM

Herald added subscribers: kosarev, kerbowa, guansong and 5 others.

jhuber6 requested review of this revision. · Sep 12 2022, 2:25 PM

Herald added a project: Restricted Project. · Sep 12 2022, 2:25 PM

Herald added subscribers: cfe-commits, sstefan1, MaskRay, wdng.

Harbormaster completed remote builds in B186227: Diff 459558. · Sep 12 2022, 3:00 PM

We can do this, but we should expect an increase in code size from having multiple internalised copies of the same function. There may be an incidental benefit if we can specialise some functions to the call site without additional cloning. The addresses of the same function taken from different TUs will be unequal, which is wrong, but probably doesn't matter in practice.

It does have the major advantage that mlink-builtin-bitcode patches up the invalid IR on the fly, which is likely easier than changing the device libs or making IR mcpu-agnostic.
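
The address concern can be illustrated in a single file. This is only a sketch under stated assumptions: each namespace stands in for one translation unit that received its own internalized copy, and `lib_scale` is a hypothetical library function, not something from the device libraries.

```cpp
#include <cassert>

using Fn = float (*)(float);

// "TU A": its private, internalized copy of the library function.
namespace tu_a {
static float lib_scale(float X) { return 2.0f * X; }
inline Fn address() { return &lib_scale; }
} // namespace tu_a

// "TU B": a second, identical copy of the same function.
namespace tu_b {
static float lib_scale(float X) { return 2.0f * X; }
inline Fn address() { return &lib_scale; }
} // namespace tu_b

// The copies behave identically but are distinct functions, so their
// addresses compare unequal.
inline void checkCopies() {
  assert(tu_a::address() != tu_b::address());
  assert(tu_a::address()(1.5f) == tu_b::address()(1.5f));
}
```

Code that compares function pointers across TUs would observe the difference; code that only calls the function would not.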

JonChesterfield added inline comments. · Sep 12 2022, 3:41 PM

clang/lib/Driver/ToolChains/AMDGPUOpenMP.h
58	Why hip device libs? There's a common set, plus a hip.bc, plus an opencl.bc. I'd expect us to want only the common set. Hip.bc shouldn't have non-hip stuff in it.

jhuber6 added inline comments. · Sep 12 2022, 3:41 PM

clang/lib/Driver/ToolChains/AMDGPUOpenMP.h
58	Existing virtual function, just re-used it.

In D133726#3785040, @JonChesterfield wrote:

We can do this, but we should expect an increase in code size from having multiple internalised copies of the same function. There may be an incidental benefit if we can specialise some functions to the call site without additional cloning. The addresses of the same function taken from different TUs will be unequal, which is wrong, but probably doesn't matter in practice.

It does have the major advantage that mlink-builtin-bitcode patches up the invalid IR on the fly, which is likely easier than changing the device libs or making IR mcpu-agnostic.

It will probably decrease code size in the final executable now that this will forcefully internalize all the protected functions in ocml.bc that were sticking around because LTO couldn't remove them due to visibility.

JonChesterfield added inline comments. · Sep 12 2022, 9:54 PM

clang/lib/Driver/ToolChains/AMDGPUOpenMP.h
58	Rename it rocm perhaps?

Renaming virtual function to make it more generic.

Harbormaster completed remote builds in B186346: Diff 459716. · Sep 13 2022, 5:26 AM

Adding a test for using -nogpulib.

Harbormaster completed remote builds in B186352: Diff 459721. · Sep 13 2022, 6:26 AM

yaxunl added inline comments. · Sep 13 2022, 7:51 AM

clang/lib/Driver/ToolChains/AMDGPU.cpp
717–723	Can we emit an error if both -march and -mcpu are specified and the values are different? This is a potential source of error.

jhuber6 added inline comments. · Sep 13 2022, 8:10 AM

clang/lib/Driver/ToolChains/AMDGPU.cpp
717–723	I'm not exactly sure about the semantics for this, maybe someone else can help. AFAIK `-mcpu` is sometimes treated as an alias of `-mtune` which is generally implied by `-march`. But having `-march` and `-mtune` state different things isn't technically disallowed. `-march` seems to imply that we generate code that can run on a certain architecture. Since these are in some respects equivalent it would probably be fine to just take the last one specified.
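
The policy requested above (diagnose a mismatch between the two flags) can be sketched independently of the driver. This is a hypothetical helper, not code from the patch: `ArchChoice` and `resolveGpuArch` are invented names, an empty string stands for "flag not given", and the `gfx` values are only sample inputs.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type; not part of clang.
struct ArchChoice {
  std::string Arch; // empty when neither flag was given, or on conflict
  bool Conflict;    // true when both flags are given and disagree
};

// Prefer -mcpu, fall back to -march, and report a conflict when both are
// present with different values.
ArchChoice resolveGpuArch(const std::string &Mcpu, const std::string &March) {
  if (!Mcpu.empty() && !March.empty() && Mcpu != March)
    return {"", true};
  return {!Mcpu.empty() ? Mcpu : March, false};
}

// Usage check for the cases discussed in the thread.
inline void checkResolveGpuArch() {
  assert(resolveGpuArch("gfx90a", "").Arch == "gfx90a");
  assert(resolveGpuArch("", "gfx908").Arch == "gfx908");
  assert(resolveGpuArch("gfx90a", "gfx908").Conflict);
  assert(!resolveGpuArch("gfx90a", "gfx90a").Conflict);
}
```

A last-one-wins policy, the alternative mentioned in the reply, would instead need the argument order, which this sketch deliberately does not model.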

Does this fix the weird behavior where you needed to use -lm to use anything in the device libraries? I don't see that being removed.

In D133726#3786607, @arsenm wrote:

Does this fix the weird behavior where you needed to use -lm to use anything in the device libraries? I don't see that being removed.

That was removed earlier, when these files were just sent directly to the link job; you can see that removed from this patch.

yaxunl added inline comments. · Sep 13 2022, 8:28 AM

clang/lib/Driver/ToolChains/AMDGPU.cpp
717–723	Some toolchains use -march to specify the target processor, and some use -mcpu. The AMDGPU toolchain has used -mcpu from the beginning, for OpenCL. We then added HIP support and had the HIPAMD toolchain adopt -mcpu as well. It is better to have a consistent way of specifying the target processor across AMDGPU toolchains. The HIPAMD toolchain does this at https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/HIPAMD.cpp#L275 . Could the AMDGPUOpenMP toolchain make a similar change? Then we would have a consistent way to specify the target processor for AMDGPU toolchains.

jhuber6 added inline comments. · Sep 13 2022, 8:39 AM

clang/lib/Driver/ToolChains/AMDGPU.cpp
717–723	The rest of the OpenMP toolchain uses `-march` so I don't see a compelling reason to change it. If this is a stopper I'll just remove the changes to this function and check `-march` directly in the new `getROCmDeviceLibs` function for the OpenMP toolchain.

yaxunl added inline comments. · Sep 13 2022, 9:29 AM

clang/lib/Driver/ToolChains/AMDGPU.cpp
717–723	There are a bunch of places in AMDGPU and HIPAMD toolchains that check the `-mcpu` option. If we allow -march as an alternative way to specify the target processor for the AMDGPU toolchain, it will introduce inconsistency and probably incur more changes. Probably leaving this code in AMDGPUOpenMP toolchain is a better choice for now.

Using -march for OpenMP and -mcpu for HIP seems OK. Notably, llc needs to be told this as well, using -mcpu, which may be an issue for -save-temps.

Removing use of getGPUArch and just using -march directly for OpenMP

Harbormaster completed remote builds in B186397: Diff 459786. · Sep 13 2022, 10:37 AM

yaxunl added inline comments. · Sep 13 2022, 12:49 PM

clang/lib/Driver/ToolChains/AMDGPU.cpp
720–722	It seems the code is still there.

jhuber6 added inline comments. · Sep 13 2022, 12:50 PM

clang/lib/Driver/ToolChains/AMDGPU.cpp
720–722	Sorry, I thought I had reverted this file. I'll fix it.

Removing old function update for 'mcpu'

Harbormaster completed remote builds in B186435: Diff 459848. · Sep 13 2022, 1:31 PM

yaxunl added inline comments. · Sep 13 2022, 1:44 PM

clang/include/clang/Driver/ToolChain.h
719	The HIPSPV toolchain is not implemented on the ROCm platform. We probably need to make it even more generic. How about getDeviceLibs? Also, you need to update the comment.

Changing interface to getAMDGPUDeviceLibs.

Harbormaster completed remote builds in B186453: Diff 459874. · Sep 13 2022, 2:59 PM

yaxunl added inline comments. · Sep 13 2022, 3:39 PM

clang/include/clang/Driver/ToolChain.h
719	Well, the HIPSPV toolchain is not for AMDGPU. My thinking is that this API is to get device libraries for the toolchain, not for a specific target or platform.

jhuber6 added inline comments. · Sep 13 2022, 3:48 PM

clang/include/clang/Driver/ToolChain.h
719	I guess the problem is that this option can't be totally generic since it's in the top-level `Toolchain` namespace. Otherwise it would get confused with CUDA's device library. What's a good common name to use for it?

yaxunl added inline comments. · Sep 13 2022, 4:06 PM

clang/include/clang/Driver/ToolChain.h
719	This API returns device libs for the toolchain. For the AMDGPU, HIPAMD, and AMDGPUOpenMP toolchains, it returns device libs for AMDGPU. For the HIPSPV toolchain, it returns device libs for HIP for Intel GPUs. If a CUDA toolchain decides to support this API, it will return device libs for CUDA. If a toolchain needs to return device libs for multiple targets, we can add parameters to control that, although so far there is no such need. Therefore, I don't see an issue with calling it getDeviceLibs.

Changing to getDeviceLibs. I suppose in the future we could make this work for CUDA, but for now it won't be defined for that toolchain so it's fine.
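
The design agreed on here, a single toolchain-level hook with an empty default that each toolchain overrides, can be sketched as follows. All class names and the `.bc` file names are hypothetical placeholders chosen for illustration; only the method name `getDeviceLibs` comes from the discussion.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature of the driver hierarchy.
struct ToolChainSketch {
  virtual ~ToolChainSketch() = default;
  // Default: toolchains that do not opt in report no device libraries.
  virtual std::vector<std::string> getDeviceLibs() const { return {}; }
};

struct AmdgpuOpenMPSketch : ToolChainSketch {
  std::vector<std::string> getDeviceLibs() const override {
    return {"ocml.bc"}; // placeholder AMDGPU library name
  }
};

struct HipSpvSketch : ToolChainSketch {
  std::vector<std::string> getDeviceLibs() const override {
    return {"hipspv-placeholder.bc"}; // placeholder, not a real file name
  }
};

// Callers go through the base interface and never name a target.
inline void checkDeviceLibs() {
  ToolChainSketch Base;
  AmdgpuOpenMPSketch Amd;
  const ToolChainSketch &Ref = Amd;
  assert(Base.getDeviceLibs().empty());
  assert(Ref.getDeviceLibs().size() == 1);
}
```

A toolchain that leaves the hook undefined, as CUDA does after this change, simply inherits the empty default.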

Harbormaster completed remote builds in B186482: Diff 459911. · Sep 13 2022, 4:39 PM

I don't like this but will concede it's quicker than changing device libs to contain IR that doesn't have to be patched on the fly.

This revision is now accepted and ready to land. · Sep 14 2022, 6:43 AM

Closed by commit rG194ec844f5c6: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU (authored by jhuber6). · Sep 14 2022, 7:42 AM

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG194ec844f5c6: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU.

Diff 459786

clang/include/clang/Driver/ToolChain.h

[710 lines hidden; context: virtual void AddIAMCUIncludeArgs(const llvm::opt::ArgList &DriverArgs,]
                                    llvm::opt::ArgStringList &CC1Args) const;

   /// On Windows, returns the MSVC compatibility version.
   virtual VersionTuple computeMSVCVersion(const Driver *D,
                                           const llvm::opt::ArgList &Args) const;

   /// Get paths of HIP device libraries.
   virtual llvm::SmallVector<BitCodeLibraryInfo, 12>
-  getHIPDeviceLibs(const llvm::opt::ArgList &Args) const;
+  getROCmDeviceLibs(const llvm::opt::ArgList &Args) const;

   /// Add the system specific linker arguments to use
   /// for the given HIP runtime library type.
   virtual void AddHIPRuntimeLibArgs(const llvm::opt::ArgList &Args,
                                     llvm::opt::ArgStringList &CmdArgs) const {}

   /// Return sanitizers which are available in this toolchain.
   virtual SanitizerMask getSupportedSanitizers() const;
[58 lines hidden]

clang/lib/Driver/ToolChain.cpp

[1,093 lines hidden]

 void ToolChain::AddCudaIncludeArgs(const ArgList &DriverArgs,
                                    ArgStringList &CC1Args) const {}

 void ToolChain::AddHIPIncludeArgs(const ArgList &DriverArgs,
                                   ArgStringList &CC1Args) const {}

 llvm::SmallVector<ToolChain::BitCodeLibraryInfo, 12>
-ToolChain::getHIPDeviceLibs(const ArgList &DriverArgs) const {
+ToolChain::getROCmDeviceLibs(const ArgList &DriverArgs) const {
   return {};
 }

 void ToolChain::AddIAMCUIncludeArgs(const ArgList &DriverArgs,
                                     ArgStringList &CC1Args) const {}

 static VersionTuple separateMSVCFullVersion(unsigned Version) {
   if (Version < 100)
[206 lines hidden]

clang/lib/Driver/ToolChains/AMDGPU.cpp

[708 lines hidden; context: if (!DriverArgs.hasArg(options::OPT_fvisibility_EQ,]
                          options::OPT_fvisibility_ms_compat)) {
     CC1Args.push_back("-fvisibility=hidden");
     CC1Args.push_back("-fapply-global-visibility-to-externs");
   }
 }

 StringRef
 AMDGPUToolChain::getGPUArch(const llvm::opt::ArgList &DriverArgs) const {
+  if (DriverArgs.hasArg(options::OPT_mcpu_EQ))
     return getProcessorFromTargetID(
         getTriple(), DriverArgs.getLastArgValue(options::OPT_mcpu_EQ));
+  if (DriverArgs.hasArg(options::OPT_march_EQ))
+    return getProcessorFromTargetID(
+        getTriple(), DriverArgs.getLastArgValue(options::OPT_march_EQ));
+  return "";
 }

 AMDGPUToolChain::ParsedTargetIDType
 AMDGPUToolChain::getParsedTargetID(const llvm::opt::ArgList &DriverArgs) const {
   StringRef TargetID = DriverArgs.getLastArgValue(options::OPT_mcpu_EQ);
   if (TargetID.empty())
     return {None, None, None};

[226 lines hidden]

clang/lib/Driver/ToolChains/AMDGPUOpenMP.h

[48 lines hidden; context: void AddIAMCUIncludeArgs(const llvm::opt::ArgList &DriverArgs,]
                            llvm::opt::ArgStringList &CC1Args) const override;

   SanitizerMask getSupportedSanitizers() const override;

   VersionTuple
   computeMSVCVersion(const Driver *D,
                      const llvm::opt::ArgList &Args) const override;

+  llvm::SmallVector<BitCodeLibraryInfo, 12>
+  getROCmDeviceLibs(const llvm::opt::ArgList &Args) const override;
+
   const ToolChain &HostTC;
 };

 } // end namespace toolchains
 } // end namespace driver
 } // end namespace clang

 #endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

[69 lines hidden; context: void AMDGPUOpenMPToolChain::addClangTargetOptions(]

   CC1Args.push_back("-target-cpu");
   CC1Args.push_back(DriverArgs.MakeArgStringRef(GPUArch));
   CC1Args.push_back("-fcuda-is-device");

   if (DriverArgs.hasArg(options::OPT_nogpulib))
     return;

+  for (auto BCFile : getROCmDeviceLibs(DriverArgs)) {
+    CC1Args.push_back(BCFile.ShouldInternalize ? "-mlink-builtin-bitcode"
+                                               : "-mlink-bitcode-file");
+    CC1Args.push_back(DriverArgs.MakeArgString(BCFile.Path));
+  }
+
   // Link the bitcode library late if we're using device LTO.
   if (getDriver().isUsingLTO(/* IsOffload */ true))
     return;

   addOpenMPDeviceRTL(getDriver(), DriverArgs, CC1Args, GPUArch, getTriple());
 }

 llvm::opt::DerivedArgList *AMDGPUOpenMPToolChain::TranslateArgs(
[67 lines hidden; context: SanitizerMask AMDGPUOpenMPToolChain::getSupportedSanitizers() const {]
   return HostTC.getSupportedSanitizers();
 }

 VersionTuple
 AMDGPUOpenMPToolChain::computeMSVCVersion(const Driver *D,
                                           const ArgList &Args) const {
   return HostTC.computeMSVCVersion(D, Args);
 }
+
+llvm::SmallVector<ToolChain::BitCodeLibraryInfo, 12>
+AMDGPUOpenMPToolChain::getROCmDeviceLibs(const llvm::opt::ArgList &Args) const {
+  if (Args.hasArg(options::OPT_nogpulib))
+    return {};
+
+  if (!RocmInstallation.hasDeviceLibrary()) {
+    getDriver().Diag(diag::err_drv_no_rocm_device_lib) << 0;
+    return {};
+  }
+
+  StringRef GpuArch = getProcessorFromTargetID(
+      getTriple(), Args.getLastArgValue(options::OPT_march_EQ));
+
+  SmallVector<BitCodeLibraryInfo, 12> BCLibs;
+  for (auto BCLib : getCommonDeviceLibNames(Args, GpuArch.str(),
+                                            /*IsOpenMP=*/true))
+    BCLibs.emplace_back(BCLib);
+
+  return BCLibs;
+}

clang/lib/Driver/ToolChains/Clang.cpp

[8,361 lines hidden]

 void LinkerWrapper::ConstructJob(Compilation &C, const JobAction &JA,
                                  const InputInfo &Output,
                                  const InputInfoList &Inputs,
                                  const ArgList &Args,
                                  const char *LinkingOutput) const {
   const Driver &D = getToolChain().getDriver();
   const llvm::Triple TheTriple = getToolChain().getTriple();
-  auto OpenMPTCRange = C.getOffloadToolChains<Action::OFK_OpenMP>();
   ArgStringList CmdArgs;

   // Pass the CUDA path to the linker wrapper tool.
   for (Action::OffloadKind Kind : {Action::OFK_Cuda, Action::OFK_OpenMP}) {
     auto TCRange = C.getOffloadToolChains(Kind);
     for (auto &I : llvm::make_range(TCRange.first, TCRange.second)) {
       const ToolChain *TC = I.second;
       if (TC->getTriple().isNVPTX()) {
         CudaInstallationDetector CudaInstallation(D, TheTriple, Args);
         if (CudaInstallation.isValid())
           CmdArgs.push_back(Args.MakeArgString(
               "--cuda-path=" + CudaInstallation.getInstallPath()));
         break;
       }
     }
   }

-  // Get the AMDGPU math libraries.
-  // FIXME: This method is bad, remove once AMDGPU has a proper math library
-  // (see AMDGCN::OpenMPLinker::constructLLVMLinkCommand).
-  for (auto &I : llvm::make_range(OpenMPTCRange.first, OpenMPTCRange.second)) {
-    const ToolChain *TC = I.second;
-
-    if (!TC->getTriple().isAMDGPU() || Args.hasArg(options::OPT_nogpulib))
-      continue;
-
-    const ArgList &TCArgs = C.getArgsForToolChain(TC, "", Action::OFK_OpenMP);
-    StringRef Arch = TCArgs.getLastArgValue(options::OPT_march_EQ);
-    const toolchains::ROCMToolChain RocmTC(TC->getDriver(), TC->getTriple(),
-                                           TCArgs);
-
-    SmallVector<std::string, 12> BCLibs =
-        RocmTC.getCommonDeviceLibNames(TCArgs, Arch.str());
-
-    for (StringRef LibName : BCLibs)
-      CmdArgs.push_back(Args.MakeArgString(
-          "--bitcode-library=" +
-          Action::GetOffloadKindName(Action::OFK_OpenMP) + "-" +
-          TC->getTripleString() + "-" + Arch + "=" + LibName));
-  }
-
   if (D.isUsingLTO(/* IsOffload */ true)) {
     // Pass in the optimization level to use for LTO.
     if (const Arg *A = Args.getLastArg(options::OPT_O_Group)) {
       StringRef OOpt;
       if (A->getOption().matches(options::OPT_O4) ||
           A->getOption().matches(options::OPT_Ofast))
         OOpt = "3";
       else if (A->getOption().matches(options::OPT_O)) {
[78 lines hidden]

clang/lib/Driver/ToolChains/HIPAMD.h

[70 lines hidden; context: public:]
   void AddClangCXXStdlibIncludeArgs(
       const llvm::opt::ArgList &Args,
       llvm::opt::ArgStringList &CC1Args) const override;
   void AddIAMCUIncludeArgs(const llvm::opt::ArgList &DriverArgs,
                            llvm::opt::ArgStringList &CC1Args) const override;
   void AddHIPIncludeArgs(const llvm::opt::ArgList &DriverArgs,
                          llvm::opt::ArgStringList &CC1Args) const override;
   llvm::SmallVector<BitCodeLibraryInfo, 12>
-  getHIPDeviceLibs(const llvm::opt::ArgList &Args) const override;
+  getROCmDeviceLibs(const llvm::opt::ArgList &Args) const override;

   SanitizerMask getSupportedSanitizers() const override;

   VersionTuple
   computeMSVCVersion(const Driver *D,
                      const llvm::opt::ArgList &Args) const override;

   unsigned GetDefaultDwarfVersion() const override { return 5; }
[13 lines hidden]

clang/lib/Driver/ToolChains/HIPAMD.cpp

[240 lines hidden; context: void HIPAMDToolChain::addClangTargetOptions(]
   // Default to "hidden" visibility, as object level linking will not be
   // supported for the foreseeable future.
   if (!DriverArgs.hasArg(options::OPT_fvisibility_EQ,
                          options::OPT_fvisibility_ms_compat)) {
     CC1Args.append({"-fvisibility=hidden"});
     CC1Args.push_back("-fapply-global-visibility-to-externs");
   }

-  for (auto BCFile : getHIPDeviceLibs(DriverArgs)) {
+  for (auto BCFile : getROCmDeviceLibs(DriverArgs)) {
     CC1Args.push_back(BCFile.ShouldInternalize ? "-mlink-builtin-bitcode"
                                                : "-mlink-bitcode-file");
     CC1Args.push_back(DriverArgs.MakeArgString(BCFile.Path));
   }
 }

 llvm::opt::DerivedArgList *
 HIPAMDToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
[69 lines hidden]
 }

 VersionTuple HIPAMDToolChain::computeMSVCVersion(const Driver *D,
                                                  const ArgList &Args) const {
   return HostTC.computeMSVCVersion(D, Args);
 }

 llvm::SmallVector<ToolChain::BitCodeLibraryInfo, 12>
-HIPAMDToolChain::getHIPDeviceLibs(const llvm::opt::ArgList &DriverArgs) const {
+HIPAMDToolChain::getROCmDeviceLibs(const llvm::opt::ArgList &DriverArgs) const {
   llvm::SmallVector<BitCodeLibraryInfo, 12> BCLibs;
   if (DriverArgs.hasArg(options::OPT_nogpulib))
     return {};
   ArgStringList LibraryPaths;

   // Find in --hip-device-lib-path and HIP_LIBRARY_PATH.
   for (auto Path : RocmInstallation.getRocmDeviceLibPathArg())
     LibraryPaths.push_back(DriverArgs.MakeArgString(Path));
[73 lines hidden]

clang/lib/Driver/ToolChains/HIPSPV.h

[63 lines hidden; context: public:]
   void AddClangCXXStdlibIncludeArgs(
       const llvm::opt::ArgList &Args,
       llvm::opt::ArgStringList &CC1Args) const override;
   void AddIAMCUIncludeArgs(const llvm::opt::ArgList &DriverArgs,
                            llvm::opt::ArgStringList &CC1Args) const override;
   void AddHIPIncludeArgs(const llvm::opt::ArgList &DriverArgs,
                          llvm::opt::ArgStringList &CC1Args) const override;
   llvm::SmallVector<BitCodeLibraryInfo, 12>
-  getHIPDeviceLibs(const llvm::opt::ArgList &Args) const override;
+  getROCmDeviceLibs(const llvm::opt::ArgList &Args) const override;

   SanitizerMask getSupportedSanitizers() const override;

   VersionTuple
   computeMSVCVersion(const Driver *D,
                      const llvm::opt::ArgList &Args) const override;

   void adjustDebugInfoKind(codegenoptions::DebugInfoKind &DebugInfoKind,
[23 lines hidden]

clang/lib/Driver/ToolChains/HIPSPV.cpp

Show First 20 Lines • Show All 148 Lines • ▼ Show 20 Lines	void HIPSPVToolChain::addClangTargetOptions(

// Default to "hidden" visibility, as object level linking will not be		// Default to "hidden" visibility, as object level linking will not be
// supported for the foreseeable future.		// supported for the foreseeable future.
if (!DriverArgs.hasArg(options::OPT_fvisibility_EQ,		if (!DriverArgs.hasArg(options::OPT_fvisibility_EQ,
options::OPT_fvisibility_ms_compat))		options::OPT_fvisibility_ms_compat))
CC1Args.append(		CC1Args.append(
{"-fvisibility=hidden", "-fapply-global-visibility-to-externs"});		{"-fvisibility=hidden", "-fapply-global-visibility-to-externs"});

llvm::for_each(getHIPDeviceLibs(DriverArgs),		llvm::for_each(getROCmDeviceLibs(DriverArgs),
[&](const BitCodeLibraryInfo &BCFile) {		[&](const BitCodeLibraryInfo &BCFile) {
CC1Args.append({"-mlink-builtin-bitcode",		CC1Args.append({"-mlink-builtin-bitcode",
DriverArgs.MakeArgString(BCFile.Path)});		DriverArgs.MakeArgString(BCFile.Path)});
});		});
}		}

Tool *HIPSPVToolChain::buildLinker() const {		Tool *HIPSPVToolChain::buildLinker() const {
assert(getTriple().getArch() == llvm::Triple::spirv64);		assert(getTriple().getArch() == llvm::Triple::spirv64);
[...]
if (hipPath.empty()) {
return;		return;
}		}
SmallString<128> P(hipPath);		SmallString<128> P(hipPath);
llvm::sys::path::append(P, "include");		llvm::sys::path::append(P, "include");
CC1Args.append({"-isystem", DriverArgs.MakeArgString(P)});		CC1Args.append({"-isystem", DriverArgs.MakeArgString(P)});
}		}

llvm::SmallVector<ToolChain::BitCodeLibraryInfo, 12>		llvm::SmallVector<ToolChain::BitCodeLibraryInfo, 12>
HIPSPVToolChain::getHIPDeviceLibs(const llvm::opt::ArgList &DriverArgs) const {		HIPSPVToolChain::getROCmDeviceLibs(const llvm::opt::ArgList &DriverArgs) const {
llvm::SmallVector<ToolChain::BitCodeLibraryInfo, 12> BCLibs;		llvm::SmallVector<ToolChain::BitCodeLibraryInfo, 12> BCLibs;
if (DriverArgs.hasArg(options::OPT_nogpulib))		if (DriverArgs.hasArg(options::OPT_nogpulib))
return {};		return {};

ArgStringList LibraryPaths;		ArgStringList LibraryPaths;
// Find device libraries in --hip-device-lib-path and HIP_DEVICE_LIB_PATH.		// Find device libraries in --hip-device-lib-path and HIP_DEVICE_LIB_PATH.
auto HipDeviceLibPathArgs = DriverArgs.getAllArgValues(		auto HipDeviceLibPathArgs = DriverArgs.getAllArgValues(
// --hip-device-lib-path is alias to this option.		// --hip-device-lib-path is alias to this option.

clang/test/Driver/amdgpu-openmp-toolchain.c

	// CHECK-BINDINGS: "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[DEVICE_BC:.+]]"			// CHECK-BINDINGS: "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[DEVICE_BC:.+]]"
	// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[DEVICE_BC]]"], output: "[[BINARY:.+]]"			// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[DEVICE_BC]]"], output: "[[BINARY:.+]]"
	// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]"			// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]"
	// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"			// CHECK-BINDINGS: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"

	// RUN: %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR			// RUN: %clang -### --target=x86_64-unknown-linux-gnu -emit-llvm -S -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-EMIT-LLVM-IR
	// CHECK-EMIT-LLVM-IR: "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-emit-llvm"			// CHECK-EMIT-LLVM-IR: "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-emit-llvm"

	// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -lm --rocm-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode %s 2>&1 | FileCheck %s --check-prefix=CHECK-LIB-DEVICE-NEW			// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx803 \
	// CHECK-LIB-DEVICE-NEW: {{.*}}clang-linker-wrapper{{.*}}--bitcode-library=openmp-amdgcn-amd-amdhsa-gfx803={{.*}}ocml.bc"{{.*}}ockl.bc"{{.*}}oclc_daz_opt_on.bc"{{.*}}oclc_unsafe_math_off.bc"{{.*}}oclc_finite_only_off.bc"{{.*}}oclc_correctly_rounded_sqrt_on.bc"{{.*}}oclc_wavefrontsize64_on.bc"{{.*}}oclc_isa_version_803.bc"			// RUN: --rocm-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode -fopenmp-new-driver %s 2>&1 | \
				// RUN: FileCheck %s --check-prefix=CHECK-LIB-DEVICE
				// CHECK-LIB-DEVICE: "-cc1" {{.*}}ocml.bc"{{.*}}ockl.bc"{{.*}}oclc_daz_opt_on.bc"{{.*}}oclc_unsafe_math_off.bc"{{.*}}oclc_finite_only_off.bc"{{.*}}oclc_correctly_rounded_sqrt_on.bc"{{.*}}oclc_wavefrontsize64_on.bc"{{.*}}oclc_isa_version_803.bc"

				// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx803 -nogpulib \
				// RUN: --rocm-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode -fopenmp-new-driver %s 2>&1 | \
				// RUN: FileCheck %s --check-prefix=CHECK-LIB-DEVICE-NOGPULIB
				// CHECK-LIB-DEVICE-NOGPULIB-NOT: "-cc1" {{.*}}ocml.bc"{{.*}}ockl.bc"{{.*}}oclc_daz_opt_on.bc"{{.*}}oclc_unsafe_math_off.bc"{{.*}}oclc_finite_only_off.bc"{{.*}}oclc_correctly_rounded_sqrt_on.bc"{{.*}}oclc_wavefrontsize64_on.bc"{{.*}}oclc_isa_version_803.bc"
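
For readers skimming the diff, the per-TU linking that the updated test checks can be sketched outside of clang. This is only an illustration, not the driver code: `BitCodeLibraryInfo` here is a simplified stand-in holding just a path, and `buildCC1BitcodeArgs` is a hypothetical helper name. It mirrors the `llvm::for_each` loop in `HIPSPVToolChain::addClangTargetOptions` above, which appends one `-mlink-builtin-bitcode <path>` pair per device library, together with the `-nogpulib` early return in `getROCmDeviceLibs`.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for ToolChain::BitCodeLibraryInfo (assumption: only the
// path matters for this sketch).
struct BitCodeLibraryInfo {
  std::string Path;
};

// Hypothetical helper: build the cc1 arguments that link each device bitcode
// library into the translation unit. Returns nothing when device libraries
// are disabled, as with -nogpulib.
std::vector<std::string>
buildCC1BitcodeArgs(const std::vector<BitCodeLibraryInfo> &Libs,
                    bool NoGpuLib) {
  std::vector<std::string> CC1Args;
  if (NoGpuLib)
    return CC1Args;
  for (const BitCodeLibraryInfo &Lib : Libs) {
    CC1Args.push_back("-mlink-builtin-bitcode");
    CC1Args.push_back(Lib.Path);
  }
  return CC1Args;
}
```

Under these assumptions, two libraries yield four arguments in library order, which is why `CHECK-LIB-DEVICE` expects the `.bc` names on the `-cc1` line, while the `-nogpulib` run expects none of them.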

This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU
Closed, Public

Details

Diff Detail

Event Timeline

Revision Contents

Diff 459786

clang/include/clang/Driver/ToolChain.h

clang/lib/Driver/ToolChain.cpp

clang/lib/Driver/ToolChains/AMDGPU.cpp

clang/lib/Driver/ToolChains/AMDGPUOpenMP.h

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

clang/lib/Driver/ToolChains/Clang.cpp

clang/lib/Driver/ToolChains/HIPAMD.h

clang/lib/Driver/ToolChains/HIPAMD.cpp

clang/lib/Driver/ToolChains/HIPSPV.h

clang/lib/Driver/ToolChains/HIPSPV.cpp

clang/test/Driver/amdgpu-openmp-toolchain.c
