This is an archive of the discontinued LLVM Phabricator instance.

Differential D128752

[CUDA] Stop adding CUDA features twice
Closed, Public

Authored by jhuber6 on Jun 28 2022, 12:24 PM.

Details

Reviewers

jdoerfert
tra
yaxunl

Commits

rG56ab966a04dd: [CUDA] Stop adding CUDA features twice

Summary

We currently call the getNVPTXTargetFeatures function in two places: inside
the CUDA toolchain and inside Clang at the standard entry point.
We normally add features to the job in Clang, so the call inside the
CUDA toolchain is redundant and results in the +ptx features being added twice.
Since we remove this call, we no longer will have a cached CUDA
installation so we will usually create it twice.
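
To illustrate the duplication described in this summary, here is a minimal, self-contained C++ sketch. It is not the driver code: addPtxFeature, appendFeatureArgs, and buildCC1Args are hypothetical stand-ins, and "+ptx75" is only an example feature string. It models nothing more than two call sites appending the same feature to one argument list.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the feature helper: appends one PTX feature.
void addPtxFeature(std::vector<std::string> &Features) {
  Features.push_back("+ptx75"); // example value only
}

// Appends "-target-feature <F>" pairs, the way both call sites do.
void appendFeatureArgs(std::vector<std::string> &CC1Args) {
  std::vector<std::string> Features;
  addPtxFeature(Features);
  for (const std::string &F : Features) {
    CC1Args.push_back("-target-feature");
    CC1Args.push_back(F);
  }
}

// Simplified model of argument construction. ToolchainAlsoAdds = true models
// the behavior before this patch; false models the behavior after it.
std::vector<std::string> buildCC1Args(bool ToolchainAlsoAdds) {
  std::vector<std::string> CC1Args;
  if (ToolchainAlsoAdds)
    appendFeatureArgs(CC1Args); // redundant call in the CUDA toolchain
  appendFeatureArgs(CC1Args);   // call from the standard Clang entry point
  return CC1Args;
}
```

In this model, buildCC1Args(true) yields the "-target-feature" pair twice, while buildCC1Args(false) yields it once.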

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. Jun 28 2022, 12:24 PM

Herald added a project: Restricted Project. Jun 28 2022, 12:24 PM

Herald added subscribers: mattd, carlosgalvezp.

jhuber6 requested review of this revision. Jun 28 2022, 12:24 PM

Herald added subscribers: cfe-commits, MaskRay.

we no longer will have a cached CUDA installation so we will usually create it twice.

Does that result in extra output if we find an unexpected CUDA version, or when the compiler is run with -v?

We may want to wrap installation detector creation into some sort of singleton creation function.

In D128752#3616553, @tra wrote:

we no longer will have a cached CUDA installation so we will usually create it twice.

Does that result in extra output if we find an unexpected CUDA version, or when the compiler is run with -v?

We may want to wrap installation detector creation into some sort of singleton creation function.

We already create another one for the call coming from Clang; this patch just gets rid of the other call. I think the printouts you're talking about come from the variable in the CUDA toolchain specifically. Here we simply create one to get the version and throw it away. It's not ideal to do the same work twice, so we could wrap this into some singleton interface, maybe a static optional value inside the ToolChains/Cuda.cpp file with a getter that returns a reference to it. Though I don't think this is likely to be a bottleneck.

In D128752#3616579, @jhuber6 wrote:

In D128752#3616553, @tra wrote:

we no longer will have a cached CUDA installation so we will usually create it twice.

Does that result in extra output if we find an unexpected CUDA version, or when the compiler is run with -v?

We may want to wrap installation detector creation into some sort of singleton creation function.

We already create another one for the call coming from Clang; this patch just gets rid of the other call. I think the printouts you're talking about come from the variable in the CUDA toolchain specifically. Here we simply create one to get the version and throw it away. It's not ideal to do the same work twice, so we could wrap this into some singleton interface, maybe a static optional value inside the ToolChains/Cuda.cpp file with a getter that returns a reference to it. Though I don't think this is likely to be a bottleneck.

We have already heard complaints that searching for a CUDA installation in multiple places adds a measurable delay when the search hits NFS-mounted directories.

Replacing uses of CudaInstallation with a getter function returning a reference to a singleton would be great.
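
The getter idea discussed here could look roughly like the following sketch. It assumes C++17 for std::optional. InstallationInfo, detectInstallation, getInstallation, and DetectionRuns are hypothetical names, not the driver's API, and the version string is a placeholder. The real CudaInstallationDetector is constructed from a Driver, a Triple, and an ArgList, so an actual cache would have to account for those inputs.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the result of an installation search.
struct InstallationInfo {
  std::string Version;
};

// Counts how often the (potentially slow) detection actually runs.
int DetectionRuns = 0;

// Hypothetical stand-in for the expensive filesystem search.
InstallationInfo detectInstallation() {
  ++DetectionRuns;
  return InstallationInfo{"11.5"}; // placeholder value
}

// Getter that runs detection on first use and returns a reference to the
// cached result on every later call.
const InstallationInfo &getInstallation() {
  static std::optional<InstallationInfo> Cached;
  if (!Cached)
    Cached = detectInstallation();
  return *Cached;
}
```

Every caller of getInstallation() then shares one detection result. The check-and-assign on Cached is not synchronized, so this sketch is only suitable for single-threaded use.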

In D128752#3616675, @tra wrote:

We have already heard complaints that searching for a CUDA installation in multiple places adds a measurable delay when the search hits NFS-mounted directories.

Replacing uses of CudaInstallation with a getter function returning a reference to a singleton would be great.

Sounds good, I'll make a patch for it. Considering that this one doesn't change anything we don't already do on that front, is it good to land separately?

Harbormaster completed remote builds in B172555: Diff 440720. Jun 28 2022, 1:41 PM

Do we have tests that verify -target-feature arguments? It may be worth adding a test case there checking for redundant features.

This revision is now accepted and ready to land. Jun 28 2022, 1:52 PM

In D128752#3616831, @tra wrote:

Do we have tests that verify -target-feature arguments? It may be worth adding a test case there checking for redundant features.

Yeah, we have some existing tests that check that the target features are included at least once. I felt like there was no need to include a test for what was more or less an oversight.

In D128752#3616837, @jhuber6 wrote:

In D128752#3616831, @tra wrote:

Do we have tests that verify -target-feature arguments? It may be worth adding a test case there checking for redundant features.

Yeah, we have some existing tests that check that the target features are included at least once. I felt like there was no need to include a test for what was more or less an oversight.

The test helps a lot to illustrate what the patch does. There are enough moving parts in the driver that, while I do believe that what the patch description says is intended to be true, I would like to see specific evidence that it's indeed the case.
Think of it as a test case that should have been added when we started passing the feature flags.
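
One way to make the kind of check requested here concrete is a small helper that scans a cc1-style argument list for repeated -target-feature values. This is a standalone C++ sketch, not an LLVM lit test, and findDuplicateTargetFeatures is a hypothetical name.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Returns every value that follows "-target-feature" more than once in Args.
std::vector<std::string>
findDuplicateTargetFeatures(const std::vector<std::string> &Args) {
  std::map<std::string, int> Counts;
  for (std::size_t I = 0; I + 1 < Args.size(); ++I) {
    if (Args[I] == "-target-feature") {
      ++Counts[Args[I + 1]];
      ++I; // skip the value we just counted
    }
  }
  std::vector<std::string> Duplicates;
  for (const auto &Entry : Counts)
    if (Entry.second > 1)
      Duplicates.push_back(Entry.first);
  return Duplicates;
}
```

An empty result means no feature was passed twice; a non-empty result lists the redundant features.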

Closed by commit rG56ab966a04dd: [CUDA] Stop adding CUDA features twice (authored by jhuber6). Jun 29 2022, 6:34 AM

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG56ab966a04dd: [CUDA] Stop adding CUDA features twice.

Revision Contents

Path                                   Size
clang/lib/Driver/ToolChains/Cuda.h     3 lines
clang/lib/Driver/ToolChains/Cuda.cpp   14 lines

Diff 440987

clang/lib/Driver/ToolChains/Cuda.h

(120 lines not shown)
 public:
   void ConstructJob(Compilation &C, const JobAction &JA,
                     const InputInfo &Output, const InputInfoList &Inputs,
                     const llvm::opt::ArgList &TCArgs,
                     const char *LinkingOutput) const override;
 };
 
 void getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
                             const llvm::opt::ArgList &Args,
-                            std::vector<StringRef> &Features,
-                            Optional<clang::CudaVersion> Version = None);
+                            std::vector<StringRef> &Features);
 
 } // end namespace NVPTX
 } // end namespace tools
 
 namespace toolchains {
 
 class LLVM_LIBRARY_VISIBILITY CudaToolChain : public ToolChain {
 public:
(76 lines not shown)

clang/lib/Driver/ToolChains/Cuda.cpp

(630 lines not shown)
   C.addCommand(std::make_unique<Command>(
       JA, *this,
       ResponseFileSupport{ResponseFileSupport::RF_Full, llvm::sys::WEM_UTF8,
                           "--options-file"},
       Exec, CmdArgs, Inputs, Output));
 }
 
 void NVPTX::getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
                                    const llvm::opt::ArgList &Args,
-                                   std::vector<StringRef> &Features,
-                                   Optional<clang::CudaVersion> Version) {
+                                   std::vector<StringRef> &Features) {
   if (Args.hasArg(options::OPT_cuda_feature_EQ)) {
     StringRef PtxFeature =
         Args.getLastArgValue(options::OPT_cuda_feature_EQ, "+ptx42");
     Features.push_back(Args.MakeArgString(PtxFeature));
     return;
-  } else if (!Version) {
-    CudaInstallationDetector CudaInstallation(D, Triple, Args);
-    Version = CudaInstallation.version();
   }
+  CudaInstallationDetector CudaInstallation(D, Triple, Args);
 
   // New CUDA versions often introduce new instructions that are only supported
   // by new PTX version, so we need to raise PTX level to enable them in NVPTX
   // back-end.
   const char *PtxFeature = nullptr;
-  switch (*Version) {
+  switch (CudaInstallation.version()) {
 #define CASE_CUDA_VERSION(CUDA_VER, PTX_VER) \
   case CudaVersion::CUDA_##CUDA_VER: \
     PtxFeature = "+ptx" #PTX_VER; \
     break;
     CASE_CUDA_VERSION(115, 75);
     CASE_CUDA_VERSION(114, 74);
     CASE_CUDA_VERSION(113, 73);
     CASE_CUDA_VERSION(112, 72);
(77 lines not shown)
   if (LibDeviceFile.empty()) {
     return;
   }
 
   CC1Args.push_back("-mlink-builtin-bitcode");
   CC1Args.push_back(DriverArgs.MakeArgString(LibDeviceFile));
 
   clang::CudaVersion CudaInstallationVersion = CudaInstallation.version();
 
-  std::vector<StringRef> Features;
-  NVPTX::getNVPTXTargetFeatures(getDriver(), getTriple(), DriverArgs, Features,
-                                CudaInstallationVersion);
-  for (StringRef PtxFeature : Features)
-    CC1Args.append({"-target-feature", DriverArgs.MakeArgString(PtxFeature)});
   if (DriverArgs.hasFlag(options::OPT_fcuda_short_ptr,
                          options::OPT_fno_cuda_short_ptr, false))
     CC1Args.append({"-mllvm", "--nvptx-short-ptr"});
 
   if (CudaInstallationVersion >= CudaVersion::UNKNOWN)
     CC1Args.push_back(
         DriverArgs.MakeArgString(Twine("-target-sdk-version=") +
                                  CudaVersionToString(CudaInstallationVersion)));
(168 lines not shown)