This is an archive of the discontinued LLVM Phabricator instance.

Differential D140158

[CUDA] Allow targeting NVPTX directly without a host toolchain
Closed, Public

Authored by jhuber6 on Dec 15 2022, 1:00 PM.

Details

Reviewers

tra
yaxunl
JonChesterfield
jdoerfert

Commits

rG0660397e6809: [CUDA] Allow targeting NVPTX directly without a host toolchain

Summary

Currently, the NVPTX compilation toolchain can only be invoked either
through CUDA or OpenMP via --offload-device-only. This is because we
cannot build a CUDA toolchain without an accompanying host toolchain for
the offloading. When using --target=nvptx64-nvidia-cuda this results
in generating calls to the GNU assembler and linker, leading to errors.

This patch abstracts the portions of the CUDA toolchain that are
independent of the host toolchain or offloading kind into a new base
class called NVPTXToolChain. We still need to read the host's triple
to build the CUDA installation, so if not present we just assume it will
match the host's system for now, or the user can provide the path
explicitly.

This should allow the compiler driver to create NVPTX device images
directly from C/C++ code.
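
The split described above can be pictured with a minimal sketch. The class and function names below (`ToolChainSketch`, `NVPTXToolChainSketch`, `CudaToolChainSketch`, `makeToolChain`) are simplified stand-ins invented for illustration and are not the real clang driver API; only the base/derived relationship and the nvlink/fatbinary choice are taken from the patch.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Simplified stand-in for the driver's ToolChain interface (hypothetical).
struct ToolChainSketch {
  virtual ~ToolChainSketch() = default;
  virtual std::string linkerName() const = 0;
  virtual bool needsHostToolChain() const = 0;
};

// Base class: the parts that do not depend on a host toolchain or an
// offloading kind. Links device objects with nvlink.
struct NVPTXToolChainSketch : ToolChainSketch {
  std::string linkerName() const override { return "nvlink"; }
  bool needsHostToolChain() const override { return false; }
};

// Offloading variant layered on top of the base class. It bundles device
// code with fatbinary and is tied to a host toolchain.
struct CudaToolChainSketch : NVPTXToolChainSketch {
  std::string linkerName() const override { return "fatbinary"; }
  bool needsHostToolChain() const override { return true; }
};

// Hypothetical selection helper: direct NVPTX targeting gets the base class,
// CUDA offloading gets the derived one.
std::unique_ptr<ToolChainSketch> makeToolChain(bool isOffloading) {
  if (isOffloading)
    return std::make_unique<CudaToolChainSketch>();
  return std::make_unique<NVPTXToolChainSketch>();
}
```

With this shape, a direct `--target=nvptx64-nvidia-cuda` compile only ever touches the base class, while the CUDA path keeps its existing behaviour through overrides.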

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. Dec 15 2022, 1:00 PM

Herald added a project: Restricted Project. Dec 15 2022, 1:00 PM

Herald added subscribers: mattd, gchakrabarti, carlosgalvezp, asavonic.

jhuber6 requested review of this revision. Dec 15 2022, 1:00 PM

Herald added a reviewer: jdoerfert. Dec 15 2022, 1:00 PM

Herald added a project: Restricted Project.

Herald added subscribers: cfe-commits, sstefan1, MaskRay.

LGTM overall.

So, essentially, the patch refactors what we've already been doing for OpenMP and makes it usable manually, which will be useful for things like GPU-side libc tests.

clang/lib/Driver/ToolChains/Cuda.cpp
636–637	Nit: I'd just do it within the `make_unique` below. It should end up on a line by itself and should not hurt readability.

I just realized the method of copying the .o to a .cubin doesn't work if the link step is done in the same compilation because it doesn't exist yet. To fix this I could either make the tool chain emit .cubin if we're going straight to nvlink, or use a symbolic link. The former is uglier, the latter will probably only work on Linux.

Also do you think I should include the CUDA headers in with this? We can always get rid of them with nogpuinc or similar if they're not needed. The AMDGPU compilation still links in the device libraries so I'm wondering if we should at least be consistent.

In D140158#3999716, @jhuber6 wrote:

Also do you think I should include the CUDA headers in with this? We can always get rid of them with nogpuinc or similar if they're not needed. The AMDGPU compilation still links in the device libraries so I'm wondering if we should at least be consistent.

For OpenMP at least we want this mode to include the device runtime and the respective native runtimes as well (except if we do LTO, then both are included late).

In D140158#3999716, @jhuber6 wrote:

I just realized the method of copying the .o to a .cubin doesn't work if the link step is done in the same compilation because it doesn't exist yet. To fix this I could either make the tool chain emit .cubin if we're going straight to nvlink, or use a symbolic link. The former is uglier, the latter will probably only work on Linux.

Does it have to be .cubin ? Does nvlink require it?

Also do you think I should include the CUDA headers in with this? We can always get rid of them with nogpuinc or similar if they're not needed. The AMDGPU compilation still links in the device libraries so I'm wondering if we should at least be consistent.

Only some CUDA headers can be used from C++ and they tend to contain host-only declarations, and stubs or CPU implementations of GPU-side functions in that case.
The rest rely on CUDA extensions and attributes. So, in this case, doing a C++ compilation will give you a host-side view of the headers in the best case, which is not going to be particularly useful.

If we want to make GPU-side functions available, we'll need to have a pre-included wrapper with more preprocessor magic to make CUDA headers usable. Doing that in a C++ compilation would be questionable as for a C++ compilation a user would normally assume that no headers are pre-included by the compiler.

In D140158#3999783, @tra wrote:

In D140158#3999716, @jhuber6 wrote:

I just realized the method of copying the .o to a .cubin doesn't work if the link step is done in the same compilation because it doesn't exist yet. To fix this I could either make the tool chain emit .cubin if we're going straight to nvlink, or use a symbolic link. The former is uglier, the latter will probably only work on Linux.

Does it have to be .cubin ? Does nvlink require it?

Yes, it's one of the dumbest things in the CUDA toolchain. If the filename is a .o they assume it's a host file containing embedded RDC-code that needs to be combined. If it's a .cubin they assume it's device code. I have no clue why they couldn't just check the ELF flags, it's trivial.
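
As a rough illustration of the kind of ELF check mentioned here, the sketch below inspects the `e_machine` field of an ELF header instead of the file extension. It assumes the standard ELF layout (magic in bytes 0 to 3, `EI_DATA` at byte 5, `e_machine` at offset 18) and the registered `EM_CUDA` value of 190. The function name `looksLikeCudaElf` is hypothetical, and nothing here describes what nvlink actually does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ELF e_machine value registered for NVIDIA CUDA.
constexpr std::uint16_t kEmCuda = 190;

// Returns true if `bytes` starts with an ELF header whose e_machine field
// identifies a CUDA device object. Hypothetical helper for illustration.
bool looksLikeCudaElf(const std::vector<std::uint8_t> &bytes) {
  // Need at least the identification bytes plus e_type and e_machine.
  if (bytes.size() < 20)
    return false;
  // Check the ELF magic: 0x7f 'E' 'L' 'F'.
  if (bytes[0] != 0x7f || bytes[1] != 'E' || bytes[2] != 'L' || bytes[3] != 'F')
    return false;
  // EI_DATA (byte 5): 1 = little endian, 2 = big endian.
  std::uint16_t machine = 0;
  if (bytes[5] == 1)
    machine = static_cast<std::uint16_t>(bytes[18] | (bytes[19] << 8));
  else if (bytes[5] == 2)
    machine = static_cast<std::uint16_t>((bytes[18] << 8) | bytes[19]);
  else
    return false;
  return machine == kEmCuda;
}
```

A host object with embedded device code would report the host's machine type (for example x86-64) in the same field, so the two cases could be told apart without looking at the filename.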

Also do you think I should include the CUDA headers in with this? We can always get rid of them with nogpuinc or similar if they're not needed. The AMDGPU compilation still links in the device libraries so I'm wondering if we should at least be consistent.

Only some CUDA headers can be used from C++ and they tend to contain host-only declarations, and stubs or CPU implementations of GPU-side functions in that case.
The rest rely on CUDA extensions and attributes. So, in this case, doing a C++ compilation will give you a host-side view of the headers in the best case, which is not going to be particularly useful.

If we want to make GPU-side functions available, we'll need to have a pre-included wrapper with more preprocessor magic to make CUDA headers usable. Doing that in a C++ compilation would be questionable as for a C++ compilation a user would normally assume that no headers are pre-included by the compiler.

We might want to at least include the libdevice files, most of our wrappers definitely won't work without CUDA or OpenMP language modes.

If we do magic header including, we should check for the freestanding argument and not include them with that set. I would prefer we not include cuda headers into C++ source that isn't being compiled as cuda, and also not link in misc cuda library files.

Anyone who has decided to compile raw c or c++ to ptx is doing something off the beaten path, I don't think we should assume they want implicit behaviour from other programming models thrown in.

In D140158#3999810, @JonChesterfield wrote:

I don't think we should assume they want implicit behaviour from other programming models thrown in.

Agreed. Also, removing things is often surprisingly hard.
Let's keep things simple, get this compilation mode to the point where it's practically usable, see what we really want/need, and then make those common cases the default or easier to use.

As far as libdevice is concerned, it's only needed for sources that use CUDA headers, as those map the standard math functions to their implementations in libdevice. Stand-alone sources should not need it. We should also be able to compile libdevice.bc into a libdevice.o and then link with it, if needed, during the final executable link phase.

Harbormaster completed remote builds in B203446: Diff 483306. Dec 15 2022, 4:15 PM

Addressing comments, I did the symbolic link method. It's a stupid hack that's only necessary because of nvlink's poor handling but I think this works around it well enough.

Harbormaster completed remote builds in B203509: Diff 483389. Dec 15 2022, 8:08 PM

Fix format

Harbormaster completed remote builds in B203593: Diff 483507. Dec 16 2022, 6:58 AM

Accidentally deleted the old getInputFilename routine which we need. The
symlink worked fine but would break on Windows, so I ended up writing a hack
that would only use .cubin if we have the nvlink linker active and the input
comes from the assembler. It's extremely ugly so let me know what you think.

Harbormaster completed remote builds in B203646: Diff 483576. Dec 16 2022, 10:36 AM

Updating. Used a different method to determine if we need to use .cubin or .o. It's a little ugly but I don't think there's a better way to do it.

Also I just realized that if this goes through I could probably heavily simplify linker wrapper code by just invoking clang --target=nvptx64 instead of calling the tools manually. So that's another benefit to this.

Harbormaster completed remote builds in B208054: Diff 489554. Jan 16 2023, 8:42 AM

FWIW, creating CUBIN from C/C++ directly would be really useful when debugging (and in combination with our soon to be available JIT object loader).

@tra Can we get this in somehow?

LGTM overall, with few nits.

clang/lib/Driver/ToolChains/Cuda.cpp
448–450	Can't say that I like this approach. It heavily relies on "happens to work". Perhaps a better way to deal with this is to create the temporary GPU object file name with ".cubin" extension to start with.
450	typo: getInputFilename
470	Nit: I'd add an `else { Relocatable = false; // comment on what use cases use relocatable compilation by default. }` and leave it uninitialized here. At the very least it may be worth a comment. Silently defaulting to `true` makes me ask "what other compilation modes we may have?" and stand-alone compilation targeting NVPTX is hard to infer here from the surrounding details.
clang/lib/Driver/ToolChains/Cuda.h
211–212	Nit: it's hard to tell whether the whitespace additions are spaces or tabs. They show up as ">" to me which suggests it may be tabs. Just in case it is indeed the case, please make sure to un-tabify the changes.
clang/test/Driver/cuda-cross-compiling.c
38–39	This may fail on windows where ptxas/nvlink will be `ptxas.exe` `nvlink.exe`. I think we typically use something like `fatbinary{{.*}}"` in other tests.
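
The portability concern in the last comment can be sketched with `std::regex` standing in for FileCheck's `{{.*}}` regex block. This is not FileCheck itself, the command lines in the usage note are invented for illustration, and the helper names `matchesLiteral` and `matchesTolerant` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Literal match: the tool name must be followed directly by a closing quote,
// which is what a check such as `ptxas"` would require.
bool matchesLiteral(const std::string &line, const std::string &tool) {
  return line.find(tool + "\"") != std::string::npos;
}

// Tolerant match: any suffix (such as ".exe") may sit between the tool name
// and the closing quote, similar in spirit to `ptxas{{.*}}"`.
// Assumes `tool` contains no regex metacharacters.
bool matchesTolerant(const std::string &line, const std::string &tool) {
  const std::regex pattern(tool + ".*\"");
  return std::regex_search(line, pattern);
}
```

For a line such as `"/usr/local/cuda/bin/ptxas" "-m64"` both helpers succeed, while for `"C:\cuda\bin\ptxas.exe" "-m64"` only the tolerant form matches.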

jhuber6 added inline comments. Jan 18 2023, 12:27 PM

clang/lib/Driver/ToolChains/Cuda.cpp
448–450	As far as I know, the files are created independently of the tool-chains so I'm not sure if we'd be able to check there. The current way is to use `getInputFilename` but that doesn't have access to the compilation. As far as I can come up with there's the following solutions:
- Do this and check the temp files
- Create a symbolic link if the file is empty (doesn't work on Windows)
- Make a random global that's true if the Linker tool was built at some point
470	Works for me.
clang/test/Driver/cuda-cross-compiling.c
38–39	Good point.

jhuber6 marked an inline comment as done. Jan 18 2023, 12:28 PM

jhuber6 added inline comments.

clang/lib/Driver/ToolChains/Cuda.h
211–212	I think it's because the original file was formatted incorrectly, so all my new changes are formatted correctly. I'm somewhat more tempted to just clang-format it upstream and rebase it.

Addressing some comments. I don't know if there's a cleaner way to mess around with the .cubin nonsense. I liked symbolic links but that doesn't work on Windows.

jhuber6 marked 3 inline comments as done. Jan 18 2023, 3:24 PM

LGTM with few minor nits and questions.

In D140158#4063689, @jhuber6 wrote:

Addressing some comments. I don't know if there's a cleaner way to mess around with the .cubin nonsense. I liked symbolic links but that doesn't work on Windows.

Can we do a copy or rename instead? That could be more portable.

clang/lib/Driver/ToolChains/Cuda.cpp
448–450	Naming is hard. :-( OK, let's add a FIXME to this comment and hope to get it fixed when NVIDIA's tools become less obsessive about file extensions. I'll file a bug with them to get the ball rolling on their end.
519–522	Nit: Looks like there are tabs.
clang/lib/Driver/ToolChains/Cuda.h
211–212	Clang-formatting the file(s) and submitting that change separately before the patch SGTM.

This revision is now accepted and ready to land. Jan 18 2023, 3:35 PM

In D140158#4063720, @tra wrote:

LGTM with few minor nits and questions.

In D140158#4063689, @jhuber6 wrote:

Addressing some comments. I don't know if there's a cleaner way to mess around with the .cubin nonsense. I liked symbolic links but that doesn't work on Windows.

Can we do a copy or rename instead? That could be more portable.

I do a copy in this patch already. The problem is that if it's an internal link, e.g. `clang --target=nvptx64-nvidia-cuda foo.c bar.c`, then there will be no file to copy. The symbolic link got around this by linking the files, so once the .o was written the contents would automatically appear in the .cubin. The other approach is to detect this and use the correct names, which is what this patch does, albeit a little jankily.
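
The naming decision described here can be sketched as a small pure function. This is a simplified illustration under the assumption that the driver knows two things: whether the assembler output is a temporary file, and whether nvlink will consume it in the same invocation. The helpers `withCubinExtension` and `assemblerOutputName` are hypothetical and do not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces the extension of `name` with ".cubin", or appends it if the final
// path component has no extension.
std::string withCubinExtension(const std::string &name) {
  const std::size_t slash = name.find_last_of("/\\");
  const std::size_t dot = name.rfind('.');
  const bool hasExt = dot != std::string::npos &&
                      (slash == std::string::npos || dot > slash);
  return (hasExt ? name.substr(0, dot) : name) + ".cubin";
}

// Picks the file name ptxas should write. A temporary that nvlink will read
// in the same invocation gets a ".cubin" name, because nvlink decides how to
// treat an input by its extension. Anything else keeps the requested name.
std::string assemblerOutputName(const std::string &output, bool isTemporary,
                                bool linkedByNvlink) {
  if (isTemporary && linkedByNvlink)
    return withCubinExtension(output);
  return output;
}
```

A user-requested output such as `-c foo.c -o foo.o` keeps its `.o` name, while an intermediate file produced for an internal link is written as `.cubin` from the start, so no copy or symbolic link is needed.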

clang/lib/Driver/ToolChains/Cuda.cpp
448–450	I would like that. it's a very simple check if you actually know how ELF headers work... Also while you're at it, ask for how they set their machine flags in Nvidia so we can display the arch via `llvm-readelf`. I'll add some FIXME's.
519–522	I think that's just the diff showing that it was indented, there's no tabs in the file.
clang/lib/Driver/ToolChains/Cuda.h
211–212	Already did :)

Harbormaster completed remote builds in B208614: Diff 490317. Jan 18 2023, 4:05 PM

This revision was landed with ongoing or failed builds. Jan 18 2023, 4:18 PM

Closed by commit rG0660397e6809: [CUDA] Allow targeting NVPTX directly without a host toolchain (authored by jhuber6).

This revision was automatically updated to reflect the committed changes.

jhuber6 marked 3 inline comments as done.

jhuber6 added a commit: rG0660397e6809: [CUDA] Allow targeting NVPTX directly without a host toolchain.

jhuber6 mentioned this in D142133: [LinkerWrapper] Use `clang` to perform the device linking. Jan 19 2023, 9:23 AM

jhuber6 mentioned this in rGbec49b1d803c: [LinkerWrapper] Use `clang` to perform the device linking. Jan 19 2023, 10:48 AM

This patch breaks our cuda compilations. The output file isn't created after it:

$ echo 'extern "C" __attribute__((global)) void q() {}' >q.cc
$ good-clang \
    -nocudainc -x cuda \
    --cuda-path=somepath/cuda/ \
    -Wno-unknown-cuda-version --cuda-device-only \
    -c q.cc -o qqq-good.o \
  && bad-clang \
    -nocudainc -x cuda \
    --cuda-path=somepath/cuda/ \
    -Wno-unknown-cuda-version --cuda-device-only \
    -c q.cc -o qqq-bad.o \
  && ls qqq*.o
qqq-good.o

In D140158#4082789, @alexfh wrote:

This patch breaks our cuda compilations. The output file isn't created after it:

$ echo 'extern "C" __attribute__((global)) void q() {}' >q.cc
$ good-clang \
    -nocudainc -x cuda \
    --cuda-path=somepath/cuda/ \
    -Wno-unknown-cuda-version --cuda-device-only \
    -c q.cc -o qqq-good.o \
  && bad-clang \
    -nocudainc -x cuda \
    --cuda-path=somepath/cuda/ \
    -Wno-unknown-cuda-version --cuda-device-only \
    -c q.cc -o qqq-bad.o \
  && ls qqq*.o
qqq-good.o

https://github.com/llvm/llvm-project/issues/60301 Still broken after this fix?

In D140158#4082804, @jhuber6 wrote:

In D140158#4082789, @alexfh wrote:

This patch breaks our cuda compilations. The output file isn't created after it:

$ echo 'extern "C" __attribute__((global)) void q() {}' >q.cc
$ good-clang \
    -nocudainc -x cuda \
    --cuda-path=somepath/cuda/ \
    -Wno-unknown-cuda-version --cuda-device-only \
    -c q.cc -o qqq-good.o \
  && bad-clang \
    -nocudainc -x cuda \
    --cuda-path=somepath/cuda/ \
    -Wno-unknown-cuda-version --cuda-device-only \
    -c q.cc -o qqq-bad.o \
  && ls qqq*.o
qqq-good.o

https://github.com/llvm/llvm-project/issues/60301 Still broken after this fix?

That works. Thanks!

Revision Contents

Path                                        Size
clang/lib/Driver/Driver.cpp                 8 lines
clang/lib/Driver/ToolChains/Cuda.h          76 lines
clang/lib/Driver/ToolChains/Cuda.cpp        271 lines
clang/test/Driver/cuda-cross-compiling.c    68 lines

Diff 490329

clang/lib/Driver/Driver.cpp

[... 6,033 lines not shown; context: case llvm::Triple::NaCl: ...]
       TC = std::make_unique<toolchains::NaClToolChain>(*this, Target, Args);
       break;
     case llvm::Triple::Fuchsia:
       TC = std::make_unique<toolchains::Fuchsia>(*this, Target, Args);
       break;
     case llvm::Triple::Solaris:
       TC = std::make_unique<toolchains::Solaris>(*this, Target, Args);
       break;
+    case llvm::Triple::CUDA:
+      TC = std::make_unique<toolchains::NVPTXToolChain>(*this, Target, Args);
+      break;
     case llvm::Triple::AMDHSA:
       TC = std::make_unique<toolchains::ROCMToolChain>(*this, Target, Args);
       break;
     case llvm::Triple::AMDPAL:
     case llvm::Triple::Mesa3D:
       TC = std::make_unique<toolchains::AMDGPUToolChain>(*this, Target, Args);
       break;
     case llvm::Triple::Win32:
[... 103 lines not shown; context: default: ...]
         else if (Target.isOSBinFormatMachO())
           TC = std::make_unique<toolchains::MachO>(*this, Target, Args);
         else
           TC = std::make_unique<toolchains::Generic_GCC>(*this, Target, Args);
       }
     }
   }

-  // Intentionally omitted from the switch above: llvm::Triple::CUDA. CUDA
-  // compiles always need two toolchains, the CUDA toolchain and the host
-  // toolchain. So the only valid way to create a CUDA toolchain is via
-  // CreateOffloadingDeviceToolChains.
-
   return *TC;
 }

 const ToolChain &Driver::getOffloadingDeviceToolChain(
     const ArgList &Args, const llvm::Triple &Target, const ToolChain &HostTC,
     const Action::OffloadKind &TargetDeviceOffloadKind) const {
   // Use device / host triples as the key into the ToolChains map because the
   // device ToolChain we create depends on both.
[... 218 lines not shown ...]

clang/lib/Driver/ToolChains/Cuda.h

[... 89 lines not shown; context: public: ...]
   void ConstructJob(Compilation &C, const JobAction &JA,
                     const InputInfo &Output, const InputInfoList &Inputs,
                     const llvm::opt::ArgList &TCArgs,
                     const char *LinkingOutput) const override;
 };

 // Runs fatbinary, which combines GPU object files ("cubin" files) and/or PTX
 // assembly into a single output file.
+class LLVM_LIBRARY_VISIBILITY FatBinary : public Tool {
+public:
+  FatBinary(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}
+
+  bool hasIntegratedCPP() const override { return false; }
+
+  void ConstructJob(Compilation &C, const JobAction &JA,
+                    const InputInfo &Output, const InputInfoList &Inputs,
+                    const llvm::opt::ArgList &TCArgs,
+                    const char *LinkingOutput) const override;
+};
+
+// Runs nvlink, which links GPU object files ("cubin" files) into a single file.
 class LLVM_LIBRARY_VISIBILITY Linker : public Tool {
 public:
   Linker(const ToolChain &TC) : Tool("NVPTX::Linker", "fatbinary", TC) {}

   bool hasIntegratedCPP() const override { return false; }

   void ConstructJob(Compilation &C, const JobAction &JA,
                     const InputInfo &Output, const InputInfoList &Inputs,
                     const llvm::opt::ArgList &TCArgs,
                     const char *LinkingOutput) const override;
 };

 void getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
                             const llvm::opt::ArgList &Args,
                             std::vector<StringRef> &Features);

 } // end namespace NVPTX
 } // end namespace tools

 namespace toolchains {

-class LLVM_LIBRARY_VISIBILITY CudaToolChain : public ToolChain {
+class LLVM_LIBRARY_VISIBILITY NVPTXToolChain : public ToolChain {
+public:
+  NVPTXToolChain(const Driver &D, const llvm::Triple &Triple,
+                 const llvm::Triple &HostTriple,
+                 const llvm::opt::ArgList &Args);
+
+  NVPTXToolChain(const Driver &D, const llvm::Triple &Triple,
+                 const llvm::opt::ArgList &Args);
+
+  llvm::opt::DerivedArgList *
+  TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
+                Action::OffloadKind DeviceOffloadKind) const override;
+
+  // Never try to use the integrated assembler with CUDA; always fork out to
+  // ptxas.
+  bool useIntegratedAs() const override { return false; }
+  bool isCrossCompiling() const override { return true; }
+  bool isPICDefault() const override { return false; }
+  bool isPIEDefault(const llvm::opt::ArgList &Args) const override {
+    return false;
+  }
+  bool isPICDefaultForced() const override { return false; }
+  bool SupportsProfiling() const override { return false; }
+
+  bool IsMathErrnoDefault() const override { return false; }
+
+  bool supportsDebugInfoOption(const llvm::opt::Arg *A) const override;
+  void adjustDebugInfoKind(codegenoptions::DebugInfoKind &DebugInfoKind,
+                           const llvm::opt::ArgList &Args) const override;
+
+  // NVPTX supports only DWARF2.
+  unsigned GetDefaultDwarfVersion() const override { return 2; }
+  unsigned getMaxDwarfVersion() const override { return 2; }
+
+  CudaInstallationDetector CudaInstallation;
+
+protected:
+  Tool *buildAssembler() const override; // ptxas.
+  Tool *buildLinker() const override;    // nvlink.
+};
+
+class LLVM_LIBRARY_VISIBILITY CudaToolChain : public NVPTXToolChain {
 public:
   CudaToolChain(const Driver &D, const llvm::Triple &Triple,
                 const ToolChain &HostTC, const llvm::opt::ArgList &Args);

   const llvm::Triple *getAuxTriple() const override {
     return &HostTC.getTriple();
   }

   std::string getInputFilename(const InputInfo &Input) const override;

   llvm::opt::DerivedArgList *
   TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
                 Action::OffloadKind DeviceOffloadKind) const override;
   void
   addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
                         llvm::opt::ArgStringList &CC1Args,
                         Action::OffloadKind DeviceOffloadKind) const override;

   llvm::DenormalMode getDefaultDenormalModeForType(
       const llvm::opt::ArgList &DriverArgs, const JobAction &JA,
       const llvm::fltSemantics *FPType = nullptr) const override;

-  // Never try to use the integrated assembler with CUDA; always fork out to
-  // ptxas.
-  bool useIntegratedAs() const override { return false; }
-  bool isCrossCompiling() const override { return true; }
-  bool isPICDefault() const override { return false; }
-  bool isPIEDefault(const llvm::opt::ArgList &Args) const override {
-    return false;
-  }
-  bool isPICDefaultForced() const override { return false; }
-  bool SupportsProfiling() const override { return false; }
-  bool supportsDebugInfoOption(const llvm::opt::Arg *A) const override;
-  void adjustDebugInfoKind(codegenoptions::DebugInfoKind &DebugInfoKind,
-                           const llvm::opt::ArgList &Args) const override;
-  bool IsMathErrnoDefault() const override { return false; }
-
   void AddCudaIncludeArgs(const llvm::opt::ArgList &DriverArgs,
                           llvm::opt::ArgStringList &CC1Args) const override;

   void addClangWarningOptions(llvm::opt::ArgStringList &CC1Args) const override;
   CXXStdlibType GetCXXStdlibType(const llvm::opt::ArgList &Args) const override;
   void
   AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs,
                             llvm::opt::ArgStringList &CC1Args) const override;
   void AddClangCXXStdlibIncludeArgs(
       const llvm::opt::ArgList &Args,
       llvm::opt::ArgStringList &CC1Args) const override;
   void AddIAMCUIncludeArgs(const llvm::opt::ArgList &DriverArgs,
                            llvm::opt::ArgStringList &CC1Args) const override;

   SanitizerMask getSupportedSanitizers() const override;

   VersionTuple
   computeMSVCVersion(const Driver *D,
                      const llvm::opt::ArgList &Args) const override;

-  unsigned GetDefaultDwarfVersion() const override { return 2; }
-  // NVPTX supports only DWARF2.
-  unsigned getMaxDwarfVersion() const override { return 2; }
-
   const ToolChain &HostTC;
-  CudaInstallationDetector CudaInstallation;

   /// Uses nvptx-arch tool to get arch of the system GPU. Will return error
   /// if unable to find one.
   virtual Expected<SmallVector<std::string>>
   getSystemGPUArchs(const llvm::opt::ArgList &Args) const override;

 protected:
   Tool *buildAssembler() const override; // ptxas
   Tool *buildLinker() const override;    // fatbinary (ok, not really a linker)
 };

 } // end namespace toolchains
 } // end namespace driver
 } // end namespace clang

 #endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_CUDA_H

clang/lib/Driver/ToolChains/Cuda.cpp

Show First 20 Lines • Show All 363 Lines • ▼ Show 20 Lines
}		}

void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,		void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output,		const InputInfo &Output,
const InputInfoList &Inputs,		const InputInfoList &Inputs,
const ArgList &Args,		const ArgList &Args,
const char *LinkingOutput) const {		const char *LinkingOutput) const {
const auto &TC =		const auto &TC =
static_cast<const toolchains::CudaToolChain &>(getToolChain());		static_cast<const toolchains::NVPTXToolChain &>(getToolChain());
assert(TC.getTriple().isNVPTX() && "Wrong platform");		assert(TC.getTriple().isNVPTX() && "Wrong platform");

StringRef GPUArchName;		StringRef GPUArchName;
// If this is an OpenMP action we need to extract the device architecture		// If this is a CUDA action we need to extract the device architecture
// from the -march=arch option. This option may come from -Xopenmp-target		// from the Job's associated architecture, otherwise use the -march=arch
// flag or the default value.		// option. This option may come from -Xopenmp-target flag or the default
if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {		// value.
		if (JA.isDeviceOffloading(Action::OFK_Cuda)) {
		GPUArchName = JA.getOffloadingArch();
		} else {
GPUArchName = Args.getLastArgValue(options::OPT_march_EQ);		GPUArchName = Args.getLastArgValue(options::OPT_march_EQ);
assert(!GPUArchName.empty() && "Must have an architecture passed in.");		assert(!GPUArchName.empty() && "Must have an architecture passed in.");
} else		}
GPUArchName = JA.getOffloadingArch();

// Obtain architecture from the action.		// Obtain architecture from the action.
CudaArch gpu_arch = StringToCudaArch(GPUArchName);		CudaArch gpu_arch = StringToCudaArch(GPUArchName);
assert(gpu_arch != CudaArch::UNKNOWN &&		assert(gpu_arch != CudaArch::UNKNOWN &&
"Device action expected to have an architecture.");		"Device action expected to have an architecture.");

// Check that our installation's ptxas supports gpu_arch.		// Check that our installation's ptxas supports gpu_arch.
if (!Args.hasArg(options::OPT_no_cuda_version_check)) {		if (!Args.hasArg(options::OPT_no_cuda_version_check)) {
▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines	void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
// Pass -v to ptxas if it was passed to the driver.		// Pass -v to ptxas if it was passed to the driver.
if (Args.hasArg(options::OPT_v))		if (Args.hasArg(options::OPT_v))
CmdArgs.push_back("-v");		CmdArgs.push_back("-v");

CmdArgs.push_back("--gpu-name");		CmdArgs.push_back("--gpu-name");
CmdArgs.push_back(Args.MakeArgString(CudaArchToString(gpu_arch)));		CmdArgs.push_back(Args.MakeArgString(CudaArchToString(gpu_arch)));
CmdArgs.push_back("--output-file");		CmdArgs.push_back("--output-file");
const char *OutputFileName = Args.MakeArgString(TC.getInputFilename(Output));		const char *OutputFileName = Args.MakeArgString(TC.getInputFilename(Output));
if (std::string(OutputFileName) != std::string(Output.getFilename()))
		// If we are invoking `nvlink` internally we need to output a `.cubin` file.
		// Checking if the output is a temporary is the cleanest way to determine
		// this. Putting this logic in `getInputFilename` isn't an option because it
		tra [Done]: typo: getInputFilename
		tra [Not Done]: Can't say that I like this approach. It heavily relies on "happens to work". Perhaps a better way to deal with this is to create the temporary GPU object file name with ".cubin" extension to start with.
		jhuber6 (Author) [Done]: As far as I know, the files are created independently of the tool-chains so I'm not sure if we'd be able to check there. The current way is to use `getInputFilename` but that doesn't have access to the compilation. As far as I can come up with, there are the following solutions:
		- Do this and check the temp files
		- Create a symbolic link if the file is empty (Doesn't work on Windows)
		- Make a random global that's true if the Linker tool was built at some point
		tra [Done]: Naming is hard. :-( OK, let's add a FIXME to this comment and hope to get it fixed when NVIDIA's tools become less obsessive about file extensions. I'll file a bug with them to get the ball rolling on their end.
		jhuber6 (Author) [Done]: I would like that. It's a very simple check if you actually know how ELF headers work... Also while you're at it, ask for how they set their machine flags in Nvidia so we can display the arch via `llvm-readelf`. I'll add some FIXME's.
		// relies on the compilation.
		// FIXME: This should hopefully be removed if NVIDIA updates their tooling.
		if (Output.isFilename() &&
		llvm::find(C.getTempFiles(), Output.getFilename()) !=
		C.getTempFiles().end()) {
		SmallString<256> Filename(Output.getFilename());
		llvm::sys::path::replace_extension(Filename, "cubin");
		OutputFileName = Args.MakeArgString(Filename);
		}
		if (Output.isFilename() && OutputFileName != Output.getFilename())
C.addTempFile(OutputFileName);		C.addTempFile(OutputFileName);

CmdArgs.push_back(OutputFileName);		CmdArgs.push_back(OutputFileName);
for (const auto &II : Inputs)		for (const auto &II : Inputs)
CmdArgs.push_back(Args.MakeArgString(II.getFilename()));		CmdArgs.push_back(Args.MakeArgString(II.getFilename()));

for (const auto &A : Args.getAllArgValues(options::OPT_Xcuda_ptxas))		for (const auto &A : Args.getAllArgValues(options::OPT_Xcuda_ptxas))
CmdArgs.push_back(Args.MakeArgString(A));		CmdArgs.push_back(Args.MakeArgString(A));

bool Relocatable = false;		bool Relocatable;
		tra [Done]: Nit: I'd add an `else { Relocatable = false; // comment on what use cases use relocatable compilation by default. }` and leave it uninitialized here. At the very least it may be worth a comment. Silently defaulting to `true` makes me ask "what other compilation modes we may have?" and stand-alone compilation targeting NVPTX is hard to infer here from the surrounding details.
		jhuber6 (Author) [Done]: Works for me.
if (JA.isOffloading(Action::OFK_OpenMP))		if (JA.isOffloading(Action::OFK_OpenMP))
// In OpenMP we need to generate relocatable code.		// In OpenMP we need to generate relocatable code.
Relocatable = Args.hasFlag(options::OPT_fopenmp_relocatable_target,		Relocatable = Args.hasFlag(options::OPT_fopenmp_relocatable_target,
options::OPT_fnoopenmp_relocatable_target,		options::OPT_fnoopenmp_relocatable_target,
/*Default=*/true);		/*Default=*/true);
else if (JA.isOffloading(Action::OFK_Cuda))		else if (JA.isOffloading(Action::OFK_Cuda))
		// In CUDA we generate relocatable code by default.
Relocatable = Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,		Relocatable = Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
/*Default=*/false);		/*Default=*/false);
		else
		// Otherwise, we are compiling directly and should create linkable output.
		Relocatable = true;

if (Relocatable)		if (Relocatable)
CmdArgs.push_back("-c");		CmdArgs.push_back("-c");

const char *Exec;		const char *Exec;
if (Arg *A = Args.getLastArg(options::OPT_ptxas_path_EQ))		if (Arg *A = Args.getLastArg(options::OPT_ptxas_path_EQ))
Exec = A->getValue();		Exec = A->getValue();
else		else
[19 lines not shown; context: for (Arg *A : Args) {]
}		}
}		}
return includePTX;		return includePTX;
}		}

// All inputs to this linker must be from CudaDeviceActions, as we need to look		// All inputs to this linker must be from CudaDeviceActions, as we need to look
// at the Inputs' Actions in order to figure out which GPU architecture they		// at the Inputs' Actions in order to figure out which GPU architecture they
// correspond to.		// correspond to.
void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,		void NVPTX::FatBinary::ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output,		const InputInfo &Output,
const InputInfoList &Inputs,		const InputInfoList &Inputs,
const ArgList &Args,		const ArgList &Args,
const char *LinkingOutput) const {		const char *LinkingOutput) const {
		tra [Done]: Nit: Looks like there are tabs.
		jhuber6 (Author) [Done]: I think that's just the diff showing that it was indented, there's no tabs in the file.
const auto &TC =		const auto &TC =
static_cast<const toolchains::CudaToolChain &>(getToolChain());		static_cast<const toolchains::CudaToolChain &>(getToolChain());
assert(TC.getTriple().isNVPTX() && "Wrong platform");		assert(TC.getTriple().isNVPTX() && "Wrong platform");

ArgStringList CmdArgs;		ArgStringList CmdArgs;
if (TC.CudaInstallation.version() <= CudaVersion::CUDA_100)		if (TC.CudaInstallation.version() <= CudaVersion::CUDA_100)
CmdArgs.push_back("--cuda");		CmdArgs.push_back("--cuda");
CmdArgs.push_back(TC.getTriple().isArch64Bit() ? "-64" : "-32");		CmdArgs.push_back(TC.getTriple().isArch64Bit() ? "-64" : "-32");
[30 lines not shown; context: void NVPTX::FatBinary::ConstructJob(Compilation &C, const JobAction &JA,]
const char *Exec = Args.MakeArgString(TC.GetProgramPath("fatbinary"));		const char *Exec = Args.MakeArgString(TC.GetProgramPath("fatbinary"));
C.addCommand(std::make_unique<Command>(		C.addCommand(std::make_unique<Command>(
JA, *this,		JA, *this,
ResponseFileSupport{ResponseFileSupport::RF_Full, llvm::sys::WEM_UTF8,		ResponseFileSupport{ResponseFileSupport::RF_Full, llvm::sys::WEM_UTF8,
"--options-file"},		"--options-file"},
Exec, CmdArgs, Inputs, Output));		Exec, CmdArgs, Inputs, Output));
}		}

		void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
		const InputInfo &Output,
		const InputInfoList &Inputs,
		const ArgList &Args,
		const char *LinkingOutput) const {
		const auto &TC =
		static_cast<const toolchains::NVPTXToolChain &>(getToolChain());
		assert(TC.getTriple().isNVPTX() && "Wrong platform");

		ArgStringList CmdArgs;
		if (Output.isFilename()) {
		CmdArgs.push_back("-o");
		CmdArgs.push_back(Output.getFilename());
		} else {
		assert(Output.isNothing() && "Invalid output.");
		}

		if (mustEmitDebugInfo(Args) == EmitSameDebugInfoAsHost)
		CmdArgs.push_back("-g");

		if (Args.hasArg(options::OPT_v))
		CmdArgs.push_back("-v");

		StringRef GPUArch = Args.getLastArgValue(options::OPT_march_EQ);
		assert(!GPUArch.empty() && "At least one GPU Arch required for nvlink.");

		CmdArgs.push_back("-arch");
		CmdArgs.push_back(Args.MakeArgString(GPUArch));

		// Add paths specified in LIBRARY_PATH environment variable as -L options.
		addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");

		// Add paths for the default clang library path.
		SmallString<256> DefaultLibPath =
		llvm::sys::path::parent_path(TC.getDriver().Dir);
		llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
		CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));

		for (const auto &II : Inputs) {
		if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
		II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
		C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
		<< getToolChain().getTripleString();
		continue;
		}

		// Currently, we only pass the input files to the linker, we do not pass
		// any libraries that may be valid only for the host.
		if (!II.isFilename())
		continue;

		// The 'nvlink' application performs RDC-mode linking when given a '.o'
		// file and device linking when given a '.cubin' file. We always want to
		// perform device linking, so just rename any '.o' files.
		// FIXME: This should hopefully be removed if NVIDIA updates their tooling.
		auto InputFile = getToolChain().getInputFilename(II);
		if (llvm::sys::path::extension(InputFile) != ".cubin") {
		// If there are no actions above this one then this is direct input and we
		// can copy it. Otherwise the input is internal so a `.cubin` file should
		// exist.
		if (II.getAction() && II.getAction()->getInputs().size() == 0) {
		const char *CubinF =
		Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
		llvm::sys::path::stem(InputFile), "cubin"));
		if (std::error_code EC =
		llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
		continue;

		CmdArgs.push_back(CubinF);
		tra [Done]: Nit: I'd just do it within the `make_unique` below. It should end up on a line by itself and should not hurt readability.
		} else {
		SmallString<256> Filename(InputFile);
		llvm::sys::path::replace_extension(Filename, "cubin");
		CmdArgs.push_back(Args.MakeArgString(Filename));
		}
		} else {
		CmdArgs.push_back(Args.MakeArgString(InputFile));
		}
		}

		C.addCommand(std::make_unique<Command>(
		JA, *this,
		ResponseFileSupport{ResponseFileSupport::RF_Full, llvm::sys::WEM_UTF8,
		"--options-file"},
		Args.MakeArgString(getToolChain().GetProgramPath("nvlink")), CmdArgs,
		Inputs, Output));
		}
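
Note on the `.cubin` handling above: the assembler renames its output only when the driver tracks that output as a temporary file, and the linker swaps the extension of internally produced inputs. The following is a minimal standalone sketch of that naming rule, using `std::filesystem` in place of `llvm::sys::path`. The helper names `withCubinExtension` and `assemblerOutputName` are hypothetical and do not appear in the patch, and the sketch does not model the copy step used for direct `.o` inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper: replace whatever extension the file has with ".cubin",
// mirroring the replace_extension(Filename, "cubin") calls in the patch.
std::string withCubinExtension(const std::string &file) {
  std::filesystem::path p(file);
  p.replace_extension("cubin");
  return p.string();
}

// Hypothetical helper: the assembler renames its output only if the driver
// lists that output among its temporary files, i.e. it will be fed to nvlink.
std::string assemblerOutputName(const std::string &output,
                                const std::vector<std::string> &tempFiles) {
  const bool isTemp =
      std::find(tempFiles.begin(), tempFiles.end(), output) != tempFiles.end();
  return isTemp ? withCubinExtension(output) : output;
}
```

Under these assumptions a temporary such as `foo-123.o` would be emitted as `foo-123.cubin`, while a user-requested `foo.o` (as with `-c`) keeps its name.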

void NVPTX::getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,		void NVPTX::getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
const llvm::opt::ArgList &Args,		const llvm::opt::ArgList &Args,
std::vector<StringRef> &Features) {		std::vector<StringRef> &Features) {
if (Args.hasArg(options::OPT_cuda_feature_EQ)) {		if (Args.hasArg(options::OPT_cuda_feature_EQ)) {
StringRef PtxFeature =		StringRef PtxFeature =
Args.getLastArgValue(options::OPT_cuda_feature_EQ, "+ptx42");		Args.getLastArgValue(options::OPT_cuda_feature_EQ, "+ptx42");
Features.push_back(Args.MakeArgString(PtxFeature));		Features.push_back(Args.MakeArgString(PtxFeature));
return;		return;
[26 lines not shown; context: case CudaVersion::CUDA_##CUDA_VER: \]
CASE_CUDA_VERSION(90, 60);		CASE_CUDA_VERSION(90, 60);
#undef CASE_CUDA_VERSION		#undef CASE_CUDA_VERSION
default:		default:
PtxFeature = "+ptx42";		PtxFeature = "+ptx42";
}		}
Features.push_back(PtxFeature);		Features.push_back(PtxFeature);
}		}

/// CUDA toolchain. Our assembler is ptxas, and our "linker" is fatbinary,		/// NVPTX toolchain. Our assembler is ptxas, and our linker is nvlink. This
/// which isn't properly a linker but nonetheless performs the step of stitching		/// operates as a stand-alone version of the NVPTX tools without the host
/// together object files from the assembler into a single blob.		/// toolchain.
		NVPTXToolChain::NVPTXToolChain(const Driver &D, const llvm::Triple &Triple,
CudaToolChain::CudaToolChain(const Driver &D, const llvm::Triple &Triple,		const llvm::Triple &HostTriple,
const ToolChain &HostTC, const ArgList &Args)		const ArgList &Args)
: ToolChain(D, Triple, Args), HostTC(HostTC),		: ToolChain(D, Triple, Args), CudaInstallation(D, HostTriple, Args) {
CudaInstallation(D, HostTC.getTriple(), Args) {
if (CudaInstallation.isValid()) {		if (CudaInstallation.isValid()) {
CudaInstallation.WarnIfUnsupportedVersion();		CudaInstallation.WarnIfUnsupportedVersion();
getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));		getProgramPaths().push_back(std::string(CudaInstallation.getBinPath()));
}		}
// Lookup binaries into the driver directory, this is used to		// Lookup binaries into the driver directory, this is used to
// discover the clang-offload-bundler executable.		// discover the clang-offload-bundler executable.
getProgramPaths().push_back(getDriver().Dir);		getProgramPaths().push_back(getDriver().Dir);
}		}

std::string CudaToolChain::getInputFilename(const InputInfo &Input) const {		/// We only need the host triple to locate the CUDA binary utilities, use the
// Only object files are changed, for example assembly files keep their .s		/// system's default triple if not provided.
// extensions. If the user requested device-only compilation don't change it.		NVPTXToolChain::NVPTXToolChain(const Driver &D, const llvm::Triple &Triple,
if (Input.getType() != types::TY_Object || getDriver().offloadDeviceOnly())		const ArgList &Args)
return ToolChain::getInputFilename(Input);		: NVPTXToolChain(D, Triple,
		llvm::Triple(llvm::sys::getDefaultTargetTriple()), Args) {}

// Replace extension for object files with cubin because nvlink relies on		llvm::opt::DerivedArgList *
// these particular file names.		NVPTXToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
SmallString<256> Filename(ToolChain::getInputFilename(Input));		StringRef BoundArch,
llvm::sys::path::replace_extension(Filename, "cubin");		Action::OffloadKind DeviceOffloadKind) const {
return std::string(Filename.str());		DerivedArgList *DAL =
		ToolChain::TranslateArgs(Args, BoundArch, DeviceOffloadKind);
		if (!DAL)
		DAL = new DerivedArgList(Args.getBaseArgs());

		const OptTable &Opts = getDriver().getOpts();

		for (Arg *A : Args)
		if (!llvm::is_contained(*DAL, A))
		DAL->append(A);

		if (!DAL->hasArg(options::OPT_march_EQ))
		DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
		CudaArchToString(CudaArch::CudaDefault));

		return DAL;
}		}
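
Note on `NVPTXToolChain::TranslateArgs`: apart from copying the arguments through, its one visible effect is to append a default `-march=` when the user gave none. The sketch below restates that rule over a plain vector of strings. `ArgVec` and `withDefaultArch` are hypothetical names, and the default of `sm_35` is taken from the DEFAULT check in the accompanying test rather than from the definition of `CudaArch::CudaDefault`, which is not shown on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the driver's derived argument list.
using ArgVec = std::vector<std::string>;

// Append "-march=<defaultArch>" only if no -march= argument is present.
// The "sm_35" default is an assumption based on the test expectations.
ArgVec withDefaultArch(ArgVec args, const std::string &defaultArch = "sm_35") {
  const std::string prefix = "-march=";
  const bool hasArch =
      std::any_of(args.begin(), args.end(), [&](const std::string &a) {
        return a.compare(0, prefix.size(), prefix) == 0;
      });
  if (!hasArch)
    args.push_back(prefix + defaultArch);
  return args;
}
```

An explicit `-march=sm_61` is left untouched, which matches the ARGS and DEFAULT expectations in the test file.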

		bool NVPTXToolChain::supportsDebugInfoOption(const llvm::opt::Arg *A) const {
		const Option &O = A->getOption();
		return (O.matches(options::OPT_gN_Group) &&
		!O.matches(options::OPT_gmodules)) ||
		O.matches(options::OPT_g_Flag) ||
		O.matches(options::OPT_ggdbN_Group) || O.matches(options::OPT_ggdb) ||
		O.matches(options::OPT_gdwarf) || O.matches(options::OPT_gdwarf_2) ||
		O.matches(options::OPT_gdwarf_3) || O.matches(options::OPT_gdwarf_4) ||
		O.matches(options::OPT_gdwarf_5) ||
		O.matches(options::OPT_gcolumn_info);
		}

		void NVPTXToolChain::adjustDebugInfoKind(
		codegenoptions::DebugInfoKind &DebugInfoKind, const ArgList &Args) const {
		switch (mustEmitDebugInfo(Args)) {
		case DisableDebugInfo:
		DebugInfoKind = codegenoptions::NoDebugInfo;
		break;
		case DebugDirectivesOnly:
		DebugInfoKind = codegenoptions::DebugDirectivesOnly;
		break;
		case EmitSameDebugInfoAsHost:
		// Use same debug info level as the host.
		break;
		}
		}

		/// CUDA toolchain. Our assembler is ptxas, and our "linker" is fatbinary,
		/// which isn't properly a linker but nonetheless performs the step of stitching
		/// together object files from the assembler into a single blob.

		CudaToolChain::CudaToolChain(const Driver &D, const llvm::Triple &Triple,
		const ToolChain &HostTC, const ArgList &Args)
		: NVPTXToolChain(D, Triple, HostTC.getTriple(), Args), HostTC(HostTC) {}

void CudaToolChain::addClangTargetOptions(		void CudaToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,		const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);		HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);

StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);		StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
assert(!GpuArch.empty() && "Must have an explicit GPU arch.");		assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
assert((DeviceOffloadingKind == Action::OFK_OpenMP ||		assert((DeviceOffloadingKind == Action::OFK_OpenMP ||
[62 lines not shown; context: if (FPType && FPType == &llvm::APFloat::IEEEsingle() &&]
options::OPT_fno_gpu_flush_denormals_to_zero, false))		options::OPT_fno_gpu_flush_denormals_to_zero, false))
return llvm::DenormalMode::getPreserveSign();		return llvm::DenormalMode::getPreserveSign();
}		}

assert(JA.getOffloadingDeviceKind() != Action::OFK_Host);		assert(JA.getOffloadingDeviceKind() != Action::OFK_Host);
return llvm::DenormalMode::getIEEE();		return llvm::DenormalMode::getIEEE();
}		}

bool CudaToolChain::supportsDebugInfoOption(const llvm::opt::Arg *A) const {
const Option &O = A->getOption();
return (O.matches(options::OPT_gN_Group) &&
!O.matches(options::OPT_gmodules)) ||
O.matches(options::OPT_g_Flag) ||
O.matches(options::OPT_ggdbN_Group) || O.matches(options::OPT_ggdb) ||
O.matches(options::OPT_gdwarf) || O.matches(options::OPT_gdwarf_2) ||
O.matches(options::OPT_gdwarf_3) || O.matches(options::OPT_gdwarf_4) ||
O.matches(options::OPT_gdwarf_5) ||
O.matches(options::OPT_gcolumn_info);
}

void CudaToolChain::adjustDebugInfoKind(
codegenoptions::DebugInfoKind &DebugInfoKind, const ArgList &Args) const {
switch (mustEmitDebugInfo(Args)) {
case DisableDebugInfo:
DebugInfoKind = codegenoptions::NoDebugInfo;
break;
case DebugDirectivesOnly:
DebugInfoKind = codegenoptions::DebugDirectivesOnly;
break;
case EmitSameDebugInfoAsHost:
// Use same debug info level as the host.
break;
}
}

void CudaToolChain::AddCudaIncludeArgs(const ArgList &DriverArgs,		void CudaToolChain::AddCudaIncludeArgs(const ArgList &DriverArgs,
ArgStringList &CC1Args) const {		ArgStringList &CC1Args) const {
// Check our CUDA version if we're going to include the CUDA headers.		// Check our CUDA version if we're going to include the CUDA headers.
if (!DriverArgs.hasArg(options::OPT_nogpuinc) &&		if (!DriverArgs.hasArg(options::OPT_nogpuinc) &&
!DriverArgs.hasArg(options::OPT_no_cuda_version_check)) {		!DriverArgs.hasArg(options::OPT_no_cuda_version_check)) {
StringRef Arch = DriverArgs.getLastArgValue(options::OPT_march_EQ);		StringRef Arch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
assert(!Arch.empty() && "Must have an explicit GPU arch.");		assert(!Arch.empty() && "Must have an explicit GPU arch.");
CudaInstallation.CheckCudaVersionSupportsArch(StringToCudaArch(Arch));		CudaInstallation.CheckCudaVersionSupportsArch(StringToCudaArch(Arch));
}		}
CudaInstallation.AddCudaIncludeArgs(DriverArgs, CC1Args);		CudaInstallation.AddCudaIncludeArgs(DriverArgs, CC1Args);
}		}

		std::string CudaToolChain::getInputFilename(const InputInfo &Input) const {
		// Only object files are changed, for example assembly files keep their .s
		// extensions. If the user requested device-only compilation don't change it.
		if (Input.getType() != types::TY_Object || getDriver().offloadDeviceOnly())
		return ToolChain::getInputFilename(Input);

		// Replace extension for object files with cubin because nvlink relies on
		// these particular file names.
		SmallString<256> Filename(ToolChain::getInputFilename(Input));
		llvm::sys::path::replace_extension(Filename, "cubin");
		return std::string(Filename.str());
		}

llvm::opt::DerivedArgList *		llvm::opt::DerivedArgList *
CudaToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,		CudaToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
StringRef BoundArch,		StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const {		Action::OffloadKind DeviceOffloadKind) const {
DerivedArgList *DAL =		DerivedArgList *DAL =
HostTC.TranslateArgs(Args, BoundArch, DeviceOffloadKind);		HostTC.TranslateArgs(Args, BoundArch, DeviceOffloadKind);
if (!DAL)		if (!DAL)
DAL = new DerivedArgList(Args.getBaseArgs());		DAL = new DerivedArgList(Args.getBaseArgs());
[60 lines not shown; context: CudaToolChain::getSystemGPUArchs(const ArgList &Args) const {]

if (GPUArchs.empty())		if (GPUArchs.empty())
return llvm::createStringError(std::error_code(),		return llvm::createStringError(std::error_code(),
"No NVIDIA GPU detected in the system");		"No NVIDIA GPU detected in the system");

return std::move(GPUArchs);		return std::move(GPUArchs);
}		}

		Tool *NVPTXToolChain::buildAssembler() const {
		return new tools::NVPTX::Assembler(*this);
		}

		Tool *NVPTXToolChain::buildLinker() const {
		return new tools::NVPTX::Linker(*this);
		}

Tool *CudaToolChain::buildAssembler() const {		Tool *CudaToolChain::buildAssembler() const {
return new tools::NVPTX::Assembler(*this);		return new tools::NVPTX::Assembler(*this);
}		}

Tool *CudaToolChain::buildLinker() const {		Tool *CudaToolChain::buildLinker() const {
return new tools::NVPTX::Linker(*this);		return new tools::NVPTX::FatBinary(*this);
}		}

void CudaToolChain::addClangWarningOptions(ArgStringList &CC1Args) const {		void CudaToolChain::addClangWarningOptions(ArgStringList &CC1Args) const {
HostTC.addClangWarningOptions(CC1Args);		HostTC.addClangWarningOptions(CC1Args);
}		}

ToolChain::CXXStdlibType		ToolChain::CXXStdlibType
CudaToolChain::GetCXXStdlibType(const ArgList &Args) const {		CudaToolChain::GetCXXStdlibType(const ArgList &Args) const {
[40 lines not shown]

clang/test/Driver/cuda-cross-compiling.c

This file was added.

				// Tests the driver when targeting the NVPTX architecture directly without a
				// host toolchain to perform CUDA mappings.

				// REQUIRES: nvptx-registered-target

				//
				// Test the generated phases when targeting NVPTX.
				//
				// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-phases %s 2>&1 \
				// RUN: | FileCheck -check-prefix=PHASES %s

				// PHASES: 0: input, "[[INPUT:.+]]", c
				// PHASES-NEXT: 1: preprocessor, {0}, cpp-output
				// PHASES-NEXT: 2: compiler, {1}, ir
				// PHASES-NEXT: 3: backend, {2}, assembler
				// PHASES-NEXT: 4: assembler, {3}, object
				// PHASES-NEXT: 5: linker, {4}, image

				//
				// Test the generated bindings when targeting NVPTX.
				//
				// RUN: %clang -target nvptx64-nvidia-cuda -ccc-print-bindings %s 2>&1 \
				// RUN: | FileCheck -check-prefix=BINDINGS %s

				// BINDINGS: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX:.+]].s"
				// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX]].s"], output: "[[CUBIN:.+]].o"
				// BINDINGS-NEXT: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["[[CUBIN]].o"], output: "a.out"

				//
				// Test the generated arguments to the CUDA binary utils when targeting NVPTX.
				// Ensure that the '.o' files are converted to '.cubin' if produced internally.
				//
				// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %s 2>&1 \
				// RUN: | FileCheck -check-prefix=ARGS %s

				// ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
				// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
				// ARGS-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"

				tra [Done]: This may fail on windows where ptxas/nvlink will be `ptxas.exe` `nvlink.exe`. I think we typically use something like `fatbinary{{.*}}"` in other tests.
				jhuber6 (Author) [Done]: Good point.
				//
				// Test the generated arguments to the CUDA binary utils when targeting NVPTX.
				// Ensure that we emit '.o' files if compiled with '-c'
				//
				// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -c -### %s 2>&1 \
				// RUN: | FileCheck -check-prefix=OBJECT %s

				// OBJECT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
				// OBJECT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[OBJ:.+]].o" "[[PTX]].s" "-c"
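
Note on the trailing `-c` that the ARGS and OBJECT checks expect from ptxas: it follows from the three-way `Relocatable` choice in `NVPTX::Assembler::ConstructJob`, where direct NVPTX compilation (neither OpenMP nor CUDA offloading) always produces relocatable output. A standalone restatement of that choice might look like the sketch below. `OffloadKind` and `isRelocatable` are hypothetical names, and the two boolean parameters stand in for the driver's `-fopenmp-relocatable-target` and `-fgpu-rdc` flag lookups with their defaults from the patch.

```cpp
#include <cassert>

// Hypothetical simplification of the driver's offloading kinds.
enum class OffloadKind { None, Cuda, OpenMP };

// Returns whether ptxas should be given "-c" (relocatable output).
// openmpRelocatable defaults to true and gpuRdc to false, as in the patch.
bool isRelocatable(OffloadKind kind, bool openmpRelocatable = true,
                   bool gpuRdc = false) {
  switch (kind) {
  case OffloadKind::OpenMP:
    return openmpRelocatable; // OpenMP: relocatable unless disabled
  case OffloadKind::Cuda:
    return gpuRdc; // CUDA: relocatable only with RDC enabled
  case OffloadKind::None:
    return true; // direct NVPTX compilation: always linkable output
  }
  return true; // unreachable for valid enum values
}
```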

				//
				// Test the generated arguments to the CUDA binary utils when targeting NVPTX.
				// Ensure that we copy input '.o' files to '.cubin' files when linking.
				//
				// RUN: touch %t.o
				// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %t.o 2>&1 \
				// RUN: | FileCheck -check-prefix=LINK %s

				// LINK: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"

				//
				// Test the generated arguments default to a value with no architecture.
				//
				// RUN: %clang -target nvptx64-nvidia-cuda -### %s 2>&1 \
				// RUN: | FileCheck -check-prefix=DEFAULT %s

				// DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_35" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
				// DEFAULT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_35" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
				// DEFAULT-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_35" {{.*}} "[[CUBIN]].cubin"

Revision Contents

Diff 490329
