This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Invoke ptxas and fatbinary during compilation.
ClosedPublic

Authored by jlebar on Jan 11 2016, 1:20 PM.

Details

Reviewers

tra
echristo

Commits

rG21e5d4fcfa4c: [CUDA] Invoke ptxas and fatbinary during compilation.
rC257809: [CUDA] Invoke ptxas and fatbinary during compilation.
rL257809: [CUDA] Invoke ptxas and fatbinary during compilation.

Summary

Previously we compiled CUDA device code to PTX assembly and embedded
that asm as text in our host binary. Now we compile to PTX assembly and
then invoke ptxas to assemble the PTX into a cubin file. We gather the
ptx and cubin files for each of our --cuda-gpu-archs and combine them
using fatbinary, and then embed that into the host binary.

Adds two new command-line flags, -Xcuda-ptxas and -Xcuda-fatbinary,
which pass args down to the external tools.

(The -S test removed in cuda-options.cu is added back with some modifications in D16081.)
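As a rough illustration of the pipeline described in the summary, the sketch below builds the two external command lines as plain string vectors. The helper names and the `DeviceOutputs` struct are hypothetical, and the ptxas/fatbinary option spellings are assumptions that should be checked against the installed CUDA tools; this is not the code from the patch.

```cpp
#include <string>
#include <vector>

// Hypothetical record of the files produced for one --cuda-gpu-arch.
struct DeviceOutputs {
  std::string GpuArch;     // e.g. "sm_35"
  std::string ComputeArch; // e.g. "compute_35"
  std::string PtxFile;     // PTX assembly emitted by the compiler
  std::string CubinFile;   // object file produced by ptxas
};

// Assumed ptxas command line: assemble one PTX file into one cubin.
// Extra holds whatever the user passed via -Xcuda-ptxas.
std::vector<std::string> ptxasCommand(const DeviceOutputs &Out, bool Is64Bit,
                                      const std::vector<std::string> &Extra) {
  std::vector<std::string> Cmd = {"ptxas", Is64Bit ? "-m64" : "-m32",
                                  "--gpu-name", Out.GpuArch,
                                  "--output-file", Out.CubinFile,
                                  Out.PtxFile};
  Cmd.insert(Cmd.end(), Extra.begin(), Extra.end());
  return Cmd;
}

// Assumed fatbinary command line: bundle each cubin and its PTX into one
// fatbin. Extra holds whatever the user passed via -Xcuda-fatbinary.
std::vector<std::string>
fatbinaryCommand(const std::vector<DeviceOutputs> &Outs,
                 const std::string &FatbinFile,
                 const std::vector<std::string> &Extra) {
  std::vector<std::string> Cmd = {"fatbinary", "--cuda", "--create",
                                  FatbinFile};
  for (const DeviceOutputs &Out : Outs) {
    Cmd.push_back("--image=profile=" + Out.GpuArch + ",file=" + Out.CubinFile);
    Cmd.push_back("--image=profile=" + Out.ComputeArch + ",file=" + Out.PtxFile);
  }
  Cmd.insert(Cmd.end(), Extra.begin(), Extra.end());
  return Cmd;
}
```

Each GPU arch contributes two images to the fatbin: the cubin for that exact arch and the PTX it was assembled from.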

Diff Detail

Event Timeline

jlebar updated this revision to Diff 44543.Jan 11 2016, 1:20 PM

jlebar retitled this revision to [CUDA] Invoke ptxas and fatbinary during compilation.

jlebar updated this object.

jlebar added reviewers: tra, echristo.

jlebar added subscribers: jhen, cfe-commits.

jlebar updated this object.Jan 11 2016, 1:37 PM

Make sure it works with -save-temps and -fintegrated-as/-fno-integrated-as. They tend to throw wrenches into pipeline construction.

lib/Driver/Driver.cpp
1380	So, you're treating GpuArchName==nullptr as a special case of DeviceAction for fatbin? Perhaps it warrants its own CudaDeviceLinkAction?
1620–1621	cubin is an ELF object file. Do we really need a new type here or can we get by with TY_Object?
lib/Driver/ToolChains.cpp
4193	unneeded {} here and in few more places throughout the patch.
4229	You may as well move it out of the loop and return early if BoundArch is nullptr.
lib/Driver/Tools.cpp
10621–10625	CmdArgs.push_back(TC.getTriple().isArch64Bit() ? "-m64" : "-m32"); or, even ArgStringList CmdArgs = {TC.getTriple().isArch64Bit() ? "-m64" : "-m32"}; Same in Linker::ConstructJob below.
10677–10678	First line does not parse. In general, compute_XX does not necessarily match sm_XX. The ptxas options documentation says: "Allowed values for this option: compute_20, compute_30, compute_35, compute_50, compute_52; and sm_20, sm_21, sm_30, sm_32, sm_35, sm_50 and sm_52." Note that there's no compute_21 or compute_32. You'll need an sm_XX -> compute_YY map.
lib/Driver/Tools.h
923	Please add more details about what fatbin does. ".. which combines GPU object files and, optionally, PTX assembly into a single output file."

One question inline, one nit, and one more question here: You've got a couple of checks inline for null names/architectures; where do you expect those to come from, and can you test for them? Or, another question: if there are multiple architectures, shouldn't we be able to see all of the actions that arise from each gpu?

Or I'm missing something, either way an explanation is good. :)

-eric

lib/Driver/Driver.cpp
1875	Weren't you just adding it as part of the InputInfo constructor in 16078?
lib/Driver/ToolChains.cpp
4260–4261	Nit: No braces around single lines.

jlebar mentioned this in D16097: [CUDA] Add explicit mapping from sm_XX to compute_YY..Jan 11 2016, 4:38 PM

Address tra, echristo's review comments.

In D16082#324138, @tra wrote:

Make sure it works with -save-temps and -fintegrated-as/-fno-integrated-as. They tend to throw wrenches into pipeline construction.

Thanks. All of them worked except -fintegrated-as, which was causing us not to invoke ptxas. Fixed, and added a test.
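The diff addresses this by making useIntegratedAs() virtual and overriding it in the CUDA toolchain. A minimal model of that idea, using simplified stand-in classes (the `*Sketch` names and the `WantIntegratedAs` field are hypothetical, not clang's real types):

```cpp
// Simplified stand-in for a toolchain whose assembler choice follows a flag.
struct ToolChainSketch {
  // Models the effect of -fintegrated-as / -fno-integrated-as.
  bool WantIntegratedAs = true;
  virtual ~ToolChainSketch() = default;
  virtual bool useIntegratedAs() const { return WantIntegratedAs; }
};

// Stand-in for the CUDA toolchain: it ignores the flag entirely.
struct CudaToolChainSketch : ToolChainSketch {
  // Always fork out to an external assembler (ptxas), whatever the flag says.
  bool useIntegratedAs() const override { return false; }
};

// True when a separate assembler job has to be created for this toolchain.
bool needsExternalAssembler(const ToolChainSketch &TC) {
  return !TC.useIntegratedAs();
}
```

Because the decision goes through a virtual call, callers that only hold a base-class reference still get the CUDA-specific answer.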

lib/Driver/Driver.cpp
1380	It's a special case either way, but there are enough places that look for a CudaDeviceAction that I think this special case is cleaner than having a bunch of if (CudaDeviceAction || CudaDeviceLinkAction)s floating around.
1620–1621	Got rid of the extra type. Note that this means that our intermediate objs will be named foo.o instead of foo.cubin, which isn't optimal, but I agree that it's simpler without the extra type.
1875	Yes, but we're doing something slightly different here. Suppose you have a CudaDeviceAction --> BackendAction. What do you want the InputInfo's Action to be? Without this change, it's the BackendAction. But we really want it to be the CudaDeviceAction. Thus this line. I updated the comment in an attempt to clarify.
lib/Driver/ToolChains.cpp
4229	Tried this and discovered it's actually subtly different in a way that, thankfully, affects one of the tests I added. if (!BoundArch) continue applies only if A->getOption().matches(options::OPT_Xarch__). So we can't hoist it into an early return.
lib/Driver/Tools.cpp
10677–10678	Oh goodness, how awful. Thank you for catching that. Fixed.
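The reply on ToolChains.cpp line 4229 above turns on a subtle difference between skipping inside the loop and returning early. The sketch below models it with a hypothetical `ArgSketch` type; it leaves out the arch comparison a fuller model would need and only shows why the two forms disagree when no arch is bound.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a driver argument.
struct ArgSketch {
  bool IsXarch;      // models A->getOption().matches(options::OPT_Xarch__)
  std::string Value; // the argument text to forward
};

// Behaviour described in the reply: the null-arch check applies only to
// -Xarch_ arguments, so all other arguments are still forwarded.
std::vector<std::string> translateArgs(const std::vector<ArgSketch> &Args,
                                       const char *BoundArch) {
  std::vector<std::string> Out;
  for (const ArgSketch &A : Args) {
    if (A.IsXarch && !BoundArch)
      continue; // arch-specific argument, but there is no arch to bind to
    Out.push_back(A.Value);
  }
  return Out;
}

// The suggested hoist: an early return also drops the non-Xarch arguments,
// which is why it is not equivalent.
std::vector<std::string>
translateArgsHoisted(const std::vector<ArgSketch> &Args,
                     const char *BoundArch) {
  if (!BoundArch)
    return {};
  return translateArgs(Args, BoundArch);
}
```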

Add test checking that sm_XX gets translated to compute_YY correctly.
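For illustration, an sm_XX to compute_YY lookup could look like the sketch below. It covers only the names quoted from the ptxas options in the review; the rows for sm_21 and sm_32 (mapped to the nearest lower compute level) are an assumption made here, and the actual mapping is the one added in D16097.

```cpp
#include <string>

// One row of the lookup table.
struct ArchRow {
  const char *Sm;
  const char *Compute;
};

// Returns the compute_YY name for an sm_XX name, or an empty string when the
// name is not in the table.
std::string gpuArchToComputeName(const std::string &Sm) {
  static const ArchRow Table[] = {
      {"sm_20", "compute_20"},
      {"sm_21", "compute_20"}, // assumption: there is no compute_21
      {"sm_30", "compute_30"},
      {"sm_32", "compute_30"}, // assumption: there is no compute_32
      {"sm_35", "compute_35"},
      {"sm_50", "compute_50"},
      {"sm_52", "compute_52"},
  };
  for (const ArchRow &Row : Table)
    if (Sm == Row.Sm)
      return Row.Compute;
  return std::string();
}
```

An explicit table avoids the bug flagged in the review, where the compute name was derived by rewriting the sm_ prefix.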

jlebar mentioned this in rL257530: [CUDA] Add explicit mapping from sm_XX to compute_YY..Jan 12 2016, 2:26 PM

LGTM.

This revision is now accepted and ready to land.Jan 12 2016, 3:54 PM

This is terrible, but the only other option is fixing bind arch and inverting the graph which is a major rewrite to the driver.

So, LGTM.

-eric

Closed by commit rL257809: [CUDA] Invoke ptxas and fatbinary during compilation. (authored by jlebar). Jan 14 2016, 1:45 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/clang/Driver/	6 lines, 4 lines, 2 lines, 1 line
lib/CodeGen/CGCUDANV.cpp	2 lines
lib/Driver/	2 lines, 52 lines, 11 lines, 27 lines, 35 lines, 78 lines, 3 lines
test/Driver/Inputs/CUDA/usr/local/cuda/bin/.keep
test/Driver/cuda-external-tools.cu	70 lines
test/Driver/cuda-options.cu	25 lines

Diff 44584

include/clang/Driver/Action.h

[130 lines not shown]	public:

static bool classof(const Action *A) {		static bool classof(const Action *A) {
return A->getKind() == BindArchClass;		return A->getKind() == BindArchClass;
}		}
};		};

class CudaDeviceAction : public Action {		class CudaDeviceAction : public Action {
virtual void anchor();		virtual void anchor();
/// GPU architecture to bind. Always of the form /sm_\d+/.		/// GPU architecture to bind. Always of the form /sm_\d+/ or null (when the
		/// action applies to multiple architectures).
const char *GpuArchName;		const char *GpuArchName;
/// True when action results are not consumed by the host action (e.g when		/// True when action results are not consumed by the host action (e.g when
/// -fsyntax-only or --cuda-device-only options are used).		/// -fsyntax-only or --cuda-device-only options are used).
bool AtTopLevel;		bool AtTopLevel;

public:		public:
CudaDeviceAction(Action *Input, const char *ArchName, bool AtTopLevel);		CudaDeviceAction(Action *Input, const char *ArchName, bool AtTopLevel);

const char *getGpuArchName() const { return GpuArchName; }		const char *getGpuArchName() const { return GpuArchName; }

/// Gets the compute_XX that corresponds to getGpuArchName().		/// Gets the compute_XX that corresponds to getGpuArchName(). Returns null
		/// when getGpuArchName() is null.
const char *getComputeArchName() const;		const char *getComputeArchName() const;

bool isAtTopLevel() const { return AtTopLevel; }		bool isAtTopLevel() const { return AtTopLevel; }

static bool IsValidGpuArchName(llvm::StringRef ArchName);		static bool IsValidGpuArchName(llvm::StringRef ArchName);

static bool classof(const Action *A) {		static bool classof(const Action *A) {
return A->getKind() == CudaDeviceClass;		return A->getKind() == CudaDeviceClass;
[160 lines not shown]

include/clang/Driver/Options.td

	[330 lines not shown]
	def Xanalyzer : Separate<["-"], "Xanalyzer">,			def Xanalyzer : Separate<["-"], "Xanalyzer">,
	HelpText<"Pass <arg> to the static analyzer">, MetaVarName<"<arg>">;			HelpText<"Pass <arg> to the static analyzer">, MetaVarName<"<arg>">;
	def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[DriverOption]>;			def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[DriverOption]>;
	def Xassembler : Separate<["-"], "Xassembler">,			def Xassembler : Separate<["-"], "Xassembler">,
	HelpText<"Pass <arg> to the assembler">, MetaVarName<"<arg>">;			HelpText<"Pass <arg> to the assembler">, MetaVarName<"<arg>">;
	def Xclang : Separate<["-"], "Xclang">,			def Xclang : Separate<["-"], "Xclang">,
	HelpText<"Pass <arg> to the clang compiler">, MetaVarName<"<arg>">,			HelpText<"Pass <arg> to the clang compiler">, MetaVarName<"<arg>">,
	Flags<[DriverOption, CoreOption]>;			Flags<[DriverOption, CoreOption]>;
				def Xcuda_fatbinary : Separate<["-"], "Xcuda-fatbinary">,
				HelpText<"Pass <arg> to fatbinary invocation">, MetaVarName<"<arg>">;
				def Xcuda_ptxas : Separate<["-"], "Xcuda-ptxas">,
				HelpText<"Pass <arg> to the ptxas assembler">, MetaVarName<"<arg>">;
	def z : Separate<["-"], "z">, Flags<[LinkerInput, RenderAsInput]>,			def z : Separate<["-"], "z">, Flags<[LinkerInput, RenderAsInput]>,
	HelpText<"Pass -z <arg> to the linker">, MetaVarName<"<arg>">;			HelpText<"Pass -z <arg> to the linker">, MetaVarName<"<arg>">;
	def Xlinker : Separate<["-"], "Xlinker">, Flags<[LinkerInput, RenderAsInput]>,			def Xlinker : Separate<["-"], "Xlinker">, Flags<[LinkerInput, RenderAsInput]>,
	HelpText<"Pass <arg> to the linker">, MetaVarName<"<arg>">;			HelpText<"Pass <arg> to the linker">, MetaVarName<"<arg>">;
	def Xpreprocessor : Separate<["-"], "Xpreprocessor">,			def Xpreprocessor : Separate<["-"], "Xpreprocessor">,
	HelpText<"Pass <arg> to the preprocessor">, MetaVarName<"<arg>">;			HelpText<"Pass <arg> to the preprocessor">, MetaVarName<"<arg>">;
	def X_Flag : Flag<["-"], "X">;			def X_Flag : Flag<["-"], "X">;
	def X_Joined : Joined<["-"], "X">;			def X_Joined : Joined<["-"], "X">;
	[1,787 lines not shown]
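Both new options are declared as Separate, meaning the value is the next command-line word. A minimal sketch of gathering such values for forwarding, assuming a plain vector of arguments rather than clang's real option parser (`collectSeparateArgs` is a hypothetical helper):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects the word that follows each occurrence of Flag, mimicking a
// "Separate" option such as -Xcuda-ptxas <arg>.
std::vector<std::string>
collectSeparateArgs(const std::vector<std::string> &Argv,
                    const std::string &Flag) {
  std::vector<std::string> Values;
  for (std::size_t I = 0; I + 1 < Argv.size(); ++I) {
    if (Argv[I] == Flag) {
      Values.push_back(Argv[I + 1]);
      ++I; // the value is consumed, even if it looks like another flag
    }
  }
  return Values;
}
```

To pass several words to the external tool, the flag is repeated once per word.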

include/clang/Driver/ToolChain.h

[222 lines not shown]	public:
/// IsBlocksDefault - Does this tool chain enable -fblocks by default.		/// IsBlocksDefault - Does this tool chain enable -fblocks by default.
virtual bool IsBlocksDefault() const { return false; }		virtual bool IsBlocksDefault() const { return false; }

/// IsIntegratedAssemblerDefault - Does this tool chain enable -integrated-as		/// IsIntegratedAssemblerDefault - Does this tool chain enable -integrated-as
/// by default.		/// by default.
virtual bool IsIntegratedAssemblerDefault() const { return false; }		virtual bool IsIntegratedAssemblerDefault() const { return false; }

/// \brief Check if the toolchain should use the integrated assembler.		/// \brief Check if the toolchain should use the integrated assembler.
bool useIntegratedAs() const;		virtual bool useIntegratedAs() const;

/// IsMathErrnoDefault - Does this tool chain use -fmath-errno by default.		/// IsMathErrnoDefault - Does this tool chain use -fmath-errno by default.
virtual bool IsMathErrnoDefault() const { return true; }		virtual bool IsMathErrnoDefault() const { return true; }

/// IsEncodeExtendedBlockSignatureDefault - Does this tool chain enable		/// IsEncodeExtendedBlockSignatureDefault - Does this tool chain enable
/// -fencode-extended-block-signature by default.		/// -fencode-extended-block-signature by default.
virtual bool IsEncodeExtendedBlockSignatureDefault() const { return false; }		virtual bool IsEncodeExtendedBlockSignatureDefault() const { return false; }

[179 lines not shown]

include/clang/Driver/Types.def

	[87 lines not shown]
	TYPE("rewritten-legacy-objc", RewrittenLegacyObjC,INVALID, "cpp", "")			TYPE("rewritten-legacy-objc", RewrittenLegacyObjC,INVALID, "cpp", "")
	TYPE("remap", Remap, INVALID, "remap", "")			TYPE("remap", Remap, INVALID, "remap", "")
	TYPE("precompiled-header", PCH, INVALID, "gch", "A")			TYPE("precompiled-header", PCH, INVALID, "gch", "A")
	TYPE("object", Object, INVALID, "o", "")			TYPE("object", Object, INVALID, "o", "")
	TYPE("treelang", Treelang, INVALID, nullptr, "u")			TYPE("treelang", Treelang, INVALID, nullptr, "u")
	TYPE("image", Image, INVALID, "out", "")			TYPE("image", Image, INVALID, "out", "")
	TYPE("dSYM", dSYM, INVALID, "dSYM", "A")			TYPE("dSYM", dSYM, INVALID, "dSYM", "A")
	TYPE("dependencies", Dependencies, INVALID, "d", "")			TYPE("dependencies", Dependencies, INVALID, "d", "")
				TYPE("cuda-fatbin", CUDA_FATBIN, INVALID, "fatbin","A")
	TYPE("none", Nothing, INVALID, nullptr, "u")			TYPE("none", Nothing, INVALID, nullptr, "u")

lib/CodeGen/CGCUDANV.cpp

[253 lines not shown]	llvm::Constant *Values[] = {
llvm::ConstantInt::get(IntTy, 0x466243b1), // Fatbin wrapper magic.		llvm::ConstantInt::get(IntTy, 0x466243b1), // Fatbin wrapper magic.
llvm::ConstantInt::get(IntTy, 1), // Fatbin version.		llvm::ConstantInt::get(IntTy, 1), // Fatbin version.
makeConstantString(GpuBinaryOrErr.get()->getBuffer(), "", 16), // Data.		makeConstantString(GpuBinaryOrErr.get()->getBuffer(), "", 16), // Data.
llvm::ConstantPointerNull::get(VoidPtrTy)}; // Unused in fatbin v1.		llvm::ConstantPointerNull::get(VoidPtrTy)}; // Unused in fatbin v1.
llvm::GlobalVariable *FatbinWrapper = new llvm::GlobalVariable(		llvm::GlobalVariable *FatbinWrapper = new llvm::GlobalVariable(
TheModule, FatbinWrapperTy, true, llvm::GlobalValue::InternalLinkage,		TheModule, FatbinWrapperTy, true, llvm::GlobalValue::InternalLinkage,
llvm::ConstantStruct::get(FatbinWrapperTy, Values),		llvm::ConstantStruct::get(FatbinWrapperTy, Values),
"__cuda_fatbin_wrapper");		"__cuda_fatbin_wrapper");
		// NVIDIA's cuobjdump looks for fatbins in this section.
		FatbinWrapper->setSection(".nvFatBinSegment");

// GpuBinaryHandle = __cudaRegisterFatBinary(&FatbinWrapper);		// GpuBinaryHandle = __cudaRegisterFatBinary(&FatbinWrapper);
llvm::CallInst *RegisterFatbinCall = CtorBuilder.CreateCall(		llvm::CallInst *RegisterFatbinCall = CtorBuilder.CreateCall(
RegisterFatbinFunc,		RegisterFatbinFunc,
CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy));		CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy));
llvm::GlobalVariable *GpuBinaryHandle = new llvm::GlobalVariable(		llvm::GlobalVariable *GpuBinaryHandle = new llvm::GlobalVariable(
TheModule, VoidPtrPtrTy, false, llvm::GlobalValue::InternalLinkage,		TheModule, VoidPtrPtrTy, false, llvm::GlobalValue::InternalLinkage,
llvm::ConstantPointerNull::get(VoidPtrPtrTy), "__cuda_gpubin_handle");		llvm::ConstantPointerNull::get(VoidPtrPtrTy), "__cuda_gpubin_handle");
[50 lines not shown]
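The four constants in the hunk above form a small wrapper record around the fatbin data. As a plain C++ picture of that layout (the struct and function names are hypothetical, and section placement in .nvFatBinSegment is not modeled):

```cpp
#include <cstdint>

// Mirrors the four fields stored in __cuda_fatbin_wrapper by the diff.
struct FatbinWrapperSketch {
  std::int32_t Magic;   // 0x466243b1, the fatbin wrapper magic
  std::int32_t Version; // 1, the fatbin version
  const void *Data;     // pointer to the embedded fatbin bytes
  void *Unused;         // unused in fatbin v1, always null
};

// Builds a wrapper around an already-loaded fatbin blob.
FatbinWrapperSketch makeFatbinWrapper(const void *FatbinData) {
  return {0x466243b1, 1, FatbinData, nullptr};
}
```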

lib/Driver/Action.cpp

	[65 lines not shown]
	}			}

	void CudaDeviceAction::anchor() {}			void CudaDeviceAction::anchor() {}

	CudaDeviceAction::CudaDeviceAction(Action *Input, const char *ArchName,			CudaDeviceAction::CudaDeviceAction(Action *Input, const char *ArchName,
	bool AtTopLevel)			bool AtTopLevel)
	: Action(CudaDeviceClass, Input), GpuArchName(ArchName),			: Action(CudaDeviceClass, Input), GpuArchName(ArchName),
	AtTopLevel(AtTopLevel) {			AtTopLevel(AtTopLevel) {
	assert(IsValidGpuArchName(GpuArchName));			assert(!GpuArchName || IsValidGpuArchName(GpuArchName));
	}			}

	const char *CudaDeviceAction::getComputeArchName() const {			const char *CudaDeviceAction::getComputeArchName() const {
	return GpuArchToComputeName(GpuArchName);			return GpuArchToComputeName(GpuArchName);
	}			}

	bool CudaDeviceAction::IsValidGpuArchName(llvm::StringRef ArchName) {			bool CudaDeviceAction::IsValidGpuArchName(llvm::StringRef ArchName) {
	return GpuArchToComputeName(ArchName.data()) != nullptr;			return GpuArchToComputeName(ArchName.data()) != nullptr;
	[88 lines not shown]

lib/Driver/Driver.cpp

[943 lines not shown]	static unsigned PrintActions1(const Compilation &C, Action *A,

os << Action::getClassName(A->getKind()) << ", ";		os << Action::getClassName(A->getKind()) << ", ";
if (InputAction *IA = dyn_cast<InputAction>(A)) {		if (InputAction *IA = dyn_cast<InputAction>(A)) {
os << "\"" << IA->getInputArg().getValue() << "\"";		os << "\"" << IA->getInputArg().getValue() << "\"";
} else if (BindArchAction *BIA = dyn_cast<BindArchAction>(A)) {		} else if (BindArchAction *BIA = dyn_cast<BindArchAction>(A)) {
os << '"' << BIA->getArchName() << '"' << ", {"		os << '"' << BIA->getArchName() << '"' << ", {"
<< PrintActions1(C, *BIA->begin(), Ids) << "}";		<< PrintActions1(C, *BIA->begin(), Ids) << "}";
} else if (CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {		} else if (CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
os << '"' << CDA->getGpuArchName() << '"' << ", {"		os << '"'
<< PrintActions1(C, *CDA->begin(), Ids) << "}";		<< (CDA->getGpuArchName() ? CDA->getGpuArchName() : "(multiple archs)")
		<< '"' << ", {" << PrintActions1(C, *CDA->begin(), Ids) << "}";
} else {		} else {
const ActionList *AL;		const ActionList *AL;
if (CudaHostAction *CHA = dyn_cast<CudaHostAction>(A)) {		if (CudaHostAction *CHA = dyn_cast<CudaHostAction>(A)) {
os << "{" << PrintActions1(C, *CHA->begin(), Ids) << "}"		os << "{" << PrintActions1(C, *CHA->begin(), Ids) << "}"
<< ", gpu binaries ";		<< ", gpu binaries ";
AL = &CHA->getDeviceActions();		AL = &CHA->getDeviceActions();
} else		} else
AL = &A->getInputs();		AL = &A->getInputs();
[360 lines not shown]	static Action *buildCudaActions(Compilation &C, DerivedArgList &Args,
C.getDriver().BuildActions(C, *C.getCudaDeviceToolChain(), Args,		C.getDriver().BuildActions(C, *C.getCudaDeviceToolChain(), Args,
CudaDeviceInputs, CudaDeviceActions);		CudaDeviceInputs, CudaDeviceActions);
assert(GpuArchList.size() == CudaDeviceActions.size() &&		assert(GpuArchList.size() == CudaDeviceActions.size() &&
"Failed to create actions for all devices");		"Failed to create actions for all devices");

// Check whether any of device actions stopped before they could generate PTX.		// Check whether any of device actions stopped before they could generate PTX.
bool PartialCompilation =		bool PartialCompilation =
llvm::any_of(CudaDeviceActions, [](const Action *a) {		llvm::any_of(CudaDeviceActions, [](const Action *a) {
return a->getKind() != Action::BackendJobClass;		return a->getKind() != Action::AssembleJobClass;
});		});

// Figure out what to do with device actions -- pass them as inputs to the		// Figure out what to do with device actions -- pass them as inputs to the
// host action or run each of them independently.		// host action or run each of them independently.
bool DeviceOnlyCompilation = PartialCompilationArg != nullptr;		bool DeviceOnlyCompilation = PartialCompilationArg != nullptr;
if (PartialCompilation || DeviceOnlyCompilation) {		if (PartialCompilation || DeviceOnlyCompilation) {
// In case of partial or device-only compilation results of device actions		// In case of partial or device-only compilation results of device actions
// are not consumed by the host action device actions have to be added to		// are not consumed by the host action device actions have to be added to
[12 lines not shown]	for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I)
GpuArchList[I],		GpuArchList[I],
/* AtTopLevel */ true));		/* AtTopLevel */ true));
// Kill host action in case of device-only compilation.		// Kill host action in case of device-only compilation.
if (DeviceOnlyCompilation)		if (DeviceOnlyCompilation)
return nullptr;		return nullptr;
return HostAction;		return HostAction;
}		}

// Outputs of device actions during complete CUDA compilation get created		// If we're not a partial or device-only compilation, we compile each arch to
// with AtTopLevel=false and become inputs for the host action.		// ptx and assemble to cubin, then feed the cubin and the ptx into a device
		// "link" action, which uses fatbinary to combine these cubins into one
		// fatbin. The fatbin is then an input to the host compilation.
ActionList DeviceActions;		ActionList DeviceActions;
for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I)		for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
DeviceActions.push_back(		Action* AssembleAction = CudaDeviceActions[I];
C.MakeAction<CudaDeviceAction>(CudaDeviceActions[I], GpuArchList[I],		assert(AssembleAction->getType() == types::TY_Object);
/* AtTopLevel */ false));		assert(AssembleAction->getInputs().size() == 1);

		Action* BackendAction = AssembleAction->getInputs()[0];
		assert(BackendAction->getType() == types::TY_PP_Asm);

		for (const auto& A : {AssembleAction, BackendAction}) {
		DeviceActions.push_back(C.MakeAction<CudaDeviceAction>(
		A, GpuArchList[I], /* AtTopLevel */ false));
		}
		}
		auto FatbinAction = C.MakeAction<CudaDeviceAction>(
		C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN),
		/* GpuArchName = */ nullptr,
		tra: So, you're treating GpuArchName==nullptr as a special case of DeviceAction for fatbin? Perhaps it warrants its own CudaDeviceLinkAction?
		jlebar (author): It's a special case either way, but there are enough places that look for a CudaDeviceAction that I think this special case is cleaner than having a bunch of if (CudaDeviceAction || CudaDeviceLinkAction)s floating around.
		/* AtTopLevel = */ false);
// Return a new host action that incorporates original host action and all		// Return a new host action that incorporates original host action and all
// device actions.		// device actions.
return C.MakeAction<CudaHostAction>(HostAction, DeviceActions);		return C.MakeAction<CudaHostAction>(std::move(HostAction),
		ActionList({FatbinAction}));
}		}

void Driver::BuildActions(Compilation &C, const ToolChain &TC,		void Driver::BuildActions(Compilation &C, const ToolChain &TC,
DerivedArgList &Args, const InputList &Inputs,		DerivedArgList &Args, const InputList &Inputs,
ActionList &Actions) const {		ActionList &Actions) const {
llvm::PrettyStackTraceString CrashInfo("Building compilation actions");		llvm::PrettyStackTraceString CrashInfo("Building compilation actions");

if (!SuppressMissingInputWarning && Inputs.empty()) {		if (!SuppressMissingInputWarning && Inputs.empty()) {
[218 lines not shown]	case phases::Backend: {
if (Args.hasArg(options::OPT_emit_llvm)) {		if (Args.hasArg(options::OPT_emit_llvm)) {
types::ID Output =		types::ID Output =
Args.hasArg(options::OPT_S) ? types::TY_LLVM_IR : types::TY_LLVM_BC;		Args.hasArg(options::OPT_S) ? types::TY_LLVM_IR : types::TY_LLVM_BC;
return C.MakeAction<BackendJobAction>(Input, Output);		return C.MakeAction<BackendJobAction>(Input, Output);
}		}
return C.MakeAction<BackendJobAction>(Input, types::TY_PP_Asm);		return C.MakeAction<BackendJobAction>(Input, types::TY_PP_Asm);
}		}
case phases::Assemble:		case phases::Assemble:
return C.MakeAction<AssembleJobAction>(Input, types::TY_Object);		return C.MakeAction<AssembleJobAction>(std::move(Input), types::TY_Object);
}		}
		tra: cubin is an ELF object file. Do we really need a new type here or can we get by with TY_Object?
		jlebar (author): Got rid of the extra type. Note that this means that our intermediate objs will be named foo.o instead of foo.cubin, which isn't optimal, but I agree that it's simpler without the extra type.

llvm_unreachable("invalid phase in ConstructPhaseAction");		llvm_unreachable("invalid phase in ConstructPhaseAction");
}		}

void Driver::BuildJobs(Compilation &C) const {		void Driver::BuildJobs(Compilation &C) const {
llvm::PrettyStackTraceString CrashInfo("Building compilation jobs");		llvm::PrettyStackTraceString CrashInfo("Building compilation jobs");

Arg *FinalOutput = C.getArgs().getLastArg(options::OPT_o);		Arg *FinalOutput = C.getArgs().getLastArg(options::OPT_o);
[231 lines not shown]	if (const BindArchAction *BAA = dyn_cast<BindArchAction>(A)) {

return BuildJobsForAction(C, *BAA->begin(), TC, ArchName, AtTopLevel,		return BuildJobsForAction(C, *BAA->begin(), TC, ArchName, AtTopLevel,
MultipleArchs, LinkingOutput, CachedResults);		MultipleArchs, LinkingOutput, CachedResults);
}		}

if (const CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {		if (const CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
// Initial processing of CudaDeviceAction carries host params.		// Initial processing of CudaDeviceAction carries host params.
// Call BuildJobsForAction() again, now with correct device parameters.		// Call BuildJobsForAction() again, now with correct device parameters.
assert(CDA->getGpuArchName() && "No GPU name in device action.");		InputInfo II = BuildJobsForAction(
return BuildJobsForAction(C, *CDA->begin(), C.getCudaDeviceToolChain(),		C, *CDA->begin(), C.getCudaDeviceToolChain(), CDA->getGpuArchName(),
CDA->getGpuArchName(), CDA->isAtTopLevel(),		CDA->isAtTopLevel(), /*MultipleArchs*/ true, LinkingOutput,
/*MultipleArchs*/ true, LinkingOutput,
CachedResults);		CachedResults);
		// Currently II's Action is *CDA->begin(). Set it to CDA instead, so that
		// one can retrieve II's GPU arch.
		II.setAction(A);
		echristo: Weren't you just adding it as part of the InputInfo constructor in 16078?
		jlebar (author): Yes, but we're doing something slightly different here. Suppose you have a CudaDeviceAction --> BackendAction. What do you want the InputInfo's Action to be? Without this change, it's the BackendAction. But we really want it to be the CudaDeviceAction. Thus this line. I updated the comment in an attempt to clarify.
		return II;
}		}

const ActionList *Inputs = &A->getInputs();		const ActionList *Inputs = &A->getInputs();

const JobAction *JA = cast<JobAction>(A);		const JobAction *JA = cast<JobAction>(A);
const CudaHostAction *CollapsedCHA = nullptr;		const CudaHostAction *CollapsedCHA = nullptr;
const Tool *T =		const Tool *T =
selectToolForJob(C, isSaveTempsEnabled(), TC, JA, Inputs, CollapsedCHA);		selectToolForJob(C, isSaveTempsEnabled(), TC, JA, Inputs, CollapsedCHA);
[541 lines not shown]
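The new buildCudaActions logic for a complete compilation can be pictured with a toy action graph. The `Node` and `Graph` types below are hypothetical simplifications: the real code uses clang's Action classes, and CudaHostAction keeps its device actions in a separate list rather than as ordinary inputs.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy action node. An empty GpuArch models the null GpuArchName.
struct Node {
  std::string Kind; // "backend", "assemble", "cuda-device", "link", ...
  std::string GpuArch;
  std::vector<Node *> Inputs;
};

// Owns all nodes so raw pointers between them stay valid.
struct Graph {
  std::vector<std::unique_ptr<Node>> Owned;
  Node *make(std::string Kind, std::string Arch, std::vector<Node *> In) {
    Owned.push_back(std::unique_ptr<Node>(
        new Node{std::move(Kind), std::move(Arch), std::move(In)}));
    return Owned.back().get();
  }
};

// For each arch: PTX (backend) -> cubin (assemble). Both outputs feed one
// "link" node that stands for fatbinary, whose result feeds the host action.
Node *buildCudaGraph(Graph &G, Node *Host,
                     const std::vector<std::string> &Archs) {
  std::vector<Node *> DeviceActions;
  for (const std::string &Arch : Archs) {
    Node *Backend = G.make("backend", Arch, {});
    Node *Assemble = G.make("assemble", Arch, {Backend});
    for (Node *A : {Assemble, Backend})
      DeviceActions.push_back(G.make("cuda-device", Arch, {A}));
  }
  Node *Link = G.make("link", "", DeviceActions);
  Node *Fatbin = G.make("cuda-device", "", {Link}); // applies to all archs
  return G.make("cuda-host", "", {Host, Fatbin});
}
```

With two archs, the link node receives four inputs: a cubin and a PTX file for each arch.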

lib/Driver/ToolChains.h

[157 lines not shown]	protected:
GCCInstallationDetector GCCInstallation;		GCCInstallationDetector GCCInstallation;

// \brief A class to find a viable CUDA installation		// \brief A class to find a viable CUDA installation

class CudaInstallationDetector {		class CudaInstallationDetector {
bool IsValid;		bool IsValid;
const Driver &D;		const Driver &D;
std::string CudaInstallPath;		std::string CudaInstallPath;
		std::string CudaBinPath;
std::string CudaLibPath;		std::string CudaLibPath;
std::string CudaLibDevicePath;		std::string CudaLibDevicePath;
std::string CudaIncludePath;		std::string CudaIncludePath;
llvm::StringMap<std::string> CudaLibDeviceMap;		llvm::StringMap<std::string> CudaLibDeviceMap;

public:		public:
CudaInstallationDetector(const Driver &D) : IsValid(false), D(D) {}		CudaInstallationDetector(const Driver &D) : IsValid(false), D(D) {}
void init(const llvm::Triple &TargetTriple, const llvm::opt::ArgList &Args);		void init(const llvm::Triple &TargetTriple, const llvm::opt::ArgList &Args);

/// \brief Check whether we detected a valid Cuda install.		/// \brief Check whether we detected a valid Cuda install.
bool isValid() const { return IsValid; }		bool isValid() const { return IsValid; }
/// \brief Print information about the detected CUDA installation.		/// \brief Print information about the detected CUDA installation.
void print(raw_ostream &OS) const;		void print(raw_ostream &OS) const;

/// \brief Get the detected Cuda installation path.		/// \brief Get the detected Cuda installation path.
StringRef getInstallPath() const { return CudaInstallPath; }		StringRef getInstallPath() const { return CudaInstallPath; }
		/// \brief Get the detected path to Cuda's bin directory.
		StringRef getBinPath() const { return CudaBinPath; }
/// \brief Get the detected Cuda Include path.		/// \brief Get the detected Cuda Include path.
StringRef getIncludePath() const { return CudaIncludePath; }		StringRef getIncludePath() const { return CudaIncludePath; }
/// \brief Get the detected Cuda library path.		/// \brief Get the detected Cuda library path.
StringRef getLibPath() const { return CudaLibPath; }		StringRef getLibPath() const { return CudaLibPath; }
/// \brief Get the detected Cuda device library path.		/// \brief Get the detected Cuda device library path.
StringRef getLibDevicePath() const { return CudaLibDevicePath; }		StringRef getLibDevicePath() const { return CudaLibDevicePath; }
/// \brief Get libdevice file for given architecture		/// \brief Get libdevice file for given architecture
std::string getLibDeviceFile(StringRef Gpu) const {		std::string getLibDeviceFile(StringRef Gpu) const {
[621 lines not shown]	public:
CudaToolChain(const Driver &D, const llvm::Triple &Triple,		CudaToolChain(const Driver &D, const llvm::Triple &Triple,
const llvm::opt::ArgList &Args);		const llvm::opt::ArgList &Args);

llvm::opt::DerivedArgList *		llvm::opt::DerivedArgList *
TranslateArgs(const llvm::opt::DerivedArgList &Args,		TranslateArgs(const llvm::opt::DerivedArgList &Args,
const char *BoundArch) const override;		const char *BoundArch) const override;
void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,		void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const override;		llvm::opt::ArgStringList &CC1Args) const override;

		// Never try to use the integrated assembler with CUDA; always fork out to
		// ptxas.
		bool useIntegratedAs() const override { return false; }

		protected:
		Tool *buildAssembler() const override; // ptxas
		Tool *buildLinker() const override; // fatbinary (ok, not really a linker)
};		};

class LLVM_LIBRARY_VISIBILITY MipsLLVMToolChain : public Linux {		class LLVM_LIBRARY_VISIBILITY MipsLLVMToolChain : public Linux {
protected:		protected:
Tool *buildLinker() const override;		Tool *buildLinker() const override;

public:		public:
MipsLLVMToolChain(const Driver &D, const llvm::Triple &Triple,		MipsLLVMToolChain(const Driver &D, const llvm::Triple &Triple,
[317 lines not shown]

lib/Driver/ToolChains.cpp

[1,646 lines not shown]	else {
CudaPathCandidates.push_back(D.SysRoot + "/usr/local/cuda-7.0");		CudaPathCandidates.push_back(D.SysRoot + "/usr/local/cuda-7.0");
}		}

for (const auto &CudaPath : CudaPathCandidates) {		for (const auto &CudaPath : CudaPathCandidates) {
if (CudaPath.empty() || !D.getVFS().exists(CudaPath))		if (CudaPath.empty() || !D.getVFS().exists(CudaPath))
continue;		continue;

CudaInstallPath = CudaPath;		CudaInstallPath = CudaPath;
		CudaBinPath = CudaPath + "/bin";
CudaIncludePath = CudaInstallPath + "/include";		CudaIncludePath = CudaInstallPath + "/include";
CudaLibDevicePath = CudaInstallPath + "/nvvm/libdevice";		CudaLibDevicePath = CudaInstallPath + "/nvvm/libdevice";
CudaLibPath =		CudaLibPath =
CudaInstallPath + (TargetTriple.isArch64Bit() ? "/lib64" : "/lib");		CudaInstallPath + (TargetTriple.isArch64Bit() ? "/lib64" : "/lib");

if (!(D.getVFS().exists(CudaIncludePath) &&		if (!(D.getVFS().exists(CudaIncludePath) &&
D.getVFS().exists(CudaLibPath) &&		D.getVFS().exists(CudaBinPath) && D.getVFS().exists(CudaLibPath) &&
D.getVFS().exists(CudaLibDevicePath)))		D.getVFS().exists(CudaLibDevicePath)))
continue;		continue;

std::error_code EC;		std::error_code EC;
for (llvm::sys::fs::directory_iterator LI(CudaLibDevicePath, EC), LE;		for (llvm::sys::fs::directory_iterator LI(CudaLibDevicePath, EC), LE;
!EC && LI != LE; LI = LI.increment(EC)) {		!EC && LI != LE; LI = LI.increment(EC)) {
StringRef FilePath = LI->path();		StringRef FilePath = LI->path();
StringRef FileName = llvm::sys::path::filename(FilePath);		StringRef FileName = llvm::sys::path::filename(FilePath);
▲ Show 20 Lines • Show All 2,507 Lines • ▼ Show 20 Lines
Tool *DragonFly::buildAssembler() const {		Tool *DragonFly::buildAssembler() const {
return new tools::dragonfly::Assembler(*this);		return new tools::dragonfly::Assembler(*this);
}		}

Tool *DragonFly::buildLinker() const {		Tool *DragonFly::buildLinker() const {
return new tools::dragonfly::Linker(*this);		return new tools::dragonfly::Linker(*this);
}		}

/// Stub for CUDA toolchain. At the moment we don't have assembler or		/// CUDA toolchain. Our assembler is ptxas, and our "linker" is fatbinary,
/// linker and need toolchain mainly to propagate device-side options		/// which isn't properly a linker but nonetheless performs the step of stitching
/// to CC1.		/// together object files from the assembler into a single blob.

CudaToolChain::CudaToolChain(const Driver &D, const llvm::Triple &Triple,		CudaToolChain::CudaToolChain(const Driver &D, const llvm::Triple &Triple,
const ArgList &Args)		const ArgList &Args)
: Linux(D, Triple, Args) {}		: Linux(D, Triple, Args) {
		if (CudaInstallation.isValid())
		tra (inline comment, Not Done): Unneeded {} here and in a few more places throughout the patch.
		getProgramPaths().push_back(CudaInstallation.getBinPath());
		}

void		void
CudaToolChain::addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,		CudaToolChain::addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const {		llvm::opt::ArgStringList &CC1Args) const {
Linux::addClangTargetOptions(DriverArgs, CC1Args);		Linux::addClangTargetOptions(DriverArgs, CC1Args);
CC1Args.push_back("-fcuda-is-device");		CC1Args.push_back("-fcuda-is-device");

if (DriverArgs.hasArg(options::OPT_nocudalib))		if (DriverArgs.hasArg(options::OPT_nocudalib))
Show All 17 Lines
CudaToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,		CudaToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
const char *BoundArch) const {		const char *BoundArch) const {
DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());		DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
const OptTable &Opts = getDriver().getOpts();		const OptTable &Opts = getDriver().getOpts();

for (Arg *A : Args) {		for (Arg *A : Args) {
if (A->getOption().matches(options::OPT_Xarch__)) {		if (A->getOption().matches(options::OPT_Xarch__)) {
// Skip this argument unless the architecture matches BoundArch		// Skip this argument unless the architecture matches BoundArch
if (A->getValue(0) != StringRef(BoundArch))		if (!BoundArch || A->getValue(0) != StringRef(BoundArch))
		tra (inline comment, Done): You may as well move it out of the loop and return early if BoundArch is nullptr.
		jlebar (author, inline comment, Not Done): Tried this and discovered it's actually subtly different in a way that, thankfully, affects one of the tests I added. The "if (!BoundArch) continue" applies only if A->getOption().matches(options::OPT_Xarch__), so we can't hoist it into an early return.
continue;		continue;

unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(1));		unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(1));
unsigned Prev = Index;		unsigned Prev = Index;
std::unique_ptr<Arg> XarchArg(Opts.ParseOneArg(Args, Index));		std::unique_ptr<Arg> XarchArg(Opts.ParseOneArg(Args, Index));

// If the argument parsing failed or more than one argument was		// If the argument parsing failed or more than one argument was
// consumed, the -Xarch_ argument's parameter tried to consume		// consumed, the -Xarch_ argument's parameter tried to consume
Show All 14 Lines	if (A->getOption().matches(options::OPT_Xarch__)) {
}		}
XarchArg->setBaseArg(A);		XarchArg->setBaseArg(A);
A = XarchArg.release();		A = XarchArg.release();
DAL->AddSynthesizedArg(A);		DAL->AddSynthesizedArg(A);
}		}
DAL->append(A);		DAL->append(A);
}		}

		if (BoundArch)
DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), BoundArch);		DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), BoundArch);
		echristo (inline comment, Done): Nit: No braces around single lines.
return DAL;		return DAL;
}		}
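
To illustrate jlebar's point about why the null check cannot be hoisted, here is a minimal sketch using a simplified model of the argument list. FakeArg, translatePerArg and translateEarlyReturn are hypothetical names invented for this illustration and are not part of clang; the real code operates on llvm::opt::Arg and DerivedArgList.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for a driver argument; NOT clang's Arg class.
struct FakeArg {
  bool IsXarch;      // true for an -Xarch_<arch> <value> style argument
  std::string Arch;  // only meaningful when IsXarch is true
  std::string Value; // the argument that would be forwarded
};

// Same shape as the loop in the patch: the null check guards only the
// -Xarch_ arguments, so every other argument is always forwarded.
std::vector<std::string> translatePerArg(const std::vector<FakeArg> &Args,
                                         const char *BoundArch) {
  std::vector<std::string> Out;
  for (const FakeArg &A : Args) {
    if (A.IsXarch && (!BoundArch || A.Arch != BoundArch))
      continue; // skip -Xarch_ args that don't match the bound arch
    Out.push_back(A.Value);
  }
  return Out;
}

// The hoisted variant: it bails out before looking at any argument, so
// ordinary (non -Xarch_) arguments are lost when BoundArch is null.
std::vector<std::string> translateEarlyReturn(const std::vector<FakeArg> &Args,
                                              const char *BoundArch) {
  if (!BoundArch)
    return {};
  return translatePerArg(Args, BoundArch);
}
```

With a null BoundArch and the inputs {-O2, -Xarch_sm_35 -DFOO}, the per-argument version still forwards -O2, while the early-return version forwards nothing. That is the behavioural difference the added test picks up.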

		Tool *CudaToolChain::buildAssembler() const {
		return new tools::NVPTX::Assembler(*this);
		}

		Tool *CudaToolChain::buildLinker() const {
		return new tools::NVPTX::Linker(*this);
		}

/// XCore tool chain		/// XCore tool chain
XCoreToolChain::XCoreToolChain(const Driver &D, const llvm::Triple &Triple,		XCoreToolChain::XCoreToolChain(const Driver &D, const llvm::Triple &Triple,
const ArgList &Args)		const ArgList &Args)
: ToolChain(D, Triple, Args) {		: ToolChain(D, Triple, Args) {
// ProgramPaths are found via 'PATH' environment variable.		// ProgramPaths are found via 'PATH' environment variable.
}		}

Tool *XCoreToolChain::buildAssembler() const {		Tool *XCoreToolChain::buildAssembler() const {
▲ Show 20 Lines • Show All 261 Lines • Show Last 20 Lines

lib/Driver/Tools.h

Show First 20 Lines • Show All 897 Lines • ▼ Show 20 Lines	public:
void ConstructJob(Compilation &C, const JobAction &JA,		void ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output,		const InputInfo &Output,
const InputInfoList &Inputs,		const InputInfoList &Inputs,
const llvm::opt::ArgList &TCArgs,		const llvm::opt::ArgList &TCArgs,
const char *LinkingOutput) const override;		const char *LinkingOutput) const override;
};		};
} // end namespace PS4cpu		} // end namespace PS4cpu

		namespace NVPTX {

		// Run ptxas, the NVPTX assembler.
		class LLVM_LIBRARY_VISIBILITY Assembler : public Tool {
		public:
		Assembler(const ToolChain &TC)
		: Tool("NVPTX::Assembler", "ptxas", TC, RF_Full, llvm::sys::WEM_UTF8,
		"--options-file") {}

		bool hasIntegratedCPP() const override { return false; }

		void ConstructJob(Compilation &C, const JobAction &JA,
		const InputInfo &Output, const InputInfoList &Inputs,
		const llvm::opt::ArgList &TCArgs,
		const char *LinkingOutput) const override;
		};

		// Runs fatbinary, which combines GPU object files ("cubin" files) and/or PTX
		tra (inline comment, Done): Please add more details about what fatbin does. ".. which combines GPU object files and, optionally, PTX assembly into a single output file."
		// assembly into a single output file.
		class LLVM_LIBRARY_VISIBILITY Linker : public Tool {
		public:
		Linker(const ToolChain &TC)
		: Tool("NVPTX::Linker", "fatbinary", TC, RF_Full, llvm::sys::WEM_UTF8,
		"--options-file") {}

		bool hasIntegratedCPP() const override { return false; }

		void ConstructJob(Compilation &C, const JobAction &JA,
		const InputInfo &Output, const InputInfoList &Inputs,
		const llvm::opt::ArgList &TCArgs,
		const char *LinkingOutput) const override;
		};

		} // end namespace NVPTX

} // end namespace tools		} // end namespace tools
} // end namespace driver		} // end namespace driver
} // end namespace clang		} // end namespace clang

#endif // LLVM_CLANG_LIB_DRIVER_TOOLS_H		#endif // LLVM_CLANG_LIB_DRIVER_TOOLS_H

lib/Driver/Tools.cpp

Show First 20 Lines • Show All 10,594 Lines • ▼ Show 20 Lines	void PS4cpu::Link::ConstructJob(Compilation &C, const JobAction &JA,
else		else
PS4Linker = !Args.hasArg(options::OPT_shared);		PS4Linker = !Args.hasArg(options::OPT_shared);

if (PS4Linker)		if (PS4Linker)
ConstructPS4LinkJob(*this, C, JA, Output, Inputs, Args, LinkingOutput);		ConstructPS4LinkJob(*this, C, JA, Output, Inputs, Args, LinkingOutput);
else		else
ConstructGoldLinkJob(*this, C, JA, Output, Inputs, Args, LinkingOutput);		ConstructGoldLinkJob(*this, C, JA, Output, Inputs, Args, LinkingOutput);
}		}

		void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
		const InputInfo &Output,
		const InputInfoList &Inputs,
		const ArgList &Args,
		const char *LinkingOutput) const {
		const auto &TC =
		static_cast<const toolchains::CudaToolChain &>(getToolChain());
		assert(TC.getArch() == llvm::Triple::nvptx ||
		TC.getArch() == llvm::Triple::nvptx64);

		std::vector<std::string> gpu_archs =
		Args.getAllArgValues(options::OPT_march_EQ);
		assert(gpu_archs.size() == 1 && "Exactly one GPU Arch required for ptxas.");
		const std::string& gpu_arch = gpu_archs[0];


		ArgStringList CmdArgs;
		CmdArgs.push_back(TC.getTriple().isArch64Bit() ? "-m64" : "-m32");

		// Clang's default optimization level is -O0, but ptxas's default is -O3.
		CmdArgs.push_back(Args.MakeArgString(
		llvm::Twine("-O") +
		tra (inline comment, Done): CmdArgs.push_back(TC.getTriple().isArch64Bit() ? "-m64" : "-m32"); or, even, ArgStringList CmdArgs = {TC.getTriple().isArch64Bit() ? "-m64" : "-m32"}; Same in Linker::ConstructJob below.
		Args.getLastArgValue(options::OPT_O_Group, "0").data()));

		// Don't bother passing -g to ptxas: It's enabled by default at -O0, and
		// not supported at other optimization levels.

		CmdArgs.push_back("--gpu-name");
		CmdArgs.push_back(Args.MakeArgString(gpu_arch));
		CmdArgs.push_back("--output-file");
		CmdArgs.push_back(Args.MakeArgString(Output.getFilename()));
		for (const auto& II : Inputs)
		CmdArgs.push_back(Args.MakeArgString(II.getFilename()));

		for (const auto& A : Args.getAllArgValues(options::OPT_Xcuda_ptxas))
		CmdArgs.push_back(Args.MakeArgString(A));

		const char *Exec = Args.MakeArgString(TC.GetProgramPath("ptxas"));
		C.addCommand(llvm::make_unique<Command>(JA, *this, Exec, CmdArgs, Inputs));
		}
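
As a rough illustration of the command line this function assembles, the sketch below rebuilds the same argument order with plain standard-library types. buildPtxasArgs is a hypothetical helper written for this note, not clang API; the flag spellings (-m64/-m32, -O, --gpu-name, --output-file) are the ones pushed in the patch above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for NVPTX::Assembler::ConstructJob: returns the
// ptxas arguments in the same order the patch pushes them onto CmdArgs.
std::vector<std::string> buildPtxasArgs(bool Is64Bit,
                                        const std::string &OptLevel,
                                        const std::string &GpuArch,
                                        const std::string &Output,
                                        const std::vector<std::string> &Inputs,
                                        const std::vector<std::string> &Extra) {
  assert(!GpuArch.empty() && "Exactly one GPU arch required for ptxas.");
  std::vector<std::string> Args;
  Args.push_back(Is64Bit ? "-m64" : "-m32");
  // Clang defaults to -O0 but ptxas defaults to -O3, so -O is always passed;
  // an empty OptLevel models "no -O flag given" and falls back to "0".
  Args.push_back("-O" + (OptLevel.empty() ? std::string("0") : OptLevel));
  Args.push_back("--gpu-name");
  Args.push_back(GpuArch);
  Args.push_back("--output-file");
  Args.push_back(Output);
  // Input PTX files come next, followed by any -Xcuda-ptxas extras.
  Args.insert(Args.end(), Inputs.begin(), Inputs.end());
  Args.insert(Args.end(), Extra.begin(), Extra.end());
  return Args;
}
```

Calling buildPtxasArgs(true, "", "sm_35", "out.cubin", {"in.ptx"}, {"-foo1"}) yields the sequence -m64 -O0 --gpu-name sm_35 --output-file out.cubin in.ptx -foo1, which mirrors what the OPT0, SM35 and PTXAS-EXTRA checks in cuda-external-tools.cu look for.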

		// All inputs to this linker must be from CudaDeviceActions, as we need to look
		// at the Inputs' Actions in order to figure out which GPU architecture they
		// correspond to.
		void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
		const InputInfo &Output,
		const InputInfoList &Inputs,
		const ArgList &Args,
		const char *LinkingOutput) const {
		const auto &TC =
		static_cast<const toolchains::CudaToolChain &>(getToolChain());
		assert(TC.getArch() == llvm::Triple::nvptx ||
		TC.getArch() == llvm::Triple::nvptx64);

		ArgStringList CmdArgs;
		CmdArgs.push_back("--cuda");
		CmdArgs.push_back(TC.getTriple().isArch64Bit() ? "-64" : "-32");
		CmdArgs.push_back(Args.MakeArgString("--create"));
		CmdArgs.push_back(Args.MakeArgString(Output.getFilename()));

		for (const auto& II : Inputs) {
		auto* A = cast<const CudaDeviceAction>(II.getAction());
		// We need to pass an Arch of the form "sm_XX" for cubin files and
		// "compute_XX" for ptx.
		const char *Arch = (II.getType() == types::TY_PP_Asm)
		? A->getComputeArchName()
		: A->getGpuArchName();
		CmdArgs.push_back(Args.MakeArgString(llvm::Twine("--image=profile=") +
		Arch + ",file=" + II.getFilename()));
		}

		for (const auto& A : Args.getAllArgValues(options::OPT_Xcuda_fatbinary))
		CmdArgs.push_back(Args.MakeArgString(A));

		const char *Exec = Args.MakeArgString(TC.GetProgramPath("fatbinary"));
		tra (inline comment, Done): First line does not parse. In general compute_XX does not necessarily match sm_XX. The ptxas options documentation lists the allowed values for this option as compute_20, compute_30, compute_35, compute_50, compute_52; and sm_20, sm_21, sm_30, sm_32, sm_35, sm_50 and sm_52. Note that there's no compute_21 or compute_32. You'll need an sm_XX -> compute_YY map.
		jlebar (author, inline comment, Not Done): Oh goodness, how awful. Thank you for catching that. Fixed.
		C.addCommand(llvm::make_unique<Command>(JA, *this, Exec, CmdArgs, Inputs));
		}
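
The sm_XX -> compute_YY map tra asks for could look roughly like the sketch below. This is not the code that landed: computeArchForGpuArch and fatbinaryImageArg are hypothetical names, only the values quoted in the review comment are covered, and rounding sm_21 and sm_32 down to the nearest listed compute_YY is an assumption made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map a real GPU arch ("sm_XX") to the virtual arch
// ("compute_YY") to use for its PTX. Covers only the values quoted in the
// review; returns an empty string for anything else.
std::string computeArchForGpuArch(const std::string &GpuArch) {
  assert(!GpuArch.empty() && "expected an sm_XX name");
  // ASSUMPTION: archs without a same-numbered compute_YY round down.
  if (GpuArch == "sm_20" || GpuArch == "sm_21") return "compute_20";
  if (GpuArch == "sm_30" || GpuArch == "sm_32") return "compute_30";
  if (GpuArch == "sm_35") return "compute_35";
  if (GpuArch == "sm_50") return "compute_50";
  if (GpuArch == "sm_52") return "compute_52";
  return "";
}

// Builds the --image argument in the form the patch passes to fatbinary:
// profile is sm_XX for cubin inputs and compute_YY for PTX inputs.
std::string fatbinaryImageArg(const std::string &Profile,
                              const std::string &File) {
  return "--image=profile=" + Profile + ",file=" + File;
}
```

For example, a PTX input built for sm_21 would be registered as fatbinaryImageArg(computeArchForGpuArch("sm_21"), "a.ptx"), giving --image=profile=compute_20,file=a.ptx rather than the invalid compute_21.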

lib/Driver/Types.cpp

Show First 20 Lines • Show All 226 Lines • ▼ Show 20 Lines	if (Id != TY_Object) {

if (onlyPrecompileType(Id)) {		if (onlyPrecompileType(Id)) {
P.push_back(phases::Precompile);		P.push_back(phases::Precompile);
} else {		} else {
if (!onlyAssembleType(Id)) {		if (!onlyAssembleType(Id)) {
P.push_back(phases::Compile);		P.push_back(phases::Compile);
P.push_back(phases::Backend);		P.push_back(phases::Backend);
}		}
if (Id != TY_CUDA_DEVICE)
P.push_back(phases::Assemble);		P.push_back(phases::Assemble);
}		}
}		}

if (!onlyPrecompileType(Id) && Id != TY_CUDA_DEVICE) {		if (!onlyPrecompileType(Id) && Id != TY_CUDA_DEVICE) {
P.push_back(phases::Link);		P.push_back(phases::Link);
}		}
assert(0 < P.size() && "Not enough phases in list");		assert(0 < P.size() && "Not enough phases in list");
assert(P.size() <= phases::MaxNumberOfPhases && "Too many phases in list");		assert(P.size() <= phases::MaxNumberOfPhases && "Too many phases in list");
Show All 18 Lines

test/Driver/Inputs/CUDA/usr/local/cuda/bin/.keep

This file was added.

This is an empty file.

test/Driver/cuda-external-tools.cu

This file was added.

				// Tests that ptxas and fatbinary are correctly invoked during CUDA compilation.
				//
				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// REQUIRES: nvptx-registered-target

				// Regular compile with -O2.
				// RUN: %clang -### -target x86_64-linux-gnu -O2 -c %s 2>&1 \
				// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT2 %s

				// Regular compile without -O. This should result in us passing -O0 to ptxas.
				// RUN: %clang -### -target x86_64-linux-gnu -c %s 2>&1 \
				// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT0 %s

				// Regular compile targeting sm_35.
				// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_35 -c %s 2>&1 \
				// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM35 %s

				// 32-bit compile.
				// RUN: %clang -### -target x86_32-linux-gnu -c %s 2>&1 \
				// RUN: | FileCheck -check-prefix ARCH32 -check-prefix SM20 %s

				// Compile with -fintegrated-as. This should still cause us to invoke ptxas.
				// RUN: %clang -### -target x86_64-linux-gnu -fintegrated-as -c %s 2>&1 \
				// RUN: | FileCheck -check-prefix ARCH64 -check-prefix SM20 -check-prefix OPT0 %s

				// Check -Xcuda-ptxas and -Xcuda-fatbinary
				// RUN: %clang -### -target x86_64-linux-gnu -c -Xcuda-ptxas -foo1 \
				// RUN: -Xcuda-fatbinary -bar1 -Xcuda-ptxas -foo2 -Xcuda-fatbinary -bar2 %s 2>&1 \
				// RUN: | FileCheck -check-prefix SM20 -check-prefix PTXAS-EXTRA \
				// RUN: -check-prefix FATBINARY-EXTRA %s

				// Match clang job that produces PTX assembly.
				// CHECK: "-cc1" "-triple" "nvptx64-nvidia-cuda"
				// SM20: "-target-cpu" "sm_20"
				// SM35: "-target-cpu" "sm_35"
				// SM20: "-o" "[[PTXFILE:[^"]*]]"
				// SM35: "-o" "[[PTXFILE:[^"]*]]"

				// Match the call to ptxas (which assembles PTX to SASS).
				// CHECK:ptxas
				// ARCH64: "-m64"
				// ARCH32: "-m32"
				// OPT0: "-O0"
				// OPT2: "-O2"
				// SM20: "--gpu-name" "sm_20"
				// SM35: "--gpu-name" "sm_35"
				// SM20: "--output-file" "[[CUBINFILE:[^"]*]]"
				// SM35: "--output-file" "[[CUBINFILE:[^"]*]]"
				// PTXAS-EXTRA: "-foo1"
				// PTXAS-EXTRA-SAME: "-foo2"
				// CHECK-SAME: "[[PTXFILE]]"

				// Match the call to fatbinary (which combines all our PTX and SASS into one
				// blob).
				// CHECK:fatbinary
				// CHECK-DAG: "--cuda"
				// ARCH64-DAG: "-64"
				// ARCH32-DAG: "-32"
				// CHECK-DAG: "--create" "[[FATBINARY:[^"]*]]"
				// SM20-DAG: "--image=profile=compute_20,file=[[PTXFILE]]"
				// SM35-DAG: "--image=profile=compute_35,file=[[PTXFILE]]"
				// SM20-DAG: "--image=profile=sm_20,file=[[CUBINFILE]]"
				// SM35-DAG: "--image=profile=sm_35,file=[[CUBINFILE]]"
				// FATBINARY-EXTRA: "-bar1"
				// FATBINARY-EXTRA-SAME: "-bar2"

				// Match the clang job for host compilation.
				// CHECK: "-cc1" "-triple" "x86_64--linux-gnu"
				// CHECK-SAME: "-fcuda-include-gpubinary" "[[FATBINARY]]"

test/Driver/cuda-options.cu

	Show All 33 Lines
	// RUN: -check-prefix NOHOST -check-prefix NOLINK %s			// RUN: -check-prefix NOHOST -check-prefix NOLINK %s

	// Same test as above, but with preceeding --cuda-host-only to make sure only			// Same test as above, but with preceeding --cuda-host-only to make sure only
	// the last option has an effect.			// the last option has an effect.
	// RUN: %clang -### -target x86_64-linux-gnu --cuda-host-only --cuda-device-only %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu --cuda-host-only --cuda-device-only %s 2>&1 \
	// RUN: | FileCheck -check-prefix DEVICE -check-prefix DEVICE-NOSAVE \			// RUN: | FileCheck -check-prefix DEVICE -check-prefix DEVICE-NOSAVE \
	// RUN: -check-prefix NOHOST -check-prefix NOLINK %s			// RUN: -check-prefix NOHOST -check-prefix NOLINK %s

	// Verify that with -S we compile host and device sides to assembly and
	// incorporate device code into the host side.
	// RUN: %clang -### -target x86_64-linux-gnu -S -c %s 2>&1 \
	// RUN: | FileCheck -check-prefix DEVICE -check-prefix DEVICE-NOSAVE \
	// RUN: -check-prefix HOST -check-prefix INCLUDES-DEVICE \
	// RUN: -check-prefix NOLINK %s

	// Verify that --cuda-gpu-arch option passes the correct GPU archtecture to			// Verify that --cuda-gpu-arch option passes the correct GPU archtecture to
	// device compilation.			// device compilation.
	// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_35 -c %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_35 -c %s 2>&1 \
	// RUN: | FileCheck -check-prefix DEVICE -check-prefix DEVICE-NOSAVE \			// RUN: | FileCheck -check-prefix DEVICE -check-prefix DEVICE-NOSAVE \
	// RUN: -check-prefix DEVICE-SM35 -check-prefix HOST \			// RUN: -check-prefix DEVICE-SM35 -check-prefix HOST \
	// RUN: -check-prefix INCLUDES-DEVICE -check-prefix NOLINK %s			// RUN: -check-prefix INCLUDES-DEVICE -check-prefix NOLINK %s

	// Verify that there is one device-side compilation per --cuda-gpu-arch args			// Verify that there is one device-side compilation per --cuda-gpu-arch args
	// and that all results are included on the host side.			// and that all results are included on the host side.
	// RUN: %clang -### -target x86_64-linux-gnu \			// RUN: %clang -### -target x86_64-linux-gnu \
	// RUN: --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_30 -c %s 2>&1 \			// RUN: --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_30 -c %s 2>&1 \
	// RUN: | FileCheck -check-prefix DEVICE -check-prefix DEVICE-NOSAVE \			// RUN: | FileCheck -check-prefix DEVICE -check-prefix DEVICE-NOSAVE \
	// RUN: -check-prefix DEVICE2 -check-prefix DEVICE-SM35 \			// RUN: -check-prefix DEVICE2 -check-prefix DEVICE-SM35 \
	// RUN: -check-prefix DEVICE2-SM30 -check-prefix HOST \			// RUN: -check-prefix DEVICE2-SM30 -check-prefix HOST \
	// RUN: -check-prefix HOST-NOSAVE -check-prefix INCLUDES-DEVICE \			// RUN: -check-prefix HOST-NOSAVE -check-prefix INCLUDES-DEVICE \
	// RUN: -check-prefix INCLUDES-DEVICE2 -check-prefix NOLINK %s			// RUN: -check-prefix NOLINK %s

	// Verify that device-side results are passed to the correct tool when			// Verify that device-side results are passed to the correct tool when
	// -save-temps is used.			// -save-temps is used.
	// RUN: %clang -### -target x86_64-linux-gnu -save-temps -c %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu -save-temps -c %s 2>&1 \
	// RUN: | FileCheck -check-prefix DEVICE -check-prefix DEVICE-SAVE \			// RUN: | FileCheck -check-prefix DEVICE -check-prefix DEVICE-SAVE \
	// RUN: -check-prefix HOST -check-prefix HOST-SAVE -check-prefix NOLINK %s			// RUN: -check-prefix HOST -check-prefix HOST-SAVE -check-prefix NOLINK %s

	// Verify that device-side results are passed to the correct tool when			// Verify that device-side results are passed to the correct tool when
	Show All 14 Lines
	// DEVICE-SAVE-SAME: "-fcuda-is-device"			// DEVICE-SAVE-SAME: "-fcuda-is-device"
	// DEVICE-SAVE-SAME: "-x" "cuda-cpp-output"			// DEVICE-SAVE-SAME: "-x" "cuda-cpp-output"

	// Match the job that produces PTX assembly.			// Match the job that produces PTX assembly.
	// DEVICE: "-cc1" "-triple" "nvptx64-nvidia-cuda"			// DEVICE: "-cc1" "-triple" "nvptx64-nvidia-cuda"
	// DEVICE-NOSAVE-SAME: "-aux-triple" "x86_64--linux-gnu"			// DEVICE-NOSAVE-SAME: "-aux-triple" "x86_64--linux-gnu"
	// DEVICE-SAME: "-fcuda-is-device"			// DEVICE-SAME: "-fcuda-is-device"
	// DEVICE-SM35-SAME: "-target-cpu" "sm_35"			// DEVICE-SM35-SAME: "-target-cpu" "sm_35"
	// DEVICE-SAME: "-o" "[[GPUBINARY1:[^"]*]]"			// DEVICE-SAME: "-o" "[[PTXFILE:[^"]*]]"
	// DEVICE-NOSAVE-SAME: "-x" "cuda"			// DEVICE-NOSAVE-SAME: "-x" "cuda"
	// DEVICE-SAVE-SAME: "-x" "ir"			// DEVICE-SAVE-SAME: "-x" "ir"

				// Match the call to ptxas (which assembles PTX to SASS).
				// DEVICE:ptxas
				// DEVICE-SM35-DAG: "--gpu-name" "sm_35"
				// DEVICE-DAG: "--output-file" "[[CUBINFILE:[^"]*]]"
				// DEVICE-DAG: "[[PTXFILE]]"

	// Match another device-side compilation.			// Match another device-side compilation.
	// DEVICE2: "-cc1" "-triple" "nvptx64-nvidia-cuda"			// DEVICE2: "-cc1" "-triple" "nvptx64-nvidia-cuda"
	// DEVICE2-SAME: "-aux-triple" "x86_64--linux-gnu"			// DEVICE2-SAME: "-aux-triple" "x86_64--linux-gnu"
	// DEVICE2-SAME: "-fcuda-is-device"			// DEVICE2-SAME: "-fcuda-is-device"
	// DEVICE2-SM30-SAME: "-target-cpu" "sm_30"			// DEVICE2-SM30-SAME: "-target-cpu" "sm_30"
	// DEVICE2-SAME: "-o" "[[GPUBINARY2:[^"]*]]"			// DEVICE2-SAME: "-o" "[[GPUBINARY2:[^"]*]]"
	// DEVICE2-SAME: "-x" "cuda"			// DEVICE2-SAME: "-x" "cuda"

	// Match no device-side compilation.			// Match no device-side compilation.
	// NODEVICE-NOT: "-cc1" "-triple" "nvptx64-nvidia-cuda"			// NODEVICE-NOT: "-cc1" "-triple" "nvptx64-nvidia-cuda"
	// NODEVICE-SAME-NOT: "-fcuda-is-device"			// NODEVICE-SAME-NOT: "-fcuda-is-device"

				// INCLUDES-DEVICE:fatbinary
				// INCLUDES-DEVICE-DAG: "--create" "[[FATBINARY:[^"]*]]"
				// INCLUDES-DEVICE-DAG: "--image=profile=sm_{{[0-9]+}},file=[[CUBINFILE]]"
				// INCLUDES-DEVICE-DAG: "--image=profile=compute_{{[0-9]+}},file=[[PTXFILE]]"

	// Match host-side preprocessor job with -save-temps.			// Match host-side preprocessor job with -save-temps.
	// HOST-SAVE: "-cc1" "-triple" "x86_64--linux-gnu"			// HOST-SAVE: "-cc1" "-triple" "x86_64--linux-gnu"
	// HOST-SAVE-SAME: "-aux-triple" "nvptx64-nvidia-cuda"			// HOST-SAVE-SAME: "-aux-triple" "nvptx64-nvidia-cuda"
	// HOST-SAVE-SAME-NOT: "-fcuda-is-device"			// HOST-SAVE-SAME-NOT: "-fcuda-is-device"
	// HOST-SAVE-SAME: "-x" "cuda"			// HOST-SAVE-SAME: "-x" "cuda"

	// Match host-side compilation.			// Match host-side compilation.
	// HOST: "-cc1" "-triple" "x86_64--linux-gnu"			// HOST: "-cc1" "-triple" "x86_64--linux-gnu"
	// HOST-SAME: "-aux-triple" "nvptx64-nvidia-cuda"			// HOST-SAME: "-aux-triple" "nvptx64-nvidia-cuda"
	// HOST-SAME-NOT: "-fcuda-is-device"			// HOST-SAME-NOT: "-fcuda-is-device"
	// HOST-SAME: "-o" "[[HOSTOUTPUT:[^"]*]]"			// HOST-SAME: "-o" "[[HOSTOUTPUT:[^"]*]]"
	// HOST-NOSAVE-SAME: "-x" "cuda"			// HOST-NOSAVE-SAME: "-x" "cuda"
	// HOST-SAVE-SAME: "-x" "cuda-cpp-output"			// HOST-SAVE-SAME: "-x" "cuda-cpp-output"
	// INCLUDES-DEVICE-SAME: "-fcuda-include-gpubinary" "[[GPUBINARY1]]"			// INCLUDES-DEVICE-SAME: "-fcuda-include-gpubinary" "[[FATBINARY]]"
	// INCLUDES-DEVICE2-SAME: "-fcuda-include-gpubinary" "[[GPUBINARY2]]"

	// Match external assembler that uses compilation output.			// Match external assembler that uses compilation output.
	// HOST-AS: "-o" "{{.*}}.o" "[[HOSTOUTPUT]]"			// HOST-AS: "-o" "{{.*}}.o" "[[HOSTOUTPUT]]"

	// Match no GPU code inclusion.			// Match no GPU code inclusion.
	// NOINCLUDES-DEVICE-NOT: "-fcuda-include-gpubinary"			// NOINCLUDES-DEVICE-NOT: "-fcuda-include-gpubinary"

	// Match no host compilation.			// Match no host compilation.
	Show All 9 Lines

Revision Contents

Diff 44584

include/clang/Driver/Action.h

include/clang/Driver/Options.td

include/clang/Driver/ToolChain.h

include/clang/Driver/Types.def

lib/CodeGen/CGCUDANV.cpp

lib/Driver/Action.cpp

lib/Driver/Driver.cpp

lib/Driver/ToolChains.h

lib/Driver/ToolChains.cpp

lib/Driver/Tools.h

lib/Driver/Tools.cpp

lib/Driver/Types.cpp

test/Driver/Inputs/CUDA/usr/local/cuda/bin/.keep

test/Driver/cuda-external-tools.cu

test/Driver/cuda-options.cu
