This is an archive of the discontinued LLVM Phabricator instance.

End-to-end CUDA compilation.
AbandonedPublic

Authored by tra on Mar 19 2015, 2:28 PM.

Details

Reviewers

eliben
echristo

Summary

The changes implement an end-to-end CUDA compilation pipeline in the driver (i.e., a single clang invocation produces a usable host object file that incorporates the GPU code), along with the runtime support necessary to initialize the GPU code.

Launch device-side compilation(s):
- Added '--cuda-gpu-arch=<sm_XX>' option. The architecture defaults to sm_20.
- For each GPU architecture, launch cc1 with -fcuda-is-device -target-cpu <GPU>.
- Internally, each device-side compilation action is wrapped in CudaDeviceAction(GPU), which selects the appropriate toolchain based on the GPU and then proceeds with construction of the compilation pipeline.
- Added --cuda-host-only and --cuda-device-only options to skip the device-side or host-side part of the compilation, respectively.
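
As a rough illustration of the device-side steps listed above, the following C++ sketch builds one cc1 argument list per requested GPU architecture. It is not the driver code from this patch: CudaDriverOptions and buildDeviceCc1Args are hypothetical names, and only the flags named in the summary (-fcuda-is-device, -target-cpu, the sm_20 default, --cuda-host-only) are taken from the review.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the CUDA-related driver settings; the real
// driver reads these from its parsed argument list.
struct CudaDriverOptions {
  std::vector<std::string> GpuArchs; // values of --cuda-gpu-arch=<sm_XX>
  bool HostOnly = false;             // --cuda-host-only
};

// Returns one cc1 argument list per device-side compilation.
std::vector<std::vector<std::string>>
buildDeviceCc1Args(const CudaDriverOptions &Opts, const std::string &Input) {
  assert(!Input.empty() && "expected an input file name");
  std::vector<std::vector<std::string>> Jobs;
  if (Opts.HostOnly)
    return Jobs; // the device side is skipped entirely

  std::vector<std::string> Archs = Opts.GpuArchs;
  if (Archs.empty())
    Archs.push_back("sm_20"); // default named in the summary

  // One device-side compilation per GPU architecture.
  for (const std::string &Arch : Archs)
    Jobs.push_back({"-cc1", "-fcuda-is-device", "-target-cpu", Arch, Input});
  return Jobs;
}
```

Calling buildDeviceCc1Args with no architectures yields a single job targeting sm_20; passing two architectures yields two jobs, one per GPU.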

Incorporate GPU code generated by device-side into the host object file:
- added "-fcuda-include-gpubinary <FILE>" option to specify file with GPU code to incorporate.
- internally host-side compilation action is wrapped in CudaHostAction(input.cu, [list of files produced by device-side compilation]). When driver builds jobs for CudaHostAction, host compilation jobs are constructed normally. At the end each device-side output is passed to the host-side compilation by adding "-fcuda-include-gpubinary <device-side-output.s>" options.
- CGCUDARuntime class was extended to provide API for per-module constructor/destructor creation.
- CGNVCUDARuntime: implemented ModuleCtorFunction and ModuleDtorFunction to generate the initialization code required for cudart-style kernel launches to work.
- ModuleCtorFunction():
  - creates .cuda_register_functions(fatbin_handle) function which calls __cudaRegisterFunction(...) for each kernel emitted with EmitDeviceStub().
  - creates and returns .cuda_module_ctor() function: For each -fcuda-include-gpubinary:
    - creates a constant string with contents of the file specified.
    - creates an initialized __fatBinC_Wrapper_t struct which points to the string.
    - generates call to __cudaRegisterFatBinary(&wrapper_struct) and stores returned handle in a variable.
    - generates a call to .cuda_register_functions(handle)
      
      NOTE: Even though we're calling __cudaRegisterFatBinary(), which would imply that it expects GPU code to be encapsulated in NVIDIA's proprietary 'FatBinary' format, we're actually passing the GPU code as a NUL-terminated string with PTX assembly in it. Alas, the fatbin format is not documented. Fortunately, the low-level driver API for loading GPU code accepts cubin/fatbin/NUL-terminated string formats, and cudart seems to pass the string to the driver, so we can skip fatbin altogether.
- ModuleDtorFunction(): creates and returns .cuda_module_dtor() function which generates a call to __cudaUnregisterFatBinary(saved_handle) for each GPU code blob initialized in ModuleCtorFunction().
- CodeGenModule.cpp: during host-side CUDA compilation, calls CUDARuntime->ModuleCtorFunction()/ModuleDtorFunction() and adds the returned values to the global constructor/destructor lists.
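
The call sequence that the generated module constructor and destructor are described as performing can be sketched in plain C++. This is only a simulation under stated assumptions: the mock* functions stand in for __cudaRegisterFatBinary, __cudaRegisterFunction and __cudaUnregisterFatBinary and merely record calls, MockFatbinWrapper models only the pointer to the GPU code (the real wrapper layout is not given in this review), and ModuleState is a hypothetical container.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for __fatBinC_Wrapper_t: only the pointer to the GPU
// code (a NUL-terminated PTX string) is modelled.
struct MockFatbinWrapper {
  const char *Data;
};

// Call log standing in for the CUDA runtime.
static std::vector<std::string> CallLog;
static int NextHandle = 1;

// Mock of __cudaRegisterFatBinary: returns an opaque handle.
int mockRegisterFatBinary(const MockFatbinWrapper *W) {
  assert(W && W->Data && "wrapper must point at GPU code");
  CallLog.push_back("register_fatbin");
  return NextHandle++;
}

// Mock of __cudaRegisterFunction (real argument list omitted).
void mockRegisterFunction(int Handle, const std::string &Kernel) {
  CallLog.push_back("register_function " + std::to_string(Handle) + " " +
                    Kernel);
}

// Mock of __cudaUnregisterFatBinary.
void mockUnregisterFatBinary(int Handle) {
  CallLog.push_back("unregister_fatbin " + std::to_string(Handle));
}

// Hypothetical per-module state.
struct ModuleState {
  std::vector<std::string> GpuBinaries;    // contents of each included file
  std::vector<std::string> EmittedKernels; // kernels with device stubs
  std::vector<int> Handles;                // handles saved for the dtor
};

// Mirrors the register-functions helper: one call per emitted kernel.
void registerFunctions(const ModuleState &M, int Handle) {
  for (const std::string &K : M.EmittedKernels)
    mockRegisterFunction(Handle, K);
}

// Mirrors the module constructor: register each blob, then its kernels.
void moduleCtor(ModuleState &M) {
  for (const std::string &Blob : M.GpuBinaries) {
    MockFatbinWrapper W{Blob.c_str()};
    int Handle = mockRegisterFatBinary(&W);
    M.Handles.push_back(Handle);
    registerFunctions(M, Handle);
  }
}

// Mirrors the module destructor: unregister every saved handle.
void moduleDtor(ModuleState &M) {
  for (int Handle : M.Handles)
    mockUnregisterFatBinary(Handle);
  M.Handles.clear();
}
```

With two GPU binaries and two kernels, moduleCtor logs two fatbin registrations and four function registrations, and moduleDtor logs two unregistrations.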

Added test case to verify CUDA pipeline construction in the driver.

Diff Detail

Event Timeline

tra updated this revision to Diff 22301. Mar 19 2015, 2:28 PM

tra retitled this revision from to Preliminary driver changes to build and stitch together host and device-side CUDA compilation pipelines..

tra updated this object.

tra edited the test plan for this revision.

tra added reviewers: eliben, echristo.

tra added a subscriber: Unknown Object (MLST).

Herald added a subscriber: klimek. Mar 19 2015, 2:28 PM

Were we going to abandon this one in favor of the write up you're doing or
is this something else?

-eric

yaron.keren added a subscriber: yaron.keren. Mar 23 2015, 11:47 AM

Tzafrir added a subscriber: Tzafrir. Mar 23 2015, 11:25 PM

The changes no longer depend on external ptxwrap tool.
CUDA runtime support in CGCUDANV.cpp now generates per-module constructor/destructor to load and initialize GPU code.

tra retitled this revision from "Preliminary driver changes to build and stitch together host and device-side CUDA compilation pipelines." to "End-to-end CUDA compilation." Apr 2 2015, 2:05 PM

tra updated this object.

Updated description -- added detailed description for the changes.

Thanks for the updated description.

Here's an initial round of comments. I'm leaving the driver parts mostly to the driver experts.

lib/CodeGen/CGCUDANV.cpp
109	Use C++11 {...} initialization?
166	If this is pseudocode example, second level of // comments is superfluous
182	const?
184	Can you document the C signature of the called function somewhere for clarity?
186	leftovers?
208	same here re second-level //
240	Is the 4 in [4] needed?
lib/CodeGen/CGCUDARuntime.h
39	It would really be great not to have data inside this abstract interface; is this necessary? Note that "fatbin handles" sounds very NVIDIA CUDA runtime specific, though this interface is allegedly generic :)
46	Please document these APIs

eliben added inline comments. Apr 3 2015, 2:30 PM

include/clang/Driver/Action.h
140	Can you give an example in this comment? like sm_30, etc.
144	IIRC _[A-Z] names are discouraged, and against the style anyway
include/clang/Driver/CC1Options.td
605	I'm wondering about the "gpucode" mnemonic :-) It's unusual and kinda ambiguous. What does gpucode mean here? PTX? Maybe PTX can be more explicit then? PTX is probably not too specific since this flag begins with "cuda_" so it's already about the CUDA/PTX flow. [this applies to other uses of "gpucode" too]
include/clang/Driver/Options.td
462	Is it possible to make these flags positive, with false-by-default values?
1079	What is this for?
include/clang/Frontend/CodeGenOptions.h
164	s/Files/Blobs/ or "strings"? And as above, maybe PTX would be better than "GpuCode"
lib/CodeGen/CGCUDANV.cpp
50	Put doc comments for the new functions/methods you're adding, and preferably for the data fields as well, unless they're completely obvious
95	Do you really need Zeros as a member? You only use it once. Also, if you just declare it you can use the nice C++11 {...} initializer in the place of use, making the code even shorter.

Addressed eliben@'s review comments.

tra added inline comments. Apr 6 2015, 11:04 AM

include/clang/Driver/Action.h
140	Done
144	Done.
include/clang/Driver/CC1Options.td
605	It's actually an opaque blob. clang does not care what's in the file as it just passes the bits to cudart which passes it to the driver. The driver can digest PTX (which we pass in this case), but it will as happily accept GPU code packed in fatbin or cubin formats. If/when we grow ability to compile device-side to SASS, we would just do "-cuda-include-gpucode gpu-code-packed-in.cubin" and it should work with no other changes on the host side. So, 'gpucode' was the best approximation I could come up with that would keep "GPU code in any shape or form as long as it's PTX/fatbin or cubin". I'd be happy to change it. Suggestions?
include/clang/Driver/Options.td
462	Sure. Changed the options to -fcuda-host-only/-fcuda-device-only
1079	I added it for (partial) compatibility with nvcc. I've removed it for now, as drop-in nvcc compatibility is not the purpose of this patch.
include/clang/Frontend/CodeGenOptions.h
164	It's a vector of strings containing names of files that contain GPU code blobs, whatever their format may be. I'll rename the variable to CudaGpuCodeFileNames and will update the comment to reflect that. How about this?
/// A list of file names passed with -cuda-include-gpucode options to forward
/// to CUDA runtime back-end for incorporating them into host-side object
/// file.
std::vector<std::string> CudaGpuCodeFileNames;
lib/CodeGen/CGCUDANV.cpp
50	Moved some fields out of the class and into local variables where they are used. Documented the rest.
95	Done. Also moved number of other things with single use down to where they are used.
109	OK.
166	The idea I wanted to convey is that I'm not really generating the loop, but rather generating a call for each kernel, in effect unrolling the loop. I've changed the pseudocode to a linear sequence of calls, which is what those functions really generate.
182	Nope. CreateBitCast wants a non-const Function:
../../../tools/clang/lib/CodeGen/CGCUDANV.cpp:198:31: error: cannot initialize a parameter of type 'llvm::Value *' with an lvalue of type 'const llvm::Function *'
Builder.CreateBitCast(Kernel, VoidPtrTy), // kernel stub addr
184	I've moved CreateRuntimeFunction(...,"__cudaRegisterFunction") along with its signature in the comments into makeRegisterKernelsFn, so it should be visible close to where it's used.
186	Yes. Removed.
208	Fixed.
240	Not really. Removed.
lib/CodeGen/CGCUDARuntime.h
39	The list of generated kernels is something that I expect to be useful for all subclasses of CUDARuntime. That's why I've put EmittedKernels there, along with a non-virtual method EmitDeviceStub() to populate it. FatbinHandles, on the other hand, is indeed cudart-specific. I've moved it into CGCUDANV.
46	Done.
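
The forwarding described in the reply on CodeGenOptions.h (file names collected from the command line and handed to the CUDA runtime back-end) might look roughly like the sketch below. It is an assumption-laden miniature, not clang's option parser: MiniCodeGenOptions and parseGpuBinaryArgs are hypothetical, the field name is the one proposed in the reply, and the flag spelling is the final -fcuda-include-gpubinary from the summary, which takes its value as a separate argument.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature of CodeGenOptions; only the field discussed in the
// review is modelled.
struct MiniCodeGenOptions {
  /// File names passed with -fcuda-include-gpubinary, in command-line order.
  std::vector<std::string> CudaGpuCodeFileNames;
};

// Scans a cc1-style argument list and records every GPU binary file name.
// Returns false if the flag appears as the last argument with no value.
bool parseGpuBinaryArgs(const std::vector<std::string> &Args,
                        MiniCodeGenOptions &Opts) {
  const std::string Flag = "-fcuda-include-gpubinary";
  for (std::size_t I = 0; I < Args.size(); ++I) {
    if (Args[I] != Flag)
      continue;
    if (I + 1 == Args.size())
      return false; // the separate value is missing
    Opts.CudaGpuCodeFileNames.push_back(Args[++I]);
  }
  return true;
}
```

Because the flag can be repeated, the vector ends up with one entry per device-side output, in the order the driver added them.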

A couple of replies to comments; will do another pass on the new revision

include/clang/Driver/CC1Options.td
605	I see - some generic mnemonic is needed, I agree (so PTX is not a good idea). But "--gpu-code" is a nvcc flag that means something completely different :-/ So "gpu code" here may still be confusing. Maybe "gpublob" or "gpuobject" or "gpubinary" or something like that. I can't think of a perfect solution right now. I'll leave it to your discretion.
include/clang/Frontend/CodeGenOptions.h
164	Yeah, if this is for file names, it's a good idea to have "FileNames" in the name
lib/CodeGen/CGCUDARuntime.h
39	I would still remove EmittedKernels for now; we only have a single CUDA runtime upstream at this time, so this feels redundant, and it makes the runtime interface/implementation barrier less clean than it should be. In the future, if/when new runtime implementations are added, we'll figure out the best way to factor the common code out. YAGNI, essentially :)
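
The separation being asked for here, an abstract interface with no data and a single implementation that owns the kernel list, can be illustrated with a small C++ sketch. The class names MiniCudaRuntime and MiniNVCudaRuntime and their methods are hypothetical reductions for illustration, not the actual CGCUDARuntime API.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily reduced model of a runtime interface: pure virtual
// functions only, no data members.
class MiniCudaRuntime {
public:
  virtual ~MiniCudaRuntime() = default;
  // Called once per kernel for which a device stub is emitted.
  virtual void emitDeviceStub(const std::string &KernelName) = 0;
  // Describes the registrations a module constructor would perform.
  virtual std::vector<std::string> moduleCtorCalls() const = 0;
};

// The only implementation keeps the bookkeeping to itself.
class MiniNVCudaRuntime final : public MiniCudaRuntime {
  std::vector<std::string> EmittedKernels; // implementation detail

public:
  void emitDeviceStub(const std::string &KernelName) override {
    EmittedKernels.push_back(KernelName);
  }

  std::vector<std::string> moduleCtorCalls() const override {
    // One registration per kernel, i.e. a linear sequence rather than a loop
    // in the generated code.
    std::vector<std::string> Calls;
    for (const std::string &K : EmittedKernels)
      Calls.push_back("register " + K);
    return Calls;
  }
};
```

If a second runtime implementation ever appeared, the shared bookkeeping could be factored out at that point without touching callers, since they only see the interface.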

tra added inline comments. Apr 6 2015, 3:25 PM

include/clang/Driver/CC1Options.td
605	gpubinary wins.
lib/CodeGen/CGCUDARuntime.h
39	OK.

Where are the tests for emitting ctors/dtors, registering kernels, etc?

include/clang/Driver/CC1Options.td
605	Should we prefix all cuda-related flags with -f for consistency with the existing ones? Don't know if it makes sense given that the cl_ ones above (for example) have no -f, but at least the CUDA ones should be consistent among themselves
lib/CodeGen/CGCUDANV.cpp
38	s/VMContext/Context/
41	Document FatbinHandles
94	extra line
181	Can you include the parameter names in this declaration? It would be much easier to follow. I believe this comes from host_runtime.h?
190	Please document what BlobHandlePtr means here and how it's used
252	Use a named constant for the magic number -- it will then document itself
253	I'd go for a constant here as well. These can be class level, probably.
260	Comment explaining why
lib/CodeGen/CGCUDARuntime.h
48	I'd move this to the implementation as well, along with EmittedKernels. Just reading the documentation of this method makes little sense given that it lives in an abstract interface. The code will be easier to untangle if the interface stays completely functionality-free. At this time this won't even add code duplication since we just have a single implementation.
lib/Driver/Driver.cpp
1235	you can just return new CudaHostAction... here, no?
1520	Remove
lib/Driver/Tools.cpp
2590	Can you explain a bit more why/what this means in the comment?

I'm still working on the changes to address your comments, so "done" means "done but not submitted yet". I'll finish remaining bits and will update the patch tomorrow (Tue).

include/clang/Driver/CC1Options.td
605	Just had a chat with chandlerc@ and echristo@ on the subject. Consensus appears to be that options related to driver behavior should be --cuda-something[=value] and options passed down to cc1 -fcuda-something[=value]. I'll rename the options I've added accordingly.
lib/CodeGen/CGCUDANV.cpp
38	Done.
41	Done.
94	Removed.
252	It would be overkill IMO. There's nothing more informative I could add to the comment than that it's a magic number.
260	That's what nvcc does. I don't know whether there's a good reason for it. Removing it does not seem to break loading of GPU binary, so I'll remove explicit alignment.
lib/CodeGen/CGCUDARuntime.h
48	Done that already while I was moving EmittedKernels out.
lib/Driver/Driver.cpp
1235	Done.
1520	That was not intended to be committed. Will fix shortly.
lib/Driver/Tools.cpp
2590	The general assumption is that a compilation deals with a single source file. When we're compiling CUDA, the driver may generate additional build passes, and we may end up with an action that has more than one input action. The check makes sure that all those inputs result from compiling the same source file. Hmm. That's another case where I need info about the source file type. Let me see if I can add a function to dig that out of the action chain; then this loop will not be necessary, as I can explicitly check whether we're compiling a CUDA file.
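
The check described for Tools.cpp, that every input of an action ultimately comes from the same source file, could be sketched as a walk over the action chain. MiniAction is a hypothetical simplification of the driver's Action class (a leaf carries a file name, inner nodes carry inputs), and allInputsShareOneSource is an illustrative helper rather than the code in the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplification of a driver action: leaves name a source file,
// inner nodes only reference their input actions.
struct MiniAction {
  std::string InputFile;                  // set only for leaf actions
  std::vector<const MiniAction *> Inputs; // empty for leaf actions
};

// Collects the source file names reachable from A, depth first.
void collectSources(const MiniAction &A, std::vector<std::string> &Out) {
  if (A.Inputs.empty()) {
    assert(!A.InputFile.empty() && "leaf action must name a file");
    Out.push_back(A.InputFile);
    return;
  }
  for (const MiniAction *In : A.Inputs)
    collectSources(*In, Out);
}

// True if every leaf below A refers to the same source file.
bool allInputsShareOneSource(const MiniAction &A) {
  std::vector<std::string> Sources;
  collectSources(A, Sources);
  for (const std::string &S : Sources)
    if (S != Sources.front())
      return false;
  return !Sources.empty();
}
```

A host action whose inputs are the source file itself and a device-side action over that same file passes the check; an action mixing two different sources does not.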

Round #2 of clean-ups to address eliben@'s comments

Renamed new options to be more consistent.
Added more comments, fixed formatting errors.

tra updated this object. Apr 7 2015, 1:45 PM

Added test case for IR generation for module constructor/destructor.

non-driver parts LGTM

Hi Art,

Starting to look pretty good here. I've got a few inline nits and a couple of small requests, but I think we're almost ready to go here. Sorry for the delays.

-eric

lib/CodeGen/CGCUDANV.cpp
164	"with the CUDA runtime".
166	The function name begins with a .? Ugh.
197–207	clang-format?
lib/Driver/Driver.cpp
183–186	"and partial CUDA compilations only run up"
1194	Some comment on the default here.
1672–1674	Do you need the declaration up here? Why not just pull the static function up if so?
1732	Probably would prefer "DeviceTriple" here.
lib/Driver/Tools.cpp
2583–2588	Comment about what's going on here.
2696	Might be nice to pull this sort of change out so it isn't affecting the rest of the diff.
5741–5747	Please pull this out into a separate patch.
test/CodeGenCUDA/device-stub.cu
40–46 ↗	(On Diff #23372)	Should some of these be CHECK-NEXT?

Addressed most of echristo@'s comments.
I will split out the parts Eric suggested and runtime glue code generation into separate sets of changes.

tra added inline comments. May 5 2015, 1:24 PM

lib/CodeGen/CGCUDANV.cpp
164	Done.
166	Replaced with __
197–207	Done. I've also replaced last argument with a plain NullPtr.
lib/Driver/Driver.cpp
183–186	Fixed.
1194	Done.
1672–1674	That would clutter the changes for no good reason. Whenever a bunch of code is moved from one place to another, it's always a pain to figure out whether things were just copied or copied and changed. A forward declaration is a lesser crime, IMO.
1732	Done.
lib/Driver/Tools.cpp
2583–2588	Done.
2696	Sure. I was also thinking of splitting code generation into a separate commit as well as it's largely independent of the driver changes.
test/CodeGenCUDA/device-stub.cu
40–46 ↗	(On Diff #23372)	Some. Changed to CHECK-NEXT where it was possible.

Fixed error checking in createInvocationFromCommandLine() so it can deal with multiple jobs created during cuda compilation.
Added a test case to make sure external tools can parse cuda files.

tra mentioned this in D9507: [cuda] Include GPU binary into host object file and generate init/deinit code. May 5 2015, 3:46 PM

tra mentioned this in D9509: [cuda] Driver changes to build and stitch together host and device-side CUDA code. May 5 2015, 3:58 PM

tra updated this revision to Diff 25077. May 6 2015, 11:52 AM

tra updated this object.

This comment was removed by tra.

Ignore diff 25077 which was unintentionally added to this review.
This review has been split into D9509, D9507 and D9506

mkuron added a subscriber: mkuron. Sep 2 2015, 5:06 AM

Revision Contents

Path	Size
include/clang/Driver/	35 lines, 2 lines, 3 lines, 6 lines, 3 lines, 1 line
include/clang/Frontend/CodeGenOptions.h	5 lines
lib/CodeGen/	220 lines, 18 lines, 2 lines, 7 lines
lib/Driver/	21 lines, 205 lines, 2 lines, 12 lines, 56 lines, 2 lines, 46 lines, 21 lines
lib/Frontend/CompilerInvocation.cpp	3 lines
test/Driver/cuda-options.cu	108 lines
test/Index/attributes-cuda.cu	4 lines
tools/libclang/CIndex.cpp	5 lines
unittests/ASTMatchers/ASTMatchersTest.h	1 line

Diff 23366

include/clang/Driver/Action.h

[... 35 lines not shown ...]
 public:
   typedef ActionList::size_type size_type;
   typedef ActionList::iterator iterator;
   typedef ActionList::const_iterator const_iterator;

   enum ActionClass {
     InputClass = 0,
     BindArchClass,
+    CudaDeviceClass,
+    CudaHostClass,
     PreprocessJobClass,
     PrecompileJobClass,
     AnalyzeJobClass,
     MigrateJobClass,
     CompileJobClass,
     BackendJobClass,
     AssembleJobClass,
     LinkJobClass,
[... 76 lines not shown ...] public:

   const char *getArchName() const { return ArchName; }

   static bool classof(const Action *A) {
     return A->getKind() == BindArchClass;
   }
 };

+class CudaDeviceAction : public Action {
+  virtual void anchor();
+  /// GPU architecture to bind -- e.g sm_35
+  const char *GpuArchName;
+  bool AtTopLevel;
+
+public:
+  CudaDeviceAction(std::unique_ptr<Action> Input, const char *ArchName,
+                   bool AtTopLevel);
+
+  const char *getGpuArchName() const { return GpuArchName; }
+  bool isAtTopLevel() const { return AtTopLevel; }
+
+  static bool classof(const Action *A) {
+    return A->getKind() == CudaDeviceClass;
+  }
+};
+
+class CudaHostAction : public Action {
+  virtual void anchor();
+  ActionList DeviceActions;
+
+public:
+  CudaHostAction(std::unique_ptr<Action> Input,
+                 const ActionList &DeviceActions);
+  ~CudaHostAction() override;
+
+  ActionList &getDeviceActions() { return DeviceActions; }
+  const ActionList &getDeviceActions() const { return DeviceActions; }
+
+  static bool classof(const Action *A) { return A->getKind() == CudaHostClass; }
+};
+
 class JobAction : public Action {
   virtual void anchor();
 protected:
   JobAction(ActionClass Kind, std::unique_ptr<Action> Input, types::ID Type);
   JobAction(ActionClass Kind, const ActionList &Inputs, types::ID Type);

 public:
   static bool classof(const Action *A) {
[... 139 lines not shown ...]

include/clang/Driver/CC1Options.td

[... 596 lines not shown ...]
 def cl_std_EQ : Joined<["-"], "cl-std=">,
   HelpText<"OpenCL language standard to compile for">;
 def cl_denorms_are_zero : Flag<["-"], "cl-denorms-are-zero">,
   HelpText<"OpenCL only. Allow denormals to be flushed to zero">;

 //===----------------------------------------------------------------------===//
 // CUDA Options
 //===----------------------------------------------------------------------===//

 def fcuda_is_device : Flag<["-"], "fcuda-is-device">,
   HelpText<"Generate code for CUDA device">;
 def fcuda_allow_host_calls_from_host_device : Flag<["-"],
     "fcuda-allow-host-calls-from-host-device">,
   HelpText<"Allow host device functions to call host functions">;
+def fcuda_include_gpubinary : Separate<["-"], "fcuda-include-gpubinary">,
+  HelpText<"Incorporate CUDA device-side binary into host object file.">;

 } // let Flags = [CC1Option]


 //===----------------------------------------------------------------------===//
 // cc1as-only Options
 //===----------------------------------------------------------------------===//

[... 23 lines not shown ...]

include/clang/Driver/Driver.h

[... 403 lines not shown ...] public:

   bool IsUsingLTO(const ToolChain &TC, const llvm::opt::ArgList &Args) const;

 private:
   /// \brief Retrieves a ToolChain for a particular target triple.
   ///
   /// Will cache ToolChains for the life of the driver object, and create them
   /// on-demand.
+  const ToolChain &getTargetToolChain(const llvm::opt::ArgList &Args,
+                                      llvm::Triple &Target) const;

   const ToolChain &getToolChain(const llvm::opt::ArgList &Args,
                                 StringRef DarwinArchName = "") const;

   /// @}

   /// \brief Get bitmasks for which option flags to include and exclude based on
   /// the driver mode.
   std::pair<unsigned, unsigned> getIncludeExcludeOptionFlagMasks() const;
[... 22 lines not shown ...]

include/clang/Driver/Options.td

	Show First 20 Lines • Show All 343 Lines • ▼ Show 20 Lines
	def coverage : Flag<["-", "--"], "coverage">;			def coverage : Flag<["-", "--"], "coverage">;
	def cpp_precomp : Flag<["-"], "cpp-precomp">, Group<clang_ignored_f_Group>;			def cpp_precomp : Flag<["-"], "cpp-precomp">, Group<clang_ignored_f_Group>;
	def current__version : JoinedOrSeparate<["-"], "current_version">;			def current__version : JoinedOrSeparate<["-"], "current_version">;
	def cxx_isystem : JoinedOrSeparate<["-"], "cxx-isystem">, Group<clang_i_Group>,			def cxx_isystem : JoinedOrSeparate<["-"], "cxx-isystem">, Group<clang_i_Group>,
	HelpText<"Add directory to the C++ SYSTEM include search path">, Flags<[CC1Option]>,			HelpText<"Add directory to the C++ SYSTEM include search path">, Flags<[CC1Option]>,
	MetaVarName<"<directory>">;			MetaVarName<"<directory>">;
	def c : Flag<["-"], "c">, Flags<[DriverOption]>,			def c : Flag<["-"], "c">, Flags<[DriverOption]>,
	HelpText<"Only run preprocess, compile, and assemble steps">;			HelpText<"Only run preprocess, compile, and assemble steps">;
				def cuda_device_only : Flag<["--"], "cuda-device-only">,
				HelpText<"Do device-side CUDA compilation only">;
				def cuda_gpu_arch_EQ : Joined<["--"], "cuda-gpu-arch=">,
				Flags<[DriverOption, HelpHidden]>, HelpText<"CUDA GPU architecture">;
				def cuda_host_only : Flag<["--"], "cuda-host-only">,
				HelpText<"Do host-side CUDA compilation only">;
	def dA : Flag<["-"], "dA">, Group<d_Group>;			def dA : Flag<["-"], "dA">, Group<d_Group>;
	def dD : Flag<["-"], "dD">, Group<d_Group>, Flags<[CC1Option]>,			def dD : Flag<["-"], "dD">, Group<d_Group>, Flags<[CC1Option]>,
	HelpText<"Print macro definitions in -E mode in addition to normal output">;			HelpText<"Print macro definitions in -E mode in addition to normal output">;
	def dM : Flag<["-"], "dM">, Group<d_Group>, Flags<[CC1Option]>,			def dM : Flag<["-"], "dM">, Group<d_Group>, Flags<[CC1Option]>,
	HelpText<"Print macro definitions in -E mode instead of normal output">;			HelpText<"Print macro definitions in -E mode instead of normal output">;
	def dead__strip : Flag<["-"], "dead_strip">;			def dead__strip : Flag<["-"], "dead_strip">;
	def dependency_file : Separate<["-"], "dependency-file">, Flags<[CC1Option]>,			def dependency_file : Separate<["-"], "dependency-file">, Flags<[CC1Option]>,
	HelpText<"Filename (or -) to write dependency output to">;			HelpText<"Filename (or -) to write dependency output to">;
	▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines
	def fconstant_cfstrings : Flag<["-"], "fconstant-cfstrings">, Group<f_Group>;			def fconstant_cfstrings : Flag<["-"], "fconstant-cfstrings">, Group<f_Group>;
	def fconstant_string_class_EQ : Joined<["-"], "fconstant-string-class=">, Group<f_Group>;			def fconstant_string_class_EQ : Joined<["-"], "fconstant-string-class=">, Group<f_Group>;
	def fconstexpr_depth_EQ : Joined<["-"], "fconstexpr-depth=">, Group<f_Group>;			def fconstexpr_depth_EQ : Joined<["-"], "fconstexpr-depth=">, Group<f_Group>;
	def fconstexpr_steps_EQ : Joined<["-"], "fconstexpr-steps=">, Group<f_Group>;			def fconstexpr_steps_EQ : Joined<["-"], "fconstexpr-steps=">, Group<f_Group>;
	def fconstexpr_backtrace_limit_EQ : Joined<["-"], "fconstexpr-backtrace-limit=">,			def fconstexpr_backtrace_limit_EQ : Joined<["-"], "fconstexpr-backtrace-limit=">,
	Group<f_Group>;			Group<f_Group>;
	def fno_crash_diagnostics : Flag<["-"], "fno-crash-diagnostics">, Group<f_clang_Group>, Flags<[NoArgumentUnused]>;			def fno_crash_diagnostics : Flag<["-"], "fno-crash-diagnostics">, Group<f_clang_Group>, Flags<[NoArgumentUnused]>;
	def fcreate_profile : Flag<["-"], "fcreate-profile">, Group<f_Group>;			def fcreate_profile : Flag<["-"], "fcreate-profile">, Group<f_Group>;
	def fcxx_exceptions: Flag<["-"], "fcxx-exceptions">, Group<f_Group>,			def fcxx_exceptions: Flag<["-"], "fcxx-exceptions">, Group<f_Group>,
				elibenUnsubmitted Not Done Reply Inline Actions Is it possible to make these flags positive, with false-by-default values? eliben: Is it possible to make these flags positive, with false-by-default values?
				traAuthorUnsubmitted Not Done Reply Inline Actions Sure. Changed the options to -fcuda-host-only/-fcuda-device-only tra: Sure. Changed the options to -fcuda-host-only/-fcuda-device-only
	HelpText<"Enable C++ exceptions">, Flags<[CC1Option]>;			HelpText<"Enable C++ exceptions">, Flags<[CC1Option]>;
	def fcxx_modules : Flag <["-"], "fcxx-modules">, Group<f_Group>,			def fcxx_modules : Flag <["-"], "fcxx-modules">, Group<f_Group>,
	Flags<[DriverOption]>;			Flags<[DriverOption]>;
	def fdebug_pass_arguments : Flag<["-"], "fdebug-pass-arguments">, Group<f_Group>;			def fdebug_pass_arguments : Flag<["-"], "fdebug-pass-arguments">, Group<f_Group>;
	def fdebug_pass_structure : Flag<["-"], "fdebug-pass-structure">, Group<f_Group>;			def fdebug_pass_structure : Flag<["-"], "fdebug-pass-structure">, Group<f_Group>;
	def fdiagnostics_fixit_info : Flag<["-"], "fdiagnostics-fixit-info">, Group<f_clang_Group>;			def fdiagnostics_fixit_info : Flag<["-"], "fdiagnostics-fixit-info">, Group<f_clang_Group>;
	def fdiagnostics_parseable_fixits : Flag<["-"], "fdiagnostics-parseable-fixits">, Group<f_clang_Group>,			def fdiagnostics_parseable_fixits : Flag<["-"], "fdiagnostics-parseable-fixits">, Group<f_clang_Group>,
	Flags<[CC1Option]>, HelpText<"Print fix-its in machine parseable form">;			Flags<[CC1Option]>, HelpText<"Print fix-its in machine parseable form">;
	▲ Show 20 Lines • Show All 600 Lines • ▼ Show 20 Lines
	def gcolumn_info : Flag<["-"], "gcolumn-info">, Group<g_flags_Group>;			def gcolumn_info : Flag<["-"], "gcolumn-info">, Group<g_flags_Group>;
	def gno_column_info : Flag<["-"], "gno-column-info">, Group<g_flags_Group>;			def gno_column_info : Flag<["-"], "gno-column-info">, Group<g_flags_Group>;
	def gsplit_dwarf : Flag<["-"], "gsplit-dwarf">, Group<g_flags_Group>;			def gsplit_dwarf : Flag<["-"], "gsplit-dwarf">, Group<g_flags_Group>;
	def ggnu_pubnames : Flag<["-"], "ggnu-pubnames">, Group<g_flags_Group>;			def ggnu_pubnames : Flag<["-"], "ggnu-pubnames">, Group<g_flags_Group>;
	def gdwarf_aranges : Flag<["-"], "gdwarf-aranges">, Group<g_flags_Group>;			def gdwarf_aranges : Flag<["-"], "gdwarf-aranges">, Group<g_flags_Group>;
	def headerpad__max__install__names : Joined<["-"], "headerpad_max_install_names">;			def headerpad__max__install__names : Joined<["-"], "headerpad_max_install_names">;
	def help : Flag<["-", "--"], "help">, Flags<[CC1Option,CC1AsOption]>,			def help : Flag<["-", "--"], "help">, Flags<[CC1Option,CC1AsOption]>,
	HelpText<"Display available options">;			HelpText<"Display available options">;
	def index_header_map : Flag<["-"], "index-header-map">, Flags<[CC1Option]>,			def index_header_map : Flag<["-"], "index-header-map">, Flags<[CC1Option]>,
				eliben: What is this for?
				tra: I've added it for (partial) compatibility with nvcc. I've removed it for now, as drop-in nvcc compatibility is not the purpose of this patch.
	HelpText<"Make the next included directory (-I or -F) an indexer header map">;			HelpText<"Make the next included directory (-I or -F) an indexer header map">;
	def idirafter : JoinedOrSeparate<["-"], "idirafter">, Group<clang_i_Group>, Flags<[CC1Option]>,			def idirafter : JoinedOrSeparate<["-"], "idirafter">, Group<clang_i_Group>, Flags<[CC1Option]>,
	HelpText<"Add directory to AFTER include search path">;			HelpText<"Add directory to AFTER include search path">;
	def iframework : JoinedOrSeparate<["-"], "iframework">, Group<clang_i_Group>, Flags<[CC1Option]>,			def iframework : JoinedOrSeparate<["-"], "iframework">, Group<clang_i_Group>, Flags<[CC1Option]>,
	HelpText<"Add directory to SYSTEM framework search path">;			HelpText<"Add directory to SYSTEM framework search path">;
	def imacros : JoinedOrSeparate<["-", "--"], "imacros">, Group<clang_i_Group>, Flags<[CC1Option]>,			def imacros : JoinedOrSeparate<["-", "--"], "imacros">, Group<clang_i_Group>, Flags<[CC1Option]>,
	HelpText<"Include macros from file before parsing">, MetaVarName<"<file>">;			HelpText<"Include macros from file before parsing">, MetaVarName<"<file>">;
	def image__base : Separate<["-"], "image_base">;			def image__base : Separate<["-"], "image_base">;

include/clang/Driver/Types.h

Show First 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	#undef TYPE
bool canLipoType(ID Id);		bool canLipoType(ID Id);

/// isAcceptedByClang - Can clang handle this input type.		/// isAcceptedByClang - Can clang handle this input type.
bool isAcceptedByClang(ID Id);		bool isAcceptedByClang(ID Id);

/// isCXX - Is this a "C++" input (C++ and Obj-C++ sources and headers).		/// isCXX - Is this a "C++" input (C++ and Obj-C++ sources and headers).
bool isCXX(ID Id);		bool isCXX(ID Id);

		/// isCuda - Is this a CUDA input.
		bool isCuda(ID Id);

/// isObjC - Is this an "ObjC" input (Obj-C and Obj-C++ sources and headers).		/// isObjC - Is this an "ObjC" input (Obj-C and Obj-C++ sources and headers).
bool isObjC(ID Id);		bool isObjC(ID Id);

/// lookupTypeForExtension - Lookup the type to use for the file		/// lookupTypeForExtension - Lookup the type to use for the file
/// extension \p Ext.		/// extension \p Ext.
ID lookupTypeForExtension(const char *Ext);		ID lookupTypeForExtension(const char *Ext);

/// lookupTypeForTypSpecifier - Lookup the type to use for a user		/// lookupTypeForTypSpecifier - Lookup the type to use for a user

include/clang/Driver/Types.def



	// C family source language (with and without preprocessing).			// C family source language (with and without preprocessing).
	TYPE("cpp-output", PP_C, INVALID, "i", "u")			TYPE("cpp-output", PP_C, INVALID, "i", "u")
	TYPE("c", C, PP_C, "c", "u")			TYPE("c", C, PP_C, "c", "u")
	TYPE("cl", CL, PP_C, "cl", "u")			TYPE("cl", CL, PP_C, "cl", "u")
	TYPE("cuda-cpp-output", PP_CUDA, INVALID, "cui", "u")			TYPE("cuda-cpp-output", PP_CUDA, INVALID, "cui", "u")
	TYPE("cuda", CUDA, PP_CUDA, "cu", "u")			TYPE("cuda", CUDA, PP_CUDA, "cu", "u")
				TYPE("cuda", CUDA_DEVICE, PP_CUDA, "cu", "")
	TYPE("objective-c-cpp-output", PP_ObjC, INVALID, "mi", "u")			TYPE("objective-c-cpp-output", PP_ObjC, INVALID, "mi", "u")
	TYPE("objc-cpp-output", PP_ObjC_Alias, INVALID, "mi", "u")			TYPE("objc-cpp-output", PP_ObjC_Alias, INVALID, "mi", "u")
	TYPE("objective-c", ObjC, PP_ObjC, "m", "u")			TYPE("objective-c", ObjC, PP_ObjC, "m", "u")
	TYPE("c++-cpp-output", PP_CXX, INVALID, "ii", "u")			TYPE("c++-cpp-output", PP_CXX, INVALID, "ii", "u")
	TYPE("c++", CXX, PP_CXX, "cpp", "u")			TYPE("c++", CXX, PP_CXX, "cpp", "u")
	TYPE("objective-c++-cpp-output", PP_ObjCXX, INVALID, "mii", "u")			TYPE("objective-c++-cpp-output", PP_ObjCXX, INVALID, "mii", "u")
	TYPE("objc++-cpp-output", PP_ObjCXX_Alias, INVALID, "mii", "u")			TYPE("objc++-cpp-output", PP_ObjCXX_Alias, INVALID, "mii", "u")
	TYPE("objective-c++", ObjCXX, PP_ObjCXX, "mm", "u")			TYPE("objective-c++", ObjCXX, PP_ObjCXX, "mm", "u")

include/clang/Frontend/CodeGenOptions.h

Show First 20 Lines • Show All 154 Lines • ▼ Show 20 Lines	public:
std::vector<std::string> DependentLibraries;		std::vector<std::string> DependentLibraries;

/// Name of the profile file to use with -fprofile-sample-use.		/// Name of the profile file to use with -fprofile-sample-use.
std::string SampleProfileFile;		std::string SampleProfileFile;

/// Name of the profile file to use as input for -fprofile-instr-use		/// Name of the profile file to use as input for -fprofile-instr-use
std::string InstrProfileInput;		std::string InstrProfileInput;

		/// A list of file names passed with -fcuda-include-gpubinary options to
		/// forward to CUDA runtime back-end for incorporating them into host-side
		eliben: s/Files/Blobs/ or "strings"? And as above, maybe PTX would be better than "GpuCode"
		tra: It's a vector of strings containing names of files that contain GPU code blobs, whatever their format may be. I'll rename the variable to CudaGpuCodeFileNames and will update the comment to reflect that. How about this?
		    /// A list of file names passed with -cuda-include-gpucode options to forward
		    /// to CUDA runtime back-end for incorporating them into host-side object
		    /// file.
		    std::vector<std::string> CudaGpuCodeFileNames;
		eliben: Yeah, if this is for file names, it's a good idea to have "FileNames" in the name
		/// object file.
		std::vector<std::string> CudaGpuBinaryFileNames;

/// Regular expression to select optimizations for which we should enable		/// Regular expression to select optimizations for which we should enable
/// optimization remarks. Transformation passes whose name matches this		/// optimization remarks. Transformation passes whose name matches this
/// expression (and support this feature), will emit a diagnostic		/// expression (and support this feature), will emit a diagnostic
/// whenever they perform a transformation. This is enabled by the		/// whenever they perform a transformation. This is enabled by the
/// -Rpass=regexp flag.		/// -Rpass=regexp flag.
std::shared_ptr<llvm::Regex> OptimizationRemarkPattern;		std::shared_ptr<llvm::Regex> OptimizationRemarkPattern;

/// Regular expression to select optimizations for which we should enable		/// Regular expression to select optimizations for which we should enable

lib/CodeGen/CGCUDANV.cpp

#include "CGCUDARuntime.h"		#include "CGCUDARuntime.h"
#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
#include "CodeGenModule.h"		#include "CodeGenModule.h"
#include "clang/AST/Decl.h"		#include "clang/AST/Decl.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
		#include "llvm/IR/Verifier.h"
#include <vector>		#include <vector>

using namespace clang;		using namespace clang;
using namespace CodeGen;		using namespace CodeGen;

namespace {		namespace {

class CGNVCUDARuntime : public CGCUDARuntime {		class CGNVCUDARuntime : public CGCUDARuntime {

private:		private:
llvm::Type *IntTy, *SizeTy;		llvm::Type *IntTy, *SizeTy, *VoidTy;
llvm::PointerType *CharPtrTy, *VoidPtrTy;		llvm::PointerType *CharPtrTy, *VoidPtrTy, *VoidPtrPtrTy;

		/// Convenience reference to LLVM Context
		llvm::LLVMContext &Context;
		eliben: s/VMContext/Context/
		tra: Done.
		/// Convenience reference to the current module
		llvm::Module &TheModule;
		/// Keeps track of kernel launch stubs emitted in this module
		eliben: Document FatbinHandles
		tra: Done.
		llvm::SmallVector<llvm::Function *, 16> EmittedKernels;
		/// Keeps track of variables containing handles of GPU binaries Populated by
		/// ModuleCtorFunction() and used to create corresponding cleanup calls in
		/// ModuleDtorFunction()
		llvm::SmallVector<llvm::GlobalVariable *, 16> GpuBinaryHandles;

llvm::Constant *getSetupArgumentFn() const;		llvm::Constant *getSetupArgumentFn() const;
llvm::Constant *getLaunchFn() const;		llvm::Constant *getLaunchFn() const;

		eliben: Put doc comments for the new functions/methods you're adding, and preferably for the data fields as well, unless they're completely obvious
		tra: Moved some fields out of the class and into local variables where they are used. Documented the rest.
		/// Creates a function to register all kernel stubs generated in this module.
		llvm::Function *makeRegisterKernelsFn();

		/// Helper function that generates a constant string and returns a pointer to
		/// the start of the string. The result of this function can be used anywhere
		/// where the C code specifies const char*.
		llvm::Constant *MakeConstantString(const std::string &Str,
		const std::string &Name = "",
		unsigned Alignment = 0) {
		llvm::Constant *Zeros[] = {llvm::ConstantInt::get(SizeTy, 0),
		llvm::ConstantInt::get(SizeTy, 0)};
		llvm::Constant *ConstStr =
		CGM.GetAddrOfConstantCString(Str, Name.c_str(), Alignment);
		return llvm::ConstantExpr::getGetElementPtr(ConstStr, Zeros);
		}

		void EmitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList &Args);

public:		public:
CGNVCUDARuntime(CodeGenModule &CGM);		CGNVCUDARuntime(CodeGenModule &CGM);

void EmitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList &Args) override;		void EmitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) override;
		/// Creates module constructor function
		llvm::Function *ModuleCtorFunction() override;
		/// Creates module destructor function
		llvm::Function *ModuleDtorFunction() override;
};		};

}		}

CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM) : CGCUDARuntime(CGM) {		CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM)
		: CGCUDARuntime(CGM), Context(CGM.getLLVMContext()),
		TheModule(CGM.getModule()) {
CodeGen::CodeGenTypes &Types = CGM.getTypes();		CodeGen::CodeGenTypes &Types = CGM.getTypes();
ASTContext &Ctx = CGM.getContext();		ASTContext &Ctx = CGM.getContext();

IntTy = Types.ConvertType(Ctx.IntTy);		IntTy = Types.ConvertType(Ctx.IntTy);
SizeTy = Types.ConvertType(Ctx.getSizeType());		SizeTy = Types.ConvertType(Ctx.getSizeType());
		VoidTy = llvm::Type::getVoidTy(Context);

CharPtrTy = llvm::PointerType::getUnqual(Types.ConvertType(Ctx.CharTy));		CharPtrTy = llvm::PointerType::getUnqual(Types.ConvertType(Ctx.CharTy));
VoidPtrTy = cast<llvm::PointerType>(Types.ConvertType(Ctx.VoidPtrTy));		VoidPtrTy = cast<llvm::PointerType>(Types.ConvertType(Ctx.VoidPtrTy));
		VoidPtrPtrTy = VoidPtrTy->getPointerTo();
}		}
		eliben: extra line
		tra: Removed.

		eliben: Do you really need Zeros as a member? You only use it once. Also, if you just declare it you can use the nice C++11 {...} initializer in the place of use, making the code even shorter.
		tra: Done. Also moved a number of other things with a single use down to where they are used.
llvm::Constant *CGNVCUDARuntime::getSetupArgumentFn() const {		llvm::Constant *CGNVCUDARuntime::getSetupArgumentFn() const {
// cudaError_t cudaSetupArgument(void *, size_t, size_t)		// cudaError_t cudaSetupArgument(void *, size_t, size_t)
std::vector<llvm::Type*> Params;		std::vector<llvm::Type*> Params;
Params.push_back(VoidPtrTy);		Params.push_back(VoidPtrTy);
Params.push_back(SizeTy);		Params.push_back(SizeTy);
Params.push_back(SizeTy);		Params.push_back(SizeTy);
return CGM.CreateRuntimeFunction(llvm::FunctionType::get(IntTy,		return CGM.CreateRuntimeFunction(llvm::FunctionType::get(IntTy,
Params, false),		Params, false),
"cudaSetupArgument");		"cudaSetupArgument");
}		}

llvm::Constant *CGNVCUDARuntime::getLaunchFn() const {		llvm::Constant *CGNVCUDARuntime::getLaunchFn() const {
// cudaError_t cudaLaunch(char *)		// cudaError_t cudaLaunch(char *)
std::vector<llvm::Type*> Params;		return CGM.CreateRuntimeFunction(
		eliben: Use C++11 {...} initialization?
		tra: OK.
Params.push_back(CharPtrTy);		llvm::FunctionType::get(IntTy, CharPtrTy, false), "cudaLaunch");
return CGM.CreateRuntimeFunction(llvm::FunctionType::get(IntTy,		}
Params, false),
"cudaLaunch");		void CGNVCUDARuntime::EmitDeviceStub(CodeGenFunction &CGF,
		FunctionArgList &Args) {
		EmittedKernels.push_back(CGF.CurFn);
		EmitDeviceStubBody(CGF, Args);
}		}

void CGNVCUDARuntime::EmitDeviceStubBody(CodeGenFunction &CGF,		void CGNVCUDARuntime::EmitDeviceStubBody(CodeGenFunction &CGF,
FunctionArgList &Args) {		FunctionArgList &Args) {
// Build the argument value list and the argument stack struct type.		// Build the argument value list and the argument stack struct type.
SmallVector<llvm::Value *, 16> ArgValues;		SmallVector<llvm::Value *, 16> ArgValues;
std::vector<llvm::Type *> ArgTypes;		std::vector<llvm::Type *> ArgTypes;
for (FunctionArgList::const_iterator I = Args.begin(), E = Args.end();		for (FunctionArgList::const_iterator I = Args.begin(), E = Args.end();
I != E; ++I) {		I != E; ++I) {
llvm::Value *V = CGF.GetAddrOfLocalVar(*I);		llvm::Value *V = CGF.GetAddrOfLocalVar(*I);
ArgValues.push_back(V);		ArgValues.push_back(V);
assert(isa<llvm::PointerType>(V->getType()) && "Arg type not PointerType");		assert(isa<llvm::PointerType>(V->getType()) && "Arg type not PointerType");
ArgTypes.push_back(cast<llvm::PointerType>(V->getType())->getElementType());		ArgTypes.push_back(cast<llvm::PointerType>(V->getType())->getElementType());
}		}
llvm::StructType *ArgStackTy = llvm::StructType::get(		llvm::StructType *ArgStackTy = llvm::StructType::get(Context, ArgTypes);
CGF.getLLVMContext(), ArgTypes);

llvm::BasicBlock *EndBlock = CGF.createBasicBlock("setup.end");		llvm::BasicBlock *EndBlock = CGF.createBasicBlock("setup.end");

// Emit the calls to cudaSetupArgument		// Emit the calls to cudaSetupArgument
llvm::Constant *cudaSetupArgFn = getSetupArgumentFn();		llvm::Constant *cudaSetupArgFn = getSetupArgumentFn();
for (unsigned I = 0, E = Args.size(); I != E; ++I) {		for (unsigned I = 0, E = Args.size(); I != E; ++I) {
llvm::Value *Args[3];		llvm::Value *Args[3];
llvm::BasicBlock *NextBlock = CGF.createBasicBlock("setup.next");		llvm::BasicBlock *NextBlock = CGF.createBasicBlock("setup.next");
Show All 15 Lines	void CGNVCUDARuntime::EmitDeviceStubBody(CodeGenFunction &CGF,
llvm::Constant *cudaLaunchFn = getLaunchFn();		llvm::Constant *cudaLaunchFn = getLaunchFn();
llvm::Value *Arg = CGF.Builder.CreatePointerCast(CGF.CurFn, CharPtrTy);		llvm::Value *Arg = CGF.Builder.CreatePointerCast(CGF.CurFn, CharPtrTy);
CGF.EmitRuntimeCallOrInvoke(cudaLaunchFn, Arg);		CGF.EmitRuntimeCallOrInvoke(cudaLaunchFn, Arg);
CGF.EmitBranch(EndBlock);		CGF.EmitBranch(EndBlock);

CGF.EmitBlock(EndBlock);		CGF.EmitBlock(EndBlock);
}		}

		/// Creates internal function to register all kernel stubs generated in this
		/// module with CUDA runtime.
		echristo: "with the CUDA runtime".
		tra: Done.
		/// \code
		/// void .cuda_register_kernels(void** GpuBinaryHandle) {
		eliben: If this is pseudocode example, second level of // comments is superfluous
		tra: The idea I wanted to convey is that I'm not really generating the loop, but rather generate a call for each kernel, in effect unrolling the loop. I've changed the pseudocode to a linear sequence of calls, which is what those functions really generate.
		echristo: The function name begins with a .? Ugh.
		tra: Replaced with __
		/// __cudaRegisterFunction(GpuBinaryHandle,Kernel0,...);
		/// ...
		/// __cudaRegisterFunction(GpuBinaryHandle,KernelM,...);
		/// }
		/// \endcode
		llvm::Function *CGNVCUDARuntime::makeRegisterKernelsFn() {
		llvm::Function *RegisterKernelsFunc = llvm::Function::Create(
		llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),
		llvm::GlobalValue::InternalLinkage, ".cuda_register_kernels", &TheModule);
		llvm::BasicBlock *EntryBB =
		llvm::BasicBlock::Create(Context, "entry", RegisterKernelsFunc);
		CGBuilderTy Builder(Context);
		Builder.SetInsertPoint(EntryBB);

		// void __cudaRegisterFunction(void **, const char *, char *, const char *,
		eliben: Can you include the parameter names in this declaration? It would be much easier to follow. I believe this comes from host_runtime.h?
		// int, uint3*, uint3*, dim3*, dim3*, int*)
		eliben: const?
		tra: Nope. CreateBitCast wants non-const Function*: ../../../tools/clang/lib/CodeGen/CGCUDANV.cpp:198:31: error: cannot initialize a parameter of type 'llvm::Value *' with an lvalue of type 'const llvm::Function *' Builder.CreateBitCast(Kernel, VoidPtrTy), // kernel stub addr
		std::vector<llvm::Type *> RegisterFuncParams = {
		VoidPtrPtrTy, CharPtrTy, CharPtrTy, CharPtrTy, IntTy,
		eliben: Can you document the C signature of the called function somewhere for clarity?
		tra: I've moved CreateRuntimeFunction(...,"__cudaRegisterFunction") along with its signature in the comments into makeRegisterKernelsFn, so it should be visible close to where it's used.
		VoidPtrTy, VoidPtrTy, VoidPtrTy, VoidPtrTy, IntTy->getPointerTo()};
		llvm::Constant *RegisterFunc = CGM.CreateRuntimeFunction(
		eliben: leftovers?
		tra: Yes. Removed.
		llvm::FunctionType::get(IntTy, RegisterFuncParams, false),
		"__cudaRegisterFunction");

		// Extract GpuBinaryHandle passed as the first argument passed to
		eliben: Please document what BlobHandlePtr means here and how it's used
		// .cuda_register_kernels() and generate __cudaRegisterFunction() call for
		// each emitted kernel.
		llvm::Argument &GpuBinaryHandlePtr = *RegisterKernelsFunc->arg_begin();
		for (llvm::Function *Kernel : EmittedKernels) {
		llvm::Constant *KernelName = MakeConstantString(Kernel->getName());
		llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);
		llvm::Value *args[] = {
		&GpuBinaryHandlePtr,
		Builder.CreateBitCast(Kernel, VoidPtrTy),
		KernelName,
		KernelName,
		llvm::ConstantInt::get(IntTy, -1),
		NullPtr,
		NullPtr,
		NullPtr,
		NullPtr,
		llvm::ConstantPointerNull::get(IntTy->getPointerTo())};
		echristo: clang-format?
		tra: Done. I've also replaced the last argument with a plain NullPtr.
		Builder.CreateCall(RegisterFunc, args);
		eliben: same here re second-level //
		tra: Fixed.
		}

		Builder.CreateRetVoid();
		llvm::verifyFunction(*RegisterKernelsFunc);
		return RegisterKernelsFunc;
		}

		/// Creates a global constructor function for the module:
		/// \code
		/// void .cuda_module_ctor(void*) {
		/// Handle0 = __cudaRegisterFatBinary(GpuBinaryBlob0);
		/// .cuda_register_kernels(Handle0);
		/// ...
		/// HandleN = __cudaRegisterFatBinary(GpuBinaryBlobN);
		/// .cuda_register_kernels(HandleN);
		/// }
		/// \endcode
		llvm::Function *CGNVCUDARuntime::ModuleCtorFunction() {
		// void .cuda_register_kernels(void* handle);
		llvm::Function *RegisterKernelsFunc = makeRegisterKernelsFn();
		// void ** __cudaRegisterFatBinary(void *);
		llvm::Constant *RegisterFatbinFunc = CGM.CreateRuntimeFunction(
		llvm::FunctionType::get(VoidPtrPtrTy, VoidPtrTy, false),
		"__cudaRegisterFatBinary");
		// struct { int magic, int version, void * gpu_binary, void * dont_care };
		llvm::StructType *FatbinWrapperTy =
		llvm::StructType::get(IntTy, IntTy, VoidPtrTy, VoidPtrTy, nullptr);

		llvm::Function *ModuleCtorFunc = llvm::Function::Create(
		llvm::FunctionType::get(VoidTy, VoidPtrTy, false),
		llvm::GlobalValue::InternalLinkage, ".cuda_module_ctor", &TheModule);
		llvm::BasicBlock *CtorEntryBB =
		eliben: Is the 4 in [4] needed?
		tra: Not really. Removed.
		llvm::BasicBlock::Create(Context, "entry", ModuleCtorFunc);
		CGBuilderTy CtorBuilder(Context);

		CtorBuilder.SetInsertPoint(CtorEntryBB);

		for (const std::string &GpuBinaryFileName :
		CGM.getCodeGenOpts().CudaGpuBinaryFileNames) {
		llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> GpuBinaryOrErr =
		llvm::MemoryBuffer::getFileOrSTDIN(GpuBinaryFileName);
		if (std::error_code EC = GpuBinaryOrErr.getError()) {
		CGM.getDiags().Report(diag::err_cannot_open_file) << GpuBinaryFileName
		<< EC.message();
		eliben: Use a named constant for the magic number -- it will then document itself
		tra: It would be an overkill IMO. There's nothing more informative I could add to the comment that it's a magic number.
		continue;
		eliben: I'd go for a constant here as well. These can be class level, probably
		}

		// Create initialized wrapper structure that points to the loaded GPU binary
		llvm::Constant *Values[] = {
		llvm::ConstantInt::get(IntTy, 0x466243b1), // Fatbin wrapper magic.
		llvm::ConstantInt::get(IntTy, 1), // Fatbin version.
		MakeConstantString(GpuBinaryOrErr.get()->getBuffer(), "", 16), // Data.
		eliben: Comment explaining why
		tra: That's what nvcc does. I don't know whether there's a good reason for it. Removing it does not seem to break loading of GPU binary, so I'll remove explicit alignment.
		llvm::ConstantPointerNull::get(VoidPtrTy)}; // Unused in fatbin v1.
		llvm::GlobalVariable *FatbinWrapper = new llvm::GlobalVariable(
		TheModule, FatbinWrapperTy, true, llvm::GlobalValue::InternalLinkage,
		llvm::ConstantStruct::get(FatbinWrapperTy, Values),
		".cuda_fatbin_wrapper");

		// GpuBinaryHandle = __cudaRegisterFatBinary(&FatbinWrapper);
		llvm::CallInst *RegisterFatbinCall = CtorBuilder.CreateCall(
		RegisterFatbinFunc,
		CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy));
		llvm::GlobalVariable *GpuBinaryHandle = new llvm::GlobalVariable(
		TheModule, VoidPtrPtrTy, false, llvm::GlobalValue::InternalLinkage,
		llvm::ConstantPointerNull::get(VoidPtrPtrTy), ".cuda_gpubin_handle");
		CtorBuilder.CreateStore(RegisterFatbinCall, GpuBinaryHandle, false);

		// Call .cuda_register_kernels(GpuBinaryHandle);
		CtorBuilder.CreateCall(RegisterKernelsFunc, RegisterFatbinCall);

		// Save GpuBinaryHandle so we can unregister it in destructor.
		GpuBinaryHandles.push_back(GpuBinaryHandle);
		}

		CtorBuilder.CreateRetVoid();
		llvm::verifyFunction(*ModuleCtorFunc);
		return ModuleCtorFunc;
		}

		/// Creates a global destructor function that unregisters all GPU code blobs
		/// registered by constructor.
		/// \code
		/// void .cuda_module_dtor(void*) {
		/// __cudaUnregisterFatBinary(Handle0);
		/// ...
		/// __cudaUnregisterFatBinary(HandleN);
		/// }
		/// \endcode
		llvm::Function *CGNVCUDARuntime::ModuleDtorFunction() {
		// void __cudaUnregisterFatBinary(void ** handle);
		llvm::Constant *UnregisterFatbinFunc = CGM.CreateRuntimeFunction(
		llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),
		"__cudaUnregisterFatBinary");

		llvm::Function *ModuleDtorFunc = llvm::Function::Create(
		llvm::FunctionType::get(VoidTy, VoidPtrTy, false),
		llvm::GlobalValue::InternalLinkage, ".cuda_module_dtor", &TheModule);
		llvm::BasicBlock *DtorEntryBB =
		llvm::BasicBlock::Create(Context, "entry", ModuleDtorFunc);
		CGBuilderTy DtorBuilder(Context);
		DtorBuilder.SetInsertPoint(DtorEntryBB);

		for (llvm::GlobalVariable *GpuBinaryHandle : GpuBinaryHandles) {
		DtorBuilder.CreateCall(UnregisterFatbinFunc,
		DtorBuilder.CreateLoad(GpuBinaryHandle, false));
		}

		DtorBuilder.CreateRetVoid();
		llvm::verifyFunction(*ModuleDtorFunc);
		return ModuleDtorFunc;
		}

CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {		CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {
return new CGNVCUDARuntime(CGM);		return new CGNVCUDARuntime(CGM);
}		}
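
For orientation, the IR built by ModuleCtorFunction() and ModuleDtorFunction() corresponds roughly to the host-side C++ below. This is only a sketch under stated assumptions: the mock* functions are hypothetical stand-ins for the cudart entry points __cudaRegisterFatBinary, __cudaRegisterFunction and __cudaUnregisterFatBinary, the wrapper field names follow the struct comment in the patch, and demo::Module and demo::MockRuntime are invented for illustration. Nothing here talks to a real GPU.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

namespace demo {

// Layout taken from the patch comment:
// struct { int magic, int version, void *gpu_binary, void *dont_care };
struct FatbinWrapper {
  int Magic;
  int Version;
  const void *GpuBinary;
  void *Unused;
};

constexpr int FatbinMagic = 0x466243b1; // magic value used in the patch
constexpr int FatbinVersion = 1;

// Bookkeeping for the mock runtime (hypothetical, for illustration only).
struct MockRuntime {
  std::deque<void *> Handles;       // storage behind each returned void**
  std::vector<std::string> Kernels; // names seen by mockRegisterFunction
  int Live = 0;                     // registrations not yet unregistered
};

// Stand-in for: void **__cudaRegisterFatBinary(void *);
inline void **mockRegisterFatBinary(MockRuntime &RT, void *Wrapper) {
  assert(static_cast<FatbinWrapper *>(Wrapper)->Magic == FatbinMagic);
  RT.Handles.push_back(Wrapper); // deque keeps earlier elements in place
  ++RT.Live;
  return &RT.Handles.back();
}

// Stand-in for __cudaRegisterFunction; only handle and name are modelled.
inline void mockRegisterFunction(MockRuntime &RT, void **Handle,
                                 const char *Name) {
  assert(Handle && *Handle && "kernel registered against a dead handle");
  RT.Kernels.push_back(Name);
}

// Stand-in for: void __cudaUnregisterFatBinary(void **handle);
inline void mockUnregisterFatBinary(MockRuntime &RT, void **Handle) {
  assert(Handle && *Handle && "double unregister");
  *Handle = nullptr;
  --RT.Live;
}

// One translation unit's worth of generated state.
struct Module {
  std::vector<FatbinWrapper> Wrappers;   // one per -fcuda-include-gpubinary
  std::vector<const char *> KernelNames; // one per emitted kernel stub
  std::vector<void **> GpuBinaryHandles; // saved by the ctor for the dtor

  // Mirrors the register-kernels helper. The generated IR contains one call
  // per kernel rather than a loop; the loop here is just shorthand.
  void registerKernels(MockRuntime &RT, void **Handle) {
    for (const char *Name : KernelNames)
      mockRegisterFunction(RT, Handle, Name);
  }

  // Mirrors the module ctor: register each blob, then its kernels.
  void moduleCtor(MockRuntime &RT) {
    for (FatbinWrapper &W : Wrappers) {
      void **Handle = mockRegisterFatBinary(RT, &W);
      registerKernels(RT, Handle);
      GpuBinaryHandles.push_back(Handle);
    }
  }

  // Mirrors the module dtor: unregister every handle saved by the ctor.
  void moduleDtor(MockRuntime &RT) {
    for (void **Handle : GpuBinaryHandles)
      mockUnregisterFatBinary(RT, Handle);
  }
};

} // namespace demo
```

With two wrappers and two kernel names, moduleCtor() in this sketch performs two fatbin registrations and four kernel registrations, and moduleDtor() releases both handles. This also illustrates why the header requires the destructor function to be built after the constructor function: the handle list is only populated by the latter.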

lib/CodeGen/CGCUDARuntime.h

	// subclasses of this implement code generation for specific CUDA			// subclasses of this implement code generation for specific CUDA
	// runtime libraries.			// runtime libraries.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H			#ifndef LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H
	#define LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H			#define LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H

				namespace llvm {
				class Function;
				}

	namespace clang {			namespace clang {

	class CUDAKernelCallExpr;			class CUDAKernelCallExpr;

	namespace CodeGen {			namespace CodeGen {

	class CodeGenFunction;			class CodeGenFunction;
	class CodeGenModule;			class CodeGenModule;
	class FunctionArgList;			class FunctionArgList;
	class ReturnValueSlot;			class ReturnValueSlot;
	class RValue;			class RValue;

	class CGCUDARuntime {			class CGCUDARuntime {
	protected:			protected:
	CodeGenModule &CGM;			CodeGenModule &CGM;

	public:			public:
				eliben: It would really be great not to have data inside this abstract interface; is this necessary? Note that "fatbin handles" sounds very NVIDIA CUDA runtime specific, though this interface is allegedly generic :)
				tra: List of generated kernels is something that I expect to be useful for all subclasses of CUDARuntime. That's why I've put EmittedKernels there and a non-virtual method EmitDeviceStub() to populate it. FatbinHandles, on the other hand, is indeed cudart-specific. I've moved it into CGCUDANV.
				eliben: I would still remove EmittedKernels for now; we only have a single CUDA runtime at this time in upstream, so this feels redundant, as it makes the runtime interface / implementation barrier less clean than it should be. In the future if/when new runtime implementations are added, we'll figure out what the best way to factor common code out is. YAGNI, essentially :)
				tra: OK.
	CGCUDARuntime(CodeGenModule &CGM) : CGM(CGM) {}			CGCUDARuntime(CodeGenModule &CGM) : CGM(CGM) {}
	virtual ~CGCUDARuntime();			virtual ~CGCUDARuntime();

	virtual RValue EmitCUDAKernelCallExpr(CodeGenFunction &CGF,			virtual RValue EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
	const CUDAKernelCallExpr *E,			const CUDAKernelCallExpr *E,
	ReturnValueSlot ReturnValue);			ReturnValueSlot ReturnValue);

				eliben: Please document these APIs
				tra: Done.
	virtual void EmitDeviceStubBody(CodeGenFunction &CGF,			/// Adds CGF.CurFn to EmittedKernels and calls EmitDeviceStubBody() to emit a
	FunctionArgList &Args) = 0;			/// kernel launch stub.
				eliben: I'd move this to the implementation as well, along with EmittedKernels. Just reading the documentation of this method makes little sense given that it lives in an abstract interface. The code will be easier to untangle if the interface stays completely functionality-free. At this time this won't even add code duplication since we just have a single implementation.
				tra: Done that already while I was moving EmittedKernels out.
				virtual void EmitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) = 0;

				/// Constructs and returns a module initialization function or nullptr if it's
				/// not needed. Must be called after all kernels have been emitted.
				virtual llvm::Function *ModuleCtorFunction() = 0;

				/// Returns a module cleanup function or nullptr if it's not needed.
				/// Must be called after ModuleCtorFunction
				virtual llvm::Function *ModuleDtorFunction() = 0;
	};			};

	/// Creates an instance of a CUDA runtime class.			/// Creates an instance of a CUDA runtime class.
	CGCUDARuntime *CreateNVCUDARuntime(CodeGenModule &CGM);			CGCUDARuntime *CreateNVCUDARuntime(CodeGenModule &CGM);

	}			}
	}			}

	#endif			#endif
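
The split eliben asks for in the comments above (a functionality-free abstract interface, with EmittedKernels owned by the single concrete runtime) could look roughly like the sketch below. This is only an illustration under stated assumptions: `Function`, `CudaRuntime`, `NVCudaRuntime` and `numEmittedKernels` are simplified stand-ins invented for this example, not the real clang/LLVM classes, and the rule "no kernels means no constructor" is an assumption made to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Function; not the real LLVM type.
struct Function {
  std::string Name;
};

// Abstract runtime interface: declarations only, no state and no logic.
class CudaRuntime {
public:
  virtual ~CudaRuntime() = default;
  // Emits a kernel launch stub for the given kernel.
  virtual void EmitDeviceStub(Function *Kernel) = 0;
  // Returns a module initialization function, or nullptr if none is needed.
  virtual Function *ModuleCtorFunction() = 0;
};

// Concrete runtime: the bookkeeping of emitted kernels lives here.
class NVCudaRuntime : public CudaRuntime {
  std::vector<Function *> EmittedKernels;
  Function Ctor{"cuda_module_ctor"};

public:
  void EmitDeviceStub(Function *Kernel) override {
    EmittedKernels.push_back(Kernel);
  }
  Function *ModuleCtorFunction() override {
    // Assumption for this sketch: nothing to register without kernels.
    return EmittedKernels.empty() ? nullptr : &Ctor;
  }
  std::size_t numEmittedKernels() const { return EmittedKernels.size(); }
};
```

Keeping the list in the implementation means a future second runtime is free to track kernels differently, which is the YAGNI argument made in the review.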

lib/CodeGen/CodeGenFunction.cpp

Show First 20 Lines • Show All 852 Lines • ▼ Show 20 Lines	void CodeGenFunction::GenerateCode(GlobalDecl GD, llvm::Function *Fn,
PGO.assignRegionCounters(GD.getDecl(), CurFn);		PGO.assignRegionCounters(GD.getDecl(), CurFn);
if (isa<CXXDestructorDecl>(FD))		if (isa<CXXDestructorDecl>(FD))
EmitDestructorBody(Args);		EmitDestructorBody(Args);
else if (isa<CXXConstructorDecl>(FD))		else if (isa<CXXConstructorDecl>(FD))
EmitConstructorBody(Args);		EmitConstructorBody(Args);
else if (getLangOpts().CUDA &&		else if (getLangOpts().CUDA &&
!getLangOpts().CUDAIsDevice &&		!getLangOpts().CUDAIsDevice &&
FD->hasAttr<CUDAGlobalAttr>())		FD->hasAttr<CUDAGlobalAttr>())
CGM.getCUDARuntime().EmitDeviceStubBody(*this, Args);		CGM.getCUDARuntime().EmitDeviceStub(*this, Args);
else if (isa<CXXConversionDecl>(FD) &&		else if (isa<CXXConversionDecl>(FD) &&
cast<CXXConversionDecl>(FD)->isLambdaToBlockPointerConversion()) {		cast<CXXConversionDecl>(FD)->isLambdaToBlockPointerConversion()) {
// The lambda conversion to block pointer is special; the semantics can't be		// The lambda conversion to block pointer is special; the semantics can't be
// expressed in the AST, so IRGen needs to special-case it.		// expressed in the AST, so IRGen needs to special-case it.
EmitLambdaToBlockPointerBody(Args);		EmitLambdaToBlockPointerBody(Args);
} else if (isa<CXXMethodDecl>(FD) &&		} else if (isa<CXXMethodDecl>(FD) &&
cast<CXXMethodDecl>(FD)->isLambdaStaticInvoker()) {		cast<CXXMethodDecl>(FD)->isLambdaStaticInvoker()) {
// The lambda static invoker function is special, because it forwards or		// The lambda static invoker function is special, because it forwards or
▲ Show 20 Lines • Show All 866 Lines • Show Last 20 Lines

lib/CodeGen/CodeGenModule.cpp

Show First 20 Lines • Show All 344 Lines • ▼ Show 20 Lines	void CodeGenModule::Release() {
applyReplacements();		applyReplacements();
checkAliases();		checkAliases();
EmitCXXGlobalInitFunc();		EmitCXXGlobalInitFunc();
EmitCXXGlobalDtorFunc();		EmitCXXGlobalDtorFunc();
EmitCXXThreadLocalInitFunc();		EmitCXXThreadLocalInitFunc();
if (ObjCRuntime)		if (ObjCRuntime)
if (llvm::Function *ObjCInitFunction = ObjCRuntime->ModuleInitFunction())		if (llvm::Function *ObjCInitFunction = ObjCRuntime->ModuleInitFunction())
AddGlobalCtor(ObjCInitFunction);		AddGlobalCtor(ObjCInitFunction);
		if (Context.getLangOpts().CUDA && !Context.getLangOpts().CUDAIsDevice &&
		CUDARuntime) {
		if (llvm::Function *CudaCtorFunction = CUDARuntime->ModuleCtorFunction())
		AddGlobalCtor(CudaCtorFunction);
		if (llvm::Function *CudaDtorFunction = CUDARuntime->ModuleDtorFunction())
		AddGlobalDtor(CudaDtorFunction);
		}
if (PGOReader && PGOStats.hasDiagnostics())		if (PGOReader && PGOStats.hasDiagnostics())
PGOStats.reportDiagnostics(getDiags(), getCodeGenOpts().MainFileName);		PGOStats.reportDiagnostics(getDiags(), getCodeGenOpts().MainFileName);
EmitCtorList(GlobalCtors, "llvm.global_ctors");		EmitCtorList(GlobalCtors, "llvm.global_ctors");
EmitCtorList(GlobalDtors, "llvm.global_dtors");		EmitCtorList(GlobalDtors, "llvm.global_dtors");
EmitGlobalAnnotations();		EmitGlobalAnnotations();
EmitStaticExternCAliases();		EmitStaticExternCAliases();
EmitDeferredUnusedCoverageMappings();		EmitDeferredUnusedCoverageMappings();
if (CoverageMapping)		if (CoverageMapping)
▲ Show 20 Lines • Show All 3,313 Lines • Show Last 20 Lines
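
The CodeGenModule::Release() hunk above registers the CUDA constructor and destructor only for host-side CUDA compilations, and only when the runtime actually returns a function. A minimal sketch of that guard logic, assuming simplified stand-in types (`Function`, `Runtime`, `ModuleLists` and `registerCudaModuleHooks` are hypothetical names for this example, not clang API):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Function.
struct Function {
  std::string Name;
};

// Hypothetical runtime that may or may not need module-level hooks.
struct Runtime {
  Function *Ctor = nullptr;
  Function *Dtor = nullptr;
  Function *ModuleCtorFunction() { return Ctor; }
  Function *ModuleDtorFunction() { return Dtor; }
};

// Stand-in for the module's global constructor/destructor lists.
struct ModuleLists {
  std::vector<Function *> GlobalCtors;
  std::vector<Function *> GlobalDtors;
};

// Mirrors the guard in the diff: host-side CUDA only, runtime present,
// and each hook is added only if the runtime returned a non-null function.
// The constructor is queried before the destructor, as the API requires.
void registerCudaModuleHooks(bool IsCuda, bool IsDevice, Runtime *RT,
                             ModuleLists &M) {
  if (!IsCuda || IsDevice || !RT)
    return;
  if (Function *C = RT->ModuleCtorFunction())
    M.GlobalCtors.push_back(C);
  if (Function *D = RT->ModuleDtorFunction())
    M.GlobalDtors.push_back(D);
}
```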

lib/Driver/Action.cpp

Show All 18 Lines	for (iterator it = begin(), ie = end(); it != ie; ++it)
delete *it;		delete *it;
}		}
}		}

const char *Action::getClassName(ActionClass AC) {		const char *Action::getClassName(ActionClass AC) {
switch (AC) {		switch (AC) {
case InputClass: return "input";		case InputClass: return "input";
case BindArchClass: return "bind-arch";		case BindArchClass: return "bind-arch";
		case CudaDeviceClass: return "cuda-device";
		case CudaHostClass: return "cuda-host";
case PreprocessJobClass: return "preprocessor";		case PreprocessJobClass: return "preprocessor";
case PrecompileJobClass: return "precompiler";		case PrecompileJobClass: return "precompiler";
case AnalyzeJobClass: return "analyzer";		case AnalyzeJobClass: return "analyzer";
case MigrateJobClass: return "migrator";		case MigrateJobClass: return "migrator";
case CompileJobClass: return "compiler";		case CompileJobClass: return "compiler";
case BackendJobClass: return "backend";		case BackendJobClass: return "backend";
case AssembleJobClass: return "assembler";		case AssembleJobClass: return "assembler";
case LinkJobClass: return "linker";		case LinkJobClass: return "linker";
Show All 13 Lines
}		}

void BindArchAction::anchor() {}		void BindArchAction::anchor() {}

BindArchAction::BindArchAction(std::unique_ptr<Action> Input,		BindArchAction::BindArchAction(std::unique_ptr<Action> Input,
const char *_ArchName)		const char *_ArchName)
: Action(BindArchClass, std::move(Input)), ArchName(_ArchName) {}		: Action(BindArchClass, std::move(Input)), ArchName(_ArchName) {}

		void CudaDeviceAction::anchor() {}

		CudaDeviceAction::CudaDeviceAction(std::unique_ptr<Action> Input,
		const char *ArchName, bool AtTopLevel)
		: Action(CudaDeviceClass, std::move(Input)), GpuArchName(ArchName),
		AtTopLevel(AtTopLevel) {}

		void CudaHostAction::anchor() {}

		CudaHostAction::CudaHostAction(std::unique_ptr<Action> Input,
		const ActionList &_DeviceActions)
		: Action(CudaHostClass, std::move(Input)), DeviceActions(_DeviceActions) {}

		CudaHostAction::~CudaHostAction() {
		for (iterator it = DeviceActions.begin(), ie = DeviceActions.end(); it != ie;
		++it)
		delete *it;
		}
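
CudaHostAction owns its device actions and deletes them by hand in the destructor above. As a hedged illustration of the same ownership relation, the sketch below swaps in std::unique_ptr for the manual delete loop; this is a different technique from the patch, and `Action`, `HostAction` and the destruction counter are hypothetical names used only to make the behaviour observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical action type that counts destructions through a pointer.
struct Action {
  int *Destroyed;
  explicit Action(int *D) : Destroyed(D) {}
  virtual ~Action() { ++*Destroyed; }
};

// Host action that owns its device-side actions. unique_ptr replaces the
// explicit "delete *it" loop of the original destructor.
class HostAction : public Action {
  std::vector<std::unique_ptr<Action>> DeviceActions;

public:
  HostAction(int *D, std::vector<std::unique_ptr<Action>> Devs)
      : Action(D), DeviceActions(std::move(Devs)) {}
  std::size_t numDeviceActions() const { return DeviceActions.size(); }
};
```

Destroying the host action releases every device action it was given, which is the invariant the hand-written destructor in the diff maintains.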

void JobAction::anchor() {}		void JobAction::anchor() {}

JobAction::JobAction(ActionClass Kind, std::unique_ptr<Action> Input,		JobAction::JobAction(ActionClass Kind, std::unique_ptr<Action> Input,
types::ID Type)		types::ID Type)
: Action(Kind, std::move(Input), Type) {}		: Action(Kind, std::move(Input), Type) {}

JobAction::JobAction(ActionClass Kind, const ActionList &Inputs, types::ID Type)		JobAction::JobAction(ActionClass Kind, const ActionList &Inputs, types::ID Type)
: Action(Kind, Inputs, Type) {		: Action(Kind, Inputs, Type) {
▲ Show 20 Lines • Show All 89 Lines • Show Last 20 Lines

lib/Driver/Driver.cpp

Show First 20 Lines • Show All 174 Lines • ▼ Show 20 Lines	// -{fsyntax-only,-analyze,emit-ast} only run up to the compiler.
options::OPT__analyze_auto)) ||		options::OPT__analyze_auto)) ||
(PhaseArg = DAL.getLastArg(options::OPT_emit_ast))) {		(PhaseArg = DAL.getLastArg(options::OPT_emit_ast))) {
FinalPhase = phases::Compile;		FinalPhase = phases::Compile;

// -S only runs up to the backend.		// -S only runs up to the backend.
} else if ((PhaseArg = DAL.getLastArg(options::OPT_S))) {		} else if ((PhaseArg = DAL.getLastArg(options::OPT_S))) {
FinalPhase = phases::Backend;		FinalPhase = phases::Backend;

// -c only runs up to the assembler.		// -c and partial CUDA compilations only runs up to the assembler.
} else if ((PhaseArg = DAL.getLastArg(options::OPT_c))) {		} else if ((PhaseArg = DAL.getLastArg(options::OPT_c)) ||
		(PhaseArg = DAL.getLastArg(options::OPT_cuda_device_only)) ||
		(PhaseArg = DAL.getLastArg(options::OPT_cuda_host_only))) {
		echristo: "and partial CUDA compilations only run up"
		tra: Fixed.
FinalPhase = phases::Assemble;		FinalPhase = phases::Assemble;

// Otherwise do everything.		// Otherwise do everything.
} else		} else
FinalPhase = phases::Link;		FinalPhase = phases::Link;

if (FinalPhaseArg)		if (FinalPhaseArg)
*FinalPhaseArg = PhaseArg;		*FinalPhaseArg = PhaseArg;

return FinalPhase;		return FinalPhase;
▲ Show 20 Lines • Show All 619 Lines • ▼ Show 20 Lines	if (C.getArgs().hasArg(options::OPT_print_multi_os_directory)) {
// nothing because it's not supported yet.		// nothing because it's not supported yet.
return false;		return false;
}		}

return true;		return true;
}		}

static unsigned PrintActions1(const Compilation &C, Action *A,		static unsigned PrintActions1(const Compilation &C, Action *A,
		std::map<Action *, unsigned> &Ids);

		static std::string PrintActionList(const Compilation &C, ActionList &AL,
		std::map<Action *, unsigned> &Ids) {
		std::string str;
		llvm::raw_string_ostream os(str);
		os << "{";
		for (Action::iterator it = AL.begin(), ie = AL.end(); it != ie;) {
		os << PrintActions1(C, *it, Ids);
		++it;
		if (it != ie)
		os << ", ";
		}
		os << "}";
		return str;
		}
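
PrintActionList above factors out the "braces plus comma-separated ids" formatting that PrintActions1 previously did inline. The separator handling can be sketched on its own as follows; `formatIdList` is a hypothetical helper that works on plain integers rather than on Action pointers, and uses std::ostringstream in place of llvm::raw_string_ostream.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Formats ids as "{a, b, c}"; an empty list yields "{}".
std::string formatIdList(const std::vector<unsigned> &Ids) {
  std::ostringstream OS;
  OS << "{";
  for (std::size_t I = 0; I < Ids.size(); ++I) {
    if (I != 0)
      OS << ", "; // separator goes between elements only
    OS << Ids[I];
  }
  OS << "}";
  return OS.str();
}
```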

		static unsigned PrintActions1(const Compilation &C, Action *A,
std::map<Action*, unsigned> &Ids) {		std::map<Action *, unsigned> &Ids) {
if (Ids.count(A))		if (Ids.count(A))
return Ids[A];		return Ids[A];

std::string str;		std::string str;
llvm::raw_string_ostream os(str);		llvm::raw_string_ostream os(str);

os << Action::getClassName(A->getKind()) << ", ";		os << Action::getClassName(A->getKind()) << ", ";
if (InputAction *IA = dyn_cast<InputAction>(A)) {		if (InputAction *IA = dyn_cast<InputAction>(A)) {
os << "\"" << IA->getInputArg().getValue() << "\"";		os << "\"" << IA->getInputArg().getValue() << "\"";
} else if (BindArchAction *BIA = dyn_cast<BindArchAction>(A)) {		} else if (BindArchAction *BIA = dyn_cast<BindArchAction>(A)) {
os << '"' << BIA->getArchName() << '"'		os << '"' << BIA->getArchName() << '"'
<< ", {" << PrintActions1(C, *BIA->begin(), Ids) << "}";		<< ", {" << PrintActions1(C, *BIA->begin(), Ids) << "}";
		} else if (CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
		os << '"' << CDA->getGpuArchName() << '"' << ", {"
		<< PrintActions1(C, *CDA->begin(), Ids) << "}";
		} else if (CudaHostAction *CHA = dyn_cast<CudaHostAction>(A)) {
		os << "{" << PrintActions1(C, *CHA->begin(), Ids) << "}"
		<< ", gpu binaries " << PrintActionList(C, CHA->getDeviceActions(), Ids);
} else {		} else {
os << "{";		os << PrintActionList(C, A->getInputs(), Ids);
for (Action::iterator it = A->begin(), ie = A->end(); it != ie;) {
os << PrintActions1(C, *it, Ids);
++it;
if (it != ie)
os << ", ";
}
os << "}";
}		}

unsigned Id = Ids.size();		unsigned Id = Ids.size();
Ids[A] = Id;		Ids[A] = Id;
llvm::errs() << Id << ": " << os.str() << ", "		llvm::errs() << Id << ": " << os.str() << ", "
<< types::getTypeName(A->getType()) << "\n";		<< types::getTypeName(A->getType()) << "\n";

return Id;		return Id;
▲ Show 20 Lines • Show All 292 Lines • ▼ Show 20 Lines	void Driver::BuildInputs(const ToolChain &TC, DerivedArgList &Args,
if (CCCIsCPP() && Inputs.empty()) {		if (CCCIsCPP() && Inputs.empty()) {
// If called as standalone preprocessor, stdin is processed		// If called as standalone preprocessor, stdin is processed
// if no other input is present.		// if no other input is present.
Arg *A = MakeInputArg(Args, Opts, "-");		Arg *A = MakeInputArg(Args, Opts, "-");
Inputs.push_back(std::make_pair(types::TY_C, A));		Inputs.push_back(std::make_pair(types::TY_C, A));
}		}
}		}

		// For each unique --cuda-gpu-arch= argument creates a TY_CUDA_DEVICE input
		// action and then wraps each in CudaDeviceAction paired with appropriate GPU
		// arch name. If we're only building device-side code, each action remains
		// independent. Otherwise we pass device-side actions as inputs to a new
		// CudaHostAction which combines both host and device side actions.
		static std::unique_ptr<Action>
		BuildCudaActions(const Driver &D, const ToolChain &TC, DerivedArgList &Args,
		const Arg *InputArg, const types::ID InputType,
		std::unique_ptr<Action> Current, ActionList &Actions) {

		assert(InputType == types::TY_CUDA &&
		"CUDA Actions only apply to CUDA inputs.");

		SmallVector<const char *, 4> GpuArchList;
		llvm::StringSet<> GpuArchNames;
		for (Arg *A : Args) {
		if (A->getOption().matches(options::OPT_cuda_gpu_arch_EQ)) {
		A->claim();
		if (GpuArchNames.insert(A->getValue()).second)
		GpuArchList.push_back(A->getValue());
		}
		}

		if (GpuArchList.empty())
		GpuArchList.push_back("sm_20");
		echristo: Some comment on the default here.
		tra: Done.

		Driver::InputList CudaDeviceInputs;
		for (unsigned i = 0, e = GpuArchList.size(); i != e; ++i)
		CudaDeviceInputs.push_back(std::make_pair(types::TY_CUDA_DEVICE, InputArg));

		ActionList CudaDeviceActions;
		D.BuildActions(TC, Args, CudaDeviceInputs, CudaDeviceActions);
		assert(GpuArchList.size() == CudaDeviceActions.size() &&
		"Failed to create actions for all devices");

		bool PartialCompilation = false;
		bool DeviceOnlyCompilation = Args.hasArg(options::OPT_cuda_device_only);
		for (unsigned i = 0, e = GpuArchList.size(); i != e; ++i) {
		if (CudaDeviceActions[i]->getKind() != Action::BackendJobClass) {
		PartialCompilation = true;
		break;
		}
		}

		if (PartialCompilation || DeviceOnlyCompilation) {
		// If -o specified we can only work if it's device-only compilation for a
		// single device.
		if (Args.hasArg(options::OPT_o) &&
		(!DeviceOnlyCompilation || GpuArchList.size() > 1)) {
		D.Diag(clang::diag::err_drv_output_argument_with_multiple_files);
		return nullptr;
		}
		for (unsigned i = 0, e = GpuArchList.size(); i != e; ++i)
		Actions.push_back(new CudaDeviceAction(
		std::unique_ptr<Action>(CudaDeviceActions[i]), GpuArchList[i], true));
		if (DeviceOnlyCompilation)
		Current.reset(nullptr);
		return Current;
		} else {
		ActionList CudaDeviceJobActions;
		for (unsigned i = 0, e = GpuArchList.size(); i != e; ++i)
		CudaDeviceJobActions.push_back(
		new CudaDeviceAction(std::unique_ptr<Action>(CudaDeviceActions[i]),
		GpuArchList[i], false));
		return std::unique_ptr<Action>(
		new CudaHostAction(std::move(Current), CudaDeviceJobActions));
		eliben: you can just return new CudaHostAction... here, no?
		tra: Done.
		}
		}
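
The first step of BuildCudaActions above collects each distinct --cuda-gpu-arch= value once, keeps the order of first appearance, and falls back to sm_20 when none is given. A self-contained sketch of just that step, assuming plain strings instead of driver Arg objects (`collectGpuArchs` is a hypothetical name, and std::set stands in for llvm::StringSet):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the requested GPU architectures without duplicates, in order of
// first appearance. Falls back to "sm_20", the default used in the patch.
std::vector<std::string>
collectGpuArchs(const std::vector<std::string> &Requested) {
  std::vector<std::string> Ordered;
  std::set<std::string> Seen;
  for (const std::string &Arch : Requested)
    if (Seen.insert(Arch).second) // true only the first time Arch is seen
      Ordered.push_back(Arch);
  if (Ordered.empty())
    Ordered.push_back("sm_20");
  return Ordered;
}
```

Using a separate ordered list next to the set keeps the device compilations in the order the user wrote them, which a set alone would not guarantee.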

void Driver::BuildActions(const ToolChain &TC, DerivedArgList &Args,		void Driver::BuildActions(const ToolChain &TC, DerivedArgList &Args,
const InputList &Inputs, ActionList &Actions) const {		const InputList &Inputs, ActionList &Actions) const {
llvm::PrettyStackTraceString CrashInfo("Building compilation actions");		llvm::PrettyStackTraceString CrashInfo("Building compilation actions");

if (!SuppressMissingInputWarning && Inputs.empty()) {		if (!SuppressMissingInputWarning && Inputs.empty()) {
Diag(clang::diag::err_drv_no_input_files);		Diag(clang::diag::err_drv_no_input_files);
return;		return;
}		}
▲ Show 20 Lines • Show All 86 Lines • ▼ Show 20 Lines	if (InitialPhase > FinalPhase) {
<< getPhaseName(InitialPhase)		<< getPhaseName(InitialPhase)
<< !!FinalPhaseArg		<< !!FinalPhaseArg
<< (FinalPhaseArg ? FinalPhaseArg->getOption().getName() : "");		<< (FinalPhaseArg ? FinalPhaseArg->getOption().getName() : "");
continue;		continue;
}		}

// Build the pipeline for this file.		// Build the pipeline for this file.
std::unique_ptr<Action> Current(new InputAction(*InputArg, InputType));		std::unique_ptr<Action> Current(new InputAction(*InputArg, InputType));
for (SmallVectorImpl<phases::ID>::iterator		phases::ID CudaInjectionPhase;
i = PL.begin(), e = PL.end(); i != e; ++i) {		if (isSaveTempsEnabled()) {
		// All phases are done independently, inject GPU blobs during compilation
		// phase as that's where we generate glue code to init them.
		CudaInjectionPhase = phases::Compile;
		} else {
		// Assumes that clang does everything up until linking phase, so we inject
		// cuda device actions at the last step before linking. Otherwise CUDA
		// host action forces preprocessor into a separate invocation.
		if (FinalPhase == phases::Link) {
		for (auto i = PL.begin(), e = PL.end(); i != e; ++i) {
		auto next = i + 1;
		if (next != e && *next == phases::Link)
		CudaInjectionPhase = *i;
		}
		} else
		CudaInjectionPhase = FinalPhase;
		}
		for (SmallVectorImpl<phases::ID>::iterator i = PL.begin(), e = PL.end();
		i != e; ++i) {
phases::ID Phase = *i;		phases::ID Phase = *i;

// We are done if this step is past what the user requested.		// We are done if this step is past what the user requested.
if (Phase > FinalPhase)		if (Phase > FinalPhase)
break;		break;

// Queue linker inputs.		// Queue linker inputs.
if (Phase == phases::Link) {		if (Phase == phases::Link) {
assert((i + 1) == e && "linking must be final compilation step.");		assert((i + 1) == e && "linking must be final compilation step.");
LinkerInputs.push_back(Current.release());		LinkerInputs.push_back(Current.release());
break;		break;
}		}

// Some types skip the assembler phase (e.g., llvm-bc), but we can't		// Some types skip the assembler phase (e.g., llvm-bc), but we can't
// encode this in the steps because the intermediate type depends on		// encode this in the steps because the intermediate type depends on
// arguments. Just special case here.		// arguments. Just special case here.
if (Phase == phases::Assemble && Current->getType() != types::TY_PP_Asm)		if (Phase == phases::Assemble && Current->getType() != types::TY_PP_Asm)
continue;		continue;

// Otherwise construct the appropriate action.		// Otherwise construct the appropriate action.
Current = ConstructPhaseAction(TC, Args, Phase, std::move(Current));		Current = ConstructPhaseAction(TC, Args, Phase, std::move(Current));

		if (InputType == types::TY_CUDA && Phase == CudaInjectionPhase &&
		!Args.hasArg(options::OPT_cuda_host_only)) {
		Current = BuildCudaActions(*this, TC, Args, InputArg, InputType,
		std::move(Current), Actions);
		if (!Current)
		break;
		}

if (Current->getType() == types::TY_Nothing)		if (Current->getType() == types::TY_Nothing)
break;		break;
}		}

// If we ended with something, add to the output list.		// If we ended with something, add to the output list.
if (Current)		if (Current)
Actions.push_back(Current.release());		Actions.push_back(Current.release());
}		}
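
The choice of CudaInjectionPhase in the hunk above follows three rules: with save-temps enabled, inject at the compile phase; if the final phase is not linking, inject at the final phase; otherwise inject at the phase immediately before linking. The sketch below restates those rules as a standalone function. `Phase` and `pickCudaInjectionPhase` are hypothetical names, and returning false when no phase precedes the link step is an assumption of this example (the patch leaves the variable unset in that case).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for phases::ID.
enum class Phase { Preprocess, Compile, Backend, Assemble, Link };

// Writes the phase at which device actions would be injected to Out and
// returns true, or returns false if no suitable phase exists.
bool pickCudaInjectionPhase(const std::vector<Phase> &Phases, Phase FinalPhase,
                            bool SaveTemps, Phase &Out) {
  if (SaveTemps) {
    // Every phase runs separately, so inject where the glue code is built.
    Out = Phase::Compile;
    return true;
  }
  if (FinalPhase != Phase::Link) {
    Out = FinalPhase;
    return true;
  }
  // Find the phase that directly precedes linking.
  for (std::size_t I = 0; I + 1 < Phases.size(); ++I) {
    if (Phases[I + 1] == Phase::Link) {
      Out = Phases[I];
      return true;
    }
  }
  return false;
}
```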
▲ Show 20 Lines • Show All 113 Lines • ▼ Show 20 Lines	void Driver::BuildJobs(Compilation &C) const {
// It is an error to provide a -o option if we are making multiple output		// It is an error to provide a -o option if we are making multiple output
// files.		// files.
if (FinalOutput) {		if (FinalOutput) {
unsigned NumOutputs = 0;		unsigned NumOutputs = 0;
for (const Action *A : C.getActions())		for (const Action *A : C.getActions())
if (A->getType() != types::TY_Nothing)		if (A->getType() != types::TY_Nothing)
++NumOutputs;		++NumOutputs;

if (NumOutputs > 1) {		if (NumOutputs > 1) {
		eliben: Remove
		tra: That was not intended to be committed. Will fix shortly.
Diag(clang::diag::err_drv_output_argument_with_multiple_files);		Diag(clang::diag::err_drv_output_argument_with_multiple_files);
FinalOutput = nullptr;		FinalOutput = nullptr;
}		}
}		}

// Collect the list of architectures.		// Collect the list of architectures.
llvm::StringSet<> ArchNames;		llvm::StringSet<> ArchNames;
if (C.getDefaultToolChain().getTriple().isOSBinFormatMachO())		if (C.getDefaultToolChain().getTriple().isOSBinFormatMachO())
▲ Show 20 Lines • Show All 101 Lines • ▼ Show 20 Lines	static const Tool *SelectToolForJob(Compilation &C, bool SaveTemps,
}		}

// A backend job should always be combined with the preceding compile job		// A backend job should always be combined with the preceding compile job
// unless OPT_save_temps is enabled and the compiler is capable of emitting		// unless OPT_save_temps is enabled and the compiler is capable of emitting
// LLVM IR as an intermediate output.		// LLVM IR as an intermediate output.
if (isa<BackendJobAction>(JA)) {		if (isa<BackendJobAction>(JA)) {
// Check if the compiler supports emitting LLVM IR.		// Check if the compiler supports emitting LLVM IR.
assert(Inputs->size() == 1);		assert(Inputs->size() == 1);
JobAction *CompileJA = cast<CompileJobAction>(*Inputs->begin());		JobAction *CompileJA;
		// Extract real host action, if it's a CudaHostAction.
		if (CudaHostAction *CudaHA = dyn_cast<CudaHostAction>(*Inputs->begin()))
		CompileJA = cast<CompileJobAction>(*CudaHA->begin());
		else
		CompileJA = cast<CompileJobAction>(*Inputs->begin());

const Tool *Compiler = TC->SelectTool(*CompileJA);		const Tool *Compiler = TC->SelectTool(*CompileJA);
if (!Compiler)		if (!Compiler)
return nullptr;		return nullptr;
if (!Compiler->canEmitIR() || !SaveTemps) {		if (!Compiler->canEmitIR() || !SaveTemps) {
Inputs = &(*Inputs)[0]->getInputs();		Inputs = &(*Inputs)[0]->getInputs();
ToolForJob = Compiler;		ToolForJob = Compiler;
}		}
}		}
Show All 11 Lines	if (Inputs->size() == 1 && isa<PreprocessJobAction>(*Inputs->begin()) &&
!SaveTemps &&		!SaveTemps &&
!C.getArgs().hasArg(options::OPT_rewrite_objc) &&		!C.getArgs().hasArg(options::OPT_rewrite_objc) &&
ToolForJob->hasIntegratedCPP())		ToolForJob->hasIntegratedCPP())
Inputs = &(*Inputs)[0]->getInputs();		Inputs = &(*Inputs)[0]->getInputs();

return ToolForJob;		return ToolForJob;
}		}

		static llvm::Triple computeTargetTriple(StringRef DefaultTargetTriple,
		const ArgList &Args,
		StringRef DarwinArchName);
		echristo: Do you need the declaration up here? Why not just pull the static function up if so?
		tra: That would clutter the changes for no good reason. Whenever a bunch of code is moved from one place to another, it's always a pain figuring out whether things were just copied or copied and changed. Forward declaration is a lesser crime, IMO.

void Driver::BuildJobsForAction(Compilation &C,		void Driver::BuildJobsForAction(Compilation &C,
const Action *A,		const Action *A,
const ToolChain *TC,		const ToolChain *TC,
const char *BoundArch,		const char *BoundArch,
bool AtTopLevel,		bool AtTopLevel,
bool MultipleArchs,		bool MultipleArchs,
const char *LinkingOutput,		const char *LinkingOutput,
InputInfo &Result) const {		InputInfo &Result) const {
llvm::PrettyStackTraceString CrashInfo("Building compilation jobs");		llvm::PrettyStackTraceString CrashInfo("Building compilation jobs");

		InputInfoList CudaDeviceInputInfos;
		if (const CudaHostAction *CHA = dyn_cast<CudaHostAction>(A)) {
		InputInfo II;
		// Append outputs of device jobs to the input list.
		for (const Action *DA : CHA->getDeviceActions()) {
		BuildJobsForAction(C, DA, TC, "", AtTopLevel,
		/*MultipleArchs*/ false, LinkingOutput, II);
		CudaDeviceInputInfos.push_back(II);
		}
		// Override current action with a real host compile action and continue
		// processing it.
		A = *CHA->begin();
		}

if (const InputAction *IA = dyn_cast<InputAction>(A)) {		if (const InputAction *IA = dyn_cast<InputAction>(A)) {
// FIXME: It would be nice to not claim this here; maybe the old scheme of		// FIXME: It would be nice to not claim this here; maybe the old scheme of
// just using Args was better?		// just using Args was better?
const Arg &Input = IA->getInputArg();		const Arg &Input = IA->getInputArg();
Input.claim();		Input.claim();
if (Input.getOption().matches(options::OPT_INPUT)) {		if (Input.getOption().matches(options::OPT_INPUT)) {
const char *Name = Input.getValue();		const char *Name = Input.getValue();
Result = InputInfo(Name, A->getType(), Name);		Result = InputInfo(Name, A->getType(), Name);
} else		} else
Result = InputInfo(&Input, A->getType(), "");		Result = InputInfo(&Input, A->getType(), "");
return;		return;
}		}

if (const BindArchAction *BAA = dyn_cast<BindArchAction>(A)) {		if (const BindArchAction *BAA = dyn_cast<BindArchAction>(A)) {
const ToolChain *TC;		const ToolChain *TC;
const char *ArchName = BAA->getArchName();		const char *ArchName = BAA->getArchName();

if (ArchName)		if (ArchName)
TC = &getToolChain(C.getArgs(), ArchName);		TC = &getToolChain(C.getArgs(), ArchName);
else		else
TC = &C.getDefaultToolChain();		TC = &C.getDefaultToolChain();

BuildJobsForAction(C, *BAA->begin(), TC, BAA->getArchName(),		BuildJobsForAction(C, *BAA->begin(), TC, ArchName, AtTopLevel,
AtTopLevel, MultipleArchs, LinkingOutput, Result);		MultipleArchs, LinkingOutput, Result);
		return;
		}

		if (const CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
		const ToolChain *TC;
		const char *ArchName = CDA->getGpuArchName();
		llvm::Triple HostTriple =
		computeTargetTriple(DefaultTargetTriple, C.getArgs(), "");
		llvm::Triple TargetTriple(HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda"
		echristo: Probably would prefer "DeviceTriple" here.
		tra: Done.
		: "nvptx-nvidia-cuda");
		TC = &getTargetToolChain(C.getArgs(), TargetTriple);
		BuildJobsForAction(C, *CDA->begin(), TC, ArchName, CDA->isAtTopLevel(),
		/*MultipleArchs*/ true, LinkingOutput, Result);
return;		return;
}		}

const ActionList *Inputs = &A->getInputs();		const ActionList *Inputs = &A->getInputs();

const JobAction *JA = cast<JobAction>(A);		const JobAction *JA = cast<JobAction>(A);
const Tool *T = SelectToolForJob(C, isSaveTempsEnabled(), TC, JA, Inputs);		const Tool *T = SelectToolForJob(C, isSaveTempsEnabled(), TC, JA, Inputs);
if (!T)		if (!T)
Show All 18 Lines	void Driver::BuildJobsForAction(Compilation &C,
// Always use the first input as the base input.		// Always use the first input as the base input.
const char *BaseInput = InputInfos[0].getBaseInput();		const char *BaseInput = InputInfos[0].getBaseInput();

// ... except dsymutil actions, which use their actual input as the base		// ... except dsymutil actions, which use their actual input as the base
// input.		// input.
if (JA->getType() == types::TY_dSYM)		if (JA->getType() == types::TY_dSYM)
BaseInput = InputInfos[0].getFilename();		BaseInput = InputInfos[0].getFilename();

		// Append outputs of cuda device jobs to the input list
		if (CudaDeviceInputInfos.size())
		InputInfos.append(CudaDeviceInputInfos.begin(), CudaDeviceInputInfos.end());

// Determine the place to write output to, if any.		// Determine the place to write output to, if any.
if (JA->getType() == types::TY_Nothing)		if (JA->getType() == types::TY_Nothing)
Result = InputInfo(A->getType(), BaseInput);		Result = InputInfo(A->getType(), BaseInput);
else		else
Result = InputInfo(GetNamedOutputPath(C, *JA, BaseInput, BoundArch,		Result = InputInfo(GetNamedOutputPath(C, *JA, BaseInput, BoundArch,
AtTopLevel, MultipleArchs),		AtTopLevel, MultipleArchs),
A->getType(), BaseInput);		A->getType(), BaseInput);

▲ Show 20 Lines • Show All 389 Lines • ▼ Show 20 Lines	if (Arg *A = Args.getLastArg(options::OPT_m64, options::OPT_mx32,

if (AT != llvm::Triple::UnknownArch && AT != Target.getArch())		if (AT != llvm::Triple::UnknownArch && AT != Target.getArch())
Target.setArch(AT);		Target.setArch(AT);
}		}

return Target;		return Target;
}		}

const ToolChain &Driver::getToolChain(const ArgList &Args,		const ToolChain &Driver::getTargetToolChain(const ArgList &Args,
StringRef DarwinArchName) const {		llvm::Triple &Target) const {
llvm::Triple Target = computeTargetTriple(DefaultTargetTriple, Args,
DarwinArchName);

ToolChain *&TC = ToolChains[Target.str()];		ToolChain *&TC = ToolChains[Target.str()];
if (!TC) {		if (!TC) {
switch (Target.getOS()) {		switch (Target.getOS()) {
case llvm::Triple::CloudABI:		case llvm::Triple::CloudABI:
TC = new toolchains::CloudABI(*this, Target, Args);		TC = new toolchains::CloudABI(*this, Target, Args);
break;		break;
case llvm::Triple::Darwin:		case llvm::Triple::Darwin:
▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	case llvm::Triple::Win32:
TC = new toolchains::CrossWindowsToolChain(*this, Target, Args);		TC = new toolchains::CrossWindowsToolChain(*this, Target, Args);
break;		break;
case llvm::Triple::MSVC:		case llvm::Triple::MSVC:
case llvm::Triple::UnknownEnvironment:		case llvm::Triple::UnknownEnvironment:
TC = new toolchains::MSVCToolChain(*this, Target, Args);		TC = new toolchains::MSVCToolChain(*this, Target, Args);
break;		break;
}		}
break;		break;
		case llvm::Triple::CUDA:
		TC = new toolchains::Cuda(*this, Target, Args);
		break;
default:		default:
// TCE is an OSless target		// TCE is an OSless target
if (Target.getArchName() == "tce") {		if (Target.getArchName() == "tce") {
TC = new toolchains::TCEToolChain(*this, Target, Args);		TC = new toolchains::TCEToolChain(*this, Target, Args);
break;		break;
}		}
// If Hexagon is configured as an OSless target		// If Hexagon is configured as an OSless target
if (Target.getArch() == llvm::Triple::hexagon) {		if (Target.getArch() == llvm::Triple::hexagon) {
Show All 14 Lines	default:
}		}
TC = new toolchains::Generic_GCC(*this, Target, Args);		TC = new toolchains::Generic_GCC(*this, Target, Args);
break;		break;
}		}
}		}
return *TC;		return *TC;
}		}

		const ToolChain &Driver::getToolChain(const ArgList &Args,
		StringRef DarwinArchName) const {
		llvm::Triple Target =
		computeTargetTriple(DefaultTargetTriple, Args, DarwinArchName);
		return getTargetToolChain(Args, Target);
		}
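
The refactoring above splits getToolChain into a triple-computing wrapper and getTargetToolChain, which looks toolchains up in a cache keyed by the triple string and creates them on first use. The device path picks nvptx64-nvidia-cuda or nvptx-nvidia-cuda depending on whether the host is 64-bit. A rough sketch of both ideas, using hypothetical stand-ins (`ToolChain`, `MiniDriver`, `deviceTripleFor`, `numToolChains`) and plain strings instead of llvm::Triple:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a driver toolchain.
struct ToolChain {
  std::string Triple;
};

class MiniDriver {
  // Cache of toolchains keyed by triple string, as in Driver::ToolChains.
  std::map<std::string, std::unique_ptr<ToolChain>> ToolChains;

public:
  // Returns the cached toolchain for Triple, creating it on first request.
  const ToolChain &getTargetToolChain(const std::string &Triple) {
    std::unique_ptr<ToolChain> &TC = ToolChains[Triple];
    if (!TC)
      TC.reset(new ToolChain{Triple});
    return *TC;
  }

  // Device triple follows the host's pointer width, as in the patch.
  static std::string deviceTripleFor(bool HostIs64Bit) {
    return HostIs64Bit ? "nvptx64-nvidia-cuda" : "nvptx-nvidia-cuda";
  }

  std::size_t numToolChains() const { return ToolChains.size(); }
};
```

Because the cache is keyed by the full triple, host and device toolchains can coexist in one driver invocation without interfering with each other.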

bool Driver::ShouldUseClangCompiler(const JobAction &JA) const {		bool Driver::ShouldUseClangCompiler(const JobAction &JA) const {
// Check if user requested no clang, or clang doesn't understand this type (we		// Check if user requested no clang, or clang doesn't understand this type (we
// only handle single inputs for now).		// only handle single inputs for now).
if (JA.size() != 1 ||		if (JA.size() != 1 ||
!types::isAcceptedByClang((*JA.begin())->getType()))		!types::isAcceptedByClang((*JA.begin())->getType()))
return false;		return false;

// Otherwise make sure this is an action clang understands.		// Otherwise make sure this is an action clang understands.
▲ Show 20 Lines • Show All 63 Lines • Show Last 20 Lines

lib/Driver/ToolChain.cpp

Show First 20 Lines • Show All 145 Lines • ▼ Show 20 Lines	Tool *ToolChain::getTool(Action::ActionClass AC) const {
case Action::AssembleJobClass:		case Action::AssembleJobClass:
return getAssemble();		return getAssemble();

case Action::LinkJobClass:		case Action::LinkJobClass:
return getLink();		return getLink();

case Action::InputClass:		case Action::InputClass:
case Action::BindArchClass:		case Action::BindArchClass:
		case Action::CudaDeviceClass:
		case Action::CudaHostClass:
case Action::LipoJobClass:		case Action::LipoJobClass:
case Action::DsymutilJobClass:		case Action::DsymutilJobClass:
case Action::VerifyDebugInfoJobClass:		case Action::VerifyDebugInfoJobClass:
llvm_unreachable("Invalid tool kind.");		llvm_unreachable("Invalid tool kind.");

case Action::CompileJobClass:		case Action::CompileJobClass:
case Action::PrecompileJobClass:		case Action::PrecompileJobClass:
case Action::PreprocessJobClass:		case Action::PreprocessJobClass:
▲ Show 20 Lines • Show All 319 Lines • Show Last 20 Lines

lib/Driver/ToolChains.h

Show First 20 Lines • Show All 684 Lines • ▼ Show 20 Lines	static bool addLibStdCXXIncludePaths(Twine Base, Twine Suffix,
StringRef TargetMultiarchTriple,		StringRef TargetMultiarchTriple,
Twine IncludeSuffix,		Twine IncludeSuffix,
const llvm::opt::ArgList &DriverArgs,		const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args);		llvm::opt::ArgStringList &CC1Args);

std::string computeSysRoot() const;		std::string computeSysRoot() const;
};		};

		class LLVM_LIBRARY_VISIBILITY Cuda : public Linux {
		public:
		Cuda(const Driver &D, const llvm::Triple &Triple,
		const llvm::opt::ArgList &Args);

		llvm::opt::DerivedArgList *
		TranslateArgs(const llvm::opt::DerivedArgList &Args,
		const char *BoundArch) const override;
		void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
		llvm::opt::ArgStringList &CC1Args) const override;
		};

class LLVM_LIBRARY_VISIBILITY Hexagon_TC : public Linux {		class LLVM_LIBRARY_VISIBILITY Hexagon_TC : public Linux {
protected:		protected:
GCCVersion GCCLibAndIncVersion;		GCCVersion GCCLibAndIncVersion;
Tool *buildAssembler() const override;		Tool *buildAssembler() const override;
Tool *buildLinker() const override;		Tool *buildLinker() const override;

public:		public:
Hexagon_TC(const Driver &D, const llvm::Triple &Triple,		Hexagon_TC(const Driver &D, const llvm::Triple &Triple,

lib/Driver/ToolChains.cpp

	Tool *DragonFly::buildAssembler() const {			Tool *DragonFly::buildAssembler() const {
	return new tools::dragonfly::Assemble(*this);			return new tools::dragonfly::Assemble(*this);
	}			}

	Tool *DragonFly::buildLinker() const {			Tool *DragonFly::buildLinker() const {
	return new tools::dragonfly::Link(*this);			return new tools::dragonfly::Link(*this);
	}			}

				/// Stub for CUDA toolchain. At the moment we don't have assembler or
				/// linker and need toolchain mainly to propagate device-side options
				/// to CC1.

				Cuda::Cuda(const Driver &D, const llvm::Triple &Triple, const ArgList &Args)
				: Linux(D, Triple, Args) {}

				void Cuda::addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
				llvm::opt::ArgStringList &CC1Args) const {
				Linux::addClangTargetOptions(DriverArgs, CC1Args);
				CC1Args.push_back("-fcuda-is-device");
				}

				llvm::opt::DerivedArgList *
				Cuda::TranslateArgs(const llvm::opt::DerivedArgList &Args,
				const char *BoundArch) const {
				DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
				const OptTable &Opts = getDriver().getOpts();

				for (Arg *A : Args) {
				if (A->getOption().matches(options::OPT_Xarch__)) {
				// Skip this argument unless the architecture matches BoundArch
				if (A->getValue(0) != StringRef(BoundArch))
				continue;

				unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(1));
				unsigned Prev = Index;
				std::unique_ptr<Arg> XarchArg(Opts.ParseOneArg(Args, Index));

				// If the argument parsing failed or more than one argument was
				// consumed, the -Xarch_ argument's parameter tried to consume
				// extra arguments. Emit an error and ignore.
				//
				// We also want to disallow any options which would alter the
				// driver behavior; that isn't going to work in our model. We
				// use isDriverOption() as an approximation, although things
				// like -O4 are going to slip through.
				if (!XarchArg || Index > Prev + 1) {
				getDriver().Diag(diag::err_drv_invalid_Xarch_argument_with_args)
				<< A->getAsString(Args);
				continue;
				} else if (XarchArg->getOption().hasFlag(options::DriverOption)) {
				getDriver().Diag(diag::err_drv_invalid_Xarch_argument_isdriver)
				<< A->getAsString(Args);
				continue;
				}
				XarchArg->setBaseArg(A);
				A = XarchArg.release();
				DAL->AddSynthesizedArg(A);
				}
				DAL->append(A);
				}

				DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), BoundArch);
				return DAL;
				}
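
The argument translation above can be sketched in isolation. This is only an illustration under stated assumptions: `SketchArg` and `translateArgs` are hypothetical names, not part of the patch or of clang, and option parsing and diagnostics are left out. It shows the filtering rule only: ordinary arguments pass through, `-Xarch_<arch>` arguments are unwrapped when `<arch>` matches the bound GPU architecture and dropped otherwise, and `-march=<BoundArch>` is appended at the end.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed driver argument. Arch is empty for an
// ordinary argument and holds <arch> for an "-Xarch_<arch> <flag>" pair.
struct SketchArg {
  std::string Arch;
  std::string Flag;
};

// Mirrors the shape of the loop in Cuda::TranslateArgs (simplified):
// keep ordinary arguments, unwrap matching -Xarch_ arguments, skip the
// rest, then pin the GPU architecture with -march=.
std::vector<std::string> translateArgs(const std::vector<SketchArg> &Args,
                                       const std::string &BoundArch) {
  std::vector<std::string> Out;
  for (const SketchArg &A : Args) {
    if (!A.Arch.empty() && A.Arch != BoundArch)
      continue; // -Xarch_ option intended for a different architecture
    Out.push_back(A.Flag);
  }
  Out.push_back("-march=" + BoundArch);
  return Out;
}
```

The appended `-march=` value is what the nvptx case of `getCPUName` in lib/Driver/Tools.cpp later reads back to produce `-target-cpu`.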

	/// XCore tool chain			/// XCore tool chain
	XCore::XCore(const Driver &D, const llvm::Triple &Triple,			XCore::XCore(const Driver &D, const llvm::Triple &Triple,
	const ArgList &Args) : ToolChain(D, Triple, Args) {			const ArgList &Args) : ToolChain(D, Triple, Args) {
	// ProgramPaths are found via 'PATH' environment variable.			// ProgramPaths are found via 'PATH' environment variable.
	}			}

	Tool *XCore::buildAssembler() const {			Tool *XCore::buildAssembler() const {

lib/Driver/Tools.h


	using llvm::opt::ArgStringList;			using llvm::opt::ArgStringList;

	/// \brief Clang compiler tool.			/// \brief Clang compiler tool.
	class LLVM_LIBRARY_VISIBILITY Clang : public Tool {			class LLVM_LIBRARY_VISIBILITY Clang : public Tool {
	public:			public:
	static const char *getBaseInputName(const llvm::opt::ArgList &Args,			static const char *getBaseInputName(const llvm::opt::ArgList &Args,
	const InputInfoList &Inputs);			const InputInfoList &Inputs);
				static const char *getBaseInputName(const llvm::opt::ArgList &Args,
				const InputInfo &Input);
	static const char *getBaseInputStem(const llvm::opt::ArgList &Args,			static const char *getBaseInputStem(const llvm::opt::ArgList &Args,
	const InputInfoList &Inputs);			const InputInfoList &Inputs);
	static const char *getDependencyFileName(const llvm::opt::ArgList &Args,			static const char *getDependencyFileName(const llvm::opt::ArgList &Args,
	const InputInfoList &Inputs);			const InputInfoList &Inputs);

	private:			private:
	void AddPreprocessingOptions(Compilation &C, const JobAction &JA,			void AddPreprocessingOptions(Compilation &C, const JobAction &JA,
	const Driver &D,			const Driver &D,

lib/Driver/Tools.cpp

Show First 20 Lines • Show All 1,511 Lines • ▼ Show 20 Lines	static std::string getCPUName(const ArgList &Args, const llvm::Triple &T) {
case llvm::Triple::mips64:		case llvm::Triple::mips64:
case llvm::Triple::mips64el: {		case llvm::Triple::mips64el: {
StringRef CPUName;		StringRef CPUName;
StringRef ABIName;		StringRef ABIName;
mips::getMipsCPUAndABI(Args, T, CPUName, ABIName);		mips::getMipsCPUAndABI(Args, T, CPUName, ABIName);
return CPUName;		return CPUName;
}		}

		case llvm::Triple::nvptx:
		case llvm::Triple::nvptx64:
		if (const Arg *A = Args.getLastArg(options::OPT_march_EQ))
		return A->getValue();
		return "";

case llvm::Triple::ppc:		case llvm::Triple::ppc:
case llvm::Triple::ppc64:		case llvm::Triple::ppc64:
case llvm::Triple::ppc64le: {		case llvm::Triple::ppc64le: {
std::string TargetCPUName = getPPCTargetCPU(Args);		std::string TargetCPUName = getPPCTargetCPU(Args);
// LLVM may default to generating code for the native CPU,		// LLVM may default to generating code for the native CPU,
// but, like gcc, we default to a more generic option for		// but, like gcc, we default to a more generic option for
// each architecture. (except on Darwin)		// each architecture. (except on Darwin)
if (TargetCPUName.empty() && !T.isOSDarwin()) {		if (TargetCPUName.empty() && !T.isOSDarwin()) {
▲ Show 20 Lines • Show All 1,040 Lines • ▼ Show 20 Lines	void Clang::ConstructJob(Compilation &C, const JobAction &JA,
const Driver &D = getToolChain().getDriver();		const Driver &D = getToolChain().getDriver();
ArgStringList CmdArgs;		ArgStringList CmdArgs;

bool IsWindowsGNU = getToolChain().getTriple().isWindowsGNUEnvironment();		bool IsWindowsGNU = getToolChain().getTriple().isWindowsGNUEnvironment();
bool IsWindowsCygnus =		bool IsWindowsCygnus =
getToolChain().getTriple().isWindowsCygwinEnvironment();		getToolChain().getTriple().isWindowsCygwinEnvironment();
bool IsWindowsMSVC = getToolChain().getTriple().isWindowsMSVCEnvironment();		bool IsWindowsMSVC = getToolChain().getTriple().isWindowsMSVCEnvironment();

assert(Inputs.size() == 1 && "Unable to handle multiple inputs.");		assert(Inputs.size() >= 1 && "Must have at least one input.");
		InputInfoList BaseInputs; // Inputs[0]
		const InputInfo &Input = Inputs[0];
		BaseInputs.push_back(Input);
		bool IsCuda = types::isCuda(Input.getType());

		assert((IsCuda || Inputs.size() == 1) && "Unable to handle multiple inputs.");
		echristo: Comment about what's going on here.
		tra: Done.

// Invoke ourselves in -cc1 mode.		// Invoke ourselves in -cc1 mode.
		eliben: Can you explain a bit more why/what this means in the comment?
		tra: The general assumption is that compilation deals with a single source file. When we're compiling CUDA, the driver may generate additional build passes, and we may end up with an action that has more than one input. The check makes sure that all those inputs are results of compiling the same source file. Hmm. That's another case where I need info about the source file type. Let me see if I can add a function to dig that out of the action chain; then this loop will not be necessary, as I can explicitly check whether we're compiling a CUDA file.
//		//
// FIXME: Implement custom jobs for internal actions.		// FIXME: Implement custom jobs for internal actions.
CmdArgs.push_back("-cc1");		CmdArgs.push_back("-cc1");

// Add the "effective" target triple.		// Add the "effective" target triple.
CmdArgs.push_back("-triple");		CmdArgs.push_back("-triple");
std::string TripleStr = getToolChain().ComputeEffectiveClangTriple(Args);		std::string TripleStr = getToolChain().ComputeEffectiveClangTriple(Args);
CmdArgs.push_back(Args.MakeArgString(TripleStr));		CmdArgs.push_back(Args.MakeArgString(TripleStr));
▲ Show 20 Lines • Show All 89 Lines • ▼ Show 20 Lines	void Clang::ConstructJob(Compilation &C, const JobAction &JA,
// Disable the verification pass in -asserts builds.		// Disable the verification pass in -asserts builds.
#ifdef NDEBUG		#ifdef NDEBUG
CmdArgs.push_back("-disable-llvm-verifier");		CmdArgs.push_back("-disable-llvm-verifier");
#endif		#endif

// Set the main file name, so that debug info works even with		// Set the main file name, so that debug info works even with
// -save-temps.		// -save-temps.
CmdArgs.push_back("-main-file-name");		CmdArgs.push_back("-main-file-name");
CmdArgs.push_back(getBaseInputName(Args, Inputs));		CmdArgs.push_back(getBaseInputName(Args, Input));
		echristo: Might be nice to pull this sort of change out so it isn't affecting the rest of the diff.
		tra: Sure. I was also thinking of splitting code generation into a separate commit, as it's largely independent of the driver changes.

// Some flags which affect the language (via preprocessor		// Some flags which affect the language (via preprocessor
// defines).		// defines).
if (Args.hasArg(options::OPT_static))		if (Args.hasArg(options::OPT_static))
CmdArgs.push_back("-static-define");		CmdArgs.push_back("-static-define");

if (isa<AnalyzeJobAction>(JA)) {		if (isa<AnalyzeJobAction>(JA)) {
// Enable region store model by default.		// Enable region store model by default.
Show All 11 Lines	if (!Args.hasArg(options::OPT__analyzer_no_default_checks)) {
if (!IsWindowsMSVC)		if (!IsWindowsMSVC)
CmdArgs.push_back("-analyzer-checker=unix");		CmdArgs.push_back("-analyzer-checker=unix");

if (getToolChain().getTriple().getVendor() == llvm::Triple::Apple)		if (getToolChain().getTriple().getVendor() == llvm::Triple::Apple)
CmdArgs.push_back("-analyzer-checker=osx");		CmdArgs.push_back("-analyzer-checker=osx");

CmdArgs.push_back("-analyzer-checker=deadcode");		CmdArgs.push_back("-analyzer-checker=deadcode");

if (types::isCXX(Inputs[0].getType()))		if (types::isCXX(Input.getType()))
CmdArgs.push_back("-analyzer-checker=cplusplus");		CmdArgs.push_back("-analyzer-checker=cplusplus");

// Enable the following experimental checkers for testing.		// Enable the following experimental checkers for testing.
CmdArgs.push_back(		CmdArgs.push_back(
"-analyzer-checker=security.insecureAPI.UncheckedReturn");		"-analyzer-checker=security.insecureAPI.UncheckedReturn");
CmdArgs.push_back("-analyzer-checker=security.insecureAPI.getpw");		CmdArgs.push_back("-analyzer-checker=security.insecureAPI.getpw");
CmdArgs.push_back("-analyzer-checker=security.insecureAPI.gets");		CmdArgs.push_back("-analyzer-checker=security.insecureAPI.gets");
CmdArgs.push_back("-analyzer-checker=security.insecureAPI.mktemp");		CmdArgs.push_back("-analyzer-checker=security.insecureAPI.mktemp");
▲ Show 20 Lines • Show All 511 Lines • ▼ Show 20 Lines	if (Arg *A = Args.getLastArg(options::OPT_mlinker_version_EQ)) {
CmdArgs.push_back(A->getValue());		CmdArgs.push_back(A->getValue());
}		}

if (!shouldUseLeafFramePointer(Args, getToolChain().getTriple()))		if (!shouldUseLeafFramePointer(Args, getToolChain().getTriple()))
CmdArgs.push_back("-momit-leaf-frame-pointer");		CmdArgs.push_back("-momit-leaf-frame-pointer");

// Explicitly error on some things we know we don't support and can't just		// Explicitly error on some things we know we don't support and can't just
// ignore.		// ignore.
types::ID InputType = Inputs[0].getType();		types::ID InputType = Input.getType();
if (!Args.hasArg(options::OPT_fallow_unsupported)) {		if (!Args.hasArg(options::OPT_fallow_unsupported)) {
Arg *Unsupported;		Arg *Unsupported;
if (types::isCXX(InputType) &&		if (types::isCXX(InputType) &&
getToolChain().getTriple().isOSDarwin() &&		getToolChain().getTriple().isOSDarwin() &&
getToolChain().getArch() == llvm::Triple::x86) {		getToolChain().getArch() == llvm::Triple::x86) {
if ((Unsupported = Args.getLastArg(options::OPT_fapple_kext)) ||		if ((Unsupported = Args.getLastArg(options::OPT_fapple_kext)) ||
(Unsupported = Args.getLastArg(options::OPT_mkernel)))		(Unsupported = Args.getLastArg(options::OPT_mkernel)))
D.Diag(diag::err_drv_clang_unsupported_opt_cxx_darwin_i386)		D.Diag(diag::err_drv_clang_unsupported_opt_cxx_darwin_i386)
▲ Show 20 Lines • Show All 1,355 Lines • ▼ Show 20 Lines	if (Output.getType() == types::TY_Dependencies) {
// Handled with other dependency code.		// Handled with other dependency code.
} else if (Output.isFilename()) {		} else if (Output.isFilename()) {
CmdArgs.push_back("-o");		CmdArgs.push_back("-o");
CmdArgs.push_back(Output.getFilename());		CmdArgs.push_back(Output.getFilename());
} else {		} else {
assert(Output.isNothing() && "Invalid output.");		assert(Output.isNothing() && "Invalid output.");
}		}

for (const auto &II : Inputs) {		for (const auto &II : BaseInputs) {
addDashXForInput(Args, II, CmdArgs);		addDashXForInput(Args, II, CmdArgs);

if (II.isFilename())		if (II.isFilename())
CmdArgs.push_back(II.getFilename());		CmdArgs.push_back(II.getFilename());
else		else
II.getInputArg().renderAsInput(Args, CmdArgs);		II.getInputArg().renderAsInput(Args, CmdArgs);
}		}

Show All 24 Lines	#endif
// can propagate it to the backend.		// can propagate it to the backend.
bool SplitDwarf = Args.hasArg(options::OPT_gsplit_dwarf) &&		bool SplitDwarf = Args.hasArg(options::OPT_gsplit_dwarf) &&
getToolChain().getTriple().isOSLinux() &&		getToolChain().getTriple().isOSLinux() &&
(isa<AssembleJobAction>(JA) || isa<CompileJobAction>(JA) ||		(isa<AssembleJobAction>(JA) || isa<CompileJobAction>(JA) ||
isa<BackendJobAction>(JA));		isa<BackendJobAction>(JA));
const char *SplitDwarfOut;		const char *SplitDwarfOut;
if (SplitDwarf) {		if (SplitDwarf) {
CmdArgs.push_back("-split-dwarf-file");		CmdArgs.push_back("-split-dwarf-file");
SplitDwarfOut = SplitDebugName(Args, Inputs);		SplitDwarfOut = SplitDebugName(Args, BaseInputs);
CmdArgs.push_back(SplitDwarfOut);		CmdArgs.push_back(SplitDwarfOut);
}		}

		// Host-side cuda compilation receives device-side outputs as Inputs[1...].
		// Include them with -fcuda-include-gpubinary.
		if (IsCuda && Inputs.size() > 1)
		for (InputInfoList::const_iterator it = std::next(Inputs.begin()),
		ie = Inputs.end();
		it != ie; ++it) {
		CmdArgs.push_back("-fcuda-include-gpubinary");
		CmdArgs.push_back(it->getFilename());
		}
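
The input handling in this hunk can be sketched as a standalone function. This is a simplification, not the patch's code: `buildHostCC1Args` is a hypothetical name, inputs are plain strings rather than `InputInfo` objects, and every other cc1 option is omitted. It assumes the layout stated in the comment above, that `Inputs[0]` is the CUDA source and `Inputs[1...]` are device-side outputs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplification of the host-side job construction:
// only the first input is compiled as source, and every later input is
// forwarded with -fcuda-include-gpubinary.
std::vector<std::string>
buildHostCC1Args(const std::vector<std::string> &Inputs) {
  assert(!Inputs.empty() && "Must have at least one input.");
  std::vector<std::string> CmdArgs = {"-cc1", "-x", "cuda", Inputs[0]};
  for (std::size_t I = 1; I < Inputs.size(); ++I) {
    CmdArgs.push_back("-fcuda-include-gpubinary");
    CmdArgs.push_back(Inputs[I]);
  }
  return CmdArgs;
}
```

With two `--cuda-gpu-arch` values this yields two `-fcuda-include-gpubinary` pairs, which is the shape the CUDA-H-I1 and CUDA-H-I2 prefixes in test/Driver/cuda-options.cu look for.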

// Finally add the compile command to the compilation.		// Finally add the compile command to the compilation.
if (Args.hasArg(options::OPT__SLASH_fallback) &&		if (Args.hasArg(options::OPT__SLASH_fallback) &&
Output.getType() == types::TY_Object &&		Output.getType() == types::TY_Object &&
(InputType == types::TY_C || InputType == types::TY_CXX)) {		(InputType == types::TY_C || InputType == types::TY_CXX)) {
auto CLCommand =		auto CLCommand = getCLFallback()->GetCommand(C, JA, Output, BaseInputs,
getCLFallback()->GetCommand(C, JA, Output, Inputs, Args, LinkingOutput);		Args, LinkingOutput);
C.addCommand(llvm::make_unique<FallbackCommand>(JA, *this, Exec, CmdArgs,		C.addCommand(llvm::make_unique<FallbackCommand>(JA, *this, Exec, CmdArgs,
std::move(CLCommand)));		std::move(CLCommand)));
} else {		} else {
C.addCommand(llvm::make_unique<Command>(JA, *this, Exec, CmdArgs));		C.addCommand(llvm::make_unique<Command>(JA, *this, Exec, CmdArgs));
}		}


// Handle the debug info splitting at object creation time if we're		// Handle the debug info splitting at object creation time if we're
▲ Show 20 Lines • Show All 1,040 Lines • ▼ Show 20 Lines	void darwin::setTripleTypeForMachOArchName(llvm::Triple &T, StringRef Str) {
if (Str == "x86_64h")		if (Str == "x86_64h")
T.setArchName(Str);		T.setArchName(Str);
else if (Str == "armv6m" || Str == "armv7m" || Str == "armv7em") {		else if (Str == "armv6m" || Str == "armv7m" || Str == "armv7em") {
T.setOS(llvm::Triple::UnknownOS);		T.setOS(llvm::Triple::UnknownOS);
T.setObjectFormat(llvm::Triple::MachO);		T.setObjectFormat(llvm::Triple::MachO);
}		}
}		}

const char *Clang::getBaseInputName(const ArgList &Args,		const char *Clang::getBaseInputName(const ArgList &Args,
		const InputInfo &Input) {
		return Args.MakeArgString(llvm::sys::path::filename(Input.getBaseInput()));
		}

		const char *Clang::getBaseInputName(const ArgList &Args,
const InputInfoList &Inputs) {		const InputInfoList &Inputs) {
		echristo: Please pull this out into a separate patch.
return Args.MakeArgString(		return getBaseInputName(Args, Inputs[0]);
llvm::sys::path::filename(Inputs[0].getBaseInput()));
}		}

const char *Clang::getBaseInputStem(const ArgList &Args,		const char *Clang::getBaseInputStem(const ArgList &Args,
const InputInfoList &Inputs) {		const InputInfoList &Inputs) {
const char *Str = getBaseInputName(Args, Inputs);		const char *Str = getBaseInputName(Args, Inputs);

if (const char *End = strrchr(Str, '.'))		if (const char *End = strrchr(Str, '.'))
return Args.MakeArgString(std::string(Str, End));		return Args.MakeArgString(std::string(Str, End));

lib/Driver/Types.cpp

Show First 20 Lines • Show All 80 Lines • ▼ Show 20 Lines	bool types::isAcceptedByClang(ID Id) {
switch (Id) {		switch (Id) {
default:		default:
return false;		return false;

case TY_Asm:		case TY_Asm:
case TY_C: case TY_PP_C:		case TY_C: case TY_PP_C:
case TY_CL:		case TY_CL:
case TY_CUDA: case TY_PP_CUDA:		case TY_CUDA: case TY_PP_CUDA:
		case TY_CUDA_DEVICE:
case TY_ObjC: case TY_PP_ObjC: case TY_PP_ObjC_Alias:		case TY_ObjC: case TY_PP_ObjC: case TY_PP_ObjC_Alias:
case TY_CXX: case TY_PP_CXX:		case TY_CXX: case TY_PP_CXX:
case TY_ObjCXX: case TY_PP_ObjCXX: case TY_PP_ObjCXX_Alias:		case TY_ObjCXX: case TY_PP_ObjCXX: case TY_PP_ObjCXX_Alias:
case TY_CHeader: case TY_PP_CHeader:		case TY_CHeader: case TY_PP_CHeader:
case TY_CLHeader:		case TY_CLHeader:
case TY_ObjCHeader: case TY_PP_ObjCHeader:		case TY_ObjCHeader: case TY_PP_ObjCHeader:
case TY_CXXHeader: case TY_PP_CXXHeader:		case TY_CXXHeader: case TY_PP_CXXHeader:
case TY_ObjCXXHeader: case TY_PP_ObjCXXHeader:		case TY_ObjCXXHeader: case TY_PP_ObjCXXHeader:
Show All 20 Lines	bool types::isCXX(ID Id) {
switch (Id) {		switch (Id) {
default:		default:
return false;		return false;

case TY_CXX: case TY_PP_CXX:		case TY_CXX: case TY_PP_CXX:
case TY_ObjCXX: case TY_PP_ObjCXX: case TY_PP_ObjCXX_Alias:		case TY_ObjCXX: case TY_PP_ObjCXX: case TY_PP_ObjCXX_Alias:
case TY_CXXHeader: case TY_PP_CXXHeader:		case TY_CXXHeader: case TY_PP_CXXHeader:
case TY_ObjCXXHeader: case TY_PP_ObjCXXHeader:		case TY_ObjCXXHeader: case TY_PP_ObjCXXHeader:
case TY_CUDA: case TY_PP_CUDA:		case TY_CUDA: case TY_PP_CUDA: case TY_CUDA_DEVICE:
		return true;
		}
		}

		bool types::isCuda(ID Id) {
		switch (Id) {
		default:
		return false;

		case TY_CUDA:
		case TY_PP_CUDA:
		case TY_CUDA_DEVICE:
return true;		return true;
}		}
}		}

types::ID types::lookupTypeForExtension(const char *Ext) {		types::ID types::lookupTypeForExtension(const char *Ext) {
return llvm::StringSwitch<types::ID>(Ext)		return llvm::StringSwitch<types::ID>(Ext)
.Case("c", TY_C)		.Case("c", TY_C)
.Case("i", TY_PP_C)		.Case("i", TY_PP_C)
▲ Show 20 Lines • Show All 67 Lines • ▼ Show 20 Lines	if (Id != TY_Object) {

if (onlyPrecompileType(Id)) {		if (onlyPrecompileType(Id)) {
P.push_back(phases::Precompile);		P.push_back(phases::Precompile);
} else {		} else {
if (!onlyAssembleType(Id)) {		if (!onlyAssembleType(Id)) {
P.push_back(phases::Compile);		P.push_back(phases::Compile);
P.push_back(phases::Backend);		P.push_back(phases::Backend);
}		}
		if (Id != TY_CUDA_DEVICE)
P.push_back(phases::Assemble);		P.push_back(phases::Assemble);
}		}
}		}
if (!onlyPrecompileType(Id)) {
		if (!onlyPrecompileType(Id) && Id != TY_CUDA_DEVICE) {
P.push_back(phases::Link);		P.push_back(phases::Link);
}		}
assert(0 < P.size() && "Not enough phases in list");		assert(0 < P.size() && "Not enough phases in list");
assert(P.size() <= phases::MaxNumberOfPhases && "Too many phases in list");		assert(P.size() <= phases::MaxNumberOfPhases && "Too many phases in list");
return;		return;
}		}
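
The effect of the `TY_CUDA_DEVICE` checks on the phase list can be sketched as follows. The enums and `compilationPhases` are hypothetical, trimmed-down stand-ins for `types::ID`, `phases::ID`, and the function above. The sketch assumes a Preprocess phase is pushed first in the elided part of the function, and it ignores precompile-only and assemble-only types.

```cpp
#include <vector>

// Hypothetical, reduced mirrors of clang's input types and phases.
enum class InputType { CXX, CUDA, CUDADevice };
enum class Phase { Preprocess, Compile, Backend, Assemble, Link };

// Device-side CUDA inputs stop after Backend: there is no device
// assembler or linker in this toolchain, so both phases are skipped.
std::vector<Phase> compilationPhases(InputType Id) {
  std::vector<Phase> P = {Phase::Preprocess, Phase::Compile, Phase::Backend};
  if (Id != InputType::CUDADevice) {
    P.push_back(Phase::Assemble);
    P.push_back(Phase::Link);
  }
  return P;
}
```

Stopping at Backend is consistent with the tests below, where device-side compilation is expected to produce PTX assembly that the host side then incorporates.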

ID types::lookupCXXTypeForCType(ID Id) {		ID types::lookupCXXTypeForCType(ID Id) {

lib/Frontend/CompilerInvocation.cpp

Show First 20 Lines • Show All 633 Lines • ▼ Show 20 Lines	static bool ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args, InputKind IK,
Opts.RewriteMapFiles = Args.getAllArgValues(OPT_frewrite_map_file);		Opts.RewriteMapFiles = Args.getAllArgValues(OPT_frewrite_map_file);

// Parse -fsanitize-recover= arguments.		// Parse -fsanitize-recover= arguments.
// FIXME: Report unrecoverable sanitizers incorrectly specified here.		// FIXME: Report unrecoverable sanitizers incorrectly specified here.
parseSanitizerKinds("-fsanitize-recover=",		parseSanitizerKinds("-fsanitize-recover=",
Args.getAllArgValues(OPT_fsanitize_recover_EQ), Diags,		Args.getAllArgValues(OPT_fsanitize_recover_EQ), Diags,
Opts.SanitizeRecover);		Opts.SanitizeRecover);

		Opts.CudaGpuBinaryFileNames =
		Args.getAllArgValues(OPT_fcuda_include_gpubinary);

return Success;		return Success;
}		}

static void ParseDependencyOutputArgs(DependencyOutputOptions &Opts,		static void ParseDependencyOutputArgs(DependencyOutputOptions &Opts,
ArgList &Args) {		ArgList &Args) {
using namespace options;		using namespace options;
Opts.OutputFile = Args.getLastArgValue(OPT_dependency_file);		Opts.OutputFile = Args.getLastArgValue(OPT_dependency_file);
Opts.Targets = Args.getAllArgValues(OPT_MT);		Opts.Targets = Args.getAllArgValues(OPT_MT);

test/Driver/cuda-options.cu

This file was added.

				// Tests CUDA compilation pipeline construction in Driver.

				// Simple compilation case:
				// RUN: %clang -### -nocudainc -c %s 2>&1 \
				// Compile device-side to PTX assembly and make sure we use it on the host side.
				// RUN: | FileCheck -check-prefix CUDA-D1 \
				// Then compile host side and incorporate device code.
				// RUN: -check-prefix CUDA-H -check-prefix CUDA-H-I1 \
				// Make sure we don't link anything.
				// RUN: -check-prefix CUDA-NL %s

				// Typical compilation + link case:
				// RUN: %clang -### -nocudainc %s 2>&1 \
				// Compile device-side to PTX assembly and make sure we use it on the host side
				// RUN: | FileCheck -check-prefix CUDA-D1 \
				// Then compile host side and incorporate device code.
				// RUN: -check-prefix CUDA-H -check-prefix CUDA-H-I1 \
				// Then link things.
				// RUN: -check-prefix CUDA-L %s

				// Verify that --cuda-host-only disables device-side compilation and linking
				// RUN: %clang -### -nocudainc --cuda-host-only %s 2>&1 \
				// Make sure we didn't run device-side compilation.
				// RUN: | FileCheck -check-prefix CUDA-ND \
				// Then compile host side and make sure we don't attempt to incorporate GPU code.
				// RUN: -check-prefix CUDA-H -check-prefix CUDA-H-NI \
				// Make sure we don't link anything.
				// RUN: -check-prefix CUDA-NL %s

				// Verify that --cuda-device-only disables host-side compilation and linking
				// RUN: %clang -### -nocudainc --cuda-device-only %s 2>&1 \
				// Compile device-side to PTX assembly
				// RUN: | FileCheck -check-prefix CUDA-D1 \
				// Make sure there is no host compilation or linking.
				// RUN: -check-prefix CUDA-NH -check-prefix CUDA-NL %s

				// Verify that with -S we compile host and device sides to assembly
				// and incorporate device code on the host side.
				// RUN: %clang -### -nocudainc -S -c %s 2>&1 \
				// Compile device-side to PTX assembly
				// RUN: | FileCheck -check-prefix CUDA-D1 \
				// Then compile host side and incorporate GPU code.
				// RUN: -check-prefix CUDA-H -check-prefix CUDA-H-I1 \
				// Make sure we don't link anything.
				// RUN: -check-prefix CUDA-NL %s

				// Verify that --cuda-gpu-arch option passes correct GPU
				// architecture info to device compilation.
				// RUN: %clang -### -nocudainc --cuda-gpu-arch=sm_35 -c %s 2>&1 \
				// Compile device-side to PTX assembly.
				// RUN: | FileCheck -check-prefix CUDA-D1 -check-prefix CUDA-D1-SM35 \
				// Then compile host side and incorporate GPU code.
				// RUN: -check-prefix CUDA-H -check-prefix CUDA-H-I1 \
				// Make sure we don't link anything.
				// RUN: -check-prefix CUDA-NL %s

				// Verify that there is device-side compilation per --cuda-gpu-arch args
				// and that all results are included on the host side.
				// RUN: %clang -### -nocudainc --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_30 -c %s 2>&1 \
				// Compile both device-sides to PTX assembly
				// RUN: | FileCheck \
				// RUN: -check-prefix CUDA-D1 -check-prefix CUDA-D1-SM35 \
				// RUN: -check-prefix CUDA-D2 -check-prefix CUDA-D2-SM30 \
				// Then compile host side and incorporate both device-side outputs
				// RUN: -check-prefix CUDA-H -check-prefix CUDA-H-I1 -check-prefix CUDA-H-I2 \
				// Make sure we don't link anything.
				// RUN: -check-prefix CUDA-NL %s

				// Match device-side compilation
				// CUDA-D1: "-cc1" "-triple" "nvptx{{64?}}-nvidia-cuda"
				// CUDA-D1-SAME: "-fcuda-is-device"
				// CUDA-D1-SM35-SAME: "-target-cpu" "sm_35"
				// CUDA-D1-SAME: "-o" "[[GPUBINARY1:[^"]*]]"
				// CUDA-D1-SAME: "-x" "cuda"

				// Match another device-side compilation
				// CUDA-D2: "-cc1" "-triple" "nvptx{{64?}}-nvidia-cuda"
				// CUDA-D2-SAME: "-fcuda-is-device"
				// CUDA-D2-SM30-SAME: "-target-cpu" "sm_30"
				// CUDA-D2-SAME: "-o" "[[GPUBINARY2:[^"]*]]"
				// CUDA-D2-SAME: "-x" "cuda"

				// Match no device-side compilation
				// CUDA-ND-NOT: "-cc1" "-triple" "nvptx{{64?}}-nvidia-cuda"
				// CUDA-ND-SAME-NOT: "-fcuda-is-device"

				// Match host-side compilation
				// CUDA-H: "-cc1" "-triple"
				// CUDA-H-SAME-NOT: "nvptx{{64?}}-nvidia-cuda"
				// CUDA-H-SAME-NOT: "-fcuda-is-device"
				// CUDA-H-SAME: "-o" "[[HOSTOBJ:[^"]*]]"
				// CUDA-H-SAME: "-x" "cuda"
				// CUDA-H-I1-SAME: "-fcuda-include-gpubinary" "[[GPUBINARY1]]"
				// CUDA-H-I2-SAME: "-fcuda-include-gpubinary" "[[GPUBINARY2]]"

				// Match no GPU code inclusion.
				// CUDA-H-NI-NOT: "-fcuda-include-gpubinary"

				// Match no CUDA compilation
				// CUDA-NH-NOT: "-cc1" "-triple"
				// CUDA-NH-SAME-NOT: "-x" "cuda"

				// Match linker
				// CUDA-L: "{{.*}}ld{{(.exe)?}}"
				// CUDA-L-SAME: "[[HOSTOBJ]]"

				// Match no linker
				// CUDA-NL-NOT: "{{.*}}ld{{(.exe)?}}"

test/Index/attributes-cuda.cu

	// RUN: c-index-test -test-load-source all -x cuda %s | FileCheck %s			// RUN: c-index-test -test-load-source all -x cuda -nocudainc --cuda-host-only %s | FileCheck %s
				// RUN: c-index-test -test-load-source all -x cuda -nocudainc --cuda-device-only %s | FileCheck %s
	__attribute__((device)) void f_device();			__attribute__((device)) void f_device();
	__attribute__((global)) void f_global();			__attribute__((global)) void f_global();
	__attribute__((constant)) int* g_constant;			__attribute__((constant)) int* g_constant;
	__attribute__((shared)) float *g_shared;			__attribute__((shared)) float *g_shared;
	__attribute__((host)) void f_host();			__attribute__((host)) void f_host();

	// CHECK: attributes-cuda.cu:3:30: FunctionDecl=f_device:3:30			// CHECK: attributes-cuda.cu:3:30: FunctionDecl=f_device:3:30
	// CHECK-NEXT: attributes-cuda.cu:3:16: attribute(device)			// CHECK-NEXT: attributes-cuda.cu:3:16: attribute(device)
	// CHECK: attributes-cuda.cu:4:30: FunctionDecl=f_global:4:30			// CHECK: attributes-cuda.cu:4:30: FunctionDecl=f_global:4:30
	// CHECK-NEXT: attributes-cuda.cu:4:16: attribute(global)			// CHECK-NEXT: attributes-cuda.cu:4:16: attribute(global)
	// CHECK: attributes-cuda.cu:5:32: VarDecl=g_constant:5:32 (Definition)			// CHECK: attributes-cuda.cu:5:32: VarDecl=g_constant:5:32 (Definition)
	// CHECK-NEXT: attributes-cuda.cu:5:16: attribute(constant)			// CHECK-NEXT: attributes-cuda.cu:5:16: attribute(constant)
	// CHECK: attributes-cuda.cu:6:32: VarDecl=g_shared:6:32 (Definition)			// CHECK: attributes-cuda.cu:6:32: VarDecl=g_shared:6:32 (Definition)
	// CHECK-NEXT: attributes-cuda.cu:6:16: attribute(shared)			// CHECK-NEXT: attributes-cuda.cu:6:16: attribute(shared)
	// CHECK: attributes-cuda.cu:7:28: FunctionDecl=f_host:7:28			// CHECK: attributes-cuda.cu:7:28: FunctionDecl=f_host:7:28
	// CHECK-NEXT: attributes-cuda.cu:7:16: attribute(host)			// CHECK-NEXT: attributes-cuda.cu:7:16: attribute(host)

tools/libclang/CIndex.cpp

Show First 20 Lines • Show All 2,992 Lines • ▼ Show 20 Lines	std::unique_ptr<ASTUnit> Unit(ASTUnit::LoadFromCommandLine(
Args->data(), Args->data() + Args->size(), Diags,		Args->data(), Args->data() + Args->size(), Diags,
CXXIdx->getClangResourcesPath(), CXXIdx->getOnlyLocalDecls(),		CXXIdx->getClangResourcesPath(), CXXIdx->getOnlyLocalDecls(),
/*CaptureDiagnostics=*/true, *RemappedFiles.get(),		/*CaptureDiagnostics=*/true, *RemappedFiles.get(),
/*RemappedFilesKeepOriginalName=*/true, PrecompilePreamble, TUKind,		/*RemappedFilesKeepOriginalName=*/true, PrecompilePreamble, TUKind,
CacheCodeCompletionResults, IncludeBriefCommentsInCodeCompletion,		CacheCodeCompletionResults, IncludeBriefCommentsInCodeCompletion,
/*AllowPCHWithCompilerErrors=*/true, SkipFunctionBodies,		/*AllowPCHWithCompilerErrors=*/true, SkipFunctionBodies,
/*UserFilesAreVolatile=*/true, ForSerialization, &ErrUnit));		/*UserFilesAreVolatile=*/true, ForSerialization, &ErrUnit));

		if (!Unit && !ErrUnit) {
		PTUI->result = CXError_ASTReadError;
		return;
		}

if (NumErrors != Diags->getClient()->getNumErrors()) {		if (NumErrors != Diags->getClient()->getNumErrors()) {
// Make sure to check that 'Unit' is non-NULL.		// Make sure to check that 'Unit' is non-NULL.
if (CXXIdx->getDisplayDiagnostics())		if (CXXIdx->getDisplayDiagnostics())
printDiagsToStderr(Unit ? Unit.get() : ErrUnit.get());		printDiagsToStderr(Unit ? Unit.get() : ErrUnit.get());
}		}

if (isASTReadError(Unit ? Unit.get() : ErrUnit.get())) {		if (isASTReadError(Unit ? Unit.get() : ErrUnit.get())) {
PTUI->result = CXError_ASTReadError;		PTUI->result = CXError_ASTReadError;

unittests/ASTMatchers/ASTMatchersTest.h

 testing::AssertionResult matchesConditionallyWithCuda(
   if (!Finder.addDynamicMatcher(AMatcher, &VerifyDynamicFound))
     return testing::AssertionFailure() << "Could not add dynamic matcher";
   std::unique_ptr<FrontendActionFactory> Factory(
       newFrontendActionFactory(&Finder));
   // Some tests use typeof, which is a gnu extension.
   std::vector<std::string> Args;
   Args.push_back("-xcuda");
   Args.push_back("-fno-ms-extensions");
+  Args.push_back("--cuda-host-only");
   Args.push_back(CompileArg);
   if (!runToolOnCodeWithArgs(Factory->create(),
                              CudaHeader + Code, Args)) {
     return testing::AssertionFailure() << "Parsing error in \"" << Code << "\"";
   }
   if (Found != DynamicFound) {
     return testing::AssertionFailure() << "Dynamic match result ("
                                        << DynamicFound

Revision Contents

Diff 23366

include/clang/Driver/Action.h

include/clang/Driver/CC1Options.td

include/clang/Driver/Driver.h

include/clang/Driver/Options.td

include/clang/Driver/Types.h

include/clang/Driver/Types.def

include/clang/Frontend/CodeGenOptions.h

lib/CodeGen/CGCUDANV.cpp

lib/CodeGen/CGCUDARuntime.h

lib/CodeGen/CodeGenFunction.cpp

lib/CodeGen/CodeGenModule.cpp

lib/Driver/Action.cpp

lib/Driver/Driver.cpp

lib/Driver/ToolChain.cpp

lib/Driver/ToolChains.h

lib/Driver/ToolChains.cpp

lib/Driver/Tools.h

lib/Driver/Tools.cpp

lib/Driver/Types.cpp

lib/Frontend/CompilerInvocation.cpp

test/Driver/cuda-options.cu

test/Index/attributes-cuda.cu

tools/libclang/CIndex.cpp

unittests/ASTMatchers/ASTMatchersTest.h
