This is an archive of the discontinued LLVM Phabricator instance.


Differential D18328

[CUDA] Add option to mark most functions inside <complex> as host+device.
Abandoned, Public

Authored by jlebar on Mar 21 2016, 1:40 PM.


Details

Reviewers

tra
rnk

Summary

clang --cuda-allow-std-complex translates into cc1
-fcuda-allow-std-complex. With this flag, we will mark all functions
inside <complex> within namespace std as host+device, other than
operator>> and operator<<, which use ostreams, which are not supported
in CUDA device code.
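
As a rough illustration of what the flag is meant to allow, the sketch below calls std::complex members from a function that is host+device when built as CUDA. The macro HD and the function squared_magnitude are hypothetical names, not part of the patch; under a plain C++ compiler the macro expands to nothing, so the snippet also builds on the host alone.

```cpp
#include <complex>

// Hypothetical helper macro (an assumption for this sketch): expands to the
// CUDA attributes only when compiling as CUDA, so the file also builds as
// plain C++.
#if defined(__CUDACC__) || defined(__CUDA__)
#define HD __host__ __device__
#else
#define HD
#endif

// Calls std::complex<float>::real() and imag(). The standard library does not
// annotate these members as __device__, which is the gap the proposed flag
// tries to paper over.
HD inline float squared_magnitude(const std::complex<float> &z) {
  return z.real() * z.real() + z.imag() * z.imag();
}
```

Calling squared_magnitude(std::complex<float>(3.0f, 4.0f)) on the host yields 25.0f.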


Event Timeline

jlebar updated this revision to Diff 51221. Mar 21 2016, 1:40 PM

jlebar retitled this revision to [CUDA] Add option to mark most functions inside <complex> as host+device.

jlebar updated this object.

jlebar added reviewers: tra, rnk.

jlebar added subscribers: cfe-commits, jhen.

One minor question, LGTM otherwise.

lib/Sema/SemaCUDA.cpp
474	Can C++ library headers ever be non-system? I.e. can someone use libc++ via -I ?

This revision is now accepted and ready to land. Mar 21 2016, 1:55 PM

jlebar added inline comments. Mar 21 2016, 2:23 PM

lib/Sema/SemaCUDA.cpp
474	Good question, I have no idea if that's supposed to work. Reid, do you know?

I would much prefer for us to, say, provide a <complex> header that wraps the system one and does something like

// <complex>
#pragma clang cuda_implicit_host_device {
#include_next <complex>
#pragma clang cuda_implicit_host_device }

or to provide an explicit list of the functions that we're promoting to __host__ __device__, or to require people to use a CUDA-compatible standard library if they want CUDA-compatible standard library behaviour.

include/clang/Driver/Options.td
383–384	I don't think it's reasonable to have something this hacky / arbitrary in the stable Clang driver interface.
lib/Sema/SemaCUDA.cpp
479–481	I don't think this works: the standard library might factor parts of <complex> out into separate header files. For instance, libstdc++ 4.4 includes the TR1 pieces of <complex> in that way.

rnk added inline comments. Mar 21 2016, 2:46 PM

lib/Sema/SemaCUDA.cpp
474	libc++ complex has this pragma in it: #pragma GCC system_header So we should be safe regardless of the flags used to find it.
483–488	I'd do this check after the system header test and before the "complex" test, since it's probably faster.
485	There's no cast on the RHS, so I'd spell out `CXXRecordDecl` here to make things more obvious.
lib/Sema/SemaDecl.cpp
8344	Do you want this to apply to declarations as well as definitions? Your test uses that functionality.
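
For readers following the inline comments on lib/Sema/SemaCUDA.cpp, the decision under review can be approximated by the standalone sketch below. It is not the patch's code: DeclInfo and should_be_host_device are hypothetical stand-ins (the patch itself inspects a clang::FunctionDecl), and the order of the checks follows the suggestion above to test the namespace after the system-header test and before the "complex" file-name test.

```cpp
#include <string>

// Hypothetical, simplified description of a function declaration. None of
// these fields are Clang API; they only name the properties being discussed.
struct DeclInfo {
  bool in_system_header;  // declared in a header treated as a system header
  bool in_std_namespace;  // declared within namespace std
  std::string file_name;  // name of the declaring header, e.g. "complex"
  std::string name;       // function name, e.g. "operator<<"
};

// Returns true if the declaration would be treated as implicitly host+device
// under the rule described in the summary.
inline bool should_be_host_device(const DeclInfo &d, bool allow_std_complex) {
  if (!allow_std_complex) return false;        // flag not passed
  if (!d.in_system_header) return false;       // cheap test first
  if (!d.in_std_namespace) return false;       // then the namespace test
  if (d.file_name != "complex") return false;  // then the file-name test
  // The stream operators are excluded: streams are not supported in device code.
  return d.name != "operator<<" && d.name != "operator>>";
}
```

The file-name test is the fragile part noted in the review: a library that factors pieces of <complex> into other headers would not match it.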

Thanks for the suggestions, Richard. I'm not sure any of them will work, but I don't defend this patch as anything other than a hack, so if we can come up with something that works for what we need to accomplish and is cleaner, that's great.

In D18328#379824, @rsmith wrote:
I would much prefer for us to, say, provide a <complex> header that wraps the system one and does something like
// <complex>
#pragma clang cuda_implicit_host_device {
#include_next <complex>
#pragma clang cuda_implicit_host_device }

We considered this and ruled it out for two reasons:

1. We'd have to exclude operator>> and operator<<, presumably with additional pragmas, and
2. We'd have to exclude everything included by <complex>.

Of course with enough pragmas anything is possible, but at this point it seemed to become substantially more complicated than this (admittedly awful) hack.

or to provide an explicit list of the functions that we're promoting to __host__ __device__

The problem with that is that libstdc++ uses many helper functions, which we'd also have to enumerate. Baking those kinds of implementation details into clang seemed worse than this hack.

or to require people to use a CUDA-compatible standard library if they want CUDA-compatible standard library behaviour.

I think asking people to use a custom standard library is a nonstarter for e.g. OSS tensorflow, and I suspect it would be a considerable amount of work to accomplish in google3. (Not to suggest that two wrongs make a right, but we already have many similar hacks in place to match nvcc's behavior with standard library functions -- the main difference here is that we're spelling the hack in clang's C++ as opposed to in __clang_cuda_runtime_wrapper.h.)

In D18328#379824, @rsmith wrote:
I would much prefer for us to, say, provide a <complex> header that wraps the system one and does something like
// <complex>
#pragma clang cuda_implicit_host_device {
#include_next <complex>
#pragma clang cuda_implicit_host_device }
or to provide an explicit list of the functions that we're promoting to __host__ __device__, or to require people to use a CUDA-compatible standard library if they want CUDA-compatible standard library behaviour.

We'll still need some filtering as not everything inside <complex> should be __host__ __device__.

include/clang/Driver/Options.td
383–384	What would be a better way to enable this 'feature'? I guess we could live with -Xclang -fcuda-allow-std-complex for now, but that does not seem to be a particularly good way to give the user control, either. Perhaps we should have some sort of --cuda-enable-extension=foo option to control CUDA hacks.

Here are two other approaches we considered and rejected, for the record:

1. Copy-paste a <complex> implementation from e.g. libc++ into __clang_cuda_runtime_wrapper.h, and edit it appropriately. Then #define the real <complex>'s include guards.

   The main problem with this is the obvious one: we'd be copying a big chunk of the standard library into the compiler, where it doesn't belong, and we'd then have two divergent copies of this code to maintain. In addition, we can't necessarily use libc++, since we need to support pre-C++11 and, AIUI, libc++ does not.

2. Provide __device__ overrides for all the functions defined in <complex>. This almost works, except that we do not (currently) have a way to let you inject new overloads for member functions into classes we don't own. E.g. we can add a __device__ overload std::real(const complex<T>&), just like we could override std::real in any other way, but we can't add a new __device__ overload to std::complex<T>::real().

   This approach also has a similar problem to (1), which is that we'd end up copy/pasting almost all of <complex> into the compiler.
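
The asymmetry behind the second approach can be seen in plain C++, without any CUDA attributes: a namespace can be reopened to add a free function, but a class whose definition is already complete cannot gain a member. In the sketch below, the namespace lib, the type Complex, and both functions are hypothetical stand-ins for the standard library, not code from the patch.

```cpp
// Stand-in for a library header we do not own.
namespace lib {
struct Complex {
  double re, im;
  double real() const { return re; }  // member function, fixed by the library
};
}  // namespace lib

// A later header can reopen the namespace and add a free function...
namespace lib {
inline double real(const Complex &z) { return z.re; }
}  // namespace lib

// ...but there is no syntax for adding a member to lib::Complex once the class
// definition is complete. Uncommenting the next line is a compile error:
// double lib::Complex::imag() const { return im; }
```

This is why free functions such as std::real could be given device versions from a wrapper header, while member functions such as std::complex<T>::real() could not.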

include/clang/Driver/Options.td
383–384	> I don't think it's reasonable to have something this hacky / arbitrary in the stable Clang driver interface.

This is an important feature for a lot of projects, including tensorflow and eigen. No matter how we define the flag, I suspect people are going to use it en masse. (Most projects I've seen pass the equivalent flag to nvcc.) At the point that many or even most projects are relying on it, I'd suspect we'll have difficulty changing this flag, regardless of whether or not it is officially part of our stable API.

There's also the issue of discoverability. nvcc actually gives a nice error message when you try to use std::complex -- it seems pretty unfriendly not to even list the relevant flag in clang --help.

I don't feel particularly strongly about this, though -- I'm more concerned about getting something that works.

jlebar added inline comments. Mar 21 2016, 3:20 PM

include/clang/Driver/Options.td
383–384	An alternative wrt the flag is to enable it by default. This would be somewhat consistent with existing behavior, wherein we make most std math functions available without a special flag, even though they're not technically host-device. The main difference here is that there we're matching nvcc's default behavior, whereas here we're actually going further than nvcc -- nvcc by default doesn't let you touch std::complex from device code at all, and with a flag, you can touch its constexpr functions. Which is not actually very much. Nonetheless, since the user-visible effect is consistent with our approach of making std math stuff available, and since this shouldn't make us reject code nvcc accepts, I'd be more OK hiding the flag to turn it off.

In D18328#379824, @rsmith wrote:
I would much prefer for us to, say, provide a <complex> header that wraps the system one and does something like
// <complex>
#pragma clang cuda_implicit_host_device {
#include_next <complex>
#pragma clang cuda_implicit_host_device }
or to provide an explicit list of the functions that we're promoting to __host__ __device__, or to require people to use a CUDA-compatible standard library if they want CUDA-compatible standard library behaviour.

I don't really like include_next wrapper headers, but adding a pragma spelling of the cuda device attributes might be nice. There would still be issues with the streaming operators, though.

include/clang/Driver/Options.td
383–384	What if we had a catchall nvcc quirks mode flag similar to -fms-compatibility? We probably don't want a super fine grained LangOpt like this.

jlebar added inline comments. Mar 21 2016, 3:23 PM

include/clang/Driver/Options.td
383–384	> What if we had a catchall nvcc quirks mode flag similar to -fms-compatibility?

I think we midair'ed on this. See above comment about turning this flag on by default -- calling this "nvcc compat" wouldn't quite be right. We could certainly have a broader flag, but I'm not sure at the moment what else would reasonably go in with this one.

rsmith added inline comments. Mar 21 2016, 3:35 PM

include/clang/Driver/Options.td
383–384	I'd find either of these suggestions (-fnvcc-compatibility or a cc1-only flag to turn this behaviour off) more palatable than the current approach. I'd also be a lot happier about this if we can view it as a short-term workaround, with the longer-term fix being to get the host/device attributes added to standard library implementations (even if it turns out we can never actually remove this workaround in practice). If we can legitimately claim that this is the way that CUDA is intended to work, and the missing attributes in <complex> are a bug in that header (in CUDA mode), then that provides a solid justification for having this complexity in Clang.
lib/Sema/SemaCUDA.cpp
464–465	Does nvcc do this "`constexpr` implies `__host__ __device__`" thing only for functions declared within <complex>, or for all functions?

Another alternative strategy: a wrapper `<complex>` header that does this:

#include // ... union of includes from libc++ and libstdc++ <complex>
#define constexpr __host__ __device__ constexpr
#include_next <complex>
#undef constexpr
485	`Parent` can't be null for a `CXXMethodDecl`, so just `Method->getParent()->isInStdNamespace()` would work.

jlebar added inline comments. Mar 21 2016, 3:48 PM

include/clang/Driver/Options.td
383–384	> If we can legitimately claim that this is the way that CUDA is intended to work, and the missing attributes in <complex> are a bug in that header (in CUDA mode), then that provides a solid justification for having this complexity in Clang.

I think that the number of people passing --relaxed-constexpr to nvcc just so they can use a limited subset of std::complex, and the fact that we're already doing this for (basically all) other std math functions may be decent arguments for this. But I don't know if I'm a great judge of what we can legitimately claim here.
lib/Sema/SemaCUDA.cpp
464–465	> Does nvcc do this "constexpr implies __host__ __device__" thing only for functions declared within <complex>, or for all functions?

All functions. Although std::complex is the main use I've observed.

> Another alternative strategy: a wrapper <complex> header that does this:

That one is quite clever, although I'm not sure about enumerating all of the includes from the headers. I guess that should be reasonably stable...

I think I would like to get full complex support, though, if we can agree on a path towards that. The current limitation is silly, it seems clear that people want this, and the constexpr thing gives you but a shadow of the actual library.
479–481	Hm, that is unfortunate. One option would be to say that we just don't support this. Otherwise we have to go down the road of identifying all the relevant functions...

rsmith added inline comments. Mar 21 2016, 4:27 PM

lib/Sema/SemaCUDA.cpp
479–481	I've not checked GCC 5 onwards, but it looks like in the 4.x series, this is the only problem of this kind, and only affects the TR1 pieces (which it seems we probably don't need to care about supporting here). libc++ doesn't currently have any problems of this kind. Obviously it's unknown what issues we'll see with other standard library implementations.

rsmith added inline comments. Mar 21 2016, 4:32 PM

lib/Sema/SemaCUDA.cpp
464–465	Supporting a "`constexpr` implies `__host__ __device__`" feature for all functions seems a lot cleaner than the approach taken by this patch, and will presumably improve NVCC compatibility in other cases too (though perhaps they're quite rare). This seems like a very odd pair of features to link in this way, but if we're going to have something weird like this to support existing NVCC-targeting code, using the same approach may be better. This would also mean we would not be further extending NVCC's extension.

Okay, after much discussion, we've decided to go with --relaxed-constexpr instead of this. I have a patch for that which seems to mostly work, will send it out soon.
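
For context on that decision, the sketch below shows the kind of function a "constexpr implies __host__ __device__" rule would cover. It is plain C++ with no CUDA attributes, and the function names are hypothetical; it demonstrates only what such a function looks like, not the behaviour of any particular compiler flag.

```cpp
// No CUDA attributes here. Under the rule discussed above, the constexpr
// specifier alone would make this function usable from host and device code.
constexpr int clamp_index(int i, int n) {
  return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

// Usable in a constant expression...
static_assert(clamp_index(12, 10) == 9, "evaluated at compile time");

// ...and from ordinary runtime code.
inline int clamp_at_runtime(int i, int n) { return clamp_index(i, n); }
```

Functions in <complex> that the library does not declare constexpr would fall outside such a rule, which is the limitation raised earlier in the thread.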

Revision Contents

Path                                   Size
include/clang/Basic/LangOptions.def    1 line
include/clang/Driver/CC1Options.td     2 lines
include/clang/Driver/Options.td        2 lines
include/clang/Sema/Sema.h              3 lines
lib/Driver/Tools.cpp                   2 lines
lib/Frontend/CompilerInvocation.cpp    3 lines
lib/Sema/SemaCUDA.cpp                  43 lines
lib/Sema/SemaDecl.cpp                  6 lines
test/Driver/cuda-complex.cu            15 lines
test/SemaCUDA/Inputs/complex           30 lines
test/SemaCUDA/complex.cu               27 lines

Diff 51221

include/clang/Basic/LangOptions.def

@@
 LANGOPT(OpenMPUseTLS , 1, 0, "Use TLS for threadprivates or runtime calls")
 LANGOPT(OpenMPIsDevice , 1, 0, "Generate code only for OpenMP target device")

 LANGOPT(CUDAIsDevice , 1, 0, "Compiling for CUDA device")
 LANGOPT(CUDAAllowHostCallsFromHostDevice, 1, 0, "Allow host device functions to call host functions")
 LANGOPT(CUDADisableTargetCallChecks, 1, 0, "Disable checks for call targets (host, device, etc.)")
 LANGOPT(CUDATargetOverloads, 1, 0, "Enable function overloads based on CUDA target attributes")
 LANGOPT(CUDAAllowVariadicFunctions, 1, 0, "Allow variadic functions in CUDA device code")
+LANGOPT(CUDAAllowStdComplex, 1, 0, "Allow calls to functions in <complex>, other than operator>> and operator<<, from device code")

 LANGOPT(AssumeSaneOperatorNew , 1, 1, "implicit __attribute__((malloc)) for C++'s new operators")
 LANGOPT(SizedDeallocation , 1, 0, "enable sized deallocation functions")
 LANGOPT(ConceptsTS , 1, 0, "enable C++ Extensions for Concepts")
 BENIGN_LANGOPT(ElideConstructors , 1, 1, "C++ copy constructor elision")
 BENIGN_LANGOPT(DumpRecordLayouts , 1, 0, "dumping the layout of IRgen'd records")
 BENIGN_LANGOPT(DumpRecordLayoutsSimple , 1, 0, "dumping the layout of IRgen'd records in a simple form")
 BENIGN_LANGOPT(DumpVTableLayouts , 1, 0, "dumping the layouts of emitted vtables")

include/clang/Driver/CC1Options.td

@@ def fcuda_disable_target_call_checks : Flag<["-"],
     "fcuda-disable-target-call-checks">,
   HelpText<"Disable all cross-target (host, device, etc.) call checks in CUDA">;
 def fcuda_include_gpubinary : Separate<["-"], "fcuda-include-gpubinary">,
   HelpText<"Incorporate CUDA device-side binary into host object file.">;
 def fcuda_target_overloads : Flag<["-"], "fcuda-target-overloads">,
   HelpText<"Enable function overloads based on CUDA target attributes.">;
 def fcuda_allow_variadic_functions : Flag<["-"], "fcuda-allow-variadic-functions">,
   HelpText<"Allow variadic functions in CUDA device code.">;
+def fcuda_allow_std_complex : Flag<["-"], "fcuda-allow-std-complex">,
+  HelpText<"Allow calls to functions in <complex>, other than operator>> and operator<<, from device code.">;

 //===----------------------------------------------------------------------===//
 // OpenMP Options
 //===----------------------------------------------------------------------===//

 def fopenmp_is_device : Flag<["-"], "fopenmp-is-device">,
   HelpText<"Generate code only for an OpenMP target device.">;
 def fomp_host_ir_file_path : Separate<["-"], "fomp-host-ir-file-path">,

include/clang/Driver/Options.td

@@
 def cuda_device_only : Flag<["--"], "cuda-device-only">,
   HelpText<"Do device-side CUDA compilation only">;
 def cuda_gpu_arch_EQ : Joined<["--"], "cuda-gpu-arch=">,
   Flags<[DriverOption, HelpHidden]>, HelpText<"CUDA GPU architecture">;
 def cuda_host_only : Flag<["--"], "cuda-host-only">,
   HelpText<"Do host-side CUDA compilation only">;
 def cuda_noopt_device_debug : Flag<["--"], "cuda-noopt-device-debug">,
   HelpText<"Enable device-side debug info generation. Disables ptxas optimizations.">;
+def cuda_allow_std_complex : Flag<["--"], "cuda-allow-std-complex">,
+  HelpText<"Allow CUDA device code to use definitions from <complex>, other than operator>> and operator<<.">;
 def cuda_path_EQ : Joined<["--"], "cuda-path=">, Group<i_Group>,
   HelpText<"CUDA installation path">;
 def dA : Flag<["-"], "dA">, Group<d_Group>;
 def dD : Flag<["-"], "dD">, Group<d_Group>, Flags<[CC1Option]>,
   HelpText<"Print macro definitions in -E mode in addition to normal output">;
 def dM : Flag<["-"], "dM">, Group<d_Group>, Flags<[CC1Option]>,
   HelpText<"Print macro definitions in -E mode instead of normal output">;
 def dead__strip : Flag<["-"], "dead_strip">;

include/clang/Sema/Sema.h

@@ bool inferCUDATargetForImplicitSpecialMember(CXXRecordDecl *ClassDecl,
                                                CXXMethodDecl *MemberDecl,
                                                bool ConstRHS,
                                                bool Diagnose);

   /// \return true if \p CD can be considered empty according to CUDA
   /// (E.2.3.1 in CUDA 7.5 Programming guide).
   bool isEmptyCudaConstructor(SourceLocation Loc, CXXConstructorDecl *CD);

+  /// \return true if \p FD should be marked implicitly host+device.
+  bool declShouldBeCUDAHostDevice(const FunctionDecl &FD);
+
   /// \name Code completion
   //@{
   /// \brief Describes the context in which code completion occurs.
   enum ParserCompletionContext {
     /// \brief Code completion occurs at top-level or namespace context.
     PCC_Namespace,
     /// \brief Code completion occurs within a class, struct, or union.
     PCC_Class,

lib/Driver/Tools.cpp

@@ else if (&getToolChain() == C.getCudaHostToolChain())
       AuxToolChain = C.getCudaDeviceToolChain();
     else
       llvm_unreachable("Can't figure out CUDA compilation mode.");
     assert(AuxToolChain != nullptr && "No aux toolchain.");
     CmdArgs.push_back("-aux-triple");
     CmdArgs.push_back(Args.MakeArgString(AuxToolChain->getTriple().str()));
     CmdArgs.push_back("-fcuda-target-overloads");
     CmdArgs.push_back("-fcuda-disable-target-call-checks");
+    if (Args.hasArg(options::OPT_cuda_allow_std_complex))
+      CmdArgs.push_back("-fcuda-allow-std-complex");
   }

   if (Triple.isOSWindows() && (Triple.getArch() == llvm::Triple::arm ||
                                Triple.getArch() == llvm::Triple::thumb)) {
     unsigned Offset = Triple.getArch() == llvm::Triple::arm ? 4 : 6;
     unsigned Version;
     Triple.getArchName().substr(Offset).getAsInteger(10, Version);
     if (Version < 7)

lib/Frontend/CompilerInvocation.cpp

@@ if (Args.hasArg(OPT_fcuda_disable_target_call_checks))
     Opts.CUDADisableTargetCallChecks = 1;

   if (Args.hasArg(OPT_fcuda_target_overloads))
     Opts.CUDATargetOverloads = 1;

   if (Args.hasArg(OPT_fcuda_allow_variadic_functions))
     Opts.CUDAAllowVariadicFunctions = 1;

+  if (Args.hasArg(OPT_fcuda_allow_std_complex))
+    Opts.CUDAAllowStdComplex = 1;
+
   if (Opts.ObjC1) {
     if (Arg *arg = Args.getLastArg(OPT_fobjc_runtime_EQ)) {
       StringRef value = arg->getValue();
       if (Opts.ObjCRuntime.tryParse(value))
         Diags.Report(diag::err_drv_unknown_objc_runtime) << value;
     }

     if (Args.hasArg(OPT_fobjc_gc_only))

lib/Sema/SemaCUDA.cpp

//===--- SemaCUDA.cpp - Semantic Analysis for CUDA constructs -------------===//		//===--- SemaCUDA.cpp - Semantic Analysis for CUDA constructs -------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
/// \file		/// \file
/// \brief This file implements semantic analysis for CUDA constructs.		/// \brief This file implements semantic analysis for CUDA constructs.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang/Sema/Sema.h"		#include "clang/Sema/Sema.h"
#include "clang/AST/ASTContext.h"		#include "clang/AST/ASTContext.h"
#include "clang/AST/Decl.h"		#include "clang/AST/Decl.h"
		#include "clang/AST/DeclTemplate.h"
#include "clang/AST/ExprCXX.h"		#include "clang/AST/ExprCXX.h"
#include "clang/Lex/Preprocessor.h"		#include "clang/Lex/Preprocessor.h"
#include "clang/Sema/SemaDiagnostic.h"		#include "clang/Sema/SemaDiagnostic.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
		#include "llvm/ADT/StringSet.h"
using namespace clang;		using namespace clang;

ExprResult Sema::ActOnCUDAExecConfigExpr(Scope *S, SourceLocation LLLLoc,		ExprResult Sema::ActOnCUDAExecConfigExpr(Scope *S, SourceLocation LLLLoc,
MultiExprArg ExecConfig,		MultiExprArg ExecConfig,
SourceLocation GGGLoc) {		SourceLocation GGGLoc) {
FunctionDecl *ConfigDecl = Context.getcudaConfigureCallDecl();		FunctionDecl *ConfigDecl = Context.getcudaConfigureCallDecl();
if (!ConfigDecl)		if (!ConfigDecl)
return ExprError(Diag(LLLLoc, diag::err_undeclared_var_use)		return ExprError(Diag(LLLLoc, diag::err_undeclared_var_use)
[...]	if (!llvm::all_of(CD->inits(), [&](const CXXCtorInitializer *CI) {
dyn_cast<CXXConstructExpr>(CI->getInit()))		dyn_cast<CXXConstructExpr>(CI->getInit()))
return isEmptyCudaConstructor(Loc, CE->getConstructor());		return isEmptyCudaConstructor(Loc, CE->getConstructor());
return false;		return false;
}))		}))
return false;		return false;

return true;		return true;
}		}

		// Everything within namespace std inside <complex> should be host+device,
		// except operator<< and operator>> (ostreams aren't supported in CUDA device
		// code). Whitelisting the functions we want, rather than blacklisting the
		// stream operators, is a tempting alternative, but libstdc++ uses many helper
		// functions, which we'd also have to whitelist.
		//
		// TODO: Output a better error message if you try to use something from
		// <complex> without passing -fcuda-allow-std-complex.
		// TODO: Output a nvcc-compat warning if you try to use a non-constexpr function
		// from <complex> -- nvcc only lets you use constexpr functions.
		rsmith (Not Done): Does nvcc do this "`constexpr` implies `__host__ __device__`" thing only for functions declared within <complex>, or for all functions?
		Another alternative strategy: a wrapper `<complex>` header that does this:
		#include // ... union of includes from libc++ and libstdc++ <complex>
		#define constexpr __host__ __device__ constexpr
		#include_next <complex>
		#undef constexpr
		jlebar (Author, Not Done):
		> Does nvcc do this "constexpr implies __host__ __device__" thing only for functions declared within <complex>, or for all functions?
		All functions. Although std::complex is the main use I've observed.
		> Another alternative strategy: a wrapper <complex> header that does this:
		That one is quite clever, although I'm not sure about enumerating all of the includes from the headers. I guess that should be reasonably stable...
		I think I would like to get full complex support, though, if we can agree on a path towards that. The current limitation is silly, it seems clear that people want this, and the constexpr thing gives you but a shadow of the actual library.
		rsmith (Not Done): Supporting a "`constexpr` implies `__host__ __device__`" feature for all functions seems a lot cleaner than the approach taken by this patch, and will presumably improve NVCC compatibility in other cases too (though perhaps they're quite rare). This seems like a very odd pair of features to link in this way, but if we're going to have something weird like this to support existing NVCC-targeting code, using the same approach may be better. This would also mean we would not be further extending NVCC's extension.
		bool Sema::declShouldBeCUDAHostDevice(const FunctionDecl &FD) {
		assert(getLangOpts().CUDA);

		if (!getLangOpts().CUDAAllowStdComplex)
		return false;

		const SourceManager &SM = getSourceManager();
		SourceLocation Loc = FD.getLocation();
		if (!SM.isInSystemHeader(Loc))
		tra (Done): Can C++ library headers ever be non-system? I.e. can someone use libc++ via -I ?
		jlebar (Author, Done): Good question, I have no idea if that's supposed to work. Reid, do you know?
		rnk (Done): libc++ complex has this pragma in it:
		#pragma GCC system_header
		So we should be safe regardless of the flags used to find it.
		return false;
		const FileEntry *FE = SM.getFileEntryForID(SM.getFileID(Loc));
		if (!FE)
		return false;
		StringRef Filename = FE->getName();
		if (Filename != "complex" && !Filename.endswith("/complex"))
		return false;
		rsmith (Not Done): I don't think this works: the standard library might factor parts of <complex> out into separate header files. For instance, libstdc++ 4.4 includes the TR1 pieces of <complex> in that way.
		jlebar (Author, Not Done): Hm, that is unfortunate. One option would be to say that we just don't support this. Otherwise we have to go down the road of identifying all the relevant functions...
		rsmith (Not Done): I've not checked GCC 5 onwards, but it looks like in the 4.x series, this is the only problem of this kind, and only affects the TR1 pieces (which it seems we probably don't need to care about supporting here). libc++ doesn't currently have any problems of this kind. Obviously it's unknown what issues we'll see with other standard library implementations.

		bool IsInStd = FD.isInStdNamespace();
		if (const auto *Method = dyn_cast<CXXMethodDecl>(&FD))
		if (const auto *Parent = Method->getParent())
		rnk (Done): There's no cast on the RHS, so I'd spell out `CXXRecordDecl` here to make things more obvious.
		rsmith (Done): `Parent` can't be null for a `CXXMethodDecl`, so just `Method->getParent()->isInStdNamespace()` would work.
		IsInStd |= Parent->isInStdNamespace();
		if (!IsInStd)
		return false;
		rnk (Done): I'd do this check after the system header test and before the "complex" test, since it's probably faster.

		auto Operator = FD.getOverloadedOperator();
		if (Operator == OO_LessLess || Operator == OO_GreaterGreater)
		return false;

		return true;
		}
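
The sequence of checks in declShouldBeCUDAHostDevice can be tried out outside of Clang as a plain predicate. The following is only a minimal sketch under stated assumptions, not the patch's code: FunctionInfo, OperatorKind, isComplexHeader and shouldBeHostDevice are hypothetical names that stand in for the SourceManager and FunctionDecl queries used above, and the flag is passed as a plain bool.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the Clang queries in the patch; none of these
// names exist in Clang itself.
enum class OperatorKind { None, Plus, PlusEqual, LessLess, GreaterGreater };

struct FunctionInfo {
  bool InSystemHeader;   // models SM.isInSystemHeader(Loc)
  std::string Filename;  // models FE->getName()
  bool InStdNamespace;   // models FD.isInStdNamespace(), or the parent class's
  OperatorKind Operator; // models FD.getOverloadedOperator()
};

// True when the file is named exactly "complex" or its path ends in "/complex".
inline bool isComplexHeader(const std::string &Filename) {
  const std::string Suffix = "/complex";
  if (Filename == "complex")
    return true;
  return Filename.size() >= Suffix.size() &&
         Filename.compare(Filename.size() - Suffix.size(), Suffix.size(),
                          Suffix) == 0;
}

// Mirrors the order of the early returns in the patch.
inline bool shouldBeHostDevice(const FunctionInfo &F, bool AllowStdComplex) {
  if (!AllowStdComplex)
    return false;
  if (!F.InSystemHeader)
    return false;
  if (!isComplexHeader(F.Filename))
    return false;
  if (!F.InStdNamespace)
    return false;
  // Stream operators stay host-only.
  if (F.Operator == OperatorKind::LessLess ||
      F.Operator == OperatorKind::GreaterGreater)
    return false;
  return true;
}
```

Under this model a function from a file such as "/usr/include/complex.h" is rejected by the filename test, which is the same limitation rsmith points out for standard libraries that split <complex> across several files.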

lib/Sema/SemaDecl.cpp


[...]	Sema::ActOnFunctionDeclarator(Scope *S, Declarator &D, DeclContext *DC,
NewFD->setRangeEnd(D.getSourceRange().getEnd());		NewFD->setRangeEnd(D.getSourceRange().getEnd());

if (D.isRedeclaration() && !Previous.empty()) {		if (D.isRedeclaration() && !Previous.empty()) {
checkDLLAttributeRedeclaration(		checkDLLAttributeRedeclaration(
*this, dyn_cast<NamedDecl>(Previous.getRepresentativeDecl()), NewFD,		*this, dyn_cast<NamedDecl>(Previous.getRepresentativeDecl()), NewFD,
isExplicitSpecialization || isFunctionTemplateSpecialization);		isExplicitSpecialization || isFunctionTemplateSpecialization);
}		}

		// CUDA: Some decls in system headers get an implicit __host__ __device__.
		if (getLangOpts().CUDA && declShouldBeCUDAHostDevice(*NewFD)) {
		rnk (Done): Do you want this to apply to declarations as well as definitions? Your test uses that functionality.
		NewFD->addAttr(CUDADeviceAttr::CreateImplicit(Context));
		NewFD->addAttr(CUDAHostAttr::CreateImplicit(Context));
		}

if (getLangOpts().CPlusPlus) {		if (getLangOpts().CPlusPlus) {
if (FunctionTemplate) {		if (FunctionTemplate) {
if (NewFD->isInvalidDecl())		if (NewFD->isInvalidDecl())
FunctionTemplate->setInvalidDecl();		FunctionTemplate->setInvalidDecl();
return FunctionTemplate;		return FunctionTemplate;
}		}
}		}


test/Driver/cuda-complex.cu

This file was added.

				// Tests CUDA compilation pipeline construction in Driver.
				// REQUIRES: clang-driver

				// Check that --cuda-allow-std-complex passes -fcuda-allow-std-complex to cc1.
				// RUN: %clang -### -target x86_64-linux-gnu --cuda-allow-std-complex -c %s 2>&1 \
				// RUN: | FileCheck -check-prefix ALLOW-COMPLEX %s

				// ALLOW-COMPLEX: -fcuda-allow-std-complex

				// But if we don't pass --cuda-allow-std-complex, we don't pass
				// -fcuda-allow-std-complex to cc1.
				// RUN: %clang -### -target x86_64-linux-gnu -c %s 2>&1 \
				// RUN: | FileCheck -check-prefix NO-ALLOW-COMPLEX %s

				// NO-ALLOW-COMPLEX-NOT: -fcuda-allow-std-complex
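
What the two RUN lines above check can be modelled in a few lines of C++. This is a rough sketch only: hasArg and buildCC1Args are hypothetical helpers written for illustration, not the code in lib/Driver/Tools.cpp, and the model ignores every driver argument other than --cuda-allow-std-complex.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: true if Flag appears verbatim in Args.
inline bool hasArg(const std::vector<std::string> &Args,
                   const std::string &Flag) {
  return std::find(Args.begin(), Args.end(), Flag) != Args.end();
}

// Hypothetical model of the driver step: forward the driver-level flag to
// cc1 under its -f spelling, and add nothing when the flag is absent.
inline std::vector<std::string>
buildCC1Args(const std::vector<std::string> &DriverArgs) {
  std::vector<std::string> CC1Args = {"-cc1"};
  if (hasArg(DriverArgs, "--cuda-allow-std-complex"))
    CC1Args.push_back("-fcuda-allow-std-complex");
  return CC1Args;
}
```

The ALLOW-COMPLEX prefix corresponds to the case where the cc1 flag is present in the result, and NO-ALLOW-COMPLEX-NOT to the case where it is absent.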

test/SemaCUDA/Inputs/complex

This file was added.

				// Incomplete stub of <complex> used to check that we properly annotate these
				// functions as host+device.

				namespace std {

				template <typename T>
				class complex {
				public:
				complex(const T &re = T(), const T &im = T());
				complex<T> &operator+=(const complex<T> &);

				private:
				T real;
				T imag;
				};

				template <class T>
				complex<T> operator+(const complex<T> &, const complex<T> &);

				template <class T>
				T real(const complex<T> &);

				// Stream operators are not marked as host+device.
				template <class T>
				void operator<<(const complex<T> &, const complex<T> &);

				template <class T>
				void operator>>(const complex<T> &, const complex<T> &);

				} // namespace std

test/SemaCUDA/complex.cu

This file was added.

				// RUN: %clang_cc1 -triple nvptx-unknown-cuda -fsyntax-only -fcuda-allow-std-complex -fcuda-is-device -isystem "%S/Inputs" -verify %s
				// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -fcuda-allow-std-complex -isystem "%S/Inputs" -verify %s

				// Checks that functions inside a system header named <complex> are marked as
				// host+device.

				#include <cuda.h>
				#include <complex>

				using std::complex;
				using std::real;

				void __device__ foo() {
				complex<float> x;
				complex<float> y(x);
				y += x;
				x + y;
				real(complex<int>(1, 2));

				// Our <complex> header defines complex-to-complex operator<< and operator>>,
				// but these are not implicitly marked as host+device.

				x << y; // expected-error {{invalid operands to binary expression}}
				// expected-note@complex:* {{call to __host__ function from __device__ function}}
				x >> y; // expected-error {{invalid operands to binary expression}}
				// expected-note@complex:* {{call to __host__ function from __device__ function}}
				}
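
The expected diagnostics above follow from which targets a __device__ caller may reach. The sketch below is a deliberately simplified model of that rule, under the assumption that only the __device__ caller case matters here; Target, effectiveTarget and callableFromDevice are hypothetical names, and Clang's actual call-target logic also handles __global__ functions and host+device callers, which are omitted.

```cpp
#include <cassert>

// Hypothetical model of a function's CUDA target.
enum class Target { Host, Device, HostDevice };

// An unannotated function is host-only in CUDA. With the patch's implicit
// attributes applied, the same function becomes host+device.
inline Target effectiveTarget(bool HasImplicitHostDevice) {
  return HasImplicitHostDevice ? Target::HostDevice : Target::Host;
}

// Simplified rule for a __device__ caller: device and host+device callees
// are reachable, host-only callees are not.
inline bool callableFromDevice(Target Callee) {
  return Callee == Target::Device || Callee == Target::HostDevice;
}
```

In this model operator+ and operator+= from the stub header are host+device and so callable from foo, while operator<< and operator>> remain host-only, which matches the two expected errors.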
