clang --cuda-allow-std-complex translates into cc1
-fcuda-allow-std-complex. With this flag, we will mark all functions
inside <complex> within namespace std as host+device, other than
operator>> and operator<<, which use ostreams, which are not supported
in CUDA device code.
Details
Diff Detail
Event Timeline
One minor question, LGTM otherwise.
lib/Sema/SemaCUDA.cpp | ||
---|---|---|
474 | Can C++ library headers ever be non-system? I.e. can someone use libc++ via -I ? |
lib/Sema/SemaCUDA.cpp | ||
---|---|---|
474 | Good question, I have no idea if that's supposed to work. Reid, do you know? |
I would much prefer for us to, say, provide a <complex> header that wraps the system one and does something like
// <complex> #pragma clang cuda_implicit_host_device { #include_next <complex> #pragma clang cuda_implicit_host_device }
or to provide an explicit list of the functions that we're promoting to __host__ __device__, or to require people to use a CUDA-compatible standard library if they want CUDA-compatible standard library behaviour.
include/clang/Driver/Options.td | ||
---|---|---|
383–384 | I don't think it's reasonable to have something this hacky / arbitrary in the stable Clang driver interface. | |
lib/Sema/SemaCUDA.cpp | ||
479–481 | I don't think this works: the standard library might factor parts of <complex> out into separate header files. For instance, libstdc++ 4.4 includes the TR1 pieces of <complex> in that way. |
lib/Sema/SemaCUDA.cpp | ||
---|---|---|
474 | libc++ complex has this pragma in it: #pragma GCC system_header So we should be safe regardless of the flags used to find it. | |
483–488 | I'd do this check after the system header test and before the "complex" test, since it's probably faster. | |
485 | There's no cast on the RHS, so I'd spell out CXXRecordDecl here to make things more obvious. | |
lib/Sema/SemaDecl.cpp | ||
8344 | Do you want this to apply to declarations as well as definitions? Your test uses that functionality. |
Thanks for the suggestions, Richard. I'm not sure any of them will work, but I don't defend this patch as anything other than a hack, so if we can come up with something that works for what we need to accomplish and is cleaner, that's great.
We considered this and ruled it out for two reasons:
- We'd have to exclude operator>> and operator<<, presumably with additional pragmas, and
- We'd have to exclude everything included by <complex>.
Of course with enough pragmas anything is possible, but at this point it seemed to become substantially more complicated than this (admittedly awful) hack.
or to provide an explicit list of the functions that we're promoting to __host__ __device__
The problem with that is that libstdc++ uses many helper functions, which we'd also have to enumerate. Baking those kinds of implementation details into clang seemed worse than this hack.
or to require people to use a CUDA-compatible standard library if they want CUDA-compatible standard library behaviour.
I think asking people to use a custom standard library is a nonstarter for e.g. OSS tensorflow, and I suspect it would be a considerable amount of work to accomplish in google3. (Not to suggest that two wrongs make a right, but we already have many similar hacks in place to match nvcc's behavior with standard library functions -- the main difference here is that we're spelling the hack in clang's C++ as opposed to in __clang_cuda_runtime_wrapper.h.)
We'll still need some filtering as not everything inside <complex> should be __host__ __device__.
include/clang/Driver/Options.td | ||
---|---|---|
383–384 | What would be a better way to enable this 'feature'? I guess we could live with -Xclang -fcuda-allow-std-complex for now, but that does not seem to be particularly good way to give user control, either. Perhaps we should have some sort of --cuda-enable-extension=foo option to control CUDA hacks. |
Here are two other approaches we considered and rejected, for the record:
- Copy-paste a <complex> implementation from e.g. libc++ into __clang_cuda_runtime_wrapper.h, and edit it appropriately. Then #define the real <complex>'s include guards.
Main problem with this is the obvious one: We're copying a big chunk of the standard library into the compiler, where it doesn't belong, and now we have two divergent copies of this code to maintain. In addition, we can't necessarily use libc++, since we need to support pre-c++11 and AIUI libc++ does not.
- Provide __device__ overrides for all the functions defined in <complex>. This almost works, except that we do not (currently) have a way to let you inject new overloads for member functions into classes we don't own. E.g. we can add a __device__ overload std::real(const complex<T>&), just like we could override std::real in any other way, but we can't add a new __device__ overload to std::complex<T>::real().
This approach also has a similar problem to (1), which is that we'd end up copy/pasting almost all of <complex> into the compiler.
include/clang/Driver/Options.td | ||
---|---|---|
383–384 |
This is an important feature for a lot of projects, including tensorflow and eigen. No matter how we define the flag, I suspect people are going to use it en masse. (Most projects I've seen pass the equivalent flag to nvcc.) At the point that many or even most projects are relying on it, I'd suspect we'll have difficulty changing this flag, regardless of whether or not it is officially part of our stable API. There's also the issue of discoverability. nvcc actually gives a nice error message when you try to use std::complex -- it seems pretty unfriendly not to even list the relevant flag in clang --help. I don't feel particularly strongly about this, though -- I'm more concerned about getting something that works. |
include/clang/Driver/Options.td | ||
---|---|---|
383–384 | An alternative wrt the flag is to enable it by default. This would be somewhat consistent with existing behavior, wherein we make most std math functions available without a special flag, even though they're not technically host-device. The main difference here is that there we're matching nvcc's default behavior, whereas here we're actually going further than nvcc -- nvcc by default doesn't let you touch std::complex from device code at all, and with a flag, you can touch its *constexpr* functions. Which is not actually very much. Nonetheless, since the user-visible effect is consistent with our approach of making std math stuff available, and since this shouldn't make us reject code nvcc accepts, I'd be more OK hiding the flag to turn it off. |
I don't really like include_next wrapper headers, but adding a pragma spelling of the cuda device attributes might be nice. There would still be issues with the streaming operators, though.
include/clang/Driver/Options.td | ||
---|---|---|
383–384 | What if we had a catchall nvcc quirks mode flag similar to -fms-compatibility? We probably don't want a super fine grained LangOpt like this. |
include/clang/Driver/Options.td | ||
---|---|---|
383–384 |
I think we midair'ed on this. See above comment about turning this flag on by default -- calling this "nvcc compat" wouldn't quite be right. We could certainly have a broader flag, but I'm not sure at the moment what else would reasonably go in with this one. |
include/clang/Driver/Options.td | ||
---|---|---|
383–384 | I'd find either of these suggestions (-fnvcc-compatibility or a cc1-only flag to turn this behaviour off) more palatable than the current approach. I'd also be a lot happier about this if we can view it as a short-term workaround, with the longer-term fix being to get the host/device attributes added to standard library implementations (even if it turns out we can never actually remove this workaround in practice). If we can legitimately claim that this is the way that CUDA is intended to work, and the missing attributes in <complex> are a bug in that header (in CUDA mode), then that provides a solid justification for having this complexity in Clang. | |
lib/Sema/SemaCUDA.cpp | ||
464–465 | Does nvcc do this "constexpr implies __host__ __device__" thing only for functions declared within <complex>, or for all functions? Another alternative strategy: a wrapper <complex> header that does this: #include // ... union of includes from libc++ and libstdc++ <complex> #define constexpr __host__ __device__ constexpr #include_next <complex> #undef constexpr | |
485 | Parent can't be null for a CXXMethodDecl, so just Method->getParent()->isInStdNamespace() would work. |
include/clang/Driver/Options.td | ||
---|---|---|
383–384 |
I think that
may be decent arguments for this. But I don't know if I'm a great judge of what we can legitimately claim here. | |
lib/Sema/SemaCUDA.cpp | ||
464–465 |
All functions. Although std::complex is the main use I've observed.
That one is quite clever, although I'm not sure about enumerating all of the includes from the headers. I guess that should be reasonably stable... I think I would like to get full complex support, though, if we can agree on a path towards that. The current limitation is silly, it seems clear that people want this, and the constexpr thing gives you but a shadow of the actual library. | |
479–481 | Hm, that is unfortunate. One option would be to say that we just don't support this. Otherwise we have to go down the road of identifying all the relevant functions... |
lib/Sema/SemaCUDA.cpp | ||
---|---|---|
479–481 | I've not checked GCC 5 onwards, but it looks like in the 4.x series, this is the only problem of this kind, and only affects the TR1 pieces (which it seems we probably don't need to care about supporting here). libc++ doesn't currently have any problems of this kind. Obviously it's unknown what issues we'll see with other standard library implementations. |
lib/Sema/SemaCUDA.cpp | ||
---|---|---|
464–465 | Supporting a "constexpr implies __host__ __device__" feature for all functions seems a lot cleaner than the approach taken by this patch, and will presumably improve NVCC compatibility in other cases too (though perhaps they're quite rare). This seems like a very odd pair of features to link in this way, but if we're going to have something weird like this to support existing NVCC-targeting code, using the same approach may be better. This would also mean we would not be further extending NVCC's extension. |
Okay, after much discussion, we've decided to go with --relaxed-constexpr instead of this. I have a patch for that which seems to mostly work, will send it out soon.
I don't think it's reasonable to have something this hacky / arbitrary in the stable Clang driver interface.