| lib/Headers/clang_cuda_support.h | |
|---|---|
| 54 | If this includes CUDA headers, maybe you can #error out if the CUDA version isn't good? |
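
For illustration, the version check suggested in the inline comment might look roughly like the sketch below. CUDA_VERSION is the macro that CUDA's cuda.h defines (1000 * major + 10 * minor, so 7050 for CUDA 7.5). The fallback definition, the SKETCH_* bounds, and the helper name are assumptions made up for this example so that it compiles without the CUDA SDK; they are not taken from the patch.

```cpp
// Stand-in for the macro the real <cuda.h> would define (7050 means CUDA 7.5).
// Only here so the sketch compiles without the CUDA SDK installed.
#ifndef CUDA_VERSION
#define CUDA_VERSION 7050
#endif

// Hypothetical bounds: the CUDA versions the wrapper is assumed to support.
#define SKETCH_MIN_CUDA_VERSION 7000
#define SKETCH_MAX_CUDA_VERSION 7050

// Same predicate as the #if below, written as a function so it can be tested.
constexpr bool sketch_cuda_version_ok(int version) {
  return version >= SKETCH_MIN_CUDA_VERSION && version <= SKETCH_MAX_CUDA_VERSION;
}

// Fail early, at preprocessing time, instead of producing obscure errors
// later from headers the wrapper does not know how to adapt.
#if CUDA_VERSION < SKETCH_MIN_CUDA_VERSION || CUDA_VERSION > SKETCH_MAX_CUDA_VERSION
#error "Unsupported CUDA version for this wrapper header."
#endif
```

In a real wrapper the check would have to come after cuda.h has been included, since that is where CUDA_VERSION gets its value.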
Bikeshed: it's part of the clang headers, do we really need "clang" in the header name?
-eric
The (vague) idea was to make clear that the header is *not* part of the CUDA
distribution.
That said, the file could use a better name.
Do any of these sound better?
- fix_cuda_headers.h
- adapt_cuda_headers.h
- cuda_shim.h
--Artem
cuda_runtime.h may be a better choice. nvcc -includes it during both host
and device compilation and this wrapper file is intended to serve a similar
purpose and will probably end up being -included by cc1 in the end.
--Artem
Renamed wrapper to cuda_runtime.h
Similarly to nvcc, automatically add "-include cuda_runtime.h" to CC1 invocations unless -nocudainc is specified.
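
The force-include rule described above can be sketched as a small standalone function. This is a simplified illustration only: the function names and the plain string-vector argument lists are hypothetical and do not reflect how clang's driver actually represents or processes arguments.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: given the user's driver arguments and the cc1
// arguments built so far, append "-include cuda_runtime.h" unless the user
// passed -nocudainc.
inline std::vector<std::string> sketchAddCudaRuntimeInclude(
    const std::vector<std::string> &driverArgs,
    std::vector<std::string> cc1Args) {
  const bool noCudaInc =
      std::find(driverArgs.begin(), driverArgs.end(), "-nocudainc") !=
      driverArgs.end();
  if (!noCudaInc) {
    cc1Args.push_back("-include");
    cc1Args.push_back("cuda_runtime.h");
  }
  return cc1Args;
}

// Usage: the wrapper is force-included by default and skipped on request.
inline void sketchUsage() {
  const std::vector<std::string> base(1, "-cc1");
  const std::vector<std::string> plain(1, "-c");
  const std::vector<std::string> optOut(1, "-nocudainc");
  assert(sketchAddCudaRuntimeInclude(plain, base).size() == 3);
  assert(sketchAddCudaRuntimeInclude(optOut, base).size() == 1);
}
```
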
Changed the header wrapping strategy. The previous version attempted to
make CUDA headers work for host and device compilations separately. As a
result, host and device compilations ended up with different views of
CUDA-provided functions. While it mostly worked, that is not what we
really want. What we want is an identical view of device-specific
functions in both cases, letting function overloading handle name clashes
between host and device functions.
This wrapper now always includes CUDA headers exactly the same way during
host and device compilation passes, and produces identical preprocessed
content during host and device side compilation for sm_35 GPUs. Device
compilation passes for older GPUs will see a smaller subset of device
functions, limited to those supported by the particular GPU.
As a bonus this wrapper works with CUDA 7.5 now.
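
To illustrate why device passes for older GPUs see fewer functions, here is a minimal sketch that assumes declarations are guarded by architecture checks. SKETCH_CUDA_ARCH is a hypothetical stand-in for __CUDA_ARCH__ (350 would mean sm_35), and the three functions are invented placeholders, not real CUDA intrinsics.

```cpp
// Hypothetical stand-in for __CUDA_ARCH__; defaults to sm_35, the
// configuration for which host and device views are meant to match.
#ifndef SKETCH_CUDA_ARCH
#define SKETCH_CUDA_ARCH 350
#endif

// Visible for every target architecture.
constexpr int sketch_baseline_op(int x) { return x + 1; }

// Only visible when targeting sm_30 or newer.
#if SKETCH_CUDA_ARCH >= 300
constexpr int sketch_sm30_op(int x) { return x * 2; }
#endif

// Only visible when targeting sm_35 or newer.
#if SKETCH_CUDA_ARCH >= 350
constexpr int sketch_sm35_op(int x) { return x * 3; }
#endif

// Number of the placeholder functions above that a pass targeting `arch`
// would see after preprocessing.
constexpr int sketch_visible_count(int arch) {
  return 1 + (arch >= 300 ? 1 : 0) + (arch >= 350 ? 1 : 0);
}
```

With the default of 350 all three placeholders are declared; compiling with -DSKETCH_CUDA_ARCH=200 would leave only sketch_baseline_op.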
I'm ignoring the content of the header, but this seems to be a not-terrible way to do things. I gather that cuda_runtime.h is something that's typically included by NVIDIA's driver and not by the client?
Also, tests?
-eric
Correct. cuda_runtime.h (and all it pulls in) is -include'd under the hood by nvcc.
> Also, tests?
I'll add a test to verify that "-include cuda_runtime.h" shows up on the cc1 command line where/when it's expected.
What would be a good way to test the wrapper itself within the clang tree without real CUDA headers?
I've done a fair amount of manual testing outside of the clang source tree.
- manual comparison of preprocessed output from cuda_runtime.h between host and device passes.
- compiled 39 out of 46 thrust examples and verified that they produce output identical to nvcc-compiled binaries.
Ick.
> What would be a good way to test the wrapper itself within clang tree without real CUDA headers?
Hrm. Maybe a set of inputs that stub out things? Hard really.
> I've done a fair amount of manual testing outside of the clang source tree.
> - manual comparison of preprocessed output from cuda_runtime.h between host and device passes.
> - compiled 39 out of 46 thrust examples and verified that they produce output identical to nvcc-compiled binaries.
Cool.
LGTM with those changes, and give some thought to how to test this better in tree.
Thanks!
-eric
Added test cases for force-including cuda_runtime.h.
Tweaked inclusion of one header due to use of default arguments.