This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Added a wrapper header for inclusion of stock CUDA headers.
ClosedPublic

Authored by tra on Sep 25 2015, 10:58 AM.

Details

Summary

Header files that come with CUDA are assuming split host/device compilation and are not usable by clang out of the box.
With a bit of preprocessor magic it's possible to twist them into a form usable by clang after D13170 and D13144 land.

Diff Detail

Repository
rL LLVM

Event Timeline

tra updated this revision to Diff 35739.Sep 25 2015, 10:58 AM
tra retitled this revision from to [CUDA] Added a wrapper header for inclusion of stock CUDA headers..
tra updated this object.
tra added reviewers: echristo, eliben, jholewinski.
tra added a subscriber: cfe-commits.
tra updated this revision to Diff 35747.Sep 25 2015, 11:35 AM

Fixed typos and whitespace nits.
use #pragma push_macro for CUDACC_RTC, too.

eliben added inline comments.Sep 25 2015, 12:14 PM
lib/Headers/clang_cuda_support.h
53 ↗(On Diff #35747)

If this includes CUDA headers, maybe you can #error out if the CUDA version isn't good?

tra updated this revision to Diff 35760.Sep 25 2015, 1:14 PM

Include cuda.h and #error if we see wrong CUDA_VERSION.

tra marked an inline comment as done.Sep 25 2015, 1:14 PM
eliben accepted this revision.Sep 25 2015, 1:23 PM
eliben edited edge metadata.

lgtm

This revision is now accepted and ready to land.Sep 25 2015, 1:23 PM
echristo edited edge metadata.Sep 26 2015, 12:20 PM

Bikeshed: it's part of the clang headers, do we really need "clang" in the header name?

-eric

tra added a subscriber: tra.Sep 28 2015, 12:52 PM

The (vague) idea was to make clear that the header is *not* part of cuda
distribution.

That said, the file could use a better name.

Do any of these sound better?

  • fix_cuda_headers.h
  • adapt_cuda_headers.h
  • cuda_shim.h

--Artem

Why not just call it cuda.h and use #include next for it and then fix it up?

-eric

tra added a comment.Sep 28 2015, 6:43 PM

cuda_runtime.h may be a better choice. nvcc -includes it during both host
and device compilation and this wrapper file is intended to serve a similar
purpose and will probably end up being -included by cc1 in the end.

--Artem

tra updated this revision to Diff 36048.Sep 29 2015, 4:24 PM
tra edited edge metadata.

Renamed wrapper to cuda_runtime.h
Similarly to nvcc, automatically add "-include cuda_runtime.h" to CC1 invocations unless -nocudainc is specified.

tra updated this revision to Diff 37912.Oct 20 2015, 12:44 PM

Changed header wrapping strategy. Previous version was attempting to
make CUDA headers work for host/device compilations separately. In the
end host and device compilations ended up with different view of
CUDA-provided functions. While it mostly worked, that is not what we
really want. What we want is to have identical view of device-specific
functions in both cases and let function overloading handle name clashes
between host and device functions.

This wrapper now always includes CUDA headers exactly the same way during
host and device compilation passes and produces identical preprocessed
content during host and device side compilation for sm_35 GPUs. Device
compilation passes for older GPUs will see a smaller subset of device
functions supported by particular GPU.

As a bonus this wrapper works with CUDA 7.5 now.

I'm ignoring the content of the header, but this seems to be a not terrible way to do things. I gather that cuda_runtime.h is something that's typically included by the driver by nvidia and not the client?

Also, tests?

-eric

tra added a comment.Oct 21 2015, 1:18 PM

I'm ignoring the content of the header, but this seems to be a not terrible way to do things. I gather that cuda_runtime.h is something that's typically included by the driver by nvidia and not the client?

Correct. cuda_runtime.h (and all it pulls in) is -include'd under the hood by nvcc.

Also, tests?

I'll add a test to verify that "-include cuda_runtime.h" shows up on cc1 command line where/when it's expected.

What would be a good way to test the wrapper itself within clang tree without real CUDA headers?

I've done fair amount of manual testing outside of clang source tree.

  • manual comparison of preprocessed output from cuda_runtime.h between host and device passes.
  • compiled 39 out of 46 thrust examples and verified that they produce output identical to nvcc-compiled binaries.
echristo accepted this revision.Oct 21 2015, 1:21 PM
echristo edited edge metadata.
In D13171#272441, @tra wrote:

I'm ignoring the content of the header, but this seems to be a not terrible way to do things. I gather that cuda_runtime.h is something that's typically included by the driver by nvidia and not the client?

Correct. cuda_runtime.h (and all it pulls in) is -include'd under the hood by nvcc.

Also, tests?

I'll add a test to verify that "-include cuda_runtime.h" shows up on cc1 command line where/when it's expected.

Ick.

What would be a good way to test the wrapper itself within clang tree without real CUDA headers?

Hrm. Maybe a set of inputs that stub out things? Hard really.

I've done fair amount of manual testing outside of clang source tree.

  • manual comparison of preprocessed output from cuda_runtime.h between host and device passes.
  • compiled 39 out of 46 thrust examples and verified that they produce output identical to nvcc-compiled binaries.

Cool.

LGTM with those changes and give a thought at how to test this in tree better.

Thanks!

-eric

tra updated this revision to Diff 38574.Oct 27 2015, 11:26 AM
tra edited edge metadata.

Added test cases for force-including of cuda_runtime.h
Tweaked inclusion of one header due to use of default arguments.

jingyue added a subscriber: jingyue.Nov 6 2015, 8:58 AM
This revision was automatically updated to reflect the committed changes.