This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] add support for the new kernel launch API in CUDA-9.2+.
ClosedPublic

Authored by tra on Jan 30 2019, 4:36 PM.

Details

Summary

Instead of calling CUDA runtime to arrange function arguments,
the new API constructs arguments in a local array and the kernels
are launched with __cudaLaunchKernel().

The old API has been deprecated and is expected to go away
in the next CUDA release.

Event Timeline

tra created this revision.Jan 30 2019, 4:36 PM
jlebar accepted this revision.Jan 30 2019, 5:05 PM

LGTM, mostly nits.

clang/include/clang/Sema/Sema.h
10316

Could we be a little less vague, what exactly is the launch-configuration function? (Could be as simple as adding e.g. cudaFooBar().)

clang/lib/CodeGen/CGCUDANV.cpp
201–297

nit in a local array

212

Nit, s/1UL/uint64{1}/ or size_t, whatever this function takes. As-is we're baking in the assumption that unsigned long is the same as the type returned by Args.size(), which isn't necessarily true.

As an alternative, you could do std::max<size_t>(1, Args.size()) or whatever the appropriate type is.

239

Unfixed FIXME?

260

I see lots of references to __cudaPushCallConfiguration, but this is the only reference I see to __cudaPopCallConfiguration. Is this a typo? Also are we supposed to emit matching push and pop function calls? Kind of weird to do one without the other...

266

Whitespace nit, maybe move this whitespace line before the comment?

clang/lib/Headers/__clang_cuda_runtime_wrapper.h
429

s/undocumented function/this undocumented function/?

This revision is now accepted and ready to land.Jan 30 2019, 5:05 PM
tra updated this revision to Diff 184543.Jan 31 2019, 10:29 AM
tra marked 8 inline comments as done.
tra edited the summary of this revision. (Show Details)

Addressed Justin's comments.

tra added inline comments.Jan 31 2019, 10:37 AM
clang/lib/CodeGen/CGCUDANV.cpp
239

Fixed the comment. :-)
There's not much we can do if we have no declaration for cudaLaunchKernel, so throwing the error here is the best we can do.

260

the pop part is indeed used only here.
Push is something that takes user-specified parameters, so we get Sema to check them.
Pop is much simpler and does not have any direct user exposure, so we can just create and use it here.

As for matching, it is balanced. Push is called at the kernel launch site with the parameters of <<<>>> .Pop is done in the host-side kernel stub where we retrieve those parameters and pass them to the CUDA runtime.

Essentially, push/pop are poor names for these functions are the nesting is never more than one level deep. We could've just stashed the arguments in a fixed buffer somewhere.

tra updated this revision to Diff 184592.Jan 31 2019, 1:24 PM

Updated ASTMatchers unit test.

This revision was automatically updated to reflect the committed changes.