Page MenuHomePhabricator

[CUDA] add support for the new kernel launch API in CUDA-9.2+.

Authored by tra on Jan 30 2019, 4:36 PM.



Instead of calling CUDA runtime to arrange function arguments,
the new API constructs arguments in a local array and the kernels
are launched with __cudaLaunchKernel().

The old API has been deprecated and is expected to go away
in the next CUDA release.

Diff Detail

rC Clang

Event Timeline

tra created this revision.Jan 30 2019, 4:36 PM
jlebar accepted this revision.Jan 30 2019, 5:05 PM

LGTM, mostly nits.

10316 ↗(On Diff #184405)

Could we be a little less vague, what exactly is the launch-configuration function? (Could be as simple as adding e.g. cudaFooBar().)

201 ↗(On Diff #184405)

nit in a local array

212 ↗(On Diff #184405)

Nit, s/1UL/uint64{1}/ or size_t, whatever this function takes. As-is we're baking in the assumption that unsigned long is the same as the type returned by Args.size(), which isn't necessarily true.

As an alternative, you could do std::max<size_t>(1, Args.size()) or whatever the appropriate type is.

239 ↗(On Diff #184405)

Unfixed FIXME?

260 ↗(On Diff #184405)

I see lots of references to __cudaPushCallConfiguration, but this is the only reference I see to __cudaPopCallConfiguration. Is this a typo? Also are we supposed to emit matching push and pop function calls? Kind of weird to do one without the other...

266 ↗(On Diff #184405)

Whitespace nit, maybe move this whitespace line before the comment?

429 ↗(On Diff #184405)

s/undocumented function/this undocumented function/?

This revision is now accepted and ready to land.Jan 30 2019, 5:05 PM
tra updated this revision to Diff 184543.Jan 31 2019, 10:29 AM
tra marked 8 inline comments as done.
tra edited the summary of this revision. (Show Details)

Addressed Justin's comments.

tra added inline comments.Jan 31 2019, 10:37 AM
239 ↗(On Diff #184405)

Fixed the comment. :-)
There's not much we can do if we have no declaration for cudaLaunchKernel, so throwing the error here is the best we can do.

260 ↗(On Diff #184405)

the pop part is indeed used only here.
Push is something that takes user-specified parameters, so we get Sema to check them.
Pop is much simpler and does not have any direct user exposure, so we can just create and use it here.

As for matching, it is balanced. Push is called at the kernel launch site with the parameters of <<<>>> .Pop is done in the host-side kernel stub where we retrieve those parameters and pass them to the CUDA runtime.

Essentially, push/pop are poor names for these functions are the nesting is never more than one level deep. We could've just stashed the arguments in a fixed buffer somewhere.

tra updated this revision to Diff 184592.Jan 31 2019, 1:24 PM

Updated ASTMatchers unit test.

This revision was automatically updated to reflect the committed changes.