Instead of calling CUDA runtime to arrange function arguments,
the new API constructs arguments in a local array and the kernels
are launched with __cudaLaunchKernel().
The old API has been deprecated and is expected to go away
in the next CUDA release.
Could we be a little less vague, what exactly is the launch-configuration function? (Could be as simple as adding e.g. cudaFooBar().)