Basic CUDA platform implementation and cmake infrastructure to control whether
it's used. A few important TODOs will be handled in later patches:
- Log some error messages that can't easily be returned as Errors.
- Cache modules and kernels so that a request to load an already-loaded kernel does not reload it.
- Tolerate shared memory arguments for kernel launches.
Not necessarily in this patch, but we should document how to configure the build with CUDA enabled and (see below) how to point cmake at different CUDA installs.
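
For the module/kernel caching TODO, one possible shape is sketched below. This is not code from this patch: KernelHandle, KernelCache, and the loader callback are hypothetical names, and the handle is a stand-in for whatever the platform actually returns when it loads a kernel. The sketch only shows the idea of keying on (module, kernel) so that a repeated load request returns the cached handle instead of reloading.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a loaded kernel handle; the real platform would
// hold whatever the CUDA driver hands back for a loaded function.
struct KernelHandle {
  int Id;
};

// Caches handles by (module name, kernel name) so each kernel is loaded at
// most once. The loader callback is an assumption standing in for the real
// load path.
class KernelCache {
public:
  using Loader =
      std::function<KernelHandle(const std::string &, const std::string &)>;

  explicit KernelCache(Loader L) : Load(std::move(L)) {}

  KernelHandle get(const std::string &Module, const std::string &Kernel) {
    auto Key = std::make_pair(Module, Kernel);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // Already loaded: skip the loader entirely.
    KernelHandle Handle = Load(Module, Kernel);
    Cache.emplace(Key, Handle);
    return Handle;
  }

private:
  Loader Load;
  std::map<std::pair<std::string, std::string>, KernelHandle> Cache;
};
```

Calling get() twice with the same module and kernel names invokes the loader only once. A real version would also need to decide on thread safety and on when cached entries are unloaded, neither of which this sketch addresses.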