Basic CUDA platform implementation and CMake infrastructure to control whether
it's used. A few important TODOs will be handled in later patches:
- Log some error messages that can't easily be returned as Errors.
- Cache modules and kernels so that a request to load an already-loaded kernel doesn't reload it.
- Tolerate shared memory arguments for kernel launches.