Don't outline the kernel in the test file, as this prevents some debug info from being stripped out. The CUDA driver doesn't support PTX with debug info, which causes the conversion to cubin to fail.
The test started failing with https://github.com/llvm/llvm-project/commit/81467f500f6ad106a69088bc276024c5e1938571. Once this is fixed, I'll also enable these tests on the Google build bots that have Tesla T4 GPUs.
Thanks! This is something I wasn't aware of. BTW, I tested these on a Turing GPU with CUDA 10.2 and they passed, but maybe they fail on some other devices.
I think this has been present since the file was added, but it is no longer required. Can you please drop it? Or should I remove it in a subsequent patch?