[CUDA] Unbreak CUDA compilation with -std=c++20
Standard libc++ headers in -std=c++20 mode include <new>, which picks up
cuda_wrappers/new before any of the CUDA macros have been defined.
We cannot include CUDA headers that early, so the work-around is to define
__device__ in the wrapper header itself.
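For illustration only, a minimal sketch of the idea (not the actual patch): a
wrapper header that can be reached before the CUDA headers supplies its own
definition of the device macro. It assumes clang defines __CUDA__ for CUDA
compilations and spells the attribute as __attribute__((device)). The helper
sketch_round_up is hypothetical and exists only to give the macro something
to annotate.

```cpp
#include <cstddef>

// Stand-in for the top of a wrapper header such as cuda_wrappers/new.
// If the header is reached before the CUDA headers have run, __device__
// is not defined yet, so provide a definition here.
#if !defined(__device__)
#  if defined(__CUDA__)
// Assumption: clang CUDA compilation, where the attribute spelling is valid.
#    define __device__ __attribute__((device))
#  else
// Host-only build: expand to nothing so the sketch compiles as plain C++.
#    define __device__
#  endif
#endif

// Hypothetical helper, not part of the real header: rounds n up to the
// next multiple of align (align must be non-zero).
__device__ inline std::size_t sketch_round_up(std::size_t n, std::size_t align) {
  return (n + align - 1) / align * align;
}
```

In a host-only build the macro expands to nothing, so sketch_round_up(10, 8)
evaluates to 16 like any ordinary inline function.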
Differential Revision: https://reviews.llvm.org/D91807