Standard libc++ headers in stdc++ mode include <new>, which picks up
cuda_wrappers/new before any of the CUDA macros have been defined.
We cannot include CUDA headers that early, so the workaround is to
postpone the CUDA-specific parts until after <new> is included from user
sources and all CUDA-related macros and declarations are available.
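
One way to express this postponement is an include guard that undoes itself
when the header is reached too early. The sketch below is a minimal
illustration under stated assumptions, not the actual wrapper: the names
SKETCH_WRAPPERS_NEW, SKETCH_DEVICE, sketch_device_alloc and
sketch_device_parts_emitted are hypothetical, SKETCH_DEVICE stands in for
whatever macro the CUDA headers would define, and a real wrapper would
forward to the underlying <new> with #include_next rather than #include.

```cpp
// Hypothetical wrapper header body. All SKETCH_* names are made up for
// illustration; SKETCH_DEVICE plays the role of a CUDA-provided macro.
#ifndef SKETCH_WRAPPERS_NEW
#define SKETCH_WRAPPERS_NEW

#include <cstddef>
#include <new>  // a real wrapper would use the #include_next extension here

#if !defined(SKETCH_DEVICE)
// Reached too early (for example from inside the standard library): the
// CUDA-like macros are not defined yet. Undo the include guard so that a
// later include of this header gets another chance.
#undef SKETCH_WRAPPERS_NEW
#else
// The macros are available, so the device-specific declarations can be
// emitted. The guard stays defined and later includes become no-ops.
SKETCH_DEVICE inline void *sketch_device_alloc(std::size_t) { return nullptr; }
#endif

#endif  // SKETCH_WRAPPERS_NEW

// Observing the outcome in a translation unit that never defines
// SKETCH_DEVICE: the guard has been undone and nothing was emitted.
#ifdef SKETCH_WRAPPERS_NEW
constexpr bool sketch_device_parts_emitted = true;
#else
constexpr bool sketch_device_parts_emitted = false;
#endif
```

With this shape an early include costs nothing beyond forwarding to the
underlying <new>, and the CUDA-specific part is emitted exactly once, on the
first include that sees the required macros.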