The current OpenMP wrapper header __clang_openmp_device_functions.h
does not include the header for CUDA builtin variables, so variables such as threadIdx
cannot be used in OpenMP code, even inside a declare target region.
This patch includes that header. One open question: is it still fine to use
the name __clang_openmp_device_functions.h? The builtin variables do not really seem to be
part of "device functions".
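
For illustration, a minimal sketch of the kind of code this is meant to allow. The helper name lane_in_block is hypothetical and not part of the patch, and the sketch assumes the device pass targets NVPTX (where clang defines __NVPTX__) with the wrapper header making threadIdx visible. On the host, or without offloading, it falls back to 0 so the file still compiles.

```cpp
// Hypothetical helper, not part of the patch.
#pragma omp declare target
unsigned lane_in_block() {
#if defined(_OPENMP) && defined(__NVPTX__)
  // Device pass for an NVPTX target: assumes the wrapper header now makes
  // the CUDA builtin variable threadIdx visible here.
  return threadIdx.x;
#else
  // Host pass (or a build without OpenMP offloading): no builtin variables.
  return 0u;
#endif
}
#pragma omp end declare target
```

Without the include, the threadIdx reference in the device pass would be rejected as an undeclared identifier.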
Perhaps we should move all C++-related code under #ifdef __cplusplus instead of cherry-picking the pieces one by one, and let compilation fail if some C code references the builtin vars.
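
A rough sketch of that layout, using made-up stand-in names (hypothetical_idx_t, hypothetical_threadIdx, hypothetical_lane) rather than the real header contents: everything that needs C++ sits in a single guarded region, and nothing is declared for C.

```cpp
// Hypothetical stand-ins; not the actual contents of the wrapper header.
#ifdef __cplusplus
// C++-only region: everything that relies on C++ features lives here.
struct hypothetical_idx_t {
  unsigned x, y, z;
};
static const hypothetical_idx_t hypothetical_threadIdx = {0u, 0u, 0u};

inline unsigned hypothetical_lane() { return hypothetical_threadIdx.x; }
#endif
// In a C translation unit nothing above is declared, so any reference to
// hypothetical_threadIdx fails to compile with an undeclared-identifier error.
```

The trade-off is that C users get a hard error instead of a silently missing feature, which seems to be the behaviour suggested here.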