[libomptarget] Fix devicertl build
The target specific functions in target_interface are extern C, but the
implementations for nvptx were mostly C++ mangling. That worked out as
a quirk of DEVICE macro expanding to nothing, except for shuffle.h which
only forward declared the functions with C++ linkage.
Also implements GetWarpSize, as used by shuffle, and includes target_interface
in nvptx target_impl.cu to help catch future divergence between interface and
implementation.
clang-tidy: error: use of undeclared identifier '__builtin_amdgcn_ds_bpermute' [clang-diagnostic-error]
not useful