[libomptarget] Refactor macros from omptarget-nvptx to inline functions
Removes architecture-dependent macros from omptarget-nvptx.h in favour of inline
functions in target_impl.h. Inlines the sole use of #define __SYNCTHREADS_N.
Uses explicit types in preference to the int / unsigned of the CUDA API.
An alternative to the inline #ifdef would be:
#if CUDA_VERSION >= 9000
#include "cuda_9/target_impl.h"
#else
#include "tbd/target_impl.h"
#endif
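For comparison with the per-version headers above, the inline #ifdef approach might look roughly like the sketch below. It assumes the version-dependent intrinsic is a warp shuffle, which this message does not state. The names sketch_shfl_sync and SKETCH_INLINE are hypothetical, and the host-side stand-ins exist only so the sketch compiles without nvcc; none of this is taken from target_impl.h.

```cpp
#include <cstdint>

#ifdef __CUDACC__
#include <cuda.h>  // defines CUDA_VERSION
#define SKETCH_INLINE __device__ __forceinline__
#else
// Host-only stand-ins (assumptions for illustration, not the CUDA definitions).
#include <cassert>
#define SKETCH_INLINE static inline
#define CUDA_VERSION 9000
static inline int __shfl_sync(unsigned, int var, int) { return var; }
#endif

// One typed inline function replaces a version-dependent macro.
// The #if lives inside the function body, and the parameters use
// fixed-width types instead of the int / unsigned of the CUDA API.
SKETCH_INLINE int32_t sketch_shfl_sync(uint32_t mask, int32_t var,
                                       int32_t src_lane) {
#if CUDA_VERSION >= 9000
  return __shfl_sync(mask, var, src_lane);
#else
  (void)mask;  // the pre-9.0 intrinsic takes no lane mask
  return __shfl(var, src_lane);
#endif
}
```

Keeping the #if inside the function leaves every call site identical across CUDA versions; the per-version header alternative moves the same choice to include time instead.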
Tested by disassembling libomptarget-nvptx.a with and without this patch. No codegen change.
This is wrong; just call __syncthreads() here.