[libomptarget][devicertl] Port amdgcn devicertl to openmp
Uses macros to abstract over the CUDA and OpenMP syntax for shared variables.
CUDA ignores unknown #pragma directives, so that macro plus declare target is
mostly sufficient. Nvptx still compiles as CUDA for now.
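A minimal sketch of the macro approach described above, not the runtime's
actual code. The names DEVICE, SHARED, STRINGIFY_PRAGMA, team_counter and
bump_team_counter are hypothetical. The use of an OpenMP allocate directive
with omp_pteam_mem_alloc for team-local storage is an assumption, and the host
fallback branch exists only so the sketch builds with any C++ compiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro names, chosen for illustration only.
#if defined(__CUDACC__) || defined(__CUDA__)
// CUDA: block-shared storage is spelled with the __shared__ qualifier.
#define DEVICE __device__
#define SHARED(TYPE, NAME) __shared__ TYPE NAME
#elif defined(_OPENMP) && defined(__AMDGCN__)
// OpenMP device compile (assumed spelling): request team-local memory
// through an allocate directive attached to the variable.
#include <omp.h>
#define DEVICE
#define STRINGIFY_PRAGMA(X) _Pragma(#X)
#define SHARED(TYPE, NAME)                                                     \
  TYPE NAME;                                                                   \
  STRINGIFY_PRAGMA(omp allocate(NAME) allocator(omp_pteam_mem_alloc))
#else
// Host fallback: an ordinary global, so the sketch can be built and tested.
#define DEVICE
#define SHARED(TYPE, NAME) TYPE NAME
#endif

// A CUDA compile ignores these unknown pragmas; an OpenMP compile uses them
// to mark the enclosed declarations as device code.
#pragma omp declare target

SHARED(uint32_t, team_counter);

// Increments the shared counter and returns the new value.
DEVICE uint32_t bump_team_counter() {
  assert(team_counter < UINT32_MAX); // guard against wrap-around
  team_counter += 1;
  return team_counter;
}

#pragma omp end declare target
```

The same source then serves both front ends: only the macro definitions differ,
while the declarations and function bodies stay identical.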
This change is localised to the runtime. It can be improved later by changing
the compiler, e.g. to implicitly wrap a compilation unit in the declare target
pragma, or to bring the intrinsics into scope implicitly.
Trunk clang doesn't accept these flags just yet, but it also refuses to compile
this code as -x hip, so this isn't much of a regression.