[OpenMP][DeviceRTL] Extract shuffle idiom and port it to declare variant
The shuffle idiom is implemented differently on each of our supported
targets. To shrink the "target_impl" file, we move the shuffle idiom into
its own self-contained header, which provides the implementations for
AMDGPU and NVPTX. A fallback can be added later on.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D95752