Rebase D59319 on trunk
Creating a new diff for convenience instead of modifying D59319.
Changes from D59319:
- Fix up includes and cmake to match trunk.
- Minor fixes to comments.
- Move the shared_buffer struct into target_region.cu.
- In __kmpc_target_region_kernel_parallel, change UseSPMDMode from
  uint16_t to int16_t, as the clang patch calls it with (int)-1.
- Change memcpy to __builtin_memcpy, as the <cstring> memcpy may be unavailable.
Is this still CUDA?
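
For reference, a minimal host-side sketch of why the UseSPMDMode and memcpy changes listed above matter. Everything named here (sees_minus_one_unsigned, sees_minus_one_signed, Payload, copy_payload, check_examples) is hypothetical and only for illustration; none of it is the runtime's real interface. It also assumes a GCC- or Clang-compatible compiler, since __builtin_memcpy is a compiler builtin rather than standard C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the old parameter type. An argument of (int)-1
// is converted to 65535 on the way in, so the comparison below never holds.
bool sees_minus_one_unsigned(std::uint16_t UseSPMDMode) {
  return UseSPMDMode == -1; // 65535 is promoted to int and compared with -1
}

// Hypothetical stand-in for the new parameter type. (int)-1 is representable
// in int16_t, so the callee can still see the -1 it was given.
bool sees_minus_one_signed(std::int16_t UseSPMDMode) {
  return UseSPMDMode == -1;
}

// Hypothetical payload type, used only to have something to copy.
struct Payload {
  int a;
  float b;
};

// Copies a Payload with the compiler builtin, so no <cstring> declaration
// of memcpy is needed (assumes GCC or Clang).
Payload copy_payload(const Payload &src) {
  Payload dst;
  __builtin_memcpy(&dst, &src, sizeof(Payload));
  return dst;
}

// Small self-check exercising both points.
bool check_examples() {
  int arg = -1; // mirrors the (int)-1 argument mentioned in the summary
  assert(!sees_minus_one_unsigned(static_cast<std::uint16_t>(arg)));
  assert(sees_minus_one_signed(static_cast<std::int16_t>(arg)));

  Payload p{7, 2.5f};
  Payload q = copy_payload(p);
  assert(q.a == 7 && q.b == 2.5f);
  return true;
}
```

The explicit static_casts in check_examples only make the conversion visible; an implicit conversion at a call site would behave the same way.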