This patch adds support for unified memory for the regular maps that are performed when a target region is offloaded to the device.
When only a single version of the data is required, the host address can be used directly. When variables need to be privatized or globalized in any way, the copy to the device is still required for correctness.
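To make the distinction concrete, here is a minimal user-side sketch, not taken from the patch. The function name sumDoubled and its variables are made up for illustration. Under the requires directive, the mapped array is the kind of data for which a single version is enough, so the runtime may use the host address, while the firstprivate scalar is still privatized per region. Whether a given compiler and runtime actually skip the copy is an assumption here.

```cpp
#include <vector>

// Ask for unified shared memory; must appear at file scope.
#pragma omp requires unified_shared_memory

// Hypothetical example: doubles every element inside a target region
// and returns the sum of the doubled values.
int sumDoubled(std::vector<int> &Data) {
  int *Ptr = Data.data();
  int N = static_cast<int>(Data.size());
  int Scale = 2; // privatized: each region gets its own copy
  int Sum = 0;

  // Ptr[0:N] is a regular map: only one version of the data is needed,
  // so with unified memory the host address can be used as is.
  #pragma omp target teams distribute parallel for \
      map(tofrom: Ptr[0:N]) map(tofrom: Sum) \
      firstprivate(Scale) reduction(+: Sum)
  for (int I = 0; I < N; ++I) {
    Ptr[I] *= Scale;
    Sum += Ptr[I];
  }
  return Sum;
}
```

Without OpenMP support the pragmas are ignored and the loop runs on the host with the same result, which makes the sketch easy to check independently of any device.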
Apparently, this does not work: the generated code calls __tgt_register_lib first, which caches the global RequiresFlags in Device.RTLRequiresFlags. Because __tgt_register_requires has not been called yet at that point, the cached value is still 0, so the new code is never executed. Please fix this and verify on your end that it also works with older versions of the compiler!
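The ordering problem can be reproduced with a small toy model. This is a sketch only: ToyRuntime, ToyDevice, registerLib, registerRequires, and the flag value are stand-ins for illustration and are not the real libomptarget code. It only models the behavior described above, where the lib registration snapshots the global flags into the device.

```cpp
#include <cassert>
#include <cstdint>

// Assumed placeholder value for the unified shared memory flag.
constexpr int64_t kUnifiedSharedMemory = 0x8;

// Stand-in for a device that keeps a cached copy of the flags.
struct ToyDevice {
  int64_t RTLRequiresFlags = 0;
};

// Stand-in for the runtime's global state.
struct ToyRuntime {
  int64_t RequiresFlags = 0; // global flags, 0 until requires is registered
  ToyDevice Device;

  // Models __tgt_register_lib: caches the current global flags in the device.
  void registerLib() { Device.RTLRequiresFlags = RequiresFlags; }

  // Models __tgt_register_requires: sets the global flags.
  void registerRequires(int64_t Flags) { RequiresFlags = Flags; }

  // True if the device-side cache reports unified shared memory.
  bool deviceSeesUnifiedMemory() const {
    return (Device.RTLRequiresFlags & kUnifiedSharedMemory) != 0;
  }
};

// Runs both registration calls in the given order and reports
// what the device-side cache ends up containing.
bool runInOrder(bool RequiresFirst) {
  ToyRuntime RT;
  if (RequiresFirst) {
    RT.registerRequires(kUnifiedSharedMemory);
    RT.registerLib();
  } else {
    RT.registerLib(); // caches 0, because requires has not run yet
    RT.registerRequires(kUnifiedSharedMemory);
  }
  return RT.deviceSeesUnifiedMemory();
}

// Usage: the flag only reaches the device when requires is registered first.
void checkOrderingMatters() {
  assert(runInOrder(true));
  assert(!runInOrder(false));
}
```

In this model the fix could be either to register the requirements before the library, or to read the global flags lazily instead of caching them at registration time. Which of the two fits the actual runtime is left open here.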