This adds a debug option that synchronizes immediately after GPU kernel launches and data transfers. It is enabled through an environment variable. The check currently lives in the common plugin interface, so it should apply to all architectures. Alternatively, we could do it inside the individual architecture implementations, for example by calling "cudaStreamSynchronize" immediately after "cudaLaunchKernel". I think the current approach is basically equivalent, though, and it does not have to be repeated for each architecture.
The environment variable is currently named "LIBOMPTARGET_FORCE_SYNCHRONIZE".
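To illustrate the idea, here is a minimal sketch of how a common layer could read the variable and synchronize right after each submission. This is not the patch itself: the helper names (parseForceSynchronize, forceSynchronizeEnabled, submitAndMaybeSync) are hypothetical, the callbacks stand in for the plugin's real launch/transfer and synchronize entry points, and the parsing rule (unset, empty, or "0" means disabled) is an assumption, not a documented default.

```cpp
#include <cstdlib>
#include <cstring>
#include <functional>

// Hypothetical helper: interpret the raw value of the environment variable.
// Assumption: unset, empty, or "0" disables the option; any other value
// enables it. The actual patch may use a different convention.
inline bool parseForceSynchronize(const char *Value) {
  return Value != nullptr && Value[0] != '\0' && std::strcmp(Value, "0") != 0;
}

// Hypothetical helper: query the environment once per call.
inline bool forceSynchronizeEnabled() {
  return parseForceSynchronize(std::getenv("LIBOMPTARGET_FORCE_SYNCHRONIZE"));
}

// Hypothetical stand-in for the common plugin layer. Submit represents an
// asynchronous kernel launch or data transfer; Synchronize represents the
// architecture-specific wait (for CUDA, something like cudaStreamSynchronize).
// Both return 0 on success.
inline int submitAndMaybeSync(const std::function<int()> &Submit,
                              const std::function<int()> &Synchronize,
                              bool ForceSync) {
  int RC = Submit();
  if (RC != 0)
    return RC; // Do not wait on work that failed to submit.
  return ForceSync ? Synchronize() : 0;
}
```

Because the wait is issued from the shared layer, each architecture only has to provide its own synchronize callback, and the decision to call it stays in one place.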
Please mention the default value.