This patch unifies our libomptarget API in two ways:
- always pass a __tgt_async_info object; its Queue member determines whether it is in use.
- (almost) always synchronize in the interface layer rather than in the omptarget layer.
A side effect is that we now put all constructor and static initializer
kernels in a stream too, if the device utilizes __tgt_async_info.
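
For illustration only, the pattern can be sketched roughly as follows. This is
not the code in the patch: AsyncInfoSketch, DeviceSketch, and interfaceEntry
are hypothetical stand-ins, and the sketch assumes that a null Queue means
"not in use". It only shows that the async info object is always passed down
and that the interface-layer entry point synchronizes once at the end.

```cpp
#include <cassert>

// Hypothetical stand-in for __tgt_async_info.
// Assumption: a null Queue means the object is not in use.
struct AsyncInfoSketch {
  void *Queue = nullptr;
};

// Hypothetical device that may or may not use asynchronous queues.
struct DeviceSketch {
  bool SupportsAsync;
  int PendingOps = 0;   // enqueued but not yet synchronized
  int CompletedOps = 0; // finished operations
  int SyncCalls = 0;    // number of synchronize() calls

  explicit DeviceSketch(bool Async) : SupportsAsync(Async) {}

  // Every operation receives the async info object; the device decides
  // whether to make use of it.
  void submit(AsyncInfoSketch &AI) {
    if (SupportsAsync) {
      AI.Queue = this; // mark the queue as in use
      ++PendingOps;
    } else {
      ++CompletedOps;  // synchronous device: the operation completes now
    }
  }

  // Drain all pending operations and release the queue.
  void synchronize(AsyncInfoSketch &AI) {
    CompletedOps += PendingOps;
    PendingOps = 0;
    AI.Queue = nullptr;
    ++SyncCalls;
  }
};

// Hypothetical interface-layer entry point: it owns the async info object,
// passes it to every operation, and synchronizes once before returning.
int interfaceEntry(DeviceSketch &Dev, int NumOps) {
  assert(NumOps >= 0 && "operation count must be non-negative");
  AsyncInfoSketch AI;
  for (int I = 0; I < NumOps; ++I)
    Dev.submit(AI);
  if (AI.Queue) // only synchronize if the device actually used the queue
    Dev.synchronize(AI);
  return Dev.CompletedOps;
}
```

In this sketch a device that never touches Queue pays no synchronization
cost, while an asynchronous device is synchronized exactly once per entry
point.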
The patch contains a TODO which can be addressed as we add support for
asynchronous malloc and free in the plugin API. The TODO marks the only
synchronizeAsyncInfo call left in the omptarget layer.
Side note: On a V100 system the GridMini performance for small sizes
more than doubled.
It's good to unify the usage of device id and device object, but that is probably better split out into a separate patch.