This patch unifies our libomptarget API in two ways:
- always pass a __tgt_async_info object; its Queue member determines whether it is in use.
- (almost) always synchronize in the interface layer and not in the omptarget layer.
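
As a rough illustration of the two points above, the following is a minimal,
self-contained sketch and not the actual libomptarget code. AsyncInfoSketch,
DeviceSketch, and interfaceEntry are hypothetical names invented for this
example; only the idea of a Queue member that signals whether the object is
in use is taken from the description.

```cpp
#include <cassert>

// Hypothetical stand-in for __tgt_async_info. Only the Queue member is
// taken from the patch description; everything else is an assumption.
struct AsyncInfoSketch {
  void *Queue = nullptr; // nullptr: no stream/queue attached, nothing pending
};

// Hypothetical device that owns a single "stream" and counts operations.
struct DeviceSketch {
  int Stream = 0;     // placeholder for a real stream handle
  int PendingOps = 0; // operations enqueued but not yet synchronized
  int Syncs = 0;      // number of real synchronizations performed

  // Every operation receives an async info object; the device attaches a
  // queue lazily the first time it actually needs one.
  void submit(AsyncInfoSketch &AI) {
    if (!AI.Queue)
      AI.Queue = &Stream;
    ++PendingOps;
  }

  // Synchronizing is a no-op when the Queue member was never set.
  void synchronize(AsyncInfoSketch &AI) {
    if (!AI.Queue)
      return;
    PendingOps = 0;
    ++Syncs;
    AI.Queue = nullptr;
  }
};

// Sketch of an interface-layer entry point: it creates the async info
// object, passes it to every operation, and synchronizes once at the end.
int interfaceEntry(DeviceSketch &D, int NumOps) {
  AsyncInfoSketch AI;
  for (int I = 0; I < NumOps; ++I)
    D.submit(AI);
  D.synchronize(AI);
  assert(D.PendingOps == 0 && "all work must be done on return");
  return D.Syncs;
}
```

In this sketch the inner operations never synchronize themselves; the single
synchronization point sits in the entry function, and it costs nothing when
no queue was ever attached.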
A side effect is that we now put all constructor and static initializer
kernels in a stream too, if the device utilizes __tgt_async_info.
The patch contains a TODO which can be addressed once we add support for
asynchronous malloc and free in the plugin API. The TODO marks the only
synchronizeAsyncInfo call left in the omptarget layer.
Side note: On a V100 system the GridMini performance for small sizes
more than doubled.