If the plugin supports asynchronous malloc and free, we can avoid early
synchronization and hide more runtime work as part of the ongoing kernel
Why aren't we passing an AsyncInfo object as an extra argument to deleteOnDevice and subsequently calling DeviceAllocator.free() with that AsyncInfo instead of nullptr? I suspect it's because we need to make room for subsequent allocations before attempting to re-allocate. If that's the case, can you add a comment to make it clearer?
If there is a reason why deleteOnDevice() cannot be called with an extra AsyncInfo argument, why are we adding it here? It's not used anywhere in the body of free().
What about async malloc?
Pass StreamPoolTy& as an argument; then there is no need for DeviceId.
There is one object of this class per device. I would expect that removing DeviceId lets you replace the vectors with a single element.
The current Event is intended for D2H; you expanded its usage. With one record after the alloc and another record after the transfer, I think this will go wrong: once the event is fulfilled after the alloc, another thread may think the transfer has been completed.
That is somewhat orthogonal; a cleanup for all this is following.
So checking whether we also do D2H would help partially but not completely. The stickiness of the event is still a problem. It might also be a problem in the pure D2H case with multiple transfers, as only the first is properly guarded.
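To make the concern concrete, here is a toy model of the window where an event recorded after the alloc looks fulfilled before the transfer is guarded. `ToyStream`, `ToyEvent`, and `observerSeesFulfilledBeforeTransfer` are hypothetical names for illustration only, not the plugin's API, and the model assumes operations on a stream complete in submission order.

```cpp
#include <cstddef>

// Toy model only: these types are hypothetical and are not the plugin's API.
// A stream counts submitted and finished operations, in submission order.
struct ToyStream {
  std::size_t Enqueued = 0;  // operations submitted so far
  std::size_t Completed = 0; // operations finished so far
  void enqueue() { ++Enqueued; }
  void completeNext() {
    if (Completed < Enqueued)
      ++Completed;
  }
};

// An event captures the stream position at the time of the latest record().
struct ToyEvent {
  bool Recorded = false;
  std::size_t Target = 0;
  void record(const ToyStream &S) {
    Recorded = true;
    Target = S.Enqueued;
  }
  bool isFulfilled(const ToyStream &S) const {
    return Recorded && S.Completed >= Target;
  }
};

// Returns true if an observer sees the event fulfilled although the transfer
// has not completed (here it has not even been submitted yet).
bool observerSeesFulfilledBeforeTransfer() {
  ToyStream S;
  ToyEvent E;
  S.enqueue();      // async alloc submitted
  E.record(S);      // first record: guards the alloc only
  S.completeNext(); // alloc finishes
  bool SeenByOtherThread = E.isFulfilled(S); // another thread queries here
  S.enqueue();      // transfer submitted afterwards
  E.record(S);      // second record: guards the transfer
  bool TransferDone = S.Completed >= 2;
  return SeenByOtherThread && !TransferDone;
}
```

In this sketch the query between the two records reports the event as fulfilled, so a thread that treats the event as "data is on the device" would proceed too early. The same pattern applies to several transfers sharing one event if only the first record is waited on.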
Sorry, I meant H2D, not D2H. But it seems you got my point. Both 1 and 2 are desired. I thought about 2 when the event was introduced, but I didn't immediately figure out the best location to do 2.