If the plugin supports asynchronous malloc and free we can avoid early
synchronization and hide more runtime work as part of the ongoing kernel
execution.
Details
- Reviewers
jhuber6 ggeorgakoudis tianshilei1992 ye-luo
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
openmp/libomptarget/plugins/common/MemoryManager/MemoryManager.h | ||
---|---|---|
146–152 | We aren't we passing an AsyncInfo object as an extra argument to deleteOnDevice and subsequently call DeviceAllocator.free() with that AsyncInfo instead of nullptr? I suspect it's because we need to make room for subsequent allocations before attempting to re-allocate, if that's the case can you add a comment to make it clearer? | |
296 | If there is a reason why deleteOnDevice() cannot be called with an extra AsyncInfo argument, we are we adding it here? It's not used anywhere in the body of free(). |
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
---|---|---|
1517 | why would we need a sync before calling data alloc? |
openmp/libomptarget/include/device.h | ||
---|---|---|
305 | What about async malloc? | |
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
321 | pass StreamPoolTy& as an argument and there is no need of DeviceId | |
371 | This class is one object per device. I would expect removing DeviceId replace vectors with a single element. | |
openmp/libomptarget/src/device.cpp | ||
290 | The current Event is intended for D2H, you expanded its usage. In case of one record after alloc and another record after the transfer. I think this will go wrong. Once the event is fulfilled after alloc. other thread may think the transfer has been completed. |
openmp/libomptarget/include/device.h | ||
---|---|---|
305 | I'll rename. | |
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
321 | sure. | |
371 | That is somewhat orthogonal, a cleanup for all this is following. | |
openmp/libomptarget/src/device.cpp | ||
290 | Right. So checking if we also do D2H would partially help but not completely. The stickiness of the event is still a problem. It might also be for the pure D2H case with multiple transfers as only the first is properly guarded.
|
openmp/libomptarget/src/device.cpp | ||
---|---|---|
290 | Sorry I meant H2D not D2H. But seems you got my point. Both 1 and 2 are desired. I thought about 2 when the event was introduced but I didn't immediately figure out what is the best location to do 2. |
cuDeviceGetAttribute has to be used to check if the device supports memory pool; otherwise runtime error will be emitted.
What about async malloc?