If the plugin supports asynchronous malloc and free we can avoid early
synchronization and hide more runtime work as part of the ongoing kernel
execution.
Details
- Reviewers
jhuber6 ggeorgakoudis tianshilei1992 ye-luo
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| openmp/libomptarget/plugins/common/MemoryManager/MemoryManager.h | ||
|---|---|---|
| 146–152 | We aren't we passing an AsyncInfo object as an extra argument to deleteOnDevice and subsequently call DeviceAllocator.free() with that AsyncInfo instead of nullptr? I suspect it's because we need to make room for subsequent allocations before attempting to re-allocate, if that's the case can you add a comment to make it clearer? | |
| 296 | If there is a reason why deleteOnDevice() cannot be called with an extra AsyncInfo argument, we are we adding it here? It's not used anywhere in the body of free(). | |
| openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
|---|---|---|
| 1517 | why would we need a sync before calling data alloc? | |
| openmp/libomptarget/include/device.h | ||
|---|---|---|
| 305 | What about async malloc? | |
| openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
| 321 | pass StreamPoolTy& as an argument and there is no need of DeviceId | |
| 371 | This class is one object per device. I would expect removing DeviceId replace vectors with a single element. | |
| openmp/libomptarget/src/device.cpp | ||
| 290 | The current Event is intended for D2H, you expanded its usage. In case of one record after alloc and another record after the transfer. I think this will go wrong. Once the event is fulfilled after alloc. other thread may think the transfer has been completed. | |
| openmp/libomptarget/include/device.h | ||
|---|---|---|
| 305 | I'll rename. | |
| openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
| 321 | sure. | |
| 371 | That is somewhat orthogonal, a cleanup for all this is following. | |
| openmp/libomptarget/src/device.cpp | ||
| 290 | Right. So checking if we also do D2H would partially help but not completely. The stickiness of the event is still a problem. It might also be for the pure D2H case with multiple transfers as only the first is properly guarded.
| |
| openmp/libomptarget/src/device.cpp | ||
|---|---|---|
| 290 | Sorry I meant H2D not D2H. But seems you got my point. Both 1 and 2 are desired. I thought about 2 when the event was introduced but I didn't immediately figure out what is the best location to do 2. | |
cuDeviceGetAttribute has to be used to check if the device supports memory pool; otherwise runtime error will be emitted.
What about async malloc?