Without explicitly unregistering host memory, repeated runs (e.g., when benchmarking the same kernel) fail, in CUDA for example, with
'cuMemHostRegister(ptr, sizeBytes, 0)' failed with 'CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED'
Differential D147277
Add gpu::HostUnregisterOp (Closed, Public). Authored by makslevental on Mar 30 2023, 2:49 PM.
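A toy C++ model can make the failure mode in the summary concrete. It is a sketch under stated assumptions, not the CUDA driver API: ToyHostRegistry, ToyResult and runOnce are hypothetical names, and the registry tracks exact pointers rather than the address ranges the real driver tracks. Registering the same buffer on every benchmark iteration fails from the second iteration on, mirroring CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED, while pairing each registration with an unregistration keeps every iteration clean.

```cpp
#include <cassert>
#include <set>

// Hypothetical result codes; AlreadyRegistered plays the role of
// CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED in this toy model.
enum class ToyResult { Success, AlreadyRegistered, NotRegistered };

// Toy stand-in for the driver's table of page-locked (pinned) host memory.
// NOT the CUDA API: it tracks exact pointers, not address ranges.
class ToyHostRegistry {
public:
  ToyResult registerHost(const void *ptr) {
    assert(ptr != nullptr && "cannot register a null buffer");
    // insert() reports false when the pointer is already present.
    return registered_.insert(ptr).second ? ToyResult::Success
                                          : ToyResult::AlreadyRegistered;
  }

  ToyResult unregisterHost(const void *ptr) {
    // erase() returns how many entries were removed (0 or 1 for a set).
    return registered_.erase(ptr) == 1 ? ToyResult::Success
                                       : ToyResult::NotRegistered;
  }

private:
  std::set<const void *> registered_;
};

// One hypothetical benchmark iteration: pin the buffer, (run the kernel),
// and optionally unpin it again before the next iteration.
ToyResult runOnce(ToyHostRegistry &reg, const void *buf, bool unregisterAfter) {
  ToyResult r = reg.registerHost(buf);
  if (r != ToyResult::Success)
    return r;
  // ... kernel launch and timing would happen here ...
  if (unregisterAfter)
    return reg.unregisterHost(buf);
  return ToyResult::Success;
}
```

The unregisterAfter = true path corresponds to pairing the existing host_register op with the unregister op added in this revision once the buffer is no longer needed by the device.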
Event Timeline
Otherwise it seems symmetrical to the host_register op, so it seems like an obvious good addition to me, but it would be best if someone else focusing on GPUs could confirm!
ftynse added inline comments.
This revision is now accepted and ready to land. (Apr 6 2023, 5:01 AM)
makslevental marked an inline comment as done.
Closed by commit rG8f7c8a6ea765: Add gpu::HostUnregisterOp (authored by makslevental). (Apr 6 2023, 1:13 PM)
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 509825:
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
mlir/lib/ExecutionEngine/RocmRuntimeWrappers.cpp
As far as I can tell, the calling convention requires the first argument to be the rank of the memref, so hostUnregisterCallBuilder must have exactly the same signature as the host-register call builder (even though the rank isn't used in the actual runtime call).
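The calling-convention point can be sketched in C++. The assumption, consistent with how MLIR lowers an unranked memref to a (rank, descriptor pointer) pair, is that the generated call passes the rank first, then the descriptor, then the element size in bytes. ToyStridedMemRef, toyHostUnregister and toyHostUnregisterMemRef are hypothetical stand-ins, not the actual wrapper names in CudaRuntimeWrappers.cpp or RocmRuntimeWrappers.cpp. The unregister entry point accepts the rank only so its signature matches the register one; it needs nothing but the buffer's start address.

```cpp
#include <cassert>
#include <cstdint>

// Simplified strided memref descriptor (allocated ptr, aligned ptr, offset,
// sizes, strides), modeled on MLIR's StridedMemRefType<T, N>.
template <typename T, int N>
struct ToyStridedMemRef {
  T *basePtr;
  T *data;
  int64_t offset;
  int64_t sizes[N];
  int64_t strides[N];
};

// Hypothetical hook standing in for the call that actually unpins memory
// (cuMemHostUnregister in the CUDA case). Here it only records the pointer.
static void *lastUnregistered = nullptr;
static void toyHostUnregister(void *ptr) { lastUnregistered = ptr; }

// Same (rank, descriptor, elementSizeBytes) parameter list as the register
// entry point, so the lowering can build both calls the same way.
// C linkage gives generated code an unmangled symbol to call.
extern "C" void toyHostUnregisterMemRef(int64_t rank,
                                        ToyStridedMemRef<char, 1> *descriptor,
                                        int64_t elementSizeBytes) {
  (void)rank; // Required by the calling convention, but unused here.
  assert(descriptor != nullptr && "descriptor must not be null");
  // offset is counted in elements, so scale it to bytes before adding.
  char *ptr = descriptor->data + descriptor->offset * elementSizeBytes;
  toyHostUnregister(ptr);
}
```

Because data and offset precede the rank-dependent sizes and strides arrays, a rank-1 view of the descriptor is enough to locate the buffer, which is why the rank never has to be inspected.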