This is an archive of the discontinued LLVM Phabricator instance.

Add gpu::HostUnregisterOp
ClosedPublic

Authored by makslevental on Mar 30 2023, 2:49 PM.

Details

Summary

Without explicitly unregistering host memory, you will get

'cuMemHostRegister(ptr, sizeBytes, 0)' failed with 'CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED'

in CUDA (for example) when the same buffer is registered again on repeated runs (e.g., while benchmarking the same kernel).
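The failure mode can be illustrated with a small mock. This is only a sketch under stated assumptions: MockHostRegistry, MockResult and countFailedRegistrations are hypothetical names invented for illustration, and the mock tracks pointers in a set rather than calling the CUDA driver.

```cpp
#include <cstddef>
#include <set>

// Hypothetical stand-in for a table of registered host pointers.
// This is NOT the CUDA API; it only mimics the failure described above.
enum class MockResult { Success, AlreadyRegistered, NotRegistered };

class MockHostRegistry {
public:
  // Mimics registration: fails if the pointer is already registered.
  MockResult registerHost(void *ptr, std::size_t /*sizeBytes*/) {
    return registered.insert(ptr).second ? MockResult::Success
                                         : MockResult::AlreadyRegistered;
  }

  // Mimics unregistration: fails if the pointer was never registered.
  MockResult unregisterHost(void *ptr) {
    return registered.erase(ptr) == 1 ? MockResult::Success
                                      : MockResult::NotRegistered;
  }

private:
  std::set<void *> registered;
};

// Runs `iterations` benchmark rounds over the same buffer and returns how
// many rounds failed to register it.
int countFailedRegistrations(MockHostRegistry &registry, void *buffer,
                             std::size_t sizeBytes, int iterations,
                             bool unregisterEachRound) {
  int failures = 0;
  for (int i = 0; i < iterations; ++i) {
    if (registry.registerHost(buffer, sizeBytes) != MockResult::Success)
      ++failures;
    // ... the kernel launch would happen here ...
    if (unregisterEachRound)
      registry.unregisterHost(buffer);
  }
  return failures;
}
```

In this mock, every round after the first fails when the buffer is never unregistered, and no round fails when each registration is paired with an unregistration.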

Event Timeline

makslevental created this revision. Mar 30 2023, 2:49 PM
makslevental requested review of this revision. Mar 30 2023, 2:49 PM
makslevental added inline comments. Mar 30 2023, 2:51 PM
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
167 ↗(On Diff #509825)

As far as I can tell, the calling convention requires the rank of the memref to be passed as the first argument, so the hostUnregisterCallBuilder must have exactly the same signature (even though the rank isn't used in the actual runtime call).

makslevental edited the summary of this revision. Mar 30 2023, 2:51 PM

Can you upload the patch with full context?

Otherwise it seems symmetrical to the host_register op, so that seems like an obvious good addition to me, but it'd be best if someone else focusing on GPUs could confirm!

makslevental updated this revision to Diff 509854.
ftynse accepted this revision. Apr 6 2023, 5:01 AM
ftynse added inline comments.
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
472 ↗(On Diff #509876)

Please expand auto unless there is a cast on the RHS or the type is difficult to spell (lambdas, iterators).

167 ↗(On Diff #509825)

An unranked memref gets lowered to { i64, ptr } in LLVM, where the first element is the rank and the second is a pointer to the ranked descriptor. At the function boundary, the struct is unpacked into two individual arguments, hence the interface you see.
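A minimal C++ sketch of that convention, for illustration only: UnrankedMemRefSketch, sketchHostUnregister and callWithUnpackedArgs are hypothetical names, not the actual runtime wrappers or the code in this patch.

```cpp
#include <cstdint>

// Illustrative mirror of the lowered unranked memref: { i64, ptr }.
struct UnrankedMemRefSketch {
  std::int64_t rank; // first element: the rank
  void *descriptor;  // second element: pointer to the ranked descriptor
};

// Hypothetical callee. The struct arrives as two separate arguments, so the
// rank is part of the signature even though this callee does not use it.
void *sketchHostUnregister(std::int64_t rank, void *descriptor) {
  (void)rank; // present only because the convention passes it
  return descriptor; // returned so the caller can check what was received
}

// Caller side: unpack the struct into individual arguments at the call.
void *callWithUnpackedArgs(const UnrankedMemRefSketch &memref) {
  return sketchHostUnregister(memref.rank, memref.descriptor);
}
```

The callee takes the rank first and the descriptor pointer second, which matches the order of the struct fields after unpacking.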

This revision is now accepted and ready to land. Apr 6 2023, 5:01 AM
makslevental marked an inline comment as done.
This revision was automatically updated to reflect the committed changes.