[OpenMP] Improve D2D memcpy to use more efficient driver API

Authored by tianshilei1992 on Jun 4 2020, 1:58 PM.

Description

Summary:
In the current implementation, a D2D memcpy first copies the data back to the
host and then copies it from the host to the destination device. This is very
inefficient if the device supports D2D memcpy natively, as CUDA does.

With this patch, a D2D memcpy first tries the natively supported driver API. If
that fails, it falls back to the original host-staged copy. Note that D2D memcpy
in this scenario covers two cases:

  • Same device: this is the D2D memcpy in CUDA terms.
  • Different devices: this is the PeerToPeer memcpy in CUDA terms.

My implementation merges these two cases. It chooses the best API according to
the source device and the destination device.
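
The selection and fallback described above can be sketched roughly as follows.
This is a minimal illustration, not the code from the patch: CopyBackend,
CopyPath and device_to_device_copy are hypothetical names, and the driver calls
are hidden behind callbacks so the sketch runs without a GPU. In a CUDA backend
the same-device and peer callbacks would plausibly wrap driver functions such as
cuMemcpyDtoDAsync and cuMemcpyPeerAsync, but the summary does not say which
calls the patch actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical names throughout; this is an illustration, not the patch.
enum class CopyPath { SameDevice, PeerToPeer, HostStaged, Failed };

// Each callback returns true on success. The native callbacks may be empty
// if the backend has no such driver call.
struct CopyBackend {
  std::function<bool(void *dst, const void *src, std::size_t size)>
      same_device_copy;
  std::function<bool(int dst_dev, void *dst, int src_dev, const void *src,
                     std::size_t size)>
      peer_copy;
  // Original behaviour: device -> host, then host -> device.
  std::function<bool(int dst_dev, void *dst, int src_dev, const void *src,
                     std::size_t size)>
      host_staged_copy;
};

// Try the native path that matches the device pair; if it is missing or
// fails, fall back to the host-staged copy.
CopyPath device_to_device_copy(const CopyBackend &backend, int src_dev,
                               const void *src, int dst_dev, void *dst,
                               std::size_t size) {
  assert(backend.host_staged_copy && "the fallback path must be available");

  if (src_dev == dst_dev) {
    if (backend.same_device_copy &&
        backend.same_device_copy(dst, src, size))
      return CopyPath::SameDevice;
  } else {
    if (backend.peer_copy &&
        backend.peer_copy(dst_dev, dst, src_dev, src, size))
      return CopyPath::PeerToPeer;
  }

  return backend.host_staged_copy(dst_dev, dst, src_dev, src, size)
             ? CopyPath::HostStaged
             : CopyPath::Failed;
}
```

Returning the path that was taken is a choice made for this sketch only; it
makes the fallback visible to the caller and easy to check in a test.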

Reviewers: jdoerfert, AndreyChurbanov, grokos

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D80649

Details