The entries inside a "target data end" are processed in three steps:
- Query internal data maps for the entries and dispatch any necessary device-side operations (i.e., data retrieval);
- Synchronize such operations;
- Update the host-side pointers and remove any entry whose reference counter reached zero.
Such steps may be executed by multiple threads which may even operate on
the same entries. The current implementation (D121058) tries to
synchronize these threads by tracking the "owner" for the deletion of
each entry using their thread ID. Unfortunately, it may fail to do so
for the following reasons:
- The owner is assigned only in the first step, and only if the reference count is 0 when the map is queried. This does not work when the owner thread is faster than a previous thread that is still processing the same entry in another "target data end", leading to use-after-free problems.
- The entry is only added for post-processing (step 3) if its reference count was 0 at query time (step 1). This does not allow threads to exchange responsibility for the deletion, leading again to use-after-free problems.
- An entry may appear multiple times in the arguments array of a "target data end", which may cause the entry to be deleted prematurely, leading, again, to use-after-free problems.
This patch addresses these problems by tracking all the threads that are
using an entry in a "target data end" region through a counter, ensuring
only the last one deletes it when needed. It also ensures that all
entries that are successfully found inside the data maps in step 1 are
also processed in step 3, regardless of whether their reference count
reached zero at query time. This ensures the deletion ownership may be
passed to any thread that is using such an entry.
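
The counter-based scheme described above can be illustrated with a minimal, self-contained sketch. It is not the code from this patch: `Entry`, `DataMap`, `DataEndThreads`, `beginDataEnd`, `finishDataEnd`, and `targetDataEnd` are hypothetical names chosen for illustration, a single mutex stands in for whatever locking the runtime really uses, and the device-side work of step 2 is omitted.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a host-to-device mapping entry.
struct Entry {
  int RefCount = 0;       // Mapping reference count.
  int DataEndThreads = 0; // Uses currently in flight inside "target data end".
};

// Hypothetical stand-in for the internal data map.
struct DataMap {
  std::mutex Mtx;
  std::map<const void *, Entry> Entries;
  int Deletions = 0; // Bookkeeping for the example only.

  // Step 1: query the map, drop one reference, and register one user.
  // Returns false if the pointer is not mapped.
  bool beginDataEnd(const void *HstPtr) {
    std::lock_guard<std::mutex> Guard(Mtx);
    auto It = Entries.find(HstPtr);
    if (It == Entries.end())
      return false;
    if (It->second.RefCount > 0)
      --It->second.RefCount;
    ++It->second.DataEndThreads;
    return true;
  }

  // Step 3: unregister one user. The entry is erased only by the last
  // user, and only if the reference count is zero at that point.
  // Returns true if this call erased the entry.
  bool finishDataEnd(const void *HstPtr) {
    std::lock_guard<std::mutex> Guard(Mtx);
    auto It = Entries.find(HstPtr);
    assert(It != Entries.end() && "entry must outlive its registered users");
    Entry &E = It->second;
    --E.DataEndThreads;
    if (E.DataEndThreads != 0 || E.RefCount != 0)
      return false;
    Entries.erase(It);
    ++Deletions;
    return true;
  }
};

// One "target data end" over an arguments array (may contain duplicates).
void targetDataEnd(DataMap &Map, const std::vector<const void *> &Args) {
  std::vector<const void *> Found;
  for (const void *Ptr : Args) // Step 1.
    if (Map.beginDataEnd(Ptr))
      Found.push_back(Ptr);
  // Step 2: device-side operations would be synchronized here (omitted).
  for (const void *Ptr : Found) // Step 3: every found entry is revisited.
    Map.finishDataEnd(Ptr);
}
```

Because every entry found in step 1 is revisited in step 3, whichever use finishes last performs the deletion, no matter which one observed the reference count reaching zero. An entry listed twice in the same arguments array registers two uses, so the first `finishDataEnd` call leaves it in place and the second one erases it.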
Make a separate header, this is reusable.