This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget] Avoid deleting the same entry multiple times
Needs Review · Public

Authored by ye-luo on Aug 23 2022, 10:44 PM.

Details

Reviewers
jdoerfert
Summary

Once the refcount drops to 0, remove the entry from the HDTTMap but delay the deletion of the actual entry.
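
A minimal sketch of the idea, assuming simplified stand-ins for the real libomptarget types (EntryTy, releaseEntry, and DeferredDeletions are illustrative names, not the actual implementation):

```cpp
#include <cstdint>
#include <list>
#include <map>
#include <memory>
#include <mutex>

// Illustrative stand-in for the real HostDataToTargetTy entry.
struct EntryTy {
  const void *HstPtrBegin = nullptr;
  int64_t RefCount = 1;
  // ... target pointer, size, mapping flags, etc.
};

struct DeviceTy {
  std::mutex Mtx;
  // Host-to-device lookup table (HDTTMap in libomptarget).
  std::map<const void *, std::shared_ptr<EntryTy>> HDTTMap;
  // Entries already unlinked from the map whose copy back is still in
  // flight; they are destroyed only after the D2H transfer completes.
  std::list<std::shared_ptr<EntryTy>> DeferredDeletions;

  // When the refcount drops to 0, remove the entry from HDTTMap so that
  // no new lookup can find it, but delay the deletion of the entry.
  void releaseEntry(const void *HstPtrBegin) {
    std::lock_guard<std::mutex> Lock(Mtx);
    auto It = HDTTMap.find(HstPtrBegin);
    if (It == HDTTMap.end())
      return;
    if (--It->second->RefCount == 0) {
      DeferredDeletions.push_back(It->second); // keep alive for copy back
      HDTTMap.erase(It);                       // invisible to new lookups
    }
  }
};
```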

Diff Detail

Event Timeline

ye-luo created this revision. · Aug 23 2022, 10:44 PM
Herald added a project: Restricted Project. · Aug 23 2022, 10:44 PM
ye-luo requested review of this revision. · Aug 23 2022, 10:44 PM

jdoerfert added a comment.

I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and the copy to, since the entry is gone while the copy back runs.

ye-luo added a comment.

> I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and the copy to, since the entry is gone while the copy back runs.

I thought about the race. If two threads race to write the same data, there is no harm. If they race to write different data, the source of the race is the user program, which needs to protect against it anyway.

gValarini added a comment. · Edited · Aug 24 2022, 7:39 AM

> I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and the copy to, since the entry is gone while the copy back runs.

> I thought about the race. If two threads race to write the same data, there is no harm. If they race to write different data, the source of the race is the user program, which needs to protect against it anyway.

I believe what @jdoerfert is trying to say is that the race will happen between a thread deleting the device data and another one making the data transfer. Imagine the following case:

  • Thread A: is not the "owner" (IsOwned == false) of the entry but will do a data transfer.
  • Thread B: is the "owner" (IsOwned == true) of the entry and will delete it.

If A is switched out of execution right after getTgtPtrBegin returns, B might remove the data before A starts any data transfer, leading to a segfault on the device side. Although this looks like a problem with the user program, I think it should lead to data corruption, not a segfault. I also believe this may happen in correct programs, but I cannot think of a simple example right now.
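
To make the interleaving concrete, here is one possible timeline (pseudo-code; deleteEntry, freeDevice, and submitData are hypothetical placeholders, not real libomptarget calls):

```cpp
// Hypothetical interleaving leading to a device-side use-after-free.
//
//   Thread A (IsOwned == false)      Thread B (IsOwned == true)
//   ---------------------------      --------------------------
//   TgtPtr = getTgtPtrBegin(...);
//   <preempted>
//                                    deleteEntry(Entry);
//                                    freeDevice(TgtPtr);
//   submitData(TgtPtr, ...);         // TgtPtr now dangles: corruption
//                                    // or a device-side fault
```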

I was planning to send a patch with this same fix this week. It was quite similar to this one, but it also had a new reference counter designed to track the threads using the entry in the data end function. This way, multiple threads may get the entry, but only the last one would delete the buffer and the entry data.
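
A minimal sketch of what such a tracking counter could look like, assuming atomic counters on the entry (the names are illustrative; the actual patch may structure this differently):

```cpp
#include <atomic>
#include <cstdint>

// Illustrative entry with a second counter for threads that are still
// inside the data end path for this entry.
struct EntryTy {
  std::atomic<int64_t> RefCount{1}; // OpenMP map reference count
  std::atomic<int> DataEndUsers{0}; // threads currently using the entry
};

// Each thread entering targetDataEnd for this entry registers itself.
void beginDataEnd(EntryTy &E) { E.DataEndUsers.fetch_add(1); }

// After its copy back has finished, each thread deregisters. Only the
// last thread out of a dead entry (refcount already 0) returns true and
// is then responsible for freeing the device buffer and the entry data.
bool endDataEnd(EntryTy &E) {
  bool LastUser = E.DataEndUsers.fetch_sub(1) == 1;
  return LastUser && E.RefCount.load() == 0;
}
```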

jdoerfert added a comment.

> I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and the copy to, since the entry is gone while the copy back runs.

> I thought about the race. If two threads race to write the same data, there is no harm. If they race to write different data, the source of the race is the user program, which needs to protect against it anyway.

I still believe there is a problem.

T1: copy back old device memory into HostMem after entry has been deleted
T2: copy from HostMem into some new device region as the entry is new
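
Spelled out as a timeline (pseudo-code; submitD2H and submitH2D are hypothetical placeholders):

```cpp
// Hypothetical interleaving on the same HostMem buffer.
//
//   T1 (refcount hit 0, unmapping)   T2 (fresh map of the same host ptr)
//   ------------------------------   -----------------------------------
//   erase entry from HDTTMap;
//   submitD2H(HostMem, OldTgtPtr);
//   <async copy back in flight>      lookup misses, create a new entry;
//                                    submitH2D(NewTgtPtr, HostMem);
//
// The two copies belong to different entries, so no per-entry event can
// order them: T2 may read HostMem before T1's copy back has landed.
```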

We cannot use events or anything else to prevent this race on the host memory, and since maps should be atomic, we break the program.

Where am I wrong?

ye-luo added a comment.

> I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and the copy to, since the entry is gone while the copy back runs.

> I thought about the race. If two threads race to write the same data, there is no harm. If they race to write different data, the source of the race is the user program, which needs to protect against it anyway.

> I believe what @jdoerfert is trying to say is that the race will happen between a thread deleting the device data and another one making the data transfer. Imagine the following case:
>
>   • Thread A: is not the "owner" (IsOwned == false) of the entry but will do a data transfer.
>   • Thread B: is the "owner" (IsOwned == true) of the entry and will delete it.
>
> If A is switched out of execution right after getTgtPtrBegin returns, B might remove the data before A starts any data transfer, leading to a segfault on the device side. Although this looks like a problem with the user program, I think it should lead to data corruption, not a segfault. I also believe this may happen in correct programs, but I cannot think of a simple example right now.
>
> I was planning to send a patch with this same fix this week. It was quite similar to this one, but it also had a new reference counter designed to track the threads using the entry in the data end function. This way, multiple threads may get the entry, but only the last one would delete the buffer and the entry data.

Your example is valid. I think we probably need to distinguish transfers not caused by the refcount dropping to 0 from transfers that are caused by the refcount dropping to 0.
A transfer not caused by the refcount dropping to 0 needs to be issued before actually adjusting the refcount. We probably need to write down a few rules before devising a scheme.
Another, more conservative route I thought about is to only schedule ahead transfers when refcount=INF, as sketched below.
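
A minimal sketch of that conservative rule, assuming an illustrative sentinel for the infinite refcount (not the actual libomptarget representation):

```cpp
#include <cstdint>
#include <limits>

// Illustrative sentinel for an "infinite" reference count.
constexpr int64_t RefCountInf = std::numeric_limits<int64_t>::max();

// A schedule-ahead transfer is only allowed when the entry can never be
// deleted: an infinite refcount never reaches 0, so the deletion race
// discussed in this thread cannot occur for such entries.
bool mayScheduleAheadTransfer(int64_t RefCount) {
  return RefCount == RefCountInf;
}
```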

Regarding tracking threads: my quick take is that it will be another can of worms. I need to think about it when your patch shows up.

ye-luo added a comment.

> I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and the copy to, since the entry is gone while the copy back runs.

> I thought about the race. If two threads race to write the same data, there is no harm. If they race to write different data, the source of the race is the user program, which needs to protect against it anyway.

> I still believe there is a problem.
>
> T1: copy back old device memory into HostMem after the entry has been deleted
> T2: copy from HostMem into some new device region as the entry is new
>
> We cannot use events or anything else to prevent this race on the host memory, and since maps should be atomic, we break the program.
>
> Where am I wrong?

You are right. There is potential data corruption. The H2D transfer to a new device memory location may start before the D2H transfer completes.

gValarini added a comment.

> Regarding tracking threads: my quick take is that it will be another can of worms. I need to think about it when your patch shows up.

I have just submitted patch D132676 with the implementation that I described before. I believe that by tracking the threads in a scoped manner, we may reduce the work of maintaining the solution. What do you think?