This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][Offloading] Fix data race in data mapping by using two locks
ClosedPublic

Authored by tianshilei1992 on Jun 18 2021, 11:00 AM.

Details

Summary

This patch tries to partially fix one of the two data race issues reported in
[1] by introducing a per-entry mutex. Additional discussion can also be found in
D104418, which will also be refined to fix another data race problem.

Here is how it works. Like before, DataMapMtx is still being used for mapping
table lookup and update. In any case, we will get a table entry. If we need to
make a data transfer (update the data on the device), we need to lock the entry
right before releasing DataMapMtx, and the issue of data transfer should be
after releasing DataMapMtx, and the entry is unlocked afterwards. This can
guarantee that: 1) issue of data movement is not in critical region, which will
not affect performance too much, and also will not affect other threads that don't
touch the same entry; 2) if another thread accesses the same entry, the state of
data movement is consistent (which requires that a thread must first get the
update lock before getting data movement information).

For a target that doesn't support async data transfer, issue of data movement is
data transfer. This two-lock design can potentially improve concurrency compared
with the design that guards data movement with DataMapMtx as well. For a target
that supports async data movement, we could simply attach the event between the
issue of data movement and unlock the entry. For a thread that wants to get the
event, it must first get the lock. This can also get rid of the busy wait until
the event pointer is valid.

Reference:
[1] https://bugs.llvm.org/show_bug.cgi?id=49940

Diff Detail

Event Timeline

tianshilei1992 requested review of this revision.Jun 18 2021, 11:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 18 2021, 11:00 AM
tianshilei1992 edited the summary of this revision. (Show Details)
tianshilei1992 edited the summary of this revision. (Show Details)
openmp/libomptarget/src/device.cpp
20

Hmm, not sure why it is here. Might be included by clangd automatically.

openmp/libomptarget/src/device.h
221

IsHostPtr will be removed later by changing the return type of this function to a data structure containing flags, iterator to the entry, etc., as shown in D104382.

gValarini added inline comments.Jun 18 2021, 1:00 PM
openmp/libomptarget/src/device.cpp
206

As far as I can understand, this lock is protecting both the queries/updates to the data map tables and the data transfer issue, which indeed fixes the problem described in the patch.

But I think this can reduce the possibility of concurrent data transfers to the same device when there is no overlapping between the data regions (as discussed in 49940), especially when the device only implements the synchronous API. What do you think about implementing two locking regions?

  1. A DataMapMtx (the one already present) would be locked only when querying/updating the data map. This is mandatory since the mapping tables are global to each device and must be synchronized among the many runtime threads;
  2. A DataMapEntry (one per mapped entry, stored at the HostDataToTargetTy) would be locked during the data transfer issue stage, avoiding the data corruption problem.

This could allow multiple threads to issue (or even execute, on the sync. case) multiple data transfer while still being correctly synchronized.

Note: the second locking region must start right before DataMapMtx is unlocked, so the same thread that first allocated the data is the one that issues the data transfer. This is important because other threads that use the same data won't issue such transfer (unless the always modifier is used).

tianshilei1992 added inline comments.Jun 18 2021, 4:21 PM
openmp/libomptarget/src/device.cpp
206

Note: the second locking region must start right before DataMapMtx is unlocked, so the same thread that first allocated the data is the one that issues the data transfer. This is important because other threads that use the same data won't issue such transfer (unless the always modifier is used).

I'm not sure I understand your idea. If the second lock is inside the first lock, what's the point of it? For synchronous data movement, there is no issue stage. When the data movement is called, it just starts to copy and return when it's done.

ye-luo added a subscriber: ye-luo.Jun 18 2021, 5:18 PM
ye-luo added inline comments.
openmp/libomptarget/src/device.cpp
206

I think @gValarini means the second lock overlaps with the first lock not inside.
T1 : lock table -> lock entry A -> unlock table > issue transfer A -> unlock entry A
T2: lock table (only need to wait for the unlock of table from T1 instead of the unlock of entry A) -> lock entry B -> unlock table > issue transfer B -> unlock entry B.
In this way, transfer A and B can happen concurrently.

tianshilei1992 added inline comments.Jun 18 2021, 5:44 PM
openmp/libomptarget/src/device.cpp
206

Aha, interesting! Yeah, I get it. Pretty tricky. ;-)

introduced a per-entry lock to reduce resource contention

tianshilei1992 edited the summary of this revision. (Show Details)Jun 18 2021, 7:25 PM
tianshilei1992 retitled this revision from [OpenMP][Offloading] Guard data movement from host to device with mapping table lock to [OpenMP][Offloading] Fix data race in data mapping by using two locks.

rebase and ping

rebase and ping

What about other instances where we have access to the HostDataToTargetMap followed be data transfers, e.g. when removing a mapping and copy data back to the host? Shouldn't we apply the same logic there?

openmp/libomptarget/src/device.cpp
252–253

Redundant, if we use USM and don't have the close modifier, then MoveData will be false already.

openmp/libomptarget/src/omptarget.cpp
518–530

This part must be updated to match the changes introduced in D105121. Also, why do we set MoveData = true only when the ALWAYS flag is set? It should also be true if the mapping is new, but the problem here is that we don't know whether or not we have a new mapping because getTargetPointer (ex getOrAllocTgtPtr) hasn't been called yet. So I think MoveData should be a 3-state integer (maybe use an enum for it) which will pass instructions from targetDataBegin to getTargetPointer about what to do:

  • Case 0: false, the TO flag hasn't been set (or we are using USM without the close modifier etc.), so don't move data no matter what
  • Case 1: true, the TO and ALWAYS flags have been set, so move data
  • Case 2: undecided, the TO flag has been set but targetDataBegin doesn't know whether the mapping is new, so move data if RefCount==1

So this part of the logic should look something like this:

enum MoveState MoveData = MoveState::false;
if (arg_types[i] & OMP_TGT_MAPTYPE_TO) {
  if (!(PM->RTLs.RequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY) ||
      HasCloseModifier) {
    if (arg_types[i] & OMP_TGT_MAPTYPE_ALWAYS)
      MoveData = MoveState::true;
    else
      MoveData = MoveState::undecided;
  }
}
564–565

Here we also need to take care of this data transfer. Copying the pointee object onto the device and updating the device pointer should behave as a single atomic operation.

tianshilei1992 added inline comments.Jul 13 2021, 6:07 PM
openmp/libomptarget/src/omptarget.cpp
518–530

My intention is to cover new entry in getTargetPointer, but I did forget to set MoveData to true in the function if there is a new entry.

tianshilei1992 marked 4 inline comments as done.Jul 17 2021, 5:57 PM

What about other instances where we have access to the HostDataToTargetMap followed be data transfers, e.g. when removing a mapping and copy data back to the host? Shouldn't we apply the same logic there?

I didn't come up with a case that can cause data race with D2H because we have a mandatory synchronization at the end of target region.

grokos added inline comments.Jul 19 2021, 3:54 AM
openmp/libomptarget/src/device.cpp
266

This is not correct either. getTargetPointer() doesn't know whether the TO flag is set, yet it will perform a data transfer even if we just map(alloc : ...) as long as we are mapping a new entry. That's why I proposed that you use a 3-state integer instead of a bool for MoveData.

openmp/libomptarget/src/device.cpp
266

Oh, you're right. I'll update it.

update the logic to determine if data movement is needed

openmp/libomptarget/src/omptarget.cpp
518–530

I refined the logic but seems like it almost broke all declare mapper tests…Need to investigate further.

tianshilei1992 marked an inline comment as done.Jul 22 2021, 3:53 PM
tianshilei1992 marked an inline comment as done.

fixed shadow mapping

Ping. Hope the bug fix could go into 13.

grokos accepted this revision.Jul 23 2021, 1:09 PM

This patch looks good now! Let's commit it and then move on with D104418.

This revision is now accepted and ready to land.Jul 23 2021, 1:09 PM