This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Remove shadow pointer map and introduce consistent locking
ClosedPublic

Authored by jdoerfert on Apr 8 2022, 10:16 PM.

Details

Summary

The shadow pointer map was problematic as we scanned an entire list if
an entry had shadow pointers. The new scheme stores the shadow pointers
inside the entries. This allows easy access without any search. It also
helps us, but also makes it necessary, to define a consistent locking
scheme. The implicit locking of entries via TargetPointerResultTy makes
this pretty effortless, however one has to:

  • Lock HDTTMap before locking an entry.
  • Do not lock HDTTMap while holding an entry lock.
  • Hold the entry lock to read or modify an entry.

The changes to submitData and retrieveData have been made to ensure 2
when the LIBOMPTARGET_INFO flag is used. Most everything else happens by
itself as TargetPointerResultTy acts as a lock_guard for the entry. It
is a little complicated when we deal with multiple entries, especially
as they can be equal. However, one can still follow the rules with
reasonable effort.

LookupResult are now finally also looking the entry before it is
inspected. This is good even if we haven't run into a problem yet.

Diff Detail

Event Timeline

jdoerfert created this revision.Apr 8 2022, 10:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2022, 10:16 PM
jdoerfert requested review of this revision.Apr 8 2022, 10:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2022, 10:16 PM
Herald added a subscriber: sstefan1. · View Herald Transcript
jdoerfert updated this revision to Diff 421690.Apr 8 2022, 10:19 PM

Improve comment

tianshilei1992 added inline comments.Apr 13 2022, 9:40 AM
openmp/libomptarget/include/device.h
101

We don't use the order here, right? If so, unordered_map can usually get better performance.

jdoerfert added inline comments.Apr 13 2022, 11:03 AM
openmp/libomptarget/include/device.h
101

We do not use the order, correct. I can change it though I doubt it makes much of a difference.

jdoerfert added inline comments.Apr 13 2022, 12:53 PM
openmp/libomptarget/include/device.h
101

That comparison is not helpful. Our expected size is *probably* < 10.

tianshilei1992 added inline comments.Apr 13 2022, 1:04 PM
openmp/libomptarget/include/device.h
101

It also says:

TL;DR

Results: the unordered_map shows its superiority as soon as there is data in the map. The only time it exhibits worse performance than the ordered map is when the maps are empty.

The answer also shows the search speed is different, not just insertion speed.

Actually, based on all the uses of ShadowPtrInfos, it doesn't have to be a map at all because we never need the key-value relation. It can be simply an unordered_set<ShadowPtrInfoTy> hashed by ShadowPtrInfoTy::HstPtrAddr. The only place we need the "hash" feature is when adding an element.

The new scheme stores the shadow pointers inside the entries.

Makes a lot more sense now.

openmp/libomptarget/include/device.h
378

I don't get the intention of this function. Need documentation. It also seems buggy. Don't you need to unlock Entry before assign?

388

Please delete copy constructor and copy assign operator.

415

Do you know what this is? Add some documentation?

I'm afraid you are abusing existing mutex. Please be specific about what part of the Entry you are trying to protect with TPR lock/unlock.

openmp/libomptarget/include/device.h
375
void lock() const { States->UpdateMtx.lock(); }
void unlock() const { States->UpdateMtx.unlock(); }

the mutex is only intended to protect data transfer submission.

jdoerfert added inline comments.Apr 14 2022, 1:14 PM
openmp/libomptarget/include/device.h
375

The mutex is now used to protect the entire entry. I will update the comment.
While the data transfer is running the entry should not be looked at, similarly, while anything might modify the entry, the entry should not be looked at. Hence, one can only get the entry locked.

378

It's not buggy, IMHO. Assignment will swap the entry, which may or may not be locked. The new object goes out of scope immediately which will trigger the destructor. If there is an entry it will be unlocked, if there was none, all is already good.

This function is needed to give up ownership of an entry early, that is while the TPR is still live but effectively not needed anymore.

388

Aren't they deleted anyway? I can make it explicit, sure.

415

We do know what this is. And we should add some documentation. We should do so in a separate patch though.

ye-luo added inline comments.Apr 14 2022, 1:37 PM
openmp/libomptarget/include/device.h
378

You are right. The new object will be responsible to call the unlock via the destructor. So this is correct.

388

I'm not sure if they are deleted by default. So helpful to make them deleted.

415

Then do it separately. Probably also for PendingCtorsDtorsPerLibrary PendingCtorsDtors;

jdoerfert updated this revision to Diff 423309.Apr 17 2022, 1:06 PM
jdoerfert marked 2 inline comments as done.

Fix rebase issues, address comments

jdoerfert updated this revision to Diff 423314.Apr 17 2022, 2:38 PM

Use set instead of map

tianshilei1992 accepted this revision.Apr 18 2022, 1:43 PM

LGTM. Like @ye-luo suggested, we need to update the documents of lock and unlock as they are being used in a different way now.

This revision is now accepted and ready to land.Apr 18 2022, 1:43 PM
jdoerfert updated this revision to Diff 507188.Mar 21 2023, 5:24 PM
jdoerfert retitled this revision from [OpenMP][FIX] Remove shadow pointer map and introduce consistent locking to [OpenMP] Remove shadow pointer map and introduce consistent locking.
jdoerfert edited the summary of this revision. (Show Details)

Rebase, addressed comments, incorporated most parts of https://reviews.llvm.org/D123443.
The mapping stress tests slows down a little but running all AMD GPU tests is as fast as before.
Consistent locking of the entries should be worth it, and the stress test is unlikely anyway.
The set vs unordered set situation is resolved with a SmallSet of size 2, unlikely we attach more pointers into an object.

This revision was landed with ongoing or failed builds.Mar 21 2023, 7:17 PM
This revision was automatically updated to reflect the committed changes.

Got compiler warning.

/scratch3/opt/llvm-clang/llvm-project/openmp/libomptarget/src/omptarget.cpp: In lambda function:
/scratch3/opt/llvm-clang/llvm-project/openmp/libomptarget/src/omptarget.cpp:1058:56: warning: cast from type 'void* const*' to type 'void*' casts away qualifiers [-Wcast-qual]
 1058 |                                     (void *)&ShadowPtr.TgtPtrVal,
      |