Page MenuHomePhabricator

Automatic asynchronous execution of OpenMP Target Regions
Needs ReviewPublic

Authored by randreshg on Aug 20 2022, 9:50 AM.

Details

Summary

This patch allows Automatic asynchronous execution of OpenMP Target Regions on Nvidia GPUs.
When the environment variable LIBOMPTARGET_INTRA_THREAD_ASYNC is enabled
(LIBOMPTARGET_INTRA_THREAD_ASYNC=1), the implicit barrier that exists at the end
of every target region is removed and synchronization is only performed on memory transfers
or at the end of the program when the OpenMP runtime calls its destructors

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jdoerfert added inline comments.Aug 29 2022, 7:31 AM
openmp/libomptarget/include/device.h
340

Why is this static, that seems wrong.

475

Documentation, please.

openmp/libomptarget/include/omptarget.h
198–203

You cannot just remove this. See https://reviews.llvm.org/D132045, as it introduces a flag you can use to disable synchronization here.

openmp/libomptarget/src/device.cpp
69

Reverse the condition, single line alternative comes first, then the complex consequence (without the else).

81–83

Use early exits and no else after return.

josemonsalve2 added inline comments.Aug 31 2022, 2:25 PM
openmp/libomptarget/src/device.cpp
61–65

It may be better to merge these into:

DP("Asynchronous execution %s\n", AsyncFlag ? "Enabled" : "Disabled");

randreshg marked 4 inline comments as done.Sep 5 2022, 7:17 AM
randreshg added inline comments.Sep 5 2022, 7:31 AM
openmp/libomptarget/include/device.h
340

The idea of AsyncInfoMng is to have a way to control the AsyncInfo object to skip synchronization that is not needed. It's part of the Device class, so we can sync like this:

Device.syncAsyncInfo(AsyncInfo, true);

however, it is not necessary for every device to have a copy of AsyncInfoMng. That's why it's static.

randreshg marked 2 inline comments as done.Sep 13 2022, 6:26 AM
randreshg updated this revision to Diff 459736.EditedSep 13 2022, 6:36 AM

diff updated.

  • The AsyncInfoManager is now part of the omptarget.h and not device.h
  • Global variable AIM was added in interface.cpp
  • Flag was added to the Synchronization in asyncinfo destructor

Documentation added

Some more initial comments

openmp/docs/optimizations/OpenMPOpt.rst
113

Doesn't the length need to match?

116

For nowait regions this is maybe misleading. We should probably say they can be executed synchronously depending on the pragmas and implementation. In that case ...

134

This is misleading. Rather explain what would happen if the map on the target was triggering a transfer.

openmp/libomptarget/include/omptarget.h
232

This default is dangerous. Either swap it or avoid the default. Also, explain what it means to synchronize but not to force synchronize.

236

Use doxygen comments /// not //.

openmp/libomptarget/src/device.cpp
52

Unrelated

openmp/libomptarget/src/interface.cpp
25

We should not have a global like this. It probably belongs in/to the Device object.

254–259

A comment would be nice as this is the one place we do not force synchronization.

randreshg marked 8 inline comments as done.Sep 26 2022, 11:54 AM

This new version has the following changes:

  • Updated documentation
  • The AsyncInfoManager is part of the Device class and now has a map that contains thread ids as its key values.
jdoerfert added inline comments.Oct 18 2022, 10:40 AM
openmp/docs/optimizations/OpenMPOpt.rst
120
135–140
openmp/libomptarget/include/device.h
320

Documentation, please.

353

I would rather expose the AIM than have 3 functions that just forward to it.

openmp/libomptarget/include/omptarget.h
26

We can use LLVM data structures now too, e.g., maps.

openmp/libomptarget/src/interface.cpp
254–259

multi-line conditionals should have braces. The comment is unhelpful. It states what the code already says, not why the code is this way.

openmp/libomptarget/src/omptarget.cpp
58

Lot's of static flags to simply lookup a env var once. Why don't we do it the same way as for other env vars?

60
66
67

You did get an iterator, why do another lookup via [...]? You can do a single [...] lookup and use the reference result to check and update it.

69

And this is the 3rd lookup into the map..

70

Do we ever have to clear the map?

87

I doubt we need both these lines.

randreshg updated this revision to Diff 468932.Oct 19 2022, 8:50 AM
randreshg marked an inline comment as done.

This new patch addresses all previous comments from reviewers

randreshg marked 11 inline comments as done.Oct 19 2022, 8:51 AM

I know this is getting tiresome but we need to make sure people understand what's happening and this plays well with future extensions. More comments.

openmp/libomptarget/include/omptarget.h
195

ShouldSyncWhenDestroyed

232

You never return a nullptr, make it a reference.

240

The description is unhelpful. What AsyncInfo, etc. The function is also unused, do we need it?

openmp/libomptarget/src/interface.cpp
100

Now this pattern is somewhat unfortunate. You get the AsyncInfo from the AIM and then you need to be careful to call the right synchronize. If the AsynFlag was global you could move the new "sync" logic into the regular AsyncInfo sync, right? AIM would just be used to manage the map ID -> AsyncInfo. WDYT about this scheme? You could check the env variable once, like we do it for some others: https://github.com/llvm/llvm-project/blob/23bc343855fdf6fb7668abadf2b064034b207981/openmp/libomptarget/src/rtl.cpp#L43

openmp/libomptarget/src/omptarget.cpp
59–62
randreshg marked 6 inline comments as done.Oct 20 2022, 12:35 PM
randreshg added inline comments.
openmp/libomptarget/src/interface.cpp
100

Agreed!
The pattern you suggest allows for separating the synchronization logic and AsyncInfo management.

randreshg updated this revision to Diff 469594.Oct 21 2022, 6:59 AM
randreshg marked an inline comment as done.

I think this is fine for now. @ye-luo can this go in as opt-in feature that we probably refine as we go?

openmp/libomptarget/src/interface.cpp
100

Better. Not super happy about the explicit free call but that's fine for now.

How many testing has been done? it seems only workable on toy examples.

openmp/docs/optimizations/OpenMPOpt.rst
140

Through this example, I only see TT1 and TT2 racing when the async feature is enabled.

openmp/libomptarget/src/device.cpp
55

Why AIM captures a pointer instead of reference?

openmp/libomptarget/src/omptarget.cpp
30–38

It is mess. synchroning or not depends on a bunch of states. Move the if-statement to the caller side and make the logic directly exposed on the use side.

76

I think this is against coding principles, pass in a reference and delete its memory.

openmp/libomptarget/src/rtl.cpp
42

Is AsyncFlag documented?

ye-luo added inline comments.Oct 26 2022, 1:29 PM
openmp/libomptarget/include/omptarget.h
195

Add const

220

This is insufficient documentation. Please explain what this struct does actually not just what it is used for.

221

Why is this a struct instead of a class? It encourage code like AIM.AsyncInfoM[1]? Use class and appropriate private/public.

222

Why AsyncInfoTy needs to be associated with thread::id. Does this add unnecessary entanglement between target tasks.

ye-luo added inline comments.Oct 26 2022, 1:48 PM
openmp/libomptarget/include/omptarget.h
222

it is necessary to document the design choice that AsyncInfoTy objects and threadIDs have one to one mapping when AsyncFlag is true.

openmp/libomptarget/src/omptarget.cpp
59

New a pointer and then return its reference. "reference" types should not be used to manage the ownership.

openmp/libomptarget/src/rtl.cpp
42

I don't think this is a good design, that tries to modify the state of libomptarget from plugins, especially the corresponding env is called LIBOMPTARGET_ASYNC, which indicates it should be handled in libomptarget instead of in each plugin. Potentially I think we could add a plugin interface function to tell what opt-in feature is enabled. This could be done in a separate patch, but is better to land it before this patch. I don't think it makes sense to open the door (now) and then close it later, because I don't think "later" will come.

BTW, LIBOMPTARGET_ASYNC is a little bit confusing. It's too general.

openmp/libomptarget/include/omptarget.h
21–22

It is recommended to add a blank line between LLVM and STL headers.

222

Better to use LLVM ADT here

openmp/libomptarget/src/rtl.cpp
42

nvm, I didn't look at it right. It's in libomptarget. My bad. But the env name is true.

How many testing has been done? it seems only workable on toy examples.

Wrt toy examples:

openmp/docs/optimizations/OpenMPOpt.rst
140

As discussed yesterday, there is no race.

openmp/libomptarget/include/omptarget.h
222

This does not depend on tasks but threads. If threads are independent wrt offload, this extension allows them to run host and device code asynchronously. If threads are not independent wrt. offload, this extension cannot be used.

openmp/libomptarget/src/omptarget.cpp
30–38

Exposing this to the user side is not only not helpful but will actively harm things. The logic depends on two flags, not "a bunch of states". One is passed by the user, one describes the system setup. This is totally fine.

59

What about this:

Replace "get" with

AsyncInfoMng::register(AsyncInfo &AI);

Remove "free".
Register is implemented as no-op w/o the AsyncInfo flag set.
Otherwise it'll replace AI with a dynamically allocated one.

openmp/libomptarget/src/rtl.cpp
42

What about LIBOMPTARGET_INTRA_THREAD_ASYNC?

openmp/libomptarget/src/rtl.cpp
42

that sounds good!

ye-luo added inline comments.Oct 27 2022, 12:36 PM
openmp/libomptarget/include/omptarget.h
222

This does not depend on tasks but threads. If threads are independent wrt offload, this extension allows them to run host and device code asynchronously. If threads are not independent wrt. offload, this extension cannot be used.

Fair enough. This feature relies on thread id to impose dependency. It can only be used under certain restrictions.

openmp/libomptarget/src/omptarget.cpp
30–38

Two flags = 4 states. Not a low number of variants. Calling synchronize() but it may or may not do the sync. When should the user set ForceSync to true? Better to have some explanation.

59

Not getting what you mean.
I'm still expecting, AsyncInfoTy object being destroyed properly at the end of the target region, when AsyncInfo=false.

jdoerfert added inline comments.Oct 27 2022, 1:28 PM
openmp/libomptarget/src/omptarget.cpp
30–38

4 states, ok. However, they collapse to 2; it's a single conditional after all: synchronize or not.
The rules are fairly simple (one or condition) and documented:

/// Synchronize all pending actions when the LIBOMPTARGET_ASYNC env var
/// is disabled or when synchronization is forced (ForceSync = true) 
// Otherwise, synchronization is skipped
/// \returns OFFLOAD_FAIL or OFFLOAD_SUCCESS appropriately.
59

I'm still expecting, AsyncInfoTy object being destroyed properly at the end of the target region, when AsyncInfo=false.

It is properly destroyed with this patch, and it will be with the proposed scheme. In the proposed scheme we have a local AsyncInfo (as we have upstream now) and we use it if AsyncInfo=false.

randreshg added inline comments.Oct 28 2022, 12:28 PM
openmp/libomptarget/src/omptarget.cpp
59

Im not following this. Could you please provide a pseudocode?

randreshg marked 22 inline comments as done.Mon, Nov 21, 7:52 AM
randreshg marked 5 inline comments as done.
randreshg updated this revision to Diff 476906.Mon, Nov 21, 8:00 AM
randreshg edited the summary of this revision. (Show Details)

This patch addresses comments from reviewers:

  • Env var name is LIBOMPTARGET_INTRA_THREAD_ASYNC
  • The pattern to get and destroy the AsyncInfoTy object changed.
  • A DenseMap is used instead of the std::map.