This is an archive of the discontinued LLVM Phabricator instance.

OpenMP asynchronous memory copy support
ClosedPublic

Authored by jz10 on Oct 17 2022, 12:51 PM.

Details

Summary

We introduced the implementation of supporting asynchronous routines with depend objects specified in Version 5.1 of the OpenMP Application Programming Interface. In brief, these routines omp_target_memcpy_async and omp_target_memcpy_rect_async perform asynchronous (nonblocking) memory copies between any
combination of host and device pointers. The basic idea is to create the implicit tasks to carry the memory copy calls and handle dependencies specified by depend objects. The implicit tasks are executed via hidden helper thread in OpenMP runtime.

Diff Detail

Event Timeline

jz10 created this revision.Oct 17 2022, 12:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 17 2022, 12:51 PM
jz10 requested review of this revision.Oct 17 2022, 12:51 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 17 2022, 12:51 PM

Thanks for implementing this. I have added a comment inlined.

openmp/libomptarget/src/private.h
119

This would be the third location where this struct is duplicated: interop.h, kmp.h and this file. Would it make sense to try to add it to another common header file?

First set of comments.

clang/docs/ReleaseNotes.rst
253 ↗(On Diff #468290)

Unrelated, probably bad diff origin.

openmp/libomptarget/src/api.cpp
213

static also below.

217

LLVM codeing style, please. So here, Args

220

not 0 but nullptr, also lots of other places.

226

Why ignore the return value of memcpy and return 0 here?

242

5? Why 5?

247

Proper names, please. Also coding style.

252

This is very suspicious. Why can't we have a kmp_tasking_flags_t object?

265

Does this return anything? The use of Rc seems useless. Why do you access args_ for some parts and not for others? That said, where does the hidden helper need access to the dependences anyway?

436

This screams helper as it is literally the same code modulo 5 characters in a few places. Please refactor and run clang-format on the patch.

jz10 updated this revision to Diff 468695.Oct 18 2022, 2:29 PM

Thanks Johannes for your comments, and I relied them below

  1. format issues

I ran clang-format to reformat, please check if there's any missed things;

  1. replace '0' with 'nullptr'

fixed

  1. proper return value for helper functions and async functions

fixed

  1. Why can't we have a kmp_tasking_flags_t object?

I followed the same access approach in openmp/runtime , so I didn't change this part. But I can revise if it is needed

  1. make helper function as 'static'

fixed

  1. Why do you access args_ for some parts and not for others? That said, where does the hidden helper need access to the dependences anyway?

there's type cast for depend objects from 'omp_depend_t' to 'kmp_depend_info_t*', and the array of casted depend objects is consumed by '__kmpc_omp_task_with_deps' , to make it safe, I just make larray of casted depend objects to live longer, thus attached it to Args object.

  1. Why do you access args_ for some parts and not for others? That said, where does the hidden helper need access to the dependences anyway?

there's type cast for depend objects from 'omp_depend_t' to 'kmp_depend_info_t*', and the array of casted depend objects is consumed by '__kmpc_omp_task_with_deps' , to make it safe, I just make larray of casted depend objects to live longer, thus attached it to Args object.

So, you are saying the task_with_deps function does *not* copy the dependences and therefore the array has to outlive the function?

openmp/libomptarget/src/private.h
258

As mentioned before. There is a lot of literal duplication here with no obvious benefit. Use a base class, or other schemes, to cut down on that. Also, the functions further above should share the same code.

jz10 added a comment.Oct 18 2022, 3:05 PM

I'm not sure if it copies, will check it to confirm

tianshilei1992 requested changes to this revision.Oct 18 2022, 6:20 PM
tianshilei1992 added inline comments.
openmp/libomptarget/src/api.cpp
262

Why does it need a _ at the end?

407

I'm not sure if it is a failure if source and destination are same.

openmp/libomptarget/src/private.h
119

IIRC, there are some (non-technical) issues on using kmp.h in libomptarget.

181

proxy task is not needed in this patch. we don't have detached task.

222

This is really not LLVM code style.

tianshilei1992 added inline comments.Oct 18 2022, 6:20 PM
openmp/libomptarget/src/api.cpp
217

I didn't see Args is freed after use. There is memory leak.

openmp/libomptarget/src/private.h
197

argument name is not in LLVM style

This revision now requires changes to proceed.Oct 18 2022, 6:20 PM
jz10 updated this revision to Diff 469051.Oct 19 2022, 2:51 PM

Thanks Johannes and Shilei

  1. "So, you are saying the task_with_deps function does *not* copy the dependences and therefore the array has to outlive the function?"

I checked the omp_task_with_deps, it does some copy operations , so I move the depobj_list related code out of the TargetMemcpyArgsTy and TargetMemcpyRectArgsTy objects now. Those two classes are also redefined as 'struct'.

  1. "> This would be the third location where this struct is duplicated: interop.h, kmp.h and this file. Would it make sense to try to add it to another common header file?

IIRC, there are some (non-technical) issues on using kmp.h in libomptarget."
So should I keep the data structure definitions in private.h? or create a new header file?

  1. no release of Args

fixed

  1. "I'm not sure if it is a failure if source and destination are same"

we thought about this, and should allow user to do this

  1. "This is really not LLVM code style."

I changed the 'class' definition to 'struct' and reformat them. Please check if that works

  1. few formatting issues

fixed

jz10 updated this revision to Diff 469060.Oct 19 2022, 3:06 PM

one minors fix, i.e. delete the depobj_list from TargetMemcpyArgsTy struct

jdoerfert added inline comments.Oct 19 2022, 3:16 PM
openmp/libomptarget/src/api.cpp
273

Use an llvm::SmallVector

436

^^^

jz10 updated this revision to Diff 469110.Oct 19 2022, 9:22 PM

Thanks Johannes

  1. use SmallVector

fixed

  1. "module 5 characters"

ran clang-format , please check it that works

jz10 updated this revision to Diff 469112.Oct 19 2022, 9:55 PM
  1. use SmallVector

fixed

  1. "5 characters"

ran clang-format, please check if that works

jz10 updated this revision to Diff 469328.Oct 20 2022, 1:11 PM

redo the proper patch, i.e. get rid of content related to clang/doc

Thanks for the latest updates. The duplication is still my main concern here. See below.

openmp/libomptarget/src/api.cpp
373

My point about the "5 character difference" was that this function is basically the same as __kmpc_target_memcpy_async_helper and similarly omp_target_memcpy_rect_async is almost the same as omp_target_memcpy_async. Can we use common helper functions to avoid duplicating all that code twice?
(Put differently: I assume that you implemented one of these functions and then copied it and renamed a few things to get the other. Doing that should have made it obvious that there is duplication and we want to avoid duplication.)

439

That's not how vectors work. push_back(...)

openmp/libomptarget/src/private.h
228

These structs need doxygen comments describing where they are used.

jz10 updated this revision to Diff 469719.Oct 21 2022, 12:45 PM

Thanks Johannes and Shilei

  1. using a common helper function

fixed

  1. using push_back for SmallVector

fixed

  1. add doxygen comments for struct

added, please check if that works

jdoerfert added inline comments.Oct 21 2022, 12:58 PM
openmp/libomptarget/src/api.cpp
425

Now we have a first helper for the "async" part, we still should deduplicate the entry point code. Above, all but lines 385-387 are the same as in omp_target_memcpy_async. Can we also not duplicate those lines?

In this code there are also various places with variables not named according to the style guide, e.g. _Count, and with names not helpful, e.g. ErrXYZ. Please address those too. Similarly, use nullptr instead of NULL.

The problem with the int32 Flags I mentioned already. Let's avoid type punning and create the object with the right type from the very beginning.

tianshilei1992 added inline comments.Oct 21 2022, 1:07 PM
openmp/libomptarget/src/private.h
107

Can we put all KMP related code into a separate header, but of course not called kmp.h? I don't think it's a good idea to have them everywhere. That's quite unfortunate that we have to duplicate the code.

229

Having a variable suffix with _ is generally not a good coding style, not LLVM style. You can potentially use the same name. The compiler can tell them apart. For those values, such as NumDims, you could potentially do int NumDims = 0 when defining them.

jz10 updated this revision to Diff 469772.Oct 21 2022, 2:47 PM

Thanks Johannes and Shilei

  1. '385-387 are the same as in omp_target_memcpy_async. Can we also not duplicate those lines?'

I put the common code (i.e.helper task creation) into another static function

  1. 'In this code there are also various places with variables not named according to the style guide'

fixed, please check if there's remaining issues

  1. 'The problem with the int32 Flags I mentioned already. '

The flag variable was defined as 'kmp_int32', since its consumer '__kmpc_omp_target_task_alloc' needs 'kmp_int32' type as input. the type cast to 'kmp_tasking_flags_t' is to set the 'hidden_helper' bit. So it seems that there's no better option for us. Please let me know your suggestion.

  1. "Can we put all KMP related code into a separate header"

we used both kmp relevent data structure/types and APIs, so should I wrap all those relevant code into several tool functions and put them into separate header file?

  1. 'aving a variable suffix with _ is generally not a good coding style'

fixed

Thanks Johannes and Shilei

  1. '385-387 are the same as in omp_target_memcpy_async. Can we also not duplicate those lines?'

I put the common code (i.e.helper task creation) into another static function

I feel like I'm missing something. As said before, all but 3 lines are identical in these two functions.
Now you created a helper for 1/3 of those identical lines but left the other 2/3 being duplicated. Could
you elaborate why?

  1. 'In this code there are also various places with variables not named according to the style guide'

fixed, please check if there's remaining issues

  1. 'The problem with the int32 Flags I mentioned already. '

The flag variable was defined as 'kmp_int32', since its consumer '__kmpc_omp_target_task_alloc' needs 'kmp_int32' type as input. the type cast to 'kmp_tasking_flags_t' is to set the 'hidden_helper' bit. So it seems that there's no better option for us. Please let me know your suggestion.

Yeah,.. the API is broken. Let's keep it as is. Apologies to have brought it up.

  1. "Can we put all KMP related code into a separate header"

we used both kmp relevent data structure/types and APIs, so should I wrap all those relevant code into several tool functions and put them into separate header file?

  1. 'aving a variable suffix with _ is generally not a good coding style'

fixed

jz10 updated this revision to Diff 469810.Oct 21 2022, 3:51 PM

Thanks Johannes

  1. 'I feel like I'm missing something. As said before, all but 3 lines are identical in these two functions.

Now you created a helper for 1/3 of those identical lines but left the other 2/3 being duplicated. Could
you elaborate why?'
I put the common part into task_creation function

Looks good from my end. Thanks for the updates, I hope they make sense in the end.

Some minor style comments below. @tianshilei1992 Should accept once the header concerns are resolved.

openmp/libomptarget/src/api.cpp
263–264

You can delete a nullptr.

274–279
281–283
jz10 updated this revision to Diff 469845.Oct 21 2022, 6:37 PM

Thanks Johannes
I revised those issues, please check if those work

we used both kmp relevent data structure/types and APIs, so should I wrap all those relevant code into several tool functions and put them into separate header file?

IMO we can put all KMP related code into one header and include it where needed. For the others, the API header has to be included anyway.

jz10 added a comment.Oct 22 2022, 6:34 PM

Regarding this, can we just move those two helper functions to private.h ?

jz10 added a comment.Oct 24 2022, 2:23 PM

I checked through private.h, this header actually does the functionality
that contains all kmp and kmpc related data structures and APIs, so should
we still have to split a separate header file?

I checked through private.h, this header actually does the functionality
that contains all kmp and kmpc related data structures and APIs, so should
we still have to split a separate header file?

Yeah you are right. Nah, that's fine. Let it be for now.

Can you add a couple of tests?

openmp/libomptarget/src/api.cpp
250

potentially an auto *Fn is more appropriate here.

jz10 added a comment.Oct 24 2022, 3:25 PM

Sure, where should I add those tests?

openmp/libomptarget/test/api/ is where we usually tests those APIs.

jz10 updated this revision to Diff 470339.Oct 24 2022, 5:52 PM

Thanks Johannes and Shilei

I added few test cases for asynchronous routine at test/api folder

tianshilei1992 added inline comments.Oct 24 2022, 6:42 PM
openmp/libomptarget/test/api/omp_target_memcpy_async1.c
2

There is no RUN line here so the test will not be triggered. Can you refer to other test cases to see how a test is run?

jz10 updated this revision to Diff 470354.Oct 24 2022, 7:20 PM

Thanks Shilei

Add RUN line for each test cases

tianshilei1992 added inline comments.Oct 24 2022, 7:26 PM
openmp/libomptarget/test/api/omp_target_memcpy_async1.c
3

Does it work on AMDGPU and other targets? Why does it require Nvidia here?

openmp/libomptarget/test/api/omp_target_memcpy_rect_async3.c
21 ↗(On Diff #470354)

If I read this test correctly, this test is more for performance evaluation instead of correctness. We don't have any performance test cases yet, and I'm not sure we need them right now.

181 ↗(On Diff #470354)

If we don't profile the time, I don't see a point in keeping them.

jz10 updated this revision to Diff 470357.Oct 24 2022, 7:45 PM

Thanks Shilei

  1. "Does it work on AMDGPU and other targets? Why does it require Nvidia here?"

No, I just remove this line

  1. "We don't have any performance test cases yet, and I'm not sure we need them right now."

Yes, I remove those two performance test cases

jz10 added a subscriber: jdoerfert.Oct 25 2022, 6:56 PM

hi Johannes and Shilei, is there revision that needs to be done on this
patch? please let me know

Generally looks good to me. Can you check all resolved comments to make sure there is no open comments?

openmp/libomptarget/src/api.cpp
213

same here

245

Since this function is not part of libomp and it's not gonna be an interface function, no need to name it as __kmpc.

jz10 updated this revision to Diff 470684.Oct 25 2022, 7:45 PM

Thanks Shilei

  1. "Can you check all resolved comments to make sure there is no open comments?"

checked through the comments you and Johannes made, no more issues

  1. "Since this function is not part of libomp and it's not gonna be an interface function, no need to name it as __kmpc."

revised helper functions' names

jdoerfert accepted this revision.Oct 26 2022, 8:25 AM

LG, thanks for all the changes.

tianshilei1992 accepted this revision.Oct 26 2022, 9:36 AM

Thanks! LG.

This revision is now accepted and ready to land.Oct 26 2022, 9:36 AM
jplehr added a subscriber: jplehr.Nov 22 2022, 12:19 PM

Any reason this hasn't been pushed yet?

jplehr updated this revision to Diff 508955.Mar 28 2023, 4:03 AM

Rebase to make ready for land

jplehr updated this revision to Diff 509023.Mar 28 2023, 8:12 AM

Rebased and enabled tests for generic devices.
Resulted in one test failure

Failed Tests (3):
libomptarget :: amdgcn-amd-amdhsa :: api/omp_target_memcpy_rect_async1.c
libomptarget :: x86_64-pc-linux-gnu :: api/omp_target_memcpy_rect_async1.c
libomptarget :: x86_64-pc-linux-gnu-LTO :: api/omp_target_memcpy_rect_async1.c#

jplehr updated this revision to Diff 509581.Mar 30 2023, 2:15 AM

Fix bug to corectly support the maximally supported dimensions as required by the spec.

jplehr updated this revision to Diff 509583.Mar 30 2023, 2:19 AM

Removed accidentally added code

jdoerfert accepted this revision.Mar 30 2023, 11:59 AM

LG

openmp/libomptarget/src/api.cpp
386

!DST && !SRC

This revision was automatically updated to reflect the committed changes.