We introduced the implementation of supporting asynchronous routines with depend objects specified in Version 5.1 of the OpenMP Application Programming Interface. In brief, these routines omp_target_memcpy_async and omp_target_memcpy_rect_async perform asynchronous (nonblocking) memory copies between any
combination of host and device pointers. The basic idea is to create the implicit tasks to carry the memory copy calls and handle dependencies specified by depend objects. The implicit tasks are executed via hidden helper thread in OpenMP runtime.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for implementing this. I have added a comment inlined.
openmp/libomptarget/src/private.h | ||
---|---|---|
119 | This would be the third location where this struct is duplicated: interop.h, kmp.h and this file. Would it make sense to try to add it to another common header file? |
First set of comments.
clang/docs/ReleaseNotes.rst | ||
---|---|---|
253 ↗ | (On Diff #468290) | Unrelated, probably bad diff origin. |
openmp/libomptarget/src/api.cpp | ||
213 | static also below. | |
217 | LLVM codeing style, please. So here, Args | |
220 | not 0 but nullptr, also lots of other places. | |
226 | Why ignore the return value of memcpy and return 0 here? | |
242 | 5? Why 5? | |
247 | Proper names, please. Also coding style. | |
252 | This is very suspicious. Why can't we have a kmp_tasking_flags_t object? | |
265 | Does this return anything? The use of Rc seems useless. Why do you access args_ for some parts and not for others? That said, where does the hidden helper need access to the dependences anyway? | |
436 | This screams helper as it is literally the same code modulo 5 characters in a few places. Please refactor and run clang-format on the patch. |
Thanks Johannes for your comments, and I relied them below
- format issues
I ran clang-format to reformat, please check if there's any missed things;
- replace '0' with 'nullptr'
fixed
- proper return value for helper functions and async functions
fixed
- Why can't we have a kmp_tasking_flags_t object?
I followed the same access approach in openmp/runtime , so I didn't change this part. But I can revise if it is needed
- make helper function as 'static'
fixed
- Why do you access args_ for some parts and not for others? That said, where does the hidden helper need access to the dependences anyway?
there's type cast for depend objects from 'omp_depend_t' to 'kmp_depend_info_t*', and the array of casted depend objects is consumed by '__kmpc_omp_task_with_deps' , to make it safe, I just make larray of casted depend objects to live longer, thus attached it to Args object.
- Why do you access args_ for some parts and not for others? That said, where does the hidden helper need access to the dependences anyway?
there's type cast for depend objects from 'omp_depend_t' to 'kmp_depend_info_t*', and the array of casted depend objects is consumed by '__kmpc_omp_task_with_deps' , to make it safe, I just make larray of casted depend objects to live longer, thus attached it to Args object.
So, you are saying the task_with_deps function does *not* copy the dependences and therefore the array has to outlive the function?
openmp/libomptarget/src/private.h | ||
---|---|---|
258 | As mentioned before. There is a lot of literal duplication here with no obvious benefit. Use a base class, or other schemes, to cut down on that. Also, the functions further above should share the same code. |
openmp/libomptarget/src/api.cpp | ||
---|---|---|
262 | Why does it need a _ at the end? | |
407 | I'm not sure if it is a failure if source and destination are same. | |
openmp/libomptarget/src/private.h | ||
119 | IIRC, there are some (non-technical) issues on using kmp.h in libomptarget. | |
181 | proxy task is not needed in this patch. we don't have detached task. | |
222 | This is really not LLVM code style. |
Thanks Johannes and Shilei
- "So, you are saying the task_with_deps function does *not* copy the dependences and therefore the array has to outlive the function?"
I checked the omp_task_with_deps, it does some copy operations , so I move the depobj_list related code out of the TargetMemcpyArgsTy and TargetMemcpyRectArgsTy objects now. Those two classes are also redefined as 'struct'.
- "> This would be the third location where this struct is duplicated: interop.h, kmp.h and this file. Would it make sense to try to add it to another common header file?
IIRC, there are some (non-technical) issues on using kmp.h in libomptarget."
So should I keep the data structure definitions in private.h? or create a new header file?
- no release of Args
fixed
- "I'm not sure if it is a failure if source and destination are same"
we thought about this, and should allow user to do this
- "This is really not LLVM code style."
I changed the 'class' definition to 'struct' and reformat them. Please check if that works
- few formatting issues
fixed
Thanks Johannes
- use SmallVector
fixed
- "module 5 characters"
ran clang-format , please check it that works
Thanks for the latest updates. The duplication is still my main concern here. See below.
openmp/libomptarget/src/api.cpp | ||
---|---|---|
373 | My point about the "5 character difference" was that this function is basically the same as __kmpc_target_memcpy_async_helper and similarly omp_target_memcpy_rect_async is almost the same as omp_target_memcpy_async. Can we use common helper functions to avoid duplicating all that code twice? | |
439 | That's not how vectors work. push_back(...) | |
openmp/libomptarget/src/private.h | ||
228 | These structs need doxygen comments describing where they are used. |
Thanks Johannes and Shilei
- using a common helper function
fixed
- using push_back for SmallVector
fixed
- add doxygen comments for struct
added, please check if that works
openmp/libomptarget/src/api.cpp | ||
---|---|---|
425 | Now we have a first helper for the "async" part, we still should deduplicate the entry point code. Above, all but lines 385-387 are the same as in omp_target_memcpy_async. Can we also not duplicate those lines? In this code there are also various places with variables not named according to the style guide, e.g. _Count, and with names not helpful, e.g. ErrXYZ. Please address those too. Similarly, use nullptr instead of NULL. The problem with the int32 Flags I mentioned already. Let's avoid type punning and create the object with the right type from the very beginning. |
openmp/libomptarget/src/private.h | ||
---|---|---|
107 | Can we put all KMP related code into a separate header, but of course not called kmp.h? I don't think it's a good idea to have them everywhere. That's quite unfortunate that we have to duplicate the code. | |
229 | Having a variable suffix with _ is generally not a good coding style, not LLVM style. You can potentially use the same name. The compiler can tell them apart. For those values, such as NumDims, you could potentially do int NumDims = 0 when defining them. |
Thanks Johannes and Shilei
- '385-387 are the same as in omp_target_memcpy_async. Can we also not duplicate those lines?'
I put the common code (i.e.helper task creation) into another static function
- 'In this code there are also various places with variables not named according to the style guide'
fixed, please check if there's remaining issues
- 'The problem with the int32 Flags I mentioned already. '
The flag variable was defined as 'kmp_int32', since its consumer '__kmpc_omp_target_task_alloc' needs 'kmp_int32' type as input. the type cast to 'kmp_tasking_flags_t' is to set the 'hidden_helper' bit. So it seems that there's no better option for us. Please let me know your suggestion.
- "Can we put all KMP related code into a separate header"
we used both kmp relevent data structure/types and APIs, so should I wrap all those relevant code into several tool functions and put them into separate header file?
- 'aving a variable suffix with _ is generally not a good coding style'
fixed
I feel like I'm missing something. As said before, all but 3 lines are identical in these two functions.
Now you created a helper for 1/3 of those identical lines but left the other 2/3 being duplicated. Could
you elaborate why?
- 'In this code there are also various places with variables not named according to the style guide'
fixed, please check if there's remaining issues
- 'The problem with the int32 Flags I mentioned already. '
The flag variable was defined as 'kmp_int32', since its consumer '__kmpc_omp_target_task_alloc' needs 'kmp_int32' type as input. the type cast to 'kmp_tasking_flags_t' is to set the 'hidden_helper' bit. So it seems that there's no better option for us. Please let me know your suggestion.
Yeah,.. the API is broken. Let's keep it as is. Apologies to have brought it up.
- "Can we put all KMP related code into a separate header"
we used both kmp relevent data structure/types and APIs, so should I wrap all those relevant code into several tool functions and put them into separate header file?
- 'aving a variable suffix with _ is generally not a good coding style'
fixed
Thanks Johannes
- 'I feel like I'm missing something. As said before, all but 3 lines are identical in these two functions.
Now you created a helper for 1/3 of those identical lines but left the other 2/3 being duplicated. Could
you elaborate why?'
I put the common part into task_creation function
Looks good from my end. Thanks for the updates, I hope they make sense in the end.
Some minor style comments below. @tianshilei1992 Should accept once the header concerns are resolved.
openmp/libomptarget/src/api.cpp | ||
---|---|---|
263–264 | You can delete a nullptr. | |
274–279 | ||
281–283 |
we used both kmp relevent data structure/types and APIs, so should I wrap all those relevant code into several tool functions and put them into separate header file?
IMO we can put all KMP related code into one header and include it where needed. For the others, the API header has to be included anyway.
I checked through private.h, this header actually does the functionality
that contains all kmp and kmpc related data structures and APIs, so should
we still have to split a separate header file?
Yeah you are right. Nah, that's fine. Let it be for now.
Can you add a couple of tests?
openmp/libomptarget/src/api.cpp | ||
---|---|---|
250 | potentially an auto *Fn is more appropriate here. |
Thanks Johannes and Shilei
I added few test cases for asynchronous routine at test/api folder
openmp/libomptarget/test/api/omp_target_memcpy_async1.c | ||
---|---|---|
2 | There is no RUN line here so the test will not be triggered. Can you refer to other test cases to see how a test is run? |
openmp/libomptarget/test/api/omp_target_memcpy_async1.c | ||
---|---|---|
3 | Does it work on AMDGPU and other targets? Why does it require Nvidia here? | |
openmp/libomptarget/test/api/omp_target_memcpy_rect_async3.c | ||
21 ↗ | (On Diff #470354) | If I read this test correctly, this test is more for performance evaluation instead of correctness. We don't have any performance test cases yet, and I'm not sure we need them right now. |
181 ↗ | (On Diff #470354) | If we don't profile the time, I don't see a point in keeping them. |
Thanks Shilei
- "Does it work on AMDGPU and other targets? Why does it require Nvidia here?"
No, I just remove this line
- "We don't have any performance test cases yet, and I'm not sure we need them right now."
Yes, I remove those two performance test cases
hi Johannes and Shilei, is there revision that needs to be done on this
patch? please let me know
Thanks Shilei
- "Can you check all resolved comments to make sure there is no open comments?"
checked through the comments you and Johannes made, no more issues
- "Since this function is not part of libomp and it's not gonna be an interface function, no need to name it as __kmpc."
revised helper functions' names
Rebased and enabled tests for generic devices.
Resulted in one test failure
Failed Tests (3):
libomptarget :: amdgcn-amd-amdhsa :: api/omp_target_memcpy_rect_async1.c
libomptarget :: x86_64-pc-linux-gnu :: api/omp_target_memcpy_rect_async1.c
libomptarget :: x86_64-pc-linux-gnu-LTO :: api/omp_target_memcpy_rect_async1.c#
Fix bug to corectly support the maximally supported dimensions as required by the spec.
static also below.