Page MenuHomePhabricator

[OpenMP 5.0] Codegen support to pass user-defined mapper functions to runtime
ClosedPublic

Authored by lildmh on Sep 20 2019, 6:51 AM.

Details

Summary

This patch implements the code generation to use OpenMP 5.0 declare mapper (e.g., user-defined mapper) constructs. It looks up the proper mapper function for each map, to, or from clause that has a user-defined mapper associated, and passes them to the OpenMP runtime function.
The design slides can be found at https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
lildmh marked 28 inline comments as done.Dec 16 2019, 11:39 AM

Rebase and address comments

I'll split the patch into 2 later

lib/CodeGen/CGOpenMPRuntime.cpp
7440

Yes. The reason why I did this is I found this is the single place that I can consider all situations. Otherwise, I need to divide the size by the element size in other places. Since this function has discussed all situations for the element size, I will need to duplicate the work elsewhere if I don't do it here. The latter solution is not good for code mountainous I think.

7592–7601

Maybe not in this patch?

8856–8857

Not sure where this code will be used. I guess it's still correct to add a null to the mappers here?

8988–8990

I saw all the other places are using CreateConstInBoundsGEP2_32 to get sizes, types, etc. So I kept it consistent here. Maybe these modifications should be in a future patch?

8991–8992

Again, this is to keep it consistent with previous code above?

9024–9027

Again, this is to keep consistent with previous code?

ABataev added inline comments.Dec 16 2019, 12:14 PM
clang/include/clang/AST/OpenMPClause.h
4918 ↗(On Diff #234113)

Do we really need to set HasMapper to true unconditionally here and in other places?

clang/lib/CodeGen/CGOpenMPRuntime.h
841–842 ↗(On Diff #234113)

Is this required to make it virtual?

1556 ↗(On Diff #234113)

(!HasMapper || MappersArray)?

lib/CodeGen/CGOpenMPRuntime.cpp
7440

Changing the original function contract is also not the best solution. Better to introduce the new function. If some of the code can be reused, need to factor out the common code and reuse it in the old function and the new one.

7440

Also, additional question. Does it mean that we pass to the runtime functions different values in different situations: in one case we pass size in bytes, with mappers - number of elements? If so, it is a very bad idea caused by bad design. We need to pass either size in bytes, or number of elements in this argument.

7592–7601

It is better to make it in this patch or make a separate patch and commit it before this one. We try to resolve this as soon as possible.

8856–8857

I mean, that in general, it would be good to see here some kind of an assertion. Almost all situations must be resolved in Sema.

8988–8990

Some of the modifications were made before we had these new functions. Now, we have the new ones and if we can use it, better to switch to the new safer versions.

8991–8992

No need to rely on the old code if can make something cleaner and better. Cast Mfunc to voidptr in the if statement and no need to make some extra casts later.

9024–9027

Same, use the best code if we can use it.

lildmh marked 5 inline comments as done.Dec 16 2019, 12:41 PM

Alexey, thanks for the review

clang/include/clang/AST/OpenMPClause.h
4918 ↗(On Diff #234113)

For map, to, and from clauses, they can have mappers, so it's set to true. For is_device_ptr and use_device_ptr, it's set to false.

lib/CodeGen/CGOpenMPRuntime.cpp
7440

Yes, the current design is like this. Let me see how to make it better

8856–8857

Okay, I guess it's not the work of this patch

ABataev added inline comments.Dec 16 2019, 12:45 PM
clang/include/clang/AST/OpenMPClause.h
4918 ↗(On Diff #234113)

So, it completely depends on the clause kind, right? If so, can we just remove it and rely on the clause kind?

lildmh marked 4 inline comments as done.Dec 16 2019, 4:38 PM
lildmh added inline comments.
clang/include/clang/AST/OpenMPClause.h
4918 ↗(On Diff #234113)

This is used in OMPMappableExprListClause which is inherited by all these clauses. Do you mean to check the template type there to get the clause kind?

ABataev added inline comments.Dec 17 2019, 7:31 AM
clang/include/clang/AST/OpenMPClause.h
4918 ↗(On Diff #234113)

I see. I think it is better to rename it to ClauseSupportsMapper (or something like this) and make it const. Currently, it is confusing. I thought that this flag shows the use of the mapper in the clause.

lildmh marked an inline comment as done.Dec 17 2019, 7:46 AM
lildmh added inline comments.
clang/include/clang/AST/OpenMPClause.h
4918 ↗(On Diff #234113)

Ok, will do that

lildmh updated this revision to Diff 234554.Dec 18 2019, 8:53 AM
lildmh marked 3 inline comments as done.

Address Alexey's comments to change mapper function size and refactor code

Looks much better in general, need to resolve last one issue with number of elements/size and you're ready.

clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

So, we're still going to use number of elements for mappers? And pass it in the same parameter that in other cases is used as size in bytes? If so, point to it explicitly in the review for the runtime part so all are informed about it.

9244 ↗(On Diff #234554)

getOrEmitUserDefinedMapperFunc?

lildmh marked 2 inline comments as done.Dec 18 2019, 9:40 AM
lildmh added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

From interface, the mapper function uses size in bytes now. Inside, it needs number of elements to iterate through all elements. This has no impact on the runtime part, since it looks like normal mapping from the interface. All conversion happens inside the mapper function which is completely generated by the compiler.

9244 ↗(On Diff #234554)

I guess getUserDefinedMapperFunc is a better name? Because the user uses this function to get the mapper function. Emitting a mapper function is a side effect.

ABataev added inline comments.Dec 18 2019, 9:43 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Ok. Then why do we need to convert size in bytes to number of elements here?

9244 ↗(On Diff #234554)

I meant something like getOrCreate... but it is up to you.

lildmh marked 2 inline comments as done.Dec 18 2019, 9:54 AM
lildmh added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

This is used to 1) see if we are going to map an array of elements with mapper, and 2) iterate all to map them individually.

9244 ↗(On Diff #234554)

Okay, sounds good

ABataev added inline comments.Dec 18 2019, 10:00 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Could you point where we have this kind of analysis here? Because I don't see anything affected by this change in the patch.

lildmh marked 2 inline comments as done.Dec 18 2019, 10:31 AM
lildmh added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8986 ↗(On Diff #234554)

Size is used to compute the last element position, which is used as the loop boundary.

9207 ↗(On Diff #234554)

Size is used to see if this is an array.

ABataev added inline comments.Dec 18 2019, 10:40 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Is this a bug fix in the previous implementation?

lildmh updated this revision to Diff 234579.Dec 18 2019, 10:44 AM

Change the function name

lildmh marked an inline comment as done.Dec 18 2019, 10:54 AM
lildmh added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

The previous implementation assumes the size is the number of elements, and it works correctly under that assumption. Since we change the meaning of size here, I add this line of code so the previous implementation can work correctly in the new assumption that the size is the size in bytes.

ABataev added inline comments.Dec 18 2019, 10:59 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Ah, got it. Then, in general, looks good. Please, split the patches in to, possibly, 2 NFC (one with using of the new functions + another one for aggregating too many params into records) + another one with the new functionality.

lildmh marked an inline comment as done.Dec 18 2019, 11:14 AM
lildmh added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Okay, will do that. What do you think need to be done for the runtime patch D68100?

ABataev added inline comments.Dec 18 2019, 11:18 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Address comments and rebase, I think

lildmh marked an inline comment as done.Dec 18 2019, 11:29 AM
lildmh added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

The only left comment is to split it into 2 patches, I think.

ABataev added inline comments.Dec 18 2019, 11:36 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Do it.

lildmh accepted this revision.Jul 14 2020, 12:59 PM
This revision is now accepted and ready to land.Jul 14 2020, 12:59 PM
lildmh requested review of this revision.Jul 14 2020, 1:28 PM

I'll update the diff and please check and accept after that

I'll update the diff and please check and accept after that

Ok

grokos added a subscriber: grokos.Jul 15 2020, 1:07 PM
lildmh updated this revision to Diff 278291.Jul 15 2020, 1:13 PM

Update diff and pass test. Please accept

This revision is now accepted and ready to land.Jul 15 2020, 1:17 PM

@grokos Could you upstream this patch and the runtime patch neck to neck? Upstreaming one of them will break the OpenMP offloading. It will be nice if you can test this patch locally with the runtime patch. Thanks

I tried to build clang with this patch and I get errors like:

CGOpenMPRuntime.cpp:9463:38: error: ‘OMPRTL___tgt_target_teams_nowait_mapper’ was not declared in this scope
                                    ? OMPRTL___tgt_target_teams_nowait_mapper

Where are these OMPRTL___tgt_ symbols defined?

lildmh updated this revision to Diff 278344.Jul 15 2020, 6:01 PM

Include the llvm part

Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2020, 6:01 PM

I tried to build clang with this patch and I get errors like:

CGOpenMPRuntime.cpp:9463:38: error: ‘OMPRTL___tgt_target_teams_nowait_mapper’ was not declared in this scope
                                    ? OMPRTL___tgt_target_teams_nowait_mapper

Where are these OMPRTL___tgt_ symbols defined?

Sorry I forgot to include the llvm part. Please try again

This revision was automatically updated to reflect the committed changes.

OK, now it works. Thanks!

fhahn added a subscriber: fhahn.Jul 16 2020, 2:34 AM

I think this change introduced the following warnings, where auto & is used for types that are always copied. It would be great if you could take a look.

lvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8011:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPToClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPToClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis]
      for (const auto &L : C->component_lists()) {
                       ^
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8011:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>'
      for (const auto &L : C->component_lists()) {
           ^~~~~~~~~~~~~~~
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8016:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPFromClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPFromClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis]
      for (const auto &L : C->component_lists()) {
                       ^
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8016:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>'
      for (const auto &L : C->component_lists()) {
           ^~~~~~~~~~~~~~~
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8032:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPUseDevicePtrClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPUseDevicePtrClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis]
      for (const auto &L : C->component_lists()) {
                       ^
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8032:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>'
      for (const auto &L : C->component_lists()) {
           ^~~~~~~~~~~~~~~
3 warnings generated.

I think this change introduced the following warnings, where auto & is used for types that are always copied. It would be great if you could take a look.

lvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8011:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPToClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPToClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis]
      for (const auto &L : C->component_lists()) {
                       ^
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8011:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>'
      for (const auto &L : C->component_lists()) {
           ^~~~~~~~~~~~~~~
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8016:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPFromClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPFromClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis]
      for (const auto &L : C->component_lists()) {
                       ^
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8016:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>'
      for (const auto &L : C->component_lists()) {
           ^~~~~~~~~~~~~~~
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8032:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPUseDevicePtrClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPUseDevicePtrClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis]
      for (const auto &L : C->component_lists()) {
                       ^
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8032:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>'
      for (const auto &L : C->component_lists()) {
           ^~~~~~~~~~~~~~~
3 warnings generated.

Thanks, I'll fix this soon. Basically &L needs to be L instead in those 3 places. Since I don't have upload permission, it will be great if you can do the change quickly

Fix is at https://reviews.llvm.org/D83959
@Meinersbur Michael, could you please help upstream the patch above? Thanks

Seems that it already has been applied ;-)

Seems that it already has been applied ;-)

Yes, thanks Michael. Hope everything goes well with you

Starting with the commit of this patch, the libomptarget test llvm-project/openmp/libomptarget/test/offloading/target_depend_nowait.cpp fails. Here is a stacktrace of the segfault:

$ gdb target_depend_nowait.cpp.tmp-x86_64-pc-linux-gnu
(gdb) run
...
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0x0000000000400fcf in .omp_outlined._debug__ (.global_tid.=0x2aaab96b9be0, .bound_tid.=0x2aaab96b9bd8) at target_depend_nowait.cpp:22
#1  0x0000000000401e8d in .omp_outlined..23 (.global_tid.=0x2aaab96b9be0, .bound_tid.=0x2aaab96b9bd8) at target_depend_nowait.cpp:18
#2  0x00002aaaab574213 in __kmp_invoke_microtask () from libomp.so
#3  0x00002aaaab51338e in __kmp_invoke_task_func () from libomp.so
#4  0x00002aaaab5126bf in __kmp_launch_thread () from libomp.so
#5  0x00002aaaab55d3d0 in __kmp_launch_worker(void*) () from libomp.so
#6  0x00002aaaabbd6e65 in start_thread () from libpthread.so.0
#7  0x00002aaaabee988d in clone () from libc.so.6

Starting with the commit of this patch, the libomptarget test llvm-project/openmp/libomptarget/test/offloading/target_depend_nowait.cpp fails. Here is a stacktrace of the segfault:

$ gdb target_depend_nowait.cpp.tmp-x86_64-pc-linux-gnu
(gdb) run
...
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0x0000000000400fcf in .omp_outlined._debug__ (.global_tid.=0x2aaab96b9be0, .bound_tid.=0x2aaab96b9bd8) at target_depend_nowait.cpp:22
#1  0x0000000000401e8d in .omp_outlined..23 (.global_tid.=0x2aaab96b9be0, .bound_tid.=0x2aaab96b9bd8) at target_depend_nowait.cpp:18
#2  0x00002aaaab574213 in __kmp_invoke_microtask () from libomp.so
#3  0x00002aaaab51338e in __kmp_invoke_task_func () from libomp.so
#4  0x00002aaaab5126bf in __kmp_launch_thread () from libomp.so
#5  0x00002aaaab55d3d0 in __kmp_launch_worker(void*) () from libomp.so
#6  0x00002aaaabbd6e65 in start_thread () from libpthread.so.0
#7  0x00002aaaabee988d in clone () from libc.so.6

Thanks. If possible, could you send me an IR generated before this patch and one after this patch, so I can probably see the problem?

I couldn't reproduce this problem because all commands are ignored on my machine. Could you send the exact compiling and running commands?
In the IR files you sent me before, any idea where segfault happens?

protze.joachim added a comment.EditedJul 22 2020, 2:40 PM

I'm executing:

$ /usr/bin/python bin/llvm-lit -vv -a projects/openmp/libomptarget/test/offloading/target_depend_nowait.cpp

which executes:

"./bin/clang++" "-fopenmp" "-pthread" "-fno-experimental-isel" "-I" "../openmp/libomptarget/test" "-I" "./projects/openmp/libomptarget/../runtime/src" "-L" "./lib" "-fopenmp-targets=x86_64-pc-linux-gnu" "../openmp/libomptarget/test/offloading/target_depend_nowait.cpp" "-o" "./projects/openmp/libomptarget/test/offloading/Output/target_depend_nowait.cpp.tmp-x86_64-pc-linux-gnu"

According to gdb, the segfault happens in 0x400fcf. %rcx is 0 at that time.

 400f89:       e8 62 fc ff ff          callq  400bf0 <__kmpc_omp_target_task_alloc@plt>
  400f8e:       45 31 c9                xor    %r9d,%r9d
  400f91:       31 c9                   xor    %ecx,%ecx
  400f93:       48 8d 55 88             lea    -0x78(%rbp),%rdx
  400f97:       48 8b 75 e0             mov    -0x20(%rbp),%rsi
  400f9b:       48 89 70 28             mov    %rsi,0x28(%rax)
  400f9f:       48 8b 75 e8             mov    -0x18(%rbp),%rsi
  400fa3:       48 89 70 30             mov    %rsi,0x30(%rax)
  400fa7:       48 8b 75 d0             mov    -0x30(%rbp),%rsi
  400fab:       48 89 70 38             mov    %rsi,0x38(%rax)
  400faf:       48 8b 75 d8             mov    -0x28(%rbp),%rsi
  400fb3:       48 89 70 40             mov    %rsi,0x40(%rax)
  400fb7:       48 8b 34 25 40 1f 40    mov    0x401f40,%rsi
  400fbe:       00 
  400fbf:       48 89 70 48             mov    %rsi,0x48(%rax)
  400fc3:       48 8b 34 25 48 1f 40    mov    0x401f48,%rsi
  400fca:       00 
  400fcb:       48 89 70 50             mov    %rsi,0x50(%rax)
> 400fcf:       48 8b 31                mov    (%rcx),%rsi
  400fd2:       48 89 70 58             mov    %rsi,0x58(%rax)
  400fd6:       48 8b 49 08             mov    0x8(%rcx),%rcx
  400fda:       48 89 48 60             mov    %rcx,0x60(%rax)
  400fde:       48 b9 00 61 60 00 00    movabs $0x606100,%rcx
  400fe5:       00 00 00 
  400fe8:       48 89 4d 88             mov    %rcx,-0x78(%rbp)
  400fec:       48 c7 45 90 04 00 00    movq   $0x4,-0x70(%rbp)
  400ff3:       00 
  400ff4:       c6 45 98 03             movb   $0x3,-0x68(%rbp)
  400ff8:       48 c7 45 80 01 00 00    movq   $0x1,-0x80(%rbp)
  400fff:       00 
  401000:       48 8b 4d f8             mov    -0x8(%rbp),%rcx
  401004:       8b 31                   mov    (%rcx),%esi
  401006:       48 b9 97 1f 40 00 00    movabs $0x401f97,%rcx
  40100d:       00 00 00 
  401010:       48 89 4d b0             mov    %rcx,-0x50(%rbp)
  401014:       48 8d 7d a0             lea    -0x60(%rbp),%rdi
  401018:       48 89 95 18 fe ff ff    mov    %rdx,-0x1e8(%rbp)
  40101f:       48 89 c2                mov    %rax,%rdx
  401022:       b9 01 00 00 00          mov    $0x1,%ecx
  401027:       4c 8b 85 18 fe ff ff    mov    -0x1e8(%rbp),%r8
  40102e:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
  401035:       00 
  401036:       e8 55 fb ff ff          callq  400b90 <__kmpc_omp_task_with_deps@plt>

Can you check if https://reviews.llvm.org/D84470 fixes the problem as this test doesn't work on my machine? If yes, please accept and upstream it since I don't have the permission.