This patch implements the code generation to use OpenMP 5.0 declare mapper (e.g., user-defined mapper) constructs. It looks up the proper mapper function for each map, to, or from clause that has a user-defined mapper associated, and passes them to the OpenMP runtime function.
The design slides can be found at https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I would suggest to split this patch into 2 part: the first one is NFC and just makes the ompiler to use the new interfaces and the second with the new functionality for mappers.
include/clang/AST/OpenMPClause.h | ||
---|---|---|
4298 ↗ | (On Diff #229200) | Not in Camel format |
4543 ↗ | (On Diff #229200) | Why changed to const Expr * here? I believe, ArrayRef makes it constant automatically. |
4596 ↗ | (On Diff #229200) | Not in Camel format |
4599 ↗ | (On Diff #229200) | No need for const in ArrayRef |
4621 ↗ | (On Diff #229200) | Not in Camel format |
4743 ↗ | (On Diff #229200) | ArrayRef<const Expr *>()->llvm::None? |
4750 ↗ | (On Diff #229200) | ArrayRef<const Expr *>()->llvm::None |
4763 ↗ | (On Diff #229200) | Same here |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
742 ↗ | (On Diff #229200) | Better to fix it in a separate patch |
755–794 ↗ | (On Diff #229200) | I believe these new functions replace completely the old one, so old functions must be removed |
2746 ↗ | (On Diff #229200) | Same here, remove processing for old functions. |
7388 ↗ | (On Diff #229200) | CamelCase |
7388 ↗ | (On Diff #229200) | Seems to me, with mapper you're changing the contract of this function. It is supposed to return the size in bytes, with Mapper it returns just number of elements. Bad decision. Better to convert size in bytes to number of elements in place where you need it. |
7411 ↗ | (On Diff #229200) | Why do we return 1 here? |
7412 ↗ | (On Diff #229200) | No need for else here |
7431–7434 ↗ | (On Diff #229200) | Same here |
7445 ↗ | (On Diff #229200) | No need for else here |
7463–7472 ↗ | (On Diff #229200) | Same here |
7564–7573 ↗ | (On Diff #229200) | Too many params in the function, worth thinking about wrapping them in a record. |
7940–7943 ↗ | (On Diff #229200) | Mappers.push_back(HasMappers?Mapper:nullptr); |
8716–8718 ↗ | (On Diff #229200) | const auto *VD = dyn_cast_or_null<VarDecl>(std::get<0>(L)); |
8821–8822 ↗ | (On Diff #229200) | In general, there should not be branch for default mapping during the codegen, the implicit map clauses must be generated in sema. |
8953–8955 ↗ | (On Diff #229200) | Better to use Builder.CreateConstArrayGEP |
8956–8957 ↗ | (On Diff #229200) | Easier cast function to the voidptr, I think |
8991–8994 ↗ | (On Diff #229200) | Hmm, just cast of the Info.MappersArray to the correct type if all indices are 0? |
8995–8996 ↗ | (On Diff #229200) | If then sub-statement in braces, the else substatement also must be in braces |
9359 ↗ | (On Diff #229200) | Better to use MapperCGF.EmitNounwindRuntimeCall. |
lib/CodeGen/CGOpenMPRuntime.h | ||
1533 ↗ | (On Diff #229200) | Camel case |
1549 ↗ | (On Diff #229200) | The flag HasMappers also must be cleared, no? |
I'll split the patch into 2 later
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
7388 ↗ | (On Diff #229200) | Yes. The reason why I did this is I found this is the single place that I can consider all situations. Otherwise, I need to divide the size by the element size in other places. Since this function has discussed all situations for the element size, I will need to duplicate the work elsewhere if I don't do it here. The latter solution is not good for code mountainous I think. |
7564–7573 ↗ | (On Diff #229200) | Maybe not in this patch? |
8821–8822 ↗ | (On Diff #229200) | Not sure where this code will be used. I guess it's still correct to add a null to the mappers here? |
8953–8955 ↗ | (On Diff #229200) | I saw all the other places are using CreateConstInBoundsGEP2_32 to get sizes, types, etc. So I kept it consistent here. Maybe these modifications should be in a future patch? |
8956–8957 ↗ | (On Diff #229200) | Again, this is to keep it consistent with previous code above? |
8991–8994 ↗ | (On Diff #229200) | Again, this is to keep consistent with previous code? |
clang/include/clang/AST/OpenMPClause.h | ||
---|---|---|
5440–5442 | Do we really need to set HasMapper to true unconditionally here and in other places? | |
clang/lib/CodeGen/CGOpenMPRuntime.h | ||
916–917 | Is this required to make it virtual? | |
1653 | (!HasMapper || MappersArray)? | |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
7388 ↗ | (On Diff #229200) | Changing the original function contract is also not the best solution. Better to introduce the new function. If some of the code can be reused, need to factor out the common code and reuse it in the old function and the new one. |
7388 ↗ | (On Diff #229200) | Also, additional question. Does it mean that we pass to the runtime functions different values in different situations: in one case we pass size in bytes, with mappers - number of elements? If so, it is a very bad idea caused by bad design. We need to pass either size in bytes, or number of elements in this argument. |
7564–7573 ↗ | (On Diff #229200) | It is better to make it in this patch or make a separate patch and commit it before this one. We try to resolve this as soon as possible. |
8821–8822 ↗ | (On Diff #229200) | I mean, that in general, it would be good to see here some kind of an assertion. Almost all situations must be resolved in Sema. |
8953–8955 ↗ | (On Diff #229200) | Some of the modifications were made before we had these new functions. Now, we have the new ones and if we can use it, better to switch to the new safer versions. |
8956–8957 ↗ | (On Diff #229200) | No need to rely on the old code if can make something cleaner and better. Cast Mfunc to voidptr in the if statement and no need to make some extra casts later. |
8991–8994 ↗ | (On Diff #229200) | Same, use the best code if we can use it. |
Alexey, thanks for the review
clang/include/clang/AST/OpenMPClause.h | ||
---|---|---|
5440–5442 | For map, to, and from clauses, they can have mappers, so it's set to true. For is_device_ptr and use_device_ptr, it's set to false. | |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
7388 ↗ | (On Diff #229200) | Yes, the current design is like this. Let me see how to make it better |
8821–8822 ↗ | (On Diff #229200) | Okay, I guess it's not the work of this patch |
clang/include/clang/AST/OpenMPClause.h | ||
---|---|---|
5440–5442 | So, it completely depends on the clause kind, right? If so, can we just remove it and rely on the clause kind? |
clang/include/clang/AST/OpenMPClause.h | ||
---|---|---|
5440–5442 | This is used in OMPMappableExprListClause which is inherited by all these clauses. Do you mean to check the template type there to get the clause kind? |
clang/include/clang/AST/OpenMPClause.h | ||
---|---|---|
5440–5442 | I see. I think it is better to rename it to ClauseSupportsMapper (or something like this) and make it const. Currently, it is confusing. I thought that this flag shows the use of the mapper in the clause. |
clang/include/clang/AST/OpenMPClause.h | ||
---|---|---|
5440–5442 | Ok, will do that |
Looks much better in general, need to resolve last one issue with number of elements/size and you're ready.
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
9038–9039 | So, we're still going to use number of elements for mappers? And pass it in the same parameter that in other cases is used as size in bytes? If so, point to it explicitly in the review for the runtime part so all are informed about it. | |
9307 | getOrEmitUserDefinedMapperFunc? |
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
9038–9039 | From interface, the mapper function uses size in bytes now. Inside, it needs number of elements to iterate through all elements. This has no impact on the runtime part, since it looks like normal mapping from the interface. All conversion happens inside the mapper function which is completely generated by the compiler. | |
9307 | I guess getUserDefinedMapperFunc is a better name? Because the user uses this function to get the mapper function. Emitting a mapper function is a side effect. |
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
9038–9039 | Could you point where we have this kind of analysis here? Because I don't see anything affected by this change in the patch. |
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
9038–9039 | Is this a bug fix in the previous implementation? |
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
9038–9039 | The previous implementation assumes the size is the number of elements, and it works correctly under that assumption. Since we change the meaning of size here, I add this line of code so the previous implementation can work correctly in the new assumption that the size is the size in bytes. |
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
9038–9039 | Ah, got it. Then, in general, looks good. Please, split the patches in to, possibly, 2 NFC (one with using of the new functions + another one for aggregating too many params into records) + another one with the new functionality. |
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
9038–9039 | Address comments and rebase, I think |
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
9038–9039 | The only left comment is to split it into 2 patches, I think. |
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
9038–9039 | Do it. |
@grokos Could you upstream this patch and the runtime patch neck to neck? Upstreaming one of them will break the OpenMP offloading. It will be nice if you can test this patch locally with the runtime patch. Thanks
I tried to build clang with this patch and I get errors like:
CGOpenMPRuntime.cpp:9463:38: error: ‘OMPRTL___tgt_target_teams_nowait_mapper’ was not declared in this scope ? OMPRTL___tgt_target_teams_nowait_mapper
Where are these OMPRTL___tgt_ symbols defined?
I think this change introduced the following warnings, where auto & is used for types that are always copied. It would be great if you could take a look.
lvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8011:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPToClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPToClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis] for (const auto &L : C->component_lists()) { ^ llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8011:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>' for (const auto &L : C->component_lists()) { ^~~~~~~~~~~~~~~ llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8016:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPFromClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPFromClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis] for (const auto &L : C->component_lists()) { ^ llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8016:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>' for (const auto &L : C->component_lists()) { ^~~~~~~~~~~~~~~ llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8032:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPUseDevicePtrClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPUseDevicePtrClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis] for (const auto &L : C->component_lists()) { ^ llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8032:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>' for (const auto &L : C->component_lists()) { ^~~~~~~~~~~~~~~ 3 warnings generated.
Thanks, I'll fix this soon. Basically &L needs to be L instead in those 3 places. Since I don't have upload permission, it will be great if you can do the change quickly
Fix is at https://reviews.llvm.org/D83959
@Meinersbur Michael, could you please help upstream the patch above? Thanks
Starting with the commit of this patch, the libomptarget test llvm-project/openmp/libomptarget/test/offloading/target_depend_nowait.cpp fails. Here is a stacktrace of the segfault:
$ gdb target_depend_nowait.cpp.tmp-x86_64-pc-linux-gnu (gdb) run ... Program received signal SIGSEGV, Segmentation fault. (gdb) bt #0 0x0000000000400fcf in .omp_outlined._debug__ (.global_tid.=0x2aaab96b9be0, .bound_tid.=0x2aaab96b9bd8) at target_depend_nowait.cpp:22 #1 0x0000000000401e8d in .omp_outlined..23 (.global_tid.=0x2aaab96b9be0, .bound_tid.=0x2aaab96b9bd8) at target_depend_nowait.cpp:18 #2 0x00002aaaab574213 in __kmp_invoke_microtask () from libomp.so #3 0x00002aaaab51338e in __kmp_invoke_task_func () from libomp.so #4 0x00002aaaab5126bf in __kmp_launch_thread () from libomp.so #5 0x00002aaaab55d3d0 in __kmp_launch_worker(void*) () from libomp.so #6 0x00002aaaabbd6e65 in start_thread () from libpthread.so.0 #7 0x00002aaaabee988d in clone () from libc.so.6
Thanks. If possible, could you send me an IR generated before this patch and one after this patch, so I can probably see the problem?
I couldn't reproduce this problem because all commands are ignored on my machine. Could you send the exact compiling and running commands?
In the IR files you sent me before, any idea where segfault happens?
I'm executing:
$ /usr/bin/python bin/llvm-lit -vv -a projects/openmp/libomptarget/test/offloading/target_depend_nowait.cpp
which executes:
"./bin/clang++" "-fopenmp" "-pthread" "-fno-experimental-isel" "-I" "../openmp/libomptarget/test" "-I" "./projects/openmp/libomptarget/../runtime/src" "-L" "./lib" "-fopenmp-targets=x86_64-pc-linux-gnu" "../openmp/libomptarget/test/offloading/target_depend_nowait.cpp" "-o" "./projects/openmp/libomptarget/test/offloading/Output/target_depend_nowait.cpp.tmp-x86_64-pc-linux-gnu"
According to gdb, the segfault happens in 0x400fcf. %rcx is 0 at that time.
400f89: e8 62 fc ff ff callq 400bf0 <__kmpc_omp_target_task_alloc@plt> 400f8e: 45 31 c9 xor %r9d,%r9d 400f91: 31 c9 xor %ecx,%ecx 400f93: 48 8d 55 88 lea -0x78(%rbp),%rdx 400f97: 48 8b 75 e0 mov -0x20(%rbp),%rsi 400f9b: 48 89 70 28 mov %rsi,0x28(%rax) 400f9f: 48 8b 75 e8 mov -0x18(%rbp),%rsi 400fa3: 48 89 70 30 mov %rsi,0x30(%rax) 400fa7: 48 8b 75 d0 mov -0x30(%rbp),%rsi 400fab: 48 89 70 38 mov %rsi,0x38(%rax) 400faf: 48 8b 75 d8 mov -0x28(%rbp),%rsi 400fb3: 48 89 70 40 mov %rsi,0x40(%rax) 400fb7: 48 8b 34 25 40 1f 40 mov 0x401f40,%rsi 400fbe: 00 400fbf: 48 89 70 48 mov %rsi,0x48(%rax) 400fc3: 48 8b 34 25 48 1f 40 mov 0x401f48,%rsi 400fca: 00 400fcb: 48 89 70 50 mov %rsi,0x50(%rax) > 400fcf: 48 8b 31 mov (%rcx),%rsi 400fd2: 48 89 70 58 mov %rsi,0x58(%rax) 400fd6: 48 8b 49 08 mov 0x8(%rcx),%rcx 400fda: 48 89 48 60 mov %rcx,0x60(%rax) 400fde: 48 b9 00 61 60 00 00 movabs $0x606100,%rcx 400fe5: 00 00 00 400fe8: 48 89 4d 88 mov %rcx,-0x78(%rbp) 400fec: 48 c7 45 90 04 00 00 movq $0x4,-0x70(%rbp) 400ff3: 00 400ff4: c6 45 98 03 movb $0x3,-0x68(%rbp) 400ff8: 48 c7 45 80 01 00 00 movq $0x1,-0x80(%rbp) 400fff: 00 401000: 48 8b 4d f8 mov -0x8(%rbp),%rcx 401004: 8b 31 mov (%rcx),%esi 401006: 48 b9 97 1f 40 00 00 movabs $0x401f97,%rcx 40100d: 00 00 00 401010: 48 89 4d b0 mov %rcx,-0x50(%rbp) 401014: 48 8d 7d a0 lea -0x60(%rbp),%rdi 401018: 48 89 95 18 fe ff ff mov %rdx,-0x1e8(%rbp) 40101f: 48 89 c2 mov %rax,%rdx 401022: b9 01 00 00 00 mov $0x1,%ecx 401027: 4c 8b 85 18 fe ff ff mov -0x1e8(%rbp),%r8 40102e: 48 c7 04 24 00 00 00 movq $0x0,(%rsp) 401035: 00 401036: e8 55 fb ff ff callq 400b90 <__kmpc_omp_task_with_deps@plt>
Can you check if https://reviews.llvm.org/D84470 fixes the problem as this test doesn't work on my machine? If yes, please accept and upstream it since I don't have the permission.
Do we really need to set HasMapper to true unconditionally here and in other places?