This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP 5.0] Codegen support to pass user-defined mapper functions to runtime
ClosedPublic

Authored by lildmh on Sep 20 2019, 6:51 AM.

Details

Summary

This patch implements the code generation to use OpenMP 5.0 declare mapper (e.g., user-defined mapper) constructs. It looks up the proper mapper function for each map, to, or from clause that has a user-defined mapper associated, and passes them to the OpenMP runtime function.
The design slides can be found at https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx

Diff Detail

Event Timeline

lildmh created this revision.Sep 20 2019, 6:51 AM
Herald added a project: Restricted Project. · View Herald Transcript
lildmh updated this revision to Diff 221578.Sep 24 2019, 1:00 PM
lildmh added reviewers: ABataev, Meinersbur, hfinkel.
lildmh added a project: Restricted Project.

I would suggest to split this patch into 2 part: the first one is NFC and just makes the ompiler to use the new interfaces and the second with the new functionality for mappers.

include/clang/AST/OpenMPClause.h
4292

Not in Camel format

4537–4541

Why changed to const Expr * here? I believe, ArrayRef makes it constant automatically.

4590

Not in Camel format

4593

No need for const in ArrayRef

4615–4616

Not in Camel format

4737

ArrayRef<const Expr *>()->llvm::None?

4744

ArrayRef<const Expr *>()->llvm::None

4757

Same here

lib/CodeGen/CGOpenMPRuntime.cpp
743

Better to fix it in a separate patch

756–795

I believe these new functions replace completely the old one, so old functions must be removed

2688

Same here, remove processing for old functions.

7440

CamelCase

7440

Seems to me, with mapper you're changing the contract of this function. It is supposed to return the size in bytes, with Mapper it returns just number of elements. Bad decision. Better to convert size in bytes to number of elements in place where you need it.

7461

Why do we return 1 here?

7462

No need for else here

7481–7484

Same here

7488–7497

Same here

7489

No need for else here

7592–7601

Too many params in the function, worth thinking about wrapping them in a record.

7968–7971

Mappers.push_back(HasMappers?Mapper:nullptr);

8751–8753

const auto *VD = dyn_cast_or_null<VarDecl>(std::get<0>(L));

8856–8857

In general, there should not be branch for default mapping during the codegen, the implicit map clauses must be generated in sema.

8988–8990

Better to use Builder.CreateConstArrayGEP

8991–8992

Easier cast function to the voidptr, I think

9024–9027

Hmm, just cast of the Info.MappersArray to the correct type if all indices are 0?

9028–9029

If then sub-statement in braces, the else substatement also must be in braces

9385

Better to use MapperCGF.EmitNounwindRuntimeCall.

lib/CodeGen/CGOpenMPRuntime.h
1519

Camel case

1533

The flag HasMappers also must be cleared, no?

lildmh updated this revision to Diff 234113.Dec 16 2019, 11:39 AM
lildmh marked 28 inline comments as done.Dec 16 2019, 11:39 AM

Rebase and address comments

I'll split the patch into 2 later

lib/CodeGen/CGOpenMPRuntime.cpp
7440

Yes. The reason why I did this is I found this is the single place that I can consider all situations. Otherwise, I need to divide the size by the element size in other places. Since this function has discussed all situations for the element size, I will need to duplicate the work elsewhere if I don't do it here. The latter solution is not good for code mountainous I think.

7592–7601

Maybe not in this patch?

8856–8857

Not sure where this code will be used. I guess it's still correct to add a null to the mappers here?

8988–8990

I saw all the other places are using CreateConstInBoundsGEP2_32 to get sizes, types, etc. So I kept it consistent here. Maybe these modifications should be in a future patch?

8991–8992

Again, this is to keep it consistent with previous code above?

9024–9027

Again, this is to keep consistent with previous code?

ABataev added inline comments.Dec 16 2019, 12:14 PM
clang/include/clang/AST/OpenMPClause.h
4918 ↗(On Diff #234113)

Do we really need to set HasMapper to true unconditionally here and in other places?

clang/lib/CodeGen/CGOpenMPRuntime.h
841–842 ↗(On Diff #234113)

Is this required to make it virtual?

1556 ↗(On Diff #234113)

(!HasMapper || MappersArray)?

lib/CodeGen/CGOpenMPRuntime.cpp
7440

Changing the original function contract is also not the best solution. Better to introduce the new function. If some of the code can be reused, need to factor out the common code and reuse it in the old function and the new one.

7440

Also, additional question. Does it mean that we pass to the runtime functions different values in different situations: in one case we pass size in bytes, with mappers - number of elements? If so, it is a very bad idea caused by bad design. We need to pass either size in bytes, or number of elements in this argument.

7592–7601

It is better to make it in this patch or make a separate patch and commit it before this one. We try to resolve this as soon as possible.

8856–8857

I mean, that in general, it would be good to see here some kind of an assertion. Almost all situations must be resolved in Sema.

8988–8990

Some of the modifications were made before we had these new functions. Now, we have the new ones and if we can use it, better to switch to the new safer versions.

8991–8992

No need to rely on the old code if can make something cleaner and better. Cast Mfunc to voidptr in the if statement and no need to make some extra casts later.

9024–9027

Same, use the best code if we can use it.

lildmh marked 5 inline comments as done.Dec 16 2019, 12:41 PM

Alexey, thanks for the review

clang/include/clang/AST/OpenMPClause.h
4918 ↗(On Diff #234113)

For map, to, and from clauses, they can have mappers, so it's set to true. For is_device_ptr and use_device_ptr, it's set to false.

lib/CodeGen/CGOpenMPRuntime.cpp
7440

Yes, the current design is like this. Let me see how to make it better

8856–8857

Okay, I guess it's not the work of this patch

ABataev added inline comments.Dec 16 2019, 12:45 PM
clang/include/clang/AST/OpenMPClause.h
4918 ↗(On Diff #234113)

So, it completely depends on the clause kind, right? If so, can we just remove it and rely on the clause kind?

lildmh marked 4 inline comments as done.Dec 16 2019, 4:38 PM
lildmh added inline comments.
clang/include/clang/AST/OpenMPClause.h
4918 ↗(On Diff #234113)

This is used in OMPMappableExprListClause which is inherited by all these clauses. Do you mean to check the template type there to get the clause kind?

ABataev added inline comments.Dec 17 2019, 7:31 AM
clang/include/clang/AST/OpenMPClause.h
4918 ↗(On Diff #234113)

I see. I think it is better to rename it to ClauseSupportsMapper (or something like this) and make it const. Currently, it is confusing. I thought that this flag shows the use of the mapper in the clause.

lildmh marked an inline comment as done.Dec 17 2019, 7:46 AM
lildmh added inline comments.
clang/include/clang/AST/OpenMPClause.h
4918 ↗(On Diff #234113)

Ok, will do that

lildmh updated this revision to Diff 234554.Dec 18 2019, 8:53 AM
lildmh marked 3 inline comments as done.

Address Alexey's comments to change mapper function size and refactor code

Looks much better in general, need to resolve last one issue with number of elements/size and you're ready.

clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

So, we're still going to use number of elements for mappers? And pass it in the same parameter that in other cases is used as size in bytes? If so, point to it explicitly in the review for the runtime part so all are informed about it.

9244 ↗(On Diff #234554)

getOrEmitUserDefinedMapperFunc?

lildmh marked 2 inline comments as done.Dec 18 2019, 9:40 AM
lildmh added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

From interface, the mapper function uses size in bytes now. Inside, it needs number of elements to iterate through all elements. This has no impact on the runtime part, since it looks like normal mapping from the interface. All conversion happens inside the mapper function which is completely generated by the compiler.

9244 ↗(On Diff #234554)

I guess getUserDefinedMapperFunc is a better name? Because the user uses this function to get the mapper function. Emitting a mapper function is a side effect.

ABataev added inline comments.Dec 18 2019, 9:43 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Ok. Then why do we need to convert size in bytes to number of elements here?

9244 ↗(On Diff #234554)

I meant something like getOrCreate... but it is up to you.

lildmh marked 2 inline comments as done.Dec 18 2019, 9:54 AM
lildmh added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

This is used to 1) see if we are going to map an array of elements with mapper, and 2) iterate all to map them individually.

9244 ↗(On Diff #234554)

Okay, sounds good

ABataev added inline comments.Dec 18 2019, 10:00 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Could you point where we have this kind of analysis here? Because I don't see anything affected by this change in the patch.

lildmh marked 2 inline comments as done.Dec 18 2019, 10:31 AM
lildmh added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8986 ↗(On Diff #234554)

Size is used to compute the last element position, which is used as the loop boundary.

9207 ↗(On Diff #234554)

Size is used to see if this is an array.

ABataev added inline comments.Dec 18 2019, 10:40 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Is this a bug fix in the previous implementation?

lildmh updated this revision to Diff 234579.Dec 18 2019, 10:44 AM

Change the function name

lildmh marked an inline comment as done.Dec 18 2019, 10:54 AM
lildmh added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

The previous implementation assumes the size is the number of elements, and it works correctly under that assumption. Since we change the meaning of size here, I add this line of code so the previous implementation can work correctly in the new assumption that the size is the size in bytes.

ABataev added inline comments.Dec 18 2019, 10:59 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Ah, got it. Then, in general, looks good. Please, split the patches in to, possibly, 2 NFC (one with using of the new functions + another one for aggregating too many params into records) + another one with the new functionality.

lildmh marked an inline comment as done.Dec 18 2019, 11:14 AM
lildmh added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Okay, will do that. What do you think need to be done for the runtime patch D68100?

ABataev added inline comments.Dec 18 2019, 11:18 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Address comments and rebase, I think

lildmh marked an inline comment as done.Dec 18 2019, 11:29 AM
lildmh added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

The only left comment is to split it into 2 patches, I think.

ABataev added inline comments.Dec 18 2019, 11:36 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8981–8982 ↗(On Diff #234554)

Do it.

lildmh accepted this revision.Jul 14 2020, 12:59 PM
This revision is now accepted and ready to land.Jul 14 2020, 12:59 PM
lildmh requested review of this revision.Jul 14 2020, 1:28 PM

I'll update the diff and please check and accept after that

I'll update the diff and please check and accept after that

Ok

grokos added a subscriber: grokos.Jul 15 2020, 1:07 PM
lildmh updated this revision to Diff 278291.Jul 15 2020, 1:13 PM

Update diff and pass test. Please accept

This revision is now accepted and ready to land.Jul 15 2020, 1:17 PM

@grokos Could you upstream this patch and the runtime patch neck to neck? Upstreaming one of them will break the OpenMP offloading. It will be nice if you can test this patch locally with the runtime patch. Thanks

I tried to build clang with this patch and I get errors like:

CGOpenMPRuntime.cpp:9463:38: error: ‘OMPRTL___tgt_target_teams_nowait_mapper’ was not declared in this scope
                                    ? OMPRTL___tgt_target_teams_nowait_mapper

Where are these OMPRTL___tgt_ symbols defined?

lildmh updated this revision to Diff 278344.Jul 15 2020, 6:01 PM

Include the llvm part

Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2020, 6:01 PM

I tried to build clang with this patch and I get errors like:

CGOpenMPRuntime.cpp:9463:38: error: ‘OMPRTL___tgt_target_teams_nowait_mapper’ was not declared in this scope
                                    ? OMPRTL___tgt_target_teams_nowait_mapper

Where are these OMPRTL___tgt_ symbols defined?

Sorry I forgot to include the llvm part. Please try again

This revision was automatically updated to reflect the committed changes.

OK, now it works. Thanks!

fhahn added a subscriber: fhahn.Jul 16 2020, 2:34 AM

I think this change introduced the following warnings, where auto & is used for types that are always copied. It would be great if you could take a look.

lvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8011:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPToClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPToClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis]
      for (const auto &L : C->component_lists()) {
                       ^
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8011:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>'
      for (const auto &L : C->component_lists()) {
           ^~~~~~~~~~~~~~~
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8016:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPFromClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPFromClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis]
      for (const auto &L : C->component_lists()) {
                       ^
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8016:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>'
      for (const auto &L : C->component_lists()) {
           ^~~~~~~~~~~~~~~
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8032:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPUseDevicePtrClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPUseDevicePtrClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis]
      for (const auto &L : C->component_lists()) {
                       ^
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8032:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>'
      for (const auto &L : C->component_lists()) {
           ^~~~~~~~~~~~~~~
3 warnings generated.

I think this change introduced the following warnings, where auto & is used for types that are always copied. It would be great if you could take a look.

lvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8011:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPToClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPToClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis]
      for (const auto &L : C->component_lists()) {
                       ^
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8011:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>'
      for (const auto &L : C->component_lists()) {
           ^~~~~~~~~~~~~~~
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8016:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPFromClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPFromClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis]
      for (const auto &L : C->component_lists()) {
                       ^
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8016:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>'
      for (const auto &L : C->component_lists()) {
           ^~~~~~~~~~~~~~~
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8032:24: warning: loop variable 'L' is always a copy because the range of type 'clang::OMPMappableExprListClause<clang::OMPUseDevicePtrClause>::const_component_lists_range' (aka 'iterator_range<clang::OMPMappableExprListClause<clang::OMPUseDevicePtrClause>::const_component_lists_iterator>') does not return a reference [-Wrange-loop-analysis]
      for (const auto &L : C->component_lists()) {
                       ^
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp:8032:12: note: use non-reference type 'std::__1::tuple<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>, const clang::ValueDecl *>'
      for (const auto &L : C->component_lists()) {
           ^~~~~~~~~~~~~~~
3 warnings generated.

Thanks, I'll fix this soon. Basically &L needs to be L instead in those 3 places. Since I don't have upload permission, it will be great if you can do the change quickly

Fix is at https://reviews.llvm.org/D83959
@Meinersbur Michael, could you please help upstream the patch above? Thanks

Seems that it already has been applied ;-)

Seems that it already has been applied ;-)

Yes, thanks Michael. Hope everything goes well with you

Starting with the commit of this patch, the libomptarget test llvm-project/openmp/libomptarget/test/offloading/target_depend_nowait.cpp fails. Here is a stacktrace of the segfault:

$ gdb target_depend_nowait.cpp.tmp-x86_64-pc-linux-gnu
(gdb) run
...
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0x0000000000400fcf in .omp_outlined._debug__ (.global_tid.=0x2aaab96b9be0, .bound_tid.=0x2aaab96b9bd8) at target_depend_nowait.cpp:22
#1  0x0000000000401e8d in .omp_outlined..23 (.global_tid.=0x2aaab96b9be0, .bound_tid.=0x2aaab96b9bd8) at target_depend_nowait.cpp:18
#2  0x00002aaaab574213 in __kmp_invoke_microtask () from libomp.so
#3  0x00002aaaab51338e in __kmp_invoke_task_func () from libomp.so
#4  0x00002aaaab5126bf in __kmp_launch_thread () from libomp.so
#5  0x00002aaaab55d3d0 in __kmp_launch_worker(void*) () from libomp.so
#6  0x00002aaaabbd6e65 in start_thread () from libpthread.so.0
#7  0x00002aaaabee988d in clone () from libc.so.6

Starting with the commit of this patch, the libomptarget test llvm-project/openmp/libomptarget/test/offloading/target_depend_nowait.cpp fails. Here is a stacktrace of the segfault:

$ gdb target_depend_nowait.cpp.tmp-x86_64-pc-linux-gnu
(gdb) run
...
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0x0000000000400fcf in .omp_outlined._debug__ (.global_tid.=0x2aaab96b9be0, .bound_tid.=0x2aaab96b9bd8) at target_depend_nowait.cpp:22
#1  0x0000000000401e8d in .omp_outlined..23 (.global_tid.=0x2aaab96b9be0, .bound_tid.=0x2aaab96b9bd8) at target_depend_nowait.cpp:18
#2  0x00002aaaab574213 in __kmp_invoke_microtask () from libomp.so
#3  0x00002aaaab51338e in __kmp_invoke_task_func () from libomp.so
#4  0x00002aaaab5126bf in __kmp_launch_thread () from libomp.so
#5  0x00002aaaab55d3d0 in __kmp_launch_worker(void*) () from libomp.so
#6  0x00002aaaabbd6e65 in start_thread () from libpthread.so.0
#7  0x00002aaaabee988d in clone () from libc.so.6

Thanks. If possible, could you send me an IR generated before this patch and one after this patch, so I can probably see the problem?

I couldn't reproduce this problem because all commands are ignored on my machine. Could you send the exact compiling and running commands?
In the IR files you sent me before, any idea where segfault happens?

protze.joachim added a comment.EditedJul 22 2020, 2:40 PM

I'm executing:

$ /usr/bin/python bin/llvm-lit -vv -a projects/openmp/libomptarget/test/offloading/target_depend_nowait.cpp

which executes:

"./bin/clang++" "-fopenmp" "-pthread" "-fno-experimental-isel" "-I" "../openmp/libomptarget/test" "-I" "./projects/openmp/libomptarget/../runtime/src" "-L" "./lib" "-fopenmp-targets=x86_64-pc-linux-gnu" "../openmp/libomptarget/test/offloading/target_depend_nowait.cpp" "-o" "./projects/openmp/libomptarget/test/offloading/Output/target_depend_nowait.cpp.tmp-x86_64-pc-linux-gnu"

According to gdb, the segfault happens in 0x400fcf. %rcx is 0 at that time.

 400f89:       e8 62 fc ff ff          callq  400bf0 <__kmpc_omp_target_task_alloc@plt>
  400f8e:       45 31 c9                xor    %r9d,%r9d
  400f91:       31 c9                   xor    %ecx,%ecx
  400f93:       48 8d 55 88             lea    -0x78(%rbp),%rdx
  400f97:       48 8b 75 e0             mov    -0x20(%rbp),%rsi
  400f9b:       48 89 70 28             mov    %rsi,0x28(%rax)
  400f9f:       48 8b 75 e8             mov    -0x18(%rbp),%rsi
  400fa3:       48 89 70 30             mov    %rsi,0x30(%rax)
  400fa7:       48 8b 75 d0             mov    -0x30(%rbp),%rsi
  400fab:       48 89 70 38             mov    %rsi,0x38(%rax)
  400faf:       48 8b 75 d8             mov    -0x28(%rbp),%rsi
  400fb3:       48 89 70 40             mov    %rsi,0x40(%rax)
  400fb7:       48 8b 34 25 40 1f 40    mov    0x401f40,%rsi
  400fbe:       00 
  400fbf:       48 89 70 48             mov    %rsi,0x48(%rax)
  400fc3:       48 8b 34 25 48 1f 40    mov    0x401f48,%rsi
  400fca:       00 
  400fcb:       48 89 70 50             mov    %rsi,0x50(%rax)
> 400fcf:       48 8b 31                mov    (%rcx),%rsi
  400fd2:       48 89 70 58             mov    %rsi,0x58(%rax)
  400fd6:       48 8b 49 08             mov    0x8(%rcx),%rcx
  400fda:       48 89 48 60             mov    %rcx,0x60(%rax)
  400fde:       48 b9 00 61 60 00 00    movabs $0x606100,%rcx
  400fe5:       00 00 00 
  400fe8:       48 89 4d 88             mov    %rcx,-0x78(%rbp)
  400fec:       48 c7 45 90 04 00 00    movq   $0x4,-0x70(%rbp)
  400ff3:       00 
  400ff4:       c6 45 98 03             movb   $0x3,-0x68(%rbp)
  400ff8:       48 c7 45 80 01 00 00    movq   $0x1,-0x80(%rbp)
  400fff:       00 
  401000:       48 8b 4d f8             mov    -0x8(%rbp),%rcx
  401004:       8b 31                   mov    (%rcx),%esi
  401006:       48 b9 97 1f 40 00 00    movabs $0x401f97,%rcx
  40100d:       00 00 00 
  401010:       48 89 4d b0             mov    %rcx,-0x50(%rbp)
  401014:       48 8d 7d a0             lea    -0x60(%rbp),%rdi
  401018:       48 89 95 18 fe ff ff    mov    %rdx,-0x1e8(%rbp)
  40101f:       48 89 c2                mov    %rax,%rdx
  401022:       b9 01 00 00 00          mov    $0x1,%ecx
  401027:       4c 8b 85 18 fe ff ff    mov    -0x1e8(%rbp),%r8
  40102e:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
  401035:       00 
  401036:       e8 55 fb ff ff          callq  400b90 <__kmpc_omp_task_with_deps@plt>

Can you check if https://reviews.llvm.org/D84470 fixes the problem as this test doesn't work on my machine? If yes, please accept and upstream it since I don't have the permission.