This patch implements the code generation for OpenMP 5.0 declare mapper (user-defined mapper) constructs. For each declare mapper, a mapper function is generated. These mapper functions will be called by the runtime and/or other mapper functions to achieve user defined mapping.
The design slides can be found at https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx
Details
- Reviewers
ABataev hfinkel Meinersbur kkwli0 jdoerfert - Commits
- rGd47b9438d7b7: [OpenMP 5.0] Codegen support for user-defined mappers.
rL367905: [OpenMP 5.0] Codegen support for user-defined mappers.
rC367905: [OpenMP 5.0] Codegen support for user-defined mappers.
rGa04ffdbb05fb: [OpenMP 5.0] Codegen support for user-defined mappers.
rC367773: [OpenMP 5.0] Codegen support for user-defined mappers.
rL367773: [OpenMP 5.0] Codegen support for user-defined mappers.
Diff Detail
Event Timeline
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
758 | The runtime part uses int64_t because I see every other runtime function use int64_t instead of size_t for size, e.g., __tgt_target, __tgt_target_teams, etc., despite that they are declared to be size_t in Clang codegen. So I guess it is done on purpose? Otherwise we need to modify all these runtime function interface in the future. | |
8838 | Hi Alexey, I don't think this scheme is more efficient than the current scheme. My reasons are:
|
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
758 | I recently committed the patch that fixes this problem in clang. If you're using int64_t in the runtime, the same type must be used in clang codegen. | |
8838 | Hi Lingda, I'm trying to simplify the code generated by clang and avoid some unnecessary code duplications. If the complexity of this scheme is the same as proposed by you, I would prefer to use this scheme unless there are some other opinions.
Also, this solution reduces the number of required runtime functions, instead of 2 we need just 1 and, thus, we need to make fewer runtime functions calls. I think it would better to propose this scheme as an alternate design and discuss it in the OpenMP telecon. What do you think? Or we can try to discuss it in the offline mode via the e-mail with other members. |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
758 | That's nice to know. Then I will change the type of size to int64_t as well. | |
8838 | Hi Alexey, I still prefer the current scheme, because:
But if you have a different option, we can discuss it next time in the meeting. I do have a time constraint to work on the mapper implementation. I'll no longer work in this project starting this September, and I have about 30% of my time working on it until then. |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 | Lingda,
I see :(. I understand your concern. In this case, we could try to discuss it offline, in the mailing list, to make it a little bit faster. We just need to hear other opinions on this matter, maybe there are some other pros and cons for these schemes. |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 | Hi Alexey, Sure, let's discuss this in the mailing list. I'll summarize it and send it to the mailing list later.
We won't have recursive calls with inlining.
I think it's even harder to optimized these code out when we move them into the runtime.
I don't quite understand. It's still equal to the number of mappers in any case. |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 |
Good, thanks!
We won't have recursive calls anyway (recursive types are not allowed). Plus, I'm not sure that inlining is the best option here. We have a lot of code for each mapper and I'm not sure that the optimizer will be able to squash it effectively.
Definitely not, unless we use LTO or inlined runtime. |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 |
Sorry I should not say recursive calls. Here it needs to "recursively" call other mapper functions in case of nested mappers, but we don't need it in case of inlining.
But you are proposing to move many code to the runtime here, right? That doesn't make sense to me. |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 |
I'm just not sure that there going be significant problems with the performance because of that. And it significantly simplifies codegen in the compiler and moves the common part into a single function. Plus, if in future we'll need to modify this functionality for some reason, 2 different versions of the compiler will produce incompatible code. With my scheme, you still can use old runtime and have the same functionality as the old compiler and the new one. |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 | Hi Alexey, I think more carefully about your scheme, and I don't think we can solve the 2 problems below with this scheme:
Based on these reasons, I think the current scheme is still more preferable. |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 | Hi Lingda,
|
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 |
|
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 |
|
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 |
Besides, I don't think there is a valid reason that the current scheme is not good. You mentioned it's complex codegen. But it only has less 200 loc here, I don't see why it is complex. |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 |
I think it would be better to share it with others and listen to their opinions. It is better to spend some extra time to provide good design. |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 |
Sure, I can release the discussion to the mailing list. I don't see a reason to use the new scheme now. |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 | Lingda, I meant to send the message to the OpenMP telecon list :) Could you forward the same e-mail to Ravi and others, please? |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8838 | Sure, probably no one in the public mailing list will care about it. I'll send it to the bi-weekly meeting list. |
Sure, I'll do it next week, since I'm on vacation and don't have access to my desktop.
lib/CodeGen/CGDecl.cpp | ||
---|---|---|
2534 | Why do we need to emit it for simd only mode? | |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
1695 | You're looking for CGF.CurFn twice here, used find member function instead and work with iterator. | |
7125–7133 | Maybe it is better to define a constant constexpr uint64_t OMP_MEMBER_OF_RANK = 48 and then deduce OMP_MAP_MEMBER_OF as ~((1<<OMP_MEMBER_OF_RANK) - 1)? | |
8122 | AFAIK, LLVM has dropped support for msvc 2013, do we still need this? | |
lib/CodeGen/CGOpenMPRuntime.h | ||
352 | Use using instead of typedef. |
lib/CodeGen/CGDecl.cpp | ||
---|---|---|
2534 | This code is not emitting mapper for simd only mode. | |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
7125–7133 | In libomptarget, the same way is used to define OMP_TGT_MAPTYPE_MEMBER_OF: OMP_TGT_MAPTYPE_MEMBER_OF = 0xffff000000000000. So I think they should stay the same. Btw, the number 48 is directly used in libomptarget now, which may need to change in the future. In your code, it assumes bits higher than 48 are all OMP_MAP_MEMBER_OF, which may not be true in the future. My code here is more universal, although it does not look great. What do you think? |
lib/CodeGen/CGDecl.cpp | ||
---|---|---|
2534 | Ah, yes, it is the condition for the early exit. | |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
7125–7133 | You can apply a mask to drop some of the most significant bits if required. My code looks much cleaner it would be good to have the same code in libomptarget too. | |
8122 | Ping. | |
lib/CodeGen/CGOpenMPRuntime.h | ||
818–823 | I don't think we need virtual functions here, non-virtual are good enough. | |
2111 | I think you can drop this function here if the original function is not virtual |
lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
7125–7133 | Maybe let's keep it this way in this patch, then we modify both places in a further patch? | |
8122 | I'm okay either way. I guess it doesn't hurt to keep it? | |
lib/CodeGen/CGOpenMPRuntime.h | ||
2111 | The function for simd only mode includes a llvm_unreachable, so I think it's still needed as a virtual function above |
lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
2111 | It is better to reduce number of virtual functions, if possible. |
lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
2111 | What about virtual function? |
lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
2111 | Since emitUserDefinedMapper here overrides the same function in CGOpenMPRuntime, I think emitUserDefinedMapper of CGOpenMPRuntime needs to be defined as virtual? |
lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
2111 | If you are asking about what virtual function I removed, it is emitUDMapperArrayInitOrDel of CGOpenMPRuntime which is never overrided. |
lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
2111 | I would suggest to not make this virtual too and remove overriden version. It does not help. |
lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
2111 | If we remove virtual from emitUserDefinedMapper, we will have to define it in both CGOpenMPRuntime and CGOpenMPSIMDRuntime, and its definition is identical in both places. I don't think that's good software engineering practice? |
lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
2111 | WE just don't need an overridden version in CGOpenMPSIMDRuntime. Just remove virtual and remove the overridden function from CGOpenMPSIMDRuntime. |
lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
823 | Seems to me, this function is used only in emitUserDefinedMapper. I think you can make it static local in the CGOpenMPRuntime.cpp and do not expose it in the interface. |
lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
823 | emitUserDefinedMapper needs to call createRuntimeFunction of CGOpenMPRuntime, which is private. Which one do you think is better, make createRuntimeFunction public, or have emitUserDefinedMapper not defined in CGOpenMPRuntime? It seems to me that they are similar |
lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
823 | Then make emitUDMapperArrayInitOrDel private instead. |
test/OpenMP/declare_mapper_codegen.cpp | ||
---|---|---|
44–48 | Just like in this test when you're using vars. |
test/OpenMP/declare_mapper_codegen.cpp | ||
---|---|---|
44–48 | Sorry I was not clear before. What do you mean by "predetermined indices" here? If you are referring to, for example, %0 in store i8* %0, i8** [[HANDLEADDR:%[^,]+]], I guess there is no way to get rid of %0 because it means the first argument of the function? |
test/OpenMP/declare_mapper_codegen.cpp | ||
---|---|---|
44–48 | Yes, I meant those %0 like registers. Better to mark them as variables in function declaration and use those names in the checks. |
test/OpenMP/declare_mapper_codegen.cpp | ||
---|---|---|
44–48 | Now it's like define {{.*}}void @.omp_mapper.{{.*}}C.id{{.*}}(i8*, i8*, i8*, i64, i64), I think you are suggesting something like define {{.*}}void @.omp_mapper.{{.*}}C.id{{.*}}(i8* [[HANDLE:%[^,]+]], i8* [[BPTR:%[^,]+]], ...), and later I can use store i8* [[HANDLE]], i8** [[HANDLEADDR:%[^,]+]] I'm not sure how to add names for function arguments. They seems to be always nameless like (i8*, i8*, i8*, i64, i64). Is there a way to do that? |
test/OpenMP/declare_mapper_codegen.cpp | ||
---|---|---|
44–48 | If the clang parameters have names, the llvm params also will get the names. But it is not worth it to add the names to the function. Could just use regexp here to avoid using LLVM register names? Just %{{·+}}. And rely on the order, i.e. remove -DAG checks? |