This is an archive of the discontinued LLVM Phabricator instance.

Paths

include/clang/AST/GlobalDecl.h
lib/AST/ASTContext.cpp
lib/CodeGen/CGDecl.cpp
lib/CodeGen/CGOpenMPRuntime.h
lib/CodeGen/CGOpenMPRuntime.cpp
lib/CodeGen/ModuleBuilder.cpp
test/OpenMP/declare_mapper_codegen.cpp

Differential D59474

[OpenMP 5.0] Codegen support for user-defined mappers
ClosedPublic

Authored by lildmh on Mar 17 2019, 11:14 AM.

Details

Reviewers

ABataev
hfinkel
Meinersbur
kkwli0
jdoerfert

Commits

rGd47b9438d7b7: [OpenMP 5.0] Codegen support for user-defined mappers.
rL367905: [OpenMP 5.0] Codegen support for user-defined mappers.
rC367905: [OpenMP 5.0] Codegen support for user-defined mappers.
rGa04ffdbb05fb: [OpenMP 5.0] Codegen support for user-defined mappers.
rC367773: [OpenMP 5.0] Codegen support for user-defined mappers.
rL367773: [OpenMP 5.0] Codegen support for user-defined mappers.

Summary

This patch implements the code generation for OpenMP 5.0 declare mapper (user-defined mapper) constructs. For each declare mapper, a mapper function is generated. These mapper functions will be called by the runtime and/or other mapper functions to achieve user-defined mapping.
The design slides can be found at https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx

Event Timeline

There are a very large number of changes, so older changes are hidden.

Combine 2 pointers into one.

lildmh marked 3 inline comments as done.Apr 26 2019, 10:36 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
2473	The runtime part is on hold for now.
lib/CodeGen/CGOpenMPRuntime.h
351	Yes, I plan to have this part in the next patch, which will implement the lookup of the corresponding mapper function for `map`, `to`, and `from` clauses.

ABataev added inline comments.Apr 26 2019, 12:54 PM

lib/CodeGen/CGOpenMPRuntime.cpp
8118	This code has too many common parts with the existing one. Is it possible to merge it somehow or outline into a function?
8658	Hmm, how could you calculate the required size of the array for the mapper? Or this is constant?
8692	I think it would be good to move this part to the runtime. We should just pass the mapper function to the runtime functions and the runtime should call those mapper functions and get base pointers, pointers, sizes and maptypes.
8832	With the buffering-based implementation we need only function.
lib/CodeGen/CGOpenMPRuntime.h
351	Nope, this must be part of this patch, because it may cause the compiler to crash during testing.
821	I don't remember precisely, but probably `\a`->`\p`
825	Use `///` style of comment

Hi Alexey,

Again, thanks for your review! Sorry I didn't get back to you in time because I was distracted by other things. Please see the comments inline.

lib/CodeGen/CGOpenMPRuntime.cpp
8118	I tried to merge it with `generateAllInfo`. The problem is that `generateAllInfo` also generates information for clauses including `to, from, is_device_ptr, use_device_ptr`, which don't exist for `declare mapper`. There is no clear way to extract them separately: every 4 or 5 lines, the code addresses a different clause type. In the end, I think the clearest way is to extract all code related to map clauses into this function, `generateAllInfoForMapper`. It's ~70 lines of code, so not too much.
8658	I'm not sure I understand your question here. Do you mean the size when an OpenMP array section is mapped? If that's the case, it is not constant. Existing code can already handle it. Or do you mean the size of mapper array (i.e., `MapArrayType`)? This is constant and depends on how many map clauses exist in the declare mapper directive.
8692	This part cannot be moved into the runtime, because the runtime does not know the map type associated with the mapper. Another argument could potentially be added to the runtime API, but that would be more work and I don't think it's necessary.
8832	Yes, in either case, we only generate functions here. Is there a problem?
lib/CodeGen/CGOpenMPRuntime.h
351	It will not crash the compiler, because this UDMap is only written in this patch, never read.

Fix code format

ABataev added inline comments.May 3 2019, 11:47 AM

lib/CodeGen/CGOpenMPRuntime.cpp
8118	If those clauses do not exist for the declare mapper, it is fine, no problems with them. If they don't exist, we can't generate anything for them, no? But if you think, that it would be better to extract some common parts into a separate function, this also works for me.
8658	Yes, it is the question about the size of the mapper array. It is part of the discussion about mapper generation we had before. You said that it is hard to generate the sizes of the arrays since we don't know the number of map clauses at the codegen phase, but here we have this number.
8692	I think it is better to discuss the runtime part of the patch again, because everything else depends on the runtime. I would suggest trying to implement the solution we discussed before, where the required data is stored in the runtime dynamic arrays and only after that is it used to transfer the data.
8832	Sorry, I meant we'll need only one function.
lib/CodeGen/CGOpenMPRuntime.h
351	Still, you should clear it in this patch. Otherwise, you're breaking data-dependency between the patches and this is not good at all.

Hi Alexey,

Let's discuss your runtime data mapping scheme next week first. After that we will continue the review of this.

lib/CodeGen/CGOpenMPRuntime.cpp
8658	Sorry there was probably some miscommunication. What I meant is that after fully expanded, for example, from `map(mapper(id):a[0:n])`, eventually to `map(a.b.c[0:e]) map(a.k) ...`, the number of things in the results is unknown at compile time. Here, we only do one level of expansion of one instance based on the `declare mapper` directive, for example, the mapper is `declare mapper(class A a) map(a.b[0:a.n]) map(a.c)` In this case, the size of mapper array is 2, because there are 2 map clauses (actually it's more than 2 because the first map clause maps an array). This number can be decided at compile time easily.
8832	Yes, in that case, only one is needed.
lib/CodeGen/CGOpenMPRuntime.h
351	Sure, if you think that's absolutely necessary, I can add it to this patch.

In D59474#1490235, @lildmh wrote:

Hi Alexey,

Let's discuss your runtime data mapping scheme next week first. After that we will continue the review of this.

That would be good, thanks!

lib/CodeGen/CGOpenMPRuntime.cpp
8658	Let's discuss the runtime at first, later we can return to this.
lib/CodeGen/CGOpenMPRuntime.h
351	It is necessary, so, please, add it, thanks!

Implement the new mapper codegen scheme

lildmh edited the summary of this revision.Jun 4 2019, 5:34 AM

Address Alexey's comment about mapping between function and user-defined mappers

Hahnfeld mentioned this in D60972: [OpenMP 5.0] libomptarget interface for declare mapper functions.Jun 4 2019, 7:23 AM

Use tgt instead of kmpc for mapper runtime function names

ABataev added inline comments.Jun 10 2019, 8:48 AM

lib/CodeGen/CGOpenMPRuntime.cpp
8828	This function looks like the universal one, regardless of the type `<type_name>` specifics. Do we really need to generate it for each particular type and mapper? Or we could use the same function for all types/mappers?

lildmh marked an inline comment as done.Jun 10 2019, 9:18 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8828	I think we need a particular mapper function for each type and mapper, because the code generated within the mapper function depends on what type and what mapper it is.

ABataev added inline comments.Jun 10 2019, 9:39 AM

lib/CodeGen/CGOpenMPRuntime.cpp
8828	Hmm, maybe I'm wrong but I don't see significant mapper or type-specific dependencies in this mapper function. It uses the pointer to the type and the size of the type, but this information can be generalized, I think. Could you point to the lines of code that are type and mapper specific?

lildmh marked an inline comment as done.Jun 10 2019, 9:51 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8828	Code between lines 8857-8965 is type and mapper specific. For instance, `generateAllInfoForMapper` depends on the map clauses associated with the mapper and the internal structure of the struct/class type, and generates different code as a result. `BasePointers.size()` also depends on the above things.

ABataev added inline comments.Jun 10 2019, 10:01 AM

lib/CodeGen/CGOpenMPRuntime.cpp
8828	Most of this data can be passed as parameters to the function. It would be good if we could move this function to the libomptarget library and reduce the number of changes (and, thus, the complexity) of the compiler itself. It is always easier to review and to maintain source code written in C/C++ rather than changes in the compiler codegen. Plus, it may reduce the size of the final code significantly, I assume. I would appreciate it if you would try to move this function to libomptarget.
8956	These data can be passed as parameters to the function, no?

lildmh marked an inline comment as done.Jun 10 2019, 10:14 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8828	Hi Alexey, I don't think libomptarget can do this efficiently. For example: `class C { int a; double *b; }; #pragma omp declare mapper(C c) map(c.a, c.b[0:1])`. The codegen directly knows there are 2 components (`c.a`, `c.b[0]`) in this mapper function (3 actually, when we count the pointer), and it also knows the size, starting address, map type, etc. of these components. Passing all this information to libomptarget seems to be a bad idea. Or did I get your idea wrong?
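The `class C` example above can be made concrete with a small host-side sketch of what a per-type mapper function conceptually does in the scheme under review: for every element it pushes one entry per component. This is a simplification under stated assumptions, not the code Clang emits. `Component`, `pushComponent`, and `mapperForC` are hypothetical names; `pushComponent` merely stands in for `__tgt_push_mapper_component`, and the MEMBER_OF encoding and the exact base/begin pairs chosen by the real codegen are not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one entry handed to the runtime.
struct Component {
  void *Base;
  void *Begin;
  int64_t Size;
  int64_t Type;
};

// Hypothetical stand-in for the runtime's push function.
inline void pushComponent(std::vector<Component> &Out, void *Base, void *Begin,
                          int64_t Size, int64_t Type) {
  Out.push_back({Base, Begin, Size, Type});
}

struct C {
  int a;
  double *b;
};

// Sketch of a mapper for:
//   #pragma omp declare mapper(C c) map(c.a, c.b[0:1])
// The compiler knows statically that each element contributes three entries.
inline void mapperForC(std::vector<Component> &Out, C *Begin, int64_t Count,
                       int64_t Type) {
  for (int64_t I = 0; I < Count; ++I) {
    C &Elem = Begin[I];
    // Member c.a.
    pushComponent(Out, &Elem, &Elem.a, sizeof(int), Type);
    // The pointer member c.b itself.
    pushComponent(Out, &Elem, &Elem.b, sizeof(double *), Type);
    // The pointee c.b[0:1].
    pushComponent(Out, &Elem.b, Elem.b, sizeof(double), Type);
  }
}
```

Calling `mapperForC` on an array of two `C` objects yields six entries, which illustrates why the number of pushes scales with the number of mapped elements rather than with the number of `map` clauses.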

ABataev added inline comments.Jun 10 2019, 10:18 AM

lib/CodeGen/CGOpenMPRuntime.cpp
8828	Yes, I understand this. But can we pass some additional parameters to this function so we don't need to generate a unique copy of almost the same function for all types/mappers?

lildmh marked an inline comment as done.Jun 10 2019, 10:35 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8828	For different types/mappers, the skeletons of the mapper functions are similar (i.e., the things outlined in the comment here). I would say most other code is unique, for instance, the code to prepare the parameters of the call to `__tgt_push_mapper_component`. This code should be much larger than the skeleton shown here. I cannot think of a way to reduce the code by passing more parameters to this function. Please let me know if you have some suggestions.

ABataev added inline comments.Jun 19 2019, 10:33 AM

lib/CodeGen/CGOpenMPRuntime.cpp
8838	Currently `currentComponent` is generated by the compiler. Can we instead pass this data as an extra parameter to this `omp_mapper` function?
8839	I don't see this part of logic in the code. Could you show me where is it exactly?
8884	Bad idea to do this. Better to use something like this: SmallString<256> TyStr; llvm::raw_svector_ostream Out(TyStr); CGM.getCXXABI().getMangleContext().mangleTypeName(Ty, Out);

Fix mapper function name mangling

lib/CodeGen/CGOpenMPRuntime.cpp
8838	Emm, I think this scheme will be very difficult and inefficient. If we pass components as an argument of the `omp_mapper` function, it means that the runtime needs to generate all components related to a map clause. I don't think the runtime is able to do that efficiently. On the other hand, in the current scheme, these components are naturally generated by the compiler, and the runtime only needs to know the base pointer, pointer, type, size, etc.
8839	This part doesn't exist in this patch. Currently we don't really look up the mapper for any mapped variable/array/etc. The next patch will add the code to look up the specified mapper for every map clause, and get the mapper function for them correspondingly.
8884	Sounds good. Thanks!

ABataev added inline comments.Jun 19 2019, 11:56 AM

lib/CodeGen/CGOpenMPRuntime.cpp
8838	With the current scheme, we may end up with a code blowup. We need to generate very similar code for different types and variables. The worst thing here is that we will be unable to optimize this huge amount of code because the codegen relies on the runtime functions and the code cannot be inlined. That's why I would like to move as much code as possible to the runtime rather than emit it in the compiler.
8839	Then we need at least some kind of `TODO` note that this part is not implemented in this patch.
8872	Always better to use constructor with the location to generate correct debug info for all the parameters.
8965–8966	I don't like this code very much! It hides the logic of the MEMBER_OF flag deep inside and it is going to be very hard to update it in future if there are some changes in the flags.
9033	I don't see this logic in the comment for the function. Could you add more details for all this logic implemented here?
9094	Use `///` style of comment here. Add the description of the logic implemented here.
9102	Use `StringRef` or `SmallString`
9119–9125	Enclose all substatements into braces or none of them.

Address Alexey's comments

lib/CodeGen/CGOpenMPRuntime.cpp
8838	I understand your concerns. I think this is the best we can do right now. The most worrisome case will be when we have nested mappers within each other. In this case, a mapper function will call another mapper function. We can inline the inner mapper functions in this scenario, so that these mapper function can be properly optimized. As a result, I think the performance should be fine.
8965–8966	Add a function to calculate this offset. Also modify another existing place using the hard coded number 48.

Sorry for the delay, Lingda. I tried to find some better solution for this.

lib/CodeGen/CGOpenMPRuntime.cpp
755	Do we really need to use `int64_t` for number of elements? `size_t` must be enough.
8838	Instead, we can use indirect function calls passed in the array to the runtime. Do you think it is going to be slower? In your current scheme, we generate many runtime calls instead. Could you try to estimate the number of calls in the case where we call the mappers through indirect function calls, and in your current scheme, where we need to call the runtime functions many times in each particular mapper?
8997	You can use `nuw` attribute here, I think
9123	You can use `C.toBits(CGM.getSizeSize())`
9145	Just `CGM.getSizeSize()`
9146	Add `nuw` attribute

lildmh marked an inline comment as done.Jun 25 2019, 12:37 PM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8838	Hi Alexey, Sorry I don't understand your idea. What indirect function calls do you propose to be passed to the runtime? What are these functions supposed to do? The number of function calls will be exactly equal to the number of components mapped, no matter whether there are nested mappers or not. The number of components depends on the program. E.g., if we map a large array section, then there will be many more function calls.

ABataev added inline comments.Jun 25 2019, 12:49 PM

lib/CodeGen/CGOpenMPRuntime.cpp
8838	I mean the pointers to the mapper function, generated by the compiler. In your comment, it is `c.Mapper()`

lildmh marked 4 inline comments as done.Jun 25 2019, 1:02 PM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8838	If we pass nested mapper functions to the runtime, I think it will slow down execution because of the extra level of indirect function calls. E.g., the runtime will call `omp_mapper1`, which calls the runtime back, which calls `omp_mapper2`, .... This can result in a deep call stack. I think the current implementation, which doesn't pass nested mappers to the runtime, will be more efficient. One call to the outermost mapper function will have all data mapping done. The call stack from the runtime will be 2 levels deep in this case (the first level is the mapper function, and the second level is `__tgt_push_mapper_component`). There is also more room for compiler optimization when we inline all nested mapper functions.

lildmh marked 2 inline comments as done.Jun 25 2019, 1:23 PM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
755	Because we use the return value to shift the member_of field of the map type, which is `int64_t`, I think `int64_t` makes sense here.

lildmh updated this revision to Diff 206527.Jun 25 2019, 1:52 PM

ABataev added inline comments.Jun 25 2019, 2:53 PM

lib/CodeGen/CGOpenMPRuntime.cpp
755	Ok, there is a discrepancy between runtime part and compiler part: `__tgt_push_mapper_component` uses `size_t` for size, while the runtime function uses `int64_t`. It won't work for 32bit platform.
8838	Yes, if we leave it as is. But if instead of the bunch of unique functions we have a common one that additionally accepts a list of indirect pointers to functions, and move it to the runtime library, we won't need those 2 functions we have currently. We'll have full access to the mapping data vector in the runtime library and won't need to use those 2 accessors we have currently. Instead, we'll need just one runtime function, which implements the whole mapping logic. We still need to call it recursively, but I assume the number of calls will remain the same as in the current scheme. Did you understand the idea? If yes, it would be good if you could try to estimate the number of function calls in the current scheme and in this new scheme to estimate possible pros and cons.

lildmh marked an inline comment as done.Jun 26 2019, 9:46 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8838	Hi Alexey, Could you give an example for this scheme? 1) I don't understand how the mapper function can have full access to the mapping data vector without providing these 2 accessors. 2) I don't think it is possible to have a common function instead of a bunch of unique functions for each mapper declared.

lildmh marked an inline comment as done.Jun 26 2019, 9:52 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
755	This should work on 32bit machines, since 32bit machines also use `int64_t` for the map type?

ABataev added inline comments.Jun 26 2019, 10:14 AM

lib/CodeGen/CGOpenMPRuntime.cpp

758

Check the declaration of the runtime function in the runtime patch and here. The `size` parameter here has type `size_t`, while in the runtime part it is `int64_t`.

8838

Hi Lingda, something like this.

void __tgt_mapper(void *base, void *begin, size_t size, int64_t type, auto components[]) {
  // Allocate space for an array section first.
  if (size > 1 && !maptype.IsDelete)
    <push>(base, begin, size*sizeof(Ty), clearToFrom(type));

  // Map members.
  for (unsigned i = 0; i < size; i++) {
    // For each component specified by this mapper:
    for (auto c : components) {
      if (c.hasMapper())
        (*c.Mapper())(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
      else
        <push>(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
    }
  }
  // Delete the array section.
  if (size > 1 && maptype.IsDelete)
    <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
}

void <type>.mapper(void *base, void *begin, size_t size, int64_t type) {
  auto sub_components[] = {...};
  __tgt_mapper(base, begin, size, type, sub_components);
}
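
One possible way to make the pseudocode above executable on the host is sketched here. It is an interpretation, not the design that was eventually implemented. `Entry`, `ComponentDesc`, `genericMapper`, `innerMapper`, `outerMapper`, and the flag values are hypothetical. Describing each component by its byte offset within one element is an assumption added so that the same descriptor list can be reused for every array element; pointer members that need an extra indirection are not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one <push> call.
struct Entry {
  void *Base;
  void *Begin;
  int64_t Size;
  int64_t Type;
};
using Entries = std::vector<Entry>;
using MapperFn = void (*)(Entries &, void *Base, void *Begin, int64_t Count,
                          int64_t Type);

// Assumption: a component is described relative to the start of one element.
struct ComponentDesc {
  std::size_t Offset;
  int64_t Size;
  MapperFn Mapper; // nullptr when the member has no user-defined mapper
};

constexpr int64_t DeleteBit = 0x8;        // hypothetical flag value
constexpr int64_t ToFromBits = 0x1 | 0x2; // hypothetical flag values

// Generic routine playing the role of __tgt_mapper in the pseudocode.
inline void genericMapper(Entries &Out, void *Base, void *Begin, int64_t Count,
                          int64_t ElemSize, int64_t Type,
                          const std::vector<ComponentDesc> &Components) {
  const bool IsDelete = (Type & DeleteBit) != 0;
  // Allocate space for an array section first.
  if (Count > 1 && !IsDelete)
    Out.push_back({Base, Begin, Count * ElemSize, Type & ~ToFromBits});
  // Map the members of every element.
  for (int64_t I = 0; I < Count; ++I) {
    char *Elem = static_cast<char *>(Begin) + I * ElemSize;
    for (const ComponentDesc &C : Components) {
      void *Member = Elem + C.Offset;
      if (C.Mapper)
        C.Mapper(Out, Elem, Member, 1, Type); // indirect call to nested mapper
      else
        Out.push_back({Elem, Member, C.Size, Type});
    }
  }
  // Delete the array section last.
  if (Count > 1 && IsDelete)
    Out.push_back({Base, Begin, Count * ElemSize, Type & ~ToFromBits});
}

struct Inner { int X; };
struct Outer { int A; Inner In; };

// Thin per-type wrappers, the analogue of <type>.mapper.
inline void innerMapper(Entries &Out, void *Base, void *Begin, int64_t Count,
                        int64_t Type) {
  genericMapper(Out, Base, Begin, Count, sizeof(Inner), Type,
                {ComponentDesc{offsetof(Inner, X), sizeof(int), nullptr}});
}

inline void outerMapper(Entries &Out, void *Base, void *Begin, int64_t Count,
                        int64_t Type) {
  genericMapper(Out, Base, Begin, Count, sizeof(Outer), Type,
                {ComponentDesc{offsetof(Outer, A), sizeof(int), nullptr},
                 ComponentDesc{offsetof(Outer, In), sizeof(Inner), innerMapper}});
}
```

In this sketch the per-type code shrinks to a descriptor table, which is the benefit argued for in the review, while the nested mapper is reached through a function pointer, which is the cost argued against.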

lildmh marked 2 inline comments as done.Jun 26 2019, 10:50 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
758	The runtime part uses `int64_t` because I see every other runtime function use `int64_t` instead of `size_t` for `size`, e.g., `__tgt_target, __tgt_target_teams`, etc., despite the fact that they are declared to be `size_t` in Clang codegen. So I guess it is done on purpose? Otherwise we need to modify all these runtime function interfaces in the future.
8838	Hi Alexey, I don't think this scheme is more efficient than the current scheme. My reasons are: 1) Most code here is essentially to generate `components`, i.e., we need to generate `c.arg_base, c.arg_begin, c.arg_size, c.arg_type` for each `c` in `components`, so there will still be a lot of code in `<type>.mapper`. It will not reduce the mapper function code, i.e., we will still have a bunch of unique mapper functions. 2) This scheme will prevent a lot of compiler optimization from happening. In reality, a lot of computation should be redundant. E.g., for two components `c1` and `c2`, `c1`'s base may be the same as `c2`'s begin, so the compiler will be able to eliminate these redundant computations, especially when we inline all nested mapper functions together. If we move these computations into the runtime, the compiler will not be able to do such optimization. 3) In terms of the number of `push` function calls, this scheme has the exact same number of calls as the current scheme, so I don't think this scheme can bring performance benefits. The scheme should perform worse than the current scheme, because it reduces the opportunities for compiler optimization as mentioned above.

ABataev added inline comments.Jun 26 2019, 12:11 PM

lib/CodeGen/CGOpenMPRuntime.cpp
758	I recently committed the patch that fixes this problem in clang. If you're using `int64_t` in the runtime, the same type must be used in clang codegen.
8838	Hi Lingda, I'm trying to simplify the code generated by clang and avoid some unnecessary code duplications. If the complexity of this scheme is the same as proposed by you, I would prefer to use this scheme unless there are some other opinions. It is not a problem. This code is unique and is not duplicated in the different mappers. Inlining is no solution here. We still generate too much code, which is almost the same in many cases and it will lead to very ineffective codegen because we still end up with a lot of almost the same code. This also might lead to poor performance. Yes, the number of pushes is always the same, in all possible schemes. It would be good to compare somehow the performance of both schemes, at least preliminarily. Also, this solution reduces the number of required runtime functions, instead of 2 we need just 1 and, thus, we need to make fewer runtime function calls. I think it would be better to propose this scheme as an alternate design and discuss it in the OpenMP telecon. What do you think? Or we can try to discuss it in offline mode via e-mail with other members. I'm not trying to convince you to implement this scheme right now, but it would be good to discuss it. Maybe it will lead to some better ideas from others?

lildmh marked 2 inline comments as done.Jun 26 2019, 12:48 PM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
758	That's nice to know. Then I will change the type of `size` to `int64_t` as well.
8838	Hi Alexey, I still prefer the current scheme, because: I don't like recursive mapper calls, which goes back to my original scheme a little bit. I really think inlining can make a big difference when we have nested mappers. These compiler optimizations are the keys to have better performance for mappers. I don't think the codegen here is inefficient. Yes there is duplicated code across different mapper functions, but why that will lead to poor performance? Although we have 2 runtime functions now, the `__tgt_mapper_num_components` is called only once per mapper. It should have very negligible performance impact. But if you have a different option, we can discuss it next time in the meeting. I do have a time constraint to work on the mapper implementation. I'll no longer work in this project starting this September, and I have about 30% of my time working on it until then.

ABataev added inline comments.Jun 26 2019, 12:56 PM

lib/CodeGen/CGOpenMPRuntime.cpp
8838	Lingda, We have recursive (actually, not recursive, because you cannot use types recursively) mappers calls anyway, it is nature of structures/classes. We have a lot of similar code. And I'm not sure that it can be optimized out. Yes, but it means that we have n extra runtime calls, where n is the number of branches in the structure/class tree. I see :(. I understand your concern. In this case, we could try to discuss it offline, in the mailing list, to make it a little bit faster. We just need to hear other opinions on this matter, maybe there are some other pros and cons for these schemes.

lildmh marked an inline comment as done.Jun 26 2019, 1:11 PM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8838	Hi Alexey, Sure, let's discuss this in the mailing list. I'll summarize it and send it to the mailing list later.
> We have recursive (actually, not recursive, because you cannot use types recursively) mappers calls anyway, it is nature of structures/classes.
We won't have recursive calls with inlining.
> We have a lot of similar code. And I'm not sure that it can be optimized out.
I think it's even harder to optimize this code out when we move it into the runtime.
> Yes, but it means that we have n extra runtime calls, where n is the number of branches in the structure/class tree.
I don't quite understand. It's still equal to the number of mappers in any case.

ABataev added inline comments.Jun 26 2019, 1:20 PM

lib/CodeGen/CGOpenMPRuntime.cpp
8838	> Sure, let's discuss this in the mailing list. I'll summarize it and send it to the mailing list later.
Good, thanks!
> We won't have recursive calls with inlining.
We won't have recursive calls anyway (recursive types are not allowed). Plus, I'm not sure that inlining is the best option here. We have a lot of code for each mapper and I'm not sure that the optimizer will be able to squash it effectively.
> I think it's even harder to optimize this code out when we move it into the runtime.
Definitely not, unless we use LTO or inlined runtime.

lildmh marked an inline comment as done.Jun 26 2019, 1:26 PM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8838	> We won't have recursive calls anyway (recursive types are not allowed). Plus, I'm not sure that inlining is the best option here. We have a lot of code for each mapper and I'm not sure that the optimizer will be able to squash it effectively.
Sorry I should not say recursive calls. Here it needs to "recursively" call other mapper functions in case of nested mappers, but we don't need it in case of inlining.
> Definitely not, unless we use LTO or inlined runtime.
But you are proposing to move much code to the runtime here, right? That doesn't make sense to me.

ABataev added inline comments.Jun 26 2019, 1:34 PM

lib/CodeGen/CGOpenMPRuntime.cpp
8838	> But you are proposing to move much code to the runtime here, right? That doesn't make sense to me.
I'm just not sure that there are going to be significant problems with the performance because of that. And it significantly simplifies codegen in the compiler and moves the common part into a single function. Plus, if in future we need to modify this functionality for some reason, 2 different versions of the compiler will produce incompatible code. With my scheme, you can still use the old runtime and have the same functionality with the old compiler and the new one.

lildmh marked an inline comment as done.Jun 27 2019, 9:48 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8838	Hi Alexey, I thought more carefully about your scheme, and I don't think we can solve the 2 problems below with this scheme: 1) In the example you gave before, the compiler needs to generate all map types and pass them to `__tgt_mapper` through `sub_components`. But in this case, the compiler won't be able to generate the correct `MEMBER_OF` field in the map type. As a result, the runtime has to fix it using the mechanism we already have here: `__tgt_mapper_num_components`. This not only increases complexity, but also means the runtime needs further manipulation of the map type, which creates locality issues. In the current scheme, by contrast, the map type is generated by the compiler once, so the data locality will be very good. 2) `sub_components` includes all components that should be mapped. If we are mapping an array, this means we need to map many components, which will require allocating memory for `sub_components` on the heap. This creates a further memory management burden and is not an efficient way to use memory. For these reasons, I think the current scheme is still preferable.

ABataev added inline comments.Jun 27 2019, 9:54 AM

lib/CodeGen/CGOpenMPRuntime.cpp
8838	Hi Lingda, Actually, I thought that the runtime function `__tgt_mapper` will do this, not the compiler. Why do we need to allocate it on the heap? We can allocate it on the stack.

lildmh marked an inline comment as done.Jun 27 2019, 10:04 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8838	In your scheme, both the compiler and `__tgt_mapper` need to do this: the compiler will generate other parts of the type, e.g., the `TO` and `FROM` bits and basic `MEMBER_OF` bits. Then `__tgt_mapper` needs to modify the `MEMBER_OF` bits later. Since there are a lot of other memory accesses between the compiler and `__tgt_mapper` operations on the same map type, it's very likely the map type will not stay in the cache, which causes a locality problem. Assume we are mapping an array with 1000000 elements, and each element has 5 components. For each component, we need `base, begin_ptr, size, type, mapper`, which are 40 bytes. Together, we will need 1000000 * 5 * 40 = 200MB of space for this array, which the stack cannot handle.
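The 200MB estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the helper name `bytesForComponents` is hypothetical, and the 40-byte record assumes five 8-byte fields (`base`, `begin_ptr`, `size`, `type`, `mapper`) on a 64-bit host.

```cpp
#include <cstdint>

// Assumption: five fields of 8 bytes each per component record.
constexpr int64_t FieldsPerRecord = 5;
constexpr int64_t BytesPerField = 8;
constexpr int64_t BytesPerRecord = FieldsPerRecord * BytesPerField; // 40

// Total bytes needed to hold one record per component of every element.
constexpr int64_t bytesForComponents(int64_t Elements,
                                     int64_t ComponentsPerElement) {
  return Elements * ComponentsPerElement * BytesPerRecord;
}

// 1000000 elements * 5 components * 40 bytes = 200,000,000 bytes (200 MB).
static_assert(bytesForComponents(1000000, 5) == 200000000,
              "matches the estimate in the review comment");
```

Default thread stack limits are commonly in the range of a few megabytes, so a buffer of this size would have to live on the heap, which is the point being argued.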

ABataev added inline comments.Jun 27 2019, 10:09 AM

lib/CodeGen/CGOpenMPRuntime.cpp
8838	I don't think it is a big problem, this part of the code is executed on the CPU and I don't think it will lead to significant overhead. When we map an array, we do not map it element-by-element, so we don't need 10000 records. Moreover, we try to merge contiguous parts into single one, reducing the total number of elements.

lildmh marked an inline comment as done.Jun 27 2019, 10:20 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8838	I think it is a problem. Besides, doing so is not elegant: why have a single thing (setting the map type) done in 2 places when we can do it in one place? We need to map them element by element because it is not always possible to merge contiguous parts together (there may be no contiguous parts based on the mapper). And merging parts together will be a complex procedure: I don't think it can be done in the runtime because much code is moved into the runtime now. In contrast, the compiler will have better opportunities to merge things. Besides, I don't think there is a valid reason that the current scheme is not good. You mentioned it's complex codegen. But it has fewer than 200 LOC here, so I don't see why it is complex.

ABataev added inline comments.Jun 27 2019, 10:26 AM

lib/CodeGen/CGOpenMPRuntime.cpp
8838	I rather doubt that we will need to map a record with 100000 fields element-by-element. I think it would be better to share it with others and listen to their opinions. It is better to spend some extra time to provide good design. You can include your doubts in the description of the new scheme, of course.

lildmh marked an inline comment as done.Jun 27 2019, 10:33 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8838	The implementation should work for any case anyway. Besides, I think mapping a large array should be actually a common case. The users don't want to map 1000000 elements by themselves, so they will want to use mapper to let the system do it automatically. Sure, I can release the discussion to the mailing list. I don't see a reason to use the new scheme now.

ABataev added inline comments.Jun 27 2019, 10:58 AM

lib/CodeGen/CGOpenMPRuntime.cpp
8838	Lingda, I meant to send the message to the OpenMP telecon list :) Could you forward the same e-mail to Ravi and others, please?

lildmh marked an inline comment as done.Jun 27 2019, 11:03 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
8838	Sure, probably no one in the public mailing list will care about it. I'll send it to the bi-weekly meeting list.

Change the type of size from size_t to int64_t, and rebase
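A minimal sketch of why a fixed 64-bit type matters for this change, assuming (as the inline discussion suggests) that the component count returned by the runtime is shifted into the MEMBER_OF field of the map type. The helper name and the shift constant are hypothetical illustrations, not code from the patch; the value 48 is the one quoted elsewhere in this review.

```cpp
#include <cstdint>

// Assumed bit position of the MEMBER_OF field (the value 48 quoted in this
// review); the constant name is hypothetical.
constexpr unsigned kMemberOfShift = 48;

// Hypothetical helper: offset the MEMBER_OF index stored in `mapType` by the
// number of components that were pushed before the current mapper ran.
constexpr std::uint64_t shiftMemberOf(std::uint64_t mapType,
                                      std::uint64_t previousComponents) {
  return mapType + (previousComponents << kMemberOfShift);
}

// A 32-bit size_t could not hold the shifted value, which is why a fixed
// 64-bit type is used on both the compiler and the runtime side.
static_assert(sizeof(std::uint64_t) * 8 > kMemberOfShift,
              "the operand must be wider than the shift amount");
```

With a 32-bit `size_t`, the same shift would exceed the width of the type, so the compiler-side and runtime-side declarations need to agree on a 64-bit type.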

In D59474#1561161, @lildmh wrote:

Change the type of size from size_t to int64_t, and rebase

Lingda, could you rebase it again? Thanks.

In D59474#1589464, @ABataev wrote:

In D59474#1561161, @lildmh wrote:

Change the type of size from size_t to int64_t, and rebase

Lingda, could you rebase it again? Thanks.

Sure, I'll do it next week, since I'm on vacation and don't have access to my desktop.

Rebase

Herald added a reviewer: jdoerfert. · Jul 23 2019, 5:15 AM

ABataev added inline comments.Jul 23 2019, 9:35 AM

lib/CodeGen/CGDecl.cpp
2534	Why do we need to emit it for simd only mode?
lib/CodeGen/CGOpenMPRuntime.cpp
1695	You're looking up `CGF.CurFn` twice here; use the `find` member function instead and work with the iterator.
7125–7133	Maybe it is better to define a constant `constexpr uint64_t OMP_MEMBER_OF_RANK = 48` and then deduce `OMP_MAP_MEMBER_OF` as `~((1<<OMP_MEMBER_OF_RANK) - 1)`?
8122	AFAIK, LLVM has dropped support for msvc 2013, do we still need this?
lib/CodeGen/CGOpenMPRuntime.h
352	Use `using` instead of `typedef`.
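The single-lookup pattern requested for line 1695 can be sketched as follows. This is an illustration only: it uses `std::unordered_map` with string keys as hypothetical stand-ins for `FunctionUDMMap` and `UDMMap`, and the function name is invented for the example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: function name -> mappers declared inside it, and
// mapper name -> some generated artifact.
using PerFunctionMap =
    std::unordered_map<std::string, std::vector<std::string>>;
using GlobalMap = std::unordered_map<std::string, int>;

// Remove everything registered for `fn` using one lookup. Returns the number
// of entries erased from `global`.
std::size_t clearFunctionEntries(PerFunctionMap &perFunction,
                                 GlobalMap &global, const std::string &fn) {
  auto it = perFunction.find(fn);   // the only lookup by key
  if (it == perFunction.end())
    return 0;
  std::size_t removed = 0;
  for (const std::string &decl : it->second)
    removed += global.erase(decl);  // drop each local declaration
  perFunction.erase(it);            // erase by iterator, no second search
  return removed;
}
```

Compared with `count` followed by `operator[]` and `erase(key)`, the iterator version searches the per-function map once instead of three times.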

lildmh updated this revision to Diff 211333.Jul 23 2019, 10:43 AM

lildmh marked 4 inline comments as done.

lildmh added inline comments.

lib/CodeGen/CGDecl.cpp
2534	This code is not emitting mapper for simd only mode.
lib/CodeGen/CGOpenMPRuntime.cpp
7125–7133	In libomptarget, the same way is used to define `OMP_TGT_MAPTYPE_MEMBER_OF`: `OMP_TGT_MAPTYPE_MEMBER_OF = 0xffff000000000000`. So I think they should stay the same. Btw, the number 48 is directly used in libomptarget now, which may need to change in the future. In your code, it assumes bits higher than 48 are all `OMP_MAP_MEMBER_OF`, which may not be true in the future. My code here is more universal, although it does not look great. What do you think?
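The two ways of defining the mask discussed for lines 7125–7133 can be compared in a short sketch. The constant and function names below are hypothetical; only the rank 48 and the literal `0xffff000000000000` come from the comments above. Note that the shifted operand has to be a 64-bit value: a plain `1 << 48` would overflow where `int` is 32 bits.

```cpp
#include <cstdint>

// Hypothetical names; the rank (48) and the literal are quoted in the review.
constexpr unsigned kMemberOfRank = 48;

// Mask derived from the rank: every bit at or above bit 48.
constexpr std::uint64_t kMemberOfFromRank =
    ~((std::uint64_t{1} << kMemberOfRank) - 1);

// Mask written as a literal, matching the libomptarget value.
constexpr std::uint64_t kMemberOfLiteral = 0xffff000000000000ULL;

static_assert(kMemberOfFromRank == kMemberOfLiteral,
              "both definitions describe the same 16 high bits");

// If fewer high bits were reserved for MEMBER_OF in the future, an explicit
// width could narrow the mask (requires width < 64).
constexpr std::uint64_t memberOfMask(unsigned rank, unsigned width) {
  return ((std::uint64_t{1} << width) - 1) << rank;
}
```

For the current layout both forms agree; the width-based helper shows one way the derived form could be narrowed if some of the top bits were later given another meaning.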

ABataev added inline comments.Jul 23 2019, 11:11 AM

lib/CodeGen/CGDecl.cpp
2534	Ah, yes, it is the condition for the early exit.
lib/CodeGen/CGOpenMPRuntime.cpp
7125–7133	You can apply a mask to drop some of the most significant bits if required. My code looks much cleaner, and it would be good to have the same code in libomptarget too. But it is up to you what to do here.
8122	Ping.
lib/CodeGen/CGOpenMPRuntime.h
818–823	I don't think we need virtual functions here, non-virtual are good enough.
2111	I think you can drop this function here if the original function is not virtual

lildmh marked 3 inline comments as done.Jul 23 2019, 11:37 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.cpp
7125–7133	Maybe let's keep it this way in this patch, then we modify both places in a further patch?
8122	I'm okay either way. I guess it doesn't hurt to keep it?
lib/CodeGen/CGOpenMPRuntime.h
2111	The function for simd only mode includes a `llvm_unreachable`, so I think it's still needed as a virtual function above

ABataev added inline comments.Jul 23 2019, 11:42 AM

lib/CodeGen/CGOpenMPRuntime.h
2111	It is better to reduce number of virtual functions, if possible.

Get rid of MSVC requirement of this, and a virtual function

ABataev added inline comments.Jul 24 2019, 12:57 PM

lib/CodeGen/CGOpenMPRuntime.h
2111	What about virtual function?

lildmh marked an inline comment as done.Jul 25 2019, 3:45 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.h
2111	Since `emitUserDefinedMapper` here overrides the same function in `CGOpenMPRuntime`, I think `emitUserDefinedMapper` of `CGOpenMPRuntime` needs to be defined as `virtual`?

lildmh marked an inline comment as done.Jul 25 2019, 3:47 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.h
2111	If you are asking which virtual function I removed, it is `emitUDMapperArrayInitOrDel` of `CGOpenMPRuntime`, which is never overridden.

ABataev added inline comments.Jul 25 2019, 6:59 AM

lib/CodeGen/CGOpenMPRuntime.h
2111	I would suggest not making this one virtual either and removing the overridden version. It does not help.

lildmh marked an inline comment as done.Jul 25 2019, 8:48 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.h
2111	If we remove virtual from `emitUserDefinedMapper`, we will have to define it in both `CGOpenMPRuntime` and `CGOpenMPSIMDRuntime`, and its definition is identical in both places. I don't think that's good software engineering practice?

ABataev added inline comments.Jul 25 2019, 8:52 AM

lib/CodeGen/CGOpenMPRuntime.h
2111	We just don't need an overridden version in `CGOpenMPSIMDRuntime`. Just remove virtual and remove the overridden function from `CGOpenMPSIMDRuntime`. You have the check for `OpenMPSimd` mode that prevents the emission of mappers in simd-only mode.
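The design being proposed here, a non-virtual member plus an early-exit check at the call site, can be sketched as below. All type and function names are simplified, hypothetical stand-ins for the Clang classes mentioned in the thread, not the actual interfaces.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the relevant language options.
struct Options {
  bool openMP;
  bool openMPSimd;
};

// Stand-in for the base runtime class: the emit function is non-virtual.
class Runtime {
public:
  void emitUserDefinedMapper(const std::string &name) {
    emitted.push_back(name);
  }
  const std::vector<std::string> &emittedMappers() const { return emitted; }

private:
  std::vector<std::string> emitted;
};

// Stand-in for the simd-only runtime: no override is needed, because the
// caller never reaches the emit function in simd-only mode.
class SimdRuntime : public Runtime {};

// Call site with the early exit; returns true if a mapper was emitted.
bool emitDeclareMapper(const Options &opts, Runtime &rt,
                       const std::string &name) {
  if (!opts.openMP || opts.openMPSimd)
    return false;
  rt.emitUserDefinedMapper(name);
  return true;
}
```

Because the guard lives at the single call site, the derived class needs neither an override nor an unreachable stub, and the definition is not duplicated.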

Remove virtual from function declaration

ABataev added inline comments.Jul 26 2019, 10:33 AM

lib/CodeGen/CGOpenMPRuntime.h
823	Seems to me, this function is used only in `emitUserDefinedMapper`. I think you can make it static local in the CGOpenMPRuntime.cpp and do not expose it in the interface.

lildmh marked an inline comment as done.Jul 26 2019, 11:01 AM

lildmh added inline comments.

lib/CodeGen/CGOpenMPRuntime.h
823	`emitUserDefinedMapper` needs to call `createRuntimeFunction` of `CGOpenMPRuntime`, which is private. Which one do you think is better, make `createRuntimeFunction` public, or have `emitUserDefinedMapper` not defined in `CGOpenMPRuntime`? It seems to me that they are similar

ABataev added inline comments.Jul 26 2019, 11:37 AM

lib/CodeGen/CGOpenMPRuntime.h
823	Then make `emitUDMapperArrayInitOrDel` private instead.

Make emitUDMapperArrayInitOrDel private

Looks good in general, but commit the runtime part first.

test/OpenMP/declare_mapper_codegen.cpp
44–48	I would not rely on the predetermined indices here, better to use some kind of patterns here just like in other places.
289–293	Same here

This revision is now accepted and ready to land.Jul 26 2019, 1:45 PM

Thanks Alexey! Could you look into the runtime patch D60972 then?

test/OpenMP/declare_mapper_codegen.cpp
44–48	Could you give an example about what you suggest? For instance, some other tests I should look into.

ABataev added inline comments.Jul 29 2019, 12:57 PM

test/OpenMP/declare_mapper_codegen.cpp
44–48	Just like in this test when you're using vars.

lildmh marked an inline comment as done.Jul 29 2019, 1:19 PM

lildmh added inline comments.

test/OpenMP/declare_mapper_codegen.cpp
44–48	Sorry I was not clear before. What do you mean by "predetermined indices" here? If you are referring to, for example, `%0` in `store i8* %0, i8** [[HANDLEADDR:%[^,]+]]`, I guess there is no way to get rid of `%0` because it means the first argument of the function?

ABataev added inline comments.Jul 29 2019, 1:21 PM

test/OpenMP/declare_mapper_codegen.cpp
44–48	Yes, I meant those `%0` like registers. Better to mark them as variables in function declaration and use those names in the checks.

lildmh marked an inline comment as done.Jul 29 2019, 1:44 PM

lildmh added inline comments.

test/OpenMP/declare_mapper_codegen.cpp
44–48	Now it's like `define {{.*}}void @.omp_mapper.{{.*}}C.id{{.*}}(i8*, i8*, i8*, i64, i64)`. I think you are suggesting something like `define {{.*}}void @.omp_mapper.{{.*}}C.id{{.*}}(i8* [[HANDLE:%[^,]+]], i8* [[BPTR:%[^,]+]], ...)`, so that later I can use `store i8* [[HANDLE]], i8** [[HANDLEADDR:%[^,]+]]`. I'm not sure how to add names for function arguments. They seem to be always nameless, like `(i8*, i8*, i8*, i64, i64)`. Is there a way to do that?

ABataev added inline comments.Jul 29 2019, 2:00 PM

test/OpenMP/declare_mapper_codegen.cpp
44–48	If the clang parameters have names, the LLVM params will also get the names. But it is not worth adding names to the function. Could you just use a regexp here to avoid using LLVM register names? Just `%{{.+}}`. And rely on the order, i.e. remove the `-DAG` checks?

Change mapper function argument checking

Thanks, looks good.

Closed by commit rL367773: [OpenMP 5.0] Codegen support for user-defined mappers. (authored by Meinersbur). · Aug 3 2019, 9:18 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. · Aug 3 2019, 9:18 PM

Herald added a subscriber: llvm-commits.

Fix declare mapper codegen test when the function argument has name attached.

Revision Contents

Path	Size
include/clang/AST/GlobalDecl.h	1 line
lib/AST/ASTContext.cpp	2 lines
lib/CodeGen/CGDecl.cpp	7 lines
lib/CodeGen/CGOpenMPRuntime.h	20 lines
lib/CodeGen/CGOpenMPRuntime.cpp	499 lines
lib/CodeGen/ModuleBuilder.cpp	3 lines
test/OpenMP/declare_mapper_codegen.cpp	442 lines

Diff 213339

include/clang/AST/GlobalDecl.h

Show First 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	public:
GlobalDecl(const FunctionDecl *D, unsigned MVIndex = 0)		GlobalDecl(const FunctionDecl *D, unsigned MVIndex = 0)
: MultiVersionIndex(MVIndex) {		: MultiVersionIndex(MVIndex) {
Init(D);		Init(D);
}		}
GlobalDecl(const BlockDecl *D) { Init(D); }		GlobalDecl(const BlockDecl *D) { Init(D); }
GlobalDecl(const CapturedDecl *D) { Init(D); }		GlobalDecl(const CapturedDecl *D) { Init(D); }
GlobalDecl(const ObjCMethodDecl *D) { Init(D); }		GlobalDecl(const ObjCMethodDecl *D) { Init(D); }
GlobalDecl(const OMPDeclareReductionDecl *D) { Init(D); }		GlobalDecl(const OMPDeclareReductionDecl *D) { Init(D); }
		GlobalDecl(const OMPDeclareMapperDecl *D) { Init(D); }
GlobalDecl(const CXXConstructorDecl *D, CXXCtorType Type) : Value(D, Type) {}		GlobalDecl(const CXXConstructorDecl *D, CXXCtorType Type) : Value(D, Type) {}
GlobalDecl(const CXXDestructorDecl *D, CXXDtorType Type) : Value(D, Type) {}		GlobalDecl(const CXXDestructorDecl *D, CXXDtorType Type) : Value(D, Type) {}
GlobalDecl(const VarDecl *D, DynamicInitKind StubKind)		GlobalDecl(const VarDecl *D, DynamicInitKind StubKind)
: Value(D, unsigned(StubKind)) {}		: Value(D, unsigned(StubKind)) {}

GlobalDecl getCanonicalDecl() const {		GlobalDecl getCanonicalDecl() const {
GlobalDecl CanonGD;		GlobalDecl CanonGD;
CanonGD.Value.setPointer(Value.getPointer()->getCanonicalDecl());		CanonGD.Value.setPointer(Value.getPointer()->getCanonicalDecl());

lib/AST/ASTContext.cpp

Show First 20 Lines • Show All 9,854 Lines • ▼ Show 20 Lines	bool ASTContext::DeclMustBeEmitted(const Decl *D) {
} else if (isa<PragmaCommentDecl>(D))		} else if (isa<PragmaCommentDecl>(D))
return true;		return true;
else if (isa<PragmaDetectMismatchDecl>(D))		else if (isa<PragmaDetectMismatchDecl>(D))
return true;		return true;
else if (isa<OMPThreadPrivateDecl>(D))		else if (isa<OMPThreadPrivateDecl>(D))
return !D->getDeclContext()->isDependentContext();		return !D->getDeclContext()->isDependentContext();
else if (isa<OMPAllocateDecl>(D))		else if (isa<OMPAllocateDecl>(D))
return !D->getDeclContext()->isDependentContext();		return !D->getDeclContext()->isDependentContext();
else if (isa<OMPDeclareReductionDecl>(D))		else if (isa<OMPDeclareReductionDecl>(D) || isa<OMPDeclareMapperDecl>(D))
return !D->getDeclContext()->isDependentContext();		return !D->getDeclContext()->isDependentContext();
else if (isa<ImportDecl>(D))		else if (isa<ImportDecl>(D))
return true;		return true;
else		else
return false;		return false;

if (D->isFromASTFile() && !LangOpts.BuildingPCHWithObjectFile) {		if (D->isFromASTFile() && !LangOpts.BuildingPCHWithObjectFile) {
assert(getExternalSource() && "It's from an AST file; must have a source.");		assert(getExternalSource() && "It's from an AST file; must have a source.");

lib/CodeGen/CGDecl.cpp

	void CodeGenModule::EmitOMPDeclareReduction(const OMPDeclareReductionDecl *D,			void CodeGenModule::EmitOMPDeclareReduction(const OMPDeclareReductionDecl *D,
	CodeGenFunction *CGF) {			CodeGenFunction *CGF) {
	if (!LangOpts.OpenMP || (!LangOpts.EmitAllDecls && !D->isUsed()))			if (!LangOpts.OpenMP || (!LangOpts.EmitAllDecls && !D->isUsed()))
	return;			return;
	getOpenMPRuntime().emitUserDefinedReduction(CGF, D);			getOpenMPRuntime().emitUserDefinedReduction(CGF, D);
	}			}

	void CodeGenModule::EmitOMPDeclareMapper(const OMPDeclareMapperDecl *D,			void CodeGenModule::EmitOMPDeclareMapper(const OMPDeclareMapperDecl *D,
	CodeGenFunction *CGF) {			CodeGenFunction *CGF) {
	if (!LangOpts.OpenMP || (!LangOpts.EmitAllDecls && !D->isUsed()))			if (!LangOpts.OpenMP || LangOpts.OpenMPSimd ||
				ABataev: Why do we need to emit it for simd only mode?
				lildmh: This code is not emitting mapper for simd only mode.
				ABataev: Ah, yes, it is the condition for the early exit.
				(!LangOpts.EmitAllDecls && !D->isUsed()))
	return;			return;
	// FIXME: need to implement mapper code generation			getOpenMPRuntime().emitUserDefinedMapper(D, CGF);
	}			}

	void CodeGenModule::EmitOMPRequiresDecl(const OMPRequiresDecl *D) {			void CodeGenModule::EmitOMPRequiresDecl(const OMPRequiresDecl *D) {
	getOpenMPRuntime().checkArchForUnifiedAddressing(D);			getOpenMPRuntime().checkArchForUnifiedAddressing(D);
	}			}

lib/CodeGen/CGOpenMPRuntime.h

Show First 20 Lines • Show All 339 Lines • ▼ Show 20 Lines	typedef llvm::DenseMap<const OMPDeclareReductionDecl *,
std::pair<llvm::Function *, llvm::Function *>>		std::pair<llvm::Function *, llvm::Function *>>
UDRMapTy;		UDRMapTy;
UDRMapTy UDRMap;		UDRMapTy UDRMap;
/// Map of functions and locally defined UDRs.		/// Map of functions and locally defined UDRs.
typedef llvm::DenseMap<llvm::Function *,		typedef llvm::DenseMap<llvm::Function *,
SmallVector<const OMPDeclareReductionDecl *, 4>>		SmallVector<const OMPDeclareReductionDecl *, 4>>
FunctionUDRMapTy;		FunctionUDRMapTy;
FunctionUDRMapTy FunctionUDRMap;		FunctionUDRMapTy FunctionUDRMap;
		/// Map from the user-defined mapper declaration to its corresponding
		/// functions.
		llvm::DenseMap<const OMPDeclareMapperDecl *, llvm::Function *> UDMMap;
		/// Map of functions and their local user-defined mappers.
		ABataev: You should be very careful with this map. If the mapper is declared in the function context, it must be removed from this map as soon as the function processing is completed. All local declarations are removed after this and their address might be used again.
		lildmh: Yes, I plan to have this part in the next patch, which will implement looking up the corresponding mapper function for map, to, and from clauses.
		ABataev: Nope, this must be part of this patch, because it may cause a crash of the compiler during testing.
		lildmh: It will not crash the compiler, because this UDMap is only written in this patch, never read.
		ABataev: Still, you should clear it in this patch. Otherwise, you're breaking data-dependency between the patches and this is not good at all.
		lildmh: Sure, if you think that's absolutely necessary, I can add it to this patch.
		ABataev: It is necessary, so, please, add it, thanks!
		using FunctionUDMMapTy =
		ABataev: Use `using` instead of `typedef`.
		llvm::DenseMap<llvm::Function *,
		SmallVector<const OMPDeclareMapperDecl *, 4>>;
		FunctionUDMMapTy FunctionUDMMap;
/// Type kmp_critical_name, originally defined as typedef kmp_int32		/// Type kmp_critical_name, originally defined as typedef kmp_int32
/// kmp_critical_name[8];		/// kmp_critical_name[8];
llvm::ArrayType *KmpCriticalNameTy;		llvm::ArrayType *KmpCriticalNameTy;
/// An ordered map of auto-generated variables to their unique names.		/// An ordered map of auto-generated variables to their unique names.
/// It stores variables with the following names: 1) ".gomp_critical_user_" +		/// It stores variables with the following names: 1) ".gomp_critical_user_" +
/// <critical_section_name> + ".var" for "omp critical" directives; 2)		/// <critical_section_name> + ".var" for "omp critical" directives; 2)
/// <mangled_name_for_global_var> + ".cache." for cache for threadprivate		/// <mangled_name_for_global_var> + ".cache." for cache for threadprivate
/// variables.		/// variables.
▲ Show 20 Lines • Show All 377 Lines • ▼ Show 20 Lines	private:
/// \param Ctor Pointer to a global init function for \a VD.		/// \param Ctor Pointer to a global init function for \a VD.
/// \param CopyCtor Pointer to a global copy function for \a VD.		/// \param CopyCtor Pointer to a global copy function for \a VD.
/// \param Dtor Pointer to a global destructor function for \a VD.		/// \param Dtor Pointer to a global destructor function for \a VD.
/// \param Loc Location of threadprivate declaration.		/// \param Loc Location of threadprivate declaration.
void emitThreadPrivateVarInit(CodeGenFunction &CGF, Address VDAddr,		void emitThreadPrivateVarInit(CodeGenFunction &CGF, Address VDAddr,
llvm::Value *Ctor, llvm::Value *CopyCtor,		llvm::Value *Ctor, llvm::Value *CopyCtor,
llvm::Value *Dtor, SourceLocation Loc);		llvm::Value *Dtor, SourceLocation Loc);

		/// Emit the array initialization or deletion portion for user-defined mapper
		/// code generation.
		void emitUDMapperArrayInitOrDel(CodeGenFunction &MapperCGF,
		llvm::Value *Handle, llvm::Value *BasePtr,
		llvm::Value *Ptr, llvm::Value *Size,
		llvm::Value *MapType, CharUnits ElementSize,
		llvm::BasicBlock *ExitBB, bool IsInit);

struct TaskResultTy {		struct TaskResultTy {
llvm::Value *NewTask = nullptr;		llvm::Value *NewTask = nullptr;
llvm::Function *TaskEntry = nullptr;		llvm::Function *TaskEntry = nullptr;
llvm::Value *NewTaskNewTaskTTy = nullptr;		llvm::Value *NewTaskNewTaskTTy = nullptr;
LValue TDBase;		LValue TDBase;
const RecordDecl *KmpTaskTQTyRD = nullptr;		const RecordDecl *KmpTaskTQTyRD = nullptr;
llvm::Value *TaskDupFn = nullptr;		llvm::Value *TaskDupFn = nullptr;
};		};
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	public:

/// Emit code for the specified user defined reduction construct.		/// Emit code for the specified user defined reduction construct.
virtual void emitUserDefinedReduction(CodeGenFunction *CGF,		virtual void emitUserDefinedReduction(CodeGenFunction *CGF,
const OMPDeclareReductionDecl *D);		const OMPDeclareReductionDecl *D);
/// Get combiner/initializer for the specified user-defined reduction, if any.		/// Get combiner/initializer for the specified user-defined reduction, if any.
virtual std::pair<llvm::Function *, llvm::Function *>		virtual std::pair<llvm::Function *, llvm::Function *>
getUserDefinedReduction(const OMPDeclareReductionDecl *D);		getUserDefinedReduction(const OMPDeclareReductionDecl *D);

		/// Emit the function for the user defined mapper construct.
		void emitUserDefinedMapper(const OMPDeclareMapperDecl *D,
		CodeGenFunction *CGF = nullptr);

/// Emits outlined function for the specified OpenMP parallel directive		/// Emits outlined function for the specified OpenMP parallel directive
		ABataev: I don't remember precisely, but probably `\a`->`\p`
/// \a D. This outlined function has type void(*)(kmp_int32 *ThreadID,		/// \a D. This outlined function has type void(*)(kmp_int32 *ThreadID,
/// kmp_int32 BoundID, struct context_vars*).		/// kmp_int32 BoundID, struct context_vars*).
		ABataev: I don't think we need virtual functions here, non-virtual are good enough.
		ABataev: Seems to me, this function is used only in `emitUserDefinedMapper`. I think you can make it static local in the CGOpenMPRuntime.cpp and do not expose it in the interface.
		lildmh: `emitUserDefinedMapper` needs to call `createRuntimeFunction` of `CGOpenMPRuntime`, which is private. Which one do you think is better, make `createRuntimeFunction` public, or have `emitUserDefinedMapper` not defined in `CGOpenMPRuntime`? It seems to me that they are similar
		ABataev: Then make `emitUDMapperArrayInitOrDel` private instead.
/// \param D OpenMP directive.		/// \param D OpenMP directive.
/// \param ThreadIDVar Variable for thread id in the current OpenMP region.		/// \param ThreadIDVar Variable for thread id in the current OpenMP region.
		ABataev: Use `///` style of comment
/// \param InnermostKind Kind of innermost directive (for simple directives it		/// \param InnermostKind Kind of innermost directive (for simple directives it
/// is a directive itself, for combined - its innermost directive).		/// is a directive itself, for combined - its innermost directive).
/// \param CodeGen Code generation sequence for the \a D directive.		/// \param CodeGen Code generation sequence for the \a D directive.
virtual llvm::Function *emitParallelOutlinedFunction(		virtual llvm::Function *emitParallelOutlinedFunction(
const OMPExecutableDirective &D, const VarDecl *ThreadIDVar,		const OMPExecutableDirective &D, const VarDecl *ThreadIDVar,
OpenMPDirectiveKind InnermostKind, const RegionCodeGenTy &CodeGen);		OpenMPDirectiveKind InnermostKind, const RegionCodeGenTy &CodeGen);

/// Emits outlined function for the specified OpenMP teams directive		/// Emits outlined function for the specified OpenMP teams directive
▲ Show 20 Lines • Show All 1,269 Lines • ▼ Show 20 Lines	public:
void emitTargetOutlinedFunction(const OMPExecutableDirective &D,		void emitTargetOutlinedFunction(const OMPExecutableDirective &D,
StringRef ParentName,		StringRef ParentName,
llvm::Function *&OutlinedFn,		llvm::Function *&OutlinedFn,
llvm::Constant *&OutlinedFnID,		llvm::Constant *&OutlinedFnID,
bool IsOffloadEntry,		bool IsOffloadEntry,
const RegionCodeGenTy &CodeGen) override;		const RegionCodeGenTy &CodeGen) override;

/// Emit the target offloading code associated with \a D. The emitted		/// Emit the target offloading code associated with \a D. The emitted
/// code attempts offloading the execution to the device, an the event of		/// code attempts offloading the execution to the device, an the event of
		ABataev: I think you can drop this function here if the original function is not virtual
		lildmh: The function for simd only mode includes a `llvm_unreachable`, so I think it's still needed as a virtual function above
		ABataev: It is better to reduce number of virtual functions, if possible.
		ABataev: What about virtual function?
		lildmh: Since `emitUserDefinedMapper` here overrides the same function in `CGOpenMPRuntime`, I think `emitUserDefinedMapper` of `CGOpenMPRuntime` needs to be defined as `virtual`?
		lildmh: If you are asking which virtual function I removed, it is `emitUDMapperArrayInitOrDel` of `CGOpenMPRuntime`, which is never overridden.
		ABataev: I would suggest not making this one virtual either and removing the overridden version. It does not help.
		lildmh: If we remove virtual from `emitUserDefinedMapper`, we will have to define it in both `CGOpenMPRuntime` and `CGOpenMPSIMDRuntime`, and its definition is identical in both places. I don't think that's good software engineering practice?
		ABataev: We just don't need an overridden version in `CGOpenMPSIMDRuntime`. Just remove virtual and remove the overridden function from `CGOpenMPSIMDRuntime`. You have the check for `OpenMPSimd` mode that prevents the emission of mappers in simd-only mode.
/// a failure it executes the host version outlined in \a OutlinedFn.		/// a failure it executes the host version outlined in \a OutlinedFn.
/// \param D Directive to emit.		/// \param D Directive to emit.
/// \param OutlinedFn Host version of the code to be offloaded.		/// \param OutlinedFn Host version of the code to be offloaded.
/// \param OutlinedFnID ID of host version of the code to be offloaded.		/// \param OutlinedFnID ID of host version of the code to be offloaded.
/// \param IfCond Expression evaluated in if clause associated with the target		/// \param IfCond Expression evaluated in if clause associated with the target
/// directive, or null if no if clause is used.		/// directive, or null if no if clause is used.
/// \param Device Expression evaluated in device clause associated with the		/// \param Device Expression evaluated in device clause associated with the
/// target directive, or null if no device clause is used.		/// target directive, or null if no device clause is used.

lib/CodeGen/CGOpenMPRuntime.cpp

Show First 20 Lines • Show All 746 Lines • ▼ Show 20 Lines	enum OpenMPRTLFunction {
OMPRTL__tgt_target_data_end_nowait,		OMPRTL__tgt_target_data_end_nowait,
// Call to void __tgt_target_data_update(int64_t device_id, int32_t arg_num,		// Call to void __tgt_target_data_update(int64_t device_id, int32_t arg_num,
// void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types);		// void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types);
OMPRTL__tgt_target_data_update,		OMPRTL__tgt_target_data_update,
// Call to void __tgt_target_data_update_nowait(int64_t device_id, int32_t		// Call to void __tgt_target_data_update_nowait(int64_t device_id, int32_t
// arg_num, void **args_base, void **args, int64_t *arg_sizes, int64_t		// arg_num, void **args_base, void **args, int64_t *arg_sizes, int64_t
// *arg_types);		// *arg_types);
OMPRTL__tgt_target_data_update_nowait,		OMPRTL__tgt_target_data_update_nowait,
		// Call to int64_t __tgt_mapper_num_components(void *rt_mapper_handle);
		ABataev: Do we really need to use `int64_t` for number of elements? `size_t` must be enough.
		lildmh: Because we use the return value to shift the memberof field of map type, which is `int64_t`, so I think `int64_t` makes sense here.
		ABataev: Ok, there is a discrepancy between runtime part and compiler part: `__tgt_push_mapper_component` uses `size_t` for size, while the runtime function uses `int64_t`. It won't work for 32bit platform.
		lildmh: This should work on 32bit machines, since 32bit machines also use `int64_t` for the map type?
		OMPRTL__tgt_mapper_num_components,
		// Call to void __tgt_push_mapper_component(void *rt_mapper_handle, void
		// *base, void *begin, int64_t size, int64_t type);
		ABataev: Check the declaration of the runtime function in the runtime patch and here. `size` parameter here has type `size_t`, while in runtime part it is `int64_t`
		lildmh: The runtime part uses `int64_t` because I see every other runtime function use `int64_t` instead of `size_t` for `size`, e.g., `__tgt_target, __tgt_target_teams`, etc., despite that they are declared to be `size_t` in Clang codegen. So I guess it is done on purpose? Otherwise we need to modify all these runtime function interface in the future.
		ABataev: I recently committed the patch that fixes this problem in clang. If you're using `int64_t` in the runtime, the same type must be used in clang codegen.
		lildmh: That's nice to know. Then I will change the type of `size` to `int64_t` as well.
		OMPRTL__tgt_push_mapper_component,
};		};

/// A basic class for pre|post-action for advanced codegen sequence for OpenMP		/// A basic class for pre|post-action for advanced codegen sequence for OpenMP
/// region.		/// region.
class CleanupTy final : public EHScopeStack::Cleanup {		class CleanupTy final : public EHScopeStack::Cleanup {
PrePostActionTy *Action;		PrePostActionTy *Action;

public:		public:
▲ Show 20 Lines • Show All 918 Lines • ▼ Show 20 Lines	if (OpenMPLocThreadIDMap.count(CGF.CurFn)) {
clearLocThreadIdInsertPt(CGF);		clearLocThreadIdInsertPt(CGF);
OpenMPLocThreadIDMap.erase(CGF.CurFn);		OpenMPLocThreadIDMap.erase(CGF.CurFn);
}		}
if (FunctionUDRMap.count(CGF.CurFn) > 0) {		if (FunctionUDRMap.count(CGF.CurFn) > 0) {
for(auto *D : FunctionUDRMap[CGF.CurFn])		for(auto *D : FunctionUDRMap[CGF.CurFn])
UDRMap.erase(D);		UDRMap.erase(D);
FunctionUDRMap.erase(CGF.CurFn);		FunctionUDRMap.erase(CGF.CurFn);
}		}
		auto I = FunctionUDMMap.find(CGF.CurFn);
		if (I != FunctionUDMMap.end()) {
		ABataev: You're looking up `CGF.CurFn` twice here; use the `find` member function instead and work with the iterator.
		for(auto *D : I->second)
		UDMMap.erase(D);
		FunctionUDMMap.erase(I);
		}
}		}

llvm::Type *CGOpenMPRuntime::getIdentTyPointerTy() {		llvm::Type *CGOpenMPRuntime::getIdentTyPointerTy() {
return IdentTy->getPointerTo();		return IdentTy->getPointerTo();
}		}

llvm::Type *CGOpenMPRuntime::getKmpc_MicroPointerTy() {		llvm::Type *CGOpenMPRuntime::getKmpc_MicroPointerTy() {
if (!Kmpc_MicroTy) {		if (!Kmpc_MicroTy) {
(757 lines hidden)	llvm::Type *TypeParams[] = {CGM.Int64Ty,
CGM.VoidPtrPtrTy,		CGM.VoidPtrPtrTy,
CGM.Int64Ty->getPointerTo(),		CGM.Int64Ty->getPointerTo(),
CGM.Int64Ty->getPointerTo()};		CGM.Int64Ty->getPointerTo()};
auto *FnTy =		auto *FnTy =
llvm::FunctionType::get(CGM.VoidTy, TypeParams, /*isVarArg=*/false);		llvm::FunctionType::get(CGM.VoidTy, TypeParams, /*isVarArg=*/false);
RTLFn = CGM.CreateRuntimeFunction(FnTy, "__tgt_target_data_update_nowait");		RTLFn = CGM.CreateRuntimeFunction(FnTy, "__tgt_target_data_update_nowait");
break;		break;
}		}
		case OMPRTL__tgt_mapper_num_components: {
		ABataev: You need to implement these mapper functions first.
		lildmh (author): The runtime part is on hold for now.
		// Build int64_t __tgt_mapper_num_components(void *rt_mapper_handle);
		llvm::Type *TypeParams[] = {CGM.VoidPtrTy};
		auto *FnTy =
		llvm::FunctionType::get(CGM.Int64Ty, TypeParams, /*isVarArg*/ false);
		RTLFn = CGM.CreateRuntimeFunction(FnTy, "__tgt_mapper_num_components");
		break;
		}
		case OMPRTL__tgt_push_mapper_component: {
		// Build void __tgt_push_mapper_component(void *rt_mapper_handle, void
		// *base, void *begin, int64_t size, int64_t type);
		llvm::Type *TypeParams[] = {CGM.VoidPtrTy, CGM.VoidPtrTy, CGM.VoidPtrTy,
		CGM.Int64Ty, CGM.Int64Ty};
		auto *FnTy =
		llvm::FunctionType::get(CGM.VoidTy, TypeParams, /*isVarArg*/ false);
		RTLFn = CGM.CreateRuntimeFunction(FnTy, "__tgt_push_mapper_component");
		break;
		}
}		}
assert(RTLFn && "Unable to find OpenMP runtime function");		assert(RTLFn && "Unable to find OpenMP runtime function");
return RTLFn;		return RTLFn;
}		}

llvm::FunctionCallee		llvm::FunctionCallee
CGOpenMPRuntime::createForStaticInitFunction(unsigned IVSize, bool IVSigned) {		CGOpenMPRuntime::createForStaticInitFunction(unsigned IVSize, bool IVSigned) {
assert((IVSize == 32 || IVSize == 64) &&		assert((IVSize == 32 || IVSize == 64) &&
(4,618 lines hidden)	enum OpenMPOffloadMappingFlags : uint64_t {
/// Implicit map		/// Implicit map
OMP_MAP_IMPLICIT = 0x200,		OMP_MAP_IMPLICIT = 0x200,
/// The 16 MSBs of the flags indicate whether the entry is member of some		/// The 16 MSBs of the flags indicate whether the entry is member of some
/// struct/class.		/// struct/class.
OMP_MAP_MEMBER_OF = 0xffff000000000000,		OMP_MAP_MEMBER_OF = 0xffff000000000000,
LLVM_MARK_AS_BITMASK_ENUM(/* LargestFlag = */ OMP_MAP_MEMBER_OF),		LLVM_MARK_AS_BITMASK_ENUM(/* LargestFlag = */ OMP_MAP_MEMBER_OF),
};		};

		/// Get the offset of the OMP_MAP_MEMBER_OF field.
		static unsigned getFlagMemberOffset() {
		unsigned Offset = 0;
		for (uint64_t Remain = OMP_MAP_MEMBER_OF; !(Remain & 1);
		Remain = Remain >> 1)
		Offset++;
		return Offset;
		}

		ABataev: Maybe it is better to define a constant `constexpr uint64_t OMP_MEMBER_OF_RANK = 48` and then deduce `OMP_MAP_MEMBER_OF` as `~((1<<OMP_MEMBER_OF_RANK) - 1)`?
		lildmh (author): In libomptarget, `OMP_TGT_MAPTYPE_MEMBER_OF` is defined the same way: `OMP_TGT_MAPTYPE_MEMBER_OF = 0xffff000000000000`. So I think they should stay the same. Btw, the number 48 is used directly in libomptarget now, which may need to change in the future. Your code assumes that all bits from bit 48 upward belong to `OMP_MAP_MEMBER_OF`, which may not be true in the future. My code here is more universal, although it does not look great. What do you think?
		ABataev: You can apply a mask to drop some of the most significant bits if required. My code looks much cleaner; it would be good to have the same code in libomptarget too. But it is up to you what to do here.
		lildmh (author): Maybe let's keep it this way in this patch, then modify both places in a later patch?
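The two options discussed in this thread can be compared in a small standalone sketch. The mask value is the one shown in the diff; the names `kMemberOfMask`, `kMemberOfRank`, `memberOfOffsetFromMask`, `maskFromRank` and `memberOfFlag` are hypothetical and do not exist in Clang or libomptarget. The sketch assumes C++14 or later for the loop inside a `constexpr` function.

```cpp
#include <cstdint>

// Mask from the diff: the 16 most significant bits of the 64-bit flag word.
constexpr std::uint64_t kMemberOfMask = 0xffff000000000000ULL;

// Approach taken in the patch: derive the shift amount from the mask by
// counting its trailing zero bits. The Mask != 0 guard is an addition here
// so that an empty mask cannot loop forever.
constexpr unsigned memberOfOffsetFromMask(std::uint64_t Mask) {
  unsigned Offset = 0;
  for (; Mask != 0 && !(Mask & 1); Mask >>= 1)
    ++Offset;
  return Offset;
}

// Approach suggested in review: start from a rank constant and derive the
// mask. The 1 must be a 64-bit value; shifting a plain int by 48 bits is
// undefined behaviour.
constexpr unsigned kMemberOfRank = 48;
constexpr std::uint64_t maskFromRank(unsigned Rank) {
  return ~((std::uint64_t{1} << Rank) - 1);
}

// Encode a zero-based member position, following getMemberOfFlag().
constexpr std::uint64_t memberOfFlag(unsigned Position) {
  return (static_cast<std::uint64_t>(Position) + 1)
         << memberOfOffsetFromMask(kMemberOfMask);
}

// Both formulations agree for the current mask.
static_assert(memberOfOffsetFromMask(kMemberOfMask) == kMemberOfRank, "");
static_assert(maskFromRank(kMemberOfRank) == kMemberOfMask, "");
```

The mask-derived version keeps working if the field is later narrowed or moved, while the rank-derived version is shorter but treats every bit from the rank upward as part of the field unless an extra mask is applied, which is the trade-off raised in the thread.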
/// Class that associates information with a base pointer to be passed to the		/// Class that associates information with a base pointer to be passed to the
/// runtime library.		/// runtime library.
class BasePointerInfo {		class BasePointerInfo {
/// The base pointer.		/// The base pointer.
llvm::Value *Ptr = nullptr;		llvm::Value *Ptr = nullptr;
/// The base declaration that refers to this device pointer, or null if		/// The base declaration that refers to this device pointer, or null if
/// there is none.		/// there is none.
const ValueDecl *DevPtrDecl = nullptr;		const ValueDecl *DevPtrDecl = nullptr;
(47 lines hidden)	private:
struct DeferredDevicePtrEntryTy {		struct DeferredDevicePtrEntryTy {
const Expr *IE = nullptr;		const Expr *IE = nullptr;
const ValueDecl *VD = nullptr;		const ValueDecl *VD = nullptr;

DeferredDevicePtrEntryTy(const Expr *IE, const ValueDecl *VD)		DeferredDevicePtrEntryTy(const Expr *IE, const ValueDecl *VD)
: IE(IE), VD(VD) {}		: IE(IE), VD(VD) {}
};		};

/// Directive from where the map clauses were extracted.		/// The target directive from where the mappable clauses were extracted. It
const OMPExecutableDirective &CurDir;		/// is either a executable directive or a user-defined mapper directive.
		llvm::PointerUnion<const OMPExecutableDirective *,
		const OMPDeclareMapperDecl *>
		CurDir;

/// Function the directive is being generated for.		/// Function the directive is being generated for.
		ABataev: Use `llvm::PointerUnion<OMPExecutableDirective *, OMPDeclareMapperDecl *>` to save the space.
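The space saving mentioned here comes from storing the discriminator in the unused low bits of an aligned pointer, so the union stays pointer-sized. The sketch below illustrates that idea with a hand-written tagged pointer; it is not `llvm::PointerUnion` itself, and `ExecDirective`, `MapperDecl` and `DirectiveRef` are hypothetical types. It assumes `std::uintptr_t` has the same size as a pointer, which holds on mainstream platforms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for OMPExecutableDirective and OMPDeclareMapperDecl.
// The alignment guarantees that the lowest pointer bit is always zero.
struct alignas(8) ExecDirective { int NumClauses; };
struct alignas(8) MapperDecl { int NumClauses; };

// A pointer-sized "either A* or B*" holder: bit 0 records which type is held.
class DirectiveRef {
  std::uintptr_t Bits = 0;

public:
  DirectiveRef(const ExecDirective *D)
      : Bits(reinterpret_cast<std::uintptr_t>(D)) {}
  DirectiveRef(const MapperDecl *D)
      : Bits(reinterpret_cast<std::uintptr_t>(D) | 1) {}

  bool isMapper() const { return (Bits & 1) != 0; }

  const ExecDirective *getExec() const {
    assert(!isMapper() && "Expected an executable directive");
    return reinterpret_cast<const ExecDirective *>(Bits);
  }

  const MapperDecl *getMapper() const {
    assert(isMapper() && "Expected a declare mapper directive");
    return reinterpret_cast<const MapperDecl *>(Bits & ~std::uintptr_t{1});
  }
};

// No extra storage is needed for the tag.
static_assert(sizeof(DirectiveRef) == sizeof(void *), "");
```

The `is<T>()` and `get<T>()` calls on `CurDir` in the updated code play the role of `isMapper()` and the two getters in this sketch.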
CodeGenFunction &CGF;		CodeGenFunction &CGF;

/// Set of all first private variables in the current directive.		/// Set of all first private variables in the current directive.
/// bool data is set to true if the variable is implicitly marked as		/// bool data is set to true if the variable is implicitly marked as
/// firstprivate, false otherwise.		/// firstprivate, false otherwise.
llvm::DenseMap<CanonicalDeclPtr<const VarDecl>, bool> FirstPrivateDecls;		llvm::DenseMap<CanonicalDeclPtr<const VarDecl>, bool> FirstPrivateDecls;

/// Map between device pointer declarations and their expression components.		/// Map between device pointer declarations and their expression components.
(584 lines hidden)	if (FirstPrivateDecls.count(Cap.getCapturedVar())) {
return MappableExprsHandler::OMP_MAP_PRIVATE |		return MappableExprsHandler::OMP_MAP_PRIVATE |
MappableExprsHandler::OMP_MAP_TO;		MappableExprsHandler::OMP_MAP_TO;
}		}
return MappableExprsHandler::OMP_MAP_TO |		return MappableExprsHandler::OMP_MAP_TO |
MappableExprsHandler::OMP_MAP_FROM;		MappableExprsHandler::OMP_MAP_FROM;
}		}

static OpenMPOffloadMappingFlags getMemberOfFlag(unsigned Position) {		static OpenMPOffloadMappingFlags getMemberOfFlag(unsigned Position) {
// Member of is given by the 16 MSB of the flag, so rotate by 48 bits.		// Rotate by getFlagMemberOffset() bits.
return static_cast<OpenMPOffloadMappingFlags>(((uint64_t)Position + 1)		return static_cast<OpenMPOffloadMappingFlags>(((uint64_t)Position + 1)
<< 48);		<< getFlagMemberOffset());
}		}

static void setCorrectMemberOfFlag(OpenMPOffloadMappingFlags &Flags,		static void setCorrectMemberOfFlag(OpenMPOffloadMappingFlags &Flags,
OpenMPOffloadMappingFlags MemberOfFlag) {		OpenMPOffloadMappingFlags MemberOfFlag) {
// If the entry is PTR_AND_OBJ but has not been marked with the special		// If the entry is PTR_AND_OBJ but has not been marked with the special
// placeholder value 0xFFFF in the MEMBER_OF field, then it should not be		// placeholder value 0xFFFF in the MEMBER_OF field, then it should not be
// marked as MEMBER_OF.		// marked as MEMBER_OF.
if ((Flags & OMP_MAP_PTR_AND_OBJ) &&		if ((Flags & OMP_MAP_PTR_AND_OBJ) &&
(63 lines hidden)	for (const llvm::PointerUnion<const CXXRecordDecl *, const FieldDecl *>
getPlainLayout(Base, Layout, /*AsBase=*/true);		getPlainLayout(Base, Layout, /*AsBase=*/true);
else		else
Layout.push_back(Data.get<const FieldDecl *>());		Layout.push_back(Data.get<const FieldDecl *>());
}		}
}		}

public:		public:
MappableExprsHandler(const OMPExecutableDirective &Dir, CodeGenFunction &CGF)		MappableExprsHandler(const OMPExecutableDirective &Dir, CodeGenFunction &CGF)
: CurDir(Dir), CGF(CGF) {		: CurDir(&Dir), CGF(CGF) {
// Extract firstprivate clause information.		// Extract firstprivate clause information.
for (const auto *C : Dir.getClausesOfKind<OMPFirstprivateClause>())		for (const auto *C : Dir.getClausesOfKind<OMPFirstprivateClause>())
for (const auto *D : C->varlists())		for (const auto *D : C->varlists())
FirstPrivateDecls.try_emplace(		FirstPrivateDecls.try_emplace(
cast<VarDecl>(cast<DeclRefExpr>(D)->getDecl()), C->isImplicit());		cast<VarDecl>(cast<DeclRefExpr>(D)->getDecl()), C->isImplicit());
// Extract device pointer clause information.		// Extract device pointer clause information.
for (const auto *C : Dir.getClausesOfKind<OMPIsDevicePtrClause>())		for (const auto *C : Dir.getClausesOfKind<OMPIsDevicePtrClause>())
for (auto L : C->component_lists())		for (auto L : C->component_lists())
DevPointersMap[L.first].push_back(L.second);		DevPointersMap[L.first].push_back(L.second);
}		}

		/// Constructor for the declare mapper directive.
		MappableExprsHandler(const OMPDeclareMapperDecl &Dir, CodeGenFunction &CGF)
		: CurDir(&Dir), CGF(CGF) {}

/// Generate code for the combined entry if we have a partially mapped struct		/// Generate code for the combined entry if we have a partially mapped struct
/// and take care of the mapping flags of the arguments corresponding to		/// and take care of the mapping flags of the arguments corresponding to
/// individual struct members.		/// individual struct members.
void emitCombinedEntry(MapBaseValuesArrayTy &BasePointers,		void emitCombinedEntry(MapBaseValuesArrayTy &BasePointers,
MapValuesArrayTy &Pointers, MapValuesArrayTy &Sizes,		MapValuesArrayTy &Pointers, MapValuesArrayTy &Sizes,
MapFlagsArrayTy &Types, MapFlagsArrayTy &CurTypes,		MapFlagsArrayTy &Types, MapFlagsArrayTy &CurTypes,
const StructRangeInfoTy &PartialStruct) const {		const StructRangeInfoTy &PartialStruct) const {
// Base is the base of the struct		// Base is the base of the struct
(45 lines hidden)	auto &&InfoGen = [&Info](
ArrayRef<OpenMPMapModifierKind> MapModifiers,		ArrayRef<OpenMPMapModifierKind> MapModifiers,
bool ReturnDevicePointer, bool IsImplicit) {		bool ReturnDevicePointer, bool IsImplicit) {
const ValueDecl *VD =		const ValueDecl *VD =
D ? cast<ValueDecl>(D->getCanonicalDecl()) : nullptr;		D ? cast<ValueDecl>(D->getCanonicalDecl()) : nullptr;
Info[VD].emplace_back(L, MapType, MapModifiers, ReturnDevicePointer,		Info[VD].emplace_back(L, MapType, MapModifiers, ReturnDevicePointer,
IsImplicit);		IsImplicit);
};		};

// FIXME: MSVC 2013 seems to require this-> to find member CurDir.		assert(CurDir.is<const OMPExecutableDirective *>() &&
for (const auto *C : this->CurDir.getClausesOfKind<OMPMapClause>())		"Expect a executable directive");
		const auto *CurExecDir = CurDir.get<const OMPExecutableDirective *>();
		for (const auto *C : CurExecDir->getClausesOfKind<OMPMapClause>())
for (const auto &L : C->component_lists()) {		for (const auto &L : C->component_lists()) {
InfoGen(L.first, L.second, C->getMapType(), C->getMapTypeModifiers(),		InfoGen(L.first, L.second, C->getMapType(), C->getMapTypeModifiers(),
/*ReturnDevicePointer=*/false, C->isImplicit());		/*ReturnDevicePointer=*/false, C->isImplicit());
}		}
for (const auto *C : this->CurDir.getClausesOfKind<OMPToClause>())		for (const auto *C : CurExecDir->getClausesOfKind<OMPToClause>())
for (const auto &L : C->component_lists()) {		for (const auto &L : C->component_lists()) {
InfoGen(L.first, L.second, OMPC_MAP_to, llvm::None,		InfoGen(L.first, L.second, OMPC_MAP_to, llvm::None,
/*ReturnDevicePointer=*/false, C->isImplicit());		/*ReturnDevicePointer=*/false, C->isImplicit());
}		}
for (const auto *C : this->CurDir.getClausesOfKind<OMPFromClause>())		for (const auto *C : CurExecDir->getClausesOfKind<OMPFromClause>())
for (const auto &L : C->component_lists()) {		for (const auto &L : C->component_lists()) {
InfoGen(L.first, L.second, OMPC_MAP_from, llvm::None,		InfoGen(L.first, L.second, OMPC_MAP_from, llvm::None,
/*ReturnDevicePointer=*/false, C->isImplicit());		/*ReturnDevicePointer=*/false, C->isImplicit());
}		}

// Look at the use_device_ptr clause information and mark the existing map		// Look at the use_device_ptr clause information and mark the existing map
// entries as such. If there is no map information for an entry in the		// entries as such. If there is no map information for an entry in the
// use_device_ptr list, we create one with map type 'alloc' and zero size		// use_device_ptr list, we create one with map type 'alloc' and zero size
// section. It is the user fault if that was not mapped before. If there is		// section. It is the user fault if that was not mapped before. If there is
// no map information and the pointer is a struct member, then we defer the		// no map information and the pointer is a struct member, then we defer the
// emission of that entry until the whole struct has been processed.		// emission of that entry until the whole struct has been processed.
llvm::MapVector<const ValueDecl *, SmallVector<DeferredDevicePtrEntryTy, 4>>		llvm::MapVector<const ValueDecl *, SmallVector<DeferredDevicePtrEntryTy, 4>>
DeferredInfo;		DeferredInfo;

// FIXME: MSVC 2013 seems to require this-> to find member CurDir.
for (const auto *C :		for (const auto *C :
this->CurDir.getClausesOfKind<OMPUseDevicePtrClause>()) {		CurExecDir->getClausesOfKind<OMPUseDevicePtrClause>()) {
for (const auto &L : C->component_lists()) {		for (const auto &L : C->component_lists()) {
assert(!L.second.empty() && "Not expecting empty list of components!");		assert(!L.second.empty() && "Not expecting empty list of components!");
const ValueDecl *VD = L.second.back().getAssociatedDeclaration();		const ValueDecl *VD = L.second.back().getAssociatedDeclaration();
VD = cast<ValueDecl>(VD->getCanonicalDecl());		VD = cast<ValueDecl>(VD->getCanonicalDecl());
const Expr *IE = L.second.back().getAssociatedExpression();		const Expr *IE = L.second.back().getAssociatedExpression();
// If the first component is a member expression, we have to look into		// If the first component is a member expression, we have to look into
// 'this', which maps to null in the map of map information. Otherwise		// 'this', which maps to null in the map of map information. Otherwise
// look directly for the information.		// look directly for the information.
(12 lines hidden)	for (const auto *C :
CI->ReturnDevicePointer = true;		CI->ReturnDevicePointer = true;
continue;		continue;
}		}
}		}

// We didn't find any match in our map information - generate a zero		// We didn't find any match in our map information - generate a zero
// size array section - if the pointer is a struct member we defer this		// size array section - if the pointer is a struct member we defer this
// action until the whole struct has been processed.		// action until the whole struct has been processed.
// FIXME: MSVC 2013 seems to require this-> to find member CGF.
if (isa<MemberExpr>(IE)) {		if (isa<MemberExpr>(IE)) {
// Insert the pointer into Info to be processed by		// Insert the pointer into Info to be processed by
// generateInfoForComponentList. Because it is a member pointer		// generateInfoForComponentList. Because it is a member pointer
// without a pointee, no entry will be generated for it, therefore		// without a pointee, no entry will be generated for it, therefore
// we need to generate one after the whole struct has been processed.		// we need to generate one after the whole struct has been processed.
// Nonetheless, generateInfoForComponentList must be called to take		// Nonetheless, generateInfoForComponentList must be called to take
// the pointer into account for the calculation of the range of the		// the pointer into account for the calculation of the range of the
// partial struct.		// partial struct.
InfoGen(nullptr, L.second, OMPC_MAP_unknown, llvm::None,		InfoGen(nullptr, L.second, OMPC_MAP_unknown, llvm::None,
/*ReturnDevicePointer=*/false, C->isImplicit());		/*ReturnDevicePointer=*/false, C->isImplicit());
DeferredInfo[nullptr].emplace_back(IE, VD);		DeferredInfo[nullptr].emplace_back(IE, VD);
} else {		} else {
llvm::Value *Ptr = this->CGF.EmitLoadOfScalar(		llvm::Value *Ptr =
this->CGF.EmitLValue(IE), IE->getExprLoc());		CGF.EmitLoadOfScalar(CGF.EmitLValue(IE), IE->getExprLoc());
BasePointers.emplace_back(Ptr, VD);		BasePointers.emplace_back(Ptr, VD);
Pointers.push_back(Ptr);		Pointers.push_back(Ptr);
Sizes.push_back(llvm::Constant::getNullValue(this->CGF.Int64Ty));		Sizes.push_back(llvm::Constant::getNullValue(CGF.Int64Ty));
Types.push_back(OMP_MAP_RETURN_PARAM | OMP_MAP_TARGET_PARAM);		Types.push_back(OMP_MAP_RETURN_PARAM | OMP_MAP_TARGET_PARAM);
}		}
}		}
}		}

for (const auto &M : Info) {		for (const auto &M : Info) {
// We need to know when we generate information for the first component		// We need to know when we generate information for the first component
// associated with a capture, because the mapping flags depend on it.		// associated with a capture, because the mapping flags depend on it.
bool IsFirstComponentList = true;		bool IsFirstComponentList = true;

// Temporary versions of arrays		// Temporary versions of arrays
MapBaseValuesArrayTy CurBasePointers;		MapBaseValuesArrayTy CurBasePointers;
MapValuesArrayTy CurPointers;		MapValuesArrayTy CurPointers;
MapValuesArrayTy CurSizes;		MapValuesArrayTy CurSizes;
MapFlagsArrayTy CurTypes;		MapFlagsArrayTy CurTypes;
StructRangeInfoTy PartialStruct;		StructRangeInfoTy PartialStruct;

for (const MapInfo &L : M.second) {		for (const MapInfo &L : M.second) {
assert(!L.Components.empty() &&		assert(!L.Components.empty() &&
"Not expecting declaration with no component lists.");		"Not expecting declaration with no component lists.");

// Remember the current base pointer index.		// Remember the current base pointer index.
unsigned CurrentBasePointersIdx = CurBasePointers.size();		unsigned CurrentBasePointersIdx = CurBasePointers.size();
// FIXME: MSVC 2013 seems to require this-> to find the member method.		generateInfoForComponentList(L.MapType, L.MapModifiers, L.Components,
this->generateInfoForComponentList(		CurBasePointers, CurPointers, CurSizes,
L.MapType, L.MapModifiers, L.Components, CurBasePointers,		CurTypes, PartialStruct,
CurPointers, CurSizes, CurTypes, PartialStruct,
IsFirstComponentList, L.IsImplicit);		IsFirstComponentList, L.IsImplicit);

// If this entry relates with a device pointer, set the relevant		// If this entry relates with a device pointer, set the relevant
// declaration and add the 'return pointer' flag.		// declaration and add the 'return pointer' flag.
if (L.ReturnDevicePointer) {		if (L.ReturnDevicePointer) {
assert(CurBasePointers.size() > CurrentBasePointersIdx &&		assert(CurBasePointers.size() > CurrentBasePointersIdx &&
"Unexpected number of mapped base pointers.");		"Unexpected number of mapped base pointers.");

const ValueDecl *RelevantVD =		const ValueDecl *RelevantVD =
(35 lines hidden)	for (const auto &M : Info) {
// We need to append the results of this capture to what we already have.		// We need to append the results of this capture to what we already have.
BasePointers.append(CurBasePointers.begin(), CurBasePointers.end());		BasePointers.append(CurBasePointers.begin(), CurBasePointers.end());
Pointers.append(CurPointers.begin(), CurPointers.end());		Pointers.append(CurPointers.begin(), CurPointers.end());
Sizes.append(CurSizes.begin(), CurSizes.end());		Sizes.append(CurSizes.begin(), CurSizes.end());
Types.append(CurTypes.begin(), CurTypes.end());		Types.append(CurTypes.begin(), CurTypes.end());
}		}
}		}

		/// Generate all the base pointers, section pointers, sizes and map types for
		/// the extracted map clauses of user-defined mapper.
		void generateAllInfoForMapper(MapBaseValuesArrayTy &BasePointers,
		ABataev: This code has too many common parts with the existing one. Is it possible to merge it somehow or outline it into a function?
		lildmh (author): I tried to merge it with `generateAllInfo`. The problem is that `generateAllInfo` also generates information for other clauses, including `to`, `from`, `is_device_ptr`, and `use_device_ptr`, which don't exist for `declare mapper`. There is no clear way to extract them separately. For example, every 4 or 5 lines, the code addresses a different clause type. In the end, I think the clearest way is to extract all code related to map clauses into this function, `generateAllInfoForMapper`. It's ~70 lines of code, so not too much.
		ABataev: If those clauses do not exist for the declare mapper, that is fine, there are no problems with them. If they don't exist, we can't generate anything for them, no? But if you think that it would be better to extract some common parts into a separate function, this also works for me.
		MapValuesArrayTy &Pointers,
		MapValuesArrayTy &Sizes,
		MapFlagsArrayTy &Types) const {
		assert(CurDir.is<const OMPDeclareMapperDecl *>() &&
		ABataev: AFAIK, LLVM has dropped support for MSVC 2013, do we still need this?
		ABataev: Ping.
		lildmh (author): I'm okay either way. I guess it doesn't hurt to keep it?
		"Expect a declare mapper directive");
		const auto *CurMapperDir = CurDir.get<const OMPDeclareMapperDecl *>();
		// We have to process the component lists that relate with the same
		// declaration in a single chunk so that we can generate the map flags
		// correctly. Therefore, we organize all lists in a map.
		llvm::MapVector<const ValueDecl *, SmallVector<MapInfo, 8>> Info;

		// Helper function to fill the information map for the different supported
		// clauses.
		auto &&InfoGen = [&Info](
		const ValueDecl *D,
		OMPClauseMappableExprCommon::MappableExprComponentListRef L,
		OpenMPMapClauseKind MapType,
		ArrayRef<OpenMPMapModifierKind> MapModifiers,
		bool ReturnDevicePointer, bool IsImplicit) {
		const ValueDecl *VD =
		D ? cast<ValueDecl>(D->getCanonicalDecl()) : nullptr;
		Info[VD].emplace_back(L, MapType, MapModifiers, ReturnDevicePointer,
		IsImplicit);
		};

		for (const auto *C : CurMapperDir->clauselists()) {
		const auto *MC = cast<OMPMapClause>(C);
		for (const auto &L : MC->component_lists()) {
		InfoGen(L.first, L.second, MC->getMapType(), MC->getMapTypeModifiers(),
		/*ReturnDevicePointer=*/false, MC->isImplicit());
		}
		}

		for (const auto &M : Info) {
		// We need to know when we generate information for the first component
		// associated with a capture, because the mapping flags depend on it.
		bool IsFirstComponentList = true;

		// Temporary versions of arrays
		MapBaseValuesArrayTy CurBasePointers;
		MapValuesArrayTy CurPointers;
		MapValuesArrayTy CurSizes;
		MapFlagsArrayTy CurTypes;
		StructRangeInfoTy PartialStruct;

		for (const MapInfo &L : M.second) {
		assert(!L.Components.empty() &&
		"Not expecting declaration with no component lists.");
		generateInfoForComponentList(L.MapType, L.MapModifiers, L.Components,
		CurBasePointers, CurPointers, CurSizes,
		CurTypes, PartialStruct,
		IsFirstComponentList, L.IsImplicit);
		IsFirstComponentList = false;
		}

		// If there is an entry in PartialStruct it means we have a struct with
		// individual members mapped. Emit an extra combined entry.
		if (PartialStruct.Base.isValid())
		emitCombinedEntry(BasePointers, Pointers, Sizes, Types, CurTypes,
		PartialStruct);

		// We need to append the results of this capture to what we already have.
		BasePointers.append(CurBasePointers.begin(), CurBasePointers.end());
		Pointers.append(CurPointers.begin(), CurPointers.end());
		Sizes.append(CurSizes.begin(), CurSizes.end());
		Types.append(CurTypes.begin(), CurTypes.end());
		}
		}

/// Emit capture info for lambdas for variables captured by reference.		/// Emit capture info for lambdas for variables captured by reference.
void generateInfoForLambdaCaptures(		void generateInfoForLambdaCaptures(
const ValueDecl *VD, llvm::Value *Arg, MapBaseValuesArrayTy &BasePointers,		const ValueDecl *VD, llvm::Value *Arg, MapBaseValuesArrayTy &BasePointers,
MapValuesArrayTy &Pointers, MapValuesArrayTy &Sizes,		MapValuesArrayTy &Pointers, MapValuesArrayTy &Sizes,
MapFlagsArrayTy &Types,		MapFlagsArrayTy &Types,
llvm::DenseMap<llvm::Value *, llvm::Value *> &LambdaPointers) const {		llvm::DenseMap<llvm::Value *, llvm::Value *> &LambdaPointers) const {
const auto *RD = VD->getType()		const auto *RD = VD->getType()
.getCanonicalType()		.getCanonicalType()
(107 lines hidden)	if (DevPointersMap.count(VD)) {
Types.push_back(OMP_MAP_LITERAL | OMP_MAP_TARGET_PARAM);		Types.push_back(OMP_MAP_LITERAL | OMP_MAP_TARGET_PARAM);
return;		return;
}		}

using MapData =		using MapData =
std::tuple<OMPClauseMappableExprCommon::MappableExprComponentListRef,		std::tuple<OMPClauseMappableExprCommon::MappableExprComponentListRef,
OpenMPMapClauseKind, ArrayRef<OpenMPMapModifierKind>, bool>;		OpenMPMapClauseKind, ArrayRef<OpenMPMapModifierKind>, bool>;
SmallVector<MapData, 4> DeclComponentLists;		SmallVector<MapData, 4> DeclComponentLists;
// FIXME: MSVC 2013 seems to require this-> to find member CurDir.		assert(CurDir.is<const OMPExecutableDirective *>() &&
for (const auto *C : this->CurDir.getClausesOfKind<OMPMapClause>()) {		"Expect a executable directive");
		const auto *CurExecDir = CurDir.get<const OMPExecutableDirective *>();
		for (const auto *C : CurExecDir->getClausesOfKind<OMPMapClause>()) {
for (const auto &L : C->decl_component_lists(VD)) {		for (const auto &L : C->decl_component_lists(VD)) {
assert(L.first == VD &&		assert(L.first == VD &&
"We got information for the wrong declaration??");		"We got information for the wrong declaration??");
assert(!L.second.empty() &&		assert(!L.second.empty() &&
"Not expecting declaration with no component lists.");		"Not expecting declaration with no component lists.");
DeclComponentLists.emplace_back(L.second, C->getMapType(),		DeclComponentLists.emplace_back(L.second, C->getMapType(),
C->getMapTypeModifiers(),		C->getMapTypeModifiers(),
C->isImplicit());		C->isImplicit());
(131 lines hidden)	public:
}		}

/// Generate the base pointers, section pointers, sizes and map types		/// Generate the base pointers, section pointers, sizes and map types
/// associated with the declare target link variables.		/// associated with the declare target link variables.
void generateInfoForDeclareTargetLink(MapBaseValuesArrayTy &BasePointers,		void generateInfoForDeclareTargetLink(MapBaseValuesArrayTy &BasePointers,
MapValuesArrayTy &Pointers,		MapValuesArrayTy &Pointers,
MapValuesArrayTy &Sizes,		MapValuesArrayTy &Sizes,
MapFlagsArrayTy &Types) const {		MapFlagsArrayTy &Types) const {
		assert(CurDir.is<const OMPExecutableDirective *>() &&
		"Expect a executable directive");
		const auto *CurExecDir = CurDir.get<const OMPExecutableDirective *>();
// Map other list items in the map clause which are not captured variables		// Map other list items in the map clause which are not captured variables
// but "declare target link" global variables.		// but "declare target link" global variables.
for (const auto *C : this->CurDir.getClausesOfKind<OMPMapClause>()) {		for (const auto *C : CurExecDir->getClausesOfKind<OMPMapClause>()) {
for (const auto &L : C->component_lists()) {		for (const auto &L : C->component_lists()) {
if (!L.first)		if (!L.first)
continue;		continue;
const auto *VD = dyn_cast<VarDecl>(L.first);		const auto *VD = dyn_cast<VarDecl>(L.first);
if (!VD)		if (!VD)
continue;		continue;
llvm::Optional<OMPDeclareTargetDeclAttr::MapTypeTy> Res =		llvm::Optional<OMPDeclareTargetDeclAttr::MapTypeTy> Res =
OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD);		OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD);
(174 lines hidden)	if (Info.NumberOfPtrs) {
auto *MapTypesArrayGbl = new llvm::GlobalVariable(		auto *MapTypesArrayGbl = new llvm::GlobalVariable(
CGM.getModule(), MapTypesArrayInit->getType(),		CGM.getModule(), MapTypesArrayInit->getType(),
/*isConstant=*/true, llvm::GlobalValue::PrivateLinkage,		/*isConstant=*/true, llvm::GlobalValue::PrivateLinkage,
MapTypesArrayInit, MaptypesName);		MapTypesArrayInit, MaptypesName);
MapTypesArrayGbl->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Global);		MapTypesArrayGbl->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Global);
Info.MapTypesArray = MapTypesArrayGbl;		Info.MapTypesArray = MapTypesArrayGbl;

for (unsigned I = 0; I < Info.NumberOfPtrs; ++I) {		for (unsigned I = 0; I < Info.NumberOfPtrs; ++I) {
llvm::Value *BPVal = *BasePointers[I];		llvm::Value *BPVal = *BasePointers[I];
		ABataev: Hmm, how could you calculate the required size of the array for the mapper? Or is this constant?
		lildmh (author): I'm not sure I understand your question here. Do you mean the size when an OpenMP array section is mapped? If that's the case, it is not constant. Existing code can already handle it. Or do you mean the size of the mapper array (i.e., `MapArrayType`)? This is constant and depends on how many map clauses exist in the declare mapper directive.
		ABataev: Yes, it is the question about the size of the mapper array. It is part of the discussion about mapper generation we had before. You said that it is hard to generate the sizes of the arrays since we don't know the number of map clauses at the codegen phase. But here we have this number.
		lildmh (author): Sorry, there was probably some miscommunication. What I meant is that after full expansion, for example from `map(mapper(id):a[0:n])` eventually to `map(a.b.c[0:e]) map(a.k) ...`, the number of items in the result is unknown at compile time. Here, we only do one level of expansion of one instance based on the `declare mapper` directive. For example, if the mapper is `declare mapper(class A a) map(a.b[0:a.n]) map(a.c)`, the size of the mapper array is 2, because there are 2 map clauses (actually it's more than 2 because the first map clause maps an array). This number can be decided at compile time easily.
		ABataev: Let's discuss the runtime at first, later we can return to this.
llvm::Value *BP = CGF.Builder.CreateConstInBoundsGEP2_32(		llvm::Value *BP = CGF.Builder.CreateConstInBoundsGEP2_32(
llvm::ArrayType::get(CGM.VoidPtrTy, Info.NumberOfPtrs),		llvm::ArrayType::get(CGM.VoidPtrTy, Info.NumberOfPtrs),
Info.BasePointersArray, 0, I);		Info.BasePointersArray, 0, I);
BP = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(		BP = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
BP, BPVal->getType()->getPointerTo(/*AddrSpace=*/0));		BP, BPVal->getType()->getPointerTo(/*AddrSpace=*/0));
Address BPAddr(BP, Ctx.getTypeAlignInChars(Ctx.VoidPtrTy));		Address BPAddr(BP, Ctx.getTypeAlignInChars(Ctx.VoidPtrTy));
CGF.Builder.CreateStore(BPVal, BPAddr);		CGF.Builder.CreateStore(BPVal, BPAddr);

Show All 17 Lines	for (unsigned I = 0; I < Info.NumberOfPtrs; ++I) {
/*Idx0=*/0,		/*Idx0=*/0,
/*Idx1=*/I);		/*Idx1=*/I);
Address SAddr(S, Ctx.getTypeAlignInChars(Int64Ty));		Address SAddr(S, Ctx.getTypeAlignInChars(Int64Ty));
CGF.Builder.CreateStore(		CGF.Builder.CreateStore(
CGF.Builder.CreateIntCast(Sizes[I], CGM.Int64Ty, /*isSigned=*/true),		CGF.Builder.CreateIntCast(Sizes[I], CGM.Int64Ty, /*isSigned=*/true),
SAddr);		SAddr);
}		}
}		}
}		}
		ABataev: I think it would be good to move this part to the runtime. We should just pass the mapper function to the runtime functions, and the runtime should call those mapper functions to get the base pointers, pointers, sizes and map types.
		lildmh: This part cannot be moved into the runtime, because the runtime does not know the map type associated with the mapper. Another argument could potentially be added to the runtime API, but that would be more work and I don't think it's necessary.
		ABataev: I think it is better to discuss the runtime part of the patch again, because everything else depends on the runtime. I would suggest trying to implement the solution we discussed before, where the required data is stored in the runtime's dynamic arrays and only after that is used to transfer the data.
}		}

/// Emit the arguments to be passed to the runtime library based on the		/// Emit the arguments to be passed to the runtime library based on the
/// arrays of pointers, sizes and map types.		/// arrays of pointers, sizes and map types.
static void emitOffloadingArraysArgument(		static void emitOffloadingArraysArgument(
CodeGenFunction &CGF, llvm::Value *&BasePointersArrayArg,		CodeGenFunction &CGF, llvm::Value *&BasePointersArrayArg,
llvm::Value *&PointersArrayArg, llvm::Value *&SizesArrayArg,		llvm::Value *&PointersArrayArg, llvm::Value *&SizesArrayArg,
llvm::Value *&MapTypesArrayArg, CGOpenMPRuntime::TargetDataInfo &Info) {		llvm::Value *&MapTypesArrayArg, CGOpenMPRuntime::TargetDataInfo &Info) {
CodeGenModule &CGM = CGF.CGM;		CodeGenModule &CGM = CGF.CGM;
if (Info.NumberOfPtrs) {		if (Info.NumberOfPtrs) {
▲ Show 20 Lines • Show All 114 Lines • ▼ Show 20 Lines	if (const auto *NestedDir =
case OMPD_unknown:		case OMPD_unknown:
llvm_unreachable("Unexpected directive.");		llvm_unreachable("Unexpected directive.");
}		}
}		}

return nullptr;		return nullptr;
}		}

		/// Emit the user-defined mapper function. The code generation follows the
		/// pattern in the example below.
		/// \code
		/// void .omp_mapper.<type_name>.<mapper_id>.(void *rt_mapper_handle,
		ABataev: This function looks like a universal one, regardless of the specifics of the type `<type_name>`. Do we really need to generate it for each particular type and mapper? Or could we use the same function for all types/mappers?
		lildmh: I think we need a particular mapper function for each type and mapper, because the code generated within the mapper function depends on what type and what mapper it is.
		ABataev: Hmm, maybe I'm wrong, but I don't see significant mapper- or type-specific dependencies in this mapper function. It uses the pointer to the type and the size of the type, but this information can be generalized, I think. Could you point to the lines of code that are type and mapper specific?
		lildmh: The code between lines 8857-8965 is type and mapper specific. For instance, `generateAllInforForMapper` depends on the map clauses associated with the mapper and the internal structure of the struct/class type, and generates different code as a result. `BasePointers.size()` also depends on the above.
		ABataev: Most of this data can be passed as parameters to the function. It would be good if we could move this function to the libomptarget library and reduce the number of changes (and, thus, the complexity) of the compiler itself. It is always easier to review and maintain source code written in C/C++ than changes in the compiler codegen. Plus, it may reduce the size of the final code significantly, I assume. I would appreciate it if you would try to move this function to libomptarget.
		lildmh: Hi Alexey, I don't think libomptarget can do this efficiently. For example:
		```
		class C { int a; double *b; };
		#pragma omp declare mapper(C c) map(c.a, c.b[0:1])
		```
		The codegen directly knows there are 2 components (c.a, c.b[0]) in this mapper function (3 actually, when we count the pointer), and it also knows the size, starting address, map type, etc. of these components. Passing all this information to libomptarget seems to be a bad idea. Or did I get your idea wrong?
		ABataev: Yes, I understand this. But can we pass some additional parameters to this function so we don't need to generate a unique copy of almost the same function for all types/mappers?
		lildmh: For different types/mappers, the skeletons of the mapper functions are similar (i.e., the things outlined in the comment here). I would say most of the other code is unique, for instance the code that prepares the parameters of the call to `__tgt_push_mapper_component`. That code should be much larger than the skeleton shown here. I cannot think of a way to reduce the code by passing more parameters to this function. Please let me know if you have some suggestions.
		/// void *base, void *begin,
		/// int64_t size, int64_t type) {
		/// // Allocate space for an array section first.
		/// if (size > 1 && !maptype.IsDelete)
		ABataev: With the buffering-based implementation we need only function.
		lildmh: Yes, in either case, we only generate functions here. Is there a problem?
		ABataev: Sorry, I meant we'll need only one function.
		lildmh: Yes, in that case, only one is needed.
		/// __tgt_push_mapper_component(rt_mapper_handle, base, begin,
		/// size*sizeof(Ty), clearToFrom(type));
		/// // Map members.
		/// for (unsigned i = 0; i < size; i++) {
		/// // For each component specified by this mapper:
		/// for (auto c : all_components) {
		ABataev: Currently `currentComponent` is generated by the compiler. But can we instead pass this data as an extra parameter to this `omp_mapper` function?
		lildmh: Emm, I think this scheme will be very difficult and inefficient. If we pass components as an argument of the `omp_mapper` function, it means that the runtime needs to generate all components related to a map clause. I don't think the runtime is able to do that efficiently. On the other hand, in the current scheme, these components are naturally generated by the compiler, and the runtime only needs to know the base pointer, pointer, type, size, etc.
		ABataev: With the current scheme, we may end up with code blowup. We need to generate very similar code for different types and variables. The worst thing here is that we will be unable to optimize this huge amount of code, because the codegen relies on the runtime functions and the code cannot be inlined. That's why I would like to move as much code as possible to the runtime rather than emit it in the compiler.
		lildmh: I understand your concerns. I think this is the best we can do right now. The most worrisome case will be when we have mappers nested within each other. In this case, a mapper function will call another mapper function. We can inline the inner mapper functions in this scenario, so that these mapper functions can be properly optimized. As a result, I think the performance should be fine.
		ABataev: Instead, we can use indirect function calls passed in the array to the runtime. Do you think it is going to be slower? In your current scheme, we generate many runtime calls instead. Could you try to estimate the number of calls if we call the mappers through indirect function calls, and in your current scheme, where we need to call the runtime functions many times in each particular mapper?
		lildmh: Hi Alexey, sorry, I don't understand your idea. What indirect function calls do you propose to pass to the runtime? What are these functions supposed to do? The number of function calls will be exactly equal to the number of components mapped, no matter whether there are nested mappers or not. The number of components depends on the program. E.g., if we map a large array section, then there will be many more function calls.
		ABataev: I mean the pointers to the mapper functions generated by the compiler. In your comment, it is `c.Mapper()`.
		lildmh: If we pass nested mapper functions to the runtime, I think it will slow down execution because of the extra level of indirect function calls. E.g., the runtime will call `omp_mapper1`, which calls the runtime back, which calls `omp_mapper2`, and so on. This can result in a deep call stack. I think the current implementation, which doesn't pass nested mappers to the runtime, will be more efficient. One call to the outermost mapper function will have all data mapping done. The call stack seen from the runtime will be 2 levels deep in this case (the first level is the mapper function, and the second level is `__tgt_push_mapper_component`). There is also more room for compiler optimization when we inline all nested mapper functions.
		ABataev: Yes, if we leave it as is. But if instead of the bunch of unique functions we have a common one that additionally accepts a list of indirect pointers to functions, and move it to the runtime library, we won't need the 2 functions we have currently. We'll have full access to the mapping data vector in the runtime library and won't need the 2 accessors we have currently. Instead, we'll need just one runtime function, which implements the whole mapping logic. We still need to call it recursively, but I assume the number of calls will remain the same as in the current scheme. Did you understand the idea? If yes, it would be good if you could try to estimate the number of function calls in the current scheme and in this new scheme, to weigh the possible pros and cons.
		lildmh: Hi Alexey, could you give an example for this scheme? 1) I don't understand how the mapper function can have full access to the mapping data vector without providing these 2 accessors. 2) I don't think it is possible to have a common function instead of a bunch of unique functions for each mapper declared.
		ABataev: Hi Lingda, something like this.
		```
		void __tgt_mapper(void *base, void *begin, size_t size, int64_t type, auto components[]) {
		  // Allocate space for an array section first.
		  if (size > 1 && !maptype.IsDelete)
		    <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
		  // Map members.
		  for (unsigned i = 0; i < size; i++) {
		    // For each component specified by this mapper:
		    for (auto c : components) {
		      if (c.hasMapper())
		        (*c.Mapper())(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
		      else
		        <push>(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
		    }
		  }
		  // Delete the array section.
		  if (size > 1 && maptype.IsDelete)
		    <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
		}

		void <type>.mapper(void *base, void *begin, size_t size, int64_t type) {
		  auto sub_components[] = {...};
		  __tgt_mapper(base, begin, size, type, sub_components);
		}
		```
		lildmh: Hi Alexey, I don't think this scheme is more efficient than the current scheme. My reasons are:
		1) Most code here essentially generates `components`, i.e., we need to generate `c.arg_base, c.arg_begin, c.arg_size, c.arg_type` for each `c` in `components`, so there will still be a lot of code in `<type>.mapper`. It will not reduce the mapper function code, i.e., we will still have a bunch of unique mapper functions.
		2) This scheme will prevent a lot of compiler optimization from happening. In reality, a lot of the computation should be redundant. E.g., for two components `c1` and `c2`, `c1`'s base may be the same as `c2`'s begin, so the compiler will be able to eliminate these redundant computations, especially when we inline all nested mapper functions together. If we move these computations into the runtime, the compiler will not be able to do such optimization.
		3) In terms of the number of `push` function calls, this scheme has exactly the same number of calls as the current scheme, so I don't think this scheme can bring performance benefits.
		The scheme should perform worse than the current scheme, because it reduces the opportunities for compiler optimization as mentioned above.
		ABataev: Hi Lingda, I'm trying to simplify the code generated by clang and avoid some unnecessary code duplication. If the complexity of this scheme is the same as the one you proposed, I would prefer to use this scheme unless there are some other opinions.
		1. It is not a problem. This code is unique and is not duplicated in the different mappers.
		2. Inlining is no solution here. We still generate too much code, which is almost the same in many cases, and it will lead to very ineffective codegen because we still end up with a lot of almost identical code. This also might lead to poor performance.
		3. Yes, the number of pushes is always the same, in all possible schemes. It would be good to compare the performance of both schemes somehow, at least preliminarily. Also, this solution reduces the number of required runtime functions: instead of 2 we need just 1 and, thus, we need to make fewer runtime function calls.
		I think it would be better to propose this scheme as an alternate design and discuss it in the OpenMP telecon. What do you think? Or we can try to discuss it offline via e-mail with other members. I'm not trying to convince you to implement this scheme right now, but it would be good to discuss it. Maybe it will lead to some better ideas from others?
		lildmh: Hi Alexey, I still prefer the current scheme, because:
		1) I don't like recursive mapper calls, which goes back to my original scheme a little bit.
		2) I really think inlining can make a big difference when we have nested mappers. These compiler optimizations are the key to better performance for mappers. I don't think the codegen here is inefficient. Yes, there is duplicated code across different mapper functions, but why would that lead to poor performance?
		3) Although we have 2 runtime functions now, `__tgt_mapper_num_components` is called only once per mapper. It should have a negligible performance impact.
		But if you have a different opinion, we can discuss it next time in the meeting. I do have a time constraint for working on the mapper implementation. I'll no longer work on this project starting this September, and I have about 30% of my time to work on it until then.
		ABataev: Lingda,
		1. We have recursive (actually, not recursive, because you cannot use types recursively) mapper calls anyway; it is the nature of structures/classes.
		2. We have a lot of similar code. And I'm not sure that it can be optimized out.
		3. Yes, but it means that we have n extra runtime calls, where n is the number of branches in the structure/class tree.
		I see :(. I understand your concern. In this case, we could try to discuss it offline, on the mailing list, to make it a little bit faster. We just need to hear other opinions on this matter; maybe there are some other pros and cons for these schemes.
		lildmh: Hi Alexey, sure, let's discuss this in the mailing list. I'll summarize it and send it to the mailing list later.
		> We have recursive (actually, not recursive, because you cannot use types recursively) mappers calls anyway, it is nature of structures/classes.
		We won't have recursive calls with inlining.
		> We have a lot of similar code. And I'm not sure that it can be optimized out.
		I think it's even harder to optimize this code out when we move it into the runtime.
		> Yes, but it means that we have n extra runtime calls, where n is the number of branches in the structure/class tree.
		I don't quite understand. It's still equal to the number of mappers in any case.
		ABataev:
		> Sure, let's discuss this in the mailing list. I'll summarize it and send it to the mailing list later.
		Good, thanks!
		> We won't have recursive calls with inlining.
		We won't have recursive calls anyway (recursive types are not allowed). Plus, I'm not sure that inlining is the best option here. We have a lot of code for each mapper and I'm not sure that the optimizer will be able to squash it effectively.
		> I think it's even harder to optimized these code out when we move them into the runtime.
		Definitely not, unless we use LTO or inlined runtime.
		lildmh:
		> We won't have recursive calls anyway (recursive types are not allowed). Plus, I'm not sure that inlining is the best option here. We have a lot of code for each mapper and I'm not sure that the optimizer will be able to squash it effectively.
		Sorry, I should not have said recursive calls. Here it needs to "recursively" call other mapper functions in case of nested mappers, but we don't need that with inlining.
		> Definitely not, unless we use LTO or inlined runtime.
		But you are proposing to move a lot of code to the runtime here, right? That doesn't make sense to me.
		ABataev:
		> But you are proposing to move much code to the runtime here, right? That doesn't make sense to me.
		I'm just not sure that there are going to be significant problems with the performance because of that. And it significantly simplifies codegen in the compiler and moves the common part into a single function. Plus, if in the future we need to modify this functionality for some reason, 2 different versions of the compiler will produce incompatible code. With my scheme, you can still use the old runtime and have the same functionality with the old compiler and the new one.
		lildmh: Hi Alexey, I thought more carefully about your scheme, and I don't think we can solve the 2 problems below with it:
		1. In the example you gave before, the compiler needs to generate all map types and pass them to `__tgt_mapper` through `sub_components`. But in this case, the compiler won't be able to generate the correct `MEMBER_OF` field in the map type. As a result, the runtime has to fix it using the mechanism we already have here: `__tgt_mapper_num_components`. This not only increases complexity, it also means the runtime needs to manipulate the map type further, which creates locality issues. In the current scheme, the map type is generated by the compiler once, so the data locality will be very good.
		2. `sub_components` includes all components that should be mapped. If we are mapping an array, this means we need to map many components, which requires allocating memory for `sub_components` on the heap. This creates a further memory management burden and is not an efficient way to use memory.
		Based on these reasons, I think the current scheme is still preferable.
		ABataev: Hi Lingda,
		1. Actually, I thought that the runtime function `__tgt_mapper` would do this, not the compiler.
		2. Why do we need to allocate it on the heap? We can allocate it on the stack.
		lildmh:
		1. In your scheme, both the compiler and `__tgt_mapper` need to do this: the compiler will generate the other parts of the type, e.g., the `TO`/`FROM` bits and the basic `MEMBER_OF` bits. Then `__tgt_mapper` needs to modify the `MEMBER_OF` bits later. Since there are a lot of other memory accesses between the compiler's and `__tgt_mapper`'s operations on the same map type, it's very likely the map type will not stay in the cache, which causes a locality problem.
		2. Assume we are mapping an array with 1000000 elements, and each element has 5 components. For each component, we need `base, begin_ptr, size, type, mapper`, which are 40 bytes. Together, we will need 1000000 * 5 * 40 = 200MB of space for this array, which the stack cannot handle.
		ABataev:
		1. I don't think it is a big problem. This part of the code is executed on the CPU and I don't think it will lead to significant overhead.
		2. When we map an array, we do not map it element-by-element, so we don't need 10000 records. Moreover, we try to merge contiguous parts into a single one, reducing the total number of elements.
		lildmh:
		1. I think it is a problem. Besides, doing so is not elegant: why have a single thing (setting the map type) done in 2 places when we can do it in one place?
		2. We need to map them element by element because it is not always possible to merge contiguous parts together (there may be no contiguous parts, depending on the mapper). And merging parts together will be a complex procedure: I don't think it can be done in the runtime once so much code is moved there. In contrast, the compiler will have better opportunities to merge things.
		Besides, I don't think there is a valid reason why the current scheme is not good. You mentioned the codegen is complex. But it is less than 200 lines of code here, so I don't see why it is complex.
		ABataev:
		2. I rather doubt that we will need to map a record with 100000 fields element-by-element.
		I think it would be better to share it with others and listen to their opinions. It is better to spend some extra time to arrive at a good design. You can include your doubts in the description of the new scheme, of course.
		lildmh:
		2. The implementation should work for any case anyway. Besides, I think mapping a large array should actually be a common case. Users don't want to map 1000000 elements by themselves, so they will want to use a mapper to let the system do it automatically.
		Sure, I can release the discussion to the mailing list. I don't see a reason to use the new scheme now.
		ABataev: Lingda, I meant to send the message to the OpenMP telecon list :) Could you forward the same e-mail to Ravi and others, please?
		lildmh: Sure, probably no one on the public mailing list will care about it. I'll send it to the bi-weekly meeting list.
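The stack-size estimate from this thread can be checked with a few lines of C++. This is only a sketch: `ComponentRecord` and `bufferBytes` are hypothetical names, the record simply holds the five fields lildmh lists (`base`, `begin_ptr`, `size`, `type`, `mapper`), and each field is assumed to be 8 bytes wide as on a 64-bit target. It is not a structure from the patch or from libomptarget.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-component record. Every field is modelled as a 64-bit
// value, i.e. pointers are assumed to be 8 bytes wide.
struct ComponentRecord {
  std::uint64_t base;
  std::uint64_t begin_ptr;
  std::uint64_t size;
  std::uint64_t type;
  std::uint64_t mapper;
};

static_assert(sizeof(ComponentRecord) == 40, "five 8-byte fields");

// Bytes needed to buffer one record per component of every array element.
constexpr std::uint64_t bufferBytes(std::uint64_t numElements,
                                    std::uint64_t componentsPerElement) {
  return numElements * componentsPerElement * sizeof(ComponentRecord);
}
```

For the figures used in the discussion, `bufferBytes(1000000, 5)` evaluates to 200,000,000 bytes, which matches the 200MB estimate and is well above commonly used default stack limits of a few megabytes.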
		/// if (c.hasMapper())
		ABataev: I don't see this part of the logic in the code. Could you show me where exactly it is?
		lildmh: This part doesn't exist in this patch. Currently we don't really look up the mapper for any mapped variable/array/etc. The next patch will add the code to look up the specified mapper for every map clause and get the corresponding mapper function.
		ABataev: Then we need at least some kind of `TODO` note that this part is not implemented in this patch.
		/// (*c.Mapper())(rt_mapper_handle, c.arg_base, c.arg_begin, c.arg_size,
		/// c.arg_type);
		/// else
		/// __tgt_push_mapper_component(rt_mapper_handle, c.arg_base,
		/// c.arg_begin, c.arg_size, c.arg_type);
		/// }
		/// }
		/// // Delete the array section.
		/// if (size > 1 && maptype.IsDelete)
		/// __tgt_push_mapper_component(rt_mapper_handle, base, begin,
		/// size*sizeof(Ty), clearToFrom(type));
		/// }
		/// \endcode
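The pattern documented above can be tried out against a mock runtime. The following is a minimal sketch, not the code this patch generates: `Vec`, `vecMapper`, `pushMapperComponent`, `MapperHandle` and the `kMap*` constants are all hypothetical stand-ins, the map-type bits are illustrative, and details such as the `MEMBER_OF` field and nested mappers are left out. The mock `pushMapperComponent` plays the role of `__tgt_push_mapper_component` and only records what it is given.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative map-type bits; names and values are assumptions of this sketch.
constexpr std::int64_t kMapTo = 0x1;
constexpr std::int64_t kMapFrom = 0x2;
constexpr std::int64_t kMapDelete = 0x8;

// One entry recorded by the mock runtime.
struct Component {
  void *base;
  void *begin;
  std::int64_t size;
  std::int64_t type;
};

// Stand-in for rt_mapper_handle: the mock runtime appends to a vector.
using MapperHandle = std::vector<Component>;

// Mock of __tgt_push_mapper_component.
void pushMapperComponent(MapperHandle &handle, void *base, void *begin,
                         std::int64_t size, std::int64_t type) {
  handle.push_back({base, begin, size, type});
}

// Hypothetical user type whose mapper would be declared as
//   #pragma omp declare mapper(Vec v) map(v.len, v.data[0:v.len])
struct Vec {
  int len;
  double *data;
};

// Hand-written analogue of a generated mapper function for Vec.
void vecMapper(MapperHandle &handle, void *base, void *begin,
               std::int64_t size, std::int64_t type) {
  const bool isDelete = (type & kMapDelete) != 0;
  const std::int64_t noToFrom = type & ~(kMapTo | kMapFrom);
  const std::int64_t bytes = size * static_cast<std::int64_t>(sizeof(Vec));

  // Allocate space for an array section first.
  if (size > 1 && !isDelete)
    pushMapperComponent(handle, base, begin, bytes, noToFrom);

  // Map the members of every element.
  Vec *elems = static_cast<Vec *>(begin);
  for (std::int64_t i = 0; i < size; ++i) {
    Vec &v = elems[i];
    pushMapperComponent(handle, &v, &v.len, sizeof(int), type);
    pushMapperComponent(handle, &v.data, v.data,
                        v.len * static_cast<std::int64_t>(sizeof(double)),
                        type);
  }

  // Delete the array section.
  if (size > 1 && isDelete)
    pushMapperComponent(handle, base, begin, bytes, noToFrom);
}
```

Calling `vecMapper` on an array of two `Vec` objects with `kMapTo` records five entries: one for the array section with the to/from bits cleared, then two per element. This mirrors the point made in the review that the number of push calls tracks the number of components being mapped.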
		void CGOpenMPRuntime::emitUserDefinedMapper(const OMPDeclareMapperDecl *D,
		CodeGenFunction *CGF) {
		if (UDMMap.count(D) > 0)
		return;
		ASTContext &C = CGM.getContext();
		QualType Ty = D->getType();
		QualType PtrTy = C.getPointerType(Ty).withRestrict();
		QualType Int64Ty = C.getIntTypeForBitwidth(/*DestWidth=*/64, /*Signed=*/true);
		auto *MapperVarDecl =
		cast<VarDecl>(cast<DeclRefExpr>(D->getMapperVarRef())->getDecl());
		SourceLocation Loc = D->getLocation();
		CharUnits ElementSize = C.getTypeSizeInChars(Ty);

		// Prepare mapper function arguments and attributes.
		ImplicitParamDecl HandleArg(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr,
		C.VoidPtrTy, ImplicitParamDecl::Other);
		ImplicitParamDecl BaseArg(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr, C.VoidPtrTy,
		ImplicitParamDecl::Other);
		ImplicitParamDecl BeginArg(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr,
		C.VoidPtrTy, ImplicitParamDecl::Other);
		ABataev: It is always better to use the constructor with the location, to generate correct debug info for all the parameters.
		ImplicitParamDecl SizeArg(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr, Int64Ty,
		ImplicitParamDecl::Other);
		ImplicitParamDecl TypeArg(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr, Int64Ty,
		ImplicitParamDecl::Other);
		FunctionArgList Args;
		Args.push_back(&HandleArg);
		Args.push_back(&BaseArg);
		Args.push_back(&BeginArg);
		Args.push_back(&SizeArg);
		Args.push_back(&TypeArg);
		const CGFunctionInfo &FnInfo =
		CGM.getTypes().arrangeBuiltinFunctionDeclaration(C.VoidTy, Args);
		ABataev: Bad idea to do this. Better to use something like this:
		```
		SmallString<256> TyStr;
		llvm::raw_svector_ostream Out(TyStr);
		CGM.getCXXABI().getMangleContext().mangleTypeName(Ty, Out);
		```
		lildmh: Sounds good. Thanks!
		llvm::FunctionType *FnTy = CGM.getTypes().GetFunctionType(FnInfo);
		SmallString<64> TyStr;
		llvm::raw_svector_ostream Out(TyStr);
		CGM.getCXXABI().getMangleContext().mangleTypeName(Ty, Out);
		std::string Name = getName({"omp_mapper", TyStr, D->getName()});
		auto *Fn = llvm::Function::Create(FnTy, llvm::GlobalValue::InternalLinkage,
		Name, &CGM.getModule());
		CGM.SetInternalFunctionAttributes(GlobalDecl(), Fn, FnInfo);
		Fn->removeFnAttr(llvm::Attribute::OptimizeNone);
		// Start the mapper function code generation.
		CodeGenFunction MapperCGF(CGM);
		MapperCGF.StartFunction(GlobalDecl(), C.VoidTy, Fn, FnInfo, Args, Loc, Loc);
		// Compute the starting and end addresses of array elements.
		llvm::Value *Size = MapperCGF.EmitLoadOfScalar(
		MapperCGF.GetAddrOfLocalVar(&SizeArg), /*Volatile=*/false,
		C.getPointerType(Int64Ty), Loc);
		llvm::Value *PtrBegin = MapperCGF.Builder.CreateBitCast(
		MapperCGF.GetAddrOfLocalVar(&BeginArg).getPointer(),
		CGM.getTypes().ConvertTypeForMem(C.getPointerType(PtrTy)));
		llvm::Value *PtrEnd = MapperCGF.Builder.CreateGEP(PtrBegin, Size);
		llvm::Value *MapType = MapperCGF.EmitLoadOfScalar(
		MapperCGF.GetAddrOfLocalVar(&TypeArg), /*Volatile=*/false,
		C.getPointerType(Int64Ty), Loc);
		// Prepare common arguments for array initialization and deletion.
		llvm::Value *Handle = MapperCGF.EmitLoadOfScalar(
		MapperCGF.GetAddrOfLocalVar(&HandleArg),
		/*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc);
		llvm::Value *BaseIn = MapperCGF.EmitLoadOfScalar(
		MapperCGF.GetAddrOfLocalVar(&BaseArg),
		/*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc);
		llvm::Value *BeginIn = MapperCGF.EmitLoadOfScalar(
		MapperCGF.GetAddrOfLocalVar(&BeginArg),
		/*Volatile=*/false, C.getPointerType(C.VoidPtrTy), Loc);

		// Emit array initialization if this is an array section and \p MapType indicates
		// that memory allocation is required.
		llvm::BasicBlock *HeadBB = MapperCGF.createBasicBlock("omp.arraymap.head");
		emitUDMapperArrayInitOrDel(MapperCGF, Handle, BaseIn, BeginIn, Size, MapType,
		ElementSize, HeadBB, /*IsInit=*/true);

		// Emit a for loop to iterate through SizeArg of elements and map all of them.

		// Emit the loop header block.
		MapperCGF.EmitBlock(HeadBB);
		llvm::BasicBlock *BodyBB = MapperCGF.createBasicBlock("omp.arraymap.body");
		llvm::BasicBlock *DoneBB = MapperCGF.createBasicBlock("omp.done");
		// Evaluate whether the initial condition is satisfied.
		llvm::Value *IsEmpty =
		MapperCGF.Builder.CreateICmpEQ(PtrBegin, PtrEnd, "omp.arraymap.isempty");
		MapperCGF.Builder.CreateCondBr(IsEmpty, DoneBB, BodyBB);
		llvm::BasicBlock *EntryBB = MapperCGF.Builder.GetInsertBlock();

		// Emit the loop body block.
		MapperCGF.EmitBlock(BodyBB);
		llvm::PHINode *PtrPHI = MapperCGF.Builder.CreatePHI(
		PtrBegin->getType(), 2, "omp.arraymap.ptrcurrent");
		PtrPHI->addIncoming(PtrBegin, EntryBB);
		Address PtrCurrent =
		Address(PtrPHI, MapperCGF.GetAddrOfLocalVar(&BeginArg)
		.getAlignment()
		.alignmentOfArrayElement(ElementSize));
		// Privatize the declared variable of mapper to be the current array element.
		CodeGenFunction::OMPPrivateScope Scope(MapperCGF);
		Scope.addPrivate(MapperVarDecl, [&MapperCGF, PtrCurrent, PtrTy]() {
		return MapperCGF
		.EmitLoadOfPointerLValue(PtrCurrent, PtrTy->castAs<PointerType>())
		.getAddress();
		});
		(void)Scope.Privatize();

		// Get map clause information. Fill up the arrays with all mapped variables.
		MappableExprsHandler::MapBaseValuesArrayTy BasePointers;
		ABataev (Not Done): These data can be passed as parameters to the function, no?
		MappableExprsHandler::MapValuesArrayTy Pointers;
		MappableExprsHandler::MapValuesArrayTy Sizes;
		MappableExprsHandler::MapFlagsArrayTy MapTypes;
		MappableExprsHandler MEHandler(*D, MapperCGF);
		MEHandler.generateAllInfoForMapper(BasePointers, Pointers, Sizes, MapTypes);

		// Call the runtime API __tgt_mapper_num_components to get the number of
		// pre-existing components.
		llvm::Value *OffloadingArgs[] = {Handle};
		llvm::Value *PreviousSize = MapperCGF.EmitRuntimeCall(
		ABataev (Done): I don't like this code very much! It hides the logic of the MEMBER_OF flag deep inside and it is going to be very hard to update it in future if there are some changes in the flags.
		lildmh (Done): Add a function to calculate this offset. Also modify another existing place using the hard coded number 48.
		createRuntimeFunction(OMPRTL__tgt_mapper_num_components), OffloadingArgs);
		llvm::Value *ShiftedPreviousSize = MapperCGF.Builder.CreateShl(
		PreviousSize,
		MapperCGF.Builder.getInt64(MappableExprsHandler::getFlagMemberOffset()));

		// Fill up the runtime mapper handle for all components.
		for (unsigned I = 0; I < BasePointers.size(); ++I) {
		llvm::Value *CurBaseArg = MapperCGF.Builder.CreateBitCast(
		*BasePointers[I], CGM.getTypes().ConvertTypeForMem(C.VoidPtrTy));
		llvm::Value *CurBeginArg = MapperCGF.Builder.CreateBitCast(
		Pointers[I], CGM.getTypes().ConvertTypeForMem(C.VoidPtrTy));
		llvm::Value *CurSizeArg = Sizes[I];

		// Extract the MEMBER_OF field from the map type.
		llvm::BasicBlock *MemberBB = MapperCGF.createBasicBlock("omp.member");
		MapperCGF.EmitBlock(MemberBB);
		llvm::Value *OriMapType = MapperCGF.Builder.getInt64(MapTypes[I]);
		llvm::Value *Member = MapperCGF.Builder.CreateAnd(
		OriMapType,
		MapperCGF.Builder.getInt64(MappableExprsHandler::OMP_MAP_MEMBER_OF));
		llvm::BasicBlock *MemberCombineBB =
		MapperCGF.createBasicBlock("omp.member.combine");
		llvm::BasicBlock *TypeBB = MapperCGF.createBasicBlock("omp.type");
		llvm::Value *IsMember = MapperCGF.Builder.CreateIsNull(Member);
		MapperCGF.Builder.CreateCondBr(IsMember, TypeBB, MemberCombineBB);
		// Add the number of pre-existing components to the MEMBER_OF field if it
		// is valid.
		MapperCGF.EmitBlock(MemberCombineBB);
		llvm::Value *CombinedMember =
		MapperCGF.Builder.CreateNUWAdd(OriMapType, ShiftedPreviousSize);
		// Do nothing if it is not a member of previous components.
		ABataev (Done): You can use `nuw` attribute here, I think.
		MapperCGF.EmitBlock(TypeBB);
		llvm::PHINode *MemberMapType =
		MapperCGF.Builder.CreatePHI(CGM.Int64Ty, 4, "omp.membermaptype");
		MemberMapType->addIncoming(OriMapType, MemberBB);
		MemberMapType->addIncoming(CombinedMember, MemberCombineBB);

		// Combine the map type inherited from user-defined mapper with that
		// specified in the program. According to the OMP_MAP_TO and OMP_MAP_FROM
		// bits of the \a MapType, which is the input argument of the mapper
		// function, the following code will set the OMP_MAP_TO and OMP_MAP_FROM
		// bits of MemberMapType.
		// [OpenMP 5.0], 1.2.6. map-type decay.
		//        | alloc |  to   | from  | tofrom | release | delete
		// ----------------------------------------------------------
		// alloc  | alloc | alloc | alloc | alloc  | release | delete
		// to     | alloc |  to   | alloc |   to   | release | delete
		// from   | alloc | alloc | from  |  from  | release | delete
		// tofrom | alloc |  to   | from  | tofrom | release | delete
		llvm::Value *LeftToFrom = MapperCGF.Builder.CreateAnd(
		MapType,
		MapperCGF.Builder.getInt64(MappableExprsHandler::OMP_MAP_TO |
		MappableExprsHandler::OMP_MAP_FROM));
		llvm::BasicBlock *AllocBB = MapperCGF.createBasicBlock("omp.type.alloc");
		llvm::BasicBlock *AllocElseBB =
		MapperCGF.createBasicBlock("omp.type.alloc.else");
		llvm::BasicBlock *ToBB = MapperCGF.createBasicBlock("omp.type.to");
		llvm::BasicBlock *ToElseBB = MapperCGF.createBasicBlock("omp.type.to.else");
		llvm::BasicBlock *FromBB = MapperCGF.createBasicBlock("omp.type.from");
		llvm::BasicBlock *EndBB = MapperCGF.createBasicBlock("omp.type.end");
		llvm::Value *IsAlloc = MapperCGF.Builder.CreateIsNull(LeftToFrom);
		MapperCGF.Builder.CreateCondBr(IsAlloc, AllocBB, AllocElseBB);
		// In case of alloc, clear OMP_MAP_TO and OMP_MAP_FROM.
		MapperCGF.EmitBlock(AllocBB);
		llvm::Value *AllocMapType = MapperCGF.Builder.CreateAnd(
		MemberMapType,
		MapperCGF.Builder.getInt64(~(MappableExprsHandler::OMP_MAP_TO |
		ABataev (Done): I don't see this logic in the comment for the function. Could you add more details for all this logic implemented here?
		MappableExprsHandler::OMP_MAP_FROM)));
		MapperCGF.Builder.CreateBr(EndBB);
		MapperCGF.EmitBlock(AllocElseBB);
		llvm::Value *IsTo = MapperCGF.Builder.CreateICmpEQ(
		LeftToFrom,
		MapperCGF.Builder.getInt64(MappableExprsHandler::OMP_MAP_TO));
		MapperCGF.Builder.CreateCondBr(IsTo, ToBB, ToElseBB);
		// In case of to, clear OMP_MAP_FROM.
		MapperCGF.EmitBlock(ToBB);
		llvm::Value *ToMapType = MapperCGF.Builder.CreateAnd(
		MemberMapType,
		MapperCGF.Builder.getInt64(~MappableExprsHandler::OMP_MAP_FROM));
		MapperCGF.Builder.CreateBr(EndBB);
		MapperCGF.EmitBlock(ToElseBB);
		llvm::Value *IsFrom = MapperCGF.Builder.CreateICmpEQ(
		LeftToFrom,
		MapperCGF.Builder.getInt64(MappableExprsHandler::OMP_MAP_FROM));
		MapperCGF.Builder.CreateCondBr(IsFrom, FromBB, EndBB);
		// In case of from, clear OMP_MAP_TO.
		MapperCGF.EmitBlock(FromBB);
		llvm::Value *FromMapType = MapperCGF.Builder.CreateAnd(
		MemberMapType,
		MapperCGF.Builder.getInt64(~MappableExprsHandler::OMP_MAP_TO));
		// In case of tofrom, do nothing.
		MapperCGF.EmitBlock(EndBB);
		llvm::PHINode *CurMapType =
		MapperCGF.Builder.CreatePHI(CGM.Int64Ty, 4, "omp.maptype");
		CurMapType->addIncoming(AllocMapType, AllocBB);
		CurMapType->addIncoming(ToMapType, ToBB);
		CurMapType->addIncoming(FromMapType, FromBB);
		CurMapType->addIncoming(MemberMapType, ToElseBB);

		// TODO: call the corresponding mapper function if a user-defined mapper is
		// associated with this map clause.
		// Call the runtime API __tgt_push_mapper_component to fill up the runtime
		// data structure.
		llvm::Value *OffloadingArgs[] = {Handle, CurBaseArg, CurBeginArg,
		CurSizeArg, CurMapType};
		MapperCGF.EmitRuntimeCall(
		createRuntimeFunction(OMPRTL__tgt_push_mapper_component),
		OffloadingArgs);
		}

		// Update the pointer to point to the next element that needs to be mapped,
		// and check whether we have mapped all elements.
		llvm::Value *PtrNext = MapperCGF.Builder.CreateConstGEP1_32(
		PtrPHI, /*Idx0=*/1, "omp.arraymap.next");
		PtrPHI->addIncoming(PtrNext, BodyBB);
		llvm::Value *IsDone =
		MapperCGF.Builder.CreateICmpEQ(PtrNext, PtrEnd, "omp.arraymap.isdone");
		llvm::BasicBlock *ExitBB = MapperCGF.createBasicBlock("omp.arraymap.exit");
		MapperCGF.Builder.CreateCondBr(IsDone, ExitBB, BodyBB);

		MapperCGF.EmitBlock(ExitBB);
		// Emit array deletion if this is an array section and \p MapType indicates
		// that deletion is required.
		emitUDMapperArrayInitOrDel(MapperCGF, Handle, BaseIn, BeginIn, Size, MapType,
		ElementSize, DoneBB, /*IsInit=*/false);

		// Emit the function exit block.
		MapperCGF.EmitBlock(DoneBB, /*IsFinished=*/true);
		ABataev (Done): 1. Use `///` style of comment here. 2. Add the description of the logic implemented here.
		MapperCGF.FinishFunction();
		UDMMap.try_emplace(D, Fn);
		if (CGF) {
		auto &Decls = FunctionUDMMap.FindAndConstruct(CGF->CurFn);
		Decls.second.push_back(D);
		}
		}
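
The per-component map type that the loop above hands to `__tgt_push_mapper_component` can be modelled on the host as a plain function. This is only a sketch for reasoning about the emitted control flow, not code from the patch: `componentMapType` and `MemberOfShift` are hypothetical names, and the flag values are the ones implied by the test expectations in this revision (TO = 1, FROM = 2, MEMBER_OF starting at bit 48).

```cpp
#include <cassert>
#include <cstdint>

// Flag values as implied by the FileCheck lines of this revision.
constexpr uint64_t OMP_MAP_TO = 0x01;
constexpr uint64_t OMP_MAP_FROM = 0x02;
constexpr uint64_t OMP_MAP_MEMBER_OF = 0xffff000000000000ULL;
constexpr unsigned MemberOfShift = 48; // hypothetical name for the offset

// Hypothetical host-side model of the value computed per component.
// OriMapType:    map type recorded in the declare mapper's map clause.
// CallerMapType: map type passed into the mapper function.
// PreviousSize:  number of components already pushed to the handle.
uint64_t componentMapType(uint64_t OriMapType, uint64_t CallerMapType,
                          uint64_t PreviousSize) {
  // Shift the MEMBER_OF index by the number of pre-existing components,
  // but only when the component is a member of something.
  uint64_t Member = OriMapType;
  if ((OriMapType & OMP_MAP_MEMBER_OF) != 0)
    Member = OriMapType + (PreviousSize << MemberOfShift);

  // Map-type decay: keep only the TO/FROM bits the caller also requests.
  switch (CallerMapType & (OMP_MAP_TO | OMP_MAP_FROM)) {
  case 0: // alloc
    return Member & ~(OMP_MAP_TO | OMP_MAP_FROM);
  case OMP_MAP_TO:
    return Member & ~OMP_MAP_FROM;
  case OMP_MAP_FROM:
    return Member & ~OMP_MAP_TO;
  default: // tofrom: nothing to clear
    return Member;
  }
}
```

For example, a component declared `to` inside the mapper and reached through a `from` map ends up with neither bit set, which is the `alloc` entry in the decay table.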

		ABataev (Done): Use `StringRef` or `SmallString`.
		/// Emit the array initialization or deletion portion for user-defined mapper
		/// code generation. First, it evaluates whether an array section is mapped and
		/// whether the \a MapType instructs to delete this section. If \a IsInit is
		/// true, and \a MapType indicates to not delete this array, array
		/// initialization code is generated. If \a IsInit is false, and \a MapType
		/// indicates to delete this array, array deletion code is generated.
		void CGOpenMPRuntime::emitUDMapperArrayInitOrDel(
		CodeGenFunction &MapperCGF, llvm::Value *Handle, llvm::Value *Base,
		llvm::Value *Begin, llvm::Value *Size, llvm::Value *MapType,
		CharUnits ElementSize, llvm::BasicBlock *ExitBB, bool IsInit) {
		StringRef Prefix = IsInit ? ".init" : ".del";

		// Evaluate if this is an array section.
		llvm::BasicBlock *IsDeleteBB =
		MapperCGF.createBasicBlock("omp.array" + Prefix + ".evaldelete");
		llvm::BasicBlock *BodyBB = MapperCGF.createBasicBlock("omp.array" + Prefix);
		llvm::Value *IsArray = MapperCGF.Builder.CreateICmpSGE(
		Size, MapperCGF.Builder.getInt64(1), "omp.arrayinit.isarray");
		MapperCGF.Builder.CreateCondBr(IsArray, IsDeleteBB, ExitBB);

		// Evaluate if we are going to delete this section.
		ABataev (Done): You can use `C.toBits(CGM.getSizeSize())`.
		MapperCGF.EmitBlock(IsDeleteBB);
		llvm::Value *DeleteBit = MapperCGF.Builder.CreateAnd(
		ABataev (Done): Enclose all substatements into braces or none of them.
		MapType,
		MapperCGF.Builder.getInt64(MappableExprsHandler::OMP_MAP_DELETE));
		llvm::Value *DeleteCond;
		if (IsInit) {
		DeleteCond = MapperCGF.Builder.CreateIsNull(
		DeleteBit, "omp.array" + Prefix + ".delete");
		} else {
		DeleteCond = MapperCGF.Builder.CreateIsNotNull(
		DeleteBit, "omp.array" + Prefix + ".delete");
		}
		MapperCGF.Builder.CreateCondBr(DeleteCond, BodyBB, ExitBB);

		MapperCGF.EmitBlock(BodyBB);
		// Get the array size by multiplying element size and element number (i.e., \p
		// Size).
		llvm::Value *ArraySize = MapperCGF.Builder.CreateNUWMul(
		Size, MapperCGF.Builder.getInt64(ElementSize.getQuantity()));
		// Remove OMP_MAP_TO and OMP_MAP_FROM from the map type, so that it achieves
		// memory allocation/deletion purpose only.
		llvm::Value *MapTypeArg = MapperCGF.Builder.CreateAnd(
		ABataev (Done): Just `CGM.getSizeSize()`.
		MapType,
		ABataev (Done): Add `nuw` attribute.
		MapperCGF.Builder.getInt64(~(MappableExprsHandler::OMP_MAP_TO |
		MappableExprsHandler::OMP_MAP_FROM)));
		// Call the runtime API __tgt_push_mapper_component to fill up the runtime
		// data structure.
		llvm::Value *OffloadingArgs[] = {Handle, Base, Begin, ArraySize, MapTypeArg};
		MapperCGF.EmitRuntimeCall(
		createRuntimeFunction(OMPRTL__tgt_push_mapper_component), OffloadingArgs);
		}
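
The two branches emitted by `emitUDMapperArrayInitOrDel` reduce to a small decision: push one extra component only for an array section, and only when the DELETE bit matches the phase. The following host-side sketch restates that decision under stated assumptions: `PushRequest` and `arrayInitOrDel` are hypothetical names, and the flag values (TO = 1, FROM = 2, DELETE = 8) are taken from the test expectations in this revision.

```cpp
#include <cassert>
#include <cstdint>

// Flag values as implied by the FileCheck lines of this revision.
constexpr uint64_t OMP_MAP_TO = 0x01;
constexpr uint64_t OMP_MAP_FROM = 0x02;
constexpr uint64_t OMP_MAP_DELETE = 0x08;

// Hypothetical result type: whether a component is pushed, and with
// which size and map type.
struct PushRequest {
  bool Emit;
  uint64_t ArraySize;
  uint64_t MapType;
};

// Hypothetical model of the init/delete prologue and epilogue.
PushRequest arrayInitOrDel(int64_t Size, uint64_t MapType,
                           uint64_t ElementSize, bool IsInit) {
  // Not an array section: nothing to allocate or delete as a whole.
  if (Size < 1)
    return {false, 0, 0};
  // Init runs only without the DELETE bit; deletion runs only with it.
  bool HasDelete = (MapType & OMP_MAP_DELETE) != 0;
  if (IsInit == HasDelete)
    return {false, 0, 0};
  // Strip TO/FROM so the entry only allocates or deletes memory.
  return {true, static_cast<uint64_t>(Size) * ElementSize,
          MapType & ~(OMP_MAP_TO | OMP_MAP_FROM)};
}
```

With four 16-byte elements and a `tofrom` map type, the init phase pushes a 64-byte entry with both transfer bits cleared, and the delete phase pushes nothing.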

		void CGOpenMPRuntime::emitTargetNumIterationsCall(
		CodeGenFunction &CGF, const OMPExecutableDirective &D, const Expr *Device,
		const llvm::function_ref<llvm::Value *(
		CodeGenFunction &CGF, const OMPLoopDirective &D)> &SizeEmitter) {
		OpenMPDirectiveKind Kind = D.getDirectiveKind();
		const OMPExecutableDirective *TD = &D;
		// Get nested teams distribute kind directive, if any.
		if (!isOpenMPDistributeDirective(Kind) || !isOpenMPTeamsDirective(Kind))

lib/CodeGen/ModuleBuilder.cpp

		void HandleTagDeclDefinition(TagDecl *D) override {
		}
		}
		// For OpenMP emit declare reduction functions, if required.
		if (Ctx->getLangOpts().OpenMP) {
		for (Decl *Member : D->decls()) {
		if (auto *DRD = dyn_cast<OMPDeclareReductionDecl>(Member)) {
		if (Ctx->DeclMustBeEmitted(DRD))
		Builder->EmitGlobal(DRD);
		} else if (auto *DMD = dyn_cast<OMPDeclareMapperDecl>(Member)) {
		if (Ctx->DeclMustBeEmitted(DMD))
		Builder->EmitGlobal(DMD);
		}
		}
		}
		}

		void HandleTagDeclRequiredDefinition(const TagDecl *D) override {
		if (Diags.hasErrorOccurred())
		return;

test/OpenMP/declare_mapper_codegen.cpp

	///==========================================================================///
	// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap %s
	// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
	// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap %s
	// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap %s
	// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
	// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap %s

	// RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap --check-prefix SIMD-ONLY0 %s
	// RUN: %clang_cc1 -fopenmp-simd -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
	// RUN: %clang_cc1 -fopenmp-simd -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap --check-prefix SIMD-ONLY0 %s
	// RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck -allow-deprecated-dag-overlap --check-prefix SIMD-ONLY0 %s
	// RUN: %clang_cc1 -fopenmp-simd -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
	// RUN: %clang_cc1 -fopenmp-simd -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck -allow-deprecated-dag-overlap --check-prefix SIMD-ONLY0 %s

				// SIMD-ONLY0-NOT: {{__kmpc|__tgt}}

				// expected-no-diagnostics
				#ifndef HEADER
				#define HEADER

				///==========================================================================///
				// RUN: %clang_cc1 -DCK0 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm -femit-all-decls -disable-llvm-passes %s -o - | FileCheck --check-prefix CK0 --check-prefix CK0-64 %s
				// RUN: %clang_cc1 -DCK0 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -femit-all-decls -disable-llvm-passes -o %t %s
				// RUN: %clang_cc1 -DCK0 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -femit-all-decls -disable-llvm-passes -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix CK0 --check-prefix CK0-64 %s
				// RUN: %clang_cc1 -DCK0 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm -femit-all-decls -disable-llvm-passes %s -o - | FileCheck --check-prefix CK0 --check-prefix CK0-32 %s
				// RUN: %clang_cc1 -DCK0 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -femit-all-decls -disable-llvm-passes -o %t %s
				// RUN: %clang_cc1 -DCK0 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -femit-all-decls -disable-llvm-passes -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix CK0 --check-prefix CK0-32 %s

				// RUN: %clang_cc1 -DCK0 -verify -fopenmp-simd -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm -femit-all-decls -disable-llvm-passes %s -o - | FileCheck --check-prefix SIMD-ONLY0 %s
				// RUN: %clang_cc1 -DCK0 -fopenmp-simd -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -femit-all-decls -disable-llvm-passes -o %t %s
				// RUN: %clang_cc1 -DCK0 -fopenmp-simd -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -femit-all-decls -disable-llvm-passes -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY0 %s
				// RUN: %clang_cc1 -DCK0 -verify -fopenmp-simd -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm -femit-all-decls -disable-llvm-passes %s -o - | FileCheck --check-prefix SIMD-ONLY0 %s
				// RUN: %clang_cc1 -DCK0 -fopenmp-simd -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -femit-all-decls -disable-llvm-passes -o %t %s
				// RUN: %clang_cc1 -DCK0 -fopenmp-simd -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -femit-all-decls -disable-llvm-passes -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY0 %s

				#ifdef CK0

				// CK0-LABEL: @.__omp_offloading_{{.*}}foo{{.*}}.region_id = weak constant i8 0
				// CK0-64: [[SIZES:@.+]] = {{.+}}constant [1 x i64] [i64 16]
				// CK0-32: [[SIZES:@.+]] = {{.+}}constant [1 x i64] [i64 8]
				// CK0: [[TYPES:@.+]] = {{.+}}constant [1 x i64] [i64 35]
				// CK0-64: [[TSIZES:@.+]] = {{.+}}constant [1 x i64] [i64 16]
				// CK0-32: [[TSIZES:@.+]] = {{.+}}constant [1 x i64] [i64 8]
				// CK0: [[TTYPES:@.+]] = {{.+}}constant [1 x i64] [i64 33]
				// CK0-64: [[FSIZES:@.+]] = {{.+}}constant [1 x i64] [i64 16]
				// CK0-32: [[FSIZES:@.+]] = {{.+}}constant [1 x i64] [i64 8]
				// CK0: [[FTYPES:@.+]] = {{.+}}constant [1 x i64] [i64 34]
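
The numeric map types in these checks are easier to read once decoded into flag names. The sketch below does that with compile-time assertions, covering the global constants above (35, 33, 34) and the MEMBER_OF-carrying constants that the mapper-body checks in this test use. `memberOf` is a hypothetical helper; the TO, FROM and 48-bit shift values are implied by this revision's checks, while the `OMP_MAP_PTR_AND_OBJ` and `OMP_MAP_TARGET_PARAM` values are assumptions based on clang's offload mapping flags and should be verified against the enum in `CGOpenMPRuntime.cpp`.

```cpp
#include <cstdint>

// TO/FROM values are implied by the checks in this test.
constexpr uint64_t OMP_MAP_TO = 0x01;
constexpr uint64_t OMP_MAP_FROM = 0x02;
// Assumed values; confirm against clang's offload mapping flag enum.
constexpr uint64_t OMP_MAP_PTR_AND_OBJ = 0x10;
constexpr uint64_t OMP_MAP_TARGET_PARAM = 0x20;

// Hypothetical helper: place a 1-based parent position in the MEMBER_OF
// field, which starts at bit 48 according to the `shl ..., 48` check.
constexpr uint64_t memberOf(uint64_t Position) { return Position << 48; }

// Map types of the target regions (tofrom, to, from).
static_assert(35 == (OMP_MAP_TARGET_PARAM | OMP_MAP_TO | OMP_MAP_FROM), "");
static_assert(33 == (OMP_MAP_TARGET_PARAM | OMP_MAP_TO), "");
static_assert(34 == (OMP_MAP_TARGET_PARAM | OMP_MAP_FROM), "");

// Map types of the components pushed by the mapper for s.a and s.b[0:2].
static_assert(281474976710659ULL ==
                  (memberOf(1) | OMP_MAP_TO | OMP_MAP_FROM), "");
static_assert(281474976710675ULL ==
                  (memberOf(1) | OMP_MAP_PTR_AND_OBJ | OMP_MAP_TO |
                   OMP_MAP_FROM), "");
```

Written in hexadecimal, the last two values are `0x1000000000003` and `0x1000000000013`.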

				class C {
				public:
				int a;
				double *b;
				};

	#pragma omp declare mapper(id: C s) map(s.a)			#pragma omp declare mapper(id: C s) map(s.a, s.b[0:2])

	// CHECK-LABEL: @.__omp_offloading_{{.*}}foo{{.*}}_l54.region_id = weak constant i8 0			// CK0-LABEL: define {{.*}}void @.omp_mapper.{{.*}}C.id{{.*}}(i8*{{.*}}, i8*{{.*}}, i8*{{.*}}, i64{{.*}}, i64{{.*}})
				// CK0: store i8* %{{[^,]+}}, i8** [[HANDLEADDR:%[^,]+]]
				// CK0: store i8* %{{[^,]+}}, i8** [[BPTRADDR:%[^,]+]]
				// CK0: store i8* %{{[^,]+}}, i8** [[VPTRADDR:%[^,]+]]
				// CK0: store i64 %{{[^,]+}}, i{{64|32}}* [[SIZEADDR:%[^,]+]]
				// CK0: store i64 %{{[^,]+}}, i64* [[TYPEADDR:%[^,]+]]
				ABataev (Not Done): I would not rely on the predetermined indices here, better to use some kind of patterns here just like in other places.
				lildmh (Done): Could you give an example about what you suggest? For instance, some other tests I should look into.
				ABataev (Not Done): Just like in this test when you're using vars.
				lildmh (Done): Sorry I was not clear before. What do you mean by "predetermined indices" here? If you are referring to, for example, `%0` in `store i8* %0, i8** [[HANDLEADDR:%[^,]+]]`, I guess there is no way to get rid of `%0` because it means the first argument of the function?
				ABataev (Not Done): Yes, I meant those `%0` like registers. Better to mark them as variables in function declaration and use those names in the checks.
				lildmh (Done): Now it's like `define {{.*}}void @.omp_mapper.{{.*}}C.id{{.*}}(i8*, i8*, i8*, i64, i64)`, I think you are suggesting something like `define {{.*}}void @.omp_mapper.{{.*}}C.id{{.*}}(i8* [[HANDLE:%[^,]+]], i8* [[BPTR:%[^,]+]], ...)`, and later I can use `store i8* [[HANDLE]], i8** [[HANDLEADDR:%[^,]+]]`. I'm not sure how to add names for function arguments. They seem to be always nameless like `(i8*, i8*, i8*, i64, i64)`. Is there a way to do that?
				ABataev (Not Done): If the clang parameters have names, the llvm params also will get the names. But it is not worth it to add the names to the function. Could just use regexp here to avoid using LLVM register names? Just `%{{.+}}`. And rely on the order, i.e. remove `-DAG` checks?
				// CK0-DAG: [[SIZE:%.+]] = load i64, i64* [[SIZEADDR]]
				// CK0-DAG: [[TYPE:%.+]] = load i64, i64* [[TYPEADDR]]
				// CK0-DAG: [[HANDLE:%.+]] = load i8*, i8** [[HANDLEADDR]]
				// CK0-DAG: [[PTRBEGIN:%.+]] = bitcast i8** [[VPTRADDR]] to %class.C**
				// CK0-DAG: [[PTREND:%.+]] = getelementptr %class.C*, %class.C** [[PTRBEGIN]], i64 [[SIZE]]
				// CK0-DAG: [[BPTR:%.+]] = load i8*, i8** [[BPTRADDR]]
				// CK0-DAG: [[BEGIN:%.+]] = load i8*, i8** [[VPTRADDR]]
				// CK0: [[ISARRAY:%.+]] = icmp sge i64 [[SIZE]], 1
				// CK0: br i1 [[ISARRAY]], label %[[INITEVALDEL:[^,]+]], label %[[LHEAD:[^,]+]]

				// CK0: [[INITEVALDEL]]
				// CK0: [[TYPEDEL:%.+]] = and i64 [[TYPE]], 8
				// CK0: [[ISNOTDEL:%.+]] = icmp eq i64 [[TYPEDEL]], 0
				// CK0: br i1 [[ISNOTDEL]], label %[[INIT:[^,]+]], label %[[LHEAD:[^,]+]]
				// CK0: [[INIT]]
				// CK0-64-DAG: [[ARRSIZE:%.+]] = mul nuw i64 [[SIZE]], 16
				// CK0-32-DAG: [[ARRSIZE:%.+]] = mul nuw i64 [[SIZE]], 8
				// CK0-DAG: [[ITYPE:%.+]] = and i64 [[TYPE]], -4
				// CK0: call void @__tgt_push_mapper_component(i8* [[HANDLE]], i8* [[BPTR]], i8* [[BEGIN]], i64 [[ARRSIZE]], i64 [[ITYPE]])
				// CK0: br label %[[LHEAD:[^,]+]]

				// CK0: [[LHEAD]]
				// CK0: [[ISEMPTY:%.+]] = icmp eq %class.C** [[PTRBEGIN]], [[PTREND]]
				// CK0: br i1 [[ISEMPTY]], label %[[DONE:[^,]+]], label %[[LBODY:[^,]+]]
				// CK0: [[LBODY]]
				// CK0: [[PTR:%.+]] = phi %class.C** [ [[PTRBEGIN]], %[[LHEAD]] ], [ [[PTRNEXT:%.+]], %[[LCORRECT:[^,]+]] ]
				// CK0: [[OBJ:%.+]] = load %class.C*, %class.C** [[PTR]]
				// CK0-DAG: [[ABEGIN:%.+]] = getelementptr inbounds %class.C, %class.C* [[OBJ]], i32 0, i32 0
				// CK0-DAG: [[BBEGIN:%.+]] = getelementptr inbounds %class.C, %class.C* [[OBJ]], i32 0, i32 1
				// CK0-DAG: [[BBEGIN2:%.+]] = getelementptr inbounds %class.C, %class.C* [[OBJ]], i32 0, i32 1
				// CK0-DAG: [[BARRBEGIN:%.+]] = load double*, double** [[BBEGIN2]]
				// CK0-DAG: [[BARRBEGINGEP:%.+]] = getelementptr inbounds double, double* [[BARRBEGIN]], i[[sz:64|32]] 0
				// CK0-DAG: [[BEND:%.+]] = getelementptr double*, double** [[BBEGIN]], i32 1
				// CK0-DAG: [[ABEGINV:%.+]] = bitcast i32* [[ABEGIN]] to i8*
				// CK0-DAG: [[BENDV:%.+]] = bitcast double** [[BEND]] to i8*
				// CK0-DAG: [[ABEGINI:%.+]] = ptrtoint i8* [[ABEGINV]] to i64
				// CK0-DAG: [[BENDI:%.+]] = ptrtoint i8* [[BENDV]] to i64
				// CK0-DAG: [[CSIZE:%.+]] = sub i64 [[BENDI]], [[ABEGINI]]
				// CK0-DAG: [[CUSIZE:%.+]] = sdiv exact i64 [[CSIZE]], ptrtoint (i8* getelementptr (i8, i8* null, i32 1) to i64)
				// CK0-DAG: [[BPTRADDR0BC:%.+]] = bitcast %class.C* [[OBJ]] to i8*
				// CK0-DAG: [[PTRADDR0BC:%.+]] = bitcast i32* [[ABEGIN]] to i8*
				// CK0-DAG: [[PRESIZE:%.+]] = call i64 @__tgt_mapper_num_components(i8* [[HANDLE]])
				// CK0-DAG: [[SHIPRESIZE:%.+]] = shl i64 [[PRESIZE]], 48
				// CK0-DAG: br label %[[MEMBER:[^,]+]]
				// CK0-DAG: [[MEMBER]]
				// CK0-DAG: br i1 true, label %[[LTYPE:[^,]+]], label %[[MEMBERCOM:[^,]+]]
				// CK0-DAG: [[MEMBERCOM]]
				// CK0-DAG: [[MEMBERCOMTYPE:%.+]] = add nuw i64 32, [[SHIPRESIZE]]
				// CK0-DAG: br label %[[LTYPE]]
				// CK0-DAG: [[LTYPE]]
				// CK0-DAG: [[MEMBERTYPE:%.+]] = phi i64 [ 32, %[[MEMBER]] ], [ [[MEMBERCOMTYPE]], %[[MEMBERCOM]] ]
				// CK0-DAG: [[TYPETF:%.+]] = and i64 [[TYPE]], 3
				// CK0-DAG: [[ISALLOC:%.+]] = icmp eq i64 [[TYPETF]], 0
				// CK0-DAG: br i1 [[ISALLOC]], label %[[ALLOC:[^,]+]], label %[[ALLOCELSE:[^,]+]]
				// CK0-DAG: [[ALLOC]]
				// CK0-DAG: [[ALLOCTYPE:%.+]] = and i64 [[MEMBERTYPE]], -4
				// CK0-DAG: br label %[[TYEND:[^,]+]]
				// CK0-DAG: [[ALLOCELSE]]
				// CK0-DAG: [[ISTO:%.+]] = icmp eq i64 [[TYPETF]], 1
				// CK0-DAG: br i1 [[ISTO]], label %[[TO:[^,]+]], label %[[TOELSE:[^,]+]]
				// CK0-DAG: [[TO]]
				// CK0-DAG: [[TOTYPE:%.+]] = and i64 [[MEMBERTYPE]], -3
				// CK0-DAG: br label %[[TYEND]]
				// CK0-DAG: [[TOELSE]]
				// CK0-DAG: [[ISFROM:%.+]] = icmp eq i64 [[TYPETF]], 2
				// CK0-DAG: br i1 [[ISFROM]], label %[[FROM:[^,]+]], label %[[TYEND]]
				// CK0-DAG: [[FROM]]
				// CK0-DAG: [[FROMTYPE:%.+]] = and i64 [[MEMBERTYPE]], -2
				// CK0-DAG: br label %[[TYEND]]
				// CK0-DAG: [[TYEND]]
				// CK0-DAG: [[PHITYPE0:%.+]] = phi i64 [ [[ALLOCTYPE]], %[[ALLOC]] ], [ [[TOTYPE]], %[[TO]] ], [ [[FROMTYPE]], %[[FROM]] ], [ [[MEMBERTYPE]], %[[TOELSE]] ]
				// CK0: call void @__tgt_push_mapper_component(i8* [[HANDLE]], i8* [[BPTRADDR0BC]], i8* [[PTRADDR0BC]], i64 [[CUSIZE]], i64 [[PHITYPE0]])
				// CK0-DAG: [[BPTRADDR1BC:%.+]] = bitcast %class.C* [[OBJ]] to i8*
				// CK0-DAG: [[PTRADDR1BC:%.+]] = bitcast i32* [[ABEGIN]] to i8*
				// CK0-DAG: br label %[[MEMBER:[^,]+]]
				// CK0-DAG: [[MEMBER]]
				// CK0-DAG: br i1 false, label %[[LTYPE:[^,]+]], label %[[MEMBERCOM:[^,]+]]
				// CK0-DAG: [[MEMBERCOM]]
				// 281474976710659 == 0x1,000,000,000,003
				// CK0-DAG: [[MEMBERCOMTYPE:%.+]] = add nuw i64 281474976710659, [[SHIPRESIZE]]
				// CK0-DAG: br label %[[LTYPE]]
				// CK0-DAG: [[LTYPE]]
				// CK0-DAG: [[MEMBERTYPE:%.+]] = phi i64 [ 281474976710659, %[[MEMBER]] ], [ [[MEMBERCOMTYPE]], %[[MEMBERCOM]] ]
				// CK0-DAG: [[TYPETF:%.+]] = and i64 [[TYPE]], 3
				// CK0-DAG: [[ISALLOC:%.+]] = icmp eq i64 [[TYPETF]], 0
				// CK0-DAG: br i1 [[ISALLOC]], label %[[ALLOC:[^,]+]], label %[[ALLOCELSE:[^,]+]]
				// CK0-DAG: [[ALLOC]]
				// CK0-DAG: [[ALLOCTYPE:%.+]] = and i64 [[MEMBERTYPE]], -4
				// CK0-DAG: br label %[[TYEND:[^,]+]]
				// CK0-DAG: [[ALLOCELSE]]
				// CK0-DAG: [[ISTO:%.+]] = icmp eq i64 [[TYPETF]], 1
				// CK0-DAG: br i1 [[ISTO]], label %[[TO:[^,]+]], label %[[TOELSE:[^,]+]]
				// CK0-DAG: [[TO]]
				// CK0-DAG: [[TOTYPE:%.+]] = and i64 [[MEMBERTYPE]], -3
				// CK0-DAG: br label %[[TYEND]]
				// CK0-DAG: [[TOELSE]]
				// CK0-DAG: [[ISFROM:%.+]] = icmp eq i64 [[TYPETF]], 2
				// CK0-DAG: br i1 [[ISFROM]], label %[[FROM:[^,]+]], label %[[TYEND]]
				// CK0-DAG: [[FROM]]
				// CK0-DAG: [[FROMTYPE:%.+]] = and i64 [[MEMBERTYPE]], -2
				// CK0-DAG: br label %[[TYEND]]
				// CK0-DAG: [[TYEND]]
				// CK0-DAG: [[TYPE1:%.+]] = phi i64 [ [[ALLOCTYPE]], %[[ALLOC]] ], [ [[TOTYPE]], %[[TO]] ], [ [[FROMTYPE]], %[[FROM]] ], [ [[MEMBERTYPE]], %[[TOELSE]] ]
				// CK0: call void @__tgt_push_mapper_component(i8* [[HANDLE]], i8* [[BPTRADDR1BC]], i8* [[PTRADDR1BC]], i64 4, i64 [[TYPE1]])
				// CK0-DAG: [[BPTRADDR2BC:%.+]] = bitcast double** [[BBEGIN]] to i8*
				// CK0-DAG: [[PTRADDR2BC:%.+]] = bitcast double* [[BARRBEGINGEP]] to i8*
				// CK0-DAG: br label %[[MEMBER:[^,]+]]
				// CK0-DAG: [[MEMBER]]
				// CK0-DAG: br i1 false, label %[[LTYPE:[^,]+]], label %[[MEMBERCOM:[^,]+]]
				// CK0-DAG: [[MEMBERCOM]]
				// 281474976710675 == 0x1,000,000,000,013
				// CK0-DAG: [[MEMBERCOMTYPE:%.+]] = add nuw i64 281474976710675, [[SHIPRESIZE]]
				// CK0-DAG: br label %[[LTYPE]]
				// CK0-DAG: [[LTYPE]]
				// CK0-DAG: [[MEMBERTYPE:%.+]] = phi i64 [ 281474976710675, %[[MEMBER]] ], [ [[MEMBERCOMTYPE]], %[[MEMBERCOM]] ]
				// CK0-DAG: [[TYPETF:%.+]] = and i64 [[TYPE]], 3
				// CK0-DAG: [[ISALLOC:%.+]] = icmp eq i64 [[TYPETF]], 0
				// CK0-DAG: br i1 [[ISALLOC]], label %[[ALLOC:[^,]+]], label %[[ALLOCELSE:[^,]+]]
				// CK0-DAG: [[ALLOC]]
				// CK0-DAG: [[ALLOCTYPE:%.+]] = and i64 [[MEMBERTYPE]], -4
				// CK0-DAG: br label %[[TYEND:[^,]+]]
				// CK0-DAG: [[ALLOCELSE]]
				// CK0-DAG: [[ISTO:%.+]] = icmp eq i64 [[TYPETF]], 1
				// CK0-DAG: br i1 [[ISTO]], label %[[TO:[^,]+]], label %[[TOELSE:[^,]+]]
				// CK0-DAG: [[TO]]
				// CK0-DAG: [[TOTYPE:%.+]] = and i64 [[MEMBERTYPE]], -3
				// CK0-DAG: br label %[[TYEND]]
				// CK0-DAG: [[TOELSE]]
				// CK0-DAG: [[ISFROM:%.+]] = icmp eq i64 [[TYPETF]], 2
				// CK0-DAG: br i1 [[ISFROM]], label %[[FROM:[^,]+]], label %[[TYEND]]
				// CK0-DAG: [[FROM]]
				// CK0-DAG: [[FROMTYPE:%.+]] = and i64 [[MEMBERTYPE]], -2
				// CK0-DAG: br label %[[TYEND]]
				// CK0-DAG: [[TYEND]]
				// CK0-DAG: [[TYPE2:%.+]] = phi i64 [ [[ALLOCTYPE]], %[[ALLOC]] ], [ [[TOTYPE]], %[[TO]] ], [ [[FROMTYPE]], %[[FROM]] ], [ [[MEMBERTYPE]], %[[TOELSE]] ]
				// CK0: call void @__tgt_push_mapper_component(i8* [[HANDLE]], i8* [[BPTRADDR2BC]], i8* [[PTRADDR2BC]], i64 16, i64 [[TYPE2]])
				// CK0: [[PTRNEXT]] = getelementptr %class.C*, %class.C** [[PTR]], i32 1
				// CK0: [[ISDONE:%.+]] = icmp eq %class.C** [[PTRNEXT]], [[PTREND]]
				// CK0: br i1 [[ISDONE]], label %[[LEXIT:[^,]+]], label %[[LBODY]]

				// CK0: [[LEXIT]]
				// CK0: [[ISARRAY:%.+]] = icmp sge i64 [[SIZE]], 1
				// CK0: br i1 [[ISARRAY]], label %[[EVALDEL:[^,]+]], label %[[DONE]]
				// CK0: [[EVALDEL]]
				// CK0: [[TYPEDEL:%.+]] = and i64 [[TYPE]], 8
				// CK0: [[ISDEL:%.+]] = icmp ne i64 [[TYPEDEL]], 0
				// CK0: br i1 [[ISDEL]], label %[[DEL:[^,]+]], label %[[DONE]]
				// CK0: [[DEL]]
				// CK0-64-DAG: [[ARRSIZE:%.+]] = mul nuw i64 [[SIZE]], 16
				// CK0-32-DAG: [[ARRSIZE:%.+]] = mul nuw i64 [[SIZE]], 8
				// CK0-DAG: [[DTYPE:%.+]] = and i64 [[TYPE]], -4
				// CK0: call void @__tgt_push_mapper_component(i8* [[HANDLE]], i8* [[BPTR]], i8* [[BEGIN]], i64 [[ARRSIZE]], i64 [[DTYPE]])
				// CK0: br label %[[DONE]]
				// CK0: [[DONE]]
				// CK0: ret void

	// CHECK: [[SIZES:@.+]] = {{.+}}constant [1 x i[[sz:64|32]]] [i{{64|32}} 4]
	// CHECK: [[TYPES:@.+]] = {{.+}}constant [1 x i64] [i64 35]
	// CHECK: [[TSIZES:@.+]] = {{.+}}constant [1 x i[[sz]]] [i[[sz]] 4]
	// CHECK: [[TTYPES:@.+]] = {{.+}}constant [1 x i64] [i64 33]
	// CHECK: [[FSIZES:@.+]] = {{.+}}constant [1 x i[[sz]]] [i[[sz]] 4]
	// CHECK: [[FTYPES:@.+]] = {{.+}}constant [1 x i64] [i64 34]

	// CHECK-LABEL: foo{{.*}}(			// CK0-LABEL: define {{.*}}void @{{.*}}foo{{.*}}
	void foo(int a){			void foo(int a){
	int i = a;			int i = a;
	C c;			C c;
	c.a = a;			c.a = a;

	// CHECK-DAG: call i32 @__tgt_target(i64 {{.+}}, i8* {{.+}}, i32 1, i8** [[BPGEP:%[0-9]+]], i8** [[PGEP:%[0-9]+]], {{.+}}[[SIZES]]{{.+}}, {{.+}}[[TYPES]]{{.+}})			// CK0-DAG: call i32 @__tgt_target(i64 {{.+}}, i8* {{.+}}, i32 1, i8** [[BPGEP:%[0-9]+]], i8** [[PGEP:%[0-9]+]], {{.+}}[[SIZES]]{{.+}}, {{.+}}[[TYPES]]{{.+}})
	// CHECK-DAG: [[BPGEP]] = getelementptr inbounds {{.+}}[[BPS:%[^,]+]], i32 0, i32 0			// CK0-DAG: [[BPGEP]] = getelementptr inbounds {{.+}}[[BPS:%[^,]+]], i32 0, i32 0
	// CHECK-DAG: [[PGEP]] = getelementptr inbounds {{.+}}[[PS:%[^,]+]], i32 0, i32 0			// CK0-DAG: [[PGEP]] = getelementptr inbounds {{.+}}[[PS:%[^,]+]], i32 0, i32 0
	// CHECK-DAG: [[BP1:%.+]] = getelementptr inbounds {{.+}}[[BPS]], i32 0, i32 0			// CK0-DAG: [[BP1:%.+]] = getelementptr inbounds {{.+}}[[BPS]], i32 0, i32 0
	// CHECK-DAG: [[P1:%.+]] = getelementptr inbounds {{.+}}[[PS]], i32 0, i32 0			// CK0-DAG: [[P1:%.+]] = getelementptr inbounds {{.+}}[[PS]], i32 0, i32 0
	// CHECK-DAG: [[CBP1:%.+]] = bitcast i8** [[BP1]] to %class.C**			// CK0-DAG: [[CBP1:%.+]] = bitcast i8** [[BP1]] to %class.C**
	// CHECK-DAG: [[CP1:%.+]] = bitcast i8** [[P1]] to %class.C**			// CK0-DAG: [[CP1:%.+]] = bitcast i8** [[P1]] to %class.C**
	// CHECK-DAG: store %class.C* [[VAL:%[^,]+]], %class.C** [[CBP1]]			// CK0-DAG: store %class.C* [[VAL:%[^,]+]], %class.C** [[CBP1]]
	// CHECK-DAG: store %class.C* [[VAL]], %class.C** [[CP1]]			// CK0-DAG: store %class.C* [[VAL]], %class.C** [[CP1]]
	// CHECK: call void [[KERNEL:@.+]](%class.C* [[VAL]])			// CK0: call void [[KERNEL:@.+]](%class.C* [[VAL]])
	#pragma omp target map(mapper(id),tofrom: c)			#pragma omp target map(mapper(id),tofrom: c)
	{			{
	++c.a;			++c.a;
	}			}

	// CHECK-DAG: call void @__tgt_target_data_update(i64 -1, i32 1, i8** [[TGEPBP:%.+]], i8** [[TGEPP:%.+]], i[[sz]]* getelementptr {{.+}}[1 x i[[sz]]]* [[TSIZES]], i32 0, i32 0), {{.+}}getelementptr {{.+}}[1 x i64]* [[TTYPES]]{{.+}})			// CK0-DAG: call void @__tgt_target_data_update(i64 -1, i32 1, i8** [[TGEPBP:%.+]], i8** [[TGEPP:%.+]], i64* getelementptr {{.+}}[1 x i64]* [[TSIZES]], i32 0, i32 0), {{.+}}getelementptr {{.+}}[1 x i64]* [[TTYPES]]{{.+}})
	// CHECK-DAG: [[TGEPBP]] = getelementptr inbounds {{.+}}[[TBP:%[^,]+]], i{{.+}} 0, i{{.+}} 0			// CK0-DAG: [[TGEPBP]] = getelementptr inbounds {{.+}}[[TBP:%[^,]+]], i{{.+}} 0, i{{.+}} 0
	// CHECK-DAG: [[TGEPP]] = getelementptr inbounds {{.+}}[[TP:%[^,]+]], i{{.+}} 0, i{{.+}} 0			// CK0-DAG: [[TGEPP]] = getelementptr inbounds {{.+}}[[TP:%[^,]+]], i{{.+}} 0, i{{.+}} 0
	// CHECK-DAG: [[TBP0:%.+]] = getelementptr inbounds {{.+}}[[TBP]], i{{.+}} 0, i{{.+}} 0			// CK0-DAG: [[TBP0:%.+]] = getelementptr inbounds {{.+}}[[TBP]], i{{.+}} 0, i{{.+}} 0
	// CHECK-DAG: [[TP0:%.+]] = getelementptr inbounds {{.+}}[[TP]], i{{.+}} 0, i{{.+}} 0			// CK0-DAG: [[TP0:%.+]] = getelementptr inbounds {{.+}}[[TP]], i{{.+}} 0, i{{.+}} 0
	// CHECK-DAG: [[TCBP0:%.+]] = bitcast i8** [[TBP0]] to %class.C**			// CK0-DAG: [[TCBP0:%.+]] = bitcast i8** [[TBP0]] to %class.C**
	// CHECK-DAG: [[TCP0:%.+]] = bitcast i8** [[TP0]] to %class.C**			// CK0-DAG: [[TCP0:%.+]] = bitcast i8** [[TP0]] to %class.C**
	// CHECK-DAG: store %class.C* [[VAL]], %class.C** [[TCBP0]]			// CK0-DAG: store %class.C* [[VAL]], %class.C** [[TCBP0]]
	// CHECK-DAG: store %class.C* [[VAL]], %class.C** [[TCP0]]			// CK0-DAG: store %class.C* [[VAL]], %class.C** [[TCP0]]
	#pragma omp target update to(mapper(id): c)			#pragma omp target update to(mapper(id): c)

	// CHECK-DAG: call void @__tgt_target_data_update(i64 -1, i32 1, i8** [[FGEPBP:%.+]], i8** [[FGEPP:%.+]], i[[sz]]* getelementptr {{.+}}[1 x i[[sz]]]* [[FSIZES]], i32 0, i32 0), {{.+}}getelementptr {{.+}}[1 x i64]* [[FTYPES]]{{.+}})			// CK0-DAG: call void @__tgt_target_data_update(i64 -1, i32 1, i8** [[FGEPBP:%.+]], i8** [[FGEPP:%.+]], i64* getelementptr {{.+}}[1 x i64]* [[FSIZES]], i32 0, i32 0), {{.+}}getelementptr {{.+}}[1 x i64]* [[FTYPES]]{{.+}})
	// CHECK-DAG: [[FGEPBP]] = getelementptr inbounds {{.+}}[[FBP:%[^,]+]], i{{.+}} 0, i{{.+}} 0			// CK0-DAG: [[FGEPBP]] = getelementptr inbounds {{.+}}[[FBP:%[^,]+]], i{{.+}} 0, i{{.+}} 0
	// CHECK-DAG: [[FGEPP]] = getelementptr inbounds {{.+}}[[FP:%[^,]+]], i{{.+}} 0, i{{.+}} 0			// CK0-DAG: [[FGEPP]] = getelementptr inbounds {{.+}}[[FP:%[^,]+]], i{{.+}} 0, i{{.+}} 0
	// CHECK-DAG: [[FBP0:%.+]] = getelementptr inbounds {{.+}}[[FBP]], i{{.+}} 0, i{{.+}} 0			// CK0-DAG: [[FBP0:%.+]] = getelementptr inbounds {{.+}}[[FBP]], i{{.+}} 0, i{{.+}} 0
	// CHECK-DAG: [[FP0:%.+]] = getelementptr inbounds {{.+}}[[FP]], i{{.+}} 0, i{{.+}} 0			// CK0-DAG: [[FP0:%.+]] = getelementptr inbounds {{.+}}[[FP]], i{{.+}} 0, i{{.+}} 0
	// CHECK-DAG: [[FCBP0:%.+]] = bitcast i8** [[FBP0]] to %class.C**			// CK0-DAG: [[FCBP0:%.+]] = bitcast i8** [[FBP0]] to %class.C**
	// CHECK-DAG: [[FCP0:%.+]] = bitcast i8** [[FP0]] to %class.C**			// CK0-DAG: [[FCP0:%.+]] = bitcast i8** [[FP0]] to %class.C**
	// CHECK-DAG: store %class.C* [[VAL]], %class.C** [[FCBP0]]			// CK0-DAG: store %class.C* [[VAL]], %class.C** [[FCBP0]]
	// CHECK-DAG: store %class.C* [[VAL]], %class.C** [[FCP0]]			// CK0-DAG: store %class.C* [[VAL]], %class.C** [[FCP0]]
	#pragma omp target update from(mapper(id): c)			#pragma omp target update from(mapper(id): c)
	}			}


	// CHECK: define internal void [[KERNEL]](%class.C* {{.+}}[[ARG:%.+]])			// CK0: define internal void [[KERNEL]](%class.C* {{.+}}[[ARG:%.+]])
	// CHECK: [[ADDR:%.+]] = alloca %class.C*,			// CK0: [[ADDR:%.+]] = alloca %class.C*,
	// CHECK: store %class.C* [[ARG]], %class.C** [[ADDR]]			// CK0: store %class.C* [[ARG]], %class.C** [[ADDR]]
	// CHECK: [[CADDR:%.+]] = load %class.C*, %class.C** [[ADDR]]			// CK0: [[CADDR:%.+]] = load %class.C*, %class.C** [[ADDR]]
	// CHECK: [[CAADDR:%.+]] = getelementptr inbounds %class.C, %class.C* [[CADDR]], i32 0, i32 0			// CK0: [[CAADDR:%.+]] = getelementptr inbounds %class.C, %class.C* [[CADDR]], i32 0, i32 0
	// CHECK: [[VAL:%[^,]+]] = load i32, i32* [[CAADDR]]			// CK0: [[VAL:%[^,]+]] = load i32, i32* [[CAADDR]]
	// CHECK: {{.+}} = add nsw i32 [[VAL]], 1			// CK0: {{.+}} = add nsw i32 [[VAL]], 1
	// CHECK: }			// CK0: }

				#endif


				///==========================================================================///
				// RUN: %clang_cc1 -DCK1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm -femit-all-decls -disable-llvm-passes %s -o - | FileCheck --check-prefix CK1 --check-prefix CK1-64 %s
				// RUN: %clang_cc1 -DCK1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -femit-all-decls -disable-llvm-passes -o %t %s
				// RUN: %clang_cc1 -DCK1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -femit-all-decls -disable-llvm-passes -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix CK1 --check-prefix CK1-64 %s
				// RUN: %clang_cc1 -DCK1 -verify -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm -femit-all-decls -disable-llvm-passes %s -o - | FileCheck --check-prefix CK1 --check-prefix CK1-32 %s
				// RUN: %clang_cc1 -DCK1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -femit-all-decls -disable-llvm-passes -o %t %s
				// RUN: %clang_cc1 -DCK1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -femit-all-decls -disable-llvm-passes -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix CK1 --check-prefix CK1-32 %s

				// RUN: %clang_cc1 -DCK1 -verify -fopenmp-simd -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm -femit-all-decls -disable-llvm-passes %s -o - | FileCheck --check-prefix SIMD-ONLY0 %s
				// RUN: %clang_cc1 -DCK1 -fopenmp-simd -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -femit-all-decls -disable-llvm-passes -o %t %s
				// RUN: %clang_cc1 -DCK1 -fopenmp-simd -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -femit-all-decls -disable-llvm-passes -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY0 %s
				// RUN: %clang_cc1 -DCK1 -verify -fopenmp-simd -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm -femit-all-decls -disable-llvm-passes %s -o - | FileCheck --check-prefix SIMD-ONLY0 %s
				// RUN: %clang_cc1 -DCK1 -fopenmp-simd -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -femit-all-decls -disable-llvm-passes -o %t %s
				// RUN: %clang_cc1 -DCK1 -fopenmp-simd -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -femit-all-decls -disable-llvm-passes -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY0 %s

				#ifdef CK1

				template <class T>
				class C {
				public:
				T a;
				};

				#pragma omp declare mapper(id: C<int> s) map(s.a)

				// CK1-LABEL: define {{.*}}void @.omp_mapper.{{.*}}C{{.*}}.id{{.*}}(i8*{{.*}}, i8*{{.*}}, i8*{{.*}}, i64{{.*}}, i64{{.*}})
				// CK1: store i8* %{{[^,]+}}, i8** [[HANDLEADDR:%[^,]+]]
				// CK1: store i8* %{{[^,]+}}, i8** [[BPTRADDR:%[^,]+]]
				// CK1: store i8* %{{[^,]+}}, i8** [[VPTRADDR:%[^,]+]]
				// CK1: store i64 %{{[^,]+}}, i{{64|32}}* [[SIZEADDR:%[^,]+]]
				// CK1: store i64 %{{[^,]+}}, i64* [[TYPEADDR:%[^,]+]]
				ABataev: Same here
				// CK1-DAG: [[SIZE:%.+]] = load i64, i64* [[SIZEADDR]]
				// CK1-DAG: [[TYPE:%.+]] = load i64, i64* [[TYPEADDR]]
				// CK1-DAG: [[HANDLE:%.+]] = load i8*, i8** [[HANDLEADDR]]
				// CK1-DAG: [[PTRBEGIN:%.+]] = bitcast i8** [[VPTRADDR]] to %class.C**
				// CK1-DAG: [[PTREND:%.+]] = getelementptr %class.C*, %class.C** [[PTRBEGIN]], i64 [[SIZE]]
				// CK1-DAG: [[BPTR:%.+]] = load i8*, i8** [[BPTRADDR]]
				// CK1-DAG: [[BEGIN:%.+]] = load i8*, i8** [[VPTRADDR]]
				// CK1: [[ISARRAY:%.+]] = icmp sge i64 [[SIZE]], 1
				// CK1: br i1 [[ISARRAY]], label %[[INITEVALDEL:[^,]+]], label %[[LHEAD:[^,]+]]

				// CK1: [[INITEVALDEL]]
				// CK1: [[TYPEDEL:%.+]] = and i64 [[TYPE]], 8
				// CK1: [[ISNOTDEL:%.+]] = icmp eq i64 [[TYPEDEL]], 0
				// CK1: br i1 [[ISNOTDEL]], label %[[INIT:[^,]+]], label %[[LHEAD:[^,]+]]
				// CK1: [[INIT]]
				// CK1-DAG: [[ARRSIZE:%.+]] = mul nuw i64 [[SIZE]], 4
				// CK1-DAG: [[ITYPE:%.+]] = and i64 [[TYPE]], -4
				// CK1: call void @__tgt_push_mapper_component(i8* [[HANDLE]], i8* [[BPTR]], i8* [[BEGIN]], i64 [[ARRSIZE]], i64 [[ITYPE]])
				// CK1: br label %[[LHEAD:[^,]+]]

				// CK1: [[LHEAD]]
				// CK1: [[ISEMPTY:%.+]] = icmp eq %class.C** [[PTRBEGIN]], [[PTREND]]
				// CK1: br i1 [[ISEMPTY]], label %[[DONE:[^,]+]], label %[[LBODY:[^,]+]]
				// CK1: [[LBODY]]
				// CK1: [[PTR:%.+]] = phi %class.C** [ [[PTRBEGIN]], %[[LHEAD]] ], [ [[PTRNEXT:%.+]], %[[LCORRECT:[^,]+]] ]
				// CK1: [[OBJ:%.+]] = load %class.C*, %class.C** [[PTR]]
				// CK1-DAG: [[ABEGIN:%.+]] = getelementptr inbounds %class.C, %class.C* [[OBJ]], i32 0, i32 0
				// CK1-DAG: [[AEND:%.+]] = getelementptr i32, i32* [[ABEGIN]], i32 1
				// CK1-DAG: [[ABEGINV:%.+]] = bitcast i32* [[ABEGIN]] to i8*
				// CK1-DAG: [[AENDV:%.+]] = bitcast i32* [[AEND]] to i8*
				// CK1-DAG: [[ABEGINI:%.+]] = ptrtoint i8* [[ABEGINV]] to i64
				// CK1-DAG: [[AENDI:%.+]] = ptrtoint i8* [[AENDV]] to i64
				// CK1-DAG: [[CSIZE:%.+]] = sub i64 [[AENDI]], [[ABEGINI]]
				// CK1-DAG: [[CUSIZE:%.+]] = sdiv exact i64 [[CSIZE]], ptrtoint (i8* getelementptr (i8, i8* null, i32 1) to i64)
				// CK1-DAG: [[BPTRADDR0BC:%.+]] = bitcast %class.C* [[OBJ]] to i8*
				// CK1-DAG: [[PTRADDR0BC:%.+]] = bitcast i32* [[ABEGIN]] to i8*
				// CK1-DAG: [[PRESIZE:%.+]] = call i64 @__tgt_mapper_num_components(i8* [[HANDLE]])
				// CK1-DAG: [[SHIPRESIZE:%.+]] = shl i64 [[PRESIZE]], 48
				// CK1-DAG: br label %[[MEMBER:[^,]+]]
				// CK1-DAG: [[MEMBER]]
				// CK1-DAG: br i1 true, label %[[LTYPE:[^,]+]], label %[[MEMBERCOM:[^,]+]]
				// CK1-DAG: [[MEMBERCOM]]
				// CK1-DAG: [[MEMBERCOMTYPE:%.+]] = add nuw i64 32, [[SHIPRESIZE]]
				// CK1-DAG: br label %[[LTYPE]]
				// CK1-DAG: [[LTYPE]]
				// CK1-DAG: [[MEMBERTYPE:%.+]] = phi i64 [ 32, %[[MEMBER]] ], [ [[MEMBERCOMTYPE]], %[[MEMBERCOM]] ]
				// CK1-DAG: [[TYPETF:%.+]] = and i64 [[TYPE]], 3
				// CK1-DAG: [[ISALLOC:%.+]] = icmp eq i64 [[TYPETF]], 0
				// CK1-DAG: br i1 [[ISALLOC]], label %[[ALLOC:[^,]+]], label %[[ALLOCELSE:[^,]+]]
				// CK1-DAG: [[ALLOC]]
				// CK1-DAG: [[ALLOCTYPE:%.+]] = and i64 [[MEMBERTYPE]], -4
				// CK1-DAG: br label %[[TYEND:[^,]+]]
				// CK1-DAG: [[ALLOCELSE]]
				// CK1-DAG: [[ISTO:%.+]] = icmp eq i64 [[TYPETF]], 1
				// CK1-DAG: br i1 [[ISTO]], label %[[TO:[^,]+]], label %[[TOELSE:[^,]+]]
				// CK1-DAG: [[TO]]
				// CK1-DAG: [[TOTYPE:%.+]] = and i64 [[MEMBERTYPE]], -3
				// CK1-DAG: br label %[[TYEND]]
				// CK1-DAG: [[TOELSE]]
				// CK1-DAG: [[ISFROM:%.+]] = icmp eq i64 [[TYPETF]], 2
				// CK1-DAG: br i1 [[ISFROM]], label %[[FROM:[^,]+]], label %[[TYEND]]
				// CK1-DAG: [[FROM]]
				// CK1-DAG: [[FROMTYPE:%.+]] = and i64 [[MEMBERTYPE]], -2
				// CK1-DAG: br label %[[TYEND]]
				// CK1-DAG: [[TYEND]]
				// CK1-DAG: [[TYPE0:%.+]] = phi i64 [ [[ALLOCTYPE]], %[[ALLOC]] ], [ [[TOTYPE]], %[[TO]] ], [ [[FROMTYPE]], %[[FROM]] ], [ [[MEMBERTYPE]], %[[TOELSE]] ]
				// CK1-64: call void @__tgt_push_mapper_component(i8* [[HANDLE]], i8* [[BPTRADDR0BC]], i8* [[PTRADDR0BC]], i64 [[CUSIZE]], i64 [[TYPE0]])
				// CK1-DAG: [[BPTRADDR1BC:%.+]] = bitcast %class.C* [[OBJ]] to i8*
				// CK1-DAG: [[PTRADDR1BC:%.+]] = bitcast i32* [[ABEGIN]] to i8*
				// CK1-DAG: br label %[[MEMBER:[^,]+]]
				// CK1-DAG: [[MEMBER]]
				// CK1-DAG: br i1 false, label %[[LTYPE:[^,]+]], label %[[MEMBERCOM:[^,]+]]
				// CK1-DAG: [[MEMBERCOM]]
				// 281474976710659 == 0x1,000,000,000,003
				// CK1-DAG: [[MEMBERCOMTYPE:%.+]] = add nuw i64 281474976710659, [[SHIPRESIZE]]
				// CK1-DAG: br label %[[LTYPE]]
				// CK1-DAG: [[LTYPE]]
				// CK1-DAG: [[MEMBERTYPE:%.+]] = phi i64 [ 281474976710659, %[[MEMBER]] ], [ [[MEMBERCOMTYPE]], %[[MEMBERCOM]] ]
				// CK1-DAG: [[TYPETF:%.+]] = and i64 [[TYPE]], 3
				// CK1-DAG: [[ISALLOC:%.+]] = icmp eq i64 [[TYPETF]], 0
				// CK1-DAG: br i1 [[ISALLOC]], label %[[ALLOC:[^,]+]], label %[[ALLOCELSE:[^,]+]]
				// CK1-DAG: [[ALLOC]]
				// CK1-DAG: [[ALLOCTYPE:%.+]] = and i64 [[MEMBERTYPE]], -4
				// CK1-DAG: br label %[[TYEND:[^,]+]]
				// CK1-DAG: [[ALLOCELSE]]
				// CK1-DAG: [[ISTO:%.+]] = icmp eq i64 [[TYPETF]], 1
				// CK1-DAG: br i1 [[ISTO]], label %[[TO:[^,]+]], label %[[TOELSE:[^,]+]]
				// CK1-DAG: [[TO]]
				// CK1-DAG: [[TOTYPE:%.+]] = and i64 [[MEMBERTYPE]], -3
				// CK1-DAG: br label %[[TYEND]]
				// CK1-DAG: [[TOELSE]]
				// CK1-DAG: [[ISFROM:%.+]] = icmp eq i64 [[TYPETF]], 2
				// CK1-DAG: br i1 [[ISFROM]], label %[[FROM:[^,]+]], label %[[TYEND]]
				// CK1-DAG: [[FROM]]
				// CK1-DAG: [[FROMTYPE:%.+]] = and i64 [[MEMBERTYPE]], -2
				// CK1-DAG: br label %[[TYEND]]
				// CK1-DAG: [[TYEND]]
				// CK1-DAG: [[TYPE1:%.+]] = phi i64 [ [[ALLOCTYPE]], %[[ALLOC]] ], [ [[TOTYPE]], %[[TO]] ], [ [[FROMTYPE]], %[[FROM]] ], [ [[MEMBERTYPE]], %[[TOELSE]] ]
				// CK1: call void @__tgt_push_mapper_component(i8* [[HANDLE]], i8* [[BPTRADDR1BC]], i8* [[PTRADDR1BC]], i64 4, i64 [[TYPE1]])
				// CK1: [[PTRNEXT]] = getelementptr %class.C*, %class.C** [[PTR]], i32 1
				// CK1: [[ISDONE:%.+]] = icmp eq %class.C** [[PTRNEXT]], [[PTREND]]
				// CK1: br i1 [[ISDONE]], label %[[LEXIT:[^,]+]], label %[[LBODY]]

				// CK1: [[LEXIT]]
				// CK1: [[ISARRAY:%.+]] = icmp sge i64 [[SIZE]], 1
				// CK1: br i1 [[ISARRAY]], label %[[EVALDEL:[^,]+]], label %[[DONE]]
				// CK1: [[EVALDEL]]
				// CK1: [[TYPEDEL:%.+]] = and i64 [[TYPE]], 8
				// CK1: [[ISDEL:%.+]] = icmp ne i64 [[TYPEDEL]], 0
				// CK1: br i1 [[ISDEL]], label %[[DEL:[^,]+]], label %[[DONE]]
				// CK1: [[DEL]]
				// CK1-DAG: [[ARRSIZE:%.+]] = mul nuw i64 [[SIZE]], 4
				// CK1-DAG: [[DTYPE:%.+]] = and i64 [[TYPE]], -4
				// CK1: call void @__tgt_push_mapper_component(i8* [[HANDLE]], i8* [[BPTR]], i8* [[BEGIN]], i64 [[ARRSIZE]], i64 [[DTYPE]])
				// CK1: br label %[[DONE]]
				// CK1: [[DONE]]
				// CK1: ret void

				#endif

	#endif			#endif
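Reading aid for the map-type checks in the mapper functions above: each component's map type is built from a constant base value, optionally offset by the number of previously pushed components shifted left by 48, and then masked according to the `to`/`from` bits of the type passed into the mapper (`and ..., -4`, `-3`, `-2`). The sketch below restates that arithmetic in plain C++. It is not the code the patch emits; the helper names `shiftMemberOf` and `combineMapType` are hypothetical, and the flag meanings (bit 0 = to, bit 1 = from) are assumed from the constants that appear in the checks.

```cpp
#include <cassert>
#include <cstdint>

// Flag bits assumed from the constants used in the checks (and i64 ..., 3).
constexpr std::uint64_t kTo = 0x1;
constexpr std::uint64_t kFrom = 0x2;
constexpr unsigned kMemberOfShift = 48; // matches "shl i64 [[PRESIZE]], 48"

// The decimal constants in the checks are (1 << 48) plus low flag bits.
static_assert(((std::uint64_t{1} << kMemberOfShift) | 0x03) == 281474976710659ULL,
              "281474976710659 == 0x1000000000003");
static_assert(((std::uint64_t{1} << kMemberOfShift) | 0x13) == 281474976710675ULL,
              "281474976710675 == 0x1000000000013");

// Hypothetical helper: offset a component's base type by the number of
// components already pushed (the "add nuw i64 <base>, [[SHIPRESIZE]]" step).
std::uint64_t shiftMemberOf(std::uint64_t baseType, std::uint64_t priorComponents) {
  return baseType + (priorComponents << kMemberOfShift);
}

// Hypothetical helper: restrict a member's type by the to/from bits of the
// type handed to the mapper, mirroring the alloc/to/from branches.
std::uint64_t combineMapType(std::uint64_t parentType, std::uint64_t memberType) {
  switch (parentType & (kTo | kFrom)) {
  case 0:     // alloc: drop both to and from  (and i64 ..., -4)
    return memberType & ~(kTo | kFrom);
  case kTo:   // to: drop from                 (and i64 ..., -3)
    return memberType & ~kFrom;
  case kFrom: // from: drop to                 (and i64 ..., -2)
    return memberType & ~kTo;
  default:    // tofrom: keep the member's type unchanged
    return memberType;
  }
}
```

For example, under these assumptions a member whose own type is 281474976710659 keeps only its upper bits when the mapper is invoked for an `alloc` map, and keeps the `to` bit alone when invoked for a `to` map.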

Revision Contents

Diff 213339

include/clang/AST/GlobalDecl.h

lib/AST/ASTContext.cpp

lib/CodeGen/CGDecl.cpp

lib/CodeGen/CGOpenMPRuntime.h

lib/CodeGen/CGOpenMPRuntime.cpp

lib/CodeGen/ModuleBuilder.cpp

test/OpenMP/declare_mapper_codegen.cpp
