This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP 5.0] libomptarget interface for declare mapper functions
ClosedPublic

Authored by lildmh on Apr 22 2019, 10:56 AM.

Details

Summary

This patch implements the libomptarget runtime interface for OpenMP 5.0 declare mapper functions. The declare mapper functions generated by Clang will call them to complete the mapping of members.
kmpc_mapper_num_components gets the current number of components for a user-defined mapper; kmpc_push_mapper_component pushes back one component for a user-defined mapper.
The design slides can be found at https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx

Diff Detail

Repository
rL LLVM

Event Timeline

lildmh created this revision.Apr 22 2019, 10:56 AM
grokos added a subscriber: grokos.
grokos removed a subscriber: grokos.
grokos added a comment.EditedApr 24 2019, 12:12 PM

I see that target_data pretty much copies the entire code from target_data_begin and target_data_end. That's 200+ duplicate lines of code. Couldn't we just add this check:

// If a valid mapper is attached, call the mapper function to complete the
// mapping of its members.
if (arg_mapper_ptrs && arg_mapper_ptrs[i]) {
  int (*mapper_func_ptr)(int64_t, void *, void *, int64_t, int64_t);
  mapper_func_ptr = (int (*)(int64_t, void *, void *, int64_t, int64_t))(
      arg_mapper_ptrs[i]);
  int rt = (*mapper_func_ptr)(Device.DeviceID, HstPtrBase, HstPtrBegin,
                              data_size, arg_types[i]);
  if (rt != OFFLOAD_SUCCESS) {
    DP("User-defined mapper function failed.\n");
    return OFFLOAD_FAIL;
  }
}

to the existing functions? All that is needed is change the function signature of target_data_begin and target_data_end to include the extra arg_mapper_ptrs argument and set a default value of NULL so that no other changes are needed throughout the code:

target_data_begin(DeviceTy &Device, int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types, void **arg_mapper_ptrs = NULL);
target_data_end(DeviceTy &Device, int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types, void **arg_mapper_ptrs = NULL);

After all, I don't see any point in having a function which tries to execute the "begin" code and the "end" code one after the other.

Hi George,

Thanks for the review.

You are right that target_data is a combination of target_data_begin and target_data_end. They also need to add that code to check if there is a valid mapper. This modification will be in a future patch.

The purpose of this patch is to have an API that is called by the mapper function generated by Clang. In the generated mapper function, it could perform either the functionality of target_data_begin or target_data_end, or a combination of both, depending on the map type. Therefore it's necessary to have target_data, the combination of target_data_begin and target_data_end.

lildmh updated this revision to Diff 202777.Jun 3 2019, 12:42 PM
lildmh edited the summary of this revision. (Show Details)

Implement the new mapper scheme.

grokos added a comment.Jun 3 2019, 1:53 PM

I think this patch is wrong. Where are the previous additions of target_data and __tgt_target_data_mapper? Possibly you uploaded the diff between the previous revision and the current one, whereas you need to upload the diff between origin/master and your current HEAD!

lildmh added a comment.Jun 3 2019, 2:07 PM

I think this patch is wrong. Where are the previous additions of target_data and __tgt_target_data_mapper? Possibly you uploaded the diff between the previous revision and the current one, whereas you need to upload the diff between origin/master and your current HEAD!

Hi, this is the new scheme, so the previous code is irrelevant now. (no target_data any more)

lildmh added a comment.Jun 3 2019, 2:10 PM

I think this patch is wrong. Where are the previous additions of target_data and __tgt_target_data_mapper? Possibly you uploaded the diff between the previous revision and the current one, whereas you need to upload the diff between origin/master and your current HEAD!

Please see the attachment of Ravi sent for the biweekly meeting to see details about this new scheme. Sorry I forgot to mention

I think this patch is wrong. Where are the previous additions of target_data and __tgt_target_data_mapper? Possibly you uploaded the diff between the previous revision and the current one, whereas you need to upload the diff between origin/master and your current HEAD!

Please see the attachment of Ravi sent for the biweekly meeting to see details about this new scheme. Sorry I forgot to mention

It would be great to have such things in public...

lildmh added a comment.Jun 4 2019, 5:33 AM

I think this patch is wrong. Where are the previous additions of target_data and __tgt_target_data_mapper? Possibly you uploaded the diff between the previous revision and the current one, whereas you need to upload the diff between origin/master and your current HEAD!

Please see the attachment of Ravi sent for the biweekly meeting to see details about this new scheme. Sorry I forgot to mention

It would be great to have such things in public...

Sure, there is no secret. Please see it here if you are interested: https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx

lildmh edited the summary of this revision. (Show Details)Jun 4 2019, 5:34 AM

It would be great to have such things in public...

Sure, there is no secret. Please see it here if you are interested: https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx

From a quick look, I'd say this does not reflect the current design: The types are named differently, have a different layout (SoA vs AoS) and there's no implementation of __tgt_target_mapper in this patch as @grokos mentioned.
Moreover, I'd question the following things:

  1. Why are we back to __kmpc_? naming? Most other functions specific to libomptarget are called __tgt_?.
  2. Do you really want to construct std::vectors in the compiler generated code? That's bound to cause trouble with Fortran, isn't it?
lildmh added a comment.Jun 4 2019, 7:07 AM

It would be great to have such things in public...

Sure, there is no secret. Please see it here if you are interested: https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx

From a quick look, I'd say this does not reflect the current design: The types are named differently, have a different layout (SoA vs AoS) and there's no implementation of __tgt_target_mapper in this patch as @grokos mentioned.

Hi Jonas, starting from slide 8 is the current design, __tgt_target_mapper etc. are deprecated. It may not accurately reflect the current code but the framework should be the same.

Moreover, I'd question the following things:

  1. Why are we back to __kmpc_? naming? Most other functions specific to libomptarget are called __tgt_?.

There are other functions in libomptarget starting with __kmpc_, for example, __kmpc_push_target_tripcount.
My understanding is anything that does not directly call the device starting with __kmpc_. The IBM and Intel compiler people seem to be okay with this naming.

  1. Do you really want to construct std::vectors in the compiler generated code? That's bound to cause trouble with Fortran, isn't it?

The compiler doesn't generate code related to std::vector. It's only used in the runtime implementation, so it should be okay with Fortran. Again, the IBM and Intel compiler people seem to agree with it.

It would be great to have such things in public...

Sure, there is no secret. Please see it here if you are interested: https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx

From a quick look, I'd say this does not reflect the current design: The types are named differently, have a different layout (SoA vs AoS) and there's no implementation of __tgt_target_mapper in this patch as @grokos mentioned.

Hi Jonas, starting from slide 8 is the current design, __tgt_target_mapper etc. are deprecated. It may not accurately reflect the current code but the framework should be the same.

Moreover, I'd question the following things:

  1. Why are we back to __kmpc_? naming? Most other functions specific to libomptarget are called __tgt_?.

There are other functions in libomptarget starting with __kmpc_, for example, __kmpc_push_target_tripcount.
My understanding is anything that does not directly call the device starting with __kmpc_. The IBM and Intel compiler people seem to be okay with this naming.

Yes, this is the only function AFAICS and it has a comment "will be revised". All other functions related to mapping start with __tgt so unless there is good incentive, we should follow this naming convention.

  1. Do you really want to construct std::vectors in the compiler generated code? That's bound to cause trouble with Fortran, isn't it?

The compiler doesn't generate code related to std::vector. It's only used in the runtime implementation, so it should be okay with Fortran. Again, the IBM and Intel compiler people seem to agree with it.

Maybe I don't understand where rt_mapper_handle comes from. According to the design slides and D59474, it passed as an argument to the generated omp_mapper[...] function, but how is the runtime system involved in its creation? Will there be additional interface functions / changes that will call this?

lildmh added a comment.Jun 4 2019, 7:35 AM

It would be great to have such things in public...

Sure, there is no secret. Please see it here if you are interested: https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx

From a quick look, I'd say this does not reflect the current design: The types are named differently, have a different layout (SoA vs AoS) and there's no implementation of __tgt_target_mapper in this patch as @grokos mentioned.

Hi Jonas, starting from slide 8 is the current design, __tgt_target_mapper etc. are deprecated. It may not accurately reflect the current code but the framework should be the same.

Moreover, I'd question the following things:

  1. Why are we back to __kmpc_? naming? Most other functions specific to libomptarget are called __tgt_?.

There are other functions in libomptarget starting with __kmpc_, for example, __kmpc_push_target_tripcount.
My understanding is anything that does not directly call the device starting with __kmpc_. The IBM and Intel compiler people seem to be okay with this naming.

Yes, this is the only function AFAICS and it has a comment "will be revised". All other functions related to mapping start with __tgt so unless there is good incentive, we should follow this naming convention.

Sure, I do prefer __tgt, as long as everyone else is okay with it.

  1. Do you really want to construct std::vectors in the compiler generated code? That's bound to cause trouble with Fortran, isn't it?

The compiler doesn't generate code related to std::vector. It's only used in the runtime implementation, so it should be okay with Fortran. Again, the IBM and Intel compiler people seem to agree with it.

Maybe I don't understand where rt_mapper_handle comes from. According to the design slides and D59474, it passed as an argument to the generated omp_mapper[...] function, but how is the runtime system involved in its creation? Will there be additional interface functions / changes that will call this?

The idea is the runtime will create a MapperComponentsTy (std::vector) before calling the mapper function, in, for instance, __tgt_target_data_begin (These parts will be implemented in later patches). When the mapper function is called, the pointer of MapperComponentsTy is passed to it, as void *rt_mapper_handle. The mapper function will call __kmpc_push_mapper_component using this rt_mapper_handle, and then the runtime can put it into the MapperComponentsTy

Moreover, I'd question the following things:

  1. Why are we back to __kmpc_? naming? Most other functions specific to libomptarget are called __tgt_?.

There are other functions in libomptarget starting with __kmpc_, for example, __kmpc_push_target_tripcount.
My understanding is anything that does not directly call the device starting with __kmpc_. The IBM and Intel compiler people seem to be okay with this naming.

Yes, this is the only function AFAICS and it has a comment "will be revised". All other functions related to mapping start with __tgt so unless there is good incentive, we should follow this naming convention.

Sure, I do prefer __tgt, as long as everyone else is okay with it.

Yes, please use a name that starts with __tgt. Names starting with __kmpc are meant for the libomp host runtime specifically. __kmpc_push_tripcount was implemented in libomptarget in a rush and the implementation is incorrect, it will be transferred to libomp and re-implemented correctly.

The compiler doesn't generate code related to std::vector. It's only used in the runtime implementation, so it should be okay with Fortran. Again, the IBM and Intel compiler people seem to agree with it.

Maybe I don't understand where rt_mapper_handle comes from. According to the design slides and D59474, it passed as an argument to the generated omp_mapper[...] function, but how is the runtime system involved in its creation? Will there be additional interface functions / changes that will call this?

The idea is the runtime will create a MapperComponentsTy (std::vector) before calling the mapper function, in, for instance, __tgt_target_data_begin (These parts will be implemented in later patches). When the mapper function is called, the pointer of MapperComponentsTy is passed to it, as void *rt_mapper_handle. The mapper function will call __kmpc_push_mapper_component using this rt_mapper_handle, and then the runtime can put it into the MapperComponentsTy

That's hard to guess from the current patch and isn't in the design slides either (or I have overlooked it). Such information should be made available to reviewers such that they can assess the implementation!

If the new functions mapper_num_components and push_mapper_component are only used by a "callback" that is called from the runtime, why do we need to export them? We could pass the function pointers next to rt_mapper_handle and be done with it.

lildmh added a comment.Jun 5 2019, 9:29 AM

The compiler doesn't generate code related to std::vector. It's only used in the runtime implementation, so it should be okay with Fortran. Again, the IBM and Intel compiler people seem to agree with it.

Maybe I don't understand where rt_mapper_handle comes from. According to the design slides and D59474, it passed as an argument to the generated omp_mapper[...] function, but how is the runtime system involved in its creation? Will there be additional interface functions / changes that will call this?

The idea is the runtime will create a MapperComponentsTy (std::vector) before calling the mapper function, in, for instance, __tgt_target_data_begin (These parts will be implemented in later patches). When the mapper function is called, the pointer of MapperComponentsTy is passed to it, as void *rt_mapper_handle. The mapper function will call __kmpc_push_mapper_component using this rt_mapper_handle, and then the runtime can put it into the MapperComponentsTy

That's hard to guess from the current patch and isn't in the design slides either (or I have overlooked it). Such information should be made available to reviewers such that they can assess the implementation!

Sorry for the inconvenience. I was assuming the reviewer is in the biweekly OpenMP Clang meeting where this scheme is discussed extensively. If you are interested, I can ask Ravi to add you to the mailing list.

If the new functions mapper_num_components and push_mapper_component are only used by a "callback" that is called from the runtime, why do we need to export them? We could pass the function pointers next to rt_mapper_handle and be done with it.

That sounds reasonable to have them function pointers, although I am not sure why it is better than exporting these functions. I present this idea to others in that meeting today, and let's see how they like it.

lildmh updated this revision to Diff 203430.Jun 6 2019, 1:11 PM

Use tgt instead of kmpc

lildmh added a comment.Jun 6 2019, 1:13 PM

@Hahnfeld Do you really think it is necessary to pass these two functions as arguments, instead of exporting them. If you do, could you explain why?

Thanks,
Lingda Li

@Hahnfeld Do you really think it is necessary to pass these two functions as arguments, instead of exporting them. If you do, could you explain why?

I don't say it's necessary, but rather an option. Asked differently, why do we need to export them (and decide on externally visible names) if they're only used in a function where we can pass them as arguments?

At the moment, this patch doesn't really add any value (meaning changed behavior) to the runtime. It's needed for D59474 which is already large enough on its own, but it doesn't get to any state where it can be used.

@Hahnfeld Do you really think it is necessary to pass these two functions as arguments, instead of exporting them. If you do, could you explain why?

I don't say it's necessary, but rather an option. Asked differently, why do we need to export them (and decide on externally visible names) if they're only used in a function where we can pass them as arguments?

At the moment, this patch doesn't really add any value (meaning changed behavior) to the runtime. It's needed for D59474 which is already large enough on its own, but it doesn't get to any state where it can be used.

I don't have a strong reason to export it either. One reason is it avoids one indirect memory access so could be potentially better for performance (the difference should be very minor).

Yea I agree this patch doesn't really do anything itself, it just provide a runtime interface for D59474. Please let me know if there is anything that prevents this patch from being accepted.

lildmh updated this revision to Diff 212211.Jul 29 2019, 12:07 PM

Ping and rebase

Now that the clang-side patch is finalized, this is almost good to go. What is missing from this patch is a test. Can you add something under libomptarget/test/offloading/?

Now that the clang-side patch is finalized, this is almost good to go. What is missing from this patch is a test. Can you add something under libomptarget/test/offloading/?

Sure, since this is very simple I'll add a simple test. Do you think the test should be in libomptarget/test/offloading/ or libomptarget/test/api/?

Now that the clang-side patch is finalized, this is almost good to go. What is missing from this patch is a test. Can you add something under libomptarget/test/offloading/?

Sure, since this is very simple I'll add a simple test. Do you think the test should be in libomptarget/test/offloading/ or libomptarget/test/api/?

Sorry, I meant libomptarget/test/mapping/!

The api directory is meant for test cases calling omp_* API functions, which is not the case here.

Now that the clang-side patch is finalized, this is almost good to go. What is missing from this patch is a test. Can you add something under libomptarget/test/offloading/?

Sure, since this is very simple I'll add a simple test. Do you think the test should be in libomptarget/test/offloading/ or libomptarget/test/api/?

Sorry, I meant libomptarget/test/mapping/!

The api directory is meant for test cases calling omp_* API functions, which is not the case here.

Since the mapper is not really implemented in this patch, if I add a test, it will be something like below:

__tgt_push_mapper_component(h, base0, begin0, size0, type0);
__tgt_push_mapper_component(h, base1, begin1, size1, type1);
auto total_size = __tgt_mapper_num_components(h);
printf("size=%d", total_size);
// CHECK: size=2

It seems to me this test is not meaningful. I can add a more meaningful test after all mapper patches are upstreamed.
Do you think we need a meaningless test like this now?

Since the mapper is not really implemented in this patch, if I add a test, it will be something like below:

__tgt_push_mapper_component(h, base0, begin0, size0, type0);
__tgt_push_mapper_component(h, base1, begin1, size1, type1);
auto total_size = __tgt_mapper_num_components(h);
printf("size=%d", total_size);
// CHECK: size=2

It seems to me this test is not meaningful. I can add a more meaningful test after all mapper patches are upstreamed.
Do you think we need a meaningless test like this now?

Yes, it's good to have a test, even a very elementary one. When full support for declare mapper is upstreamed we can revisit the test and extend it to check real-use scenarios.

Since the mapper is not really implemented in this patch, if I add a test, it will be something like below:

__tgt_push_mapper_component(h, base0, begin0, size0, type0);
__tgt_push_mapper_component(h, base1, begin1, size1, type1);
auto total_size = __tgt_mapper_num_components(h);
printf("size=%d", total_size);
// CHECK: size=2

It seems to me this test is not meaningful. I can add a more meaningful test after all mapper patches are upstreamed.
Do you think we need a meaningless test like this now?

Yes, it's good to have a test, even a very elementary one. When full support for declare mapper is upstreamed we can revisit the test and extend it to check real-use scenarios.

I just realized that these functions (__tgt_push_mapper_component and __tgt_mapper_num_components) are not exposed to users (i.e., defined in omp.h), so the test proposed above is not possible.

I cannot imagine what test it can have for this patch now, so I think we can leave this patch testless. If you have an idea of test, please let me know.

Since the mapper is not really implemented in this patch, if I add a test, it will be something like below:

__tgt_push_mapper_component(h, base0, begin0, size0, type0);
__tgt_push_mapper_component(h, base1, begin1, size1, type1);
auto total_size = __tgt_mapper_num_components(h);
printf("size=%d", total_size);
// CHECK: size=2

It seems to me this test is not meaningful. I can add a more meaningful test after all mapper patches are upstreamed.
Do you think we need a meaningless test like this now?

Yes, it's good to have a test, even a very elementary one. When full support for declare mapper is upstreamed we can revisit the test and extend it to check real-use scenarios.

I just realized that these functions (__tgt_push_mapper_component and __tgt_mapper_num_components) are not exposed to users (i.e., defined in omp.h), so the test proposed above is not possible.

I cannot imagine what test it can have for this patch now, so I think we can leave this patch testless. If you have an idea of test, please let me know.

You need to declare them yourself in the test. That's not really elegant, but many other runtime tests already do.

You need to declare them yourself in the test. That's not really elegant, but many other runtime tests already do.

Thanks for the tip! I can do that. Since I'm pretty new to the runtime, could you suggest a similar test that I should look into?

You need to declare them yourself in the test. That's not really elegant, but many other runtime tests already do.

Thanks for the tip! I can do that. Since I'm pretty new to the runtime, could you suggest a similar test that I should look into?

test/offloading/requires.c does so: https://github.com/llvm/llvm-project/blob/master/openmp/libomptarget/test/offloading/requires.c#L20

I think that's the only one for libomptarget right now, but the host runtime has more complex ones, for example https://github.com/llvm/llvm-project/blob/master/openmp/runtime/test/tasking/kmp_taskloop.c

lildmh updated this revision to Diff 212581.Jul 31 2019, 7:36 AM

Add a test

grokos accepted this revision.Jul 31 2019, 2:31 PM

OK, it's good to go now.

This revision is now accepted and ready to land.Jul 31 2019, 2:31 PM
lildmh added a comment.Aug 1 2019, 6:06 AM

@Meinersbur Hi Michael, could you help me commit this patch? Thanks!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2019, 9:17 PM