This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget] Add support for close map modifier
ClosedPublic

Authored by gtbercea on Jul 26 2019, 10:55 AM.

Details

Summary

This patch adds support for the close map modifier.

The close map modifier will overwrite the unified shared memory requirement and create a device copy of the data.

Event Timeline

gtbercea created this revision.Jul 26 2019, 10:55 AM
Herald added a project: Restricted Project. · View Herald Transcript

I think this mostly looks good, but depends on D65001, right?

I think this mostly looks good, but depends on D65001, right?

It does yes. I won't push it until the unified memory stuff is done.

This revision is now accepted and ready to land.Jul 31 2019, 1:49 PM
gtbercea updated this revision to Diff 212827.Aug 1 2019, 8:51 AM
  • Add close modifier flag.
  • Move test to appropriate folder.

I think this still needs some work.

libomptarget/src/device.cpp
173–174

Please update this comment.

207–208

Why do we need this check? If logic doesn't fail me, this should always be true with the condition above.

libomptarget/src/device.h
141

I'd call this HasCloseModifier (here and in other places)

libomptarget/src/omptarget.cpp
299

Please use IsCloseModifier / HasCloseModifier from above.

301

Can you remove the second Device.RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY? Either it's not set, or it's set and the condition needs to check if the mapping has the close modifier.

398

Same here, use IsCloseModifier / HasCloseModifier from above.

400

Same, logic can be simplified.

413–415

Can we have TgtPtrBegin == HstPtrBegin when the mapping has the close modifier?

libomptarget/test/unified_shared_memory/close_modifier.c
18

Please use int as in the other tests.

21

Please format.

26–27

Please don't cast to long long!

gtbercea updated this revision to Diff 212879.Aug 1 2019, 12:14 PM
gtbercea marked 2 inline comments as done.Aug 1 2019, 12:14 PM
  • Address comments.
gtbercea marked 6 inline comments as done.Aug 1 2019, 12:15 PM
gtbercea marked 3 inline comments as done.Aug 1 2019, 12:21 PM
gtbercea updated this revision to Diff 212886.Aug 1 2019, 12:52 PM
  • Update.
Hahnfeld added inline comments.Aug 2 2019, 12:06 AM
libomptarget/src/omptarget.cpp
301–302

Hmm, thinking about this again: Does close imply always? I don't find this in the spec, so I think HasCloseModifier should not be in the inner condition.

413

Same here.

libomptarget/test/unified_shared_memory/close_modifier.c
21

Ping, test is still not formatted.

I'd propose to mark the test close_modifier.c as unsupported with older compiler versions. Additionally, can you add a small, new test with manual calls to __tgt_target_data_begin / _end without relying on the compiler?

libomptarget/test/unified_shared_memory/close_modifier.c
29–31

Unfortunately, I don't think that's enough for this test to pass with older versions of Clang because it didn't support code generation for close. Did you test this?

gtbercea marked an inline comment as done.Aug 2 2019, 11:53 AM
gtbercea added inline comments.
libomptarget/src/omptarget.cpp
301–302

I think I see what you mean. The issue is not that this is in the innermost loop but that it may generate a copy to the device when IsNew is false.

Hahnfeld added inline comments.Aug 2 2019, 1:18 PM
libomptarget/src/omptarget.cpp
301–302

Yes, with "innermost condition" I meant the inner if statement, the one in the lines I marked.

gtbercea updated this revision to Diff 213415.Aug 5 2019, 11:25 AM
  • Format test.
gtbercea marked an inline comment as done.Aug 5 2019, 11:29 AM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_modifier.c
29–31

I don't think it will either but I'm not sure how we can actually make it work. I think there's not much more this test can do. To make this work probably requires a more forgiving check in the compiler.

gtbercea marked an inline comment as done.Aug 5 2019, 11:30 AM
gtbercea updated this revision to Diff 213422.Aug 5 2019, 11:39 AM
  • Add back close modifier flag check.
gtbercea updated this revision to Diff 213423.Aug 5 2019, 11:45 AM
gtbercea marked 2 inline comments as done.
  • Remove close flag from checks where it is not needed.
gtbercea marked an inline comment as done.Aug 5 2019, 11:48 AM

@Hahnfeld I have addressed your comments, please let me know if you have further comments.

Hahnfeld requested changes to this revision.Aug 6 2019, 6:09 AM

The test will currently fail with older versions of Clang. It must at least be marked UNSUPPORTED for Clang versions older than what-will-be Clang 10.

Additionally, I'd still like to see a small test with manual calls to __tgt_target_data_begin / _end if possible. I'm thinking of the following:

int *a = // malloc
int *device;

__tgt_target_data_begin(...); // with close modifier for a
#pragma omp target data use_device_ptr(a)
{
  device = a;
}
__tgt_target_data_end(...); // correspondingly to the begin call above

// check...

Instead of the nested target data region, you could also use a simple target region in the outer data region, and have a check for use_device_ptr in another test.

This revision now requires changes to proceed.Aug 6 2019, 6:09 AM

The test will currently fail with older versions of Clang. It must at least be marked UNSUPPORTED for Clang versions older than what-will-be Clang 10.

Additionally, I'd still like to see a small test with manual calls to __tgt_target_data_begin / _end if possible. I'm thinking of the following:

int *a = // malloc
int *device;

__tgt_target_data_begin(...); // with close modifier for a
#pragma omp target data use_device_ptr(a)
{
  device = a;
}
__tgt_target_data_end(...); // correspondingly to the begin call above

// check...

Instead of the nested target data region, you could also use a simple target region in the outer data region, and have a check for use_device_ptr in another test.

I don't mind adding such a test but calling these functions manually is very tedious and error prone and hard to maintain. Are there any benefits I'm missing to this apart from running this test in isolation?

The test will currently fail with older versions of Clang. It must at least be marked UNSUPPORTED for Clang versions older than what-will-be Clang 10.

Additionally, I'd still like to see a small test with manual calls to __tgt_target_data_begin / _end if possible. I'm thinking of the following:

int *a = // malloc
int *device;

__tgt_target_data_begin(...); // with close modifier for a
#pragma omp target data use_device_ptr(a)
{
  device = a;
}
__tgt_target_data_end(...); // correspondingly to the begin call above

// check...

Instead of the nested target data region, you could also use a simple target region in the outer data region, and have a check for use_device_ptr in another test.

I don't mind adding such a test but calling these functions manually is very tedious and error prone and hard to maintain. Are there any benefits I'm missing to this apart from running this test in isolation?

It adds coverage with older compiler versions. If it's too complicated, I'm also fine with adding a check for use_device_ptr and restrict the test with close modifier to newer versions of Clang, up to you.

gtbercea updated this revision to Diff 213661.Aug 6 2019, 11:20 AM
  • Add unsupported clang versions to test.
  • Add more tests.
gtbercea updated this revision to Diff 213666.Aug 6 2019, 11:23 AM
  • Clean-up.
gtbercea updated this revision to Diff 213671.Aug 6 2019, 11:37 AM
  • Clean test.

@Hahnfeld I have updated the tests, I added two new ones. Let me know if you have any further comments.

gtbercea updated this revision to Diff 213699.Aug 6 2019, 1:10 PM
  • Update.

@grokos I have updated this patch as suggested and have added new tests. Please let me know if you any further comments.

@Hahnfeld @grokos any further comments?

I don't have any comments about this patch other than the ones @Hahnfeld mentioned.

I don't have any comments about this patch other than the ones @Hahnfeld mentioned.

Great, @Hahnfeld please let me know if you have any other comments.

Hahnfeld added inline comments.Aug 9 2019, 2:59 AM
libomptarget/test/unified_shared_memory/close_enter_exit.c
7

I think you can safely remove clang-3 to clang-5 because CMake will block tests with anything before clang-6.

libomptarget/test/unified_shared_memory/close_manual.c
7

This completely misses the point: This test is supposed to work with older versions of clang, even those that don't support close.

47–53

Why do we need this?

libomptarget/test/unified_shared_memory/close_modifier.c
7

Same as above.

gtbercea marked 2 inline comments as done.Aug 9 2019, 7:37 AM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_manual.c
47–53

To initialize the device.

gtbercea marked an inline comment as done.Aug 9 2019, 7:56 AM
gtbercea marked an inline comment as done.Aug 9 2019, 8:57 AM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_manual.c
7

This is because of __tgt_register_requires not because of the other two methods.

gtbercea updated this revision to Diff 214391.Aug 9 2019, 9:34 AM
  • Remove requires manual call.
gtbercea marked 4 inline comments as done.Aug 9 2019, 9:36 AM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_manual.c
7

I believe clang-9 will work with unified shared memory. Because we are calling the tgt functions manually we can enable this test for clang-9 even if it tests the close functionality.

gtbercea added a comment.EditedAug 9 2019, 9:48 AM

@Hahnfeld
I would like to clarify something about tgt_register_requires call.

This call is special and in a compiler that supports handling of requires clauses (like unified_shared_memory clause) ( i.e. Clang 9.0 onwards), this function is called before any main function code. Here is a printout with LIBOMPTARGET_DEBUG=1 to realize the difference:

===== WHERE COMPILER PLACES CALL TO __tgt_register_requires() ========
Libomptarget --> Loading RTLs... <<<< THIS IS THE VERY FIRST PRINT FROM THE RUNTIME. The call to __tgt_register_requires is just before that.
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.ppc64.so'!
Libomptarget --> Registering RTL libomptarget.rtl.ppc64.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.x86_64.so': libomptarget.rtl.x86_64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
Target CUDA RTL --> Start initializing CUDA
Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory!
Libomptarget --> RTLs loaded!
Libomptarget --> Image 0x0000000010020080 is compatible with RTL libomptarget.rtl.ppc64.so!
Libomptarget --> RTL 0x000000004747b3e0 has index 0!
Libomptarget --> Registering image 0x0000000010020080 with RTL libomptarget.rtl.ppc64.so!
Libomptarget --> Done registering entries!
===== MANUAL CALL TO __tgt_register_requires() IN MAIN (top of MAIN function!) ========

Manually calling __tgt_register_requires can never actually work correctly because any user space call to this function will be after the binary has been loaded and the flags already decided. The compiler just has to support the handling of the requires clauses.

gtbercea marked an inline comment as done.Aug 9 2019, 9:52 AM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_enter_exit.c
7

In this test, clang-9 needs to be unsupported because close is used explicitly in the code.

@Hahnfeld
I would like to clarify something about tgt_register_requires call.

This call is special and in a compiler that supports handling of requires clauses (like unified_shared_memory clause) ( i.e. Clang 9.0 onwards), this function is called before any main function code. Here is a printout with LIBOMPTARGET_DEBUG=1 to realize the difference:

===== WHERE COMPILER PLACES CALL TO __tgt_register_requires() ========
Libomptarget --> Loading RTLs... <<<< THIS IS THE VERY FIRST PRINT FROM THE RUNTIME. The call to __tgt_register_requires is just before that.
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.ppc64.so'!
Libomptarget --> Registering RTL libomptarget.rtl.ppc64.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.x86_64.so': libomptarget.rtl.x86_64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
Target CUDA RTL --> Start initializing CUDA
Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory!
Libomptarget --> RTLs loaded!
Libomptarget --> Image 0x0000000010020080 is compatible with RTL libomptarget.rtl.ppc64.so!
Libomptarget --> RTL 0x000000004747b3e0 has index 0!
Libomptarget --> Registering image 0x0000000010020080 with RTL libomptarget.rtl.ppc64.so!
Libomptarget --> Done registering entries!
===== MANUAL CALL TO __tgt_register_requires() IN MAIN (top of MAIN function!) ========

Manually calling __tgt_register_requires can never actually work correctly because any user space call to this function will be after the binary has been loaded and the flags already decided. The compiler just has to support the handling of the requires clauses.

I disagree, the information about requires directives only needs to be present when initializing the device, not when loading a library. I've posted D66019 to solve this and fix the test I mentioned in D65001.

libomptarget/test/unified_shared_memory/close_manual.c
47–53

This should happen in __tgt_target_data_begin which can be a legitimate first directive in a program.

I disagree, the information about requires directives only needs to be present when initializing the device, not when loading a library. I've posted D66019 to solve this and fix the test I mentioned in D65001.

Ok I'll add back the calls to the register_requires functions.

gtbercea marked an inline comment as done.Aug 9 2019, 11:39 AM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_manual.c
47–53

Yes target data can be a first directive but if there's no target region in the program then the runtime library won't be loaded/initialized.

gtbercea updated this revision to Diff 214413.Aug 9 2019, 12:10 PM
  • Add back requires registration call.
Hahnfeld requested changes to this revision.Aug 9 2019, 12:30 PM

Can you please rebase on top of D66019? Sorry, that'll probably give you some conflicts :-/

libomptarget/test/unified_shared_memory/close_manual.c
47–53

Ah, got the intention. Can you please add a comment explaining that we need at least one target region to load a "binary" and make all of the data calls work? And I don't think we need to check that device_alloc is correctly transferred, do we? There are other tests for that (plus casting 10 to void * looks evil anyway).

This revision now requires changes to proceed.Aug 9 2019, 12:30 PM
gtbercea updated this revision to Diff 214423.Aug 9 2019, 12:39 PM
  • Rebase.
gtbercea marked an inline comment as done.Aug 9 2019, 12:42 PM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_manual.c
47–53

Of course, I'll add that.
Yeah I'll get rid of the check and the cast! :)

Hahnfeld accepted this revision.Aug 9 2019, 1:17 PM

LG after addressing the last minor nits (comment + cast + check).

This revision is now accepted and ready to land.Aug 9 2019, 1:17 PM
gtbercea updated this revision to Diff 214451.Aug 9 2019, 2:19 PM
  • Simplify target region.
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 9 2019, 2:37 PM