Page MenuHomePhabricator

[OpenMP][libomptarget] Add support for close map modifier
ClosedPublic

Authored by gtbercea on Jul 26 2019, 10:55 AM.

Details

Summary

This patch adds support for the close map modifier.

The close map modifier will overwrite the unified shared memory requirement and create a device copy of the data.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
gtbercea marked 2 inline comments as done.Aug 1 2019, 12:14 PM
  • Address comments.
gtbercea marked 6 inline comments as done.Aug 1 2019, 12:15 PM
gtbercea marked 3 inline comments as done.Aug 1 2019, 12:21 PM
gtbercea updated this revision to Diff 212886.Aug 1 2019, 12:52 PM
  • Update.
Hahnfeld added inline comments.Aug 2 2019, 12:06 AM
libomptarget/src/omptarget.cpp
299–300 ↗(On Diff #212886)

Hmm, thinking about this again: Does close imply always? I don't find this in the spec, so I think HasCloseModifier should not be in the inner condition.

414 ↗(On Diff #212886)

Same here.

libomptarget/test/unified_shared_memory/close_modifier.c
20 ↗(On Diff #212827)

Ping, test is still not formatted.

I'd propose to mark the test close_modifier.c as unsupported with older compiler versions. Additionally, can you add a small, new test with manual calls to __tgt_target_data_begin / _end without relying on the compiler?

libomptarget/test/unified_shared_memory/close_modifier.c
28–30 ↗(On Diff #212886)

Unfortunately, I don't think that's enough for this test to pass with older versions of Clang because it didn't support code generation for close. Did you test this?

gtbercea marked an inline comment as done.Aug 2 2019, 11:53 AM
gtbercea added inline comments.
libomptarget/src/omptarget.cpp
299–300 ↗(On Diff #212886)

I think I see what you mean. The issue is not that this is in the innermost loop but that it may generate a copy to the device when IsNew is false.

Hahnfeld added inline comments.Aug 2 2019, 1:18 PM
libomptarget/src/omptarget.cpp
299–300 ↗(On Diff #212886)

Yes, with "innermost condition" I meant the inner if statement, the one in the lines I marked.

gtbercea updated this revision to Diff 213415.Aug 5 2019, 11:25 AM
  • Format test.
gtbercea marked an inline comment as done.Aug 5 2019, 11:29 AM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_modifier.c
28–30 ↗(On Diff #212886)

I don't think it will either but I'm not sure how we can actually make it work. I think there's not much more this test can do. To make this work probably requires a more forgiving check in the compiler.

gtbercea marked an inline comment as done.Aug 5 2019, 11:30 AM
gtbercea updated this revision to Diff 213422.Aug 5 2019, 11:39 AM
  • Add back close modifier flag check.
gtbercea updated this revision to Diff 213423.Aug 5 2019, 11:45 AM
gtbercea marked 2 inline comments as done.
  • Remove close flag from checks where it is not needed.
gtbercea marked an inline comment as done.Aug 5 2019, 11:48 AM

@Hahnfeld I have addressed your comments, please let me know if you have further comments.

Hahnfeld requested changes to this revision.Aug 6 2019, 6:09 AM

The test will currently fail with older versions of Clang. It must at least be marked UNSUPPORTED for Clang versions older than what-will-be Clang 10.

Additionally, I'd still like to see a small test with manual calls to __tgt_target_data_begin / _end if possible. I'm thinking of the following:

int *a = // malloc
int *device;

__tgt_target_data_begin(...); // with close modifier for a
#pragma omp target data use_device_ptr(a)
{
  device = a;
}
__tgt_target_data_end(...); // correspondingly to the begin call above

// check...

Instead of the nested target data region, you could also use a simple target region in the outer data region, and have a check for use_device_ptr in another test.

This revision now requires changes to proceed.Aug 6 2019, 6:09 AM

The test will currently fail with older versions of Clang. It must at least be marked UNSUPPORTED for Clang versions older than what-will-be Clang 10.

Additionally, I'd still like to see a small test with manual calls to __tgt_target_data_begin / _end if possible. I'm thinking of the following:

int *a = // malloc
int *device;

__tgt_target_data_begin(...); // with close modifier for a
#pragma omp target data use_device_ptr(a)
{
  device = a;
}
__tgt_target_data_end(...); // correspondingly to the begin call above

// check...

Instead of the nested target data region, you could also use a simple target region in the outer data region, and have a check for use_device_ptr in another test.

I don't mind adding such a test but calling these functions manually is very tedious and error prone and hard to maintain. Are there any benefits I'm missing to this apart from running this test in isolation?

The test will currently fail with older versions of Clang. It must at least be marked UNSUPPORTED for Clang versions older than what-will-be Clang 10.

Additionally, I'd still like to see a small test with manual calls to __tgt_target_data_begin / _end if possible. I'm thinking of the following:

int *a = // malloc
int *device;

__tgt_target_data_begin(...); // with close modifier for a
#pragma omp target data use_device_ptr(a)
{
  device = a;
}
__tgt_target_data_end(...); // correspondingly to the begin call above

// check...

Instead of the nested target data region, you could also use a simple target region in the outer data region, and have a check for use_device_ptr in another test.

I don't mind adding such a test but calling these functions manually is very tedious and error prone and hard to maintain. Are there any benefits I'm missing to this apart from running this test in isolation?

It adds coverage with older compiler versions. If it's too complicated, I'm also fine with adding a check for use_device_ptr and restrict the test with close modifier to newer versions of Clang, up to you.

gtbercea updated this revision to Diff 213661.Aug 6 2019, 11:20 AM
  • Add unsupported clang versions to test.
  • Add more tests.
gtbercea updated this revision to Diff 213666.Aug 6 2019, 11:23 AM
  • Clean-up.
gtbercea updated this revision to Diff 213671.Aug 6 2019, 11:37 AM
  • Clean test.

@Hahnfeld I have updated the tests, I added two new ones. Let me know if you have any further comments.

gtbercea updated this revision to Diff 213699.Aug 6 2019, 1:10 PM
  • Update.

@grokos I have updated this patch as suggested and have added new tests. Please let me know if you any further comments.

@Hahnfeld @grokos any further comments?

I don't have any comments about this patch other than the ones @Hahnfeld mentioned.

I don't have any comments about this patch other than the ones @Hahnfeld mentioned.

Great, @Hahnfeld please let me know if you have any other comments.

Hahnfeld added inline comments.Aug 9 2019, 2:59 AM
libomptarget/test/unified_shared_memory/close_enter_exit.c
6 ↗(On Diff #213699)

I think you can safely remove clang-3 to clang-5 because CMake will block tests with anything before clang-6.

libomptarget/test/unified_shared_memory/close_manual.c
6 ↗(On Diff #213699)

This completely misses the point: This test is supposed to work with older versions of clang, even those that don't support close.

46–52 ↗(On Diff #213699)

Why do we need this?

libomptarget/test/unified_shared_memory/close_modifier.c
6 ↗(On Diff #213699)

Same as above.

gtbercea marked 2 inline comments as done.Aug 9 2019, 7:37 AM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_manual.c
46–52 ↗(On Diff #213699)

To initialize the device.

gtbercea marked an inline comment as done.Aug 9 2019, 7:56 AM
gtbercea marked an inline comment as done.Aug 9 2019, 8:57 AM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_manual.c
6 ↗(On Diff #213699)

This is because of __tgt_register_requires not because of the other two methods.

gtbercea updated this revision to Diff 214391.Aug 9 2019, 9:34 AM
  • Remove requires manual call.
gtbercea marked 4 inline comments as done.Aug 9 2019, 9:36 AM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_manual.c
6 ↗(On Diff #213699)

I believe clang-9 will work with unified shared memory. Because we are calling the tgt functions manually we can enable this test for clang-9 even if it tests the close functionality.

gtbercea added a comment.EditedAug 9 2019, 9:48 AM

@Hahnfeld
I would like to clarify something about tgt_register_requires call.

This call is special and in a compiler that supports handling of requires clauses (like unified_shared_memory clause) ( i.e. Clang 9.0 onwards), this function is called before any main function code. Here is a printout with LIBOMPTARGET_DEBUG=1 to realize the difference:

===== WHERE COMPILER PLACES CALL TO __tgt_register_requires() ========
Libomptarget --> Loading RTLs... <<<< THIS IS THE VERY FIRST PRINT FROM THE RUNTIME. The call to __tgt_register_requires is just before that.
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.ppc64.so'!
Libomptarget --> Registering RTL libomptarget.rtl.ppc64.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.x86_64.so': libomptarget.rtl.x86_64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
Target CUDA RTL --> Start initializing CUDA
Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory!
Libomptarget --> RTLs loaded!
Libomptarget --> Image 0x0000000010020080 is compatible with RTL libomptarget.rtl.ppc64.so!
Libomptarget --> RTL 0x000000004747b3e0 has index 0!
Libomptarget --> Registering image 0x0000000010020080 with RTL libomptarget.rtl.ppc64.so!
Libomptarget --> Done registering entries!
===== MANUAL CALL TO __tgt_register_requires() IN MAIN (top of MAIN function!) ========

Manually calling __tgt_register_requires can never actually work correctly because any user space call to this function will be after the binary has been loaded and the flags already decided. The compiler just has to support the handling of the requires clauses.

gtbercea marked an inline comment as done.Aug 9 2019, 9:52 AM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_enter_exit.c
6 ↗(On Diff #213699)

In this test, clang-9 needs to be unsupported because close is used explicitly in the code.

@Hahnfeld
I would like to clarify something about tgt_register_requires call.

This call is special and in a compiler that supports handling of requires clauses (like unified_shared_memory clause) ( i.e. Clang 9.0 onwards), this function is called before any main function code. Here is a printout with LIBOMPTARGET_DEBUG=1 to realize the difference:

===== WHERE COMPILER PLACES CALL TO __tgt_register_requires() ========
Libomptarget --> Loading RTLs... <<<< THIS IS THE VERY FIRST PRINT FROM THE RUNTIME. The call to __tgt_register_requires is just before that.
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.ppc64.so'!
Libomptarget --> Registering RTL libomptarget.rtl.ppc64.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.x86_64.so': libomptarget.rtl.x86_64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
Target CUDA RTL --> Start initializing CUDA
Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory!
Libomptarget --> RTLs loaded!
Libomptarget --> Image 0x0000000010020080 is compatible with RTL libomptarget.rtl.ppc64.so!
Libomptarget --> RTL 0x000000004747b3e0 has index 0!
Libomptarget --> Registering image 0x0000000010020080 with RTL libomptarget.rtl.ppc64.so!
Libomptarget --> Done registering entries!
===== MANUAL CALL TO __tgt_register_requires() IN MAIN (top of MAIN function!) ========

Manually calling __tgt_register_requires can never actually work correctly because any user space call to this function will be after the binary has been loaded and the flags already decided. The compiler just has to support the handling of the requires clauses.

I disagree, the information about requires directives only needs to be present when initializing the device, not when loading a library. I've posted D66019 to solve this and fix the test I mentioned in D65001.

libomptarget/test/unified_shared_memory/close_manual.c
46–52 ↗(On Diff #213699)

This should happen in __tgt_target_data_begin which can be a legitimate first directive in a program.

I disagree, the information about requires directives only needs to be present when initializing the device, not when loading a library. I've posted D66019 to solve this and fix the test I mentioned in D65001.

Ok I'll add back the calls to the register_requires functions.

gtbercea marked an inline comment as done.Aug 9 2019, 11:39 AM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_manual.c
46–52 ↗(On Diff #213699)

Yes target data can be a first directive but if there's no target region in the program then the runtime library won't be loaded/initialized.

gtbercea updated this revision to Diff 214413.Aug 9 2019, 12:10 PM
  • Add back requires registration call.
Hahnfeld requested changes to this revision.Aug 9 2019, 12:30 PM

Can you please rebase on top of D66019? Sorry, that'll probably give you some conflicts :-/

libomptarget/test/unified_shared_memory/close_manual.c
46–52 ↗(On Diff #213699)

Ah, got the intention. Can you please add a comment explaining that we need at least one target region to load a "binary" and make all of the data calls work? And I don't think we need to check that device_alloc is correctly transferred, do we? There are other tests for that (plus casting 10 to void * looks evil anyway).

This revision now requires changes to proceed.Aug 9 2019, 12:30 PM
gtbercea updated this revision to Diff 214423.Aug 9 2019, 12:39 PM
  • Rebase.
gtbercea marked an inline comment as done.Aug 9 2019, 12:42 PM
gtbercea added inline comments.
libomptarget/test/unified_shared_memory/close_manual.c
46–52 ↗(On Diff #213699)

Of course, I'll add that.
Yeah I'll get rid of the check and the cast! :)

Hahnfeld accepted this revision.Aug 9 2019, 1:17 PM

LG after addressing the last minor nits (comment + cast + check).

This revision is now accepted and ready to land.Aug 9 2019, 1:17 PM
gtbercea updated this revision to Diff 214451.Aug 9 2019, 2:19 PM
  • Simplify target region.
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 9 2019, 2:37 PM