This patch adds support for the close map modifier.
The close map modifier will overwrite the unified shared memory requirement and create a device copy of the data.
Differential D65340
[OpenMP][libomptarget] Add support for close map modifier gtbercea on Jul 26 2019, 10:55 AM. Authored by
Details This patch adds support for the close map modifier. The close map modifier will overwrite the unified shared memory requirement and create a device copy of the data.
Diff Detail
Event TimelineComment Actions I think this still needs some work.
Comment Actions I'd propose to mark the test close_modifier.c as unsupported with older compiler versions. Additionally, can you add a small, new test with manual calls to __tgt_target_data_begin / _end without relying on the compiler?
Comment Actions @Hahnfeld I have addressed your comments, please let me know if you have further comments. Comment Actions The test will currently fail with older versions of Clang. It must at least be marked UNSUPPORTED for Clang versions older than what-will-be Clang 10. Additionally, I'd still like to see a small test with manual calls to __tgt_target_data_begin / _end if possible. I'm thinking of the following: int *a = // malloc int *device; __tgt_target_data_begin(...); // with close modifier for a #pragma omp target data use_device_ptr(a) { device = a; } __tgt_target_data_end(...); // correspondingly to the begin call above // check... Instead of the nested target data region, you could also use a simple target region in the outer data region, and have a check for use_device_ptr in another test. Comment Actions I don't mind adding such a test but calling these functions manually is very tedious and error prone and hard to maintain. Are there any benefits I'm missing to this apart from running this test in isolation? Comment Actions It adds coverage with older compiler versions. If it's too complicated, I'm also fine with adding a check for use_device_ptr and restrict the test with close modifier to newer versions of Clang, up to you. Comment Actions @Hahnfeld I have updated the tests, I added two new ones. Let me know if you have any further comments. Comment Actions @grokos I have updated this patch as suggested and have added new tests. Please let me know if you any further comments.
Comment Actions @Hahnfeld This call is special and in a compiler that supports handling of requires clauses (like unified_shared_memory clause) ( i.e. Clang 9.0 onwards), this function is called before any main function code. Here is a printout with LIBOMPTARGET_DEBUG=1 to realize the difference: ===== WHERE COMPILER PLACES CALL TO __tgt_register_requires() ======== Libomptarget --> Loading RTLs... <<<< THIS IS THE VERY FIRST PRINT FROM THE RUNTIME. The call to __tgt_register_requires is just before that. Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'... Libomptarget --> Successfully loaded library 'libomptarget.rtl.ppc64.so'! Libomptarget --> Registering RTL libomptarget.rtl.ppc64.so supporting 4 devices! Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'... Libomptarget --> Unable to load library 'libomptarget.rtl.x86_64.so': libomptarget.rtl.x86_64.so: cannot open shared object file: No such file or directory! Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'... Target CUDA RTL --> Start initializing CUDA Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'! Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 4 devices! Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'... Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory! Libomptarget --> RTLs loaded! Libomptarget --> Image 0x0000000010020080 is compatible with RTL libomptarget.rtl.ppc64.so! Libomptarget --> RTL 0x000000004747b3e0 has index 0! Libomptarget --> Registering image 0x0000000010020080 with RTL libomptarget.rtl.ppc64.so! Libomptarget --> Done registering entries! ===== MANUAL CALL TO __tgt_register_requires() IN MAIN (top of MAIN function!) ======== Manually calling __tgt_register_requires can never actually work correctly because any user space call to this function will be after the binary has been loaded and the flags already decided. The compiler just has to support the handling of the requires clauses.
Comment Actions
Comment Actions Can you please rebase on top of D66019? Sorry, that'll probably give you some conflicts :-/
|