This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Initial implementation of OpenMP offloading library - libomptarget.
ClosedPublic

Authored by grokos on Oct 23 2015, 5:00 PM.

Details

Summary

This patches proposes an initial implementation of the libomptarget based on what we have working today in https://github.com/clang-omp.

I tried to follow what has been done in openmp/runtime in terms of the cmake build system.

The library sources are placed under libomptarget and it consists of three components:

  • device agnostic library (the interface that clang uses)
  • device plugins - right now we create a plug-in for CUDA-enabled devices, and a generic 64-bit plugin that works on powerpcBE/LE and x86_64 (mostly for testing purposes)
  • device runtime library for CUDA enabled devices.

The interface of the plugins and target agnostic library are documented in http://goo.gl/L1rnKJ.

I included the logic for testing based on llvm-lit and FileCheck. Currently there are only two regression tests that basically check that offloading failed. This is because the target side codegen (http://reviews.llvm.org/D12614) and driver support (http://reviews.llvm.org/D9888) are still under review in clang. Once that functionality is in clang I plan to include more tests to exercise the library.

In this patch, I tried to organize things based on what makes sense to me and also based on previous discussions with other OpenMP project contributors. I am happy to organize things in a different way if suggested to do so - the goal here is also open the discussion on how this library should be contributed to the the project.

Let me now any suggestions/comments you may have,

Thanks!
Samuel

Diff Detail

Event Timeline

sfantao updated this revision to Diff 38275.Oct 23 2015, 5:00 PM
sfantao retitled this revision from to [OpenMP] Initial implementation of OpenMP offloading library - libomptarget..
sfantao updated this object.
sfantao added reviewers: hfinkel, jlpeyton, jcownie, ABataev.

Great work!
Would it be possible to split this into multiple, smaller reviews (for main library, plugins and device runtime for nvptx)?

Greetings
Jonas

libomptarget/deviceRTLs/nvptx/src/parallel.cu
151–158 ↗(On Diff #38275)

Commented out - should this be committed?

libomptarget/deviceRTLs/nvptx/src/reduction.cu
71–117 ↗(On Diff #38275)

Commented or #if 0

157–178 ↗(On Diff #38275)

Not ready for shipping?

780–795 ↗(On Diff #38275)

#if 0

1001–1028 ↗(On Diff #38275)

some more #if 0

1239–1246 ↗(On Diff #38275)

#if 1?

sfantao updated this revision to Diff 38952.Nov 2 2015, 12:07 PM

Partition the original patch and do some code cleanup as suggested by Jonas.

The plugins patch is posted in http://reviews.llvm.org/D14253.

The device RTLs patch is posted in http://reviews.llvm.org/D14254.

sfantao updated this revision to Diff 39734.Nov 9 2015, 12:10 PM

Rebase and add libomptarget install rule that was missing in the previous diff.

I'm fine otherwise - but definitely not the one for any final decision!

libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake
63–78

In previous versions you have been using PkgConfig - is there a particular reason that this was removed?

(I'm asking because RHEL 6 is using a rather weird path: /usr/lib64/libffi-3.0.5/include/ffi.h)

libomptarget/src/omptarget.cpp
198–199

auto and range-based loop?

228–229

auto and range-based loop?

258–259

auto and range-based loop?

1023–1030

Should this be committed for development?

sfantao updated this revision to Diff 41091.Nov 24 2015, 3:12 PM

Rewrite a few loops to use auto and iterate on ranges. Add hint entries to cmake for the detection of libffi following issue reported by Jonas Hahnfeld.

Hi Jonas,

Thanks for reviewing the patch!

libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake
63–78

I was using the same logic I used for libelf and that was working well in the environments I tested this with. I restored the hint based mechanism so your problem may now be solved. Let me know if not.

Thanks for reporting the issue.

libomptarget/src/omptarget.cpp
198–199

Done!

228–229

I am using auto now but have to keep the iterator so that the entry in the map can be removed in the body of the loop.

258–259

Done!

1023–1030

I've removed these two debug calls.

Hahnfeld added inline comments.Nov 25 2015, 12:11 AM
libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake
64–79

After looking at this quite some time and wondering why it didn't work: It's CMake, they use curly braces. With this small change everything builds correctly out-of-the-box, thanks!

sfantao updated this revision to Diff 41151.Nov 25 2015, 9:09 AM

Use curly braces in variables when looking for libffi.

Hi Jonas sorry for the mess up with the braces. It should be good now.

Thanks!
Samuel

libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake
64–79

Oh, right, sorry, my bad.... The new diff fixes it.

sfantao updated this revision to Diff 41337.Nov 28 2015, 3:51 PM

Remove considerations about the format (e.g. ELF) of the target binary. The format is only going to be evaluated by the plugins. This makes the target agnostic part of the library to be really target agnostic.

I just came across an interesting problem...

libomptarget/src/omptarget.cpp
724–727

I think this assumption may be wrong: The standard defines that each device has an initial data environment (section 1.4.2).

Therefore a #pragma omp target update may be a first (and valid) statement which means that it has to be init here as well. Currently the library will in such a case return at this point and ignore the data transfer.

OpenMP says : " If the corresponding list item is not present in the device data environment then no assignment occurs to or from the original list item."

Device.IsInit being false can be one's implementation as the list item not present in the device data environment.

Ravi

OpenMP says : " If the corresponding list item is not present in the device data environment then no assignment occurs to or from the original list item."

Device.IsInit being false can be one's implementation as the list item not present in the device data environment.

Ravi

Sorry for not directly posting the code in question that I had in mind:

#include <stdio.h>

#pragma omp declare target
int a = 1;
#pragma omp end declare target

int main(int argc, char* argv[]) {
	a = 2;
	#pragma omp target update to(a)

	#pragma omp target
	{
		printf("a = %d\n", a);
	}

	return 0;
}

What would the expected result be? However I just noticed that yesterday's simple init() didn't change it to 2 because the code and global variables are still only mapped in the first omp target...

sfantao added a subscriber: sfantao.Jan 7 2016, 5:26 AM

Hi Jonas,

I think you are correct, we need to register the device and initialize it
for #target update and #target data directives not only the #target
directive. I'll send a fix for that soon.

Thanks,
Samuel

From: Jonas Hahnfeld <Hahnfeld@itc.rwth-aachen.de>
To: Samuel F Antao/Watson/IBM@IBMUS, hfinkel@anl.gov,

jonathan.l.peyton@intel.com, james.h.cownie@intel.com,
a.bataev@hotmail.com, chandlerc@gmail.com,
ravi.narayanaswamy@intel.com

Cc: f.brygidyn@samsung.com, Alexandre

Eichenberger/Watson/IBM@IBMUS, openmp-commits@lists.llvm.org,
Hahnfeld@itc.rwth-aachen.de, hyviquel@gmail.com, Kevin K
O'Brien/Watson/IBM@IBMUS, Carlo Bertolli/Watson/IBM@IBMUS,
andreybokhanko@gmail.com, sergos.gnu@gmail.com

Date: 01/07/2016 02:53 AM
Subject: Re: [PATCH] D14031: [OpenMP] Initial implementation of OpenMP

offloading library - libomptarget.

Hahnfeld added a comment.

In http://reviews.llvm.org/D14031#320528, @RaviNarayanaswamy wrote:

OpenMP says : " If the corresponding list item is not present in the

device data environment then no assignment occurs to or from the original
list item."

Device.IsInit being false can be one's implementation as the list item

not present in the device data environment.

Ravi

Sorry for not directly posting the code in question that I had in mind:

#include <stdio.h>

#pragma omp declare target
int a = 1;
#pragma omp end declare target

int main(int argc, char* argv[]) {
		 a = 2;
		 #pragma omp target update to(a)

		 #pragma omp target
		 {
		 		 printf("a = %d\n", a);
		 }

		 return 0;
}

What would the expected result be? However I just noticed that yesterday's
simple init() didn't change it to 2 because the code and global
variables are still only mapped in the first omp target...

http://reviews.llvm.org/D14031

Below some comments on the tests

libomptarget/test/CMakeLists.txt
29–35

Maybe we could later on think about sharing some testing setup between libomp and libomptarget...

65–68

I think this can safely assume the relative location of ../runtime/src

libomptarget/test/offloading/offloading_success.c
10

Maybe make this isHost = 1. clang 3.7.1 seems to completely ignore the target code, so omp_is_initial_device is never called...

libomptarget/test/offloading/offloading_success.cpp
10

likewise

jprice added a subscriber: jprice.Feb 5 2016, 9:18 AM
jbeyer added a subscriber: jbeyer.Feb 11 2016, 11:21 AM

Please take this comment as more a nitpick instead of having tested it

In general I see a lot of really dirty code - Why so many casts? What makes "long long" universally correct and portable.. are there standard types we can use to represent pointers?

https://docs.oracle.com/cd/E19683-01/806-6543/chapter3-10/index.html
http://stackoverflow.com/questions/25211015/using-x-to-print-the-hex-address-contained-in-a-pointer

uintptr_t
printf("%" PRIxPTR "\n", (uintptr_t) p);


Is it really a good idea to have two functions which are almost the same, but one is blocking and the other isn't?

tgt_target_data_begin_nowait vs tgt_target_data_begin

Would it not be "better" or more clean for it to take a parameter?

libomptarget/src/omptarget.cpp
349

Instead of casting to long long and friends - what about making the type uintptr_t ?

mkuron added a subscriber: mkuron.Mar 19 2016, 1:14 AM
sergos added inline comments.Mar 23 2016, 6:08 AM
libomptarget/src/omptarget.cpp
724–727

not quite relevant. the initial data environment should be initialized at the point of __tgt_register_lib() call that should prepend any data/control transfer to/from target. that's why we can say the target image was not initialized yet.

Hahnfeld added inline comments.Mar 23 2016, 7:39 AM
libomptarget/src/omptarget.cpp
724–727

I think the plan was to lazily initialize the device once it is really needed.

I support this because there may be more libs registered than actually used. So it would be wasted to initialize all of their devices and transfer data to them.

sfantao added inline comments.Mar 23 2016, 7:48 AM
libomptarget/src/omptarget.cpp
724–727

I agree with Jonas, we should allow the device to be initialized in target update so that we only copy the data for that device and not other devices that may not be used at all by the application. I'll fix this.

jleidel added a subscriber: jleidel.Apr 1 2016, 7:30 AM

This looks interesting. Will it ever be applied upstream?

hyviquel added inline comments.Sep 30 2016, 9:57 AM
libomptarget/src/omptarget.cpp
930

Why the offloading is continuing if the mapping of data already failed...
target_data_begin could return OFFLOAD_FAIL or OFFLOAD_SUCCESS to allow the offloading to stop earlier

981

you return OFFLOAD_SUCCESS even if something went wrong in target_data_end

grokos added a subscriber: grokos.Nov 29 2016, 3:39 PM
grokos commandeered this revision.Nov 30 2016, 2:18 PM
grokos added a reviewer: sfantao.
grokos added inline comments.
libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake
64–79

So I'm marking this issue as "done".

libomptarget/src/omptarget.cpp
349

Done in the new diff.

724–727

Done in new diff.

930

I fixed that behavior in the new diff. If target_data_begin fails, the offloading stops.

981

Fixed.

libomptarget/test/CMakeLists.txt
29–35

Sure, let's keep this on our list for future implementation.

65–68

Actually, it's ../../runtime/src. I've hardcoded it as:

${CMAKE_CURRENT_BINARY_DIR}/../../runtime/src

libomptarget/test/offloading/offloading_success.c
10

With the current version of clang this should work. Let me know if I'm mistaken and I'll push another patch.

libomptarget/test/offloading/offloading_success.cpp
10

Same here...

grokos updated this revision to Diff 79814.Nov 30 2016, 2:25 PM

Addressed comments from previous revision plus:

  1. support for OpenMP 4.5 target map types
  2. implementation of target API functions (omp_target_*)
  3. protection of internal data structures against data races

This diff looks wrong, maybe it shows the changes compared to the last revision?

grokos updated this revision to Diff 79982.Dec 1 2016, 2:20 PM

My bad, I've upload the correct diff.

grokos marked 20 inline comments as done.Dec 1 2016, 2:33 PM
grokos added a reviewer: jhen.Dec 1 2016, 2:44 PM
hyviquel accepted this revision.Dec 2 2016, 4:26 AM
hyviquel edited edge metadata.
This revision is now accepted and ready to land.Dec 2 2016, 4:26 AM
Hahnfeld requested changes to this revision.Dec 3 2016, 9:14 AM

Full review and comments all over

libomptarget/Build_With_CMake.txt
102–115

This should probably go under NVPTX device RTL specific

libomptarget/CMakeLists.txt
96–97

There is LLVM_LIBDIR_SUFFIX which should probably be respected (compare to LIBOMP_LIBDIR_SUFFIX)

106–108

These are not part of this patch and should therefore not be added here

libomptarget/README.txt
60

Is that true? I remember there were some changes to the section names in the fatbinary?

libomptarget/src/omptarget.cpp
122–130

Could this reuse the assignment operator? Or otherwise do this in the initializer list

255–257

Please move this up, directly below struct DeviceTy

283

This is not in the standard, should this really start with OMP_?

284–285

can be collapsed

421–422

can be collapsed

494

Do we want to check device_is_ready here?

508–511

Should we have a list of all allocated addresses above?

513

Do we want to check device_is_ready here?

532–533

can be collapsed

538–539

likewise

910

bool ForceDelete?

1106

if no RTL was found?

1112–1136

Can't this be done immediately after the corresponding RTL was found the same way we register the image into the translation table? Maybe also refactor the code into another static function?

1114

I think this should either be Devices[FoundRTL->Idx + i] or *FoundRTL->Devices[i]?

1177

I think this should either be Devices[FoundRTL->Idx + i] or *FoundRTL->Devices[i]?

1193–1205

I think this is only needed once per desc and not per img, isn't it?

1354–1355

The Device should only be fetched after the device_id has been proved valid

1357–1377

This looks the same as device_is_ready, can this be reused?

1383–1384

can be collapsed

1393–1395

Backwards-compatible on the first check-in? That sounds weird...

2116

break afterwards?

2131

likewise?

libomptarget/src/omptarget.h
86

Maybe rename this to NumDeviceImages?

88–89

Can those be renamed to have Host in the same so that I'm not that much confused? :D

166

Only as data_end, not as data_update?

libomptarget/test/CMakeLists.txt
70–76
  1. This will abort CMake which probably isn't a good idea.
  2. MSVC is possible for standalone builds?
This revision now requires changes to proceed.Dec 3 2016, 9:14 AM
sfantao edited edge metadata.EditedJan 5 2017, 6:22 AM

George,

Thanks for the patch. Here are few comments following Jonas review.

libomptarget/CMakeLists.txt
38

Following Jonas suggestion, we can select the install suffix here like libomp does. Something like:

if(${LIBOMPTARGET_STANDALONE_BUILD})
  set(LIBOMPTARGET_ENABLE_WERROR FALSE CACHE BOOL
    "Enable -Werror flags to turn warnings into errors for supporting compilers.")
  # CMAKE_BUILD_TYPE was not defined, set default to Release
  if(NOT CMAKE_BUILD_TYPE)
    set(CMAKE_BUILD_TYPE Release)
  endif()
  set(LIBOMPTARGET_LIBDIR_SUFFIX "" CACHE STRING
    "suffix of lib installation directory e.g., 64 => lib64")
else()
  set(LIBOMPTARGET_ENABLE_WERROR ${LLVM_ENABLE_WERROR})
  # If building in tree, we honor the same install suffix LLVM uses.
  set(LIBOMPTARGET_LIBDIR_SUFFIX ${LLVM_LIBDIR_SUFFIX})
endif()

Note that we have LIBOMP_ENABLE_WERROR, that should be LIBOMPTARGET_ENABLE_WERROR.

96–97

Right, following what we selected before we can use lib${LIBOMPTARGET_LIBDIR_SUFFIX} instead of just lib.

libomptarget/README.txt
60

Good catch. The library is compatible with what is https://github.com/clang-ykt, we should probably use that instead. Unlike http://clang-omp.github.io/, it contains the latest Sema/CodeGen reimplementation that has been happening in the trunk.

Also, as Jonas mentioned, some section naming is different and the offload entries descriptors contain more information.

62

This is only true with the change in https://reviews.llvm.org/D28298 we need to make sure that is committed first.

libomptarget/test/CMakeLists.txt
70–76
  1. Ok, we should probably replace this by a warning instead of using an error.
  1. It should be possible. The reason of the error/warning is that it is untested. We try not to do anything that precludes MSVC builds - we also try to pave the way for that to happen by preparing some MSVC options as currently used by other components of LLVM and libomp. However, the goal of our contribution is to add support for linux machines which are the machines we and our users have access to. This is true for this library but is also true for the compiler support - it will need tuning/features to fully work on Windows. We expect someone with interest in having support for Windows to complete the work by contributing the features and testing infrastructure for that.
grokos updated this revision to Diff 84399.Jan 13 2017, 3:14 PM
grokos edited edge metadata.
grokos marked 31 inline comments as done.

Addressed more comments, awaiting new review.

libomptarget/Build_With_CMake.txt
102–115

We only have one instruction file for the whole library. After all, cmake is invoked once from libomptarget's root directory, all -D definitions are passed to the "root" cmake.

I am happy to revise this scheme if you think it makes more sense to split the instructions, although I strongly prefer the current implementation.

libomptarget/CMakeLists.txt
96–97

OK, I incorporated the changes in the new diff. Thanks for the suggestion!

106–108

OK, I'll make them part of the corresponding patches for plugins and RTLs.

libomptarget/README.txt
60

I updated the documentation in the new diff.

libomptarget/src/omptarget.cpp
122–130

It's not efficient to reuse the assignment operator. I've done this in the initializer list instead.

255–257

Done in new diff.

283

Our buildbot already uses this env var name, but I am happy to change it to whatever name you propose. However, I don't see any problem with OMP_, especially if there is any chance that future revisions of the standard introduce such an env var.

421–422

Done.

494

Right, thanks for pointing it out!

508–511

No, omp_target_is_present checks whether some host address has corresponding storage on the device, it's got nothing to do with memory allocated via omp_target_alloc. So, in order to check whether a host address has been mapped we just use the device's HostDataToTargetMap.

513

It's not necessary. omp_target_is_present only accesses the device's HostDataToTargetMap to determine whether an address has been mapped. If the device is not initialized, HostDataToTargetMap has zero contents and omp_target_is_present correctly returns false. The function does not make any call to RTL functions (like data_alloc or data_free), so even if the device has not been initialized there is no problem.

532–533

Done.

538–539

Done.

910

I've changed most instances where long served as a bool, i.e. variables IsNew, IsLast, Forcedelete, IsImplicit etc.

I also realized that long was used for array sizes, whereas the correct datatype should be int64_t (that's the datatype used by clang to communicate with the interface functions). I fixed that too.

1106

Thanks for spotting this.

1112–1136

Done. I refactored the code into a new static function which is called right after registering the image into the translation table.

1114

Correct, fixed in new diff.

1177

Fixed.

1193–1205

Correct.

1354–1355

Fixed.

1357–1377

Correct, I've reused the former function in the new diff.

1383–1384

Done.

1393–1395

Backwards compatible with internal, unreleased versions of clang. Oh well....

2116

Thanks for spotting this, obviously if offload fails at some point there is no need to continue with the loop.

2131

Done.

libomptarget/src/omptarget.h
86

You're right, some variable names caused confusion, I've changed them :)

88–89

Done.

166

Right, it's only data_end.

libomptarget/test/CMakeLists.txt
70–76

OK, I replaced the fatal error with a warning.

Only a minor comment for the code. And I think you may have missed some changes for the uploaded diff?

libomptarget/Build_With_CMake.txt
102–115

I just meant that these options should be moved a few lines below so that they are listed under the secion NVPTX device RTL specific in this document

libomptarget/CMakeLists.txt
38

LIBOMP_ENABLE_WERROR is not yet fixed

96–97

I don't see the suffix here, maybe you forgot to add some changes to this file?

libomptarget/README.txt
60

I don't see that neither

libomptarget/src/omptarget.cpp
283

Ok, let's keep it like that. But if that ever happens, we can't ensure any backwards compatibility...

513

Alright. I think we should then check for device_num < Devices.size() because operator [] will not perform any bounds check. Just in case the user supplies an invalid parameter?

1376–1377

I think this has not yet been collapsed...

1393–1395

Hmm, when is that planned for removal then?

libomptarget/test/CMakeLists.txt
70–76

I still see libomptarget_error_say here...

grokos marked 7 inline comments as done.Jan 16 2017, 5:12 PM

Jonas, thanks for the comments once again. It seems I had forgotten to add the CMakeLists files to the git commit. Here is the latest revision of the patch.

libomptarget/Build_With_CMake.txt
102–115

OK, I moved them.

libomptarget/src/omptarget.cpp
513

Good catch! I added a check.

1376–1377

Done.

1393–1395

It will be removed once clang starts using internally the new map types. As far as I know, we haven't started any work on that yet. The async interface is our next priority, upgrading clang to use the new map types will come afterwards.

grokos updated this revision to Diff 84614.Jan 16 2017, 5:15 PM
grokos marked 3 inline comments as done.

Took care of the last few comments.

Hahnfeld accepted this revision.Jan 17 2017, 12:09 AM

LGTM, thanks for your patience!

This revision is now accepted and ready to land.Jan 17 2017, 12:09 AM
This revision was automatically updated to reflect the committed changes.