Page MenuHomePhabricator

[OpenMP][libomptarget] New plugin infrastructure and new CUDA plugin
ClosedPublic

Authored by kevinsala on Sep 21 2022, 4:10 PM.

Details

Summary

This patch adds a new infrastructure for OpenMP target plugins. It also implements the CUDA and GenericELF64bit plugins under this new infrastructure. We place the sources in a separate directory named plugins-nextgen, and we build the new plugins as different plugin libraries. The original plugins, which remain untouched, will be used by default. However, the user can change this behavior at run-time through the boolean envar LIBOMPTARGET_NEXTGEN_PLUGINS. If enabled, the libomptarget will try to load the NextGen version of each plugin, falling back to the original if they are not present or valid.

The idea of this new plugin infrastructure is to implement the common parts of target plugins in generic classes (defined in files inside plugins-next/common/PluginInterface folder), and then, each specific plugin defines its own specific classes inheriting from the common ones. In this way, most logic remains on the common interface while reducing the plugin-specific source code. It is also beneficial in the sense that now most code and behavior are the same across the different plugins. As an example, we define classes for a plugin, a device, a device image, a stream manager, etc. The plugin object (a single instance per plugin library) holds different device objects (i.e., one per available device), while these latter are the responsible for managing its own resources.

Most code on this patch is based on the changes made by @jdoerfert (Johannes Doerfert)

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
kevinsala added inline comments.Sep 23 2022, 12:21 PM
openmp/libomptarget/plugins-nextgen/cuda/dynamic_cuda/cuda.cpp
117 ↗(On Diff #462033)

Maybe we can reuse the same file from the original plugin so we don't need to apply fixes to both files.

kevinsala updated this revision to Diff 463406.Sep 27 2022, 8:03 PM

This update applies the reviewers' fixes and some other minor fixes. The main change is that now it uses the TgtError class that inherits from llvm::Error for returning and handling code errors.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/GlobalHandler.h
42

Do you want to change it to StringRef for performance reasons (i.e., avoid the std::string's dynamic memory)?

Using StringRef here could be dangerous since we have no control on the string memory lifetime. This can be dangerous if a developer constructs a string on the stack, creates a GlobalTy variable, and passes that string as the name (e.g., as we do for the exec_mode). Thus, we would be adding an extra restriction on GlobalTy, where the user must guarantee that the string name has a longer lifetime than the GlobalTy object.

I would keep it as it is now, and if we see there is a performance penalty on using std::string here, we can change it.

jhuber6 added inline comments.Sep 27 2022, 8:09 PM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/GlobalHandler.h
42

This is a huge patch so I can't really get a good view of the usage, but I figured that this will always point to a global inside the omp_offloading_entires section, which will guarantee we always have access to this memory as it's just a string constant in the binary itself. Does the new plugin interface add new constants not contained in the entry list?

When you update patches, make sure the diff is based off of the current state of upstream. Otherwise it's difficult to tell what is actually being added. In practice this is usually just git commit --amend when you change something.

openmp/libomptarget/include/Utilities.h
36

Does this really add enough functionality to justify a new polymorphic type? All this seems to do is turn toString and consumeError into member functions rather than a free function.

kevinsala added inline comments.Sep 27 2022, 8:26 PM
openmp/libomptarget/include/Utilities.h
36

This is mainly for these two reasons:

  1. TgtError Err creates a checked success. I didn't find an easy way to do it with llvm::Error: a) the default ctor is protected and b) Error Err = Error::success() is a success but it must be checked. Is there an easy way to achieve the same?
  2. consumeand consumeString is an attempt to hide some syntax that does't add much information to the code reader.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/GlobalHandler.h
42

There is an example of the usage in line 326 in openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp. In that case, the global name is constructed from the kernel name + "_exec_mode", it's just a temporary string, and then it's safely copied to the GlobalTy's Name member. We could move the string construction outside the GlobalTy constructor, and pass its StringRef to the ctor, but it seems dangerous from my point of view. It wouldn't be difficult to break this code at the slightest distraction.

Drive by comments, @jhuber6 and @tianshilei1992 should give more input. I want this to eventually get in and I'll look over at some point. Before let's get as much of it right as possible. We'll have time to adjust later too.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/GlobalHandler.cpp
38–40
45

Add messages for each assert.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/GlobalHandler.h
42

I doubt this is helpful wrt performance. I also doubt we ever would use "stack" (or short lifetime) strings. Both of this said, (and thinking I might have place the "NOTE" there in the first place), I'd keep it as is, this will never hit our profiler either way, but this is the "conservative" version" after all.

106

In general I'm with @jhuber6 here. If we don't really think we need persistent iterators we should not add expectations.

@jhuber6 @tianshilei1992 Can we merge this and address changes in-tree?

openmp/libomptarget/include/Utilities.h
58

Moving *this, couldn't that invalidate *this? If we know this is save, add a comment about it.

Looks okay in general. Please address some of the remaining comments or mark them as done.

openmp/libomptarget/include/Utilities.h
36

Needing to check successes is a design feature, generally you use llvm::Error.

if (Error E = canError())
  handleError(E);

I'm guessing this also skirts around the [[nodiscard]] that llvm::Error uses? If possible I'd like to retain those semantics. For a success, it only needs to be converted to bool to be checked. consumeError in general is a hack so I wouldn't encourage it if possible.

167

Given that this function can fail, I would prefer using LLVM's style for classes that can return errors https://llvm.org/docs/ProgrammersManual.html#fallible-constructors. Essentially, construction is done from a static function that calls a private constructor. I'm not sure how much more effort this will be. Ideally we should have each function return an Error or Expected and then finally handle / check that at the exported library functions.

Sorry if I'm being obsessive with this, one of my grievances with libomptarget is that it didn't follow the LLVM styles so I figure that now is a good chance to change that.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/GlobalHandler.cpp
57

Again, these should be changed to an error or debug message. Info is for user diagnostics during runtime.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
173–175

We should be able to use C++17's structured bindings.

177

We should probably just use a unique_ptr here to avoid the explicit new and delete.

551

Nit. can remove a lot of these braces below.

openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
246

A lot of these can be a single if (Err = ... like above.

kevinsala added inline comments.Oct 14 2022, 12:57 PM
openmp/libomptarget/include/Utilities.h
36

I'm removing the TgtError class and using llvm::Error directly. In the following case (it's just an example among others), the usual procedure should be calling handleError(std::move(Err)) just after creating the temporary Err instance (not actually used)? Or is there another cleaner way?

Error Err = Error::success();
// handleError(std::move(Err));
                                                                               
// Transfer the data from the source to the destination.                       
if (Device2Host)                                                               
  Err = Device.dataRetrieve(HostGlobal.getPtr(), DeviceGlobal.getPtr(),        
                            HostGlobal.getSize(), nullptr);                    
else                                                                           
  Err = Device.dataSubmit(DeviceGlobal.getPtr(), HostGlobal.getPtr(),          
                          HostGlobal.getSize(), nullptr);                      
                                                                               
DP("%s %s %u bytes associated with global symbol '%s' %s the device "          
   "(%p -> %p).\n", (Err) ? "Failed to" : "Successfully",                      
   Device2Host ? "read" : "write", HostGlobal.getSize(),                       
   HostGlobal.getName().data(), Device2Host ? "from" : "to",                   
   DeviceGlobal.getPtr(), HostGlobal.getPtr());                                
                                                                               
return Err;
jhuber6 added inline comments.Oct 14 2022, 1:10 PM
openmp/libomptarget/include/Utilities.h
36

I would probably write this more like the following:

// Transfer the data from the source to the destination.                       
if (Device2Host)                                                               
  if (llvm::Error Err = Device.dataRetrieve(HostGlobal.getPtr(), DeviceGlobal.getPtr(),        
                                                                     HostGlobal.getSize(), nullptr))
    return Err;                   
else if (llvm::Error Err = Device.dataSubmit(DeviceGlobal.getPtr(), HostGlobal.getPtr(),          
                                                                         HostGlobal.getSize(), nullptr);                      
    return Err;                                                                                 
DP("%s %s %u bytes associated with global symbol '%s' %s the device "          
   "(%p -> %p).\n", "Successfully",                      
   Device2Host ? "read" : "write", HostGlobal.getSize(),                       
   HostGlobal.getName().data(), Device2Host ? "from" : "to",                   
   DeviceGlobal.getPtr(), HostGlobal.getPtr());                                
                                                                               
return Error::Success();

Generally, llvm::Error is used to indicate an unrecoverable problem. When we encounter one we immediately return and then the top of the stack will handle and report the error. For libomptarget plugins this will probably involve the __tgt_rtl_ functions printing the contents of the message via toString.

Occasionally you can ignore errors, but that should be reserved for cases where you know that you can continue without it (We do this in a few places).

jdoerfert added inline comments.Oct 14 2022, 1:49 PM
openmp/libomptarget/include/Utilities.h
36

I'm OK with using auto Err = in such conditionals. The type is clear.

kevinsala updated this revision to Diff 468213.Oct 17 2022, 8:30 AM
kevinsala set the repository for this revision to rG LLVM Github Monorepo.

This update fixes the comments of the reviewers. The major changes with respect to the latest version is the removal of TgtError class and use of llvm::Error directly. It also replaces the std::unordered_map with llvm::DenseMap and removes the map's lock too. Additionally, the patch adds a few changes needed by the future "nextgen" AMD plugin, now under development.

If these changes are fine by the reviewers, I'll update the patch with the rebased version, so it can be merged.

kevinsala marked an inline comment as done.Oct 17 2022, 8:31 AM
kevinsala added inline comments.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/GlobalHandler.h
106

I've switched to llvm::DenseMap and removed the map lock.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
177

We need to destroy the memory manager in GenericDeviceTy::deinit() in a specific order with respect to other device components. The memory manager will try to deallocate its allocations and we need the device resources still alive to do that. I left a comment in line 182 explaining that issue.

kevinsala marked 32 inline comments as done.Oct 17 2022, 8:36 AM
jhuber6 added a comment.EditedOct 17 2022, 12:42 PM

Thanks for the changes, can you update the diff to show the difference against upstream?
Scratch that, seems I had an older one in my viewer.

kevinsala updated this revision to Diff 468543.Oct 18 2022, 8:00 AM

This is updated version, rebased on the recent commit 7caae244730eb49b9e3f632225f53902fd956940.

kevinsala updated this revision to Diff 468749.Oct 18 2022, 5:13 PM
kevinsala set the repository for this revision to rG LLVM Github Monorepo.

Updating the dynamic_cuda/cuda.h file to define CUDA_ERROR_TOO_MANY_PEERS and CU_DEVICE_INVALID. The building was failing when no CUDA installation is provided.

jhuber6 accepted this revision.Oct 20 2022, 12:25 PM

LG overall with a few more nits.

I tested the x86_64 plugins locally and it works well. I really like how much cleaner the interfaces and error handling are overall. Thanks for sticking with this.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
163

Why do we use int32_t for the loop trip count when the interface uses uint64_t? I think it's totally possible someone would have a trip-count > 3 billion or so.

openmp/libomptarget/plugins-nextgen/cuda/CMakeLists.txt
93

Just FYI, I made a change to this on all the plugins in D136365 that should be copied here. This prevents the plugins from having their symbols preempted by other plugins. This is the safest approach as each plugin defines the same symbols but with different values.

openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
358

if the debugging message fires we move the Error twice here.

453–478

We should probably just create a new Error for each line here. Either that or replace it with if ((Err = setContext())) or else the linters will complain.

863–867

Nit: no else after return.

openmp/libomptarget/plugins-nextgen/exports
4–38

I changes the exports file upstream recently to use a wildcard. It should be easier this way.

openmp/libomptarget/plugins-nextgen/generic-elf-64bit/src/rtl.cpp
29

In the previous implementation, we used some definitions so that the debug messages indicated which plugin was executing. Right now I just see messages like this.

PluginInterface --> Launching kernel __omp_offloading_10302_4741dfd_main_l5 with 1 blocks and 1 threads in Generic mode

I'm thinking we should be able to add a name for each subclass of the generic plugin to use for the debugging messages. I think it would also be nice to move away from the DP( macros if possible.

This isn't really high priority so I wouldn't worry about it for now.

31

Unrelated, but does anyone know why we still set this?

This revision is now accepted and ready to land.Oct 20 2022, 12:25 PM
kevinsala added inline comments.Oct 22 2022, 5:58 PM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
163

The loop trip count can decide the number of blocks/groups launched by a kernel. In CUDA and HSA, these parameters are unsigned intand uint32_t, respectively, and they are actually limited by the corresponding max blocks/groups properties of the device. So, although the program can generate a larger loop trip count, we will have to adapt the value at some point before the kernel launch. In this patch, we adapt the loop trip count to be the minimum between the original count and the int32_t's maximum. The original plugins just cast the original uint64_t value to unsigned int/int.

Maybe we could continue using the method of this patch but emitting a debug message to notify the user that the loop trip count was changed.

This version fixes the last issues commented by @jhuber6.

kevinsala marked 4 inline comments as done.Oct 23 2022, 10:35 PM

I haven't reviewed this but the following comment stuck out:

where some threads may be consulting an already inserted ELFObjectFile (without holding the map lock) while another potential thread may be inserting a new one (with the lock acquired).

It's never safe to have one thread reading from a hashtable while another writes to it. Even if the keys were always different there is table resizing and hash collisions to worry about.

Either we could use a persistent structure and CAS the new one in place (bit unusual for this codebase) or a single writer multi reader lock, such that multiple readers can make progress at a time but the whole world grinds to a halt for each writer.

I haven't reviewed this but the following comment stuck out:

where some threads may be consulting an already inserted ELFObjectFile (without holding the map lock) while another potential thread may be inserting a new one (with the lock acquired).

It's never safe to have one thread reading from a hashtable while another writes to it. Even if the keys were always different there is table resizing and hash collisions to worry about.

Either we could use a persistent structure and CAS the new one in place (bit unusual for this codebase) or a single writer multi reader lock, such that multiple readers can make progress at a time but the whole world grinds to a halt for each writer.

My comment about multiple threads reading the ELF object map was referring to my initial version. In that version, the std::unordered_map was protected with a mutex for calling find and insert operations (could have been improved with a readers-writers lock). Once the reference to a map's element is retrieved, the threads (i.e. mutex no longer acquired) should be able to read safely the element at the same time another thread is inserting to the map (i.e. mutex acquired). However, it seems that the synchronization was not needed since the loading of images (where we actually access the ELF object map) is already serialized, so we removed the lock and switched to llvm::DenseMap.

kevinsala added inline comments.Oct 24 2022, 7:40 AM
openmp/libomptarget/plugins-nextgen/generic-elf-64bit/src/rtl.cpp
29

I agree. That's something I had as TODO, but as you commented, it's not a high priority right now. We could probably switch to C++ templated functions for debug/info/report, automatically adding the prefix of the running plugin, and stop using the TARGET_NAME and DEBUG_PREFIX macros in these new plugins.

jhuber6 added inline comments.Oct 24 2022, 10:06 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
163

I think it would be better to do the narrowing in the implementation rather than the interface. That way we don't need to do this cast in every plugin and can have a single comment explaining why.

kevinsala set the repository for this revision to rG LLVM Github Monorepo.

This version fixes the types of some variables, such as LoopTripCount.

kevinsala marked an inline comment as done.Oct 24 2022, 11:35 PM
kevinsala added inline comments.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
163

I agree it's better to use uint64_t directly for the loop trip count. I fixed that in the last version. I also changed some other fields to be unsigned (num threads, dynamic memory size, etc).

jdoerfert accepted this revision.Oct 25 2022, 11:12 PM

LG, it might be easier if @jhuber6 could commit this for @kevinsala. We also need to get you commit access though.

LG, it might be easier if @jhuber6 could commit this for @kevinsala. We also need to get you commit access though.

Sure.

kevinsala updated this revision to Diff 470827.Oct 26 2022, 9:05 AM
kevinsala marked an inline comment as done.

This version has been rebased onto a recent main branch commit. Also, it fixes an error in GenericDeviceTy::dataAlloc that produced two memory allocations per dataAlloc call. Seems that I introduced that bug in one of the intermediate revisions.

If there are no more comments, I guess we can already merge it. I'd appreciate if someone can push it to the repo for me. Thanks for the reviews!

kevinsala marked 2 inline comments as done.Oct 26 2022, 9:10 AM

The diff doesn't apply, a few of the indexes seem to be wrong. Can you create it using git diff HEAD~1 -U999999 from the root directory?

kevinsala updated this revision to Diff 471194.Oct 27 2022, 9:34 AM

@jhuber6 This is the patch generated as you commented. I tried to apply to main and applied with no conflicts. If it reports any error, please share the output and I'll fix it. Thanks!

@jhuber6 This is the patch generated as you commented. I tried to apply to main and applied with no conflicts. If it reports any error, please share the output and I'll fix it. Thanks!

I'm still getting errors for some reason, mostly it just saying files don't exist in the index like it's skipping one of the directories. not sure why.

$ cd ./llvm-project
$ wget https://reviews.llvm.org/file/data/nnntgb4sunusqtlkeodo/PHID-FILE-47setcy3uz3f4swq6nko/D134396_new_.diff
$ git apply https://reviews.llvm.org/file/data/nnntgb4sunusqtlkeodo/PHID-FILE-47setcy3uz3f4swq6nko/D134396_new_.diff --whitespace=fix
error: include/llvm/Frontend/OpenMP/OMPGridValues.h: No such file or directory
error: libomptarget/CMakeLists.txt: No such file or directory
error: libomptarget/include/rtl.h: No such file or directory
error: libomptarget/plugins/cuda/dynamic_cuda/cuda.h: No such file or directory
error: libomptarget/src/rtl.cpp: No such file or directory
error: libomptarget/test/lit.cfg: No such file or directory

Can you try just git format-patch HEAD~1 and email it to me at joseph.huber@amd.com?

This revision was landed with ongoing or failed builds.Oct 27 2022, 11:10 AM
This revision was automatically updated to reflect the committed changes.