This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget] Build cuda plugin without cuda installed locally
ClosedPublic

Authored by JonChesterfield on Jan 21 2021, 11:21 AM.

Details

Summary

[libomptarget] Build cuda plugin without cuda installed locally

Compiles a new file, plugins/cuda/dynamic_cuda/cuda.cpp, to an object file that exposes the same symbols that the plugin presently uses from libcuda. The object file contains dlopen of libcuda and cached dlsym calls. Also provides a cuda.h containing the subset that is used.

This lets the cmake file choose between the system cuda and a dlopen shim, with no changes to rtl.cpp.

The corresponding change to amdgpu is postponed until after a refactor of the plugin to reduce the size of the hsa.h stub required

Diff Detail

Event Timeline

JonChesterfield requested review of this revision.Jan 21 2021, 11:21 AM
Herald added a project: Restricted Project. · View Herald Transcript
openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp
87

cuInit is called by deviceRTL's global constructor, so might be fine as-is. I'm not sure we're destroying the cuda dynamic library at present

openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
14

Contents from D95104, with function pointer replaced with prototype.

Interesting. Should we do this instead of the pure macro stuff? Is this going to break if the argument type doesn't match the declaration parameter type, e.g.,

void f(int);

f(0.0);

Interesting. Should we do this instead of the pure macro stuff? Is this going to break if the argument type doesn't match the declaration parameter type, e.g.,

void f(int);

f(0.0);

void f(int); translates to void f(int x) { return some_trait::call(x); }, the generated function has the same type as the symbol it derived from. So I think calls all work out.

Just finished getting amdgpu to the point where it builds, interesting part renders as:

DLWRAP(hsa_amd_memory_fill, 3);
DLWRAP(hsa_amd_register_system_event_handler, 2);

DLWRAP_FINALIZE();

static const char *HSALib = "libhsa-runtime64.so";

hsa_status_t HSA_API hsa_init() {
  void *dynlib_handle = dlopen(HSALib, RTLD_NOW);
  if (!dynlib_handle) {
    return HSA_STATUS_ERROR;
  }

  size_t N = dlwrap::size();
  for (size_t i = 0; i < N; i++) {
    const char *sym = dlwrap::symbol(i);
    void *p = dlsym(dynlib_handle, sym);
    if (p == nullptr) {
      return HSA_STATUS_ERROR;
    }
    *dlwrap::pointer(i) = p;
  }

  return dlwrap_hsa_init();
}

hsa_status_t HSA_API hsa_shut_down() {
  hsa_status_t res = dlwrap_hsa_shut_down();

  void *dynlib_handle = dlopen(HSALib, RTLD_NOW);
  if (dynlib_handle) {
    dlclose(dynlib_handle);
    dlclose(dynlib_handle);
    return res;
  } else {
    return HSA_STATUS_ERROR;
  }
}

which means the dlopen'ed version behaves the same as the statically linked one, doing the setup & teardown through the same API hook, albeit with more failure modes.

  • fix copy/paste error, support higher arity functions
JonChesterfield edited the summary of this revision. (Show Details)Jan 21 2021, 3:38 PM
JonChesterfield edited the summary of this revision. (Show Details)
  • fix copy/paste error, support higher arity functions
  • rebase, tidy
JonChesterfield retitled this revision from [libomptarget] WIP build plugin without cuda.h to [libomptarget] Build cuda plugin without cuda installed locally.Jan 21 2021, 3:53 PM
JonChesterfield edited the summary of this revision. (Show Details)
jdoerfert added inline comments.Jan 21 2021, 9:36 PM
openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp
91

I would feel better if we

  1. use some mechanism to avoid racing on the checkForCUDA call, e.g., call once and a boolean global as in D95104.
  2. Don't overload cuInit but instead call it from the plugin constructor, or make it a constructor.

If 2) has inherent benefits we cannot get any other way, I guess we can keep it. I'm worried it might clash with a cuda otherwise linked into the application or confuse people, or something else bad.

openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
103

@tra suggested to name the declarations as they are actually named, e.g., _v2. I don't have strong feelings either way, this is kinda messy and weird no matter how we name them.

openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp
91

This is called once, from a global constructor, so is correct as-is. I'll add the locking now though so that can't cause trouble later.

The cuInit and other symbols here aren't visible outside of the plugin (there's an exports list applied when linking it). Otherwise yeah, that would be bad.

Doing the extra setup in this cuInit means the RTL.cpp doesn't know whether it's linked against cuda or against this shim. This way it is an implementation of cuda, with the implementation detail that it forwards everything to a .so with the right name.

I think that's much better than passing in a C macro or adding a weak function. It might be a surprising solution to the problem.

openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
103

Don't mind either way, will rebase this if RTL.cpp is changed to use the v2 functions. Change seems orthogonal.

openmp/libomptarget/plugins/cuda/CMakeLists.txt
25

This probably warrants a variable to force one choice or the other, even if cuda is available on the system.

Also, not sure what the default should be when either option is available.

Pretty much good to go. CMAKE variable, one potential change in the comment below, maybe name the declarations _v2.

openmp/libomptarget/plugins/cuda/CMakeLists.txt
25

+1 for variable, default to use installed CUDA for now.

openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp
64

Should we make this a global, or a CMAKE variable?

  • address review comments
JonChesterfield marked 2 inline comments as done.Jan 22 2021, 9:28 AM
JonChesterfield added inline comments.
openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp
64

Some user control over this is probably good, e.g. people might want to load a specific libcuda with several installed, and passing a path would allow that.

Simple approach is probably a macro that can be passed in from cmake to override the default

jdoerfert accepted this revision.Jan 22 2021, 12:13 PM

One nit wrt. _v2 not _v2, otherwise LGTM.

openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
103

I think the idea was to define it with the right name, thus _v2, but use it with the fake one, so not _v2.

This revision is now accepted and ready to land.Jan 22 2021, 12:13 PM
This revision was landed with ongoing or failed builds.Jan 22 2021, 4:15 PM
This revision was automatically updated to reflect the committed changes.
JonChesterfield marked an inline comment as done.