This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget] Look for plugins next to libomptarget.so
AbandonedPublic

Authored by JonChesterfield on May 6 2021, 7:39 PM.

Details

Summary

Currently libomptarget.so finds plugins through LD_LIBRARY_PATH. This
patch changes the logic to look in the same directory as libomptarget.so.

This makes no change to current behaviour if libomptarget.so is found by
LD_LIBRARY_PATH (as the plugin would be found next to it the same way)
but enables finding libomptarget.so by --runpath instead.

It's not totally NFC in that someone might have moved one of the libraries
from the default location in the install or test tree, but in that case they're
debugging or doing their own packaging thing already so this shouldn't be
a hindrance.

Main benefit is that an executable which is linked with runpath to find
libomptarget.so, without relying on environment variables, will now be
able to find the plugins. A quirk of how dlopen works from shared libraries
prevents that working today.

Side benefit is increased certainty over which plugin is found from a given
libomptarget. We could make the dladdr lookup a hard failure if we prefer
total certainty.

Diff Detail

Event Timeline

JonChesterfield requested review of this revision.May 6 2021, 7:39 PM
JonChesterfield created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2021, 7:39 PM

See D73657 D87413 for similar approaches to the same problem. dladdr appears to be available on more systems than dlinfo and link.h, and means we don't need to know what libomptarget.so was called (e.g. if it has trailing .1 versioning).

The failure mode here is dladdr() returns zero, findLibomptargetDirectory returns "" and verbose_dlopen will be called twice on the same Name. That looks cleaner than the extra test for if dladdr succeeded.

JonChesterfield added inline comments.May 6 2021, 7:45 PM
openmp/libomptarget/src/rtl.cpp
72

Would rather use llvm filesystem here but that patch hasn't landed yet. Should we use std::filesystem?

I don't think dlopen works on windows anyway so one more unix convention in this file seems acceptable.

JonChesterfield edited the summary of this revision. (Show Details)May 25 2021, 3:17 PM

It turns out some systems have lots of LLVM compilers installed, and try to use LD_LIBRARY_PATH to control which one is picked up.

I can see zero advantage to accidentally using a plugin that comes from one compiler with a libomptarget from another one. It seems a direct path to confusion in debugging. If someone wants to assemble a mashup locally, they're still totally free to do so, using cp/ln etc.

I think we should always use the plugin which is located right next to libomptarget, and use the current machinery for finding a given libomptarget.

  • Simplify, rename variables
JonChesterfield edited the summary of this revision. (Show Details)Aug 26 2021, 9:12 AM
JonChesterfield edited the summary of this revision. (Show Details)
JonChesterfield edited the summary of this revision. (Show Details)
openmp/libomptarget/src/rtl.cpp
72

Tried it locally - turns out the support lib containing sys::path isn't linked into libomptarget at present, and changing to that + llvm's dynamic library makes the code quite a lot longer. Suggest we stick with raw dl calls as that's used elsewhere in the library.

97

If the dladdr failed (which I'm pretty sure it can't) this will gracefully fall back to the unadorned plugin name which will be found (or not) via LD_LIBRARY_PATH as before

105

Could skip the loading if the plugin path is empty. I can't think of an instance where dladdr on ourself will fail but it makes for clearer debugging if we load no plugin at all in such a case instead of finding some other one through library search paths.

  • raise dladdr failure to fatal

Reconsidered the merits of trying to continue with plugin loading if dladdr on self failed. That would be a symptom of something going badly wrong, in which case certainty over which plugin is loaded is a feature and going in search of some other one is a bug.

jdoerfert added inline comments.Aug 26 2021, 10:45 AM
openmp/libomptarget/src/rtl.cpp
107

This doesn't make sense now as it will fail to load plugins if we failed to determine the path, it also fails to prefer plugins on the user path. Or am I reading this wrong?

The phab comment no longer makes sense but the code is right I think. The semantic change here is to ignore the various dynamic search mechanisms when looking for the plugin in favour of using the plugin that is next to libomptarget. That means whatever method found libomptarget also specifies the plugin, even if it was one of the methods that doesn't work transitively.

If dladdr((void *)&__tgt_register_lib.. fails, this path gives up. I don't know of a reason why it would fail, but if something is going wrong there, trying to find some other plugin instead seems unlikely to help the user debug it.

The control flow would be slightly more obvious here if the loop was unswitched, but the indentation change will hurt readability of the diff. I'd like to put if (var == "") in front of the loop as a follow up NFC patch.

The phab comment no longer makes sense but the code is right I think. The semantic change here is to ignore the various dynamic search mechanisms when looking for the plugin in favour of using the plugin that is next to libomptarget. That means whatever method found libomptarget also specifies the plugin, even if it was one of the methods that doesn't work transitively.

I c. This needs discussion Wednesday though. I am not sure there is a problem changing the default behavior but I want to hear out others.

If dladdr((void *)&__tgt_register_lib.. fails, this path gives up. I don't know of a reason why it would fail, but if something is going wrong there, trying to find some other plugin instead seems unlikely to help the user debug it.

If this should not fail, please make it a proper error output. There is no point in it silently not finding plugins and people being all confused about it.

The control flow would be slightly more obvious here if the loop was unswitched, but the indentation change will hurt readability of the diff. I'd like to put if (var == "") in front of the loop as a follow up NFC patch.

The select in the loop is not necessary, if no path for libomptarget was found, error message and out.

I c. This needs discussion Wednesday though. I am not sure there is a problem changing the default behavior but I want to hear out others.

OK, Wednesday it is. The use case is essentially machines that have N llvm toolchains installed, some in system directories, some accessed by cluster module systems. As it stands it is really easy to have libomptarget.so pick up a totally unrelated plugin, and because it's via dlopen and the print just gives the library name, working out that this is the problem would take a while to debug. Probably at the point where people are running strace on their application.

Changing to this patch retains the current control over libomptarget.so but eliminates that failure mode.

If this should not fail, please make it a proper error output. There is no point in it silently not finding plugins and people being all confused about it.

Fair enough, will do so. I'll find an analogous case in libomptarget and copy that.

gregrodgers requested changes to this revision.Aug 26 2021, 1:19 PM
gregrodgers added inline comments.
openmp/libomptarget/src/rtl.cpp
107

I agree here. instead of "nullptr", it should be "dlopen(Name, RTLD_NOW)" This will use the LIBRARY_PATH search if findLibomptargetDirectory could not find the library.

I agree with the minor change in behavior to get the plugins from the same directory as libomptarget.so first and if not found (libomptarget is statically linked), then do the LIBRARY_PATH search.

This revision now requires changes to proceed.Aug 26 2021, 1:19 PM

The dlopen search path it tries instead may have no relation to how the application found libomptarget. It'll cheerfully use one from a totally unrelated toolchain. That seems unconditionally bad to me.

JonChesterfield abandoned this revision.Sep 1 2021, 10:49 AM

D109071 adds runpath=$ORIGIN to libomptarget.so, which means it'll look at LD_LIBRARY_PATH, and if that fails, use the one next to it. That's equivalent to the consensus above therefore abandoning this patch