This is an archive of the discontinued LLVM Phabricator instance.

Support/CommandLine: replace argument mapping error with a warning
AbandonedPublic

Authored by labre on Jan 4 2023, 4:39 PM.

Details

Summary

This is a “fix something while breaking other stuff” kind of change. So it probably should not be merged. I’m bringing this up as a reminder that CommandLine needs a major rework.

On to the rationale:

The comment below that change seems to indicate, that CommandLine inconsistencies are strictly unrecoverable. However this is not the case, if two libraries link dynamically to the same libLLVM version. Applying this patch results in error-free operation of these cases. This most commonly occurs in mesa, if two libraries link to libLLVM, e.g. the Clover and RocM OpenCL backends or the radeonsi VAAPI driver, LLVM aborts the application using them.

Consequently, having both Clover and RocM installed leads to a complete breakage of OpenCL device enumeration, as outlined in the Gentoo Wiki. [1] There is also the existing use case of mpv with vapoursynth support, using a plugin like SVP [2] to post process hardware decoded video. Other use cases in that context come into mind, like video editing software.

So we have valid non-breaking use cases while for now a fatal error has been reported. Changing this into a warning definitely will break other things, so this can merely serve as a temporary solution until better and more specific error handling has been fleshed out.

Since this only is reproduced by the lib linked twice or more often, I struggle to create a regression test. Python is an interpreted language, so I can not influence the linkage and I doubt, that importing LLVM twice would cause that. I’d appreciate any hints on that. On the other hand I also do not expect this to be merged like this.

Finally, I should add, that I’m unable to convince Gentoo ebuilds to link against LLVM 16.x git, so I could only test this with 15.x git. However the patch applies to both branches, so this should not matter too much.

This probably should be discussed broader in a RFC, but I’m hesitating to create one, because I can’t really think of solutions, mostly, because I’m neither familiar with C++ nor LLVM. As an end user I can just suggest two things for the rework:

  1. (Somehow) detect and print clearly, when a LLVM version mismatch occurs.
  2. Print the library names linking against libLLVM.

This would greatly simplify handling these issues for distributors (and it would spared me from spending 3 days until I found the root cause), but it most likely does not solve the underlying problem.

[1]: https://wiki.gentoo.org/wiki/OpenCL#AMD
[2]: https://www.svp-team.com/wiki/SVP:Linux
Related bugs:
[3]: https://github.com/llvm/llvm-project/issues/23326
[4]: https://github.com/llvm/llvm-project/issues/29935

Diff Detail

Event Timeline

labre created this revision.Jan 4 2023, 4:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2023, 4:39 PM
labre requested review of this revision.Jan 4 2023, 4:39 PM
Anastasia requested changes to this revision.Jan 6 2023, 7:19 AM

This doesn't look like a solution. Can you please explain what problem are you trying to solve:

  1. Linking multiple LLVM libraries together.
  2. Linking multiple LLVM libraries with different LLVM versions?

Any log of error would be helpful too.

Generally this needs to be discussed in wider RFC on discorse or something. However I do remember that this issue came up before so it might be possible to revive some existing discussion...

This revision now requires changes to proceed.Jan 6 2023, 7:19 AM
labre added a comment.EditedJan 6 2023, 5:02 PM

Yes, of course. The problem encoutered is, that loading two libraries via an OpenCL ICD loader [1,2] linking to the same libLLVM triggers the CommandLine error exactly at the revised code lines. I suspect, that the general problem is an abort by libLLVM when loading more than one library linking to any version including the same version. The problem I’m trying to solve, is loading two libraries linking to the same version. This patch obviously does not restrict to loading just the same version, so it has to be extended in that regard.

It should be noted, that compiling exactly one of the libraries with the revised LLVM version removes the warning/error entirely, for instance if mesa is compiled with -Dgallium-opencl=yes (Clover) and an unpatched LLVM and RocM is compiled with the revised LLVM there will be no error or warning printed. So both versions have to be compiled either with or without the change for an error/warning to occur.

Mesa+RocM:

$ ldd /usr/lib64/libMesaOpenCL.so.1.0.0
...
            libLLVM-15.so => /usr/lib/llvm/15/lib64/libLLVM-15.so
...
ldd /usr/lib64/libamd_comgr.so.2.4.0
...
            libLLVM-15.so => /usr/lib/llvm/15/lib64/libLLVM-15.so
...

With revision:

$ clinfo
warning: mesa: CommandLine Error: Option 'h' registered more than once!
Number of platforms                               2
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 22.2.3
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP.dbg (3486.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback
  Platform Extensions function suffix             AMD
  Platform Host timer resolution                  1ns
...

Without revision:

$ clinfo
mesa: CommandLine Error: Option 'h' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
Aborted

The same applies to using hardware decoding via mpv with the Vapoursynth plugin SVP, which just loads OpenCL functions via the ICD loader. There is no static linking against libLLVM involved. Otherwise the revision could not have resolved it, because the statically linked libLLVM would have triggered the error instead.

$ ldd /usr/lib64/va/drivers/radeonsi_drv_video.so
...
libLLVM-15.so => /usr/lib/llvm/15/lib64/libLLVM-15.so
...

RocM linking see above.

With revision:

$ mpv https://www.youtube.com/watch?v=VbFqA9rvxPs
(+) Video --vid=1 (*) (vp9 3840x2160 23.976fps)
 (+) Audio --aid=1 --alang=eng (*) (opus 2ch 48000Hz)
File tags:
 Uploader: LLVM
 Channel_URL: https://www.youtube.com/channel/UCv2_41bSAa5Y_8BacJUZfjQ
Using hardware decoding (vaapi-copy).
AO: [alsa] 48000Hz stereo 2ch float
VO: [gpu] 3840x2160 nv12
[xrandr] output DisplayPort-0 mode=5120x1440 old rate=60.00 refresh rates =    119.97 + 120.05    96.04    72.00    60.00    50.00    48.01   100.00    60.00*   59.98    30.00    25.00    24.00    23.98  scale =  1
[xrandr] container fps is 23.976024627686Hz, for output DisplayPort-0 mode 5120x1440 the best fitting display fps rate is 96.04Hz
[autoconvert] Converting nv12 -> yuv420p
warning: mesa: CommandLine Error: Option 'h' registered more than once!
VO: [gpu] 3840x2160 yuv420p
AV: 00:00:03 / 00:45:44 (0%) A-V:  0.016 DS: 1.980/3 Dropped: 5 Cache: 5.9s/3MB

Exiting... (Quit)
[xrandr] switching output DisplayPort-0 that was set for replay to mode 5120x1440 at 96.04Hz and 1.0 scaling back to mode 5120x1440 with refresh rate 60.00Hz and 1 scaling

Without revision:

(+) Video --vid=1 (*) (vp9 3840x2160 23.976fps)
 (+) Audio --aid=1 --alang=eng (*) (opus 2ch 48000Hz)
File tags:
 Uploader: LLVM
 Channel_URL: https://www.youtube.com/channel/UCv2_41bSAa5Y_8BacJUZfjQ
Using hardware decoding (vaapi-copy).
AO: [alsa] 48000Hz stereo 2ch float
VO: [gpu] 3840x2160 nv12
[xrandr] output DisplayPort-0 mode=5120x1440 old rate=96.04 refresh rates =    119.97 + 120.05    96.04*   72.00    60.00    50.00    48.01   100.00    60.00    59.98    30.00    25.00    24.00    23.98  scale =  1
[xrandr] container fps is 23.976024627686Hz, for output DisplayPort-0 mode 5120x1440 the best fitting display fps rate is 96.04Hz
[autoconvert] Converting nv12 -> yuv420p
mesa: CommandLine Error: Option 'h' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
Aborted

I’ll open a RFC on that matter.

[1]: https://github.com/KhronosGroup/OpenCL-ICD-Loader
[2]: https://github.com/OCL-dev/ocl-icd

labre updated this revision to Diff 487048.Jan 6 2023, 10:17 PM

Patch seemed to not apply anymore against 16.x. Fixed that.

labre abandoned this revision.Jan 14 2023, 4:15 AM

The concrete issue could be tracked to a copy/paste error in RocM that registers an option explicitly without requirement and can be solved there.

CommandLine isolation (local options per library) would also remedy this issue and seems to be on the way for LLVM 16. Abandoning in favor of
https://reviews.llvm.org/D129129
https://reviews.llvm.org/D129134