This is an archive of the discontinued LLVM Phabricator instance.

[mlir][RunnerUtils] Make symbols private + implement loading mechanism.
ClosedPublic

Authored by ingomueller-net on Jun 19 2023, 12:36 AM.

Details

Summary

There are two ways to make symbols from a shared library visible in the
execution engine: exporting the symbols with public visibility or
implementing a loading/unloading mechansim that registers the exported
symbols explicitly. The latter has only been available in the JIT runner
until recently, but https://reviews.llvm.org/D153029 makes it available
in any usage of the execution engine (including the Python bindings).

This patch makes the runner utils library use the latter mechanism
instead of the former, i.e., it makes all of its symbols private and
implements the init/destroy functions of the loading mechanism to
control explicitly which symbols it registers.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJun 19 2023, 12:36 AM
ingomueller-net added a comment.EditedJun 19 2023, 12:37 AM

(This still fails in CI until the issues discussed in https://reviews.llvm.org/D153029 have been resolved.)

I am not sure whether this is actually a good idea. It makes it impossible to use the library with dlopen. Some of the integration tests of the sparse compile actually do that: they run their code through lli --dlopen=%mlir_runner_utils, so the loading mechanism of the execution engine isn't involved and the symbols don't get loaded/aren't visible to the code. Not sure how to proceed now.

(Rebasing on top of https://reviews.llvm.org/D153255 to have the visibility issues fixed.)

ingomueller-net published this revision for review.Jun 19 2023, 3:25 AM

(Rabasing onto main. I believe my previous rebase didn't result in https://reviews.llvm.org/D153255 being included, so CI failed.)

It makes it impossible to use the library with dlopen

I also makes it impossible to use in AOT mode by directly linking to it I think?

It makes it impossible to use the library with dlopen

I also makes it impossible to use in AOT mode by directly linking to it I think?

I thinks so too, yes.

It makes it impossible to use the library with dlopen

I also makes it impossible to use in AOT mode by directly linking to it I think?

I thinks so too, yes.

This makes me wonder whether the loading mechanism is such a good idea after all. I looked at D94270/D94312, which introduced them, and couldn't quite figure out why they were introduced. @ezhulenev: can you shed some light?

I don't quite get what is the problem you're trying to solve here actually?

I don't quite get what is the problem you're trying to solve here actually?

I don't understand why the async runtime lib uses one loading mechanism (explicit init/destroy functions) and the runner utils lib uses another (import all exported symbols via ORC). I thought that the init/destroy mechanism was preferred (hence https://reviews.llvm.org/D153009 was rejected), so I am trying to make things consistent with this patch by using the same mechanism for the runner lib. If that mechanism has problems, though, don't they apply for the async lib as well? If so, shouldn't we use the other mechanism there as well?

Why should the explicit loading mechanism imply to make all symbol private though?

Why should the explicit loading mechanism imply to make all symbol private though?

Ha, yes! I didn't see the obvious... Thanks for helping out ;) I'll prepare a new version...

Rebasing to main and making the symbols of the library public again (but keeping all unrelated symbols hidden).

mehdi_amini accepted this revision.Jun 20 2023, 7:27 AM
mehdi_amini added inline comments.
mlir/lib/ExecutionEngine/CMakeLists.txt
167

Are these CMake directives needed?

This revision is now accepted and ready to land.Jun 20 2023, 7:27 AM
mehdi_amini added inline comments.Jun 20 2023, 7:27 AM
mlir/lib/ExecutionEngine/CMakeLists.txt
167

(Nevermind this comment)

hctim added a subscriber: hctim.Jun 21 2023, 3:25 PM

Hi folks, I did a bisect and unfortunately this broke the HWASan buildbot. It only showed up on the build after yours because your patch was included in a set that was broken for other reasons. https://lab.llvm.org/buildbot/#/builders/236/builds/4911

I'll check tomorrow and see exactly why your patch is causing a problem, but dynamic loading can play interesting with HWAddressSanitizer, because we need to ensure that global variables have the correct memory tag applied, as the address tag is synthesized in the symbols in the DSO.

If you have access to an aarch64 machine and want to reproduce, please feel free. The fastest way is to use the instructions at https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild, the script that's of note here is zorg/buildbot/builders/sanitizers/buildbot_bootstrap_hwasan.sh. You may find commenting out check_stage1_hwasan, build_stage3_hwasan, and check_stage3_hwasan, as well as only running the MLIR tests in stage2 helpful:

diff --git a/zorg/buildbot/builders/sanitizers/buildbot_functions.sh b/zorg/buildbot/builders/sanitizers/buildbot_functions.sh
index 1b64a929..e2f96a09 100755
--- a/zorg/buildbot/builders/sanitizers/buildbot_functions.sh
+++ b/zorg/buildbot/builders/sanitizers/buildbot_functions.sh
@@ -388,7 +388,7 @@ function check_stage2 {
   fi

   echo @@@BUILD_STEP stage2/$sanitizer_name check@@@
-  ninja -C ${STAGE2_DIR} check-all || build_failure
+  ninja -C ${STAGE2_DIR} check-mlir || build_failure

   if [[ "${STAGE2_SKIP_TEST_CXX:-}" != "1" ]] ; then
     echo @@@BUILD_STEP stage2/$sanitizer_name check-cxx@@@

@hctim, @mehdi_amini: Thanks for tracking down the problem and for reverting! Sorry about these issues. After looking around a bit, I decided that it is too much hassle for me to reproduce the issue in order to fix it, and currently too unimportant to spend more time. I'll leave the commit reverted as is for the time being...