
[mlir] AsyncRuntime: disable threading until test flakiness is fixed
Closed, Public

Authored by ezhulenev on Dec 1 2020, 12:44 AM.


Summary

ExecutionEngine/LLJIT does not run the global destructors of loaded dynamic libraries when it is destroyed, and threads managed by a ThreadPool can race with program termination, which leads to segfaults.

TODO: Re-enable threading after fixing the problem with destructors, or after removing static globals from the dynamic library.
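To illustrate what "disabling threading" buys here, the following is a minimal sketch built around a hypothetical `AsyncRuntimeSketch` class (not the real AsyncRuntime API) that either starts one thread per task or runs the task inline. Only the inline path is immune to the destructor problem described above: if an object like this lived in a static global of a dynamically loaded library whose destructors never run, the threads it started would never be joined.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical runtime, for illustration only: runs async tasks either on
// dedicated threads or inline on the calling thread.
class AsyncRuntimeSketch {
public:
  explicit AsyncRuntimeSketch(bool threaded) : useThreads(threaded) {}

  AsyncRuntimeSketch(const AsyncRuntimeSketch &) = delete;
  AsyncRuntimeSketch &operator=(const AsyncRuntimeSketch &) = delete;

  // Joins every thread started by execute(). If this object were a static
  // global in a dlopen'ed library whose destructors never run, these joins
  // would never happen and the threads could race with process exit.
  ~AsyncRuntimeSketch() {
    for (std::thread &t : threads)
      t.join();
  }

  void execute(std::function<void()> task) {
    if (!useThreads) {
      // Threading disabled: the task completes before execute() returns,
      // so no worker thread is left running at program termination.
      task();
      return;
    }
    threads.emplace_back(std::move(task));
  }

private:
  bool useThreads;
  std::vector<std::thread> threads;
};
```

With `threaded == false`, a task's side effects are visible as soon as `execute()` returns; with `threaded == true`, they are only guaranteed after the destructor has joined the threads.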


Event Timeline

ezhulenev created this revision. (Dec 1 2020, 12:44 AM)
ezhulenev requested review of this revision. (Dec 1 2020, 12:44 AM)
ezhulenev edited the summary of this revision. (Dec 1 2020, 12:52 AM)
ezhulenev added reviewers: mehdi_amini, jpienaar.
This revision was not accepted when it landed; it landed in state Needs Review. (Dec 1 2020, 1:12 AM)
This revision was automatically updated to reflect the committed changes.

Since this surfaced on our radar again: any update on removing the global destructors?

The PR comment is a bit inaccurate: it's not an ExecutionEngine/LLJIT problem, it is the dlclose behavior and when it runs global destructors. One potential fix is to link the async runtime with mlir-cpu-runner and let the runner manage the thread pool lifetime, but that would make it an exception among all the other support libraries loaded at runtime.
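Letting the runner own the thread pool could look roughly like the sketch below. All names (`ThreadPool`, `asyncRuntimeSetThreadPool`, `asyncRuntimeExecute`) are hypothetical and correspond neither to the MLIR async runtime nor to `llvm::ThreadPool`. The point is only that the pool lives in the runner's scope, so its destructor drains the queue and joins the workers at a well-defined point before the runner exits, independent of when `dlclose` runs the library's destructors.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal pool (hypothetical): the destructor finishes queued tasks, then
// joins every worker.
class ThreadPool {
public:
  explicit ThreadPool(unsigned n) {
    for (unsigned i = 0; i < n; ++i)
      workers.emplace_back([this] { run(); });
  }
  ThreadPool(const ThreadPool &) = delete;
  ThreadPool &operator=(const ThreadPool &) = delete;
  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mu);
      stopping = true;
    }
    cv.notify_all();
    for (std::thread &w : workers)
      w.join();
  }
  void submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu);
      tasks.push(std::move(task));
    }
    cv.notify_one();
  }

private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu);
        cv.wait(lock, [this] { return stopping || !tasks.empty(); });
        if (tasks.empty())
          return; // Stopping and the queue is drained.
        task = std::move(tasks.front());
        tasks.pop();
      }
      task();
    }
  }
  std::mutex mu;
  std::condition_variable cv;
  std::queue<std::function<void()>> tasks;
  bool stopping = false;
  std::vector<std::thread> workers; // Declared last: started after the rest.
};

// Runtime side (hypothetical hooks): a non-owning pointer to a pool that the
// runner owns, instead of a static global ThreadPool inside the library.
static ThreadPool *runnerPool = nullptr;

void asyncRuntimeSetThreadPool(ThreadPool *pool) { runnerPool = pool; }

void asyncRuntimeExecute(std::function<void()> task) {
  if (runnerPool)
    runnerPool->submit(std::move(task));
  else
    task(); // No pool installed: fall back to inline execution.
}
```

In the runner, the pool would be a local in `main` (or similar), installed before running the compiled code and detached again before the pool goes out of scope. The trade-off is the one noted in the comment: the async runtime becomes the only support library that needs this special setup from the runner.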

Can we change the way mlir-cpu-runner loads dynamic libraries so that it expects known entry points in the library to initialize/register/deinitialize?
We could then also dlopen them in private mode instead of injecting their symbols into the global namespace.
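A rough sketch of the proposed loading protocol, assuming two hypothetical exported entry points named `runtime_init` and `runtime_destroy` (illustrative names only). Symbol lookup is abstracted behind a `std::function` so the sketch runs without a shared library; in a real runner it would presumably wrap `dlsym` on a handle opened with `dlopen(path, RTLD_NOW | RTLD_LOCAL)`, which is the most likely reading of "private mode". Because the runner calls the destroy hook itself, teardown no longer depends on when `dlclose` runs global destructors.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Signatures of the hypothetical entry points a runtime library would export.
using InitFn = void (*)();
using DestroyFn = void (*)();

// Stand-in for dlsym(handle, name); returns nullptr for missing symbols.
using SymbolLookup = std::function<void *(const std::string &)>;

// RAII wrapper: calls the library's init hook on construction and its
// destroy hook on destruction, under the runner's control.
class LoadedLibrary {
public:
  explicit LoadedLibrary(const SymbolLookup &lookup)
      : destroy(reinterpret_cast<DestroyFn>(lookup("runtime_destroy"))) {
    if (auto init = reinterpret_cast<InitFn>(lookup("runtime_init")))
      init();
  }

  LoadedLibrary(const LoadedLibrary &) = delete;
  LoadedLibrary &operator=(const LoadedLibrary &) = delete;

  // Runs before the runner would call dlclose(), so the library can join its
  // threads and release its globals at a well-defined point.
  ~LoadedLibrary() {
    if (destroy)
      destroy();
  }

private:
  DestroyFn destroy;
};
```

A library that exports neither hook is simply loaded without setup or teardown, so existing support libraries would keep working unchanged.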