Page MenuHomePhabricator

[mlir] Add a ThreadPool to MLIRContext and refactor MLIR threading usage
ClosedPublic

Authored by rriddle on Jun 18 2021, 3:30 AM.

Details

Summary

This revision refactors the usage of multithreaded utilities in MLIR to use a common
thread pool within the MLIR context, in addition to a new utility that makes writing
multi-threaded code in MLIR less error prone. Using a unified thread pool brings about
several advantages:

  • Better thread usage and more control

We currently use the static llvm threading utilities, which do not allow multiple
levels of asynchronous scheduling (even if there are open threads). This is due to
how the current TaskGroup structure works, which only allows one truly multithreaded
instance at a time. By having our own ThreadPool we gain more control and flexibility
over our job/thread scheduling, and in a followup can enable threading more parts of
the compiler.

  • The static nature of TaskGroup causes issues in certain configurations

Due to the static nature of TaskGroup, there have been quite a few problems related to
destruction that have caused several downstream projects to disable threading. See
D104207 for discussion on some related fallout. By having a ThreadPool scoped to
the context, we don't have to worry about destruction and can ensure that any
additional MLIR thread usage ends when the context is destroyed.

Diff Detail

Event Timeline

rriddle created this revision.Jun 18 2021, 3:30 AM
rriddle requested review of this revision.Jun 18 2021, 3:31 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
bondhugula added inline comments.
mlir/include/mlir/IR/ThreadingUtilities.h
30 ↗(On Diff #352961)

synchronously or sequentially?

bondhugula added inline comments.Jun 18 2021, 4:00 AM
mlir/lib/Pass/Pass.cpp
590–615

std::vector<std::atomic<bool>> activePMs(asyncExecutors.size(),

std::atomic<bool>{false});

Nice, thanks for tackling this River!

llvm/lib/Support/ThreadPool.cpp
77

Yuck but ok. This is obviously your way of trying to force me to rewrite this stuff ;-)

mlir/include/mlir/IR/ThreadingUtilities.h
1 ↗(On Diff #352961)

I'd recommend calling this just "Threading.h".

This also points out a layering issue: MLIRContext is the right place for us to scope this (so I'm ok with the patch), but this really has nothing to do with the IR!

One solution would be to split MLIRContext, give it a subclass like "MLIRNonIRContext" (better names welcome) that contain the thread pool, diagnostics utility and other stuff that would be generic, and put it in mlir/Support. MLIRContext would derive from it, and would remain the currency type, but generic utilities like this would take "MLIRNonIRContext".

Anyway, discussion for another day, just saying that general parallelism stuff would be better in Support.

48 ↗(On Diff #352961)

Please merge this into the previous "if", so the compiler doesn't have to undupe the two std::for_each calls.

85 ↗(On Diff #352961)

Could you also add a parallelForEachN while you're here? circt uses it in a couple places.

mlir/lib/Pass/Pass.cpp
590–615

Right, atomic needs to be explicitly initialized to work correctly

mlir/test/Dialect/Affine/SuperVectorize/compose_maps.mlir
1

Why do these need to disable threading?

lattner accepted this revision.Jun 19 2021, 8:58 AM
This revision is now accepted and ready to land.Jun 19 2021, 8:58 AM
rriddle marked an inline comment as done.Jun 19 2021, 6:14 PM
rriddle added inline comments.
llvm/lib/Support/ThreadPool.cpp
77

Yes, please! I don't have any time to devote to it for the next month or so, but I would really love a ThreadPool that supported the things that we need. This function is a gross band-aid over the current implementation that allows for us to use it as-is for now.

mlir/lib/Pass/Pass.cpp
590–615

Thanks Uday! My 4am brain couldn't remember how to initialize it.

did you consider shoving the existing threadpool stuff into a ManagedStatic? That is how we typically handle problems like this, they are destroyed on llvm_shutdown instead of at global deinit.

mehdi_amini accepted this revision.Jun 21 2021, 11:55 AM

Sweet!

mlir/include/mlir/IR/ThreadingUtilities.h
38 ↗(On Diff #352961)
rriddle marked 8 inline comments as done.Jun 21 2021, 6:28 PM
rriddle added inline comments.
mlir/test/Dialect/Affine/SuperVectorize/compose_maps.mlir
1

Several existing tests depended on the previous threading behavior to work, mostly getting lucky that the errs() output was laid out in the way they expect. Switched to no longer disable threading, but adding split file markers to separate tests.

rriddle updated this revision to Diff 353522.Jun 21 2021, 6:33 PM
rriddle marked 2 inline comments as done.

update

did you consider shoving the existing threadpool stuff into a ManagedStatic? That is how we typically handle problems like this, they are destroyed on llvm_shutdown instead of at global deinit.

It was mentioned in the other revision I believe, but ManagedStatic is what the LLVM threading utilities currently use. Moving to the context removed all of the environment related shutdowns that I was seeing.

mlir/include/mlir/IR/ThreadingUtilities.h
1 ↗(On Diff #352961)

I think the major sticking point is what to do about Location, which is used pervasively by Diagnostics. Most of the handlers interact with the location types, so those I suppose could stay in IR, but Diagnostic::getLocation would have to hold/return something like RawLocation? Which would have to store something opaque. I think it's doable, but the layering would have to be carefully done such that interacting with Diagnostics doesn't become ugly.

It was mentioned in the other revision I believe, but ManagedStatic is what the LLVM threading utilities currently use. Moving to the context removed all of the environment related shutdowns that I was seeing.

No, it isn't. This is the problem:

Executor *Executor::getDefaultExecutor() {
// ...
  static ManagedStatic<ThreadPoolExecutor, ThreadPoolExecutor::Creator,
                       ThreadPoolExecutor::Deleter>
      ManagedExec;
  static std::unique_ptr<ThreadPoolExecutor> Exec(&(*ManagedExec));
  return Exec.get();
}

Note that "Exec" is not a managed static. The "..." elided is some justification for this design that seems really dubious to me (but I admit I haven't mind melded with it!)

mlir/test/Dialect/Affine/SuperVectorize/compose_maps.mlir
1

Ok for this patch, but gross. Plz file a bug about these. It is a bad antipattern to emit to errs(), these tests should be emitting errors so they are pinned to a location correctly (similar to the dependence analysis tests)

I mentioned this in the other phab thread: while I'm totally ok with this patch as a way to unblock progress, it really isn't the right thing. It doesn't make sense for MLIR to have its own threadpool tied to its context's. CPU resources aren't library specific, they are global.

It was mentioned in the other revision I believe, but ManagedStatic is what the LLVM threading utilities currently use. Moving to the context removed all of the environment related shutdowns that I was seeing.

No, it isn't. This is the problem:

Executor *Executor::getDefaultExecutor() {
// ...
  static ManagedStatic<ThreadPoolExecutor, ThreadPoolExecutor::Creator,
                       ThreadPoolExecutor::Deleter>
      ManagedExec;
  static std::unique_ptr<ThreadPoolExecutor> Exec(&(*ManagedExec));
  return Exec.get();
}

Note that "Exec" is not a managed static. The "..." elided is some justification for this design that seems really dubious to me (but I admit I haven't mind melded with it!)

Right right, I'm only vaguely recalling watching the original review for this (D70447). Before that we effectively couldn't use the Parallel methods on windows.

I mentioned this in the other phab thread: while I'm totally ok with this patch as a way to unblock progress, it really isn't the right thing. It doesn't make sense for MLIR to have its own threadpool tied to its context's. CPU resources aren't library specific, they are global.

I'm not completely against a static/global thread pool, but I would claim that a static/global thread pool is orthogonal to CPU resources being library specific. A lot of users today with the current LLVM global thread pool don't share threads with things non-LLVM (whether for good reasons or bad).
We've encountered a lot of "interesting" interactions surrounding threading and certain environments, and I want to say a lot of those are due to the way the current code is structured (which can be a bit obscure at times). The weird threading environment interactions have been extremely taxing
and difficult to debug, and often end with "I have no idea what's happening, guess I'll just disable multi-threading". I would strongly prefer we stay non-static until a proper solution can prove itself to be reliable enough. I don't know exactly what you've had in mind
for revamping the threading libraries, but I have a wish list if you are interested.

Ok, I'm more interested in having a parallel verifier than waiting for perfection here. We can all agree the existing threadpool stuff is "suboptimal" and needs to be replaced. This seems like a fine step in the immediate term.

This revision was landed with ongoing or failed builds.Jun 22 2021, 6:33 PM
This revision was automatically updated to reflect the committed changes.
lei added a subscriber: lei.Jun 24 2021, 9:47 AM

The MLIR bot, https://lab.llvm.org/buildbot/#/builders/88/builds/14468, Have been red for over 30hr when this patch landed. Please provide a fix or pull the related patches to bring the bot back to green.

The MLIR bot, https://lab.llvm.org/buildbot/#/builders/88/builds/14468, Have been red for over 30hr when this patch landed. Please provide a fix or pull the related patches to bring the bot back to green.

Are those related to this patch? Those all look like orc failures.

The MLIR bot, https://lab.llvm.org/buildbot/#/builders/88/builds/14468, Have been red for over 30hr when this patch landed. Please provide a fix or pull the related patches to bring the bot back to green.

Are those related to this patch? Those all look like orc failures.

Do you have a way to repro the issue?

The MLIR bot, https://lab.llvm.org/buildbot/#/builders/88/builds/14468, Have been red for over 30hr when this patch landed. Please provide a fix or pull the related patches to bring the bot back to green.

Are those related to this patch? Those all look like orc failures.

I think pulling in additional code has caused a relocation overflow in the runtime linker. We might have to switch to the large code model for ORC on PPC.

The MLIR bot, https://lab.llvm.org/buildbot/#/builders/88/builds/14468, Have been red for over 30hr when this patch landed. Please provide a fix or pull the related patches to bring the bot back to green.

Are those related to this patch? Those all look like orc failures.

I think pulling in additional code has caused a relocation overflow in the runtime linker. We might have to switch to the large code model for ORC on PPC.

Thanks for looking! That definitely looks like what is happening. Do you know how to adjust that? (I have very little experience/knowledge of ORC)

We can disable the JIT tests on PPC in the meantime to get the bot back green?

We can disable the JIT tests on PPC in the meantime to get the bot back green?

I think that is a good idea. I'll look into how to set the code model for ORC if you or someone else knows how to disable these (presumably not just XFAIL for each individual test), I would appreciate it.

I tried to disable them in 652f4b5140e2, hopefully I got the logic right! The bot is backlogged so unfortunately it'll take time to figure.

By the way have you considered enabling CCACHE on the bot? This dramatically helped this bot: https://lab.llvm.org/buildbot/#/builders/61

The MLIR bot, https://lab.llvm.org/buildbot/#/builders/88/builds/14468, Have been red for over 30hr when this patch landed. Please provide a fix or pull the related patches to bring the bot back to green.

Are those related to this patch? Those all look like orc failures.

I think pulling in additional code has caused a relocation overflow in the runtime linker. We might have to switch to the large code model for ORC on PPC.

Thanks for looking! That definitely looks like what is happening. Do you know how to adjust that? (I have very little experience/knowledge of ORC)

@lhames might be able to point you at the right spot!

I tried to disable them in 652f4b5140e2, hopefully I got the logic right! The bot is backlogged so unfortunately it'll take time to figure.

By the way have you considered enabling CCACHE on the bot? This dramatically helped this bot: https://lab.llvm.org/buildbot/#/builders/61

We might consider using ccache on the bot but I am not sure that would be all that useful. None of the builds on this bot seem to take any longer than 10 minutes. I am not really sure what has caused the backlog on it now - maybe some issue with master or the worker itself? This builder certainly does build every single commit so that may be something we need to change in the future. In any case, we'll get the queue cleaned up to unclog it.

lei added a comment.Mon, Jun 28, 12:26 PM

@mehdi_amini The PPC mlir bot still shows 6 overflow failures after your patch. Can you please help to disable them for PPC?

Failed Tests (6):
  MLIR-Unit :: ExecutionEngine/./MLIRExecutionEngineTests/MLIRExecutionEngine.AddInteger
  MLIR-Unit :: ExecutionEngine/./MLIRExecutionEngineTests/MLIRExecutionEngine.SubtractFloat
  MLIR-Unit :: ExecutionEngine/./MLIRExecutionEngineTests/NativeMemRefJit.BasicMemref
  MLIR-Unit :: ExecutionEngine/./MLIRExecutionEngineTests/NativeMemRefJit.JITCallback
  MLIR-Unit :: ExecutionEngine/./MLIRExecutionEngineTests/NativeMemRefJit.RankOneMemref
  MLIR-Unit :: ExecutionEngine/./MLIRExecutionEngineTests/NativeMemRefJit.ZeroRankMemref

I tried to disable them in 652f4b5140e2, hopefully I got the logic right! The bot is backlogged so unfortunately it'll take time to figure.

By the way have you considered enabling CCACHE on the bot? This dramatically helped this bot: https://lab.llvm.org/buildbot/#/builders/61

lei added a comment.Mon, Jun 28, 3:56 PM

Actually never mind 🙂 I think I've isolated the issue on the PPC bot.

Actually never mind 🙂 I think I've isolated the issue on the PPC bot.

Ah it seems I only managed to disabled the lit tests but not unit-tests, do you have patch for this or another more direct fix?

lei added a comment.Tue, Jun 29, 12:51 PM

I noticed that if I export LD_LIBRARY_PATH=/usr/lib64 then all tests passes. I am thinking to revert your previous patch to enable those lit tests and add this export for the mlir bot for now to get the bot going again, but continue to investigate what the difference is between /lib64 and /usr/lib64 on our PPC machine.

@lhames might be able to point you at the right spot!

Late to the party here, sorry!

Relocation R_PPC64_REL32 overflow -- This is a PC-relative relocation overflow. Something is being allocated out of range for of the fixup address. If you're not using the large code model already then changing to the large code model for your JIT'd code may fix this, but I believe there are some known limitations of the RuntimeDyld PPC backend that mean that out-of-range issues can show up even in the large code model.

I recently landed changes to JITLink's ELF support that make it easier to write new JITLink backends -- I think the ideal fix here would be to implement a PPC64 ELF backend for JITLink and eliminates those limitations altogether.

@lhames might be able to point you at the right spot!

Late to the party here, sorry!

Relocation R_PPC64_REL32 overflow -- This is a PC-relative relocation overflow. Something is being allocated out of range for of the fixup address. If you're not using the large code model already then changing to the large code model for your JIT'd code may fix this, but I believe there are some known limitations of the RuntimeDyld PPC backend that mean that out-of-range issues can show up even in the large code model.

I recently landed changes to JITLink's ELF support that make it easier to write new JITLink backends -- I think the ideal fix here would be to implement a PPC64 ELF backend for JITLink and eliminates those limitations altogether.

This has come up in other contexts as well and it certainly seems like a worthwhile project for us to undertake as soon as resources are available. Would you be able to provide a link to some documentation to guide a developer that only has experience with the back end and no JIT experience?