This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Factor pass timing out into a dedicated timing manager
ClosedPublic

Authored by fabianschuiki on Apr 16 2021, 6:57 AM.

Details

Summary

This factors out the pass timing code into a separate TimingManager that can be plugged into the PassManager from the outside. Users are able to provide their own implementation of this manager, and use it to time additional code paths outside of the pass manager. Also allows for multiple PassManagers to run and contribute to a single timing report.

More specifically, moves most of the existing infrastructure in Pass/PassTiming.cpp into a new Support/Timing.cpp file and adds a public interface in Support/Timing.h. The PassTiming instrumentation becomes a wrapper around the new timing infrastructure which adapts the instrumentation callbacks to the new timers.

This is a concrete suggestion towards https://llvm.discourse.group/t/add-entries-to-pass-timing-from-outside-the-passmanager/3115.

Diff Detail

Event Timeline

fabianschuiki created this revision.Apr 16 2021, 6:57 AM
fabianschuiki requested review of this revision.Apr 16 2021, 6:57 AM

I support making the timing infrastructure usable and pluggable outside of the pass manager! Thanks for working on this, I looked into it at some point but didn't need it badly enough to finish it.

However I'm not sure about this approach at the moment: it seems that you're trying to keep the PM in charge of all the timing, and using it as a "timing manager" for things that aren't related to pass management.
I would think that it'd be better modeled by extracting the timing manager as a separate entities usable outside of the PM, and have the PM report to "timing manager" (through injection or similar).

I support making the timing infrastructure usable and pluggable outside of the pass manager! Thanks for working on this, I looked into it at some point but didn't need it badly enough to finish it.

However I'm not sure about this approach at the moment: it seems that you're trying to keep the PM in charge of all the timing, and using it as a "timing manager" for things that aren't related to pass management.
I would think that it'd be better modeled by extracting the timing manager as a separate entities usable outside of the PM, and have the PM report to "timing manager" (through injection or similar).

+1 to what Mehdi says here. We want to have general infra related to timing (that ideally users can inject their own implementations of), with the pass manager simply hooking into that.

I agree with Mehdi and River. it seems like the passmgr should be a client of the more general shared thing.

I'd also recommend changing the top level flyer:
... Pass execution timing report ... to just ... Execution time report ...

Thanks for the great feedback! I also had a more general timing manager in mind that we could inject into the PM, but opted for this simpler change to get a discussion rolling. Let me factor out the timing interface into some abstract base class and the current implementation of time keeping into a concrete subclass. That would allow us to default to an MLIR-provided TimingManager class as a convenience (and to not break existing code maybe?), but still allow the user to provide a custom implementation if desired.

fabianschuiki retitled this revision from [MLIR] Add API for external timers to pass manager to [MLIR] Factor pass timing out into a dedicated timing manager.
fabianschuiki edited the summary of this revision. (Show Details)
  • Moved most of the timing infrastructure in Pass/PassTiming.cpp into a new Support/Timing.cpp file
  • Added public interfaces for the timing stuff in Support/Timing.h
  • Made the pass manager use the timers provided by the new Timing.h
  • Added a new TimingManager that holds the root timer and prints timing reports (similar to the role the PassTiming instrumentation had previously)
  • Added TimingManagerOptions and corresponding registerTimingManagerCLOptions and applyTimingManagerCLOptions functions

One problem I ran into when merging async executions of the same pipeline was identifying whether two executed passes were actually "the same" from the user's perspective (i.e. just cloned for the sake of threading). Just using the pass pointer as unique identifier would cause each thread to show up as a separate set of passes. An obvious solution would have been to use pass.getTypeID().getAsOpaquePointer(), but if the user intentionally added the same pass multiple times they would all be collapsed into one. I ended up adding a passPrototype field to Pass itself to track the "pass identity" through cloning inside the PM. (The original pass timing had the same problem and opted for merging the passes in a pipeline by position.)

I'm not sure how far we want to preserve compatibility with respect to CL options. The patch as it is now preserves the old --pass-timing and --pass-timing-display options of the PM, as well as the PM's enableTiming(config) function. If no external timing manager is provided, the PM will just create its own temporary one internally, report there, and print results as before. The new --timing and --timing-display can still be used and if present take precedence over --pass-timing. I'm also happy to throw out the old --pass-timing options, or rename the new ones to --pass-timing instead.

Finally I also added reporting for the "remaining" time not covered by dedicated timers, as suggested by @lattner. It is currently only shown for the top of the hierarchy. It's easy to add for all nested timers that have children, but that adds *a lot* of entries to the pipeline.


The output from the CIRCT project's firtool with the new timing manager looks as follows:

===-------------------------------------------------------------------------===
                        ... Execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0067 seconds

  ----User Time----  ----Wall Time----  ----Name----
    0.0010 ( 13.4%)    0.0010 ( 14.9%)  Parser
    0.0012 ( 15.7%)    0.0012 ( 17.5%)  'firrtl.circuit' Pipeline
    0.0004 (  6.0%)    0.0004 (  6.6%)    LowerFIRRTLTypes
    0.0006 (  7.7%)    0.0006 (  8.5%)    'firrtl.module' Pipeline
    0.0002 (  2.1%)    0.0002 (  2.3%)      CSE
    0.0000 (  0.1%)    0.0000 (  0.1%)        (A) DominanceInfo
    0.0004 (  5.3%)    0.0004 (  6.0%)      Canonicalizer
    0.0014 ( 19.3%)    0.0014 ( 21.5%)  LowerFIRRTLToRTL
    0.0002 (  2.2%)    0.0002 (  2.5%)  RTLMemSimImpl
    0.0011 ( 14.1%)    0.0004 (  5.5%)  'rtl.module' Pipeline
    0.0003 (  4.5%)    0.0002 (  2.3%)    RTLCleanup
    0.0002 (  3.2%)    0.0001 (  1.4%)    CSE
    0.0000 (  0.2%)    0.0000 (  0.1%)      (A) DominanceInfo
    0.0003 (  4.3%)    0.0001 (  1.6%)    Canonicalizer
    0.0005 (  6.5%)    0.0005 (  7.3%)  RTLLegalizeNames
    0.0002 (  2.2%)    0.0002 (  2.4%)    (A) circt::sv::LegalNamesAnalysis
    0.0008 ( 10.6%)    0.0008 ( 11.8%)  Output
    0.0008 ( 11.2%)    0.0008 ( 12.5%)  Rest
    0.0075 (100.0%)    0.0067 (100.0%)  Total

Of these Parser and Output are timers run before and after the PM, and Rest indicates the time that is not covered by the top-level timers.

rriddle added inline comments.Apr 21 2021, 11:04 AM
mlir/include/mlir/Pass/Pass.h
154

You can't rely on this meaning that the pass is a threading sibling with the original pass.

mlir/include/mlir/Support/Timing.h
113

nit: Just spell out auto here.

121

This display mode looks specific to the current pass manager implementation, I'm not sure this is generally applicable to the other potential timer backends. Couple of related general questions:

  • Do we need a general TimingConfig anymore?

This was necessary before because of how the pass manager hid the internal timing instrumentation, but I'm not sure that it is really necessary anymore. How much of it would be shared across different potential timer backends?

  • Can you rename TimingManagerBase base to just TimingManager?

I don't think the default implementation should be called TimingManager, it should likely be called something else(not sure the name though). This would establish that TimingManager (as you currently have it) is just an implementation and not the interface.

  • Do you have an idea of what supporting another timing backend would look like?

Not that you have to do the work, but when I've thought about this before ideally I would want to be able to support something like (1) or (2) without needing a new API/interface. Ideally these would all just be implementations that a user can choose from and pass the general infra to use for timer reporting.

(1): https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/TimeProfiler.h
(2): https://github.com/wolfpld/tracy

mlir/lib/Support/Timing.cpp
134

The virtual isn't necessary when you have override.

139

nit: Drop the else after a return.

171

Drop the trivial braces here.

This is looking really great Fabian!

mlir/include/mlir/Pass/PassManager.h
306

Unless this is a widely used typedef that could occur in downstream code, I'd recommend just eliminating the PassTimingConfig name, updating clients to use TimingConfig directly.

mlir/include/mlir/Support/Timing.h
26

Please follow clang-tidy's advice here and add a virtual dtor. Also, please declare one method (e.g. the dtor) out of line in a .cpp file to give the vtable a home in a .o file.

161

I agree with River's comment that it would be great to eliminate this class.

mlir/lib/Support/Timing.cpp
107

As clang tidy suggests please declare this as a class. The difference matters for MSVC

fabianschuiki marked 10 inline comments as done.May 3 2021, 8:55 AM
fabianschuiki added inline comments.
mlir/include/mlir/Pass/Pass.h
154

Renamed this to threadingSibling and moved the tracking of copies from Pass::clone() into the OpPassManager constructor. Since the OpPassManager is only duplicated in OpToOpPassAdaptor::runOnOperationAsyncImpl, that should reliably identify clones that were made for the sake of threading.

fabianschuiki updated this revision to Diff 342412.EditedMay 3 2021, 9:12 AM
fabianschuiki marked an inline comment as done.

Thanks for the great feedback @rriddle and @lattner! Updated the diff with your suggestions and fixed the check-mlir run.

  • @rriddle's question about how easily we can spin up additional implementations for stuff like the LLVM TimeProfiler or tracy got me thinking. The previous design would require you to subclass both TimerBase and TimingManagerBase to hook into the infrastructure. Since you had to hand out TimerBase instances, you had to allocate them somewhere in memory, which was particularly bad if the external timing API you were wrapping was giving you IDs or pointers to its own data structures. This revision now simplifies the design: TimerBase has been replaced by a Timer that is not intended to be subclassed, but packages a pointer to a timing manager and an opaque ID/handle identifying the timer. TimingManagerBase is now called TimingManager and has overridable callback functions for timer start, stop, and nesting which Timer forwards to. That means there is one thing for implementations to subclass (TimingManager), and one thing for library code to interact with (Timer, and by extension TimingScope). Makes things much easier to use. Also, wrappers around external timing libs can just pass around the pointers they get from those libs as opaque handles in Timer.
  • Dropped all the TimingConfig business and made the TimingDisplayMode a characteristic of the default timing manager implementation.
  • DefaultTimingManager is maybe not the best of names; suggestions are very welcome! Maybe ConsoleTimingManager, BasicTimingManager, SimpleTimingManager?
  • The redundancy between the timing options in PassManagerOptions and the new ones in TimingManagerOptions felt awkward. I dropped the timing options from PM altogether in favor of dealing with timing config completely outside of the PM. The pass manager can own a TimingManager and there is now a applyDefaultTimingPassManagerCLOptions() function which generates a dedicated timing manager for the PM with some options in case the user does not want to deal with timing themselves. The intended usage though is through PassManager::enableTiming(TimingManager).
  • I renamed the --pass-timing and --pass-timing-display options to --timing and --timing-display, respectively, to reflect that the scope of these has broadened beyond pass timing.
  • I updated MlirOptMain and the pass-timing.mlir test to use the new timing infrastructure and options, and to act as an example on how to use them.
lattner accepted this revision.May 3 2021, 9:50 PM

This looks really nice Fabian!

This revision is now accepted and ready to land.May 3 2021, 9:50 PM

How do you envision this being used deep within the compiler? Should we have a static API that is accessible from anywhere? Should we have something on the context? What are your thoughts here?

Also, please fix the clang tidy issues.

mlir/include/mlir/Support/Timing.h
160

nit: Please make bool conversion operators explicit.

(Apologies for the delay in review, will take a pass over tomorrow)

fabianschuiki marked an inline comment as done.May 3 2021, 11:22 PM

Fixed clang-tidy issues and made bool conversion explicit. Sorry about those, having a somewhat hard time to get my local clang-tidy to reproduce the set of upstream checks.

Regarding usage within the compiler: Having to thread TimingScopes through everywhere sounds pretty unergonomic if you just want to time a few things within the compiler. Ideally we would have a per-thread stack of active timing scopes somewhere (Maybe in the context? As a thread-local static?). Pointers to the active TimingScopes would then be kept on that stack, and you could just ask for the currently-active one to nest into. That way you could get at whatever scope within the parent code you're implicitly executing in, without needing to explicitly push scope references down the call chain. Maybe context.getTimingScope(id, nameBuilder) and context.getTimer(id, nameBuilder)?

In general, I have concerns with any design that would single global data structure for book-keeping. I would look into hooking the storage onto the MLIRContext as needed.

rriddle requested changes to this revision.May 4 2021, 10:01 PM

Fixed clang-tidy issues and made bool conversion explicit. Sorry about those, having a somewhat hard time to get my local clang-tidy to reproduce the set of upstream checks.

Regarding usage within the compiler: Having to thread TimingScopes through everywhere sounds pretty unergonomic if you just want to time a few things within the compiler. Ideally we would have a per-thread stack of active timing scopes somewhere (Maybe in the context? As a thread-local static?). Pointers to the active TimingScopes would then be kept on that stack, and you could just ask for the currently-active one to nest into. That way you could get at whatever scope within the parent code you're implicitly executing in, without needing to explicitly push scope references down the call chain. Maybe context.getTimingScope(id, nameBuilder) and context.getTimer(id, nameBuilder)?

I would not add timing API to the context itself, but instead hook in a reference to a timing manager. Mehdi touches a bit on the general opinion that MLIR takes about these in terms of static vs non-static, and returning a manager object is inline with the other things hooked into the context. One important piece here though, is that it should generally be zero-cost if there is no timing manager.

mlir/include/mlir/Pass/PassManager.h
350

Why can't this be handled by the timing instrumentation? I don't think we need these fields here.

mlir/include/mlir/Support/Timing.h
17

This is an indicator that the dependencies are off. Support/ shouldn't depend on anything in IR/.

333

nit: I don't think the llvm:: here is necessary if you include LLVM.h

mlir/lib/Pass/PassTiming.cpp
95

nit: Drop the trivial braces here.

mlir/lib/Support/MlirOptMain.cpp
51 ↗(On Diff #342652)

I'm not sure about this type of integration. This seems to always enforce timing, which isn't something that I would expect. If we start integrating timing more deeply into the compiler. I would expect it to do nothing unless explicitly enabled.

This revision now requires changes to proceed.May 4 2021, 10:01 PM

I think having a reference to a TimingManager in the context would be great. That would allow us to encode timing enable/disable through the presence/absence of a such a reference (right now it's an explicit field in the TM). For the sake of keeping code that uses the timing API streamlined, we should probably have the getTimingManager() return a NullTimingManager reference that always hands out disabled Timers. Since the latter are just a wrapper around a pointer, and their functions all return immediately if disabled, the runtime cost in code should be equivalent to having manual "is timing enabled" checks.

Do you guys want to tackle the integration into MLIRContext already as part of this diff, or push that off into a separate one?

mlir/include/mlir/Pass/PassManager.h
350

Yeah good point, I'll move that into the timing instrumentation.

mlir/include/mlir/Support/Timing.h
17

Good point. Will check for some other/dedicated mechanism to map a string to a opaque pointer identifying the string content. Since we have a TimingManager now, that could just as well go in there.

mlir/lib/Support/MlirOptMain.cpp
51 ↗(On Diff #342652)

The timing is disabled in the manager upon construction. The call to applyDefaultTimingManagerCLOptions(tm) on the next line enables timing if the options are present on the command line. If the timing remains disabled, getRootScope() below returns a disabled TimingScope that has a fast path to do nothing (just a pointer comparison).

With some integration into MLIRContext and a stack of running timers, you could remove the timing variable below and instead have things like:

DefaultTimingManager tm;
applyDefaultTimingManagerCLOptions(tm);
context->setTimingManager(tm); // or some ownership transfer

auto parserTiming = context->getTimingManager()->addTimingScope("Parser");

parserTiming.stop()

auto outputTiming = context->getTimingManager()->addTimingScope("Output");

I think having a reference to a TimingManager in the context would be great. That would allow us to encode timing enable/disable through the presence/absence of a such a reference (right now it's an explicit field in the TM). For the sake of keeping code that uses the timing API streamlined, we should probably have the getTimingManager() return a NullTimingManager reference that always hands out disabled Timers. Since the latter are just a wrapper around a pointer, and their functions all return immediately if disabled, the runtime cost in code should be equivalent to having manual "is timing enabled" checks.

Do you guys want to tackle the integration into MLIRContext already as part of this diff, or push that off into a separate one?

I'm okay with shifting that a follow up, so that it is easier to review the diff.

Can you also draft up some documentation for the timing infra? with examples on users would interact with it, and how it could be used to inject custom timers? I'm fine with that being a follow up as well.

This looks really great! I'm looking forward to being able to use it.

mlir/include/mlir/Support/Timing.h
87–88
/// Implementations override this method to provide access to the top-level timer.

This method isn't virtual, what do you mean by the comment above?

mlir/lib/Support/MlirOptMain.cpp
51 ↗(On Diff #342652)

Ah, I completely missed that. Thanks for pointing it out! I like what you have here.

fabianschuiki marked 3 inline comments as done.May 5 2021, 12:27 AM
fabianschuiki added inline comments.
mlir/include/mlir/Support/Timing.h
87–88

Oops, my bad -- that's a leftover from when rootTimer() below wasn't a thing.

fabianschuiki marked 5 inline comments as done.May 5 2021, 1:48 AM
  • Moved the owned timing manager and scope into PassTiming, dropping those fields from PassManager.
  • Added a TimingIdentifier to remove the dependency on IR/Identifier.h. This new identifier is a version of Identifier/MLIRContext but stripped-down to cover only the functionality needed by the TM.
  • Other cleanup
rriddle accepted this revision.May 6 2021, 11:34 AM

Looks good to me with comments resolved.

(Also should clarify when I said docs, I was thinking something in mlir/docs/)

mlir/include/mlir/Support/Timing.h
80

demarcate

231

Can you change all of the uses of std::function<> && to function_ref? std::function generally ends up allocating memory, which we don't really want to do. Given that we aren't storing the function, we only really need a reference anyways.

403

Is this something we could make non-virtual? I think it would be nice if we could avoid an indirect call in the case where timing is disabled.

mlir/lib/Support/Timing.cpp
34

This shouldn't be necessary, StringRef is exported in the mlir namespace (see LLVM.h)

543

Can you prefix these with mlir-? I'm always hesitant about adding very commonly named things to the global command line namespace.

This revision is now accepted and ready to land.May 6 2021, 11:34 AM
fabianschuiki marked 5 inline comments as done.May 6 2021, 1:19 PM
fabianschuiki added inline comments.
mlir/include/mlir/Support/Timing.h
403

The way you use the TM at the moment is roughly:

DefaultTimingManager tm;
TimingScope ts = tm.getRootScope();
// all further timing calls go through `ts`

If timing is disabled, ts is marked as diabled through a nullptr in its tm field, which causes all subsequent interaction to short-circuit into no-ops. So in its current implementation, if timing is disabled there is just one virtual function call for the duration of the program -- that should not be visible in practice.

I do agree that it would be nicer if the setup-related interaction with the TM would be non-virtual. Especially if we start to keep per-thread stacks of active timing scopes, that should all be directly accessible. We could promote setEnabled and isEnabled into the TimingManager base class such that it can just never ask for the rootTimer if disabled. Pretty sure that all TM impls can get behind the concept of enabling/disabling.

Do you want to tackle this in this diff, or push that off to a follow-up diff alongside tracking a TM in the context?

mlir/lib/Support/Timing.cpp
543

Yeah that's a good idea. --timing did seem awfully brief.

fabianschuiki marked 2 inline comments as done.

Applied suggestions by @rriddle.

rriddle added inline comments.May 6 2021, 1:20 PM
mlir/include/mlir/Support/Timing.h
403

Followup is alright with me, thanks!

Awesome, thanks! What are the next steps (sorry, first time Phabricator user)?

git pull --rebase
ninja check-mlir # sanity check
git push

if needed:
git pull --rebase
git push

fabianschuiki set the repository for this revision to rG LLVM Github Monorepo.May 12 2021, 9:17 AM

Thanks @lattner! Wasn't sure whether the ready-to-land is the green light to push to the repo myself. Done!

Thanks again Fabian. It would also be good to make sure this gets mentioned in MLIR biweekly

Thanks again Fabian. It would also be good to make sure this gets mentioned in MLIR biweekly

Yes please! :)

(here is the link: https://llvm.discourse.group/t/work-in-progress-next-mlir-news-33rd-edition-5-1-5-14-2021/3417 )

Thanks again Fabian. It would also be good to make sure this gets mentioned in MLIR biweekly

Yes please! :)

(here is the link: https://llvm.discourse.group/t/work-in-progress-next-mlir-news-33rd-edition-5-1-5-14-2021/3417 )

Done! :-)