This is an archive of the discontinued LLVM Phabricator instance.

[llvm][inliner] Reuse the inliner pass to implement 'always inliner'
Closed, Public

Authored by mtrofin on Nov 16 2020, 2:24 PM.

Details

Summary

Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:

  • reduce code duplication - one inliner codebase
  • open the opportunity to help the full inliner by performing additional
    function passes after the mandatory inlinings, but before the full
    inliner. Performing the mandatory inlinings first simplifies the problem
    the full inliner needs to solve: fewer call sites, more contextualization, and,
    depending on the additional function optimization passes run between the
    2 inliners, higher accuracy of cost models / decision policies.
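
To make the "simplifies the problem" point concrete, here is a minimal, self-contained sketch. It is a toy model and not LLVM code: ToyModule, runMandatoryInlining and countMandatorySites are hypothetical names, a function is reduced to the list of callees at its call sites, and the alwaysinline functions are assumed not to call each other in a cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy module (hypothetical, no LLVM API): each function maps to the callees
// at its call sites; AlwaysInline holds the functions marked 'alwaysinline'.
struct ToyModule {
  std::map<std::string, std::vector<std::string>> Calls;
  std::set<std::string> AlwaysInline;
};

// True if function F has a call site that targets F itself.
bool isSelfRecursive(const ToyModule &M, const std::string &F) {
  for (const std::string &Callee : M.Calls.at(F))
    if (Callee == F)
      return true;
  return false;
}

// Counts call sites whose callee is marked 'alwaysinline'.
std::size_t countMandatorySites(const ToyModule &M) {
  std::size_t N = 0;
  for (const auto &Entry : M.Calls)
    for (const std::string &Callee : Entry.second)
      N += M.AlwaysInline.count(Callee);
  return N;
}

// Phase 1 of the toy: replace every call to an 'alwaysinline' callee with a
// copy of that callee's call sites, repeating until nothing changes.
// Self-recursive callees are skipped. Assumes no cycle among the
// 'alwaysinline' functions, which guarantees termination.
void runMandatoryInlining(ToyModule &M) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (auto &Entry : M.Calls) {
      const std::string &Caller = Entry.first;
      std::vector<std::string> NewSites;
      for (const std::string &Callee : Entry.second) {
        assert(M.Calls.count(Callee) && "every callee must be in the map");
        bool Mandatory = M.AlwaysInline.count(Callee) && Callee != Caller &&
                         !isSelfRecursive(M, Callee);
        if (!Mandatory) {
          NewSites.push_back(Callee);
          continue;
        }
        const std::vector<std::string> &Body = M.Calls.at(Callee);
        NewSites.insert(NewSites.end(), Body.begin(), Body.end());
        Changed = true;
      }
      Entry.second = NewSites;
    }
  }
}
```

In this toy, with main calling helper and work, work calling helper, and helper (marked alwaysinline) calling leaf, runMandatoryInlining leaves main calling leaf and work, and work calling leaf. No call site targeting an alwaysinline function remains, so a later policy-driven phase would only see genuine policy decisions.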

Note that this patch does not yet enable much in terms of post-always
inline function optimization.

Event Timeline

mtrofin created this revision. Nov 16 2020, 2:24 PM
Herald added projects: Restricted Project, Restricted Project. Nov 16 2020, 2:24 PM
mtrofin requested review of this revision. Nov 16 2020, 2:24 PM

Please note: the patch isn't 100% ready, there are those tests that check how the pipeline is composed, which are unpleasant to fix, so I want to defer them to after we get agreement over the larger points this patch brings (i.e. pre-performing always inlinings, value in further exploring cleanups before full inlining, etc)

Performing the mandatory inlinings first simplifies the problem the full inliner needs to solve

That confuses me a bit - is that suggesting that we don't run the AlwaysInliner when we are running the Inliner (ie: we only run the AlwaysInliner at -O0, and use the Inliner at higher optimization levels and let the Inliner do always inlining too)?
& sounds like this is suggesting that would change? That we would now perform always inlining separately from inlining? Maybe that's an orthogonal/separate change from one implementing the always inlining using the Inliner being run in a separate mode?

In the NPM, we didn't run the AlwaysInliner until D86988. See also the discussion there. The normal inliner pass was, and still is, taking care of the mandatory inlinings if it finds them. Of course, if we completely upfronted those (which this patch can do), then the normal inliner wouldn't need to. I'm not suggesting changing that - meaning, it's straightforward for the normal inliner to take care of mandatory and policy-driven inlinings. The idea, though, is that if we upfront the mandatory inlinings, the shape of the call graph the inliner operates over is simpler and the effects of inlining probably more easy to glean by the decision making policy. There are trade-offs, though - we can increase that "ease of gleaning" by performing more function simplification passes between the mandatory inlinings and the full inliner.

Performing the mandatory inlinings first simplifies the problem the full inliner needs to solve

That confuses me a bit - is that suggesting that we don't run the AlwaysInliner when we are running the Inliner (ie: we only run the AlwaysInliner at -O0, and use the Inliner at higher optimization levels and let the Inliner do always inlining too)?
& sounds like this is suggesting that would change? That we would now perform always inlining separately from inlining? Maybe that's an orthogonal/separate change from one implementing the always inlining using the Inliner being run in a separate mode?

In the NPM, we didn't run the AlwaysInliner until D86988. See also the discussion there. The normal inliner pass was, and still is, taking care of the mandatory inlinings if it finds them. Of course, if we completely upfronted those (which this patch can do), then the normal inliner wouldn't need to. I'm not suggesting changing that - meaning, it's straightforward for the normal inliner to take care of mandatory and policy-driven inlinings. The idea, though, is that if we upfront the mandatory inlinings, the shape of the call graph the inliner operates over is simpler and the effects of inlining probably more easy to glean by the decision making policy. There are trade-offs, though - we can increase that "ease of gleaning" by performing more function simplification passes between the mandatory inlinings and the full inliner.

OK, so if I understand correctly with the old Pass Manager there were two separate passes (always inliner and inliner - they share some code though, yeah?) and they were run in the pass pipeline but potentially (definitely?) not adjacent? New pass manager survived for quite a while with only one inlining pass, that included a mandatorily strong preference for inlining always-inline functions? But still missed some recursive cases. So D86988 made the always inliner run right next to/before the inliner in the NPM.

Now there's this patch, to implement the AlwaysInliner using the inliner - but is also changing the order of passes to improve optimization opportunities by doing some cleanup after always inlining?

OK, so if I understand correctly with the old Pass Manager there were two separate passes (always inliner and inliner - they share some code though, yeah?)

AlwaysInlinerLegacyPass does, yes. The NPM variant doesn't.

and they were run in the pass pipeline but potentially (definitely?) not adjacent?

From what I can see, the legacy one was used only in the O0/O1 cases, see clang/lib/CodeGen/BackendUtil.cpp:643. The full inliner isn't.

New pass manager survived for quite a while with only one inlining pass, that included a mandatorily strong preference for inlining always-inline functions? But still missed some recursive cases. So D86988 made the always inliner run right next to/before the inliner in the NPM.

Now there's this patch, to implement the AlwaysInliner using the inliner - but is also changing the order of passes to improve optimization opportunities by doing some cleanup after always inlining?

It doesn't quite change the order D86988 introduced. Specifically, D86988 ran AlwaysInliner (a module pass) first, then let the Inliner and function optimizations happen.
This patch keeps the order between doing mandatory inlinings and inlinings. But, in addition, if in the future we want to also perform some of the function passes that happen in the inliner case, to help the full inliner, we can more easily do so.
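
The ordering described above can be sketched with a toy pipeline builder. This is an illustration under stated assumptions, not the PassBuilder API: ToyPipeline and buildToyInlinerPipeline are hypothetical, and the strings are plain labels rather than registered LLVM pass names. The optional hook stands for the place where function passes could later be added between the two inliners.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a pass pipeline: an ordered list of labels.
using ToyPipeline = std::vector<std::string>;

// Builds the inlining part of a toy pipeline: the mandatory inliner first,
// then whatever the optional AddBetween hook appends, then the full inliner
// followed by function simplification.
ToyPipeline buildToyInlinerPipeline(
    const std::function<void(ToyPipeline &)> &AddBetween = nullptr) {
  ToyPipeline P;
  P.push_back("mandatory-inliner");
  if (AddBetween)
    AddBetween(P);
  // The hook may append passes but must not disturb the first stage.
  assert(!P.empty() && P.front() == "mandatory-inliner");
  P.push_back("full-inliner");
  P.push_back("function-simplification");
  return P;
}
```

Called without a hook, the toy builder keeps the mandatory inliner directly ahead of the full inliner. Passing a hook that appends, say, a "cleanup" label places it between the two, which is the kind of experiment a separate mandatory-inlining stage makes easy to try.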

What about removing the existing AlwaysInlinerPass and replacing it with this one? Or is that something you were planning to do in a follow-up change?

open the opportunity to help the full inliner by performing additional function passes after the mandatory inlinings, but before the full inliner

This change doesn't run the function simplification pipeline between the mandatory and full inliner though, only

if (AttributorRun & AttributorRunOption::CGSCC)
  MainCGPipeline.addPass(AttributorCGSCCPass());

if (PTO.Coroutines)
  MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));

// Now deduce any function attributes based in the current code.
MainCGPipeline.addPass(PostOrderFunctionAttrsPass());

And is there any evidence that running the function simplification pipeline between the mandatory and full inliner is helpful? It could affect compile times.

I'd think that adding the mandatory inliner right before the full inliner in the same CGSCC pass manager would do the job. e.g. add it in ModuleInlinerWrapperPass::ModuleInlinerWrapperPass() right before PM.addPass(InlinerPass());

llvm/include/llvm/Analysis/InlineAdvisor.h
27

4

What about removing the existing AlwaysInlinerPass and replacing it with this one? Or is that something you were planning to do in a follow-up change?

That's the plan, yes.

open the opportunity to help the full inliner by performing additional function passes after the mandatory inlinings, but before the full inliner

This change doesn't run the function simplification pipeline between the mandatory and full inliner though, only

if (AttributorRun & AttributorRunOption::CGSCC)
  MainCGPipeline.addPass(AttributorCGSCCPass());

if (PTO.Coroutines)
  MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));

// Now deduce any function attributes based in the current code.
MainCGPipeline.addPass(PostOrderFunctionAttrsPass());

Right - my point was that we could more easily explore doing so by having this new AlwaysInliner separate. In this patch I included those additional passes because I thought they may be necessary or beneficial, but I can remove them as a first step if they are not necessary. @jdoerfert - should the Attributor be run post-always inlining, or is it fine to not be run?

And is there any evidence that running the function simplification pipeline between the mandatory and full inliner is helpful? It could affect compile times.

In the ML-driven -Oz case, we saw some marginal improvement. I haven't in -O3 cases using the "non-ml" inliner. I suspect that it helps in the ML policy case (both -Oz and -O3, on which we're currently working), because: 1) the current -Oz takes some global (module-wide) features, so probably simplifying out the trivial cases helps; and 2) we plan on taking into consideration regions of the call graph, and the intuition is that eliminating the trivial cases (mandatory cases) would raise the visibility (for a training algorithm) of the non-trivial cases.

I'd think that adding the mandatory inliner right before the full inliner in the same CGSCC pass manager would do the job. e.g. add it in ModuleInlinerWrapperPass::ModuleInlinerWrapperPass() right before PM.addPass(InlinerPass());

It would, and what I'm proposing here is equivalent to that, but the proposal here helps with these other explorations, with (arguably) not much of a difference cost-wise in itself (meaning, of course, if we discover there's benefit in running those additional passes, we pay with compile time, but in and of itself, factoring the always inliner in its own wrapper, or in the same wrapper as the inliner, doesn't really come at much of a cost).

Now, if we determine there is no value, we can bring it back easily - wdyt?

OK, so if I understand correctly with the old Pass Manager there were two separate passes (always inliner and inliner - they share some code though, yeah?)

AlwaysInlinerLegacyPass does, yes. The NPM variant doesn't.

The NPM always inliner doesn't share any code with the NPM non-always inliner? (though this ( https://reviews.llvm.org/D86988 ) is the patch that added a separate always inliner to the NPM, right? And that patch doesn't look like it adds a whole new pass implementation - so looks like it's sharing some code with something?)

and they were run in the pass pipeline but potentially (definitely?) not adjacent?

From what I can see, the legacy one was used only in the O0/O1 cases, see clang/lib/CodeGen/BackendUtil.cpp:643. The full inliner isn't.

The full inliner isn't.. isn't run at -O0/-O1? So with the Legacy Pass Manager one inliner (always or non-always) was used in a given compilation, not both? (so I guess then the non-always inliner did the always-inlining in -O2 and above in the old pass manager? But didn't have the same recursive always inlining miss that the NPM non-always inliner had?)

New pass manager survived for quite a while with only one inlining pass, that included a mandatorily strong preference for inlining always-inline functions? But still missed some recursive cases. So D86988 made the always inliner run right next to/before the inliner in the NPM.

Now there's this patch, to implement the AlwaysInliner using the inliner - but is also changing the order of passes to improve optimization opportunities by doing some cleanup after always inlining?

It doesn't quite change the order D86988 introduced. Specifically, D86988 ran AlwaysInliner (a module pass) first, then let the Inliner and function optimizations happen.
This patch keeps the order between doing mandatory inlinings and inlinings. But, in addition, if in the future we want to also perform some of the function passes that happen in the inliner case, to help the full inliner, we can more easily do so.

I'm still a bit confused/trying to understand better - am I understanding correctly when I say: D86988 added always inlining (for the NPM) as a separate process within the non-always inliner? And this patch you're proposing breaks always inlining out into a separate pass proper, so that at some point, if someone wanted to (but not being done in this patch) they could put some passes in between the two runs of inlining (always and non-always)?

(I guess one thing I might be especially confused about is the "reuse X to do Y" would, to me, immediately lead me to think about "so I expect to see a bunch of deleted code because X presumably was doing a bunch of stuff itself that it now doesn't have to" - but I guess that's not the case here? (at least I don't see a large bunch of deletion I'd expect to see if some kind of inlining implementation was being deleted))

OK, so if I understand correctly with the old Pass Manager there were two separate passes (always inliner and inliner - they share some code though, yeah?)

AlwaysInlinerLegacyPass does, yes. The NPM variant doesn't.

The NPM always inliner doesn't share any code with the NPM non-always inliner? (though this ( https://reviews.llvm.org/D86988 ) is the patch that added a separate always inliner to the NPM, right? And that patch doesn't look like it adds a whole new pass implementation - so looks like it's sharing some code with something?)

There was already an AlwaysInliner for the NPM, just wasn't used. So D86988 hooked that up in the NPM, basically. The implementation of that AlwaysInliner is separate from the Inliner pass. See Transforms/IPO/AlwaysInliner.cpp lines 36 - 114, vs Inliner.cpp, from 687 onwards.

and they were run in the pass pipeline but potentially (definitely?) not adjacent?

From what I can see, the legacy one was used only in the O0/O1 cases, see clang/lib/CodeGen/BackendUtil.cpp:643. The full inliner isn't.

The full inliner isn't.. isn't run at -O0/-O1? So with the Legacy Pass Manager one inliner (always or non-always) was used in a given compilation, not both? (so I guess then the non-always inliner did the always-inlining in -O2 and above in the old pass manager? But didn't have the same recursive always inlining miss that the NPM non-always inliner had?)

Yup, see BackendUtil.cpp:634. Can't comment on the latter problem.

New pass manager survived for quite a while with only one inlining pass, that included a mandatorily strong preference for inlining always-inline functions? But still missed some recursive cases. So D86988 made the always inliner run right next to/before the inliner in the NPM.

Now there's this patch, to implement the AlwaysInliner using the inliner - but is also changing the order of passes to improve optimization opportunities by doing some cleanup after always inlining?

It doesn't quite change the order D86988 introduced. Specifically, D86988 ran AlwaysInliner (a module pass) first, then let the Inliner and function optimizations happen.
This patch keeps the order between doing mandatory inlinings and inlinings. But, in addition, if in the future we want to also perform some of the function passes that happen in the inliner case, to help the full inliner, we can more easily do so.

I'm still a bit confused/trying to understand better - am I understanding correctly when I say: D86988 added always inlining (for the NPM) as a separate process within the non-always inliner? And this patch you're proposing breaks always inlining out into a separate pass proper, so that at some point, if someone wanted to (but not being done in this patch) they could put some passes in between the two runs of inlining (always and non-always)?

Yes. Nit on the first sentence: it's not "a separate process *within* the non-always inliner". It's a separate module pass part of the module pass manager that wraps the full inliner and related passes.

(I guess one thing I might be especially confused about is the "reuse X to do Y" would, to me, immediately lead me to think about "so I expect to see a bunch of deleted code because X presumably was doing a bunch of stuff itself that it now doesn't have to" - but I guess that's not the case here? (at least I don't see a large bunch of deletion I'd expect to see if some kind of inlining implementation was being deleted))

See the note to @aeubanks - indeed, we can remove the NPM AlwaysInliner as a result of this change.

Thanks for the walkthroughs/help. Also stared at the code a bit. I think I get it now. Some of the confusion also came from having both LPM and NPM versions of the always inliner in the same file, though they seem to share no code.

I'll leave the more nuanced review to folks more familiar with it - sorry for any noise.

What about removing the existing AlwaysInlinerPass and replacing it with this one? Or is that something you were planning to do in a follow-up change?

That's the plan, yes.

open the opportunity to help the full inliner by performing additional function passes after the mandatory inlinings, but before the full inliner

This change doesn't run the function simplification pipeline between the mandatory and full inliner though, only

if (AttributorRun & AttributorRunOption::CGSCC)
  MainCGPipeline.addPass(AttributorCGSCCPass());

if (PTO.Coroutines)
  MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));

// Now deduce any function attributes based in the current code.
MainCGPipeline.addPass(PostOrderFunctionAttrsPass());

Right - my point was that we could more easily explore doing so by having this new AlwaysInliner separate. In this patch I included those additional passes because I thought they may be necessary or beneficial, but I can remove them as a first step if they are not necessary. @jdoerfert - should the Attributor be run post-always inlining, or is it fine to not be run?

I'd say start off without running any passes and keeping the status quo (i.e. an NFC patch), then explore adding passes between the inliners in a future patch.

And is there any evidence that running the function simplification pipeline between the mandatory and full inliner is helpful? It could affect compile times.

In the ML-driven -Oz case, we saw some marginal improvement. I haven't in -O3 cases using the "non-ml" inliner. I suspect that it helps in the ML policy case (both -Oz and -O3, on which we're currently working), because: 1) the current -Oz takes some global (module-wide) features, so probably simplifying out the trivial cases helps; and 2) we plan on taking into consideration regions of the call graph, and the intuition is that eliminating the trivial cases (mandatory cases) would raise the visibility (for a training algorithm) of the non-trivial cases.

I'd think that adding the mandatory inliner right before the full inliner in the same CGSCC pass manager would do the job. e.g. add it in ModuleInlinerWrapperPass::ModuleInlinerWrapperPass() right before PM.addPass(InlinerPass());

It would, and what I'm proposing here is equivalent to that, but the proposal here helps with these other explorations, with (arguably) not much of a difference cost-wise in itself (meaning, of course, if we discover there's benefit in running those additional passes, we pay with compile time, but in and of itself, factoring the always inliner in its own wrapper, or in the same wrapper as the inliner, doesn't really come at much of a cost).

Now, if we determine there is no value, we can bring it back easily - wdyt?

I'll run this through llvm-compile-time-tracker to see what the compile time implications are.

mtrofin updated this revision to Diff 306141. Nov 18 2020, 9:44 AM

Running just the always inliner variant, without other passes.

I'll run this through llvm-compile-time-tracker to see what the compile time implications are.

You mean for the variant where we ran some of the function passes, or you'd try running all of them? Probably the latter would be quite interesting as a 'worst case'.

I was trying the previous patch, but will also try running all function passes, definitely would be interesting.

What about removing the existing AlwaysInlinerPass and replacing it with this one? Or is that something you were planning to do in a follow-up change?

That's the plan, yes.

open the opportunity to help the full inliner by performing additional function passes after the mandatory inlinings, but before the full inliner

This change doesn't run the function simplification pipeline between the mandatory and full inliner though, only

if (AttributorRun & AttributorRunOption::CGSCC)
  MainCGPipeline.addPass(AttributorCGSCCPass());

if (PTO.Coroutines)
  MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));

// Now deduce any function attributes based in the current code.
MainCGPipeline.addPass(PostOrderFunctionAttrsPass());

Right - my point was that we could more easily explore doing so by having this new AlwaysInliner separate. In this patch I included those additional passes because I thought they may be necessary or beneficial, but I can remove them as a first step if they are not necessary. @jdoerfert - should the Attributor be run post-always inlining, or is it fine to not be run?

I'd say start off without running any passes and keeping the status quo (i.e. an NFC patch), then explore adding passes between the inliners in a future patch.

Done

And is there any evidence that running the function simplification pipeline between the mandatory and full inliner is helpful? It could affect compile times.

In the ML-driven -Oz case, we saw some marginal improvement. I haven't in -O3 cases using the "non-ml" inliner. I suspect that it helps in the ML policy case (both -Oz and -O3, on which we're currently working), because: 1) the current -Oz takes some global (module-wide) features, so probably simplifying out the trivial cases helps; and 2) we plan on taking into consideration regions of the call graph, and the intuition is that eliminating the trivial cases (mandatory cases) would raise the visibility (for a training algorithm) of the non-trivial cases.

I'd think that adding the mandatory inliner right before the full inliner in the same CGSCC pass manager would do the job. e.g. add it in ModuleInlinerWrapperPass::ModuleInlinerWrapperPass() right before PM.addPass(InlinerPass());

It would, and what I'm proposing here is equivalent to that, but the proposal here helps with these other explorations, with (arguably) not much of a difference cost-wise in itself (meaning, of course, if we discover there's benefit in running those additional passes, we pay with compile time, but in and of itself, factoring the always inliner in its own wrapper, or in the same wrapper as the inliner, doesn't really come at much of a cost).

Now, if we determine there is no value, we can bring it back easily - wdyt?

I'll run this through llvm-compile-time-tracker to see what the compile time implications are.

You mean for the variant where we ran some of the function passes, or you'd try running all of them? Probably the latter would be quite interesting as a 'worst case'.

I was trying the previous patch, but will also try running all function passes, definitely would be interesting.

Awesome! Thanks!

If the rest of the (now NFC) patch seems reasonable, I'll go through those pesky pass manager tests to finish it up.

One thing that would be nice would be to have both inliners in the same CGSCC pass manager to avoid doing SCC construction twice, but that would require some shuffling of module/cgscc passes in ModuleInlinerWrapperPass. Maybe as a future cleanup.

There's that benefit to simplifying the module with the always inliner before doing inlining "in earnest" I was pointing earlier at: for the ML policies work, we plan on capturing (sub)graph information. Using the same SCC would not help because the "higher" (callers) parts of the graph would have these mandatory inlinings not completed yet, and thus offer a less accurate picture of the problem space.

One thing that would be nice would be to have both inliners in the same CGSCC pass manager to avoid doing SCC construction twice, but that would require some shuffling of module/cgscc passes in ModuleInlinerWrapperPass. Maybe as a future cleanup.

There's that benefit to simplifying the module with the always inliner before doing inlining "in earnest" I was pointing earlier at: for the ML policies work, we plan on capturing (sub)graph information. Using the same SCC would not help because the "higher" (callers) parts of the graph would have these mandatory inlinings not completed yet, and thus offer a less accurate picture of the problem space.

Oh I see, caller information is useful.

For compile times: http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks.
The previous version of this patch (perf/npmalways) running a couple passes has some small but measurable overhead on some benchmarks, 0.5%.
The version of running everything (perf/npmalways2) hugely increases compile times, almost by 50% in one case.

Thanks for doing this! Really good to have this data.

mtrofin updated this revision to Diff 306593. Nov 19 2020, 9:11 PM

patched up tests

From a ThinLTO perspective, no specific concerns as the buildModuleSimplificationPipeline is invoked in both the pre and post LTO link pipelines, so they both get an equivalent change. But there is an issue for regular LTO, noted below.

llvm/test/Other/new-pm-lto-defaults.ll
70

Note there is no corresponding add of an additional InlinerPass like in the other files. The reason is that PassBuilder::buildLTODefaultPipeline doesn't invoke buildModuleSimplificationPipeline, or even buildInlinerPipeline (it has a separate pipeline setup for compile time reasons due to the monolithic nature of the post-LTO link compilation), but rather directly adds ModuleInlinerWrapperPass. So you'll want to add the additional ModuleInlinerWrapperPass invocation there as well.

mtrofin updated this revision to Diff 308422. Nov 30 2020, 10:46 AM

Fixed the LTO case.

Also fixed the pr46945 test, which, post-D90566, was passing without the need for a preliminary always-inliner pass.
The reason is that the order of the traversal of the functions in an SCC changed. The test requires that the 'alwaysinline'
function be processed first (to render it recursive and, thus, uninlinable).
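
The order dependence can be illustrated with a toy model. This is a sketch under assumptions, not the contents of the test and not LLVM's inliner: ToyCalls, callsItself and inlineInOrder are hypothetical names, the two functions are called "plain" and "always" purely for illustration, and the toy inlines every call whose callee is a different function that does not call itself.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy call graph: function name -> callees at its call sites (no LLVM API).
using ToyCalls = std::map<std::string, std::vector<std::string>>;

// True if function F has a call site that targets F itself.
bool callsItself(const ToyCalls &Calls, const std::string &F) {
  for (const std::string &Callee : Calls.at(F))
    if (Callee == F)
      return true;
  return false;
}

// Visits callers in the given order. At each caller, every call to a
// different, non-self-recursive callee is replaced by a copy of that
// callee's call sites; all other calls are kept as they are.
void inlineInOrder(ToyCalls &Calls, const std::vector<std::string> &Order) {
  for (const std::string &Caller : Order) {
    assert(Calls.count(Caller) && "every visited function must be in the map");
    std::vector<std::string> NewSites;
    for (const std::string &Callee : Calls.at(Caller)) {
      if (Callee == Caller || callsItself(Calls, Callee)) {
        NewSites.push_back(Callee);
        continue;
      }
      const std::vector<std::string> &Body = Calls.at(Callee);
      NewSites.insert(NewSites.end(), Body.begin(), Body.end());
    }
    Calls[Caller] = NewSites;
  }
}
```

Starting from "plain" calling "always" and "always" calling "plain", visiting "always" first inlines "plain" into it, so "always" ends up calling itself and the call from "plain" to "always" can no longer be inlined. Visiting "plain" first removes that call instead. In the toy, a pass that handles the mandatory call before anything else corresponds to the second order.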

aeubanks accepted this revision. Nov 30 2020, 11:11 AM

aside from some nits, lgtm
thanks for doing this!

clang/test/Frontend/optimization-remark-line-directive.c
5

the change on this line shouldn't be necessary, this is a legacy PM RUN line

llvm/include/llvm/Analysis/InlineAdvisor.h
27

ping

llvm/test/Transforms/Inline/pr46945.ll
1–4

maybe we should have a RUN line with -passes='default<O2>' to make sure the whole thing works

This revision is now accepted and ready to land. Nov 30 2020, 11:11 AM
mtrofin updated this revision to Diff 308441. Nov 30 2020, 11:49 AM
mtrofin marked 5 inline comments as done.

fixes

llvm/include/llvm/Analysis/InlineAdvisor.h
27

sorry - done

This revision was landed with ongoing or failed builds. Nov 30 2020, 12:03 PM
This revision was automatically updated to reflect the committed changes.