This is an archive of the discontinued LLVM Phabricator instance.

[PM] Port the always inliner to the new pass manager in a much more minimal and boring form than the old pass manager's version.
ClosedPublic

Authored by chandlerc on Aug 9 2016, 2:03 AM.

Details

Summary

This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:

  • Array alloca merging
    • To support the above, bottom-up inlining with careful history tracking and call graph updates
  • DCE of the functions that become dead after this inlining.
  • Inlining through call instructions with the always_inline attribute. Instead, it focuses on inlining functions with that attribute.

The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.

The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
heard and understand why. Again, this is all solveable by factoring that
logic into a utility and calling it here, but I'd like to wait to do
that until there is a clear reason why the existing pass-based factoring
won't work.

The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.

One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.

Anyways, hopefully a reasonable starting point for this pass.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc updated this revision to Diff 67299.Aug 9 2016, 2:03 AM
chandlerc retitled this revision from to [PM] Port the always inliner to the new pass manager in a much more minimal and boring form than the old pass manager's version..
chandlerc updated this object.
chandlerc added a subscriber: llvm-commits.
davide added a subscriber: davide.Aug 9 2016, 12:47 PM

Some comments.

lib/Passes/PassRegistry.def
40 ↗(On Diff #67299)

Is there a particular reason why you renamed always-inline to always-inliner?
I think we should try to keep the names of the passes consistent between old and new PM (unless there's a reason not to)/

lib/Transforms/IPO/AlwaysInliner.cpp
53 ↗(On Diff #67299)

Changed |= InlineFunction(CS, IFI); , no?

94–97 ↗(On Diff #67299)

I'm not entirely sure here, but are all these dependencies actually needed?

test/Transforms/Inline/always-inline.ll
4 ↗(On Diff #67299)

These tests modification (at least part of them) can be committed separately, probably.

chandlerc marked 2 inline comments as done.Aug 9 2016, 3:48 PM

Thanks for the review. See some responses inline, but largely good catches! Updated patch momentarily.

lib/Passes/PassRegistry.def
40 ↗(On Diff #67299)

No actually... I'm not sure what the best thing to do in this particular case is...

The class is named AlwaysInliner and that seems the right thing -- it should be a noun, etc. I've adjusted the file names to match which seems the right thing.

I had adjusted the pass name to match as well, but I agree it is a bit vague whether we should do this or not. The pass names don't actually have the noun pattern necessarily. For now, I'll take this back to the old name. We can always pursue some correspondence between pass name and class name later if that's desirable. Currently (as this diff hunk illustrates) we're no where near anyways.

lib/Transforms/IPO/AlwaysInliner.cpp
53 ↗(On Diff #67299)

Well, the FIXME was about potentially asserting on this inlined variable. But yea, I guess the variable isn't serving any purpose until then. Nuked.

94–97 ↗(On Diff #67299)

For the legacy one, I think so, because of the inliner base class. I'm not changing any of it in this patch at least...

test/Transforms/Inline/always-inline.ll
4 ↗(On Diff #67299)

I don't think that really makes sense...

The only changes here are to allow one of the test cases to be omitted with the new pass manager's pass because it doesn't support that case. But none of these changes would be required without that, so it seems best to put them into this patch?

chandlerc updated this revision to Diff 67427.Aug 9 2016, 3:49 PM
chandlerc marked 2 inline comments as done.
chandlerc edited edge metadata.

Update with fixes suggested in review!

davide added a subscriber: davidxl.Aug 9 2016, 4:05 PM

+ @davidxl

I have a general concern about changing the behavior of a pass during the transition to the new PM.
I don't think it's entirely unreasonable but, dependently on how many people care about this pass you may want to confirm that:

  1. The behavioral changes you're proposing don't regress compile time/runtime performance
  2. Checking that this design you're proposing is actually faster
eraman added a subscriber: eraman.Aug 9 2016, 6:06 PM
eraman added inline comments.
include/llvm/Transforms/IPO/AlwaysInliner.h
28 ↗(On Diff #67427)

Incomplete comment.

lib/Transforms/IPO/AlwaysInliner.cpp
37 ↗(On Diff #67427)

InlineFunction makes use of IFI.GetAssumptionCache in AddAlignmentAssumptions.

Thanks!

include/llvm/Transforms/IPO/AlwaysInliner.h
28 ↗(On Diff #67427)

Hah, not only incomplete, very stale. I totally changed my mind about how to do this after writing the comment. I've updated it to match the implementation and the patch description. Thanks!

lib/Transforms/IPO/AlwaysInliner.cpp
37 ↗(On Diff #67427)

Good catch. I thought that AddAlignmentAssumptions was disabled in the face of a null assumption cache, but I see it isn't. =[ I've just added a check to that routine. For -O0 it seems better to not spend time adding assume intrinsics here, and this is really intended to be an -O0 style pass.

(Of course, we can add some of this back very easily if there are good reasons, I was just starting simple.)

Perhaps more importantly, I've added a test case that would have caught this bug. =]

chandlerc updated this revision to Diff 67450.Aug 9 2016, 6:38 PM

Address review comments.

davidxl added inline comments.Aug 10 2016, 10:34 AM
test/Transforms/Inline/always-inline.ll
137 ↗(On Diff #67450)

Some code may depend on the always-inline to be done for correctness (e.g not expecting calls at runtime for some functions). I forgot the details, but I remember cases like that.

davidxl edited edge metadata.Aug 11 2016, 10:32 AM

Why is this pass implemented as a module pass (instead of cgscc pass as before)? Is there any compile time concerns (at O0) ?

Why is this pass implemented as a module pass (instead of cgscc pass as before)? Is there any compile time concerns (at O0) ?

As I tried to explain in the patch description (and let me know if I should improve it): because a module pass is simpler, and avoids computing the call graph when we don't need it.

It might help compile time, but that wasn't my primary concern.

yes, I understand the module pass is quite simple and straightforward, but it seems to me the legacy always inliner is even simpler -- as it is simply one 'instantiation' of a common inliner implementation with one inline cost hook provided :) Have we compared the pros-and-cons of the two approaches?

The pro of using module pass include : avoid building CG so it might be a compile time win (depending on how expensive GlobalDCE is).

The cons I can see:

  1. need to insert life time marker at O0 and depend on stack coloring to be turned on at O0 (which can be expensive)
  2. need to run GlobalDCE
  3. It is likely in the future CG is needed at O0 for other reasons, then the benefit of avoiding CG will be gone.
  4. less code sharing.

Did I miss others?

For the record, I am all for a more general Module-pass based inliner. The motivation is that it allows us to do a very quick round of priority based inlining (e.g, wrapper call inliing or other top-down heuristics) to avoid the limitation of bottom-up inlining. However the general framework still needs CG (and update) though.

Why is this pass implemented as a module pass (instead of cgscc pass as before)? Is there any compile time concerns (at O0) ?

As I tried to explain in the patch description (and let me know if I should improve it): because a module pass is simpler, and avoids computing the call graph when we don't need it.

It might help compile time, but that wasn't my primary concern.

yes, I understand the module pass is quite simple and straightforward, but it seems to me the legacy always inliner is even simpler -- as it is simply one 'instantiation' of a common inliner implementation with one inline cost hook provided :) Have we compared the pros-and-cons of the two approaches?

The pro of using module pass include : avoid building CG so it might be a compile time win (depending on how expensive GlobalDCE is).

The cons I can see:

  1. need to insert life time marker at O0 and depend on stack coloring to be turned on at O0 (which can be expensive)
  2. need to run GlobalDCE
  3. It is likely in the future CG is needed at O0 for other reasons, then the benefit of avoiding CG will be gone.
  4. less code sharing.

Did I miss others?

When used in the -O1 pipeline, we might end up inlining less with the new AlwaysInliner. This is because the isInlineViable call scans the callee for conditions that disallow inlining and it matters whether some function pass gets rid of them (if they are in unreachable paths, for example). In the case of existing CGSCC AlwaysInliner pass, since we optimize a CGSCC node with function passes before moving to its parent, isInlineViable could get more precise result. No such cleanup happens in the module pass (unless we run some cleanup passes before the module pass). I don't think this difference matters much in practice though.

yes, I understand the module pass is quite simple and straightforward, but it seems to me the legacy always inliner is even simpler -- as it is simply one 'instantiation' of a common inliner implementation with one inline cost hook provided :) Have we compared the pros-and-cons of the two approaches?

I mean, yes. ;] I didn't do this without some careful thought.

The pro of using module pass include : avoid building CG so it might be a compile time win (depending on how expensive GlobalDCE is).

  • Avoid coupling the always inliner which needs no cost model to an inliner built entirely around cost modeling
  • Avoid abstractions between the always cost model (or lack there of) and a concrete cost model
  • Avoid the complexity of rigging up and potentially emitting remarks when the decision of whether or not to inline is completely predictable from source.

The cons I can see:

  1. need to insert life time marker at O0 and depend on stack coloring to be turned on at O0 (which can be expensive)

If we need to do this at all. It isn't clear that we do. I've specifically said we can add alloca merging as a follow-up if it proves necessary.

  1. need to run GlobalDCE

This adds no complexity though. The goal is to de-couple things which should make it overall more simple.

  1. It is likely in the future CG is needed at O0 for other reasons, then the benefit of avoiding CG will be gone.

I don't see how this can be a con... It seems circular.

  1. less code sharing.

I disagree. This shares all relevant logic with the existing inliner via the InlineFunction routine.

Did I miss others?

I outlined all of the ones I saw in the patch description already (and it does include a couple you didn't mention) but I also described why I don't find them compelling to use a more complex approach.

It is probably worth highlighting that this new pass requires *less code* than the old version does even if we don't count any of the common inliner code or logic shared with other inliners. Despite the fact that some of that code only exists to support this pass's use case.

For the record, I am all for a more general Module-pass based inliner. The motivation is that it allows us to do a very quick round of priority based inlining (e.g, wrapper call inliing or other top-down heuristics) to avoid the limitation of bottom-up inlining. However the general framework still needs CG (and update) though.

I think that is a completely different discussion. My goal is for the always inliner to be the simplest thing possible that merely inlines function bodies based on a single signal: the alwaysinline attribute.

Did I miss others?

When used in the -O1 pipeline, we might end up inlining less with the new AlwaysInliner.

The fact that -O1 uses the always inliner ... doesn't make much sense IMO. In fact, I suspect most don't realize that this is the case. It dates from r89464 in 2009 when this logic was added as part of some code dump of option parsing -- I suspect ported from the python driver wrapper. I don't see any particular logic for this model. I strongly suspect we should do something closer to -Os's inlining logic at -O1.

And I think we have the flexibility to do exactly this with the new PM. I'm not changing the old pass manager in any way.

This is because the isInlineViable call scans the callee for conditions that disallow inlining and it matters whether some function pass gets rid of them (if they are in unreachable paths, for example). In the case of existing CGSCC AlwaysInliner pass, since we optimize a CGSCC node with function passes before moving to its parent, isInlineViable could get more precise result. No such cleanup happens in the module pass (unless we run some cleanup passes before the module pass). I don't think this difference matters much in practice though.

Yea, this is a difference, but I strongly agree it doesn't matter in practice.

Notably, we'd really like the alwaysinline to not be a hint but a guarantee. As such, it seems like a much bigger problem if some code has this attribute and relies on DCE or something else to be viable for inlining.

davidxl accepted this revision.Aug 12 2016, 8:56 AM
davidxl edited edge metadata.

I don't know why it is a bad thing to use a super simple cost model for always-inliner (as done by Old pass manager).

Anyway, I don't think this patch should be held because of the difference in opinions here: it is not yet on by default and if we see problems, it can always be revisited (assuming implementing this as regular CG based inliner in new PM is possible) later.

I do think a follow up is needed to handle always-inline callsite attribute.

lgtm

This revision is now accepted and ready to land.Aug 12 2016, 8:56 AM

yes, I understand the module pass is quite simple and straightforward, but it seems to me the legacy always inliner is even simpler -- as it is simply one 'instantiation' of a common inliner implementation with one inline cost hook provided :) Have we compared the pros-and-cons of the two approaches?

I mean, yes. ;] I didn't do this without some careful thought.

The pro of using module pass include : avoid building CG so it might be a compile time win (depending on how expensive GlobalDCE is).

  • Avoid coupling the always inliner which needs no cost model to an inliner built entirely around cost modeling
  • Avoid abstractions between the always cost model (or lack there of) and a concrete cost model
  • Avoid the complexity of rigging up and potentially emitting remarks when the decision of whether or not to inline is completely predictable from source.

The cons I can see:

  1. need to insert life time marker at O0 and depend on stack coloring to be turned on at O0 (which can be expensive)

If we need to do this at all. It isn't clear that we do. I've specifically said we can add alloca merging as a follow-up if it proves necessary.

  1. need to run GlobalDCE

This adds no complexity though. The goal is to de-couple things which should make it overall more simple.

  1. It is likely in the future CG is needed at O0 for other reasons, then the benefit of avoiding CG will be gone.

I don't see how this can be a con... It seems circular.

  1. less code sharing.

I disagree. This shares all relevant logic with the existing inliner via the InlineFunction routine.

Did I miss others?

I outlined all of the ones I saw in the patch description already (and it does include a couple you didn't mention) but I also described why I don't find them compelling to use a more complex approach.

It is probably worth highlighting that this new pass requires *less code* than the old version does even if we don't count any of the common inliner code or logic shared with other inliners. Despite the fact that some of that code only exists to support this pass's use case.

For the record, I am all for a more general Module-pass based inliner. The motivation is that it allows us to do a very quick round of priority based inlining (e.g, wrapper call inliing or other top-down heuristics) to avoid the limitation of bottom-up inlining. However the general framework still needs CG (and update) though.

I think that is a completely different discussion. My goal is for the always inliner to be the simplest thing possible that merely inlines function bodies based on a single signal: the alwaysinline attribute.

Did I miss others?

When used in the -O1 pipeline, we might end up inlining less with the new AlwaysInliner.

The fact that -O1 uses the always inliner ... doesn't make much sense IMO. In fact, I suspect most don't realize that this is the case. It dates from r89464 in 2009 when this logic was added as part of some code dump of option parsing -- I suspect ported from the python driver wrapper. I don't see any particular logic for this model. I strongly suspect we should do something closer to -Os's inlining logic at -O1.

And I think we have the flexibility to do exactly this with the new PM. I'm not changing the old pass manager in any way.

This is because the isInlineViable call scans the callee for conditions that disallow inlining and it matters whether some function pass gets rid of them (if they are in unreachable paths, for example). In the case of existing CGSCC AlwaysInliner pass, since we optimize a CGSCC node with function passes before moving to its parent, isInlineViable could get more precise result. No such cleanup happens in the module pass (unless we run some cleanup passes before the module pass). I don't think this difference matters much in practice though.

Yea, this is a difference, but I strongly agree it doesn't matter in practice.

Notably, we'd really like the alwaysinline to not be a hint but a guarantee. As such, it seems like a much bigger problem if some code has this attribute and relies on DCE or something else to be viable for inlining.

I don't know why it is a bad thing to use a super simple cost model for always-inliner (as done by Old pass manager).

Anyway, I don't think this patch should be held because of the difference in opinions here: it is not yet on by default and if we see problems, it can always be revisited (assuming implementing this as regular CG based inliner in new PM is possible) later.

I do think a follow up is needed to handle always-inline callsite attribute.

lgtm

Thanks. I'll try to re-invigorate the always-inline refinement that was already underway.

This revision was automatically updated to reflect the committed changes.

Hi, I'm attempting to fix all clang tests that fail when enabling the new PM by default and one of the failing tests is CodeGen/flatten.c which tests the flatten attribute. According to the docs, this attribute attempts to inline function calls in functions marked with flatten. But if I understand this patch correctly, one of the intentions is only inlining for functions and not calls.

Is it set in stone that the new PM will not inline calls? Otherwise, it seems to break flatten for -O0 cases with new PM. At a quick glance, it seems that we can enable this on callsites by adding a check for AlwaysInline here and removing the F.hasFnAttribute(Attribute::AlwaysInline) here.

Herald added a project: Restricted Project. · View Herald TranscriptMay 22 2019, 5:30 PM
llvm/trunk/lib/Transforms/IPO/IPO.cpp