This is an archive of the discontinued LLVM Phabricator instance.

[PM/ThinLTO] Port the ThinLTO pipeline (both components) to the new PM.
ClosedPublic

Authored by chandlerc on May 25 2017, 2:22 AM.

Details

Summary

Based on the original patch by Davide, but I've adjusted the API exposed
to just be different entry points rather than exposing more state
parameters. I've factored all the common logic out so that we don't have
any duplicate pipelines, we just stitch them together in different ways.
I think this makes the build easier to reason about and understand.

This adds a direct method for getting the module simplification pipeline
as well as a method to get the optimization pipeline. While not my
express goal, this seems nice and gives a good place comment about the
restrictions that are imposed on them.

I did make some minor changes to the way the pipelines are structured
here, but hopefully not ones that are significant or controversial:

  1. I sunk the PGO indirect call promotion to only be run when we have PGO enabled (or as part of the special ThinLTO pipeline).
  1. I made the extra GlobalOpt ron in ThinLTO just happen all the time and at a slightly more powerful place (before we remove available externaly functions). This seems like general goodness and not a big compile time sink, so it didn't make sense to *only* use it in ThinLTO. Fewer differences in the pipeline makes everything simpler IMO.
  1. I hoisted the ThinLTO stop point pre-link above the partial inliner and the RPO function attr inference. Partial inlining isn't going to make the code smaller and it seems better to just do that once post-link. And the RPO inference won't infer anything terribly meaningful pre-link (recursiveness?) so it didn't make a lot of sense. But if teh RPO inference starts to matter, we should move it to the canonicalization phase anyways which seems like a better place for it (and there is a FIXME to this effect!). But that seemed a bridge too far for this patch.

If we ever need to parameterize these pipelines more heavily, we can
always sink the logic to helper functions with parameters to keep those
parameters out of the public API. But the changes above seemed minor
that we could possible get away without the parameters entirely.

Lastly, this factoring does introduce a nesting layer of module pass
managers in the default pipeline. I don't think this is a big deal and
the flexibility of decoupling the pipelines seems easily worth it.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc created this revision.May 25 2017, 2:22 AM
tejohnson edited edge metadata.May 25 2017, 7:12 AM

I hoisted the ThinLTO stop point pre-link above the partial inliner and the RPO function attr inference. Partial inlining isn't going to make the code smaller and it seems better to just do that once post-link.

Added davidxl to comment on this change. It seems like doing some partial inlining during the pre-link for ThinLTO will have an effect on the importing. Also, it is pre-link (as well as post-link) in the old pass pipeline, so this would introduce an inconsistency between the pass pipelines. May be better to keep it in the same places, and evaluate the effect of removing it from the pre-link pass pipeline separately?

lib/Passes/PassBuilder.cpp
583 ↗(On Diff #100214)

s/if/since/ (we aren't compiling an object file for later LTO). Since we don't invoke this in the pre-link *LTO pipelines.

708 ↗(On Diff #100214)

I don't see this getting invoked.

chandlerc marked an inline comment as done.May 25 2017, 2:43 PM

I hoisted the ThinLTO stop point pre-link above the partial inliner and the RPO function attr inference. Partial inlining isn't going to make the code smaller and it seems better to just do that once post-link.

Added davidxl to comment on this change. It seems like doing some partial inlining during the pre-link for ThinLTO will have an effect on the importing. Also, it is pre-link (as well as post-link) in the old pass pipeline, so this would introduce an inconsistency between the pass pipelines. May be better to keep it in the same places, and evaluate the effect of removing it from the pre-link pass pipeline separately?

Well, partial inliner isn't run at all yet. So my hope was it can get moved around later.

I somewhat wanted to make the split line up with the simplification split that is already there in the pipeline. But if you feel strongly we need to keep these exactly the same, I can essentially copy the extra passes into the tail of the thinlto prelink stuff.

lib/Passes/PassBuilder.cpp
708 ↗(On Diff #100214)

Yeah, the original patch had a change to Clang, but I wanted to get a submittable LLVM-focused version.

I'll add support for parsing these pipelines much like we support parsing 'default<O2>' and such so that we have some coverage of stuff.

davidxl edited edge metadata.May 25 2017, 2:50 PM

For partial inlining, leave it consistent with old PM for now. We can introduce later an internal option to enable/disable pre-thin-link partial inlining and collect more performance number.

For partial inlining, leave it consistent with old PM for now. We can introduce later an internal option to enable/disable pre-thin-link partial inlining and collect more performance number.

Is there a rationale for the position in the old PM? Was this discussed somewhere? I'd almost rather adjust the position in the old PM to be consistent with here. Without data, I think the position I gave it here actually fits with the clear intent indicated by the comments. The previous position, honestly, looked like an accident to me. Maybe it isn't but I couldn't find any really clear reason written down anywhere, and the only test that fails if I reorder these things is just the one that asserts a particular order.

Anyways, I'll change this back to the old PM one for now, but I really do want to understand why this is the right initial ordering to use.

For partial inlining, leave it consistent with old PM for now. We can introduce later an internal option to enable/disable pre-thin-link partial inlining and collect more performance number.

Is there a rationale for the position in the old PM? Was this discussed somewhere? I'd almost rather adjust the position in the old PM to be consistent with here. Without data, I think the position I gave it here actually fits with the clear intent indicated by the comments. The previous position, honestly, looked like an accident to me. Maybe it isn't but I couldn't find any really clear reason written down anywhere, and the only test that fails if I reorder these things is just the one that asserts a particular order.

Anyways, I'll change this back to the old PM one for now, but I really do want to understand why this is the right initial ordering to use.

Thanks - I'd like the ability to be able to enable/disable it in the pre-link optimization step when I test its performance with ThinLTO.

I was aware that partial inlining is available in both pre and post link thinLTO pipelines. It is just not tuned. In fact, the right thing to do for ThinLTO is only do function outlining in the prelink step -- this will shrink function size and exposes more importing. The regular inliner in the post -link phase can handle those functions properly. Post thin-link needs another round of partial inlining after the inliner.

I was aware that partial inlining is available in both pre and post link thinLTO pipelines. It is just not tuned. In fact, the right thing to do for ThinLTO is only do function outlining in the prelink step -- this will shrink function size and exposes more importing. The regular inliner in the post -link phase can handle those functions properly. Post thin-link needs another round of partial inlining after the inliner.

It sounds like the work around partial inlining really needs a wider discussion around the right design and approach. I can't find any RFC or other discussion about it. Could you write up your plans here and send them out? I think that would be very helpful.

When I discussed this a month or 2 ago with Davide I mentioned the long discussion I had about 5 years ago with Owen, Jakub, Eric, and others in the LLVM community around this. At the time, Owen strongly advised against using PartialInlining at all and to focus entirely on outlining to allow the normal inliner to do the partial inlining phase. That was a *long* time ago and so it may no longer be relevant, but we should at least have a pretty clear plan, rationale, concerns/tradeoffs discussion on the dev list.

chandlerc updated this revision to Diff 100328.May 25 2017, 4:21 PM

Address comments from both Teresa and David. Moved PartialIniling to run both
before and after thin link along with a FIXME to revisit the correct placement
of this pass long term.

Added testing facilities for the ThinLTO pipelines.

mehdi_amini edited edge metadata.May 26 2017, 9:44 AM

For partial inlining, leave it consistent with old PM for now. We can introduce later an internal option to enable/disable pre-thin-link partial inlining and collect more performance number.

Is there a rationale for the position in the old PM? Was this discussed somewhere?

Do you really think I flipped a coin? git log/git blame easily find the original review that introduced the ThinLTO pipeline: http://reviews.llvm.org/D17115

I'd almost rather adjust the position in the old PM to be consistent with here.

What data do you have? Your intuition alone is not a reason to go over the extensive tests we've been doing before setting it up the way it is. The pipeline is what it is now because I started with the same "intuition" and "rational" as you do, and tried to not perform inlining. However I did a bunch of measurement before landing it, and had to realize that (at that time at least), the bitcode was much larger, there were more functions to be imported, and it slowed down the compile time significantly (I'm not saying it is still the case, I don't know, but I don't see data here).

See also later tweaking in: http://reviews.llvm.org/D19773 that shifted more work from the thin-backend to the compile phase, which helped a lot building LLVM (since the compile-phase happens once per file while the Thin-backends run for every binary that is linked).

include/llvm/Passes/PassBuilder.h
256 ↗(On Diff #100328)

It is the first time I see "prelink" to describe the compile phase. I find this confusing. The "link" happens after ThinLTO, we never link the IR in ThinLTO.

Note: the plan was to tune the inliner heuristic in the compile phase differently from the thin-backends phase, but we never got to it. It is also already challenging to maintain and evolve the existing thresholds, that I'm not sure adding another "optimization goal" is sustainable.

Indeed, looks like the buildModuleSimplificationPipeline got me confused with buildFunctionSimplificationPipeline, sorry /me -> []

For partial inlining, leave it consistent with old PM for now. We can introduce later an internal option to enable/disable pre-thin-link partial inlining and collect more performance number.

Is there a rationale for the position in the old PM? Was this discussed somewhere?

Do you really think I flipped a coin? git log/git blame easily find the original review that introduced the ThinLTO pipeline: http://reviews.llvm.org/D17115

Just to fully close the loop on this: as David said this was *only* a question about the partial inliner (I'm aware of the clear discussion around the larger ThinLTO pipeline).

And even then, I absolutely don't believe it was "flipping a coin". =] I can imagine that there was some clear rationale and it maybe didn't get written down, or that there wasn't enough information at the time which is also fine but useful to understand when making subsequent decisions.

I'd almost rather adjust the position in the old PM to be consistent with here.

What data do you have? Your intuition alone is not a reason to go over the extensive tests we've been doing before setting it up the way it is. The pipeline is what it is now because I started with the same "intuition" and "rational" as you do, and tried to not perform inlining. However I did a bunch of measurement before landing it, and had to realize that (at that time at least), the bitcode was much larger, there were more functions to be imported, and it slowed down the compile time significantly (I'm not saying it is still the case, I don't know, but I don't see data here).

Yeah, sorry, this all only makes sense for *partial inlining*, not for the rest of it. Anyways... Sorry for confusion.

And even then, I want to be clear that if there have been extensive tests that show that the partial inliner needs to have a particular location, clearly that is more important. I just wasn't finding any documentation trail pointing at these results, and I think that if we actually have something better than intuition, it is important to document it where possible.

include/llvm/Passes/PassBuilder.h
256 ↗(On Diff #100328)

We don't link the IR, but we do something called 'thin link' which this is before, and so 'pre-link' didn't seem completely crazy to me.

It also has the advantage of matching the terminology used for normal LTO...

But I'm totally open to a better set of terminology here if you have some in mind. I'm worried that 'compile' may be too generic, but maybe it isn't. Thoughts?

(FYI, I think this is ready for another look, only outstanding issue I'm aware of is Mehdi's concern around 'pre-link' terminology and that really just needs anyone else with naming thoughts to chime in...)

(FYI, I think this is ready for another look, only outstanding issue I'm aware of is Mehdi's concern around 'pre-link' terminology and that really just needs anyone else with naming thoughts to chime in...)

FWIW I like both the name and the consistency...

I've also taken a quick look and other than a complaint about boolean parameters it looks ok to me.

lib/LTO/LTOBackend.cpp
140 ↗(On Diff #100328)

Boolean arguments are terrible. I don't have any better ideas at the moment that don't involve wide refactoring though.

Yeah, sorry, this all only makes sense for *partial inlining*, not for the rest of it. Anyways... Sorry for confusion.

Well, that was my bad for misunderstanding the context of your discussion! :)

include/llvm/Passes/PassBuilder.h
256 ↗(On Diff #100328)

I can just speak to myself: when I read "link" without more precision I think about the native link, that happens after LTO completes, so pre-link makes me think about right before it. But if I'm the only one to read it this way, forget it.

I never liked any of the naming around ThinLTO anyway: the backends without more precision are confusing with the LLVM target backends as well, and as you mention compile is quite generic and confusing. I try to be consistent and say "compile phase" and "thin backends" all the time when I have to refer to these.

Ping? I think I've addressed all the comments... Other patches are kinda blocked on this...

davide accepted this revision.May 31 2017, 4:03 PM

Just few comments.

lib/Passes/PassBuilder.cpp
751 ↗(On Diff #100328)

s/te/to/ I guess.

995–997 ↗(On Diff #100328)

this and the one below could be factored out in a single helper so we don't have to change two cases. It can be a follow-up.

This revision is now accepted and ready to land.May 31 2017, 4:03 PM
chandlerc marked 2 inline comments as done.Jun 1 2017, 4:39 AM

Thanks, landing with suggested fixes. (I also confirmed with David and Mehdi that they were happy.)

This revision was automatically updated to reflect the committed changes.
llvm/trunk/test/Other/new-pm-thinlto-defaults.ll