This is an archive of the discontinued LLVM Phabricator instance.

[PassManager] add late LICM to LTO pipeline to undo InstCombine sinking
Changes PlannedPublic

Authored by spatel on Apr 19 2021, 3:06 PM.

Details

Summary

This is an alternative to D87479 (where the proposed change was to InstCombine).
InstCombine sinks code without regard to cost/loops, so we don't want that to be near the final step in the opt pipeline. In the example test, an expensive fdiv gets hoisted out of a loop.

I don't have any guess about the impact this has on compile-time, or if we can position the extra LICM somewhere else to make it better/cheaper. The regular (non-LTO) pipeline already has a late LICM, so we don't have this problem with plain -O? compiles.

Diff Detail

Event Timeline

spatel created this revision.Apr 19 2021, 3:06 PM
spatel requested review of this revision.Apr 19 2021, 3:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2021, 3:06 PM
llvm/lib/Passes/PassBuilder.cpp
1843–1847

Can you try something like this, ...

1853
llvm/test/Other/opt-LTO-pipeline.ll
181

This one is probably the most costly here.

189

..., i wonder if we could at least inline it into the FunctionPass Manager above

spatel updated this revision to Diff 338832.Apr 20 2021, 5:26 AM
spatel marked 2 inline comments as done.

Patch updated:
Add explicit late LoopPassManager for LICM and disable MemorySSA usage from LICM pass on NewPM.

Hopefully, that matches what was suggested in the inline comment. Let me know if not. I should learn more about how the new PM works...
It's not clear to me what the analogous change would look like for OldPM - set the Cap options to zero?

lebedev.ri added inline comments.Apr 20 2021, 6:09 AM
llvm/test/Other/opt-LTO-pipeline.ll
179–190

The problem here is that we do this after the FunctionPass Manager finishes.
This seems suboptimal from cache locality perspective,
and that is my only guess as to why we are seeing such huge compile-time regression
in full LTO only, because full lto results in *huge* modules
with lots of functions.

Patch updated:
Add explicit late LoopPassManager for LICM and disable MemorySSA usage from LICM pass on NewPM.

Hopefully, that matches what was suggested in the inline comment. Let me know if not. I should learn more about how the new PM works...
It's not clear to me what the analogous change would look like for OldPM - set the Cap options to zero?

That didn't help:
https://llvm-compile-time-tracker.com/compare.php?from=8a6772f3aa92fdf6d01303adfef0e05f5651ea7d&to=b39f226e628eb2b58e63848985bf5c2af8a1a8d7&stat=instructions

I personally think the problem isn't MSSA, but the fact that we run this Loop pass manager
after main function pass manager finishes, thus losing cache locality.

Err, after rereading this twice, this only touches LTO pipelines, doesn't it :]
Sorry,

Err, after rereading this twice, this only touches LTO pipelines, doesn't it :]

Right - sorry that wasn't clear. I'm only trying to get closer to parity/consensus between the regular and LTO pipelines with this patch.

I don't know why the pipelines would diverge at this stage of the compile - there might be a good reason, or it just evolved to this state unintentionally?

So if we look at this section of the pipeline (from vectorization to proposed LICM) in non-LTO mode, we have (table formatted for Phab):

RegularLTO
LoopVectorizePassLoopVectorizePass
LoopLoadEliminationPassLoopUnrollPass???
InstCombinePassInstCombinePass
SimplifyCFGPassSimplifyCFGPass
SCCPPass???
InstCombinePass???
BDCEPass???
SLPVectorizerPassSLPVectorizerPass
VectorCombinePassVectorCombinePass
InstCombinePassInstCombinePass
LoopUnrollPassJumpThreadingPass???
InstCombinePass???
LICMPassLICMPass

Maybe we should reconcile the pass differences ahead of LICM in the pipeline before adding LICM?

The change you made for the NewPM doesn't just disabled MSSA -- it enables AST instead, which is even worse :) What you'd actually want for this case is an additional option that disables the memory optimizations in LICM entirely and uses neither MSSA nor AST. That should make this change much cheaper, as MSSA/AST should be the primary costs here.

The change you made for the NewPM doesn't just disabled MSSA -- it enables AST instead, which is even worse :)

Oops!

What you'd actually want for this case is an additional option that disables the memory optimizations in LICM entirely and uses neither MSSA nor AST. That should make this change much cheaper, as MSSA/AST should be the primary costs here.

Ok - let me step back and ask: any ideas about what is an acceptable compile-time cost here?
It seems like we might be better off unifying the behavior between LTO and non-LTO (see table in previous comment) - potentially more compile-time savings?

I wasn't actually suggesting to disable MSSA, i was more commenting about doing this after FPM finishes.
Indeed, i think we should just match the behavior with normal optimization pipeline,
which probably means keeping MSSA.

~1% sounds about reasonable to me, i guess.

Matt added a subscriber: Matt.May 5 2021, 7:58 AM
spatel planned changes to this revision.May 13 2021, 8:23 AM

Taking this off the active queue for now. D102002 is the more general unification patch, but that's going to have to happen in pieces to avoid a slow and endless chain of difficult regressions.

lebedev.ri resigned from this revision.Jan 12 2023, 4:50 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 4:50 PM