This is an archive of the discontinued LLVM Phabricator instance.

Turn on flag to not re-run simplification pipeline.
Closed, Public

Authored by asbirlea on Jul 12 2022, 3:13 PM.

Details

Summary

This patch turns on the flag -enable-no-rerun-simplification-pipeline, so the CGSCC pass manager no longer reruns the simplification pipeline on functions that have not changed.

Compile time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=17457be1c393ff691cca032b04ea1698fedf0301&to=882301ebb893c8ef9f09fe1ea871f7995426fa07&stat=instructions

No meaningful run time regressions observed in the llvm test suite and
in additional internal workloads at this time.

The example test in test/Other/no-rerun-function-simplification-pipeline.ll is a good means to understand the effect of this change:

define void @f1(void()* %p) alwaysinline {
  call void %p()
  ret void
}

define void @f2() #0 {
  call void @f1(void()* @f2)
  call void @f3()
  ret void
}

define void @f3() #0 {
  call void @f2()
  ret void
}

There are two SCCs formed by the ModuleToPostOrderCGSCCAdaptor: (f1) and (f2, f3).

The pass manager runs on the first SCC, leading to running the simplification pipeline (function and loop passes) on f1. With the flag on, after this, the output will have Running analysis: ShouldNotRunFunctionPassesAnalysis on f1.

Next, the pass manager runs on the second SCC: (f2, f3). Since f1() was inlined, f2() now calls itself, and also calls f3(), while f3() only calls f2().
So the pass manager for the SCC first runs the Inliner on (f2, f3), then the simplification pipeline on f2.
With the flag on, the output will have Running analysis: ShouldNotRunFunctionPassesAnalysis on f2; unless the inliner makes a change, this analysis remains preserved, which means there's no reason to rerun the simplification pipeline. With the flag off, the simplification pipeline runs a second time on f2.

Next, the same flow occurs for f3. The simplification pipeline is run on f3 a single time with the flag on, along with ShouldNotRunFunctionPassesAnalysis on f3, and twice with the flag off.
The reruns occur only on f2 and f3 due to the additional ref edges.
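
The walkthrough above can be sketched as a small toy model. This is only an illustration under stated assumptions, not LLVM's implementation: the visit schedule VISITS is written by hand to mirror the description (it is not derived from the real CGSCC traversal), and the names count_pipeline_runs and skip_marker are hypothetical. The set skip_marker stands in for a cached ShouldNotRunFunctionPassesAnalysis result.

```python
from collections import Counter

# Hand-written visit schedule mirroring the walkthrough: each entry is
# (function, inliner_changed_it). Assumption for illustration only; the
# real order comes from the CGSCC pass manager.
VISITS = [
    ("f1", False),  # SCC (f1)
    ("f2", True),   # SCC (f2, f3): f1 was inlined into f2
    ("f2", False),  # f2 visited again, unchanged since its last run
    ("f3", False),
    ("f3", False),  # f3 visited again, unchanged since its last run
]


def count_pipeline_runs(visits, no_rerun):
    """Count simplification pipeline runs per function in the toy model."""
    skip_marker = set()  # functions whose "should not run" marker is cached
    runs = Counter()
    for fn, inliner_changed in visits:
        if inliner_changed:
            skip_marker.discard(fn)  # a change invalidates the marker
        if no_rerun and fn in skip_marker:
            continue  # flag on and function unchanged: skip the rerun
        runs[fn] += 1
        skip_marker.add(fn)  # marker is cached after the pipeline runs
    return dict(runs)


print(count_pipeline_runs(VISITS, no_rerun=True))   # {'f1': 1, 'f2': 1, 'f3': 1}
print(count_pipeline_runs(VISITS, no_rerun=False))  # {'f1': 1, 'f2': 2, 'f3': 2}
```

In this model the flag only removes runs on functions that were not changed since their last run, which matches the single run on f2 and f3 described above.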

Event Timeline

asbirlea created this revision. Jul 12 2022, 3:13 PM
Herald added a project: Restricted Project. Jul 12 2022, 3:13 PM
asbirlea requested review of this revision. Jul 12 2022, 3:13 PM

LG on my side, although I'd like other people to approve after getting more metrics.

I'd give a more concrete example in the summary, e.g. the example in https://reviews.llvm.org/D113947

aeubanks accepted this revision. Jul 12 2022, 3:32 PM
This revision is now accepted and ready to land. Jul 12 2022, 3:32 PM
fhahn added a comment. Jul 12 2022, 4:12 PM

> Compile time improvement:
> https://llvm-compile-time-tracker.com/compare.php?from=17457be1c393ff691cca032b04ea1698fedf0301&to=882301ebb893c8ef9f09fe1ea871f7995426fa07&stat=instructions
>
> No meaningful run time regressions observed in the llvm test suite and
> in additional internal workloads at this time.

Do you have any insight into what kind of binary changes to expect and what the potential sources are?

Regressions would show up in functions where running the function simplification pipeline more than once results in more simplification (i.e. phase ordering issues), and where the pipeline happened to run more than once on the function because of how the CGSCC pass manager visits SCCs.
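
As a rough illustration of such a phase ordering issue, here is a toy sketch. The two "passes" and the block representation are hypothetical and are not LLVM passes; the only point is that a fixed pass order can leave work that a second run of the same pipeline would pick up.

```python
def remove_empty_blocks(blocks):
    """Toy pass: drop blocks that contain no instructions."""
    return [b for b in blocks if b]


def remove_dead_instructions(blocks):
    """Toy pass: drop instructions whose name starts with 'dead'."""
    return [[i for i in b if not i.startswith("dead")] for b in blocks]


# Fixed order: the empty-block cleanup runs before dead instructions are
# removed, so blocks emptied by the second pass survive a single run.
PIPELINE = [remove_empty_blocks, remove_dead_instructions]


def run_pipeline(blocks):
    for toy_pass in PIPELINE:
        blocks = toy_pass(blocks)
    return blocks


func = [["live_a"], ["dead_b"], ["live_c"]]
once = run_pipeline(func)   # [['live_a'], [], ['live_c']]
twice = run_pipeline(once)  # [['live_a'], ['live_c']]
print(once, twice)
```

A function like func only reaches its fully simplified form if the pipeline happens to run on it twice, which is the kind of case that could regress once reruns on unchanged functions are skipped.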

nikic accepted this revision. Jul 13 2022, 1:42 AM

Yay!

asbirlea edited the summary of this revision. Jul 13 2022, 6:16 AM

For additional context, this is a follow-up to D98103. At the time, we were seeing large run time regressions, and the goal was to understand how to modify the pipeline to resolve them.
Today, on the same workloads, the regressions are resolved. It's not clear what specific change led to this, or whether other workloads will regress due to phase ordering issues and possibly need additional passes to be added.
The flag flip should answer this question, and I will iterate if regressions are reported.

The clear benefit is the reduction in compile time due to not rerunning passes on functions where no changes occurred. I included the explanation based on the no-rerun-function-simplification-pipeline.ll test in the description.

fhahn added a comment. Jul 14 2022, 9:12 AM

Sounds good, thanks! We should have our internal performance results by EOD/tomorrow. If there are any regressions I'll let you know.