This is an archive of the discontinued LLVM Phabricator instance.

[InlineCost] Compute the savings of switch statements and SROA in the cost benefit analysis
ClosedPublic

Authored by taolq on Mar 13 2021, 3:20 AM.

Details

Summary

This patch takes switch statements and SROA to cycle savings in the cost-benifit inliner, which makes the cycle savings more considerable.

Diff Detail

Event Timeline

taolq created this revision.Mar 13 2021, 3:20 AM
taolq requested review of this revision.Mar 13 2021, 3:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 13 2021, 3:20 AM
taolq edited the summary of this revision. (Show Details)Mar 13 2021, 3:27 AM
taolq added reviewers: kazu, mtrofin, davidxl.
taolq added a comment.Mar 13 2021, 3:42 AM

Any test?

I ran and passed the LLVM unit tests and regression tests. But I didn't run it on any benchmarks.
Or do you mean add new unit tests?

Yes, IR test or unit tests if possible. Try to comment lines for BranchInst and check a test which breaks so you can get inspiration.

But otherwise this looks fine.

kazu added a comment.Mar 24 2021, 4:24 PM

Do you think you could try some benchmarks?

I personally tried to take into account SROA, switch statements, and even DCE of instructions used to compute unused return values. None of these has resulted in measurable performance differences. When I look at savings, the call overhead usually dominates the savings. My understanding is that we always inline the most desirable call sites if we assume that the ratio of savings to cost is a reasonable indicator of desirability. Attempts to compute savings more accurately just moves the cutoff point in the long tail of call sites, without impacting the set of critically important call sites.

That said, if you have compelling numbers or simply want to take care of the TODO items, I'm happy to LGTM this patch. I'd be just as happy if you simply remove the TODO comment.

taolq updated this revision to Diff 333832.Mar 29 2021, 5:50 AM

Remove TODO comment that consider other forms of savings.

taolq added a comment.Mar 29 2021, 6:12 AM

Do you think you could try some benchmarks?

I personally tried to take into account SROA, switch statements, and even DCE of instructions used to compute unused return values. None of these has resulted in measurable performance differences. When I look at savings, the call overhead usually dominates the savings. My understanding is that we always inline the most desirable call sites if we assume that the ratio of savings to cost is a reasonable indicator of desirability. Attempts to compute savings more accurately just moves the cutoff point in the long tail of call sites, without impacting the set of critically important call sites.

That said, if you have compelling numbers or simply want to take care of the TODO items, I'm happy to LGTM this patch. I'd be just as happy if you simply remove the TODO comment.

I test this patch on SPEC2006 CINT. Here is the result.

Tests: 12
Metric: exec_time

Program                                        baseline-cint2006-pgo-O3-2 experiment-cint2006-pgo-O3 diff
 test-suite...libquantum/462.libquantum.test   323.92                     330.56                      2.0%
 test-suite...6/471.omnetpp/471.omnetpp.test   398.06                     402.95                      1.2%
 test-suite.../CINT2006/429.mcf/429.mcf.test   352.44                     354.12                      0.5%
 test-suite...T2006/456.hmmer/456.hmmer.test   279.81                     280.53                      0.3%
 test-suite...0.perlbench/400.perlbench.test   259.63                     260.13                      0.2%
 test-suite...T2006/458.sjeng/458.sjeng.test   415.59                     416.02                      0.1%
 test-suite.../CINT2006/403.gcc/403.gcc.test   260.36                     260.61                      0.1%
 test-suite...T2006/445.gobmk/445.gobmk.test   384.21                     384.22                      0.0%
 test-suite...6/464.h264ref/464.h264ref.test   372.04                     371.90                     -0.0%
 test-suite...T2006/473.astar/473.astar.test   400.26                     399.80                     -0.1%
 test-suite...T2006/401.bzip2/401.bzip2.test   429.44                     428.11                     -0.3%
 test-suite...3.xalancbmk/483.xalancbmk.test   267.12                     259.23                     -3.0%
 Geomean difference                                                                                   0.1%

The same as what you said. I think they're not compelling numbers.
I want to get familiar with inliner through these TODO items. But it seems unnecessary to add these code.
These code could not detect more critically important call sites.
So let me just remove TODO comment.

Thanks Kazu. I learned a lot from this patch.

kazu accepted this revision.Mar 29 2021, 3:22 PM

Thank you for running benchmarks!

This revision is now accepted and ready to land.Mar 29 2021, 3:22 PM