This is an archive of the discontinued LLVM Phabricator instance.

[InlineCost] Compute the savings of switch statements and SROA in the cost benefit analysis
ClosedPublic

Authored by taolq on Mar 13 2021, 3:20 AM.

Download Raw Diff

Details

Reviewers

kazu
mtrofin
davidxl

Commits

rGd2d6720a93a2: [InlineCost] Remove TODO comment that consider other forms of savings in the…

Summary

This patch takes switch statements and SROA to cycle savings in the cost-benifit inliner, which makes the cycle savings more considerable.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

taolq created this revision.Mar 13 2021, 3:20 AM

Herald added subscribers: haicheng, hiraditya, eraman. · View Herald TranscriptMar 13 2021, 3:20 AM

taolq requested review of this revision.Mar 13 2021, 3:20 AM

Herald added a project: Restricted Project. · View Herald TranscriptMar 13 2021, 3:20 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

taolq edited the summary of this revision. (Show Details)Mar 13 2021, 3:27 AM

taolq added reviewers: kazu, mtrofin, davidxl.

Any test?

In D98577#2624148, @xbolva00 wrote:

Any test?

I ran and passed the LLVM unit tests and regression tests. But I didn't run it on any benchmarks.
Or do you mean add new unit tests?

Harbormaster completed remote builds in B93651: Diff 330439.Mar 13 2021, 4:04 AM

Yes, IR test or unit tests if possible. Try to comment lines for BranchInst and check a test which breaks so you can get inspiration.

But otherwise this looks fine.

Do you think you could try some benchmarks?

I personally tried to take into account SROA, switch statements, and even DCE of instructions used to compute unused return values. None of these has resulted in measurable performance differences. When I look at savings, the call overhead usually dominates the savings. My understanding is that we always inline the most desirable call sites if we assume that the ratio of savings to cost is a reasonable indicator of desirability. Attempts to compute savings more accurately just moves the cutoff point in the long tail of call sites, without impacting the set of critically important call sites.

That said, if you have compelling numbers or simply want to take care of the TODO items, I'm happy to LGTM this patch. I'd be just as happy if you simply remove the TODO comment.

Remove TODO comment that consider other forms of savings.

In D98577#2649274, @kazu wrote:

Do you think you could try some benchmarks?

I personally tried to take into account SROA, switch statements, and even DCE of instructions used to compute unused return values. None of these has resulted in measurable performance differences. When I look at savings, the call overhead usually dominates the savings. My understanding is that we always inline the most desirable call sites if we assume that the ratio of savings to cost is a reasonable indicator of desirability. Attempts to compute savings more accurately just moves the cutoff point in the long tail of call sites, without impacting the set of critically important call sites.

That said, if you have compelling numbers or simply want to take care of the TODO items, I'm happy to LGTM this patch. I'd be just as happy if you simply remove the TODO comment.

I test this patch on SPEC2006 CINT. Here is the result.

Tests: 12
Metric: exec_time

Program                                        baseline-cint2006-pgo-O3-2 experiment-cint2006-pgo-O3 diff
 test-suite...libquantum/462.libquantum.test   323.92                     330.56                      2.0%
 test-suite...6/471.omnetpp/471.omnetpp.test   398.06                     402.95                      1.2%
 test-suite.../CINT2006/429.mcf/429.mcf.test   352.44                     354.12                      0.5%
 test-suite...T2006/456.hmmer/456.hmmer.test   279.81                     280.53                      0.3%
 test-suite...0.perlbench/400.perlbench.test   259.63                     260.13                      0.2%
 test-suite...T2006/458.sjeng/458.sjeng.test   415.59                     416.02                      0.1%
 test-suite.../CINT2006/403.gcc/403.gcc.test   260.36                     260.61                      0.1%
 test-suite...T2006/445.gobmk/445.gobmk.test   384.21                     384.22                      0.0%
 test-suite...6/464.h264ref/464.h264ref.test   372.04                     371.90                     -0.0%
 test-suite...T2006/473.astar/473.astar.test   400.26                     399.80                     -0.1%
 test-suite...T2006/401.bzip2/401.bzip2.test   429.44                     428.11                     -0.3%
 test-suite...3.xalancbmk/483.xalancbmk.test   267.12                     259.23                     -3.0%
 Geomean difference                                                                                   0.1%

The same as what you said. I think they're not compelling numbers.
I want to get familiar with inliner through these TODO items. But it seems unnecessary to add these code.
These code could not detect more critically important call sites.
So let me just remove TODO comment.

Thanks Kazu. I learned a lot from this patch.

Harbormaster completed remote builds in B96093: Diff 333832.Mar 29 2021, 6:36 AM

Thank you for running benchmarks!

This revision is now accepted and ready to land.Mar 29 2021, 3:22 PM

Closed by commit rGd2d6720a93a2: [InlineCost] Remove TODO comment that consider other forms of savings in the… (authored by taolq). · Explain WhyMar 31 2021, 5:12 AM

This revision was automatically updated to reflect the committed changes.

taolq added a commit: rGd2d6720a93a2: [InlineCost] Remove TODO comment that consider other forms of savings in the….

Revision Contents

Path

Size

llvm/

lib/

Analysis/

InlineCost.cpp

3 lines

Diff 333832

llvm/lib/Analysis/InlineCost.cpp

Show First 20 Lines • Show All 743 Lines • ▼ Show 20 Lines	for (auto &BB : F) {
CurrentSavings += InlineConstants::InstrCost;		CurrentSavings += InlineConstants::InstrCost;
}		}
} else if (Value *V = dyn_cast<Value>(&I)) {		} else if (Value *V = dyn_cast<Value>(&I)) {
// Count an instruction as savings if we can fold it.		// Count an instruction as savings if we can fold it.
if (SimplifiedValues.count(V)) {		if (SimplifiedValues.count(V)) {
CurrentSavings += InlineConstants::InstrCost;		CurrentSavings += InlineConstants::InstrCost;
}		}
}		}
// TODO: Consider other forms of savings like switch statements,
// indirect calls becoming direct, SROACostSavings, LoadEliminationCost,
// etc.
}		}

auto ProfileCount = CalleeBFI->getBlockProfileCount(&BB);		auto ProfileCount = CalleeBFI->getBlockProfileCount(&BB);
assert(ProfileCount.hasValue());		assert(ProfileCount.hasValue());
CurrentSavings *= ProfileCount.getValue();		CurrentSavings *= ProfileCount.getValue();
CycleSavings += CurrentSavings;		CycleSavings += CurrentSavings;
}		}

▲ Show 20 Lines • Show All 2,034 Lines • Show Last 20 Lines