Page MenuHomePhabricator

[PATCH] Reduce inline thresholds to compensate for cost changes
ClosedPublic

Authored by jmolloy on Nov 18 2016, 4:49 AM.

Details

Summary

In r286814, the algorithm for calculating inline costs changed. This
caused more inlining to take place which is especially apparent
in optsize and minsize modes.

As the cost calculation removed a skewed behaviour (we were inconsistent
about the cost of calls) it isn't possible to update the thresholds to
get exactly the same behaviour as before. However, this threshold change
accounts for the very common case where an inline candidate has no
calls within it. In this case, r286814 would inline around 5-6 more (IR)
instructions.

The changes to -Oz have been heavily benchmarked. The "obvious" value
for the inline threshold at -Oz is zero, but due to inaccuracies in the
inline heuristics this can actually cause code size increases due to
not inlining key thunk functions (that then disappear). Experimentally,
5 was the sweet spot for code size over the test-suite.

For -Os, this change removes the outlier results shown up by green dragon
(http://104.154.54.203/db_default/v4/nts/13248).

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy updated this revision to Diff 78501.Nov 18 2016, 4:49 AM
jmolloy retitled this revision from to [PATCH] Reduce inline thresholds to compensate for cost changes.
jmolloy updated this object.
jmolloy added reviewers: Gerolf, mzolotukhin, davidxl, eraman.
jmolloy set the repository for this revision to rL LLVM.
jmolloy added a subscriber: llvm-commits.
davidxl added inline comments.Nov 18 2016, 9:20 AM
include/llvm/Analysis/InlineCost.h
39

Any reason to reduce O3 threshold?

Gerolf edited edge metadata.Nov 18 2016, 1:02 PM

The Oz case looks interesting. Can you share more details/insights about the "inaccuracies" w/ specific examples? I'm wondering if that can be fixed in general or be more triggered towards some trunk characteristics. But this is just something we can think about and discuss while moving on and celebrate the ct/cs recoveries :-). So LGTM!

mzolotukhin edited edge metadata.Nov 18 2016, 3:41 PM

Hi James,

Thanks for following up, a couple of questions inline.

Best regards,
Michael

test/Transforms/Inline/ephemeral.ll
24

Why is this change needed?

test/Transforms/Inline/inline-fp.ll
135–136

Is this one of the cases where we can not preserve the original behavior, or we intentionally want another one here (no context in the patch makes it harder to comprehend from here)?

jmolloy updated this revision to Diff 78692.Nov 21 2016, 1:25 AM
jmolloy edited edge metadata.

Hi Michael,

Sorry for the lack of context - see inline comments.

Cheers,

James

test/Transforms/Inline/ephemeral.ll
24

Simply, the inner function was doing too much work. It was inlined in minsize mode before, but isn't now (and the new behaviour is correct - the inlining would previously have increased codesize).

Reducing the amount of work the inner function does allows it to be inlined.

test/Transforms/Inline/inline-fp.ll
135–136

Apologies for the lack of context - error on my side.

The testcase is attempting to ensure that soft-float calls are more expensive to the inliner than hardware floating point ops. The test was obviously reduced from a real program, and expects an inner function to be inlined.

This inner function doesn't contain *any* function calls, and is the maximum size we would have inlined previously. We've reduced the minsize threshold, so these should no longer get inlined.

Upping the test to requiring optsize instead of minsize changes the threshold, and the original behaviour is maintained.

Hi James,

Thanks for the comments, all clear now. I didn't mean to hold the patch by them.

Michael

jmolloy accepted this revision.Nov 28 2016, 6:45 AM
jmolloy added a reviewer: jmolloy.

Committed in r288024.

This revision is now accepted and ready to land.Nov 28 2016, 6:45 AM
jmolloy closed this revision.Nov 28 2016, 6:45 AM