This is an archive of the discontinued LLVM Phabricator instance.

[thinlto] Don't decay threshold for hot callsites
ClosedPublic

Authored by Prazek on Sep 27 2016, 11:42 AM.

Details

Summary

We don't want to decay hot callsites to import chains of hot
callsites. The same mechanism is used in LIPO.

Diff Detail

Repository
rL LLVM

Event Timeline

Prazek updated this revision to Diff 72683.Sep 27 2016, 11:42 AM
Prazek retitled this revision from to [thinlto] Don't decay threshold for hot callsites.
Prazek updated this object.
Prazek added reviewers: tejohnson, eraman, mehdi_amini.
Prazek added a subscriber: llvm-commits.
Prazek updated this revision to Diff 72687.Sep 27 2016, 11:57 AM

deleted debug

tejohnson edited edge metadata.Sep 27 2016, 12:11 PM

(As discussed offline, Piotr will test on SPEC and get back with performance data).

test/Transforms/FunctionImport/hotness_based_import.ll
38 ↗(On Diff #72687)

Comment doesn't make much sense - we are checking that they aren't imported.

mehdi_amini added inline comments.Sep 27 2016, 12:16 PM
lib/Transforms/IPO/FunctionImport.cpp
385 ↗(On Diff #72687)

I rather express this differently: using different factors for the hotness.
(You current proposal would be using 1 as a factor for "hot" calls chains).

Prazek added inline comments.Sep 27 2016, 1:25 PM
lib/Transforms/IPO/FunctionImport.cpp
385 ↗(On Diff #72687)

ok I will add different Factor

test/Transforms/FunctionImport/hotness_based_import.ll
38 ↗(On Diff #72687)

oh yea, I forgot to remove this.

Prazek updated this revision to Diff 72706.Sep 27 2016, 2:00 PM
Prazek marked 4 inline comments as done.
Prazek edited edge metadata.

fixes

mehdi_amini added inline comments.
lib/Transforms/IPO/FunctionImport.cpp
350 ↗(On Diff #72706)

As mentioned in D24638, you could achieve the same result by adjusting the Threshold when pushing on the stack instead of passing more context to the next iteration and delaying the adjustment.

Prazek added inline comments.Sep 28 2016, 10:55 AM
lib/Transforms/IPO/FunctionImport.cpp
350 ↗(On Diff #72706)

just to make sure - you want to apply the threshold: lines

if (IsHotCallsite)
    Threshold = Threshold * ImportHotInstrFactor;
else
    Threshold = Threshold * ImportInstrFactor;

here, and don't do any threshold changes in the while loop? (So worklist will only contain FunctionSummary - because threshold and my bool won't be needed)

If so, then it seems like a good idea!

mehdi_amini added inline comments.Sep 28 2016, 11:35 AM
lib/Transforms/IPO/FunctionImport.cpp
350 ↗(On Diff #72706)

Right: doing all the threshold management at the same place.

I have some initial results. It looks like because inliner doesn't know how to inline hot callsites, then most of the SPEC benchmarks have the same binary as current trunk (with hot heuristic).
But on the google branch with smarter inliner there is clear win in povray, about +0.5% compared to current trunk with hot heuristic, over all other benchmarks that produced different binary are not much different.

So the question is - should I temporary set the hot-evolution-factor to the same as normal evolution factor? So this way We won't import more functions that inliner won't inline, and then when new inliner will make it to trunk, to switch it to 1.0

Prazek updated this revision to Diff 72922.Sep 28 2016, 4:03 PM

moved calculating threshold

Prazek marked 3 inline comments as done.Sep 28 2016, 4:04 PM

I have some initial results. It looks like because inliner doesn't know how to inline hot callsites, then most of the SPEC benchmarks have the same binary as current trunk (with hot heuristic).
But on the google branch with smarter inliner there is clear win in povray, about +0.5% compared to current trunk with hot heuristic, over all other benchmarks that produced different binary are not much different.

So the question is - should I temporary set the hot-evolution-factor to the same as normal evolution factor? So this way We won't import more functions that inliner won't inline, and then when new inliner will make it to trunk, to switch it to 1.0

Can you enable your importing stats to see how many extra funcs we import in trunk?

lib/Transforms/IPO/FunctionImport.cpp
275 ↗(On Diff #72922)

Change not needed now

295 ↗(On Diff #72922)

These few lines no longer need changing

391 ↗(On Diff #72922)

No need to change these lines, just remove the threshold adjustment (makes the actual functional change clearer without other changes).

mehdi_amini edited edge metadata.Sep 28 2016, 4:47 PM

So the question is - should I temporary set the hot-evolution-factor to the same as normal evolution factor? So this way We won't import more functions that inliner won't inline, and then when new inliner will make it to trunk, to switch it to 1.0

I'm fine designing the importer independently knowing that the inliner will be updated at some point to match, even if it means that in the meantime we import more than necessary when PGO is enabled.

Prazek marked 2 inline comments as done.Sep 28 2016, 4:59 PM
Prazek added inline comments.
lib/Transforms/IPO/FunctionImport.cpp
295 ↗(On Diff #72922)

I think they need: I am ussing isHotCallsite later, and it also makes code cleaner

Prazek updated this revision to Diff 72928.Sep 28 2016, 5:00 PM
Prazek edited edge metadata.

Addressed Teresa comments

I have some initial results. It looks like because inliner doesn't know how to inline hot callsites, then most of the SPEC benchmarks have the same binary as current trunk (with hot heuristic).
But on the google branch with smarter inliner there is clear win in povray, about +0.5% compared to current trunk with hot heuristic, over all other benchmarks that produced different binary are not much different.

So the question is - should I temporary set the hot-evolution-factor to the same as normal evolution factor? So this way We won't import more functions that inliner won't inline, and then when new inliner will make it to trunk, to switch it to 1.0

Can you enable your importing stats to see how many extra funcs we import in trunk?

These are the numbers of imported functions for povray
trunk_without_hot trunk trunk_hot_chaians
1738 1863 1876

So we import 13 functions more. This is kinda expected, there should not be many hot chains

This revision was automatically updated to reflect the committed changes.