This is an archive of the discontinued LLVM Phabricator instance.

[thinlto] Add cold-callsite import heuristic
ClosedPublic

Authored by Prazek on Sep 26 2016, 2:13 PM.

Details

Summary

The heuristic is not tuned yet, but even this small heuristic gives about a +0.10% improvement on SPEC 2006
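
For context, a minimal standalone sketch of the idea (not the patch itself): the per-callsite import threshold is scaled by the callsite's profile hotness. The multiplier values below are hypothetical stand-ins for the patch's cl::opt flags.

#include <cstdio>

// Hypothetical stand-in for the summary's callsite hotness classification.
enum class Hotness { Unknown, Cold, Hot };

// Hypothetical multiplier values; the real ones are cl::opt flags in
// FunctionImport.cpp and may use different defaults.
constexpr float HotMultiplier = 3.0f;
constexpr float ColdMultiplier = 0.0f;

// Scale the per-callsite import threshold by the hotness of the calling edge:
// hot callsites get a bigger import budget, cold callsites a smaller one.
float scaledImportThreshold(float Threshold, Hotness H) {
  if (H == Hotness::Hot)
    return Threshold * HotMultiplier;
  if (H == Hotness::Cold)
    return Threshold * ColdMultiplier;
  return Threshold; // unknown hotness: leave the threshold unchanged
}

int main() {
  const float BaseThreshold = 100.0f;
  std::printf("hot: %.1f  cold: %.1f  unknown: %.1f\n",
              scaledImportThreshold(BaseThreshold, Hotness::Hot),
              scaledImportThreshold(BaseThreshold, Hotness::Cold),
              scaledImportThreshold(BaseThreshold, Hotness::Unknown));
}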

Diff Detail

Event Timeline

Prazek updated this revision to Diff 72565.Sep 26 2016, 2:13 PM
Prazek retitled this revision from to [thinlto] Add cold-callsite import heuristic.
Prazek updated this object.
Prazek added reviewers: tejohnson, mehdi_amini, eraman.
Prazek added a subscriber: llvm-commits.
Prazek updated this revision to Diff 72566.Sep 26 2016, 2:16 PM

Small typos

mehdi_amini edited edge metadata.Sep 26 2016, 5:35 PM

The heuristic is not tuned yet, but even this small heuristic gives about a +0.10% improvement on SPEC 2006

This is not clear to me: I would expect only a compile-time impact from this; why is there a perf improvement?

lib/Transforms/IPO/FunctionImport.cpp
64

ZeroOrMore? Why isn't the default fine?

296

Too many nested ternary operators IMO (I think more than one hurts readability in general).

eraman edited edge metadata.Sep 26 2016, 5:57 PM

How does this result in improved performance? If the

The heuristic is not tuned yet, but even this small heuristic gives about a +0.10% improvement on SPEC 2006

This is not clear to me: I would expect only a compile-time impact from this; why is there a perf improvement?

The inliner and the importer use a different metric to determine what is cold: the importer uses the ProfileSummaryInfo, while the inliner checks whether the entry count is less than a fraction of max_entry_count (the inliner's code is the one that needs to change, but that's dependent on porting the inliner to the new PM). This would explain the performance improvement (the inliner normally inlines functions that are considered cold as determined by PSI.isColdCount(), but is prevented from doing so if they aren't imported). If we were to fix the inliner first, then this change should be a nop in terms of performance.
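
A minimal standalone sketch of the mismatch described above, with hypothetical cutoff and fraction values standing in for the real profile data: the same entry count can look cold to the inliner's fraction-of-max check but not to a summary-based cutoff.

#include <cstdint>
#include <cstdio>

// Hypothetical profile numbers; real values come from the profile summary.
constexpr uint64_t MaxEntryCount = 1000000; // hottest entry count in the profile
constexpr uint64_t ColdCountCutoff = 100;   // summary-based cold cutoff
constexpr uint64_t ColdFraction = 100;      // inliner-style fraction of the max

// Importer-style check: compare against an absolute cutoff derived from the
// profile summary (roughly how a PSI.isColdCount()-style query behaves).
bool importerSaysCold(uint64_t Count) { return Count <= ColdCountCutoff; }

// Inliner-style check (at the time of this patch): cold if the entry count is
// below a fraction of the maximum entry count seen in the profile.
bool inlinerSaysCold(uint64_t Count) { return Count < MaxEntryCount / ColdFraction; }

int main() {
  const uint64_t EntryCount = 5000; // example callee entry count
  std::printf("importer cold: %d  inliner cold: %d\n",
              importerSaysCold(EntryCount), inlinerSaysCold(EntryCount));
  // Prints "importer cold: 0  inliner cold: 1": the two metrics disagree.
}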

Prazek added a comment.

Well, I think there is another reason: if I don't import a function that is called only from a cold block, then it won't get inlined, whereas the inliner in trunk doesn't have any heuristic for NOT inlining functions into cold blocks
(it has a heuristic to not inline into blocks ending with unreachable, but that is not connected to PGO). So by not importing some functions I prevent the inliner from inlining things it shouldn't.
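
A hypothetical source-level example of the situation described above: the callee lives in another module and is reached only on a profile-cold error path, so declining to import it also keeps the inliner from inlining it at that callsite.

// Defined in another translation unit; only visible to the inliner here if
// ThinLTO imports it into this module.
extern void report_failure(const char *Msg);

int process(const int *Data, int N) {
  if (N == 0) {                    // error path, cold according to the profile
    report_failure("empty input"); // cold callsite: with this heuristic the
                                   // callee is not imported, hence not inlined
    return -1;
  }
  int Sum = 0;
  for (int I = 0; I < N; ++I)      // hot path
    Sum += Data[I];
  return Sum;
}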

tejohnson edited edge metadata.Sep 27 2016, 12:01 PM

It is possible, but realistically a 0.10% improvement is not enough to draw any strong conclusions.

Overall looks fine, will re-review when you've addressed Mehdi's comments.

Prazek updated this revision to Diff 72708.Sep 27 2016, 2:08 PM
Prazek marked an inline comment as done.
Prazek edited edge metadata.

nits

Prazek marked an inline comment as done.Sep 27 2016, 2:09 PM
Prazek added inline comments.
lib/Transforms/IPO/FunctionImport.cpp
296

I was trying to avoid using a non-const variable here because the scope is very big. Maybe I should use a lambda?

mehdi_amini added inline comments.Sep 27 2016, 5:07 PM
lib/Transforms/IPO/FunctionImport.cpp
296

Oh I see why you did it. Maybe a static function close to the cl::opt?

const float BonusMultiplier = getBonusMultiplierForHotness(Edge.second.Hotness);

(Straw man naming)
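
A sketch of what that straw-man helper might look like, assuming the ImportHotMultiplier/ImportColdMultiplier options and the CalleeInfo::HotnessType enum referenced elsewhere in this review; this is an illustration, not the form that was actually committed.

// Straw-man helper placed next to the cl::opt definitions, per the suggestion
// above. Assumes the cl::opt multipliers and CalleeInfo::HotnessType that
// appear elsewhere in this patch.
static float getBonusMultiplierForHotness(CalleeInfo::HotnessType Hotness) {
  if (Hotness == CalleeInfo::HotnessType::Hot)
    return ImportHotMultiplier;
  if (Hotness == CalleeInfo::HotnessType::Cold)
    return ImportColdMultiplier;
  return 1.0f;
}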

Prazek marked an inline comment as done.Sep 27 2016, 5:16 PM
Prazek added inline comments.
lib/Transforms/IPO/FunctionImport.cpp
296

What about:

auto GetBonusMultiplier = [](CalleeInfo::HotnessType Hotness) -> float {
  if (Hotness == CalleeInfo::HotnessType::Hot)
    return ImportHotMultiplier;
  if (Hotness == CalleeInfo::HotnessType::Cold)
    return ImportColdMultiplier;
  return 1.0;
};

const auto NewThreshold = Threshold * GetBonusMultiplier(Edge.second.Hotness);

mehdi_amini added inline comments.Sep 27 2016, 5:50 PM
lib/Transforms/IPO/FunctionImport.cpp
296

That's fine with me.

(I wouldn't use a lambda, as there is not much value in seeing this boilerplate when reading the loop body, but I don't mind if you prefer it.)

Prazek marked 4 inline comments as done.Sep 28 2016, 5:05 PM
Prazek added inline comments.
lib/Transforms/IPO/FunctionImport.cpp
296

I prefer the lambda because it has a smaller scope than a function, and you don't have to jump around in the file to read 5 simple lines.
So I will stay with it.

mehdi_amini added inline comments.Sep 28 2016, 8:29 PM
lib/Transforms/IPO/FunctionImport.cpp
302

I agree... when there is actual value in reading these lines! Otherwise, applying this principle means inlining the whole program as nested lambdas into main() :p
So it is always a signal/noise-ratio evaluation. Having a lambda means the code is right there, but it also makes the body of the current function longer.
(Here I felt I could read and understand the code in the loop without needing to see the body of a function named GetBonusMultiplier.)

Prazek marked an inline comment as done.Sep 28 2016, 9:20 PM
Prazek added inline comments.
lib/Transforms/IPO/FunctionImport.cpp
302

The boilerplate is actually not that big; there is only one extra line (because I would have to declare a variable to store the result).

I guess the real question is whether I should hide the implementation or not, because wrapping it in a function/lambda is clearly a win.

So if you think a function would be better, then I will change it, because both work for me, and I want to push this upstream before I finish my internship (Friday).

mehdi_amini added inline comments.Sep 28 2016, 9:23 PM
lib/Transforms/IPO/FunctionImport.cpp
302

I don't think it makes much of a difference here; this is why I mentioned before that either way is fine, pick your favorite.
I'm fine with continuing to bike-shed this, though :)

I can't believe your internship is already done, time flies...

mehdi_amini added inline comments.Sep 28 2016, 9:45 PM
test/Transforms/FunctionImport/hotness_based_import.ll
32

If this test passes, I think there's something wrong (the prefix is the same as the previous one...)

This revision was automatically updated to reflect the committed changes.