Not tunned up heuristic, but with this small heuristic there is about
+0.10% improvement on SPEC 2006
Details
Diff Detail
Event Timeline
Not tunned up heuristic, but with this small heuristic there is about +0.10% improvement on SPEC 2006
This is not clear to me: I would expect only a compile time impact on this, why is there a perf improvement?
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
64 | ZeroOrMore? Why isn't the default fine? | |
296 | Too many nested ternary operator IMO (I think more than one hurt readability in general). |
How does this result in improved performance? If the
The inliner and the importer use a different metric to determine what is cold: the importer uses the ProfleSummaryInfo while the inliner checks if the entry count is less than a fraction of max_entry_count (the inliner's code is the one that needs to change, but that's dependent on porting inliner to the new PM) This would explain the performance improvement (inliner normally inlines functions that are considered cold as determined by PSI.isColdCount() but prevented from doing so if they aren't imported). If we were to fix the inliner first, then this change should be nop in terms of performance.
Well, I think there is another reason - if I won't import function, that is called only from cold block, then it won't gonna be inlined, where the inliner in the trunk doesn't have any heuristic on NOT inlining functions to cold block
(it have heuristic to not inline functions to block ending with unreachable, but it is not connected to PGO). So by not importing some functions I prevent inliner from inlining stuff that it shouldn't.
It is possible, but realistically 0.10% improvement is not enough to draw any strong conclusions.
Overall looks fine, will re-review when you've addressed Mehdi's comments.
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
296 | I was trying to avoid using not const variable here because the scope is very big. Maybe I shoudl use lambda? |
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
296 | Oh I see why you did it. Maybe a static function close to the cl::opt? const float BonusMultiplier = getBonusMultiplierForHotness(Edge.second.Hotness); (Straw man naming) |
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
296 | what about auto GetBonusMultiplier = [](CalleeInfo::HotnessType Hotness) { if (Hotness == CalleeInfo::HotnessType::Hot) return ImportHotMultiplier; if (Hotness == CalleeInfo::HotnessType::Cold) return ImportColdMultiplier; return 1.0; }; const auto NewThreshold = Threshold * GetBonusMultiplier(Edge.second.Hotness); |
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
296 | That's fine with me. (I wouldn't use a lambda as there is not much value seeing this boilerplate when reading this loop body, but I don't mind if you prefer it). |
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
296 | I prefere lambda because it has smaller scope as function, and you don't have to jump in the file to read 5 simple lines. |
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
302 | I agree... when there is an actual value to read these lines! Otherwise apply this principles means inlining the whole program as nested lambda into the main() :p |
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
302 | The boilerplate is actually not that big, there is actually one line more (because I would have to declare a variable to store result ). I guess the real question is, should I hide implementation or not, because wrapping in the function/lambda is clearly win. So if you think that function will be better then I will change it because both works for me, and I want to push it upstream before I will finish internship (Friday) |
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
302 | I don't think it makes much of a difference here, this is why I mentioned before either way is fine, pick your favorite. I can't believe your internship is already done, time flies... |
test/Transforms/FunctionImport/hotness_based_import.ll | ||
---|---|---|
32 | If this test passes, I think there's something wrong (the prefix is the same as the previous one...) |
ZeroOrMore? Why isn't the default fine?