This is an archive of the discontinued LLVM Phabricator instance.

We can improve the performance (generally) by memo-izing the action to map a debug location to its function summary.
ClosedPublic

Authored by david2050 on Jan 8 2019, 5:38 AM.

Details

Summary

Here are timings (as reported by "opt -time-passes") for
sample-profile pass for some files holding hot functions from a major
service©r. Average 17% reduction. Delta column is 100*(old-new)/old.

Old    New    Delta
0.0537 0.0538 -0.2%
0.8155 0.6522 20.0%
0.0779 0.0751  3.6%
0.0727 0.0913 -25.6%
0.1622 0.1302 19.7%
0.0627 0.0594  5.3%
0.0766 0.0744  2.9%
0.6426 0.4387 31.7%
0.3521 0.2776 21.2%
0.3549 0.2721 23.3%
0.0912 0.0904  0.9%
0.1236 0.1059 14.3%
0.0854 0.0866 -1.4%
0.0757 0.0722  4.6%
0.1293 0.1147 11.3%
0.1354 0.1122 17.1%
0.0767 0.0770 -0.4%
0.1135 0.0968 14.7%
0.0524 0.0608 -16.0%
0.1279 0.1106 13.5%
==========
3.6820 3.0520 17.1% Total

Diff Detail

Event Timeline

david2050 created this revision.Jan 8 2019, 5:38 AM
david2050 updated this revision to Diff 180703.Jan 8 2019, 11:20 AM

use mutable keyword

dblaikie added inline comments.Jan 13 2019, 6:20 PM
lib/Transforms/IPO/SampleProfile.cpp
727–731

Perhaps do an unconditional try_emplace and use the bool in the returned pair to decide if the work has already been done or not? This saves two map lookups on the first lookup.

pair<iterator, bool> p = try_emplace(DIL, nullptr);
if (p.second)
  p.first->second = findFunctionSamples(DIL)
return p.first->second;
david2050 marked 3 inline comments as done.Jan 13 2019, 7:25 PM
david2050 added inline comments.
lib/Transforms/IPO/SampleProfile.cpp
221

Any guidance on the use of mutable v. removing the "const" attributes on the 3 affected method?

727–731

I was uncertain if the iterator was save over the call but will update and in parallel test the change locally.

david2050 updated this revision to Diff 181495.Jan 13 2019, 8:06 PM
david2050 marked an inline comment as done.

one hash lookup

wmi added a comment.Jan 14 2019, 9:18 AM

Seems the compile time saving is got mainly because there are multiple instructions sharing the same debug location, is my understanding correct?

If the understanding is correct, the memorized mapping seems only useful inside of each function, and is it better to clean the map at the end of runOnFunction instead of holding all the entries for the module?

wmi added a comment.Jan 14 2019, 11:58 AM

Seems the compile time saving is got mainly because there are multiple instructions sharing the same debug location, is my understanding correct?

Seems incorrect. findFunctionSamples may be called for the same instruction multiple times due to multiple iterations of hot functions inlining or profile propagation.

But we can still clean up the map at the end of runOnFunction.

clear map for every function

Resetting the function map is a small (< 2%) win

wmi accepted this revision.Jan 15 2019, 9:13 AM

LGTM.

This revision is now accepted and ready to land.Jan 15 2019, 9:13 AM
This revision was automatically updated to reflect the committed changes.