This is an archive of the discontinued LLVM Phabricator instance.

Update entry count for cold calls
ClosedPublic

Authored by david2050 on Oct 3 2018, 1:47 PM.

Details

Summary

Profile sample files include the number of times each entry or inlined
call site is sampled. This is translated into the entry count metadta
on functions.

When sample data is being read, if a call site that was inlined
in the sample program is considered cold and not inlined, then
the entry count of the out-of-line functions does not reflect
the current compilation.

In this patch, we note call sites where the function was not inlined
and as a last action of the sample profile loading, we update the
called function's entry count to reflect the calls from these
call sites which are not included in the profile file.

Diff Detail

Repository
rL LLVM

Event Timeline

david2050 created this revision.Oct 3 2018, 1:47 PM
david2050 updated this revision to Diff 168168.Oct 3 2018, 1:54 PM

remove facebook marker

davidxl edited reviewers, added: wmi; removed: davidxl.Oct 3 2018, 2:01 PM
davidxl added a subscriber: davidxl.
davidxl added inline comments.Oct 3 2018, 2:13 PM
lib/Transforms/IPO/SampleProfile.cpp
770 ↗(On Diff #168168)

Probably name it differently. They are calls that are inlined in profiled binary but not in optimized binary. 'Coldness' might be just one of the reasons. At least make the comment clearer.

test/Transforms/SampleProfile/inline-cold.ll
23 ↗(On Diff #168168)

Add checking of entry profile meta data here.

david2050 updated this revision to Diff 168174.Oct 3 2018, 2:28 PM

address review comments

davidxl added inline comments.Oct 3 2018, 2:55 PM
test/Transforms/SampleProfile/inline-cold.ll
25 ↗(On Diff #168174)

it is better to match the expected profile count here using name reference in filecheck.

david2050 updated this revision to Diff 168187.Oct 3 2018, 3:08 PM

refine test

thanks for the suggestions

test/Transforms/SampleProfile/inline-cold.ll
25 ↗(On Diff #168174)

Thanks for the suggestion.

wmi added a comment.Oct 12 2018, 4:51 PM

Sorry for the late reply. I didn't notice the thread.

I admit the problem that the patch describes introduces some impreciseness to the profile annotated, but a problem is that the patch only updates the entry counts of outlined functions, but the profiles inside of those outlined functions are not updated. Ideally if the entry count of a outline function is doubled, maybe the callsite count inside of the function also should be doubled too?

I am not saying it is necessary to keep all the profiles consistent, after all it is about cold place. But I like to know whether the patch is trying to solve a theoretical problem or motivated by a real case.

Thanks,
Wei.

lib/Transforms/IPO/SampleProfile.cpp
866–867 ↗(On Diff #168187)

Can be merged into one line.

1603–1609 ↗(On Diff #168187)

Maybe merged to:
ECount = Callee->getEntryCount()
Callee->setEntryCount(pair.getSecond() + ECount ? ECount.getCount() : 0);

test/Transforms/SampleProfile/Inputs/inline-cold.prof
1–9 ↗(On Diff #168187)

Better to make the function total count equal to the sum of the counts inside of the function because such inconsistency can cause trouble sometimes.

test/Transforms/SampleProfile/inline-cold.ll
30–31 ↗(On Diff #168187)

%0, %1... should be renamed in testcase.

Thanks for the review.
This was intended to balance the function updateCalleeCount in InlineFunction.cpp which simply decrements the entry count.

wmi added a comment.Oct 13 2018, 9:42 PM

This was intended to balance the function updateCalleeCount in InlineFunction.cpp which simply decrements the entry count.

Thanks. After inlining a function, updateCalleeCount and updateCallProfile are used to update entry count of callee and meta data for call instructions. Here what the patch is doing looks like to update the profile when outlining an inline instance, which is the opposite of inlining, so should we update the meta data for call instructions inside the callee as well?

david2050 updated this revision to Diff 180834.Jan 9 2019, 6:42 AM

refactor and update

Sorry for the delay.
Refactored the code in InlineFunction for reuse and to update weights uniformly.

wmi added inline comments.Jan 14 2019, 12:12 PM
lib/IR/Function.cpp
1403–1404 ↗(On Diff #180834)

It seems to me a little bit weird to have some structure specific for inliner in the interface of llvm IR. It looks more like a transformation utility and better to stay under lib/Transforms/Utils/. We may create a new file for it if there is nowhere else to put.

And maybe make the name more explicit like updateCalleeProfile.

lib/Transforms/IPO/SampleProfile.cpp
326 ↗(On Diff #180834)

I don't find where samples is used?

848 ↗(On Diff #180834)

Indirect call is skipped in the current patch, do you think it is better to handle it as well? I don't have strong opionion on this since it is cold.

1488 ↗(On Diff #180834)

inlineResult.second is never used. What is the intention to change the interface of inlineHotFunctions?

david2050 marked 2 inline comments as done.Jan 14 2019, 12:57 PM
david2050 added inline comments.
lib/IR/Function.cpp
1403–1404 ↗(On Diff #180834)

I will try putting it back into InlineFunction and exporting it as a static method

lib/Transforms/IPO/SampleProfile.cpp
326 ↗(On Diff #180834)

part of a subsequent patch, I will remove from here.

david2050 updated this revision to Diff 181656.Jan 14 2019, 3:00 PM

remove some unneeded changes, refactor location of updateProfileData, now updateProfileCallee

wmi added inline comments.Jan 14 2019, 4:28 PM
lib/Transforms/Utils/InlineFunction.cpp
1486–1494 ↗(On Diff #181656)

Maybe use a lambda here to remove the code duplication?

test/Transforms/SampleProfile/entry_counts_cold.ll
67–68 ↗(On Diff #181656)

Add check for the updated profile of the call instruction inside of bar?

david2050 updated this revision to Diff 181685.Jan 14 2019, 5:49 PM

refactor common code, update test

david2050 marked 3 inline comments as done.Jan 14 2019, 5:50 PM
david2050 added inline comments.
lib/Transforms/Utils/InlineFunction.cpp
1486–1494 ↗(On Diff #181656)

I refactored to pull the common out of the iff and trust unswitching :-)

wmi added inline comments.Jan 15 2019, 10:28 AM
include/llvm/DebugInfo/DIContext.h
101–105 ↗(On Diff #181685)

Are the changes in this file belonging to some other patch?

lib/Transforms/IPO/SampleProfile.cpp
322–323 ↗(On Diff #181685)

Can we make the comment more precise? I think NotInlinedProfileInfo is an accumulated information from all the callsites to the same function not being inlined but with inline instance in the profile. It is not a local information for each callsite.

787 ↗(On Diff #181685)

Can we change the name to be more self-explain or give it a comment? These are not just calls not being inlined, but also with inline instance in the profile.

test/Transforms/SampleProfile/entry_counts_cold.ll
2 ↗(On Diff #181685)

Please briefly describe what the test is up for so people don't have to search the commit history to find out.

26 ↗(On Diff #181685)

bar is not inlined.

david2050 updated this revision to Diff 181871.Jan 15 2019, 1:26 PM
david2050 marked 2 inline comments as done.

address comments

wmi accepted this revision.Jan 23 2019, 2:07 PM

Sorry it takes me some time to test the change on our major benchmarks. The results are neutral.

LGTM.

This revision is now accepted and ready to land.Jan 23 2019, 2:07 PM
This revision was automatically updated to reflect the committed changes.
wenlei added a subscriber: wenlei.Oct 12 2019, 11:11 AM

@wmi @davidxl FYI, we have an internal patch to address this issue further. This patch only scales up entry count of outline function for cold callsites, thus losing context-sensitiveness. But ideally, we could reuse the entire profile from that inline context, and merge it all back to the outlined function. We changed the function annotation/inlining order to be top-down, then as we decide to not inline a cold call site, we merge the entire FunctionSamples of that inlinee back to outline counterpart, and since we process functions in top-down order (ignoring recursion for a second), annotation of the outline callee is able to use post merge FunctionSamples which is more accurate than simple scaling of entry count. (Doing context-sensitive inlining top-down has other benefits too, as it allows specialization whiling inlining which helps maximize the benefit of having context sensitive profile)

We'll upstream the changes later if you think it will be useful.

Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2019, 11:11 AM

Wenlei, this sounds like a good idea. Patches are welcome!

Wenlei, this sounds like a good idea. Patches are welcome!

Sorry for the delay. I just sent D70584, D70653 and D70655. There's another one that reduces the oscillation between sample loader inlining and CGSCC inlining for small inlinees to better preserve CS profile, and I'll also upstream that later. We got small wins with these changes.