This is an archive of the discontinued LLVM Phabricator instance.

[SampleFDO] Add the support to split the function profiles with context into separate sections.
ClosedPublic

Authored by wmi on Jan 11 2021, 11:06 AM.

Details

Summary

For ThinLTO, all the function profiles without context has been annotated to outline functions if possible in prelink phase. In postlink phase, profile annotation in postlink phase is only meaningful for function profile with context. If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in postlink phase only has to read the part with context. To have the profile splitting, we extend the ExtBinary format to support different section arrangement. It will be flexible to add other section layout in the future without the need to create new class inheriting from ExtBinary class.

Diff Detail

Event Timeline

wmi created this revision.Jan 11 2021, 11:06 AM
wmi requested review of this revision.Jan 11 2021, 11:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 11 2021, 11:06 AM
hoy added a comment.Jan 12 2021, 10:31 AM

Thanks for working on this which will help the thinLTO throughput. I guess there will be a separate patch to turn on this work, i.e, setting CtxSplitLayout?

llvm/lib/Transforms/IPO/SampleProfile.cpp
457

I'm wondering if we can just use one field for the phase it's currently are at. We may also want to check it for fullLTO in the future.

2118

F may get a chance to update its entry count with a post-inline count during top-down processing with the default layout. Maybe put this under the check against SkipNoContextProf?

wmi added a comment.Jan 12 2021, 9:11 PM

I guess there will be a separate patch to turn on this work, i.e, setting CtxSplitLayout?

I will use the resetSecLayout interface to select CtxSplitLayout. That will be done in our own branch of autofdo tool. I think for now most of the cases will still want to use the DefaultLayout.

llvm/lib/Transforms/IPO/SampleProfile.cpp
457

Good point. Will change it.

2118

I think for any case we don't want to reinitiate F's entrycount to initialEntryCount if it has already had a valid value.

Even if SkipNoContextProf is false, in LTO/ThinLTO postlink phase, it is possible that the emitAnnotations function doesn't change anything which could affect profile annotation (i.e., the variable Changed in emitAnnotations is false), so if a function getting a valid entry count in prelink phase is reinitialized to -1 before entering emitAnnotations in postlink, emitAnnotations may not be able to update it with a valid entry count again.

In addition, if F has a valid entrycount entering emitAnnotations, emitAnnotations can still update it without problem.

Thanks for the change.

If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in postlink phase only has to read the part with context.

I guess the speed up is more visible with merged partial profile with explicit profile flattening? or did you see it speed up FDO with regular full profile too? For a regular FDO profile, I'd expect most function profiles to have some context except for really small functions..

hoy added inline comments.Jan 13 2021, 9:23 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2118

Oh yeah, entrycount can be set in emitAnnotations. Thanks for pointing it out.

I was wondering if F's profile can change due to the counts returning from its inliner in postlink phase, thus it may need an update in postlink though itself doesn't have context samples. That might happen with fullLTO but with thinLTO, the counts returning is already less accurate since it cannot be done cross-threads.

wmi added a comment.Jan 13 2021, 9:45 AM

Thanks for the change.

If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in postlink phase only has to read the part with context.

I guess the speed up is more visible with merged partial profile with explicit profile flattening? or did you see it speed up FDO with regular full profile too? For a regular FDO profile, I'd expect most function profiles to have some context except for really small functions..

Right, it is more visible with merged partial profile. Currently the speedup with regular profile is minor. However, we saw in an experiment that by keeping context for the hottest functions and flattened the rest of the warm/cold functions in regular FDO profile can shrink profile size significantly without hurting performance. That is only an experiment for one target. We havn't done evaluation for other targets because regular full profile size is not a major issue for us for the moment. But if we can have the same conclusion for more targets, we can flatten more functions in the profile and the profile splitting will be more effective to reduce profile reading cost in LTO/ThinLTO postlink.

wmi added inline comments.Jan 13 2021, 10:01 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
457

Sent out a NFC change in https://reviews.llvm.org/D94613 as a preparation for the change here.

wmi updated this revision to Diff 316531.Jan 13 2021, 4:15 PM

Address Hongtao's comment.

hoy added inline comments.Jan 13 2021, 5:38 PM
llvm/include/llvm/ProfileData/SampleProf.h
169

Nit: how about name this SecFlagNoCalleeContext or SecFlagFlat or something? Just want to be a bit more clear as CS profile also uses the term context.

llvm/include/llvm/ProfileData/SampleProfWriter.h
159

Nit: name it ExtBinaryHdrLayoutTable?

wmi updated this revision to Diff 316569.Jan 13 2021, 10:26 PM

Address Hongtao's comment.

wmi marked an inline comment as done.Jan 13 2021, 10:28 PM
wmi added inline comments.
llvm/include/llvm/ProfileData/SampleProf.h
169

Make sense. I chose SecFlagFlat.

llvm/include/llvm/ProfileData/SampleProfWriter.h
159

That is better.

hoy accepted this revision.Jan 13 2021, 10:41 PM

LGTM, thanks.

This revision is now accepted and ready to land.Jan 13 2021, 10:41 PM
wenlei added inline comments.Jan 14 2021, 11:01 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2118

I was wondering if F's profile can change due to the counts returning from its inliner in postlink phase, thus it may need an update in postlink though itself doesn't have context samples.

I think In this case, emitAnnotations should still be the right place where the update happens, though we don't capture such profile merge as "change" today and emitAnnotations won't be triggered.

llvm/test/Transforms/SampleProfile/ctxsplit.ll
4

I noticed the input is a binary profile. Is the new layout of extended binary profile still two-way convertible with text profile? Text profile works better for small test cases?

wmi marked an inline comment as done.Jan 15 2021, 9:38 AM
wmi added inline comments.
llvm/test/Transforms/SampleProfile/ctxsplit.ll
4

Yes, I can start with a text profile, convert it to a extbinary profile then run the test. Then we need another option in llvm-profdata to switch the default layout to ctxsplit Layout. Currently I didn't add the option because I felt there were few cases using ctxsplit and adding another option in llvm-profdata may not worth it considering it already has many options. Going forward, whether we need an option in llvm-profdata for every change in extbinary format is a question.

wenlei added inline comments.Jan 18 2021, 8:20 AM
llvm/test/Transforms/SampleProfile/ctxsplit.ll
4

Ok, makes sense if your create_llvm_prof only generates extbin profile today. This is not critical.

Then we need another option in llvm-profdata to switch the default layout to ctxsplit Layout.

This is one step further, if we just want to have cxt layout, you could let create_llvm_prof produce text profile directly, or use llvm-profgen to convert extbinary with ctx layout to text with cxt layout? llvm-profgen doesn't have to support the conversion between the two layouts, which would need extra option.

I guess what I missed is that there's not going to be a text representation of the ctx layout, i.e. there's not going to be 1:1 mapping and round trip conversion between text and extbin for ctx layout?

wmi added inline comments.Jan 18 2021, 10:44 PM
llvm/test/Transforms/SampleProfile/ctxsplit.ll
4

I guess what I missed is that there's not going to be a text representation of the ctx layout, i.e. there's not going to be 1:1 mapping and round trip conversion between text and extbin for ctx layout?

Right. It is not going to have 1:1 mapping and round trip conversion between extbin and text format.

We can convert ctxlayout extbinary profile to text profile using llvm-profdata, but we cannot do that the other way around without adding an option, because the default layout in extbinary format is not ctxlayout. It means currently if we want to create a test using ctxlayout extbinary profile, we need to change a little code and rebuild llvm-profdata before we can generate ctxlayout extbinary profile. That is something not convenient, but that is a tradeoff in order to have less options in llvm-profdata.

In the test, I want to use ctxlayout extbinary profile instead of text profile converted from ctxlayout in the profile because I want to verify in thinlto postlink phase compiler won't read the part without context. That cannot be achieved if I use text profile.

wenlei accepted this revision.Jan 18 2021, 11:09 PM
wenlei added inline comments.
llvm/test/Transforms/SampleProfile/ctxsplit.ll
4

Got it, thanks for explaining.