For ThinLTO, all the function profiles without context has been annotated to outline functions if possible in prelink phase. In postlink phase, profile annotation in postlink phase is only meaningful for function profile with context. If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in postlink phase only has to read the part with context. To have the profile splitting, we extend the ExtBinary format to support different section arrangement. It will be flexible to add other section layout in the future without the need to create new class inheriting from ExtBinary class.
Thanks for working on this which will help the thinLTO throughput. I guess there will be a separate patch to turn on this work, i.e, setting CtxSplitLayout?
I'm wondering if we can just use one field for the phase it's currently are at. We may also want to check it for fullLTO in the future.
F may get a chance to update its entry count with a post-inline count during top-down processing with the default layout. Maybe put this under the check against SkipNoContextProf?
I guess there will be a separate patch to turn on this work, i.e, setting CtxSplitLayout?
I will use the resetSecLayout interface to select CtxSplitLayout. That will be done in our own branch of autofdo tool. I think for now most of the cases will still want to use the DefaultLayout.
Good point. Will change it.
I think for any case we don't want to reinitiate F's entrycount to initialEntryCount if it has already had a valid value.
Even if SkipNoContextProf is false, in LTO/ThinLTO postlink phase, it is possible that the emitAnnotations function doesn't change anything which could affect profile annotation (i.e., the variable Changed in emitAnnotations is false), so if a function getting a valid entry count in prelink phase is reinitialized to -1 before entering emitAnnotations in postlink, emitAnnotations may not be able to update it with a valid entry count again.
In addition, if F has a valid entrycount entering emitAnnotations, emitAnnotations can still update it without problem.
Thanks for the change.
If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in postlink phase only has to read the part with context.
I guess the speed up is more visible with merged partial profile with explicit profile flattening? or did you see it speed up FDO with regular full profile too? For a regular FDO profile, I'd expect most function profiles to have some context except for really small functions..
Oh yeah, entrycount can be set in emitAnnotations. Thanks for pointing it out.
I was wondering if F's profile can change due to the counts returning from its inliner in postlink phase, thus it may need an update in postlink though itself doesn't have context samples. That might happen with fullLTO but with thinLTO, the counts returning is already less accurate since it cannot be done cross-threads.
Right, it is more visible with merged partial profile. Currently the speedup with regular profile is minor. However, we saw in an experiment that by keeping context for the hottest functions and flattened the rest of the warm/cold functions in regular FDO profile can shrink profile size significantly without hurting performance. That is only an experiment for one target. We havn't done evaluation for other targets because regular full profile size is not a major issue for us for the moment. But if we can have the same conclusion for more targets, we can flatten more functions in the profile and the profile splitting will be more effective to reduce profile reading cost in LTO/ThinLTO postlink.
I think In this case, emitAnnotations should still be the right place where the update happens, though we don't capture such profile merge as "change" today and emitAnnotations won't be triggered.
I noticed the input is a binary profile. Is the new layout of extended binary profile still two-way convertible with text profile? Text profile works better for small test cases?
Yes, I can start with a text profile, convert it to a extbinary profile then run the test. Then we need another option in llvm-profdata to switch the default layout to ctxsplit Layout. Currently I didn't add the option because I felt there were few cases using ctxsplit and adding another option in llvm-profdata may not worth it considering it already has many options. Going forward, whether we need an option in llvm-profdata for every change in extbinary format is a question.
Ok, makes sense if your create_llvm_prof only generates extbin profile today. This is not critical.
This is one step further, if we just want to have cxt layout, you could let create_llvm_prof produce text profile directly, or use llvm-profgen to convert extbinary with ctx layout to text with cxt layout? llvm-profgen doesn't have to support the conversion between the two layouts, which would need extra option.
I guess what I missed is that there's not going to be a text representation of the ctx layout, i.e. there's not going to be 1:1 mapping and round trip conversion between text and extbin for ctx layout?
Right. It is not going to have 1:1 mapping and round trip conversion between extbin and text format.
We can convert ctxlayout extbinary profile to text profile using llvm-profdata, but we cannot do that the other way around without adding an option, because the default layout in extbinary format is not ctxlayout. It means currently if we want to create a test using ctxlayout extbinary profile, we need to change a little code and rebuild llvm-profdata before we can generate ctxlayout extbinary profile. That is something not convenient, but that is a tradeoff in order to have less options in llvm-profdata.
In the test, I want to use ctxlayout extbinary profile instead of text profile converted from ctxlayout in the profile because I want to verify in thinlto postlink phase compiler won't read the part without context. That cannot be achieved if I use text profile.