This is an archive of the discontinued LLVM Phabricator instance.

[SampleFDO] For functions without profiles, provide an option to put them in a special text section
ClosedPublic

Authored by wmi on May 28 2019, 11:41 AM.

Details

Summary

For sampleFDO, because the optimized build uses profile generated from previous release, previously we couldn't tell a function without profile was truely cold or just newly created so we had to treat them conservatively and put them in .text section instead of .text.unlikely. The result was when we persuing the best performance by locking .text.hot and .text in memory, we wasted a lot of memory to keep cold functions inside.

In https://reviews.llvm.org/D66374, we introduced profile symbol list to discriminate functions being
cold versus functions being newly added. This mechanism works quite well for regular use cases in AutoFDO. However, in some case, we can only have a partial profile when optimizing a target. The partial profile may be an aggregated profile collected from many targets. The profile symbol list method used for regular sampleFDO profile is not applicable to partial profile use case because it may be too large and introduce many false positives.

To solve the problem for partial profile use case, I want to resurrect this patch. In this patch, we provide an option called --profile-unknown-in-special-section. For functions without profile, we will still treat them conservatively in compiler optimizations -- for example, treat them as warm instead of cold in inliner. When we use profile info to add section prefix for functions, we will discriminate functions known to be not cold versus functions without profile (being unknown), and we will put functions being unknown in a special text section called .text.unknown. Runtime system will have the flexibility to decide where to put the special section in order to achieve a balance between performance and memory saving.

Diff Detail

Event Timeline

wmi created this revision.May 28 2019, 11:41 AM

I don't object to this functionality, but I wonder if this problem would be better solved in some other way. For example, why can't the sampling (post-)processing record the list of (MD5 hashes of) function names in the line table (or symbol table or whatever is relevant). If you have the list, then we don't need to guess whether a function missing from the profile is very cold or hadn't been added yet. I realize that this list might be large, but some kind of on-disk Bloom filter might work well (the Bloom filter will have some small possibility of false positives, but if rarely a cold function is treated as warm, that might be an acceptable tradeoff).

Longer term, the plan is to put a white list of the symbols into the profile data so that the compiler can decide if a function is newly created or simply cold.

Wei, if that is in place, is there a need for this patch?

wmi added a comment.May 28 2019, 2:58 PM

Longer term, the plan is to put a white list of the symbols into the profile data so that the compiler can decide if a function is newly created or simply cold.

Wei, if that is in place, is there a need for this patch?

Yes, that is the longer term plan, and that should bring extra benefit that compiler optimizations will be benefited from the extra information.

We use the current patch as a temporary solution because we saw some urgent demand to save a lot of memory from it. It won't be needed if the longer term plan is in place.

what is the main blocker for the longer term solution?

wmi added a comment.May 28 2019, 3:36 PM

what is the main blocker for the longer term solution?

Like Hal mentioned the symbol list may be large since it include all the symbols. Current autofdo profile only include symbols that are hot/warm, so the whole symbol list should be way larger than the symbol section in current autofdo profile. I havn't looked at whether bloom-filter can help here. If we use md5, we will have a problem on profile-remapping.(https://llvm.org/docs/CommandGuide/llvm-profdata.html#cmdoption-llvm-profdata-merge-remapping-file).

I don't see other blockers.

Why is symbol remapping an issue? old MD5 --> input symbol name -> output symbol name --> mapped MD5sum ?

wmi added a comment.May 28 2019, 4:09 PM

Why is symbol remapping an issue? old MD5 --> input symbol name -> output symbol name --> mapped MD5sum ?

We want to use profile remapping support when we want to find the counterparts in profile for the set of symbols being renamed during optimized build after a large scale codebase refactoring. The profile remapping support doesn't work well if the symbols in the profile are md5. If the symbol whitelist contain md5, after refactoring, profile remapping will not find old symbols being renamed in the whitelist and will think they are all new symbols.

Existing profile remapping doesn't support comparing old MD5 with the mapped MD5. profile remapping is based on C++ mangling, so only symbol names are supported during remapping.

wmi updated this revision to Diff 261975.May 4 2020, 5:44 PM
wmi edited the summary of this revision. (Show Details)
wmi added a reviewer: wenlei.
wmi added subscribers: congliu, hoy.

A new version of the patch.

wmi added a comment.May 4 2020, 5:51 PM

I updated the patch summary describing the motivation to resurrect the patch.

davidxl added inline comments.May 5 2020, 10:38 AM
llvm/include/llvm/Analysis/ProfileSummaryInfo.h
118

--> isFunctionHotnessUnknown (or isFunctionProfileUnknown) seems better.

llvm/lib/Analysis/ProfileSummaryInfo.cpp
200

assert (F && hasSampleProfile()) ?

llvm/lib/CodeGen/CodeGenPrepare.cpp
462

is this used?

wmi marked 3 inline comments as done.May 5 2020, 4:11 PM
wmi added inline comments.
llvm/include/llvm/Analysis/ProfileSummaryInfo.h
118

Ok, changed.

llvm/lib/Analysis/ProfileSummaryInfo.cpp
200

Added.

llvm/lib/CodeGen/CodeGenPrepare.cpp
462

Removed.

wmi updated this revision to Diff 262253.May 5 2020, 4:12 PM

Address David's comment.

wenlei added a comment.May 5 2020, 6:55 PM

Now that the use case is for partial profile, I think an umbrella switch to tell whether input profile is a partial profile would be helpful. That'd be complimentary to the partial flag in the profile itself, because whether a profile is partial actually depends on the use case too, e.g. a profile from service A would be a partial profile for server B, but not partial for A itself.

We could then tie all optimization tweaks for partial profile to that switch (and the partial flag from profile itself), which makes it easier to use. If we want the flexibility for individual tuning, we could have specific flags like this one as narrower override for the umbrella switch. What do you think?

wmi added a comment.May 6 2020, 11:51 AM

Now that the use case is for partial profile, I think an umbrella switch to tell whether input profile is a partial profile would be helpful. That'd be complimentary to the partial flag in the profile itself, because whether a profile is partial actually depends on the use case too, e.g. a profile from service A would be a partial profile for server B, but not partial for A itself.

We could then tie all optimization tweaks for partial profile to that switch (and the partial flag from profile itself), which makes it easier to use. If we want the flexibility for individual tuning, we could have specific flags like this one as narrower override for the umbrella switch. What do you think?

That make sense. I updated the patch to add such a flag.

wmi updated this revision to Diff 262439.May 6 2020, 12:06 PM

Address Wenlei's comment.

wenlei added inline comments.May 6 2020, 12:31 PM
llvm/lib/CodeGen/CodeGenPrepare.cpp
181

Set the default to true here so partial-profile alone can enable all related tweaks?

wmi updated this revision to Diff 262456.May 6 2020, 12:49 PM

change the test.

davidxl added inline comments.May 6 2020, 12:50 PM
llvm/lib/CodeGen/CodeGenPrepare.cpp
181

This should default to be true since the partial profile is also used as a guard -- but perhaps as a follow up after more testing.

wenlei accepted this revision.May 6 2020, 3:55 PM

LGTM, thanks.

llvm/lib/CodeGen/CodeGenPrepare.cpp
181

Ok, sounds good.

This revision is now accepted and ready to land.May 6 2020, 3:55 PM
wmi marked an inline comment as done.May 6 2020, 4:26 PM
wmi added inline comments.
llvm/lib/CodeGen/CodeGenPrepare.cpp
181

Yes, that is the plan. We need some extra work in linker and in runtime to make the unknown section work as we expect, and we need to make sure even if there is no support in runtime, the special section should behave just like .text section. Before that is done and tested, we want to keep it off by default.

davidxl accepted this revision.May 6 2020, 8:14 PM

lgtm

wmi added a comment.May 7 2020, 1:05 PM

Thanks for the review. I am going to wait a little bit and let the linker change (https://reviews.llvm.org/D79590) go in first in case any change to the name of the output section is needed.

This revision was automatically updated to reflect the committed changes.