Page MenuHomePhabricator

[SampleFDO] For functions without profiles, provide an option to put them in a special text section
Needs ReviewPublic

Authored by wmi on May 28 2019, 11:41 AM.

Details

Reviewers
davidxl
danielcdh
Summary

For sampleFDO, because the optimized build uses profile generated from previous release, we cannot tell a function without profile is truely cold or just newly created. Currently we treat them conservatively and put them in .text section instead of .text.unlikely. The result is when we persuing the best performance by locking .text.hot and .text in memory, and use huge pages for them, we waste a lot of memory and huge pages for functions actually being cold.

We already have a related option called -profile-sample-accurate. If we uses that option, function without profile will be treated as cold and we won't have the memory/huge pages waste problem, but we are going to see performance problem if new functions actually being hot are treated as cold.

To mitigate the problem, we provide another option called --profile-unknown-in-special-section. For functions without profile, we will still treat them conservatively in compiler optimizations -- for example, treat them as warm instead of cold in inliner. When we use profile info to add section prefix for functions, we will discriminate functions known to be not cold versus functions without profile (being unknown), and we will put functions being unknown in a special text section called .text.unknown. Runtime system will have the flexibility to decide where to put the special section in order to achieve a balance between performance and memory saving.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi created this revision.May 28 2019, 11:41 AM

I don't object to this functionality, but I wonder if this problem would be better solved in some other way. For example, why can't the sampling (post-)processing record the list of (MD5 hashes of) function names in the line table (or symbol table or whatever is relevant). If you have the list, then we don't need to guess whether a function missing from the profile is very cold or hadn't been added yet. I realize that this list might be large, but some kind of on-disk Bloom filter might work well (the Bloom filter will have some small possibility of false positives, but if rarely a cold function is treated as warm, that might be an acceptable tradeoff).

Longer term, the plan is to put a white list of the symbols into the profile data so that the compiler can decide if a function is newly created or simply cold.

Wei, if that is in place, is there a need for this patch?

wmi added a comment.May 28 2019, 2:58 PM

Longer term, the plan is to put a white list of the symbols into the profile data so that the compiler can decide if a function is newly created or simply cold.

Wei, if that is in place, is there a need for this patch?

Yes, that is the longer term plan, and that should bring extra benefit that compiler optimizations will be benefited from the extra information.

We use the current patch as a temporary solution because we saw some urgent demand to save a lot of memory from it. It won't be needed if the longer term plan is in place.

what is the main blocker for the longer term solution?

wmi added a comment.May 28 2019, 3:36 PM

what is the main blocker for the longer term solution?

Like Hal mentioned the symbol list may be large since it include all the symbols. Current autofdo profile only include symbols that are hot/warm, so the whole symbol list should be way larger than the symbol section in current autofdo profile. I havn't looked at whether bloom-filter can help here. If we use md5, we will have a problem on profile-remapping.(https://llvm.org/docs/CommandGuide/llvm-profdata.html#cmdoption-llvm-profdata-merge-remapping-file).

I don't see other blockers.

Why is symbol remapping an issue? old MD5 --> input symbol name -> output symbol name --> mapped MD5sum ?

wmi added a comment.May 28 2019, 4:09 PM

Why is symbol remapping an issue? old MD5 --> input symbol name -> output symbol name --> mapped MD5sum ?

We want to use profile remapping support when we want to find the counterparts in profile for the set of symbols being renamed during optimized build after a large scale codebase refactoring. The profile remapping support doesn't work well if the symbols in the profile are md5. If the symbol whitelist contain md5, after refactoring, profile remapping will not find old symbols being renamed in the whitelist and will think they are all new symbols.

Existing profile remapping doesn't support comparing old MD5 with the mapped MD5. profile remapping is based on C++ mangling, so only symbol names are supported during remapping.