This is an archive of the discontinued LLVM Phabricator instance.

[SampleFDO] minimize performance impact when profile-sample-accurate is enabled
ClosedPublic

Authored by wmi on Sep 13 2019, 9:15 AM.

Details

Summary

We can save memory and reduce binary size significantly by enabling ProfileSampleAccurate. However when ProfileSampleAccurate is true, function without sample will be regarded as cold and this could potentially cause performance regression.

To minimize the potential negative performance impact, we want to be a little conservative here saying if a function shows up in the profile, no matter as outline function, inline instance or call targets, treat the function as not being cold. This will handle the cases such as most callsites of a function are inlined in sampled binary (thus outline copy don't get any sample) but not inlined in current build (because of source code drift, imprecise debug information, or the callsites are all cold individually but not cold accumulatively...), so that the outline function showing up as cold in sampled binary will actually not be cold after current build. After the change, such function will be treated as not cold even profile-sample-accurate is enabled.

At the same time we lower the hot criteria of callsiteIsHot check when profile-sample-accurate is enabled. callsiteIsHot is used to determined whether a callsite is hot and qualified for early inlining. When profile-sample-accurate is enabled, functions without profile will be regarded as cold and much less inlining will happen in CGSCC inlining pass, so we can worry less about size increase and be aggressive to allow more early inlining to happen for warm callsites and it is helpful for performance overall.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi created this revision.Sep 13 2019, 9:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 13 2019, 9:15 AM

If we use NameTable, is ProfileSymbolList still needed? I assume NameTable is a superset ?

lib/Transforms/IPO/SampleProfile.cpp
1791

This is a linear search. What is the compile time impact?

wmi marked an inline comment as done.Sep 15 2019, 9:52 PM

If we use NameTable, is ProfileSymbolList still needed? I assume NameTable is a superset ?

NameTable is not a superset. ProfileSymbolList contains all the function symbols in the sampled binary. NameTable contains all the symbols in the profile including outline copy, inline instance and call targets. The size of NameTable is usually < 1/10 of the size of ProfileSymbolList -- a lot of function symbols are cold and won't show up in the profile.

lib/Transforms/IPO/SampleProfile.cpp
1791

I will evaluate it and get back.

wmi updated this revision to Diff 220400.Sep 16 2019, 3:59 PM

Create a StringSet for NameTable to avoid linear scanning of the table for each function.

wmi marked an inline comment as done.Sep 16 2019, 4:04 PM
wmi added inline comments.
lib/Transforms/IPO/SampleProfile.cpp
1791

I did the evaluation. I didn't see any compile time increase when compiling an internal server benchmark, but I saw 1~2% compile time increase when bootstrapping clang. I am not very sure about the increase. Although I run comparison twice and saw the increase in both times, the time to bootstrap clang on my desktop is not very stable so it could be noise.

Anyway, I think it is better to avoid the linear scan for every function, so I change NameTable to be a StringSet.

davidxl added inline comments.Sep 17 2019, 9:31 AM
include/llvm/ProfileData/SampleProfReader.h
334

Make it clearer that it includes all the names that have samples either in outline instance or an inline instance. It is a subset of the symbol list of the binary.

lib/Transforms/IPO/SampleProfile.cpp
481

Make this a member function to avoid passing the boolean parameter around.

1616

what are these two flags?

1789

If this is no symbol list, then it should not use this heuristic to be consistent with old behavior?

wmi marked 4 inline comments as done.Sep 17 2019, 10:15 AM
wmi added inline comments.
include/llvm/ProfileData/SampleProfReader.h
334

Will fix.

lib/Transforms/IPO/SampleProfile.cpp
481

Will fix.

1616

They are used to emit a warning when certain ratio of profile records or samples are not used during annotation. callsiteIsHot is used when computing the total samples (Only samples in hot callsite will be considered in total).

1789

I think it is beneficial to have the change when profile-sample-accurate is true no matter symbol list is there or not. symbol list is used to prevent new hot functions from being treated as cold. NamesInProfile is used to prevent functions are mistakenly treated as cold because inlining difference between sampled binary and the IR after early inlining in current build. These two changes are orthogonal to each other and are both beneficial individually.

aprantl removed a subscriber: aprantl.Sep 17 2019, 10:21 AM
wmi marked an inline comment as done.Sep 17 2019, 11:02 AM
wmi added inline comments.
include/llvm/ProfileData/SampleProfReader.h
334

Actually it is not a subset of symbol list of the binary. It is possible that a function is inlined everywhere and the outline instance gets deleted so it won't show up in the symbol list of the binary, but it will show up in the profile.

wmi updated this revision to Diff 220542.Sep 17 2019, 11:04 AM

Address David's comment.

This revision is now accepted and ready to land.Sep 17 2019, 11:57 AM
This revision was automatically updated to reflect the committed changes.
davidxl added inline comments.Sep 23 2019, 9:48 AM
lib/Transforms/IPO/SampleProfile.cpp
1789

Keeping the old behavior without Symbol list may be able to fix Chrome's size regression.

wmi marked an inline comment as done.Sep 23 2019, 10:28 AM
wmi added inline comments.
lib/Transforms/IPO/SampleProfile.cpp
1789

I am evaluating whether we need to do it for both changes: using NamesInProfile and adjusting threshold in callsiteIsHot. I feel maybe not using NamesInProfile without symbol list is enough to remove the size regression. I will evaluate and have a patch for it.

wmi marked an inline comment as done and an inline comment as not done.Sep 23 2019, 11:43 AM
wmi added inline comments.
lib/Transforms/IPO/SampleProfile.cpp
1789

For my experiment, adjusting threshold in callsiteIsHot only increases size by 0.09% so most of the size increase comes from treating names in NamesInProfile as not cold.