Page MenuHomePhabricator

Add prefix based function layout when profile is available.
ClosedPublic

Authored by danielcdh on Feb 19 2016, 11:55 AM.

Diff Detail

Event Timeline

danielcdh updated this revision to Diff 48540.Feb 19 2016, 11:55 AM
danielcdh retitled this revision from to Add prefix based function layout when profile is available..
danielcdh updated this object.
danielcdh added a reviewer: davidxl.
danielcdh added a subscriber: llvm-commits.
davidxl added inline comments.Feb 19 2016, 12:23 PM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
303
  1. please introduce placeholder query APIs in include/llvm/ProfileData/ProfileCommon.h for this purpose (can be a static method of ProfileSummary class:

    bool ProfileSummary::isFunctionHot(Function *); bool ProfileSummary::isFunctionUnlikely(Function *)
  1. using the existence of entry profile count as the criteria is wrong -- the entry count can be 0 or a very low value. The right tuning will need to look at the max Block count of the function and program summary -- so I think we need to leave that part out and wait for Easwaran's tuning work is in
  1. Before the summary based work is in, for now, implement a simpler heuristic similar to what is used in the current inliner:

    a) check attribute 'hot' of the function (in practice, this may not work well -- after ininling, the out of line copy of the original hot function may become cold) b) check attribute 'cold' for the function -- this is more reliable c) if the function's has zero bb count/sample, mark it as cold (Inliner's herustics is 0.01* maxFunctionCount) -- but I think we can be more conservative here by checking zero count ones)

In short, the initial heuristic should be conservative in the sense we only filter out those we are really confident with -- mainly cold ones never executed which can go a long way.

  1. Define two helper functions to return .hot and .unlikely suffixes instead of using hard coded strings here.
  1. We can also introduce an internal option to turn this layout optimization on |off. By default it can be off. When it is off, the query functions above will simply return 'false'.
danielcdh updated this revision to Diff 48546.Feb 19 2016, 2:05 PM

Integrate David's comments.

davidxl added inline comments.Feb 19 2016, 2:21 PM
include/llvm/ProfileData/ProfileCommon.h
73

Document this method -- what does it do?

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
248

Move these two functions into ProfileSummary.h (as a standalone function in llvm:: namespace)

252

Better name it as: GroupFunctionsByHotness

255

I think the initial value should be false. To turn it on by default, more data is needed (and announced more widely).

lib/ProfileData/ProfileSummary.cpp
82

use FIXME

88

FIXME

99

FIXME

davidxl added a subscriber: eraman.Feb 19 2016, 2:21 PM
danielcdh updated this revision to Diff 48549.Feb 19 2016, 2:39 PM
danielcdh marked 7 inline comments as done.

Integrate comments.

davidxl added inline comments.Feb 19 2016, 3:19 PM
lib/ProfileData/ProfileSummary.cpp
81

If this takes Function * as an argument, it can be a wrapper of getEntryCount call

92

I suggest changing this interface's name to be 'isFunctionRarelyCalled'. It is different from 'isFunctionCold' which should be checking the internal block count. These two interfaces are for different purposes. The later can be used to guide decisions such as optimize for size etc, while the former can be used for layout decision to reduce itlb misses.

Similarly, isFunctionHot does not mean it is frequently called either -- so the interface need to be split up too.

96

the check of hasProfile is probably redundant.

eraman added inline comments.Feb 19 2016, 3:39 PM
include/llvm/ProfileData/ProfileCommon.h
33

These two functions should be elsewhere. Perhaps in TargetLowering ?

lib/ProfileData/ProfileSummary.cpp
92

This and isFunctionHot should be made as a member functions of Function.

96

I think what David says is true (!F->genEntryCount() implies !hasProfile()).

davidxl added inline comments.Feb 19 2016, 3:42 PM
include/llvm/ProfileData/ProfileCommon.h
33

I think it is fine to be here -- it is generic enough and it is profile related.

lib/ProfileData/ProfileSummary.cpp
92

I think it is better to keep all/most of the profile related query APIs here in this file.

eraman added inline comments.Feb 19 2016, 4:03 PM
include/llvm/ProfileData/ProfileCommon.h
33

The strings themselves are not profile related. One might want to use .unlikely for error handling routines even without profile data.

lib/ProfileData/ProfileSummary.cpp
92

Again, the function itself is not fully profile dependent. I think ProfileCommon should only have APIs like isCountConsideredHot(uint64_t Count).

davidxl added inline comments.Feb 19 2016, 4:13 PM
include/llvm/ProfileData/ProfileCommon.h
33

Aren't those also related to profile too (it is profile that can be determined statically)?

lib/ProfileData/ProfileSummary.cpp
92

You have a point about Function being not profile dependent, but I think we should not restrict the scope of ProfileCommon.h too much like this. It is reasonable for ProfileCommon to reference IR related headers.

danielcdh updated this revision to Diff 48558.Feb 19 2016, 4:40 PM

update

lib/ProfileData/ProfileSummary.cpp
81

!getEntryCount() does not mean no profile for the binary, but no profile for the function. We need to have this function to indicate if this is an FDO/AFDO build.

92

After a second thought, I think it still makes sense to group "hot" functions together instead of "frequently called" This is because the hot path of an "infrequently called" hot function may have many calls to a frequently called function. So if these two functions are far apart, it will waste an ITLB entry. Additionally, this will pollute the heat map and makes performance tuning more difficult.

96

I've updated the logic, PTAL.

davidxl added inline comments.Feb 19 2016, 4:51 PM
include/llvm/ProfileData/ProfileCommon.h
74

If it is used to indicate profile availability at program level, it should take Module as argument.

lib/ProfileData/ProfileSummary.cpp
100

Should return false -- as we don't know (no profile data)

103

This will work well for instr profile but may produce false positive for sampleFDO -- this will be fixed when summary based computation is used.

danielcdh marked an inline comment as done.Feb 19 2016, 5:09 PM
danielcdh added inline comments.
lib/ProfileData/ProfileSummary.cpp
100

For AutoFDO, F->setEntryCount() is only called when there is sample in that function. So for those 0-sampled functions, F->getEntryCount() will be false.

103

Yes, but we can always set entry-count as no less than 1 if the function has profile. Otherwise we have no way to calculate bb counts.

davidxl edited edge metadata.Feb 20 2016, 11:04 AM

Looks good in general.

  1. Need a test case covering both regular case and -ffunction-sections (-function-sections for llc)
  1. Need a follow up discussion to introduce user level option -freorder-function (and turn on by default with profile) in LLVM. Also talk to other maintainers of other main platforms (MachO, etc) to add this support there.
eraman added inline comments.Feb 21 2016, 1:43 PM
lib/ProfileData/ProfileSummary.cpp
83

return M->getMaximumFunctionCount().hasValue();

I know this doesn't work for sample profile, but it works for instrumentation profile and is better than returning false.

100

I think sample profile loader pass should set 0 to the entry count. The meaning of getEntryCount() should not be dependent on the profile format that is used. (To be clear, I am not saying that should be fixed as part of this patch)

danielcdh updated this revision to Diff 48716.Feb 22 2016, 11:38 AM
danielcdh edited edge metadata.

add test and remove hasProfile.

added the unittest. also removed hasProfile to use getEntryCount()'s boolean value to check. I will have a follow-up patch to make getEntryCount() behavior consistent between sample profile and instr profile.

davidxl added inline comments.Feb 22 2016, 1:31 PM
test/CodeGen/X86/partition-sections.ll
2

Can you also add a test without -function-sections option.

danielcdh updated this revision to Diff 48723.Feb 22 2016, 1:41 PM

add more test

davidxl accepted this revision.Feb 22 2016, 2:01 PM
davidxl edited edge metadata.

LGTM.

To repeat the follow ups needed:

  1. tuning, performance number and turning on by default for PGO
  2. support for other platforms (and consider user options if applicable).
This revision is now accepted and ready to land.Feb 22 2016, 2:01 PM
danielcdh closed this revision.Feb 22 2016, 2:18 PM

This is broken on OSX.

The test case misses mtriple option.

Dehao, we should not leave buildbot failing for long period of time.
The buildbot message should arrive at your inbox very shortly after
the commit -- can you check if there is any issue there (e.g. in spam
filter)?

thanks,

David