This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.
ClosedPublic

Authored by shenhan on Jun 7 2023, 2:44 PM.

Details

Summary

The original MFS work D85368 shows good performance improvement with Instrumented FDO. However, AutoFDO or Flow-Sensitive AutoFDO (FSAFDO) does not show performance gain. This is mainly caused by a less accurate profile compared to the iFDO profile.

For the past few months, we have been working to improve FSAFDO quality, like in D145171. Taking advantage of this improvement, MFS now shows performance improvements over FSAFDO profiles.

That being said, 2 minor changes need to be made, 1) An FS-AutoFDO profile generation pass needs to be added right before MFS pass and an FSAFDO profile load pass is needed when FS-AutoFDO is enabled and the MFS flag is present. 2) MFS only applies to hot functions, because we believe (and experiment also shows) FS-AutoFDO is more accurate about functions that have plenty of samples than those with no or very few samples.

With this improvement, we see a 1.2% performance improvement in clang benchmark, 0.9% QPS improvement in our internal search benchmark, and 3%-5% improvement in internal storage benchmark.

This is #1 of the two patches that enables the improvement.

Diff Detail

Event Timeline

shenhan created this revision.Jun 7 2023, 2:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2023, 2:44 PM
shenhan requested review of this revision.Jun 7 2023, 2:44 PM
snehasish added inline comments.
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112

Can we use < ColdCountThreshold instead of == 0?

ColdCountThreshold defaults to 1 so it should have the same effect.

157

This should be CandidateForSplit.

Also since if CandidateForSplit is false, we can just return early from L166 since we are not going to split anything in this function. Then you can remove the use of this var from the pass altogether?

162

const LoopInfo& LI?
No need to take the address and then deref it in each use below.

164

I think this new (and the one below) is leaking memory.

Maybe just use vars on the stack like --

const BranchProbabilityInfo BPI(MF.getFunction(), LI);
BlockFrequencyInfo BFI(MF.getFunction(), BPI, LI);
llvm/lib/CodeGen/TargetPassConfig.cpp
1253 ↗(On Diff #529439)

I think the changes to TargetPassConfig should be split out into a separate patch. Also needs a test to ensure that the additional discriminator passes are run where we expect them to be run.

snehasish requested changes to this revision.Jun 7 2023, 3:31 PM
snehasish added inline comments.
llvm/test/CodeGen/X86/fsafdo_mfs_test.ll
1 ↗(On Diff #529439)

I think we can write a smaller test by reusing the code in test/CodeGen/X86/machine-function-splitter.ll. The only thing we need to check there is that if -enable-fs-discriminator and -improved-fs-discriminator=true then the cold count which has value and that value is 0 is split.

The closest test case is foo4 on L65 in machine-function-splitter.ll. I would suggest copying that and adding the necessary check.

This revision now requires changes to proceed.Jun 7 2023, 3:31 PM
wenlei added a subscriber: hoy.Jun 7 2023, 4:37 PM

we see a 1.2% performance improvement in clang benchmark, 0.9% QPS improvement in our internal search benchmark, and 3%-5% improvement in internal storage benchmark.

This looks promising. We saw improvements from MFS when used with CSSPGO too, but the improvements were smaller than the numbers here. You are not using Propeller or BOLT in your perf testing above, right?

What is the baseline in your experiments? are you comparing 1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS, all using this change; or comparing 2) after vs before this patch, using FS-AFDO + MFS on both sides?

+@hoy @wlei

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
91

It'd be better to not couple this with FS-AFDO. IIUC, the key here is profile quality, FS-AFDO isn't really special other than having better profile quality comparing to baseline AFDO. If we want to introduce different heuristic based on profile quality, make it explicit, i.e. use something like HasAccurateSampleProfile instead of IsFSAFDOFlavor.

Ideally it'd be best if we can unify FS-AFDO case with IRPGO, so we can use a single flag HasAccurateProfile for all heuristic tweaks for FS-AFDO, IRPGO and CSSPGO.. Maybe this should all be incorporated into PSI instead of having flags spread around.

Note that we also see performance improvements from MFS using CSSPGO, and we're going to turn it on for internal use (all with CSSPGO).

112

So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference?

IIRC after profile inference, not sampled block will still get zero counts. What produces Count.has_value() == false?

161

Why this heuristic isn't applicable to IRPGO profile?

llvm/lib/CodeGen/TargetPassConfig.cpp
1288 ↗(On Diff #529439)

Pass3 discriminator is already later than PassLast discriminator before this change, so all code duplication that was covered before the change should already be covered with Pass3 discriminator, do we need another even later PassLast discriminator here?

shenhan updated this revision to Diff 529746.Jun 8 2023, 3:31 PM
shenhan marked 4 inline comments as done.

What is the baseline in your experiments? are you comparing 1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS, all using this change; or comparing 2) after vs before this patch, using FS-AFDO + MFS on both sides?

Thanks. I was comparing "1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS". And FS-AFDO profile is collected via -improved-fs-discriminator=true.

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
91

Thanks, this is sensible suggestion.

Ideally it'd be best if we can unify FS-AFDO case with IRPGO

Yes, agree. However, currently, there is still a gap between FSAFDO and IRPGO quality, so the need for a flag to separate these 2.

Replaced with HasAccurateSampleProfile.

112

So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference?

Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO. The latter could be a miss sampling or inference / propagation error. Since the cost of splitting a non-cold block is high, we only split when we are sure of the block counter.

IIRC after profile inference, not sampled block will still get zero counts. What produces Count.has_value() == false?

You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown.

157

This should be CandidateForSplit.

Ah, yes.

Then you can remove the use of this var from the pass altogether?

Maybe no, because this pass later also splits ALL eh code later.

if (SplitAllEHCode)
    setDescendantEHBlocksCold(MF);
161

Yes, the reasoning is similar to above. For FSAFDO, we have much higher confidence in hot functions where there are plenty of samples, not so if a function only has a few samples. And to be conservative, we only choose to apply split for the former hot functions. Whereas for IRPGO, we trust it when it says this function is cold, and we can proceed to split that function. (Although doing split in cold functions for IRPGO would not make a lot difference, but still, it may have some benefits.)

162

Done.

164

Done.

llvm/lib/CodeGen/TargetPassConfig.cpp
1253 ↗(On Diff #529439)

OK. Removed the file from this CL.

shenhan added inline comments.Jun 8 2023, 3:31 PM
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112

Done.

llvm/lib/CodeGen/TargetPassConfig.cpp
1288 ↗(On Diff #529439)

You are right, no need for another round of discriminator here once we hve Pass3. Removed.

Also, per Snehasish's suggest, I'll move the file to another CL.

llvm/test/CodeGen/X86/fsafdo_mfs_test.ll
1 ↗(On Diff #529439)

Thanks. That's good suggestion. I'll remove the testcase from this CL because it has to be together with changes to TargetPassConfig.cpp (so the flag "UsingAccurateSampleProfile" can be passed to MFS pass). And include it with TargetPassConfig.cpp in a new CL.

Thanks. Addressed comments. Also per Snehasish's comment, removed change to TargetPassConfig.cpp and the test case (which requires TargetPassConfig.cpp) and will have them in a separate CL.

hoy added inline comments.Jun 8 2023, 3:46 PM
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
118

Looks like this is the logic the HasAccurateSampleProfile check is bypassing. Is the PercentileCutoff count is too high in your case? IIUC, you want more blocks to be treated as cold to lower the splitting cost. A higher cutoff count should help. Am I missing anything?

BTW, a more accurate profile should be more trustful, and thus the existing logic should work better. The term HasAccurateSampleProfile to bypass the existing logic here is a bit confusing to me.

shenhan edited the summary of this revision. (Show Details)Jun 9 2023, 10:38 AM

I'll remove the testcase from this CL

Actually I would prefer the foo4 based test case in this patch and in the same file too (test/CodeGen/X86/machine-function-splitter.ll). Also added a comment in the other patch D152577. Thanks!

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
157

Hmm, I wonder if we can restructure this to simplify the conditions. I'll respond here if I can think of something.

164

Change BranchProbabilityInfo on L163 too?

wenlei added inline comments.Jun 10 2023, 5:55 PM
llvm/include/llvm/CodeGen/Passes.h
67

nit: name it HasAccurateProfile since it covers IRPGO too.

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
90

nit: update the comment

112

Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO. The latter could be a miss sampling or inference / propagation error.

But checking has_value isn't going to be able to differentiate inaccurate counts.

You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown.

In this case, we don't need to guard it under HasAccurateProfile as it applies to IRPGO as well.

157

Can we hoist (or tail dup) setDescendantEHBlocksCold and then early return? That would make the code cleaner.

161

Makes sense, would be good to add a comment to explain.

165

We don't need to go through BPI->BFI all over again just to use PSI->isFunctionHotInCallGraph, which is more expensive. Instead, we can implement this directly with MBFI.

MachineSizeOpts has the same need and has implemented machine_size_opts_detail::isFunctionColdInCallGraph, machine_size_opts_detail::isFunctionHotInCallGraphNthPercentile already. Now that we have a second use, it makes sense to refactor these into common MIR hotness helpers.

shenhan updated this revision to Diff 533068.Jun 20 2023, 3:40 PM
shenhan marked 9 inline comments as done.

I'll remove the testcase from this CL

Actually I would prefer the foo4 based test case in this patch and in the same file too (test/CodeGen/X86/machine-function-splitter.ll). Also added a comment in the other patch D152577. Thanks!

Thanks. I see. Done merging my test into machine-function-splitter.ll.

llvm/include/llvm/CodeGen/Passes.h
67

Done.

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112

Thanks. I made the following change:

Firstly, per hoy's comment, I replaced "if (HasAccurateProfile) {" with "if (!HasAccurateProfile) {", noticing that HasAccurateProfile should be False when using FSAFDO or CSSPGO, and True when using IRPGO. Previously the value is negated.

Secondly, when !HasAccurateProfile && !Count.has_value() (although the latter is unlikely), this needs to return false. Whereas when HasAccurateProfile && !Count.has_value(), this returns true. So I have to guard it with HasAccureateProfile.

Thirdly, the next line "if (!Count)" is not testing "if (Count !=0)", it is testing "if(!Count.has_value())", it is little bit misleading, so I changed it to if(!Count.has_value()), so it is clear why this is guarded under HasAccurateProfile.

Hope this is clearer.

118

Yes, you are right. Sorry, this is a bit confusing. The origin name was IsUsingFSAFDO, and was later renamed to HasAccurateSampleProfile per reviewer's suggestion. And it should equal to the negated value of IsUsingFSAFDO.

So replaced HasAccurateProfile with !HasAccurateProfile and negated the initial value of HasAccurateProfile and updated the comment.

157

Done by tail-duping "setDescendantEHBlockCold" with extracted helper function "finishAdjustingBasicBLocksAndLandingPads".

157

Done by tail-duping "setDescendantEHBlockCold" with extracted helper function "finishAdjustingBasicBLocksAndLandingPads".

165

This is being addressed in D152758. I'll keep it as is for now (after addressing Snehasish's comments). Once D152758 is submitted, I'll change this accordingly.

shenhan updated this revision to Diff 533309.Jun 21 2023, 9:47 AM
wenlei added inline comments.Jun 23 2023, 11:46 AM
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112

If the key difference is how we treat unknown/missing counts, I'd suggest let's make it explicit.

Would it work if we just make a simple and explicit change:

if (!Count)
  return true;

-->

// Treat unknown/missing counts as cold if profile is accurate, but not if profile is inaccurate. 
if (!Count.has_value()) {
  return HasAccurateProfile;
}
165

Once D152758 is submitted, I'll change this accordingly.

Since you stacked the patches already, you can make changes here already. It will make code cleaner and review easier.

175–181

Does this change the behavior for UseProfileData == false, even if HasAccurateProfile == true?

Can we move this up to line 177, and do the following directly?

if (UseProfileData) {
  MBFI = &getAnalysis<MachineBlockFrequencyInfo>();
  PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
  // Comments ..
  if (!HasAccurateProfile && !PSI->isFunctionHotInCallGraph(&MF.getFunction(), MBFI)) {
    // Split all EH code and it's descendant statically by default.
    if (SplitAllEHCode)
      setDescendantEHBlocksCold(MF);
    finishAdjustingBasicBlocksAndLandingPads(MF);
    return true;
  }
}
davidxl added inline comments.Jun 23 2023, 12:37 PM
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112

If the key difference is how we treat unknown/missing counts, I'd suggest let's make it explicit.

Would it work if we just make a simple and explicit change:

if (!Count)
  return true;

-->

// Treat unknown/missing counts as cold if profile is accurate, but not if profile is inaccurate. 
if (!Count.has_value()) {
  return HasAccurateProfile;
}

Sounds reasonable.

Missing count in IRPGO does mean very cold (module not even linked into the binary), but for sample PGO, it means unknown -- e.g. new code added which is not sampled. In this sense, their handling should be different.

shenhan updated this revision to Diff 534253.Jun 24 2023, 11:35 AM
shenhan marked an inline comment as done.
shenhan added inline comments.
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
175–181

Does this change the behavior for UseProfileData == false, even if HasAccurateProfile == true?

Actually no. When UseProfileData == false, candidateForSplit will always be false (regardless of what HasAccurateProfile is), so it will trigger an early return in code block 182.

Can we move this up to line 177, and do the following directly?

Yes, this reads more nature.

tschuett added a subscriber: tschuett.EditedJun 24 2023, 12:23 PM

For optionals please do not use .has_value().

if (optional) {
}

if (optional.has_value()) {
}

Is identical. Please use .has_value()only in unit tests.

Furthermore *optional and optional.value() are identical. The former is used more wildly.

shenhan added a comment.EditedJun 25 2023, 9:17 AM

For optionals please do not use .has_value().

if (optional) {
}

if (optional.has_value()) {
}

Is identical. Please use .has_value()only in unit tests.

Furthermore *optional and optional.value() are identical. The former is used more wildly.

Thanks.

My rationale here to use explicit ".has_value()" is that when "std::optional<int> v" is used in if (!v) or if(v). "std::optional<int> v=0" leads to "if (v)" equals true is counter intuitive to the point of being confusing.

But as you said, to be consistent with llvm existing code practice, I'll drop ".has_value() and .value()" (but put a comment for .has_value() part).

Thanks.

shenhan updated this revision to Diff 534355.Jun 25 2023, 9:25 AM
wenlei added inline comments.Jun 25 2023, 4:32 PM
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
209

This is no longer needed, now that we use MBFI directly.

snehasish added inline comments.Jun 26 2023, 1:21 PM
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
161

nit: accurate profile
or HasAccurateProfile if you want to refer to the flag.

Same below.

163

nit: other hand

225

Do we need to pass this in explicitly anymore? Can we instead infer this locally from ProfileSummaryInfo in runOnMachineFunction?

I think the following is equivalent to the current behaviour

bool HasAccurateProfile = PSI->hasInstrumentationProfile() || PSI->hasCSInstrumentationProfile();
llvm/test/CodeGen/X86/machine-function-splitter.ll
442 ↗(On Diff #534355)

nit: add newline

hoy added inline comments.Jun 26 2023, 4:00 PM
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
118

Thanks for the clarification. Another question, is it intentional to always bypass PercentileCutoff and use ColdCountThreshold instead for FS-AFDO?

shenhan updated this revision to Diff 536035.Jun 29 2023, 3:35 PM
shenhan marked 7 inline comments as done.
shenhan added inline comments.
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112
// Treat unknown/missing counts as cold if profile is accurate, but not if profile is inaccurate. 
if (!Count.has_value()) {
  return HasAccurateProfile;
}

I see the point here. However for NonAccurateProfile, we test that Count < ColdCountThreshold, whereas for AccurateProfile, we test that using "PSI->isColdCountNthPercentile(PercentileCutoff, *Count);",the former is more rigorous, resulting in less cold block being split than the HasAccurateProfile. With this difference, we still need to set differentiate these 2 cases by "if (HasAccureateProfile) { } else {}" like below:

if (!Count.has_value()) {
  return HasAccurateProfile;
}

if (!HasAccureateProfile) {
  return *Count < ColdCountThreshold;
}

if (PercentileCutoff > 0) {
  return PSI->isColdCountNthPercentile(PercentileCutoff, *Count);
}
return (*Count < ColdCountThreshold);
// end of function

Do you think we shall use the above version or keep the current one?

118

Yes, currently for FSAFDO, we always use ColdCountThreshold instead of PercentileCutoff.

225

Good point. Removed.

snehasish added inline comments.Jun 29 2023, 4:08 PM
llvm/test/CodeGen/X86/machine-function-splitter.ll
5 ↗(On Diff #536035)

Since we changed the code to use PSI and L417 in the test defines profile format to be "InstrProf" I expected some changes to this test.

Assuming that FSAFDO sets the profile type to "SampleProfile" [1], we can replace the "InstrProf" token using sed prior to passing this bitcode to llc and checking the output.

So the invocation would look like --

RUN: sed 's/InstrProf/SampleProfile/g' %s > %t.ll
RUN: llc < %t.ll -mtriple=x86_64-unknown-linux-gnu -split-machine-functions | FileCheck %s --check-prefix=FSAFDO-MFS

There are quite a few uses of this pattern in tests in LLVM already[2].

[1] https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/ProfileSummary.cpp#L82
[2] https://github.com/search?q=repo%3Allvm%2Fllvm-project+path%3Atest+path%3A.ll+content%3A%22+sed+%22&type=code

14–16 ↗(On Diff #536035)

Lets move these to L377 (into the body of foo15) so that they are near the test case like the others in this file.

nit: Also the last test uses foo13, so foo14 would be the next one not foo15.

shenhan updated this revision to Diff 536054.Jun 29 2023, 4:59 PM
shenhan marked 2 inline comments as done.
shenhan added inline comments.
llvm/test/CodeGen/X86/machine-function-splitter.ll
5 ↗(On Diff #536035)

Thanks, the sed .ll file is a trick didn't think of. And thanks for the reference.

14–16 ↗(On Diff #536035)

Thanks. Done moving and change foo15 -> foo14.

shenhan updated this revision to Diff 536056.Jun 29 2023, 5:01 PM
snehasish accepted this revision.Jun 30 2023, 9:40 AM

lgtm, thanks for your patience with all the suggestions!

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
120

Maybe merge this comment with the one above and drop the braces?

llvm/test/CodeGen/X86/machine-function-splitter.ll
16 ↗(On Diff #536056)

Leftover empty line after moving the code?

407 ↗(On Diff #536056)

typo: happen

This revision is now accepted and ready to land.Jun 30 2023, 9:40 AM
wenlei added inline comments.Jul 5 2023, 2:27 PM
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112

Can we unify *Count < ColdCountThreshold and PSI->isColdCountNthPercentile(PercentileCutoff, *Count) between the two cases? Is the different handling really necessary - i.e. without the differentiation, sample PGO with MFS would not perform as good? I would keep it simple/unified unless the different handling is perf critical.

xur added inline comments.Jul 5 2023, 3:15 PM
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112

They are just two different parameters used in performance tuning: one local to this file and one from PSI.
ColdCountThreshold is local to this file and will not affect other passes.

I think it's harmless to keep as it is just for perf turning point of view.

xur added a comment.Jul 5 2023, 3:28 PM

we see a 1.2% performance improvement in clang benchmark, 0.9% QPS improvement in our internal search benchmark, and 3%-5% improvement in internal storage benchmark.

This looks promising. We saw improvements from MFS when used with CSSPGO too, but the improvements were smaller than the numbers here. You are not using Propeller or BOLT in your perf testing above, right?

What is the baseline in your experiments? are you comparing 1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS, all using this change; or comparing 2) after vs before this patch, using FS-AFDO + MFS on both sides?

+@hoy @wlei

! In D152399#4407001, @shenhan wrote:
What is the baseline in your experiments? are you comparing 1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS, all using this change; or comparing 2) after vs before this patch, using FS-AFDO + MFS on both sides?

Thanks. I was comparing "1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS". And FS-AFDO profile is collected via -improved-fs-discriminator=true.

We enabled unique-internal-linkage-names. Also we used the iterative AFDO (our default release model): which is the FS-AFDO profile was generated from a previous AFDO built binary.
The performance gain can varies a bit with different profiles. For clang, it also varies a bit from source version to source versions -- we have seen 0.8% to 1.2%.

xur accepted this revision.Jul 5 2023, 3:28 PM

LGTM.

wenlei added inline comments.Jul 6 2023, 8:00 PM
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112

For long term maintainability I still hope we can keep things as simple as possible, as unified as possible.

It just seems arbitrary to use *Count < ColdCountThreshold for sample PGO, and PSI->isColdCountNthPercentile(PercentileCutoff, *Count) for IRPGO.

If we really want to use different value, we could also use PSI->iisColdCountNthPercentile for both, but with different percentile cutoff.

shenhan updated this revision to Diff 538398.Jul 8 2023, 7:00 PM
shenhan marked 3 inline comments as done.
shenhan marked 2 inline comments as done.Jul 8 2023, 7:05 PM
shenhan added inline comments.
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112

Revised the code so now it unifies both "HasAccurateProfile" and "no-HasAccurateProfile".

The usage of ColdCountThreshold and PSI under both cases are now combined (keep unchanged).

And now for MFS-on-FSAFDO, we need to pass -mllvm -mfs-percentile-cutoff=0 to keep the original logic unchanged.

120

Revised the code and now it is more compact, and both "HasAccurateProfile" and no-AccurateProfile are using the same logic.
Comment is also revised.

llvm/test/CodeGen/X86/machine-function-splitter.ll
16 ↗(On Diff #536056)

Deleted line.

wenlei accepted this revision.Jul 10 2023, 10:14 AM

lgtm, thanks.

xur added a comment.Jul 13 2023, 1:07 PM

It seems that we now need to specify "-mllvm -mfs-psi-cutoff=0" to enable the MFS for FSAFDO, if we want to get the same optimization behavior. This is not good for usage adoption.
I think if the users want to enable MFS in FSAFDO, they should only need to add -fsplit-machine-functions. They should not need to use ANY extra internal options.

I know the newer change was to unify the heuristic code in isColdBlock. But I don't like the change here.

I would propose to replace HasAccurateProfile parameter in isColdBlock with an enum of profileKind (InstruProfile, Sample Profile, or Sample Profile with FSAFDO etc). And we can have different heuristic for different profile kind.

lgtm, thanks for your patience with all the suggestions!

It seems that we now need to specify "-mllvm -mfs-psi-cutoff=0" to enable the MFS for FSAFDO, if we want to get the same optimization behavior. This is not good for usage adoption.
I think if the users want to enable MFS in FSAFDO, they should only need to add -fsplit-machine-functions. They should not need to use ANY extra internal options.

I know the newer change was to unify the heuristic code in isColdBlock. But I don't like the change here.

Thanks. Yes, currently to use FSAFDO MFS, we need these flags "-fsplit-machine-functions -mllvm -mfs-psi-cutoff=0", which is very unfriendly to normal users and that is not the best practice.

I would propose to replace HasAccurateProfile parameter in isColdBlock with an enum of profileKind (InstruProfile, Sample Profile, or Sample Profile with FSAFDO etc). And we can have different heuristic for different profile kind.

I'll prepare another CL to do the suggested change.

It seems that we now need to specify "-mllvm -mfs-psi-cutoff=0" to enable the MFS for FSAFDO, if we want to get the same optimization behavior. This is not good for usage adoption.
I think if the users want to enable MFS in FSAFDO, they should only need to add -fsplit-machine-functions. They should not need to use ANY extra internal options.

Curious what happens if mfs is turned on for FSAFDO without -mfs-psi-cutoff=0? Does it actually cause regression just because different threshold is used from PSI->isColdCountNthPercentile(PercentileCutoff, *Count)?

I know the newer change was to unify the heuristic code in isColdBlock. But I don't like the change here.

I would propose to replace HasAccurateProfile parameter in isColdBlock with an enum of profileKind (InstruProfile, Sample Profile, or Sample Profile with FSAFDO etc). And we can have different heuristic for different profile kind.

Yes, querying PSI directly is better.