This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/CodeGen/
-
CodeGen/
40/46
MachineFunctionSplitter.cpp
-
test/CodeGen/X86/
-
CodeGen/
-
X86/
7/8
machine-function-splitter.ll

Differential D152399

[CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.
ClosedPublic

Authored by shenhan on Jun 7 2023, 2:44 PM.

Download Raw Diff

Details

Reviewers

xur
tmsriram
snehasish
wenlei

Commits

rG8df75969ae70: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.

Summary

The original MFS work D85368 shows good performance improvement with Instrumented FDO. However, AutoFDO or Flow-Sensitive AutoFDO (FSAFDO) does not show performance gain. This is mainly caused by a less accurate profile compared to the iFDO profile.

For the past few months, we have been working to improve FSAFDO quality, like in D145171. Taking advantage of this improvement, MFS now shows performance improvements over FSAFDO profiles.

That being said, 2 minor changes need to be made, 1) An FS-AutoFDO profile generation pass needs to be added right before MFS pass and an FSAFDO profile load pass is needed when FS-AutoFDO is enabled and the MFS flag is present. 2) MFS only applies to hot functions, because we believe (and experiment also shows) FS-AutoFDO is more accurate about functions that have plenty of samples than those with no or very few samples.

With this improvement, we see a 1.2% performance improvement in clang benchmark, 0.9% QPS improvement in our internal search benchmark, and 3%-5% improvement in internal storage benchmark.

This is #1 of the two patches that enables the improvement.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

shenhan created this revision.Jun 7 2023, 2:44 PM

Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2023, 2:44 PM

Herald added subscribers: wlei, wenlei, pengfei and 2 others. · View Herald Transcript

shenhan requested review of this revision.Jun 7 2023, 2:44 PM

Herald added a subscriber: llvm-commits. · View Herald TranscriptJun 7 2023, 2:44 PM

snehasish added a subscriber: snehasish.Jun 7 2023, 3:26 PM

snehasish added inline comments.

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112	Can we use `< ColdCountThreshold` instead of `== 0`? `ColdCountThreshold` defaults to 1 so it should have the same effect.
155	This should be `CandidateForSplit`. Also since if CandidateForSplit is false, we can just return early from L166 since we are not going to split anything in this function. Then you can remove the use of this var from the pass altogether?
163	`const LoopInfo& LI`? No need to take the address and then deref it in each use below.
165	I think this new (and the one below) is leaking memory. Maybe just use vars on the stack like -- const BranchProbabilityInfo BPI(MF.getFunction(), LI); BlockFrequencyInfo BFI(MF.getFunction(), BPI, LI);
llvm/lib/CodeGen/TargetPassConfig.cpp
1253 ↗	(On Diff #529439)	I think the changes to TargetPassConfig should be split out into a separate patch. Also needs a test to ensure that the additional discriminator passes are run where we expect them to be run.

Harbormaster completed remote builds in B237372: Diff 529439.Jun 7 2023, 3:27 PM

snehasish requested changes to this revision.Jun 7 2023, 3:31 PM

snehasish added inline comments.

llvm/test/CodeGen/X86/fsafdo_mfs_test.ll
1 ↗	(On Diff #529439)	I think we can write a smaller test by reusing the code in test/CodeGen/X86/machine-function-splitter.ll. The only thing we need to check there is that if `-enable-fs-discriminator` and `-improved-fs-discriminator=true` then the cold count which has value and that value is 0 is split. The closest test case is foo4 on L65 in machine-function-splitter.ll. I would suggest copying that and adding the necessary check.

This revision now requires changes to proceed.Jun 7 2023, 3:31 PM

we see a 1.2% performance improvement in clang benchmark, 0.9% QPS improvement in our internal search benchmark, and 3%-5% improvement in internal storage benchmark.

This looks promising. We saw improvements from MFS when used with CSSPGO too, but the improvements were smaller than the numbers here. You are not using Propeller or BOLT in your perf testing above, right?

What is the baseline in your experiments? are you comparing 1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS, all using this change; or comparing 2) after vs before this patch, using FS-AFDO + MFS on both sides?

+@hoy @wlei

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
88	It'd be better to not couple this with FS-AFDO. IIUC, the key here is profile quality, FS-AFDO isn't really special other than having better profile quality comparing to baseline AFDO. If we want to introduce different heuristic based on profile quality, make it explicit, i.e. use something like `HasAccurateSampleProfile` instead of `IsFSAFDOFlavor`. Ideally it'd be best if we can unify FS-AFDO case with IRPGO, so we can use a single flag `HasAccurateProfile` for all heuristic tweaks for FS-AFDO, IRPGO and CSSPGO.. Maybe this should all be incorporated into PSI instead of having flags spread around. Note that we also see performance improvements from MFS using CSSPGO, and we're going to turn it on for internal use (all with CSSPGO).
112	So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference? IIRC after profile inference, not sampled block will still get zero counts. What produces `Count.has_value() == false`?
162	Why this heuristic isn't applicable to IRPGO profile?
llvm/lib/CodeGen/TargetPassConfig.cpp
1288 ↗	(On Diff #529439)	Pass3 discriminator is already later than PassLast discriminator before this change, so all code duplication that was covered before the change should already be covered with Pass3 discriminator, do we need another even later PassLast discriminator here?

What is the baseline in your experiments? are you comparing 1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS, all using this change; or comparing 2) after vs before this patch, using FS-AFDO + MFS on both sides?

Thanks. I was comparing "1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS". And FS-AFDO profile is collected via -improved-fs-discriminator=true.

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
88	Thanks, this is sensible suggestion. Ideally it'd be best if we can unify FS-AFDO case with IRPGO Yes, agree. However, currently, there is still a gap between FSAFDO and IRPGO quality, so the need for a flag to separate these 2. Replaced with HasAccurateSampleProfile.
112	So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference? Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO. The latter could be a miss sampling or inference / propagation error. Since the cost of splitting a non-cold block is high, we only split when we are sure of the block counter. IIRC after profile inference, not sampled block will still get zero counts. What produces Count.has_value() == false? You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown.
155	This should be CandidateForSplit. Ah, yes. Then you can remove the use of this var from the pass altogether? Maybe no, because this pass later also splits ALL eh code later. if (SplitAllEHCode) setDescendantEHBlocksCold(MF);
162	Yes, the reasoning is similar to above. For FSAFDO, we have much higher confidence in hot functions where there are plenty of samples, not so if a function only has a few samples. And to be conservative, we only choose to apply split for the former hot functions. Whereas for IRPGO, we trust it when it says this function is cold, and we can proceed to split that function. (Although doing split in cold functions for IRPGO would not make a lot difference, but still, it may have some benefits.)
163	Done.
165	Done.
llvm/lib/CodeGen/TargetPassConfig.cpp
1253 ↗	(On Diff #529439)	OK. Removed the file from this CL.

shenhan added inline comments.Jun 8 2023, 3:31 PM

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112	Done.
llvm/lib/CodeGen/TargetPassConfig.cpp
1288 ↗	(On Diff #529439)	You are right, no need for another round of discriminator here once we hve Pass3. Removed. Also, per Snehasish's suggest, I'll move the file to another CL.
llvm/test/CodeGen/X86/fsafdo_mfs_test.ll
1 ↗	(On Diff #529439)	Thanks. That's good suggestion. I'll remove the testcase from this CL because it has to be together with changes to TargetPassConfig.cpp (so the flag "UsingAccurateSampleProfile" can be passed to MFS pass). And include it with TargetPassConfig.cpp in a new CL.

Thanks. Addressed comments. Also per Snehasish's comment, removed change to TargetPassConfig.cpp and the test case (which requires TargetPassConfig.cpp) and will have them in a separate CL.

hoy added inline comments.Jun 8 2023, 3:46 PM

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
117	Looks like this is the logic the `HasAccurateSampleProfile` check is bypassing. Is the `PercentileCutoff` count is too high in your case? IIUC, you want more blocks to be treated as cold to lower the splitting cost. A higher cutoff count should help. Am I missing anything? BTW, a more accurate profile should be more trustful, and thus the existing logic should work better. The term `HasAccurateSampleProfile` to bypass the existing logic here is a bit confusing to me.

Harbormaster completed remote builds in B237601: Diff 529746.Jun 8 2023, 4:13 PM

shenhan edited the summary of this revision. (Show Details)Jun 9 2023, 10:38 AM

shenhan mentioned this in D152577: Part 2 of Fine tune MachineFunctionSplitPass (MFS) for FSAFDO..Jun 9 2023, 12:31 PM

shenhan added a child revision: D152577: Part 2 of Fine tune MachineFunctionSplitPass (MFS) for FSAFDO..

I'll remove the testcase from this CL

Actually I would prefer the foo4 based test case in this patch and in the same file too (test/CodeGen/X86/machine-function-splitter.ll). Also added a comment in the other patch D152577. Thanks!

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
155	Hmm, I wonder if we can restructure this to simplify the conditions. I'll respond here if I can think of something.
165	Change BranchProbabilityInfo on L163 too?

wenlei added inline comments.Jun 10 2023, 5:55 PM

llvm/include/llvm/CodeGen/Passes.h
67 ↗	(On Diff #529746)	nit: name it `HasAccurateProfile` since it covers IRPGO too.
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
87	nit: update the comment
112	Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO. The latter could be a miss sampling or inference / propagation error. But checking `has_value` isn't going to be able to differentiate inaccurate counts. You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown. In this case, we don't need to guard it under `HasAccurateProfile` as it applies to IRPGO as well.
155	Can we hoist (or tail dup) `setDescendantEHBlocksCold` and then early return? That would make the code cleaner.
162	Makes sense, would be good to add a comment to explain.
166	We don't need to go through BPI->BFI all over again just to use `PSI->isFunctionHotInCallGraph`, which is more expensive. Instead, we can implement this directly with MBFI. MachineSizeOpts has the same need and has implemented `machine_size_opts_detail::isFunctionColdInCallGraph`, `machine_size_opts_detail::isFunctionHotInCallGraphNthPercentile` already. Now that we have a second use, it makes sense to refactor these into common MIR hotness helpers.

shenhan mentioned this in D152758: [NFC] Refactor MBB hotness/coldness into templated PSI functions.Jun 12 2023, 4:40 PM

shenhan added a parent revision: D152758: [NFC] Refactor MBB hotness/coldness into templated PSI functions.

shenhan updated this revision to Diff 533068.Jun 20 2023, 3:40 PM

shenhan marked 9 inline comments as done.

Herald added a subscriber: mgrang. · View Herald TranscriptJun 20 2023, 3:40 PM

In D152399#4410073, @snehasish wrote:

I'll remove the testcase from this CL

Actually I would prefer the foo4 based test case in this patch and in the same file too (test/CodeGen/X86/machine-function-splitter.ll). Also added a comment in the other patch D152577. Thanks!

Thanks. I see. Done merging my test into machine-function-splitter.ll.

llvm/include/llvm/CodeGen/Passes.h
67 ↗	(On Diff #529746)	Done.
llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112	Thanks. I made the following change: Firstly, per hoy's comment, I replaced "if (HasAccurateProfile) {" with "if (!HasAccurateProfile) {", noticing that HasAccurateProfile should be False when using FSAFDO or CSSPGO, and True when using IRPGO. Previously the value is negated. Secondly, when !HasAccurateProfile && !Count.has_value() (although the latter is unlikely), this needs to return false. Whereas when HasAccurateProfile && !Count.has_value(), this returns true. So I have to guard it with HasAccureateProfile. Thirdly, the next line "if (!Count)" is not testing "if (Count !=0)", it is testing "if(!Count.has_value())", it is little bit misleading, so I changed it to if(!Count.has_value()), so it is clear why this is guarded under HasAccurateProfile. Hope this is clearer.
117	Yes, you are right. Sorry, this is a bit confusing. The origin name was IsUsingFSAFDO, and was later renamed to HasAccurateSampleProfile per reviewer's suggestion. And it should equal to the negated value of IsUsingFSAFDO. So replaced HasAccurateProfile with !HasAccurateProfile and negated the initial value of HasAccurateProfile and updated the comment.
155	Done by tail-duping "setDescendantEHBlockCold" with extracted helper function "finishAdjustingBasicBLocksAndLandingPads".
155	Done by tail-duping "setDescendantEHBlockCold" with extracted helper function "finishAdjustingBasicBLocksAndLandingPads".
166	This is being addressed in D152758. I'll keep it as is for now (after addressing Snehasish's comments). Once D152758 is submitted, I'll change this accordingly.

Harbormaster completed remote builds in B240110: Diff 533068.Jun 20 2023, 4:37 PM

shenhan updated this revision to Diff 533309.Jun 21 2023, 9:47 AM

Harbormaster completed remote builds in B240282: Diff 533309.Jun 21 2023, 10:49 AM

wenlei added inline comments.Jun 23 2023, 11:46 AM

llvm/lib/CodeGen/MachineFunctionSplitter.cpp

112

If the key difference is how we treat unknown/missing counts, I'd suggest let's make it explicit.

Would it work if we just make a simple and explicit change:

if (!Count)
  return true;

-->

// Treat unknown/missing counts as cold if profile is accurate, but not if profile is inaccurate. 
if (!Count.has_value()) {
  return HasAccurateProfile;
}

166

Once D152758 is submitted, I'll change this accordingly.

Since you stacked the patches already, you can make changes here already. It will make code cleaner and review easier.

176–182

Does this change the behavior for UseProfileData == false, even if HasAccurateProfile == true?

Can we move this up to line 177, and do the following directly?

if (UseProfileData) {
  MBFI = &getAnalysis<MachineBlockFrequencyInfo>();
  PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
  // Comments ..
  if (!HasAccurateProfile && !PSI->isFunctionHotInCallGraph(&MF.getFunction(), MBFI)) {
    // Split all EH code and it's descendant statically by default.
    if (SplitAllEHCode)
      setDescendantEHBlocksCold(MF);
    finishAdjustingBasicBlocksAndLandingPads(MF);
    return true;
  }
}

+@davidxl

davidxl added inline comments.Jun 23 2023, 12:37 PM

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112	If the key difference is how we treat unknown/missing counts, I'd suggest let's make it explicit. Would it work if we just make a simple and explicit change: if (!Count) return true; --> // Treat unknown/missing counts as cold if profile is accurate, but not if profile is inaccurate. if (!Count.has_value()) { return HasAccurateProfile; } Sounds reasonable. Missing count in IRPGO does mean very cold (module not even linked into the binary), but for sample PGO, it means unknown -- e.g. new code added which is not sampled. In this sense, their handling should be different.

shenhan updated this revision to Diff 534253.Jun 24 2023, 11:35 AM

shenhan marked an inline comment as done.

shenhan added inline comments.

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
176–182	Does this change the behavior for UseProfileData == false, even if HasAccurateProfile == true? Actually no. When UseProfileData == false, candidateForSplit will always be false (regardless of what HasAccurateProfile is), so it will trigger an early return in code block 182. Can we move this up to line 177, and do the following directly? Yes, this reads more nature.

For optionals please do not use .has_value().

if (optional) {
}

if (optional.has_value()) {
}

Is identical. Please use .has_value()only in unit tests.

Furthermore *optional and optional.value() are identical. The former is used more wildly.

Harbormaster completed remote builds in B240995: Diff 534253.Jun 24 2023, 1:45 PM

In D152399#4446768, @tschuett wrote:
For optionals please do not use .has_value().
if (optional) {
}

if (optional.has_value()) {
}
Is identical. Please use .has_value()only in unit tests.

Furthermore *optional and optional.value() are identical. The former is used more wildly.

Thanks.

My rationale here to use explicit ".has_value()" is that when "std::optional<int> v" is used in if (!v) or if(v). "std::optional<int> v=0" leads to "if (v)" equals true is counter intuitive to the point of being confusing.

But as you said, to be consistent with llvm existing code practice, I'll drop ".has_value() and .value()" (but put a comment for .has_value() part).

Thanks.

shenhan updated this revision to Diff 534355.Jun 25 2023, 9:25 AM

Harbormaster completed remote builds in B241038: Diff 534355.Jun 25 2023, 12:03 PM

wenlei added inline comments.Jun 25 2023, 4:32 PM

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
213	This is no longer needed, now that we use MBFI directly.

snehasish added inline comments.Jun 26 2023, 1:21 PM

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
162	nit: accurate profile or HasAccurateProfile if you want to refer to the flag. Same below.
164	nit: other hand
229	Do we need to pass this in explicitly anymore? Can we instead infer this locally from ProfileSummaryInfo in runOnMachineFunction? I think the following is equivalent to the current behaviour bool HasAccurateProfile = PSI->hasInstrumentationProfile() \|\| PSI->hasCSInstrumentationProfile();
llvm/test/CodeGen/X86/machine-function-splitter.ll
475	nit: add newline

hoy added inline comments.Jun 26 2023, 4:00 PM

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
117	Thanks for the clarification. Another question, is it intentional to always bypass `PercentileCutoff` and use `ColdCountThreshold` instead for FS-AFDO?

shenhan mentioned this in rGc3e33720403c: [NFC] Refactor MBB hotness/coldness into templated PSI functions..Jun 26 2023, 9:58 PM

shenhan mentioned this in D153927: Resubmit: [Analysis] Refactor MBB hotness/coldness into templated PSI functions.Jun 27 2023, 3:40 PM

shenhan mentioned this in rG4d8cf2ae6804: Resubmit with fix: [NFC] Refactor MBB hotness/coldness into templated PSI….Jun 27 2023, 10:03 PM

shenhan updated this revision to Diff 536035.Jun 29 2023, 3:35 PM

shenhan marked 7 inline comments as done.

shenhan added inline comments.

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112	// Treat unknown/missing counts as cold if profile is accurate, but not if profile is inaccurate. if (!Count.has_value()) { return HasAccurateProfile; } I see the point here. However for NonAccurateProfile, we test that Count < ColdCountThreshold, whereas for AccurateProfile, we test that using "PSI->isColdCountNthPercentile(PercentileCutoff, Count);",the former is more rigorous, resulting in less cold block being split than the HasAccurateProfile. With this difference, we still need to set differentiate these 2 cases by "if (HasAccureateProfile) { } else {}" like below: if (!Count.has_value()) { return HasAccurateProfile; } if (!HasAccureateProfile) { return Count < ColdCountThreshold; } if (PercentileCutoff > 0) { return PSI->isColdCountNthPercentile(PercentileCutoff, Count); } return (Count < ColdCountThreshold); // end of function Do you think we shall use the above version or keep the current one?
117	Yes, currently for FSAFDO, we always use ColdCountThreshold instead of PercentileCutoff.
229	Good point. Removed.

snehasish added inline comments.Jun 29 2023, 4:08 PM

llvm/test/CodeGen/X86/machine-function-splitter.ll
5	Since we changed the code to use PSI and L417 in the test defines profile format to be "InstrProf" I expected some changes to this test. Assuming that FSAFDO sets the profile type to "SampleProfile" [1], we can replace the "InstrProf" token using sed prior to passing this bitcode to llc and checking the output. So the invocation would look like -- RUN: sed 's/InstrProf/SampleProfile/g' %s > %t.ll RUN: llc < %t.ll -mtriple=x86_64-unknown-linux-gnu -split-machine-functions \| FileCheck %s --check-prefix=FSAFDO-MFS There are quite a few uses of this pattern in tests in LLVM already[2]. [1] https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/ProfileSummary.cpp#L82 [2] https://github.com/search?q=repo%3Allvm%2Fllvm-project+path%3Atest+path%3A.ll+content%3A%22+sed+%22&type=code
16–18	Lets move these to L377 (into the body of foo15) so that they are near the test case like the others in this file. nit: Also the last test uses foo13, so foo14 would be the next one not foo15.

shenhan updated this revision to Diff 536054.Jun 29 2023, 4:59 PM

shenhan marked 2 inline comments as done.

shenhan added inline comments.

llvm/test/CodeGen/X86/machine-function-splitter.ll
5	Thanks, the sed .ll file is a trick didn't think of. And thanks for the reference.
16–18	Thanks. Done moving and change foo15 -> foo14.

shenhan updated this revision to Diff 536056.Jun 29 2023, 5:01 PM

Harbormaster completed remote builds in B242286: Diff 536056.Jun 29 2023, 8:11 PM

lgtm, thanks for your patience with all the suggestions!

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
119	Maybe merge this comment with the one above and drop the braces?
llvm/test/CodeGen/X86/machine-function-splitter.ll
16	Leftover empty line after moving the code?
406	typo: happen

This revision is now accepted and ready to land.Jun 30 2023, 9:40 AM

wenlei added inline comments.Jul 5 2023, 2:27 PM

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112	Can we unify `Count < ColdCountThreshold` and `PSI->isColdCountNthPercentile(PercentileCutoff, Count)` between the two cases? Is the different handling really necessary - i.e. without the differentiation, sample PGO with MFS would not perform as good? I would keep it simple/unified unless the different handling is perf critical.

xur added inline comments.Jul 5 2023, 3:15 PM

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112	They are just two different parameters used in performance tuning: one local to this file and one from PSI. ColdCountThreshold is local to this file and will not affect other passes. I think it's harmless to keep as it is just for perf turning point of view.

In D152399#4404698, @wenlei wrote:

we see a 1.2% performance improvement in clang benchmark, 0.9% QPS improvement in our internal search benchmark, and 3%-5% improvement in internal storage benchmark.

This looks promising. We saw improvements from MFS when used with CSSPGO too, but the improvements were smaller than the numbers here. You are not using Propeller or BOLT in your perf testing above, right?

What is the baseline in your experiments? are you comparing 1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS, all using this change; or comparing 2) after vs before this patch, using FS-AFDO + MFS on both sides?

+@hoy @wlei

! In D152399#4407001, @shenhan wrote:
What is the baseline in your experiments? are you comparing 1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS, all using this change; or comparing 2) after vs before this patch, using FS-AFDO + MFS on both sides?

Thanks. I was comparing "1) FS-AFDO w/ MFS vs FS-AFDO w/o MFS". And FS-AFDO profile is collected via -improved-fs-discriminator=true.

We enabled unique-internal-linkage-names. Also we used the iterative AFDO (our default release model): which is the FS-AFDO profile was generated from a previous AFDO built binary.
The performance gain can varies a bit with different profiles. For clang, it also varies a bit from source version to source versions -- we have seen 0.8% to 1.2%.

LGTM.

wenlei added inline comments.Jul 6 2023, 8:00 PM

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112	For long term maintainability I still hope we can keep things as simple as possible, as unified as possible. It just seems arbitrary to use `Count < ColdCountThreshold` for sample PGO, and `PSI->isColdCountNthPercentile(PercentileCutoff, Count)` for IRPGO. If we really want to use different value, we could also use `PSI->iisColdCountNthPercentile` for both, but with different percentile cutoff.

shenhan updated this revision to Diff 538398.Jul 8 2023, 7:00 PM

shenhan marked 3 inline comments as done.

shenhan marked 2 inline comments as done.Jul 8 2023, 7:05 PM

shenhan added inline comments.

llvm/lib/CodeGen/MachineFunctionSplitter.cpp
112	Revised the code so now it unifies both "HasAccurateProfile" and "no-HasAccurateProfile". The usage of ColdCountThreshold and PSI under both cases are now combined (keep unchanged). And now for MFS-on-FSAFDO, we need to pass -mllvm -mfs-percentile-cutoff=0 to keep the original logic unchanged.
119	Revised the code and now it is more compact, and both "HasAccurateProfile" and no-AccurateProfile are using the same logic. Comment is also revised.
llvm/test/CodeGen/X86/machine-function-splitter.ll
16	Deleted line.

Harbormaster completed remote builds in B243961: Diff 538398.Jul 8 2023, 7:56 PM

lgtm, thanks.

Closed by commit rG8df75969ae70: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO. (authored by shenhan). · Explain WhyJul 10 2023, 4:02 PM

This revision was automatically updated to reflect the committed changes.

shenhan added a commit: rG8df75969ae70: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO..

It seems that we now need to specify "-mllvm -mfs-psi-cutoff=0" to enable the MFS for FSAFDO, if we want to get the same optimization behavior. This is not good for usage adoption.
I think if the users want to enable MFS in FSAFDO, they should only need to add -fsplit-machine-functions. They should not need to use ANY extra internal options.

I know the newer change was to unify the heuristic code in isColdBlock. But I don't like the change here.

I would propose to replace HasAccurateProfile parameter in isColdBlock with an enum of profileKind (InstruProfile, Sample Profile, or Sample Profile with FSAFDO etc). And we can have different heuristic for different profile kind.

In D152399#4464394, @snehasish wrote:

lgtm, thanks for your patience with all the suggestions!

In D152399#4498827, @xur wrote:

It seems that we now need to specify "-mllvm -mfs-psi-cutoff=0" to enable the MFS for FSAFDO, if we want to get the same optimization behavior. This is not good for usage adoption.
I think if the users want to enable MFS in FSAFDO, they should only need to add -fsplit-machine-functions. They should not need to use ANY extra internal options.

I know the newer change was to unify the heuristic code in isColdBlock. But I don't like the change here.

Thanks. Yes, currently to use FSAFDO MFS, we need these flags "-fsplit-machine-functions -mllvm -mfs-psi-cutoff=0", which is very unfriendly to normal users and that is not the best practice.

I would propose to replace HasAccurateProfile parameter in isColdBlock with an enum of profileKind (InstruProfile, Sample Profile, or Sample Profile with FSAFDO etc). And we can have different heuristic for different profile kind.

I'll prepare another CL to do the suggested change.

In D152399#4498827, @xur wrote:

It seems that we now need to specify "-mllvm -mfs-psi-cutoff=0" to enable the MFS for FSAFDO, if we want to get the same optimization behavior. This is not good for usage adoption.
I think if the users want to enable MFS in FSAFDO, they should only need to add -fsplit-machine-functions. They should not need to use ANY extra internal options.

Curious what happens if mfs is turned on for FSAFDO without -mfs-psi-cutoff=0? Does it actually cause regression just because different threshold is used from PSI->isColdCountNthPercentile(PercentileCutoff, *Count)?

I know the newer change was to unify the heuristic code in isColdBlock. But I don't like the change here.

I would propose to replace HasAccurateProfile parameter in isColdBlock with an enum of profileKind (InstruProfile, Sample Profile, or Sample Profile with FSAFDO etc). And we can have different heuristic for different profile kind.

Yes, querying PSI directly is better.

Revision Contents

Path

Size

llvm/

lib/

CodeGen/

MachineFunctionSplitter.cpp

52 lines

test/

CodeGen/

X86/

machine-function-splitter.ll

68 lines

Diff 538864

llvm/lib/CodeGen/MachineFunctionSplitter.cpp

Show All 18 Lines
// the overall changes to the binary size are negligible; only a small number of		// the overall changes to the binary size are negligible; only a small number of
// additional jump instructions may be introduced.		// additional jump instructions may be introduced.
//		//
// For the original RFC of this pass please see		// For the original RFC of this pass please see
// https://groups.google.com/d/msg/llvm-dev/RUegaMg-iqc/wFAVxa6fCgAJ		// https://groups.google.com/d/msg/llvm-dev/RUegaMg-iqc/wFAVxa6fCgAJ
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
		#include "llvm/Analysis/BlockFrequencyInfo.h"
		#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/EHUtils.h"		#include "llvm/Analysis/EHUtils.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"		#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/CodeGen/BasicBlockSectionUtils.h"		#include "llvm/CodeGen/BasicBlockSectionUtils.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"		#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineModuleInfo.h"		#include "llvm/CodeGen/MachineModuleInfo.h"
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	StringRef getPassName() const override {
return "Machine Function Splitter Transformation";		return "Machine Function Splitter Transformation";
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;

bool runOnMachineFunction(MachineFunction &F) override;		bool runOnMachineFunction(MachineFunction &F) override;
};		};
} // end anonymous namespace		} // end anonymous namespace

		wenleiUnsubmitted Done Reply Inline Actions nit: update the comment wenlei: nit: update the comment
/// setDescendantEHBlocksCold - This splits all EH pads and blocks reachable		/// setDescendantEHBlocksCold - This splits all EH pads and blocks reachable
		wenleiUnsubmitted Done Reply Inline Actions It'd be better to not couple this with FS-AFDO. IIUC, the key here is profile quality, FS-AFDO isn't really special other than having better profile quality comparing to baseline AFDO. If we want to introduce different heuristic based on profile quality, make it explicit, i.e. use something like `HasAccurateSampleProfile` instead of `IsFSAFDOFlavor`. Ideally it'd be best if we can unify FS-AFDO case with IRPGO, so we can use a single flag `HasAccurateProfile` for all heuristic tweaks for FS-AFDO, IRPGO and CSSPGO.. Maybe this should all be incorporated into PSI instead of having flags spread around. Note that we also see performance improvements from MFS using CSSPGO, and we're going to turn it on for internal use (all with CSSPGO). wenlei: It'd be better to not couple this with FS-AFDO. IIUC, the key here is profile quality, FS-AFDO…
		shenhanAuthorUnsubmitted Done Reply Inline Actions Thanks, this is sensible suggestion. Ideally it'd be best if we can unify FS-AFDO case with IRPGO Yes, agree. However, currently, there is still a gap between FSAFDO and IRPGO quality, so the need for a flag to separate these 2. Replaced with HasAccurateSampleProfile. shenhan: Thanks, this is sensible suggestion. > Ideally it'd be best if we can unify FS-AFDO case with…
/// only by EH pad as cold. This will help mark EH pads statically cold		/// only by EH pad as cold. This will help mark EH pads statically cold
/// instead of relying on profile data.		/// instead of relying on profile data.
static void setDescendantEHBlocksCold(MachineFunction &MF) {		static void setDescendantEHBlocksCold(MachineFunction &MF) {
DenseSet<MachineBasicBlock *> EHBlocks;		DenseSet<MachineBasicBlock *> EHBlocks;
computeEHOnlyBlocks(MF, EHBlocks);		computeEHOnlyBlocks(MF, EHBlocks);
for (auto Block : EHBlocks) {		for (auto Block : EHBlocks) {
Block->setSectionID(MBBSectionID::ColdSectionID);		Block->setSectionID(MBBSectionID::ColdSectionID);
}		}
}		}

		static void finishAdjustingBasicBlocksAndLandingPads(MachineFunction &MF) {
		auto Comparator = [](const MachineBasicBlock &X, const MachineBasicBlock &Y) {
		return X.getSectionID().Type < Y.getSectionID().Type;
		};
		llvm::sortBasicBlocksAndUpdateBranches(MF, Comparator);
		llvm::avoidZeroOffsetLandingPad(MF);
		}

static bool isColdBlock(const MachineBasicBlock &MBB,		static bool isColdBlock(const MachineBasicBlock &MBB,
const MachineBlockFrequencyInfo *MBFI,		const MachineBlockFrequencyInfo *MBFI,
ProfileSummaryInfo *PSI) {		ProfileSummaryInfo *PSI, bool HasAccurateProfile) {
std::optional<uint64_t> Count = MBFI->getBlockProfileCount(&MBB);		std::optional<uint64_t> Count = MBFI->getBlockProfileCount(&MBB);
		// If using accurate profile, no count means cold.
		// If no accurate profile, no count means "do not judge
		snehasishUnsubmitted Done Reply Inline Actions Can we use `< ColdCountThreshold` instead of `== 0`? `ColdCountThreshold` defaults to 1 so it should have the same effect. snehasish: Can we use `< ColdCountThreshold` instead of `== 0`? `ColdCountThreshold` defaults to 1 so it…
		shenhanAuthorUnsubmitted Done Reply Inline Actions Done. shenhan: Done.
		wenleiUnsubmitted Done Reply Inline Actions So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference? IIRC after profile inference, not sampled block will still get zero counts. What produces `Count.has_value() == false`? wenlei: So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the…
		shenhanAuthorUnsubmitted Done Reply Inline Actions So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the difference? Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO. The latter could be a miss sampling or inference / propagation error. Since the cost of splitting a non-cold block is high, we only split when we are sure of the block counter. IIRC after profile inference, not sampled block will still get zero counts. What produces Count.has_value() == false? You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown. shenhan: > So for IRPGO, we want to treat unknown as zero/cold, but not so for FSAFDO? Why is the…
		wenleiUnsubmitted Done Reply Inline Actions Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher confidence, but less so for FSAFDO. The latter could be a miss sampling or inference / propagation error. But checking `has_value` isn't going to be able to differentiate inaccurate counts. You are right. This is an extra caution in (unlikely) case when Count has no value and we don't want an exception to be thrown. In this case, we don't need to guard it under `HasAccurateProfile` as it applies to IRPGO as well. wenlei: > Yes, the reason is that when IRPGO dictates a zero block counter, we trust it with higher…
		shenhanAuthorUnsubmitted Done Reply Inline Actions Thanks. I made the following change: Firstly, per hoy's comment, I replaced "if (HasAccurateProfile) {" with "if (!HasAccurateProfile) {", noticing that HasAccurateProfile should be False when using FSAFDO or CSSPGO, and True when using IRPGO. Previously the value is negated. Secondly, when !HasAccurateProfile && !Count.has_value() (although the latter is unlikely), this needs to return false. Whereas when HasAccurateProfile && !Count.has_value(), this returns true. So I have to guard it with HasAccureateProfile. Thirdly, the next line "if (!Count)" is not testing "if (Count !=0)", it is testing "if(!Count.has_value())", it is little bit misleading, so I changed it to if(!Count.has_value()), so it is clear why this is guarded under HasAccurateProfile. Hope this is clearer. shenhan: Thanks. I made the following change: Firstly, per hoy's comment, I replaced "if…
		wenleiUnsubmitted Not Done Reply Inline Actions If the key difference is how we treat unknown/missing counts, I'd suggest let's make it explicit. Would it work if we just make a simple and explicit change: if (!Count) return true; --> // Treat unknown/missing counts as cold if profile is accurate, but not if profile is inaccurate. if (!Count.has_value()) { return HasAccurateProfile; } wenlei: If the key difference is how we treat unknown/missing counts, I'd suggest let's make it…
		davidxlUnsubmitted Done Reply Inline Actions If the key difference is how we treat unknown/missing counts, I'd suggest let's make it explicit. Would it work if we just make a simple and explicit change: if (!Count) return true; --> // Treat unknown/missing counts as cold if profile is accurate, but not if profile is inaccurate. if (!Count.has_value()) { return HasAccurateProfile; } Sounds reasonable. Missing count in IRPGO does mean very cold (module not even linked into the binary), but for sample PGO, it means unknown -- e.g. new code added which is not sampled. In this sense, their handling should be different. davidxl: > If the key difference is how we treat unknown/missing counts, I'd suggest let's make it…
		shenhanAuthorUnsubmitted Done Reply Inline Actions // Treat unknown/missing counts as cold if profile is accurate, but not if profile is inaccurate. if (!Count.has_value()) { return HasAccurateProfile; } I see the point here. However for NonAccurateProfile, we test that Count < ColdCountThreshold, whereas for AccurateProfile, we test that using "PSI->isColdCountNthPercentile(PercentileCutoff, Count);",the former is more rigorous, resulting in less cold block being split than the HasAccurateProfile. With this difference, we still need to set differentiate these 2 cases by "if (HasAccureateProfile) { } else {}" like below: if (!Count.has_value()) { return HasAccurateProfile; } if (!HasAccureateProfile) { return Count < ColdCountThreshold; } if (PercentileCutoff > 0) { return PSI->isColdCountNthPercentile(PercentileCutoff, Count); } return (Count < ColdCountThreshold); // end of function Do you think we shall use the above version or keep the current one? shenhan: ``` // Treat unknown/missing counts as cold if profile is accurate, but not if profile is…
		wenleiUnsubmitted Done Reply Inline Actions Can we unify `Count < ColdCountThreshold` and `PSI->isColdCountNthPercentile(PercentileCutoff, Count)` between the two cases? Is the different handling really necessary - i.e. without the differentiation, sample PGO with MFS would not perform as good? I would keep it simple/unified unless the different handling is perf critical. wenlei: Can we unify `*Count < ColdCountThreshold` and `PSI->isColdCountNthPercentile(PercentileCutoff…
		xurUnsubmitted Done Reply Inline Actions They are just two different parameters used in performance tuning: one local to this file and one from PSI. ColdCountThreshold is local to this file and will not affect other passes. I think it's harmless to keep as it is just for perf turning point of view. xur: They are just two different parameters used in performance tuning: one local to this file and…
		wenleiUnsubmitted Not Done Reply Inline Actions For long term maintainability I still hope we can keep things as simple as possible, as unified as possible. It just seems arbitrary to use `Count < ColdCountThreshold` for sample PGO, and `PSI->isColdCountNthPercentile(PercentileCutoff, Count)` for IRPGO. If we really want to use different value, we could also use `PSI->iisColdCountNthPercentile` for both, but with different percentile cutoff. wenlei: For long term maintainability I still hope we can keep things as simple as possible, as unified…
		shenhanAuthorUnsubmitted Done Reply Inline Actions Revised the code so now it unifies both "HasAccurateProfile" and "no-HasAccurateProfile". The usage of ColdCountThreshold and PSI under both cases are now combined (keep unchanged). And now for MFS-on-FSAFDO, we need to pass -mllvm -mfs-percentile-cutoff=0 to keep the original logic unchanged. shenhan: Revised the code so now it unifies both "HasAccurateProfile" and "no-HasAccurateProfile". The…
		// coldness".
if (!Count)		if (!Count)
return true;		return HasAccurateProfile;

if (PercentileCutoff > 0) {		if (PercentileCutoff > 0)
		hoyUnsubmitted Not Done Reply Inline Actions Looks like this is the logic the `HasAccurateSampleProfile` check is bypassing. Is the `PercentileCutoff` count is too high in your case? IIUC, you want more blocks to be treated as cold to lower the splitting cost. A higher cutoff count should help. Am I missing anything? BTW, a more accurate profile should be more trustful, and thus the existing logic should work better. The term `HasAccurateSampleProfile` to bypass the existing logic here is a bit confusing to me. hoy: Looks like this is the logic the `HasAccurateSampleProfile` check is bypassing. Is the…
		shenhanAuthorUnsubmitted Done Reply Inline Actions Yes, you are right. Sorry, this is a bit confusing. The origin name was IsUsingFSAFDO, and was later renamed to HasAccurateSampleProfile per reviewer's suggestion. And it should equal to the negated value of IsUsingFSAFDO. So replaced HasAccurateProfile with !HasAccurateProfile and negated the initial value of HasAccurateProfile and updated the comment. shenhan: Yes, you are right. Sorry, this is a bit confusing. The origin name was IsUsingFSAFDO, and was…
		hoyUnsubmitted Not Done Reply Inline Actions Thanks for the clarification. Another question, is it intentional to always bypass `PercentileCutoff` and use `ColdCountThreshold` instead for FS-AFDO? hoy: Thanks for the clarification. Another question, is it intentional to always bypass…
		shenhanAuthorUnsubmitted Done Reply Inline Actions Yes, currently for FSAFDO, we always use ColdCountThreshold instead of PercentileCutoff. shenhan: Yes, currently for FSAFDO, we always use ColdCountThreshold instead of PercentileCutoff.
return PSI->isColdCountNthPercentile(PercentileCutoff, *Count);		return PSI->isColdCountNthPercentile(PercentileCutoff, *Count);
}
return (*Count < ColdCountThreshold);		return (*Count < ColdCountThreshold);
		snehasishUnsubmitted Done Reply Inline Actions Maybe merge this comment with the one above and drop the braces? snehasish: Maybe merge this comment with the one above and drop the braces?
		shenhanAuthorUnsubmitted Done Reply Inline Actions Revised the code and now it is more compact, and both "HasAccurateProfile" and no-AccurateProfile are using the same logic. Comment is also revised. shenhan: Revised the code and now it is more compact, and both "HasAccurateProfile" and no…
}		}

bool MachineFunctionSplitter::runOnMachineFunction(MachineFunction &MF) {		bool MachineFunctionSplitter::runOnMachineFunction(MachineFunction &MF) {
// We target functions with profile data. Static information in the form		// We target functions with profile data. Static information in the form
// of exception handling code may be split to cold if user passes the		// of exception handling code may be split to cold if user passes the
// mfs-split-ehcode flag.		// mfs-split-ehcode flag.
bool UseProfileData = MF.getFunction().hasProfileData();		bool UseProfileData = MF.getFunction().hasProfileData();
if (!UseProfileData && !SplitAllEHCode)		if (!UseProfileData && !SplitAllEHCode)
Show All 19 Lines	bool MachineFunctionSplitter::runOnMachineFunction(MachineFunction &MF) {
// sortBasicBlocksAndUpdateBranches uses the numeric identifier to sort		// sortBasicBlocksAndUpdateBranches uses the numeric identifier to sort
// blocks. Preserving the order of blocks is essential to retaining decisions		// blocks. Preserving the order of blocks is essential to retaining decisions
// made by prior passes such as MachineBlockPlacement.		// made by prior passes such as MachineBlockPlacement.
MF.RenumberBlocks();		MF.RenumberBlocks();
MF.setBBSectionsType(BasicBlockSection::Preset);		MF.setBBSectionsType(BasicBlockSection::Preset);

MachineBlockFrequencyInfo *MBFI = nullptr;		MachineBlockFrequencyInfo *MBFI = nullptr;
ProfileSummaryInfo *PSI = nullptr;		ProfileSummaryInfo *PSI = nullptr;
		// Whether this pass is using FSAFDO profile (not accurate) or IRPGO
		snehasishUnsubmitted Done Reply Inline Actions This should be `CandidateForSplit`. Also since if CandidateForSplit is false, we can just return early from L166 since we are not going to split anything in this function. Then you can remove the use of this var from the pass altogether? snehasish: This should be `CandidateForSplit`. Also since if CandidateForSplit is false, we can just…
		shenhanAuthorUnsubmitted Done Reply Inline Actions This should be CandidateForSplit. Ah, yes. Then you can remove the use of this var from the pass altogether? Maybe no, because this pass later also splits ALL eh code later. if (SplitAllEHCode) setDescendantEHBlocksCold(MF); shenhan: > This should be CandidateForSplit. Ah, yes. > Then you can remove the use of this var from…
		snehasishUnsubmitted Done Reply Inline Actions Hmm, I wonder if we can restructure this to simplify the conditions. I'll respond here if I can think of something. snehasish: Hmm, I wonder if we can restructure this to simplify the conditions. I'll respond here if I can…
		wenleiUnsubmitted Done Reply Inline Actions Can we hoist (or tail dup) `setDescendantEHBlocksCold` and then early return? That would make the code cleaner. wenlei: Can we hoist (or tail dup) `setDescendantEHBlocksCold` and then early return? That would make…
		shenhanAuthorUnsubmitted Done Reply Inline Actions Done by tail-duping "setDescendantEHBlockCold" with extracted helper function "finishAdjustingBasicBLocksAndLandingPads". shenhan: Done by tail-duping "setDescendantEHBlockCold" with extracted helper function…
		shenhanAuthorUnsubmitted Done Reply Inline Actions Done by tail-duping "setDescendantEHBlockCold" with extracted helper function "finishAdjustingBasicBLocksAndLandingPads". shenhan: Done by tail-duping "setDescendantEHBlockCold" with extracted helper function…
		// (accurate). HasAccurateProfile is only used when UseProfileData is true,
		// but giving it a default value to silent any possible warning.
		bool HasAccurateProfile = false;
if (UseProfileData) {		if (UseProfileData) {
MBFI = &getAnalysis<MachineBlockFrequencyInfo>();		MBFI = &getAnalysis<MachineBlockFrequencyInfo>();
PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();		PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
		// "HasAccurateProfile" is false for FSAFDO, true when using IRPGO
		wenleiUnsubmitted Done Reply Inline Actions Why this heuristic isn't applicable to IRPGO profile? wenlei: Why this heuristic isn't applicable to IRPGO profile?
		shenhanAuthorUnsubmitted Done Reply Inline Actions Yes, the reasoning is similar to above. For FSAFDO, we have much higher confidence in hot functions where there are plenty of samples, not so if a function only has a few samples. And to be conservative, we only choose to apply split for the former hot functions. Whereas for IRPGO, we trust it when it says this function is cold, and we can proceed to split that function. (Although doing split in cold functions for IRPGO would not make a lot difference, but still, it may have some benefits.) shenhan: Yes, the reasoning is similar to above. For FSAFDO, we have much higher confidence in hot…
		wenleiUnsubmitted Done Reply Inline Actions Makes sense, would be good to add a comment to explain. wenlei: Makes sense, would be good to add a comment to explain.
		snehasishUnsubmitted Done Reply Inline Actions nit: accurate profile or HasAccurateProfile if you want to refer to the flag. Same below. snehasish: nit: accurate profile or HasAccurateProfile if you want to refer to the flag. Same below.
		// (traditional instrumented FDO) or CSPGO profiles.
		snehasishUnsubmitted Done Reply Inline Actions `const LoopInfo& LI`? No need to take the address and then deref it in each use below. snehasish: `const LoopInfo& LI`? No need to take the address and then deref it in each use below.
		shenhanAuthorUnsubmitted Done Reply Inline Actions Done. shenhan: Done.
		HasAccurateProfile =
		snehasishUnsubmitted Done Reply Inline Actions nit: other hand snehasish: nit: other hand
		PSI->hasInstrumentationProfile() \|\| PSI->hasCSInstrumentationProfile();
		snehasishUnsubmitted Done Reply Inline Actions I think this new (and the one below) is leaking memory. Maybe just use vars on the stack like -- const BranchProbabilityInfo BPI(MF.getFunction(), LI); BlockFrequencyInfo BFI(MF.getFunction(), BPI, LI); snehasish: I think this new (and the one below) is leaking memory. Maybe just use vars on the stack like…
		shenhanAuthorUnsubmitted Done Reply Inline Actions Done. shenhan: Done.
		snehasishUnsubmitted Done Reply Inline Actions Change BranchProbabilityInfo on L163 too? snehasish: Change BranchProbabilityInfo on L163 too?
		// If HasAccurateProfile is false, we only trust hot functions,
		wenleiUnsubmitted Not Done Reply Inline Actions We don't need to go through BPI->BFI all over again just to use `PSI->isFunctionHotInCallGraph`, which is more expensive. Instead, we can implement this directly with MBFI. MachineSizeOpts has the same need and has implemented `machine_size_opts_detail::isFunctionColdInCallGraph`, `machine_size_opts_detail::isFunctionHotInCallGraphNthPercentile` already. Now that we have a second use, it makes sense to refactor these into common MIR hotness helpers. wenlei: We don't need to go through BPI->BFI all over again just to use `PSI->isFunctionHotInCallGraph`…
		shenhanAuthorUnsubmitted Done Reply Inline Actions This is being addressed in D152758. I'll keep it as is for now (after addressing Snehasish's comments). Once D152758 is submitted, I'll change this accordingly. shenhan: This is being addressed in D152758. I'll keep it as is for now (after addressing Snehasish's…
		wenleiUnsubmitted Not Done Reply Inline Actions Once D152758 is submitted, I'll change this accordingly. Since you stacked the patches already, you can make changes here already. It will make code cleaner and review easier. wenlei: > Once D152758 is submitted, I'll change this accordingly. Since you stacked the patches…
		// which have many samples, and consider them as split
		// candidates. On the other hand, if HasAccurateProfile (likeIRPGO), we
		// trust both cold and hot functions.
		if (!HasAccurateProfile && !PSI->isFunctionHotInCallGraph(&MF, *MBFI)) {
		// Split all EH code and it's descendant statically by default.
		if (SplitAllEHCode)
		setDescendantEHBlocksCold(MF);
		finishAdjustingBasicBlocksAndLandingPads(MF);
		return true;
		}
}		}

SmallVector<MachineBasicBlock *, 2> LandingPads;		SmallVector<MachineBasicBlock *, 2> LandingPads;
for (auto &MBB : MF) {		for (auto &MBB : MF) {
if (MBB.isEntryBlock())		if (MBB.isEntryBlock())
continue;		continue;
		wenleiUnsubmitted Done Reply Inline Actions Does this change the behavior for UseProfileData == false, even if HasAccurateProfile == true? Can we move this up to line 177, and do the following directly? if (UseProfileData) { MBFI = &getAnalysis<MachineBlockFrequencyInfo>(); PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI(); // Comments .. if (!HasAccurateProfile && !PSI->isFunctionHotInCallGraph(&MF.getFunction(), MBFI)) { // Split all EH code and it's descendant statically by default. if (SplitAllEHCode) setDescendantEHBlocksCold(MF); finishAdjustingBasicBlocksAndLandingPads(MF); return true; } } wenlei: Does this change the behavior for UseProfileData == false, even if HasAccurateProfile == true?
		shenhanAuthorUnsubmitted Done Reply Inline Actions Does this change the behavior for UseProfileData == false, even if HasAccurateProfile == true? Actually no. When UseProfileData == false, candidateForSplit will always be false (regardless of what HasAccurateProfile is), so it will trigger an early return in code block 182. Can we move this up to line 177, and do the following directly? Yes, this reads more nature. shenhan: > Does this change the behavior for UseProfileData == false, even if HasAccurateProfile == true?

if (MBB.isEHPad())		if (MBB.isEHPad())
LandingPads.push_back(&MBB);		LandingPads.push_back(&MBB);
else if (UseProfileData && isColdBlock(MBB, MBFI, PSI) && !SplitAllEHCode)		else if (UseProfileData &&
		isColdBlock(MBB, MBFI, PSI, HasAccurateProfile) && !SplitAllEHCode)
MBB.setSectionID(MBBSectionID::ColdSectionID);		MBB.setSectionID(MBBSectionID::ColdSectionID);
}		}

// Split all EH code and it's descendant statically by default.		// Split all EH code and it's descendant statically by default.
if (SplitAllEHCode)		if (SplitAllEHCode)
setDescendantEHBlocksCold(MF);		setDescendantEHBlocksCold(MF);
// We only split out eh pads if all of them are cold.		// We only split out eh pads if all of them are cold.
else {		else {
		// Here we have UseProfileData == true.
bool HasHotLandingPads = false;		bool HasHotLandingPads = false;
for (const MachineBasicBlock *LP : LandingPads) {		for (const MachineBasicBlock *LP : LandingPads) {
if (!isColdBlock(*LP, MBFI, PSI))		if (!isColdBlock(*LP, MBFI, PSI, HasAccurateProfile))
HasHotLandingPads = true;		HasHotLandingPads = true;
}		}
if (!HasHotLandingPads) {		if (!HasHotLandingPads) {
for (MachineBasicBlock *LP : LandingPads)		for (MachineBasicBlock *LP : LandingPads)
LP->setSectionID(MBBSectionID::ColdSectionID);		LP->setSectionID(MBBSectionID::ColdSectionID);
}		}
}		}
auto Comparator = [](const MachineBasicBlock &X, const MachineBasicBlock &Y) {
return X.getSectionID().Type < Y.getSectionID().Type;		finishAdjustingBasicBlocksAndLandingPads(MF);
};
llvm::sortBasicBlocksAndUpdateBranches(MF, Comparator);
llvm::avoidZeroOffsetLandingPad(MF);
return true;		return true;
}		}

void MachineFunctionSplitter::getAnalysisUsage(AnalysisUsage &AU) const {		void MachineFunctionSplitter::getAnalysisUsage(AnalysisUsage &AU) const {
AU.addRequired<MachineModuleInfoWrapperPass>();		AU.addRequired<MachineModuleInfoWrapperPass>();
		wenleiUnsubmitted Done Reply Inline Actions This is no longer needed, now that we use MBFI directly. wenlei: This is no longer needed, now that we use MBFI directly.
AU.addRequired<MachineBlockFrequencyInfo>();		AU.addRequired<MachineBlockFrequencyInfo>();
AU.addRequired<ProfileSummaryInfoWrapperPass>();		AU.addRequired<ProfileSummaryInfoWrapperPass>();
}		}

char MachineFunctionSplitter::ID = 0;		char MachineFunctionSplitter::ID = 0;
INITIALIZE_PASS(MachineFunctionSplitter, "machine-function-splitter",		INITIALIZE_PASS(MachineFunctionSplitter, "machine-function-splitter",
"Split machine functions using profile information", false,		"Split machine functions using profile information", false,
false)		false)

MachineFunctionPass *llvm::createMachineFunctionSplitterPass() {		MachineFunctionPass *llvm::createMachineFunctionSplitterPass() {
return new MachineFunctionSplitter();		return new MachineFunctionSplitter();
}		}
		snehasishUnsubmitted Done Reply Inline Actions Do we need to pass this in explicitly anymore? Can we instead infer this locally from ProfileSummaryInfo in runOnMachineFunction? I think the following is equivalent to the current behaviour bool HasAccurateProfile = PSI->hasInstrumentationProfile() \|\| PSI->hasCSInstrumentationProfile(); snehasish: Do we need to pass this in explicitly anymore? Can we instead infer this locally from…
		shenhanAuthorUnsubmitted Done Reply Inline Actions Good point. Removed. shenhan: Good point. Removed.

llvm/test/CodeGen/X86/machine-function-splitter.ll

; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -split-machine-functions \| FileCheck %s -check-prefix=MFS-DEFAULTS		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -split-machine-functions \| FileCheck %s -check-prefix=MFS-DEFAULTS
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -split-machine-functions -mfs-psi-cutoff=0 -mfs-count-threshold=2000 \| FileCheck %s --dump-input=always -check-prefix=MFS-OPTS1		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -split-machine-functions -mfs-psi-cutoff=0 -mfs-count-threshold=2000 \| FileCheck %s --dump-input=always -check-prefix=MFS-OPTS1
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -split-machine-functions -mfs-psi-cutoff=950000 \| FileCheck %s -check-prefix=MFS-OPTS2		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -split-machine-functions -mfs-psi-cutoff=950000 \| FileCheck %s -check-prefix=MFS-OPTS2
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -split-machine-functions -mfs-split-ehcode \| FileCheck %s -check-prefix=MFS-EH-SPLIT		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -split-machine-functions -mfs-split-ehcode \| FileCheck %s -check-prefix=MFS-EH-SPLIT
		; RUN: sed 's/InstrProf/SampleProfile/g' %s > %t.ll
		snehasishUnsubmitted Done Reply Inline Actions Since we changed the code to use PSI and L417 in the test defines profile format to be "InstrProf" I expected some changes to this test. Assuming that FSAFDO sets the profile type to "SampleProfile" [1], we can replace the "InstrProf" token using sed prior to passing this bitcode to llc and checking the output. So the invocation would look like -- RUN: sed 's/InstrProf/SampleProfile/g' %s > %t.ll RUN: llc < %t.ll -mtriple=x86_64-unknown-linux-gnu -split-machine-functions \| FileCheck %s --check-prefix=FSAFDO-MFS There are quite a few uses of this pattern in tests in LLVM already[2]. [1] https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/ProfileSummary.cpp#L82 [2] https://github.com/search?q=repo%3Allvm%2Fllvm-project+path%3Atest+path%3A.ll+content%3A%22+sed+%22&type=code snehasish: Since we changed the code to use PSI and L417 in the test defines profile format to be…
		shenhanAuthorUnsubmitted Done Reply Inline Actions Thanks, the sed .ll file is a trick didn't think of. And thanks for the reference. shenhan: Thanks, the sed .ll file is a trick didn't think of. And thanks for the reference.
		; RUN: llc < %t.ll -mtriple=x86_64-unknown-linux-gnu -split-machine-functions \| FileCheck %s --check-prefix=FSAFDO-MFS
		; RUN: llc < %t.ll -mtriple=x86_64-unknown-linux-gnu -split-machine-functions \| FileCheck %s --check-prefix=FSAFDO-MFS2

define void @foo1(i1 zeroext %0) nounwind !prof !14 !section_prefix !15 {		define void @foo1(i1 zeroext %0) nounwind !prof !14 !section_prefix !15 {
;; Check that cold block is moved to .text.split.		;; Check that cold block is moved to .text.split.
; MFS-DEFAULTS-LABEL: foo1		; MFS-DEFAULTS-LABEL: foo1
; MFS-DEFAULTS: .section .text.split.foo1		; MFS-DEFAULTS: .section .text.split.foo1
; MFS-DEFAULTS-NEXT: foo1.cold:		; MFS-DEFAULTS-NEXT: foo1.cold:
; MFS-DEFAULTS-NOT: callq bar		; MFS-DEFAULTS-NOT: callq bar
; MFS-DEFAULTS-NEXT: callq baz		; MFS-DEFAULTS-NEXT: callq baz
br i1 %0, label %2, label %4, !prof !17		br i1 %0, label %2, label %4, !prof !17
		snehasishUnsubmitted Done Reply Inline Actions Leftover empty line after moving the code? snehasish: Leftover empty line after moving the code?
		shenhanAuthorUnsubmitted Done Reply Inline Actions Deleted line. shenhan: Deleted line.

2: ; preds = %1		2: ; preds = %1
		snehasishUnsubmitted Done Reply Inline Actions Lets move these to L377 (into the body of foo15) so that they are near the test case like the others in this file. nit: Also the last test uses foo13, so foo14 would be the next one not foo15. snehasish: Lets move these to L377 (into the body of foo15) so that they are near the test case like the…
		shenhanAuthorUnsubmitted Done Reply Inline Actions Thanks. Done moving and change foo15 -> foo14. shenhan: Thanks. Done moving and change foo15 -> foo14.
%3 = call i32 @bar()		%3 = call i32 @bar()
br label %6		br label %6

4: ; preds = %1		4: ; preds = %1
%5 = call i32 @baz()		%5 = call i32 @baz()
br label %6		br label %6

6: ; preds = %4, %2		6: ; preds = %4, %2
▲ Show 20 Lines • Show All 340 Lines • ▼ Show 20 Lines	6: ; preds = %4, %2
%7 = tail call i32 @qux()		%7 = tail call i32 @qux()
br i1 %0, label %2, label %try.cont, !prof !17		br i1 %0, label %2, label %try.cont, !prof !17

try.cont: ; preds = %entry		try.cont: ; preds = %entry
%8 = call i32 @bam()		%8 = call i32 @bam()
ret i32 %8		ret i32 %8
}		}

		define void @foo14(i1 zeroext %0, i1 zeroext %1) nounwind !prof !24 {
		; FSAFDO-MFS: .section .text.split.foo14,"ax"
		; FSAFDO-MFS: foo14.cold:
		br i1 %0, label %3, label %7, !prof !25

		3:
		%4 = call i32 @bar()
		br label %7

		5:
		%6 = call i32 @baz()
		br label %7

		7:
		br i1 %1, label %8, label %10, !prof !26

		8:
		%9 = call i32 @bam()
		br label %12

		10:
		%11 = call i32 @baz()
		br label %12

		12:
		%13 = tail call i32 @qux()
		ret void
		}

		define void @foo15(i1 zeroext %0, i1 zeroext %1) nounwind !prof !27 {
		;; HasAccurateProfile is false, foo15 is hot, but no profile data for
		;; blocks, no split should happen.
		snehasishUnsubmitted Done Reply Inline Actions typo: happen snehasish: typo: happen
		; FSAFDO-MFS2-NOT: .section .text.split.foo15,"ax"
		; FSAFDO-MFS2-NOT: foo15.cold:
		br i1 %0, label %3, label %7

		3:
		%4 = call i32 @bar()
		br label %7

		5:
		%6 = call i32 @baz()
		br label %7

		7:
		br i1 %1, label %8, label %10

		8:
		%9 = call i32 @bam()
		br label %12

		10:
		%11 = call i32 @baz()
		br label %12

		12:
		%13 = tail call i32 @qux()
		ret void
		}

declare i32 @bar()		declare i32 @bar()
declare i32 @baz()		declare i32 @baz()
declare i32 @bam()		declare i32 @bam()
declare i32 @qux()		declare i32 @qux()
declare void @_Z1fv()		declare void @_Z1fv()
declare i32 @__gxx_personality_v0(...)		declare i32 @__gxx_personality_v0(...)

@_ZTIi = external constant ptr		@_ZTIi = external constant ptr
Show All 20 Lines
!16 = !{!"function_section_prefix", !"unlikely"}		!16 = !{!"function_section_prefix", !"unlikely"}
!17 = !{!"branch_weights", i32 7000, i32 0}		!17 = !{!"branch_weights", i32 7000, i32 0}
!18 = !{!"branch_weights", i32 3000, i32 4000}		!18 = !{!"branch_weights", i32 3000, i32 4000}
!19 = !{!"branch_weights", i32 1000, i32 6000}		!19 = !{!"branch_weights", i32 1000, i32 6000}
!20 = !{!"function_entry_count", i64 10000}		!20 = !{!"function_entry_count", i64 10000}
!21 = !{!"branch_weights", i32 6000, i32 4000}		!21 = !{!"branch_weights", i32 6000, i32 4000}
!22 = !{!"branch_weights", i32 80, i32 9920}		!22 = !{!"branch_weights", i32 80, i32 9920}
!23 = !{!"function_entry_count", i64 7}		!23 = !{!"function_entry_count", i64 7}
		!24 = !{!"function_entry_count", i64 10000}
		!25 = !{!"branch_weights", i32 0, i32 7000}
		!26 = !{!"branch_weights", i32 1000, i32 6000}
		!27 = !{!"function_entry_count", i64 10000}
		snehasishUnsubmitted Not Done Reply Inline Actions nit: add newline snehasish: nit: add newline

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO. ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 538864

llvm/lib/CodeGen/MachineFunctionSplitter.cpp

llvm/test/CodeGen/X86/machine-function-splitter.ll

[CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.
ClosedPublic