This is an archive of the discontinued LLVM Phabricator instance.

Disable hoisting MI to hotter basic blocks
ClosedPublic

Authored by NeHuang on Jun 21 2019, 2:57 PM.

Details

Summary

In current Hoist() function of machine licm pass, it will not check the source and destination basic block frequencies that a instruction is hoisted from/to. There is a chance that instruction is hoisted from a cold to a hot basic block.

In this patch, we add options to disable machine instruction hoisting if destination block is hotter.

Diff Detail

Event Timeline

NeHuang created this revision.Jun 21 2019, 2:57 PM

Added a couple more reviewers that either recently modified this file or I think may be interested in the proposed change.

In the future, please try to ping more frequently than once a month; we don't want to lose patches, even if there are some areas where it takes a while to get review.

The general idea of avoiding hoisting that would increase the frequency makes sense.

Could you give some idea what effect this has in practice? How often does this trigger on SPEC? Does it affect SPEC performance or size numbers? (Replace SPEC with some similar benchmark if you prefer.)

Is there anything special about the number "100"?

NeHuang added a comment.EditedAug 15 2019, 3:42 PM

Thanks @efriedma for the review and remind.

This patch adds an option to disable non-profitable hoisting from a cold to a hot basic block.

The stats below is collected based on Google benchmark. (The 3rd column indicates number of hoisting is disabled due to block frequency hotness, results varies from the loads of different benchmarks)

BenchmarkNumber of machine instructions hoisted out of loopsNumber of instructions not hoisted due to block frequency hotness
Protobuf209311706
Brotli993517616
Gipfeli471879
snappy8861154
TCMalloc6435891
BoringSSL6115501
FFMPEG832417

With the patch enabled, we found SPEC2017 performance gain for PPC.

'100' is the block-freq-ratio-threshold used in our experiment to determine when to disable the hoisting, which is a configurable option.

Whitney added inline comments.Aug 21 2019, 6:27 AM
llvm/lib/CodeGen/MachineLICM.cpp
53 ↗(On Diff #206082)

Please run clang-format on your change. The include is not added at the right place.

1585 ↗(On Diff #206082)

Change to ++ NumNotHoistedDueToHotness

NeHuang updated this revision to Diff 216684.Aug 22 2019, 12:06 PM
NeHuang marked 4 inline comments as done.Aug 22 2019, 12:09 PM
This comment was removed by NeHuang.
llvm/lib/CodeGen/MachineLICM.cpp
53 ↗(On Diff #206082)

Done.

1585 ↗(On Diff #206082)

Done

jsji added a comment.Aug 22 2019, 12:47 PM

machinelicm_clangformat_modv1.cpp ? I think you uploaded wrong patch.

NeHuang updated this revision to Diff 216693.Aug 22 2019, 12:55 PM
NeHuang marked 2 inline comments as done.

@jsji Yes, please ignore diff 216684. Thanks you for the remind.

Thanks @Whitney for the review. Addressed the comment and uploaded diff 216693.

lei added inline comments.Sep 11 2019, 7:28 AM
llvm/lib/CodeGen/MachineLICM.cpp
92 ↗(On Diff #216693)

indentation

111 ↗(On Diff #216693)

nit: please follow the same indentation as the lines above.

1572 ↗(On Diff #216693)

nit: Most of the functions here use either BB or MBB for MachineBasicBlock variables. IMHO unless there are diff BBs that are being tracked, I don't see why we need to specify this as "current"BB. Just makes it easier to read when similar variables are being used consistently.

Just passing-by remark: i find naming schemes in this patch very confusing:

  • The patch subject should talk about not hoisting into hotter blocks, not about early out
  • due to block hotness is very meaningless to me. I suspect most of the occurrences of that phrase should actually be closer to because destination block is hotter
llvm/lib/CodeGen/MachineLICM.cpp
81–82 ↗(On Diff #216693)

It isn't really obvious whether larger value relaxes this check or the other way around

NeHuang retitled this revision from Early exit from Hoist() in machine licm pass based on block hotness to Disable hosting MI to hotter basic blocks.Sep 12 2019, 6:44 PM
NeHuang edited the summary of this revision. (Show Details)Sep 12 2019, 6:46 PM
NeHuang updated this revision to Diff 220032.EditedSep 12 2019, 7:12 PM

Thanks @lei for the review. Updated the patch to r

  1. Resolve the indentation issues
  2. Modify the variable name to be consistent with current naming convention.

Thanks @lebedev.ri for your review. Subject and summary have been updated accordingly.
For the BlockFrequencyRatioThreshold, larger value will relax the check. For example, by default, it will disable MI hoisting if destination block is 100 times hotter than source block.

NeHuang marked 5 inline comments as done.Sep 12 2019, 7:15 PM
NeHuang added inline comments.
llvm/lib/CodeGen/MachineLICM.cpp
92 ↗(On Diff #216693)

Done

111 ↗(On Diff #216693)

Done

NeHuang marked 5 inline comments as done.Sep 12 2019, 7:25 PM
NeHuang added inline comments.
llvm/lib/CodeGen/MachineLICM.cpp
1572 ↗(On Diff #216693)

Done

asbirlea removed a subscriber: asbirlea.Sep 12 2019, 9:26 PM
NeHuang marked 2 inline comments as done.Sep 13 2019, 5:03 AM
NeHuang added inline comments.
llvm/lib/CodeGen/MachineLICM.cpp
1572 ↗(On Diff #216693)

Done

nemanjai requested changes to this revision.Sep 13 2019, 7:41 AM

Some minor changes are required to make it more clear what this does.

llvm/lib/CodeGen/MachineLICM.cpp
85 ↗(On Diff #220032)

Similarly to below, the naming is a bit strange. Perhaps:

enum DisableHoistingToHotterBlocks { None, PGO, All };

or

enum DisableHoistingToHotterBlocks { None, PGO_Only, Static };

(although I prefer the former).

88 ↗(On Diff #220032)

The name sounds very strange. Perhaps:

DisableHoistingToHotterBlocks("disable-hoisting-to-hotter-blocks",
112 ↗(On Diff #220032)

"block frequency hotness" sounds strange. Please use either "hotness" or "block frequency".

275 ↗(On Diff #220032)

I think it is clearer to just use SrcBlock and TgtBlock for the names of the blocks.

1471 ↗(On Diff #220032)

Presumably this function is called many times for the same function and we are querying the function for profile data every time. Would it be possible to just have a class member flag for whether the function has profile data or not?

1581 ↗(On Diff #220032)

We don't want to increment the statistic here?

Honestly, I don't think we should be incrementing the statistic in this function at all - this is just a query for whether or not this is profitable. Strictly speaking, the increment to the statistic really belongs in the function that would actually do the hoisting (i.e. Hoist()).

1585 ↗(On Diff #220032)

If we move the update to the statistic out of this function, this can probably just be changed to:
return Ratio > BlockFrequencyRatioThreshold;

81–82 ↗(On Diff #216693)

We should probably be more forthcoming with this description so that it is clear to the user.
Perhaps:

Do not hoist instructions if the target block is N times hotter than the source.

Also, we should add a comment here along the lines of:

// The default threshold of 100 (i.e. if target block is 100 times hotter)
// is based on empirical data on a single target and is subject to tuning.
This revision now requires changes to proceed.Sep 13 2019, 7:41 AM
NeHuang updated this revision to Diff 221407.Sep 23 2019, 2:04 PM

Addressed @nemanjai 's review comment.

NeHuang marked 9 inline comments as done.Sep 23 2019, 2:14 PM

Thanks @nemanjai 's review. Addressed all the comments and updated the test cases accordingly.

llvm/lib/CodeGen/MachineLICM.cpp
85 ↗(On Diff #220032)
  • Using "DisableHositingToHotterBlocksType" for enum
  • Using"NONE" instead of "None" to avoid ambiguity.
NeHuang marked an inline comment as done.Sep 23 2019, 2:14 PM
NeHuang added a comment.EditedSep 23 2019, 2:27 PM

Collected stats below for SPECInt and SPECFP benchmarks (SPEC2017) with this feature enabled, compiled with -O3 and PGO.

Benchmark Number of machine instructions hoisted out of loops Number of instructions not hoisted due to block frequency
SPECInt129269459891
SPECFP3604770075

Collected stats below for SPECInt and SPECFP benchmarks (SPEC2017) with this feature enabled, compiled with -O3 and PGO.

Benchmark Number of machine instructions hoisted out of loops Number of instructions not hoisted due to block frequency
SPECInt129269459891
SPECFP3604770075

I think this might be a bit context-less, as in what are the numbers without the patch?

  • Merged to latest code base and compiled with -O3 and PGO
  • Collected stats for SPECInt and SPECFP benchmarks (SPEC2017) with baseline and patch.
Benchmark Number of machine instructions hoisted out of loops (Baseline) Number of machine instructions hoisted out of loops (Patch) Number of instructions not hoisted to hotter destination (Patch)
SPECInt206914132007476157
SPECFP425913638368215
  • With the feature enabled, found performance gain for SPEC benchmarks, e.g. 2.1% for perlbench_r and 1.6% for povray_r.
nemanjai accepted this revision.Oct 21 2019, 5:57 PM

With the minor comments addressed, this LGTM.
However before you update and commit, it would be good to hear from at least @lebedev.ri about whether his requests were satisfactorily met.

llvm/lib/CodeGen/MachineLICM.cpp
87 ↗(On Diff #221407)

It is common practice to differentiate enumerators in one of two ways:

  1. Prefix each enumerator with an acronym of the enumeration
  2. Use scoped enumerations

While we're at it, you might as well simplify the naming to just UseBFI
Examples (with simplified name):

  1. enum UseBFI { UseBFI_None, UseBFI_PGO, UseBFI_All }
  2. enum class UseBFI { None, PGO, All }

My preference is 2.

91 ↗(On Diff #221407)

Please replace all uses of the word hosting with the word hoisting since we are changing the decisions about hoisting instructions out of loops. In addition to correcting it in code, please correct it in the description, title, commit message, etc.

126 ↗(On Diff #221407)

Nit: naming convention (variables still start with uppercase letters).

278 ↗(On Diff #221407)

Nit: naming convention - functions start with a lowercase letter. And since this doesn't do any hoisting, I think a simpler name would be more appropriate. Perhaps

// Is the target basic block at least "BlockFrequencyRatioThreshold" times
// hotter than the source basic block.
bool isTgtHotterThanSrc(MachineBasicBlock *SrcBlock,
                        MachineBasicBlock *TgtBlock);
1578 ↗(On Diff #221407)

This indentation is suspicious. If that's where clang-format put this line, so be it, but it just seems odd.

This revision is now accepted and ready to land.Oct 21 2019, 5:57 PM
NeHuang retitled this revision from Disable hosting MI to hotter basic blocks to Disable hoisting MI to hotter basic blocks.Oct 22 2019, 10:06 AM
NeHuang updated this revision to Diff 226063.EditedOct 22 2019, 11:27 AM
NeHuang marked 5 inline comments as done.

Addressed review comments from @nemanjai

@lebedev.ri I have re-collected stats for SPECInt and SPECFP benchmarks with baseline and patch. Table summarized above.

Since the patch is approved, I will hold it for another 3 days if any comment and commit it on Monday.

This is disabled by default for non-PGO builds, right? Did you try to benchmark the patch for non PGO mode? There are various heuristic already in LLVM which are enabled for non-PGO mode too if they generally improve the perf.

Thanks @xbolva00 for your comment.

"This is disabled by default for non-PGO builds, right?"
Yes, it is disabled for non-pgo by default.

Did you try to benchmark the patch for non PGO mode? There are various heuristic already in LLVM which are enabled for non-PGO mode too if they generally improve the perf.
Yes, I did. For non-pgo mode, there are no significant improvements and one noticeable degradation for benchmark povray (4.7% degradation)

Ok, thanks for info!

This revision was automatically updated to reflect the committed changes.