Page MenuHomePhabricator

Disable hosting MI to hotter basic blocks
Needs ReviewPublic

Authored by NeHuang on Jun 21 2019, 2:57 PM.

Details

Summary

In current Hoist() function of machine licm pass, it will not check the source and destination basic block frequencies that a instruction is hoisted from/to. There is a chance that instruction is hoisted from a cold to a hot basic block.

In this patch, we add options to disable machine instruction hoisting if destination block is hotter.

Diff Detail

Event Timeline

NeHuang created this revision.Jun 21 2019, 2:57 PM

Added a couple more reviewers that either recently modified this file or I think may be interested in the proposed change.

In the future, please try to ping more frequently than once a month; we don't want to lose patches, even if there are some areas where it takes a while to get review.

The general idea of avoiding hoisting that would increase the frequency makes sense.

Could you give some idea what effect this has in practice? How often does this trigger on SPEC? Does it affect SPEC performance or size numbers? (Replace SPEC with some similar benchmark if you prefer.)

Is there anything special about the number "100"?

Thanks @efriedma for the review and remind.

This patch adds an option to disable non-profitable hoisting from a cold to a hot basic block.

The stats below is collected based on Google benchmark. (The 3rd column indicates number of hoisting is disabled due to block frequency hotness, results varies from the loads of different benchmarks)

BenchmarkNumber of machine instructions hoisted out of loopsNumber of instructions not hoisted due to block frequency hotness
Protobuf209311706
Brotli993517616
Gipfeli471879
snappy8861154
TCMalloc6435891
BoringSSL6115501
FFMPEG832417

With the patch enabled, we found SPEC performance improvement, e.g. perlbench speeding up by 2.5%.

'100' is the block-freq-ratio-threshold used in our experiment to determine when to disable the hoisting, which is a configurable option.

Whitney added inline comments.Aug 21 2019, 6:27 AM
llvm/lib/CodeGen/MachineLICM.cpp
53 ↗(On Diff #206082)

Please run clang-format on your change. The include is not added at the right place.

1585 ↗(On Diff #206082)

Change to ++ NumNotHoistedDueToHotness

NeHuang updated this revision to Diff 216684.Aug 22 2019, 12:06 PM
NeHuang marked 4 inline comments as done.Aug 22 2019, 12:09 PM
This comment was removed by NeHuang.
llvm/lib/CodeGen/MachineLICM.cpp
53 ↗(On Diff #206082)

Done.

1585 ↗(On Diff #206082)

Done

jsji added a comment.Aug 22 2019, 12:47 PM

machinelicm_clangformat_modv1.cpp ? I think you uploaded wrong patch.

NeHuang updated this revision to Diff 216693.Aug 22 2019, 12:55 PM
NeHuang marked 2 inline comments as done.

@jsji Yes, please ignore diff 216684. Thanks you for the remind.

Thanks @Whitney for the review. Addressed the comment and uploaded diff 216693.

lei added inline comments.Sep 11 2019, 7:28 AM
llvm/lib/CodeGen/MachineLICM.cpp
92 ↗(On Diff #216693)

indentation

111 ↗(On Diff #216693)

nit: please follow the same indentation as the lines above.

1572 ↗(On Diff #216693)

nit: Most of the functions here use either BB or MBB for MachineBasicBlock variables. IMHO unless there are diff BBs that are being tracked, I don't see why we need to specify this as "current"BB. Just makes it easier to read when similar variables are being used consistently.

Just passing-by remark: i find naming schemes in this patch very confusing:

  • The patch subject should talk about not hoisting into hotter blocks, not about early out
  • due to block hotness is very meaningless to me. I suspect most of the occurrences of that phrase should actually be closer to because destination block is hotter
llvm/lib/CodeGen/MachineLICM.cpp
81–82 ↗(On Diff #216693)

It isn't really obvious whether larger value relaxes this check or the other way around

NeHuang retitled this revision from Early exit from Hoist() in machine licm pass based on block hotness to Disable hosting MI to hotter basic blocks.Sep 12 2019, 6:44 PM
NeHuang edited the summary of this revision. (Show Details)Sep 12 2019, 6:46 PM
NeHuang updated this revision to Diff 220032.EditedSep 12 2019, 7:12 PM

Thanks @lei for the review. Updated the patch to r

  1. Resolve the indentation issues
  2. Modify the variable name to be consistent with current naming convention.

Thanks @lebedev.ri for your review. Subject and summary have been updated accordingly.
For the BlockFrequencyRatioThreshold, larger value will relax the check. For example, by default, it will disable MI hoisting if destination block is 100 times hotter than source block.

NeHuang marked 5 inline comments as done.Sep 12 2019, 7:15 PM
NeHuang added inline comments.
llvm/lib/CodeGen/MachineLICM.cpp
92 ↗(On Diff #216693)

Done

111 ↗(On Diff #216693)

Done

NeHuang marked 5 inline comments as done.Sep 12 2019, 7:25 PM
NeHuang added inline comments.
llvm/lib/CodeGen/MachineLICM.cpp
1572 ↗(On Diff #216693)

Done

asbirlea removed a subscriber: asbirlea.Sep 12 2019, 9:26 PM
NeHuang marked 2 inline comments as done.Sep 13 2019, 5:03 AM
NeHuang added inline comments.
llvm/lib/CodeGen/MachineLICM.cpp
1572 ↗(On Diff #216693)

Done

nemanjai requested changes to this revision.Sep 13 2019, 7:41 AM

Some minor changes are required to make it more clear what this does.

llvm/lib/CodeGen/MachineLICM.cpp
81–82 ↗(On Diff #216693)

We should probably be more forthcoming with this description so that it is clear to the user.
Perhaps:

Do not hoist instructions if the target block is N times hotter than the source.

Also, we should add a comment here along the lines of:

// The default threshold of 100 (i.e. if target block is 100 times hotter)
// is based on empirical data on a single target and is subject to tuning.
85 ↗(On Diff #220032)

Similarly to below, the naming is a bit strange. Perhaps:

enum DisableHoistingToHotterBlocks { None, PGO, All };

or

enum DisableHoistingToHotterBlocks { None, PGO_Only, Static };

(although I prefer the former).

88 ↗(On Diff #220032)

The name sounds very strange. Perhaps:

DisableHoistingToHotterBlocks("disable-hoisting-to-hotter-blocks",
112 ↗(On Diff #220032)

"block frequency hotness" sounds strange. Please use either "hotness" or "block frequency".

275 ↗(On Diff #220032)

I think it is clearer to just use SrcBlock and TgtBlock for the names of the blocks.

1471 ↗(On Diff #220032)

Presumably this function is called many times for the same function and we are querying the function for profile data every time. Would it be possible to just have a class member flag for whether the function has profile data or not?

1581 ↗(On Diff #220032)

We don't want to increment the statistic here?

Honestly, I don't think we should be incrementing the statistic in this function at all - this is just a query for whether or not this is profitable. Strictly speaking, the increment to the statistic really belongs in the function that would actually do the hoisting (i.e. Hoist()).

1585 ↗(On Diff #220032)

If we move the update to the statistic out of this function, this can probably just be changed to:
return Ratio > BlockFrequencyRatioThreshold;

This revision now requires changes to proceed.Sep 13 2019, 7:41 AM
NeHuang updated this revision to Diff 221407.Mon, Sep 23, 2:04 PM

Addressed @nemanjai 's review comment.

NeHuang marked 9 inline comments as done.Mon, Sep 23, 2:14 PM

Thanks @nemanjai 's review. Addressed all the comments and updated the test cases accordingly.

llvm/lib/CodeGen/MachineLICM.cpp
85 ↗(On Diff #220032)
  • Using "DisableHositingToHotterBlocksType" for enum
  • Using"NONE" instead of "None" to avoid ambiguity.
NeHuang marked an inline comment as done.Mon, Sep 23, 2:14 PM
NeHuang added a comment.EditedMon, Sep 23, 2:27 PM

Collected stats below for SPECInt and SPECFP benchmarks (SPEC2017) with this feature enabled, compiled with -O3 and PGO.

Benchmark Number of machine instructions hoisted out of loops Number of instructions not hoisted due to block frequency
SPECInt129269459891
SPECFP3604770075

Collected stats below for SPECInt and SPECFP benchmarks (SPEC2017) with this feature enabled, compiled with -O3 and PGO.

Benchmark Number of machine instructions hoisted out of loops Number of instructions not hoisted due to block frequency
SPECInt129269459891
SPECFP3604770075

I think this might be a bit context-less, as in what are the numbers without the patch?

  • Merged to latest code base and compiled with -O3 and PGO
  • Collected stats for SPECInt and SPECFP benchmarks (SPEC2017) with baseline and patch.
Benchmark Number of machine instructions hoisted out of loops (Baseline) Number of machine instructions hoisted out of loops (Patch) Number of instructions not hoisted to hotter destination (Patch)
SPECInt206914132007476157
SPECFP425913638368215
  • With the feature enabled, found performance gain for SPEC benchmarks, e.g. 2.1% for perlbench_r and 1.6% for povray_r.