Add BFI in constanthoisting pass and do the hoisting selectively
ClosedPublic

Authored by wmi on Jan 20 2017, 12:02 PM.

Details

Summary

Existing constant hoisting pass will merge a group of contants in a small range and hoist the const materialization code to the common dominator of their uses.

However, if the uses are all in cold pathes, we may hoist the materialization code from cold pathes to a hot place. This may hurt performance. The patch simply introduces BFI to the pass and does the hoisting only when the common dominator is colder than the sum of execution freq of all the uses to be merged.

Now usage of BFI is guarded by an option consthoist-with-block-frequency which is off by default. I did the compile time evaluation using spec2006 when the option was on and the compile time increase was in noise range: about 0.5% on averange (ranged from -2% ~ 2% for the 21 C/C++ benchmarks).

Diff Detail

Repository
rL LLVM
wmi created this revision.Jan 20 2017, 12:02 PM

Using profile data to throttle transformations in canonicalization passs is generally frowned upon. This pass does not look like a canonicalization pass (other reviewers may have different opinions). Having said that, I also wonder if a more general profile driven code sinking pass (before CGP), similar to look sink is worth the effort.

lib/Transforms/Scalar/ConstantHoisting.cpp
182 ↗(On Diff #85169)

This is a redundant check.

184 ↗(On Diff #85169)

Since we have profile information, it might be worth to find the optimal solution to the insertion problem -- for a given set of use BBs, find a second set of BBs such that

  1. a BB in the original set is either included in the second set or dominated by a BB
  2. the total frequency of second set is minimized.

The constant materialization instructions are inserted in the BBs in the solution set.

wmi added inline comments.Jan 22 2017, 11:22 AM
lib/Transforms/Scalar/ConstantHoisting.cpp
182 ↗(On Diff #85169)

Indeed.

184 ↗(On Diff #85169)

Sure. I will implement that if it is ok to use BFI in consthoisting.

wmi updated this revision to Diff 86085.Jan 27 2017, 11:33 AM

Update the patch according to suggestion from David: using BFI to find the optimal solution to the insertion problem.

davidxl added inline comments.Jan 27 2017, 1:54 PM
lib/Transforms/Scalar/ConstantHoisting.cpp
164 ↗(On Diff #86085)

assert !BBs.count(Entry) is true

170 ↗(On Diff #86085)

It is simpler to document like this:

WorkSet includes any block 'BB' that is not strictly dominated by any other blocks in set 'BBs', and all nodes in the path in the dominator tree from Entry to 'BB'.

Also name WorkSet as Candidates?

178 ↗(On Diff #86085)

In the first iteration, this is always false, so perhaps change the loop into a do-while:

bool isCandidate = false;
do {

Path.insert(Node);
 if (Node == Entry) {
      isCandidate = true;
      break;
 }
 Node = getIDom(Node);

} while (!BBs.count(Node));

201 ↗(On Diff #86085)

early return when MadeChange == false?

231 ↗(On Diff #86085)

NodesInBBs can not be true for Entry node.

234 ↗(On Diff #86085)

Why return empty BBs?

wmi added inline comments.Jan 27 2017, 4:53 PM
lib/Transforms/Scalar/ConstantHoisting.cpp
164 ↗(On Diff #86085)

Ok.

170 ↗(On Diff #86085)

Ok, will fix it.

178 ↗(On Diff #86085)

That looks simpler. Will fix it.

201 ↗(On Diff #86085)

MadeChange == false here just means we didn't find a BB from BBs dominated by another one in BBs, but we may still do hoisting work in the following, like hoist a const materialization out of loop.

234 ↗(On Diff #86085)

BBs return the locations where to insert hoisted const materialization. If we decide not to hoist anything, i.e., MadeChange is false, we will return empty BBs saying nothing will be inserted.

davidxl added inline comments.Jan 29 2017, 9:32 PM
lib/Transforms/Scalar/ConstantHoisting.cpp
230 ↗(On Diff #86085)

The exit handling code can probably be simplified a little:

if (Node == Entry) {

   BBs.clear();
   if (InsertPtsFreq >= BFI.getBlockFreq(Node)) {
        BBs.insert(Entry);
         MadeChange = true;
    } else if (MadeChange) {
        BBs.insert(InsertPts.begin(), InsertPts.end());
    }
   break;
}
wmi updated this revision to Diff 86306.Jan 30 2017, 9:54 AM

Addressed David's comments.

davidxl added inline comments.Feb 1 2017, 4:52 PM
include/llvm/Transforms/Scalar/ConstantHoisting.h
130 ↗(On Diff #86306)

Add a documentation about what it returns and what it means when empty set is returned.

lib/Transforms/Scalar/ConstantHoisting.cpp
704 ↗(On Diff #86306)

It does not seem correct to skip base const materialization. Also the second condition should always be true?

wmi added inline comments.Feb 1 2017, 5:12 PM
lib/Transforms/Scalar/ConstantHoisting.cpp
704 ↗(On Diff #86306)

You are right. I realize it is not right to skip base const materialization when we need to rebase some use. Will fix it.

The second condition is not necessarily true. We have multiple uses and multiple insertion points chosen. Every use will be dominated by exactly one insertion point but we don't know which one.

wmi updated this revision to Diff 86888.Feb 2 2017, 3:13 PM

Remove MadeChange. We will still need to insert const materialization code even if there is no hoisting or merging for BBs.

davidxl added inline comments.Feb 3 2017, 11:58 AM
lib/Transforms/Scalar/ConstantHoisting.cpp
673 ↗(On Diff #86888)

I am not sure why this is needed. If nothing is changed, it should be fine to skip the transformation for the constant (as well as rebase in uses) completely. Without skipping, it basically force rebasing?

678 ↗(On Diff #86888)

There are three cases:

  1. new insertion point introduced
  2. the BB originally has a base constant use
  3. the BB originally has a constant use but need rebase.

For case 1) and 3), new insertion is needed, but for 2) it can be skipped.

684 ↗(On Diff #86888)

The right way to fix is to move the rebasing code out of the IPSet traversal loop after all new base constant materialization insertion are done. Only those that got dropped in findConstantInsertionPoint need to be handled.

davidxl added inline comments.Feb 3 2017, 12:17 PM
lib/Transforms/Scalar/ConstantHoisting.cpp
678 ↗(On Diff #86888)

Correction: 2) can only be skipped when there are no other uses dominated by it -- so it is probably better to not to skip and rewrite the original use

684 ↗(On Diff #86888)

Discard the second part of the last comment.

wmi added inline comments.Feb 3 2017, 5:16 PM
lib/Transforms/Scalar/ConstantHoisting.cpp
678 ↗(On Diff #86888)

I thought it was not going to change the generated code if we do a rebase for a base constant use. That is why for the code simplicity, I choose to insert materialization and rebase code anyway.

It will change

BB:
%inc = add i64 %5, 214748364701

to

BB:
%const = bitcase i64 214748364701 to i64
...
%inc = add i64 %5, %const

Because 214748364701 is a very large number (The precondition it is treated as constant hoisting candidate) and it cannot be encoded in add instruction, we need the materialization code %const = ... anyway in the end.

davidxl added inline comments.Feb 5 2017, 2:48 PM
lib/Transforms/Scalar/ConstantHoisting.cpp
678 ↗(On Diff #86888)

Ok makes sense.

694 ↗(On Diff #86888)

Should !BFI check removed? It just improves compile time a little for the non-default path. To save compile time for more generally, do 'if (IPSet->size() == 1 || DT->dominates(...)) instead?

Also is it better to keep track of the base constants rewritten and assert at the end that all uses are rewritten.

wmi added inline comments.Feb 5 2017, 11:01 PM
lib/Transforms/Scalar/ConstantHoisting.cpp
694 ↗(On Diff #86888)

Good suggestions. Fixed as suggested.

wmi updated this revision to Diff 87189.Feb 5 2017, 11:16 PM

Make change to save compile time and add assertion per David's suggestions.

davidxl accepted this revision.Apr 19 2017, 2:49 PM

lgtm

This revision is now accepted and ready to land.Apr 19 2017, 2:49 PM
This revision was automatically updated to reflect the committed changes.