This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining. This implementation is the optimistic approach
that Danny asked us to implement: we start from the set of all known equivalent
computations, and we discard unsafe hoisting situations.
Here is the algorithm Danny asked us to work on last time:
Make a multimap from VN to each expression with that VN (VNTable is not currently this) over the entire program. For each VN in the table: if (size (expressions with a given VN) > 1): For each block in domtree in DFS order: For each expression, if DFSin/DFSOut(expression->parent) within range of DFSin/DFSOut(block), push(possiblehoistlist (a list), expression (current expression), block (insertion point)) If you have 2 or more things on possiblehoistlist, calculate availability of operands for each expression in block (you can cache the highest point you can move them to for later to checking again and again. Since you are using dominance, you know they can only be hoisted to blocks dominated by the highest point you can hoist to). If 2 or more things are still available, hoist Note you can also likely skip any domtree block that does not have two or more children, i just haven't proven it to myself yet.
Because we "value number everything up front":
- we VN things that do not matter for hoisting,
- we VN all computations dependent on loads and that would forbid hoisting the whole dependence chain as each load gets a different VN (in the current GVN.cpp implementation.) The temporary solution is to hoist twice and invalidate the VN table in between.
The harder parts compared to the previous patch in http://reviews.llvm.org/D18798
are the detection of safety properties of hoisting:
- no side effects on all paths between insert point and the original location of all hoistable expressions: we use a traversal of the inverse CFG from the expression to be hoisted to the insertion point to gather all the BBs on the execution paths that we have to check for side-effects.
- to make hoisting efficient for scalars (and safe for hoisting load expressions) we need to prove that from the insertion point to the end of the function the expressions are needed on all paths.
There are still a few improvements to be implemented:
- using memory SSA as a minimal data dependence analysis to improve the accuracy of the side effects analysis in loops.
- computing hoisting subsets: for the moment we try to hoist all expressions with the same VN without checking whether a subset would be safe to hoist.
Passes llvm regression test, test-suite, and SPEC Cpu2006.
Over the c/c++ SPEC 2006 benchmarks the GVN-hoisting pass removes 55012 instructions.
- Without the patch: Number of call sites deleted, not inlined: 20394 Number of functions deleted because all callers found: 70497 Number of functions inlined: 182119 Number of allocas merged together: 225 Number of caller-callers analyzed: 200361 Number of call sites analyzed: 445806
- With the patch: Number of call sites deleted, not inlined: 20412 Number of functions deleted because all callers found: 70495 Number of functions inlined: 182122 Number of allocas merged together: 225 Number of caller-callers analyzed: 200360 Number of call sites analyzed: 445624 Number of hoisted instructions: 5435 Number of instructions removed: 55012
Spec2006 results are quite noisy:
(positive is an increase of the spec score in percent)
400.perlbench: 0 401.bzip2: 0 403.gcc: -2.00 429.mcf: -4.00 445.gobmk: 0 456.hmmer: 0 458.sjeng: 0 462.libquantum: -2.00 464.h264ref: 4.00 471.omnetpp: -1.00 473.astar: -1.00 433.milc: 15.00 444.namd: 0 447.dealII: 0 450.soplex: 0 453.povray: 0 470.lbm: 0 482.sphinx3: 1.00
Pass written by:
Sebastian Pop Aditya Kumar Xiaoyu Hu Brian Rzycki Daniel Berlin