[CloneFunction] Simplify previously unsimplifiable instructions
Needs ReviewPublic

Authored by anna on Sep 21 2018, 8:24 AM.

Details

Summary

Sometimes during clone function, we may not be able to simplify during
the first phase of cloning: CloneBlock. This is because the instruction
may not be tied to any basic block yet (as the case when cloning is
called during inlining).

This patch adds all instructions from the cloned function into the
second pass of simplification, which is done after the instructions are
tied to basic block. We reuse the second pass which was previously handling phi
nodes and their uses.

The benefit of this patch is to simplify the instructions that are
inlined from the callee to the caller, so that later passes after
inlining have fewer instructions to churn on (see test case for example).

Diff Detail

anna created this revision.Sep 21 2018, 8:24 AM

Is there any compile-time impact? Should we just simplify all instructions at this later point instead of having two passes?

anna added a comment.Sep 21 2018, 1:54 PM

Is there any compile-time impact?

I haven't checked the compile time impact. Will do. However, this is just a linear addition of instructions and it's uses - we were already looking at phis and their uses. Also, later passes (that iterate over the inlined instructions) will anyway iterate over all instructions. So, I think it should be either neutral or a win. Win when we were wastefully iterating over instructions through multiple passes until we reached the pass which could simplify these instructions.

Should we just simplify all instructions at this later point instead of having two passes?

I'll take a look, but it seems a more invasive change. We first cloneBlocks, then iterate over all instructions in the callee and insert it in the caller in right order (now the instructions have a parent block), and then have two passes over the PHINodes. The second pass over the PHINode already looks over the uses and does recursive simplification. To this list, now that all instructions have basic block parents, we add them for recursive simplification.

Is there any compile-time impact?

I haven't checked the compile time impact. Will do. However, this is just a linear addition of instructions and it's uses - we were already looking at phis and their uses. Also, later passes (that iterate over the inlined instructions) will anyway iterate over all instructions. So, I think it should be either neutral or a win. Win when we were wastefully iterating over instructions through multiple passes until we reached the pass which could simplify these instructions.

Should we just simplify all instructions at this later point instead of having two passes?

I'll take a look, but it seems a more invasive change. We first cloneBlocks, then iterate over all instructions in the callee and insert it in the caller in right order (now the instructions have a parent block), and then have two passes over the PHINodes. The second pass over the PHINode already looks over the uses and does recursive simplification. To this list, now that all instructions have basic block parents, we add them for recursive simplification.

That makes sense. SimplyInstruction can, potentially, get expensive. If there's no compile-time impact of the change, then this probably doesn't matter. If there does seem to be a compile-time impact, then you might want to look into making it, overall, use only one simplification phase.

anna added a comment.Thu, Sep 27, 9:10 AM

Results on CTMark does show negative compile time impact on couple of benchmarks - run on a haswell machine over 4 runs with and without patch:

Tests: 10
Metric: compile_time

Program                                         Base    Patch   diff 
                                                                 
Bullet/bullet                                  59.27  67.88 14.5%
tramp3d-v4/tramp3d-v4                          56.98  64.87 13.8%
7zip/7zip-benchmark                            91.63 103.92 13.4%
kimwitu++/kc                                   31.60  35.39 12.0%
lencod/lencod                                  41.75  43.57  4.4%
sqlite3/sqlite3                                40.47  40.02 -1.1%
SPASS/SPASS                                    34.11  34.33  0.6%
mafft/pairlocalalign                           23.81  23.71 -0.4%
consumer-typeset/consumer-typeset              26.28  26.22 -0.2%
ClamAV/clamscan                                37.51  37.54  0.1%

Maybe we're hitting on the degenerate case - Ran SimplifyInstruction, but no change in simplification. So future passes still had to iterate on same number of instructions...