
[NVPTX] run LSR before straight-line optimizations
Closed, Public

Authored by jingyue on Jul 17 2015, 11:03 AM.

Details

Summary

Straight-line optimizations can simplify the loop body and make LSR's
cost analysis more precise. This significantly improves several Eigen3
CUDA benchmarks.

With this change, EigenContractionKernel runs up to 40% faster
(https://bitbucket.org/eigen/eigen/src/753ceee5f206ff7dde9f6a41a5a420749fc9406f/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h?at=default#cl-502).
EigenConvolutionKernel2D runs up to 10% faster
(https://bitbucket.org/eigen/eigen/src/753ceee5f206ff7dde9f6a41a5a420749fc9406f/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h?at=default#cl-605).

I have had some difficulty writing small tests that benefit from this
reordering because of what seems to be an issue with LSR (being discussed at
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088244.html).

See the review thread for the compile-time impact of the extra GVN pass.

Diff Detail

Repository
rL LLVM

Event Timeline

jingyue updated this revision to Diff 30015. Jul 17 2015, 11:03 AM
jingyue retitled this revision to [NVPTX] run LSR before straight-line optimizations.
jingyue updated this object.
jingyue added reviewers: jholewinski, eliben.
jingyue added a subscriber: llvm-commits.
jholewinski edited edge metadata. Jul 17 2015, 12:22 PM

Looks reasonable to me.

What is the impact on compile time of adding this extra GVN pass?

Below is the compilation-time breakdown of running "opt -O3" followed by "llc" on one of our GPU programs, which compiles to more than 100k lines of PTX.

The extra GVN takes 2.6% of the total time. There are three Global Value Numbering entries in the report: the first (4.8%) runs in the target-independent pipeline, and the other two run in NVPTX's private pipeline.

I'll add a check to enable it for -O3 only.

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 8.3537 seconds (8.3467 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.9868 ( 11.9%)   0.0161 ( 32.1%)   1.0029 ( 12.0%)   1.0048 ( 12.0%)  NVPTX DAG->DAG Pattern Instruction Selection
   0.9987 ( 12.0%)   0.0000 (  0.0%)   0.9987 ( 12.0%)   0.9995 ( 12.0%)  Straight line strength reduction
   0.4514 (  5.4%)   0.0000 (  0.0%)   0.4514 (  5.4%)   0.4490 (  5.4%)  Function Integration/Inlining
   0.4348 (  5.2%)   0.0000 (  0.0%)   0.4348 (  5.2%)   0.4354 (  5.2%)  Nary reassociation
   0.4033 (  4.9%)   0.0000 (  0.0%)   0.4033 (  4.8%)   0.4002 (  4.8%)  Global Value Numbering
   0.2823 (  3.4%)   0.0001 (  0.1%)   0.2824 (  3.4%)   0.2780 (  3.3%)  Combine redundant instructions
   0.2696 (  3.2%)   0.0002 (  0.3%)   0.2697 (  3.2%)   0.2647 (  3.2%)  Combine redundant instructions
   0.2423 (  2.9%)   0.0000 (  0.0%)   0.2423 (  2.9%)   0.2387 (  2.9%)  Combine redundant instructions
   0.2328 (  2.8%)   0.0000 (  0.1%)   0.2328 (  2.8%)   0.2291 (  2.7%)  Combine redundant instructions
   0.2232 (  2.7%)   0.0000 (  0.0%)   0.2232 (  2.7%)   0.2223 (  2.7%)  Global Value Numbering
   0.2123 (  2.6%)   0.0001 (  0.1%)   0.2124 (  2.5%)   0.2161 (  2.6%)  Global Value Numbering
   0.2000 (  2.4%)   0.0002 (  0.4%)   0.2001 (  2.4%)   0.1944 (  2.3%)  Loop Invariant Code Motion
   0.1929 (  2.3%)   0.0001 (  0.3%)   0.1931 (  2.3%)   0.1927 (  2.3%)  Combine redundant instructions
   0.1928 (  2.3%)   0.0000 (  0.0%)   0.1928 (  2.3%)   0.1924 (  2.3%)  Combine redundant instructions
   0.1907 (  2.3%)   0.0000 (  0.0%)   0.1907 (  2.3%)   0.1919 (  2.3%)  Value Propagation
   0.1759 (  2.1%)   0.0006 (  1.1%)   0.1764 (  2.1%)   0.1715 (  2.1%)  Induction Variable Simplification
   0.1735 (  2.1%)   0.0001 (  0.1%)   0.1736 (  2.1%)   0.1714 (  2.1%)  Loop Invariant Code Motion
   0.1671 (  2.0%)   0.0010 (  2.0%)   0.1682 (  2.0%)   0.1714 (  2.1%)  Combine redundant instructions
   0.1426 (  1.7%)   0.0000 (  0.0%)   0.1426 (  1.7%)   0.1415 (  1.7%)  Loop Invariant Code Motion
   0.1302 (  1.6%)   0.0001 (  0.2%)   0.1304 (  1.6%)   0.1302 (  1.6%)  Loop Strength Reduction
   0.1226 (  1.5%)   0.0000 (  0.0%)   0.1226 (  1.5%)   0.1287 (  1.5%)  Unroll loops
   0.1248 (  1.5%)   0.0002 (  0.4%)   0.1251 (  1.5%)   0.1269 (  1.5%)  SROA
   0.0994 (  1.2%)   0.0000 (  0.0%)   0.0994 (  1.2%)   0.0988 (  1.2%)  Value Propagation
   0.0991 (  1.2%)   0.0000 (  0.0%)   0.0991 (  1.2%)   0.0979 (  1.2%)  Combine redundant instructions
   0.0748 (  0.9%)   0.0039 (  7.8%)   0.0787 (  0.9%)   0.0752 (  0.9%)  Simple Register Coalescing
   0.0659 (  0.8%)   0.0000 (  0.0%)   0.0659 (  0.8%)   0.0664 (  0.8%)  Induction Variable Users
   0.0659 (  0.8%)   0.0038 (  7.6%)   0.0697 (  0.8%)   0.0633 (  0.8%)  Early CSE
   0.0639 (  0.8%)   0.0000 (  0.0%)   0.0639 (  0.8%)   0.0631 (  0.8%)  Sparse Conditional Constant Propagation
   0.0632 (  0.8%)   0.0000 (  0.1%)   0.0632 (  0.8%)   0.0625 (  0.7%)  Early CSE
   0.0579 (  0.7%)   0.0000 (  0.0%)   0.0579 (  0.7%)   0.0580 (  0.7%)  NVPTX Assembly Printer
   0.0565 (  0.7%)   0.0000 (  0.0%)   0.0565 (  0.7%)   0.0566 (  0.7%)  CodeGen Prepare
   0.0564 (  0.7%)   0.0000 (  0.0%)   0.0564 (  0.7%)   0.0553 (  0.7%)  Live Interval Analysis
   0.0574 (  0.7%)   0.0001 (  0.2%)   0.0575 (  0.7%)   0.0530 (  0.6%)  Early CSE
   0.0506 (  0.6%)   0.0000 (  0.1%)   0.0506 (  0.6%)   0.0468 (  0.6%)  Dead Store Elimination
   0.0437 (  0.5%)   0.0000 (  0.0%)   0.0437 (  0.5%)   0.0436 (  0.5%)  Dead Code Elimination
   0.0438 (  0.5%)   0.0002 (  0.3%)   0.0439 (  0.5%)   0.0424 (  0.5%)  Bit-Tracking Dead Code Elimination
   0.0428 (  0.5%)   0.0000 (  0.0%)   0.0428 (  0.5%)   0.0419 (  0.5%)  Machine Loop Invariant Code Motion
   0.0415 (  0.5%)   0.0000 (  0.0%)   0.0415 (  0.5%)   0.0405 (  0.5%)  Module Verifier
   0.0413 (  0.5%)   0.0000 (  0.0%)   0.0413 (  0.5%)   0.0401 (  0.5%)  SROA
   0.0367 (  0.4%)   0.0000 (  0.0%)   0.0367 (  0.4%)   0.0373 (  0.4%)  Machine Common Subexpression Elimination
   0.0352 (  0.4%)   0.0000 (  0.0%)   0.0352 (  0.4%)   0.0351 (  0.4%)  Interprocedural Sparse Conditional Constant Propagation
   0.0339 (  0.4%)   0.0000 (  0.0%)   0.0339 (  0.4%)   0.0342 (  0.4%)  Module Verifier
   0.0349 (  0.4%)   0.0001 (  0.2%)   0.0350 (  0.4%)   0.0318 (  0.4%)  Simplify the CFG
   0.0304 (  0.4%)   0.0000 (  0.0%)   0.0304 (  0.4%)   0.0299 (  0.4%)  Live Variable Analysis
   0.0278 (  0.3%)   0.0000 (  0.0%)   0.0278 (  0.3%)   0.0274 (  0.3%)  Reassociate expressions
   0.0263 (  0.3%)   0.0039 (  7.7%)   0.0301 (  0.4%)   0.0260 (  0.3%)  Aggressive Dead Code Elimination
   0.0215 (  0.3%)   0.0000 (  0.0%)   0.0215 (  0.3%)   0.0228 (  0.3%)  Jump Threading
   0.0211 (  0.3%)   0.0000 (  0.0%)   0.0211 (  0.3%)   0.0205 (  0.2%)  Split GEPs to a variadic base and a constant offset for better CSE
   0.0193 (  0.2%)   0.0000 (  0.0%)   0.0193 (  0.2%)   0.0200 (  0.2%)  Simplify the CFG
   0.0131 (  0.2%)   0.0038 (  7.6%)   0.0169 (  0.2%)   0.0175 (  0.2%)  Unnamed pass: implement Pass::getPassName()
   0.0169 (  0.2%)   0.0000 (  0.0%)   0.0170 (  0.2%)   0.0174 (  0.2%)  Rotate Loops
   0.0166 (  0.2%)   0.0000 (  0.0%)   0.0166 (  0.2%)   0.0170 (  0.2%)  convert address space of alloca'ed memory to local
   0.0148 (  0.2%)   0.0039 (  7.8%)   0.0188 (  0.2%)   0.0168 (  0.2%)  Lower aggregate copies/intrinsics into loops
   0.0168 (  0.2%)   0.0000 (  0.0%)   0.0168 (  0.2%)   0.0164 (  0.2%)  Machine code sinking
   0.0171 (  0.2%)   0.0000 (  0.0%)   0.0171 (  0.2%)   0.0160 (  0.2%)  Unroll loops
   0.0156 (  0.2%)   0.0000 (  0.0%)   0.0156 (  0.2%)   0.0160 (  0.2%)  Simplify the CFG
   0.0107 (  0.1%)   0.0000 (  0.0%)   0.0107 (  0.1%)   0.0154 (  0.2%)  Recognize loop idioms
   0.0141 (  0.2%)   0.0000 (  0.0%)   0.0141 (  0.2%)   0.0141 (  0.2%)  Eliminate PHI nodes for register allocation
   0.0127 (  0.2%)   0.0000 (  0.1%)   0.0128 (  0.2%)   0.0127 (  0.2%)  Unnamed pass: implement Pass::getPassName()
   0.0099 (  0.1%)   0.0000 (  0.0%)   0.0099 (  0.1%)   0.0111 (  0.1%)  Jump Threading
   0.0111 (  0.1%)   0.0000 (  0.0%)   0.0111 (  0.1%)   0.0111 (  0.1%)  Remove unused exception handling info
   0.0105 (  0.1%)   0.0000 (  0.0%)   0.0106 (  0.1%)   0.0107 (  0.1%)  Simplify the CFG
   0.0101 (  0.1%)   0.0000 (  0.0%)   0.0101 (  0.1%)   0.0102 (  0.1%)  Simplify the CFG
   0.0095 (  0.1%)   0.0000 (  0.0%)   0.0095 (  0.1%)   0.0098 (  0.1%)  Float to int
   0.0066 (  0.1%)   0.0000 (  0.0%)   0.0066 (  0.1%)   0.0095 (  0.1%)  Tail Call Elimination
   0.0094 (  0.1%)   0.0000 (  0.0%)   0.0094 (  0.1%)   0.0094 (  0.1%)  Dead Global Elimination
   0.0051 (  0.1%)   0.0001 (  0.2%)   0.0052 (  0.1%)   0.0088 (  0.1%)  Simplify the CFG
   0.0060 (  0.1%)   0.0000 (  0.0%)   0.0060 (  0.1%)   0.0084 (  0.1%)  Loop-Closed SSA Form Pass
   0.0079 (  0.1%)   0.0000 (  0.0%)   0.0079 (  0.1%)   0.0082 (  0.1%)  Promote 'by reference' arguments to scalars
   0.0074 (  0.1%)   0.0000 (  0.0%)   0.0074 (  0.1%)   0.0074 (  0.1%)  Two-Address instruction pass
   0.0068 (  0.1%)   0.0000 (  0.0%)   0.0068 (  0.1%)   0.0073 (  0.1%)  Unswitch loops
   0.0033 (  0.0%)   0.0001 (  0.1%)   0.0034 (  0.0%)   0.0072 (  0.1%)  Dominator Tree Construction
   0.0031 (  0.0%)   0.0000 (  0.0%)   0.0031 (  0.0%)   0.0068 (  0.1%)  Deduce function attributes
   0.0048 (  0.1%)   0.0000 (  0.0%)   0.0048 (  0.1%)   0.0063 (  0.1%)  Lazy Value Information Analysis
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0057 (  0.1%)  MemCpy Optimization
   0.0062 (  0.1%)   0.0000 (  0.0%)   0.0062 (  0.1%)   0.0055 (  0.1%)  Remove unnecessary non-generic-to-generic addrspacecasts
   0.0049 (  0.1%)   0.0000 (  0.0%)   0.0049 (  0.1%)   0.0052 (  0.1%)  SROA
   0.0054 (  0.1%)   0.0000 (  0.0%)   0.0054 (  0.1%)   0.0052 (  0.1%)  Peephole Optimizations
   0.0049 (  0.1%)   0.0038 (  7.6%)   0.0088 (  0.1%)   0.0050 (  0.1%)  Loop-Closed SSA Form Pass
   0.0042 (  0.1%)   0.0000 (  0.0%)   0.0042 (  0.1%)   0.0050 (  0.1%)  Slot index numbering
   0.0046 (  0.1%)   0.0000 (  0.0%)   0.0046 (  0.1%)   0.0048 (  0.1%)  CallGraph Construction
   0.0043 (  0.1%)   0.0000 (  0.0%)   0.0043 (  0.1%)   0.0047 (  0.1%)  Slot index numbering
   0.0025 (  0.0%)   0.0000 (  0.0%)   0.0025 (  0.0%)   0.0046 (  0.1%)  Dominator Tree Construction
   0.0050 (  0.1%)   0.0000 (  0.0%)   0.0050 (  0.1%)   0.0045 (  0.1%)  Dead Argument Elimination
   0.0049 (  0.1%)   0.0000 (  0.0%)   0.0049 (  0.1%)   0.0042 (  0.1%)  Remove dead machine instructions
   0.0020 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.0%)   0.0041 (  0.0%)  Natural Loop Information
   0.0043 (  0.1%)   0.0000 (  0.0%)   0.0043 (  0.1%)   0.0040 (  0.0%)  Dominator Tree Construction
   0.0042 (  0.1%)   0.0000 (  0.0%)   0.0042 (  0.0%)   0.0040 (  0.0%)  Loop-Closed SSA Form Pass
   0.0039 (  0.0%)   0.0000 (  0.0%)   0.0039 (  0.0%)   0.0038 (  0.0%)  Branch Probability Analysis
   0.0030 (  0.0%)   0.0000 (  0.0%)   0.0030 (  0.0%)   0.0038 (  0.0%)  Dominator Tree Construction
   0.0037 (  0.0%)   0.0000 (  0.0%)   0.0037 (  0.0%)   0.0038 (  0.0%)  Dominator Tree Construction
   0.0024 (  0.0%)   0.0000 (  0.0%)   0.0025 (  0.0%)   0.0037 (  0.0%)  Dominator Tree Construction
   0.0029 (  0.0%)   0.0000 (  0.0%)   0.0029 (  0.0%)   0.0037 (  0.0%)  Lazy Value Information Analysis
   0.0037 (  0.0%)   0.0000 (  0.0%)   0.0037 (  0.0%)   0.0036 (  0.0%)  Branch Probability Analysis
   0.0034 (  0.0%)   0.0000 (  0.0%)   0.0034 (  0.0%)   0.0035 (  0.0%)  Branch Probability Basic Block Placement
   0.0016 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.0%)   0.0034 (  0.0%)  Dominator Tree Construction
   0.0030 (  0.0%)   0.0000 (  0.0%)   0.0030 (  0.0%)   0.0033 (  0.0%)  Dominator Tree Construction
   0.0031 (  0.0%)   0.0000 (  0.0%)   0.0031 (  0.0%)   0.0033 (  0.0%)  Constant Hoisting
   0.0034 (  0.0%)   0.0000 (  0.0%)   0.0034 (  0.0%)   0.0032 (  0.0%)  Dominator Tree Construction
   0.0021 (  0.0%)   0.0035 (  6.9%)   0.0055 (  0.1%)   0.0032 (  0.0%)  Dominator Tree Construction
   0.0038 (  0.0%)   0.0000 (  0.0%)   0.0038 (  0.0%)   0.0032 (  0.0%)  Loop-Closed SSA Form Pass
   0.0034 (  0.0%)   0.0000 (  0.0%)   0.0034 (  0.0%)   0.0029 (  0.0%)  Loop-Closed SSA Form Pass
   0.0016 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.0%)   0.0029 (  0.0%)  Natural Loop Information
   0.0029 (  0.0%)   0.0000 (  0.0%)   0.0029 (  0.0%)   0.0028 (  0.0%)  Loop-Closed SSA Form Pass
   0.0026 (  0.0%)   0.0000 (  0.0%)   0.0026 (  0.0%)   0.0027 (  0.0%)  Partially inline calls to library functions
   0.0020 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.0%)   0.0026 (  0.0%)  Dominator Tree Construction
   0.0017 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0026 (  0.0%)  Machine Function Analysis
   0.0020 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.0%)   0.0025 (  0.0%)  NVPTX specific alloca hoisting
   0.0023 (  0.0%)   0.0000 (  0.0%)   0.0023 (  0.0%)   0.0024 (  0.0%)  Dominator Tree Construction
   0.0030 (  0.0%)   0.0000 (  0.0%)   0.0030 (  0.0%)   0.0023 (  0.0%)  Dominator Tree Construction
   0.0026 (  0.0%)   0.0000 (  0.0%)   0.0026 (  0.0%)   0.0022 (  0.0%)  Post-RA pseudo instruction expansion pass
   0.0012 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)   0.0022 (  0.0%)  Canonicalize natural loops
   0.0020 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.0%)   0.0022 (  0.0%)  Dominator Tree Construction
   0.0012 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)   0.0022 (  0.0%)  Dominator Tree Construction
   0.0014 (  0.0%)   0.0000 (  0.0%)   0.0014 (  0.0%)   0.0021 (  0.0%)  Dominator Tree Construction
   0.0026 (  0.0%)   0.0000 (  0.0%)   0.0026 (  0.0%)   0.0021 (  0.0%)  Canonicalize natural loops
   0.0011 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0021 (  0.0%)  Delete dead loops
   0.0019 (  0.0%)   0.0000 (  0.0%)   0.0019 (  0.0%)   0.0020 (  0.0%)  MachineDominator Tree Construction
   0.0017 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0020 (  0.0%)  MachineDominator Tree Construction
   0.0023 (  0.0%)   0.0000 (  0.0%)   0.0023 (  0.0%)   0.0020 (  0.0%)  MachineDominator Tree Construction
   0.0016 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.0%)   0.0020 (  0.0%)  Dominator Tree Construction
   0.0012 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)   0.0020 (  0.0%)  Dominator Tree Construction
   0.0003 (  0.0%)   0.0000 (  0.1%)   0.0003 (  0.0%)   0.0020 (  0.0%)  Lower 'expect' Intrinsics
   0.0016 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.0%)   0.0019 (  0.0%)  Block Frequency Analysis
   0.0017 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0019 (  0.0%)  MachinePostDominator Tree Construction
   0.0014 (  0.0%)   0.0000 (  0.0%)   0.0014 (  0.0%)   0.0017 (  0.0%)  MachineDominator Tree Construction
   0.0011 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0017 (  0.0%)  Natural Loop Information
   0.0015 (  0.0%)   0.0000 (  0.0%)   0.0015 (  0.0%)   0.0017 (  0.0%)  Machine Block Frequency Analysis
   0.0014 (  0.0%)   0.0000 (  0.0%)   0.0014 (  0.0%)   0.0017 (  0.0%)  Machine Block Frequency Analysis
   0.0011 (  0.0%)   0.0039 (  7.7%)   0.0050 (  0.1%)   0.0017 (  0.0%)  Scalar Evolution Analysis
   0.0009 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)   0.0016 (  0.0%)  Natural Loop Information
   0.0017 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0015 (  0.0%)  Machine Block Frequency Analysis
   0.0015 (  0.0%)   0.0000 (  0.0%)   0.0015 (  0.0%)   0.0015 (  0.0%)  Natural Loop Information
   0.0009 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)   0.0014 (  0.0%)  Natural Loop Information
   0.0015 (  0.0%)   0.0000 (  0.0%)   0.0015 (  0.0%)   0.0014 (  0.0%)  Natural Loop Information
   0.0007 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)   0.0014 (  0.0%)  Scalar Evolution Analysis
   0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0014 (  0.0%)  Scalar Evolution Analysis
   0.0017 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0014 (  0.0%)  Unnamed pass: implement Pass::getPassName()
   0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0013 (  0.0%)  Canonicalize natural loops
   0.0011 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0012 (  0.0%)  Global Variable Optimizer
   0.0011 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0012 (  0.0%)  Canonicalize natural loops
   0.0012 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  Merge disjoint stack slots
   0.0011 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0012 (  0.0%)  Machine Natural Loop Construction
   0.0010 (  0.0%)   0.0000 (  0.0%)   0.0010 (  0.0%)   0.0011 (  0.0%)  Machine Natural Loop Construction
   0.0007 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)   0.0011 (  0.0%)  Speculatively execute instructions
   0.0007 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)   0.0011 (  0.0%)  Process Implicit Definitions
   0.0008 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.0%)   0.0011 (  0.0%)  Expand ISel Pseudo-instructions
   0.0008 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.0%)   0.0011 (  0.0%)  Machine Natural Loop Construction
   0.0007 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)   0.0011 (  0.0%)  NVPTX optimize redundant cvta.to.local instruction
   0.0006 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)   0.0011 (  0.0%)  Canonicalize natural loops
   0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0010 (  0.0%)  Scalar Evolution Analysis
   0.0005 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)   0.0008 (  0.0%)  MergedLoadStoreMotion
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0008 (  0.0%)  Lower pointer arguments of CUDA kernels
   0.0008 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.0%)   0.0008 (  0.0%)  Remove unreachable blocks from the CFG
   0.0006 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)   0.0007 (  0.0%)  Replace occurrences of __nvvm_reflect() calls with 0/1
   0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0007 (  0.0%)  Memory Dependence Analysis
   0.0006 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)   0.0007 (  0.0%)  Remove unreachable machine basic blocks
   0.0009 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)   0.0006 (  0.0%)  Optimize machine instruction PHIs
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)  Canonicalize natural loops
   0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0005 (  0.0%)  Memory Dependence Analysis
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)  Inline Cost Analysis
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)  Speculatively execute instructions
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0005 (  0.0%)  Remove unreachable blocks from the CFG
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)  Canonicalize natural loops
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)  Rotate Loops
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0005 (  0.0%)  Memory Dependence Analysis
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)  Lower invoke and unwind, for unwindless code generators
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)  Tail Duplication
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0004 (  0.0%)  Memory Dependence Analysis
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0003 (  0.0%)  SROA
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)  Memory Dependence Analysis
   0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0002 (  0.0%)  Internalize Global Symbols
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)  Scalar Evolution Analysis
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Insert stack protectors
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  SLP Vectorizer
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0001 (  0.0%)  Loop Vectorization
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Post RA top-down list latency scheduler
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Strip Unused Function Prototypes
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Loop Access Analysis
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  StackMap Liveness Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Safe Stack instrumentation pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Machine Instruction Scheduler
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Lower Garbage Collection Instructions
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Alignment from assumptions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Live Stack Slot Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Shadow Stack GC Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Stack Slot Coloring
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Local Stack Slot Allocation
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0000 (  0.0%)  Assign valid PTX names to globals
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Merge Duplicate Global Constants
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Ensure that the global variables are in the global address space
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Unnamed pass: implement Pass::getPassName()
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Rewrite Symbols
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  No Alias Analysis (always returns 'may' alias)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  A No-Op Barrier Pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
   8.3034 (100.0%)   0.0502 (100.0%)   8.3537 (100.0%)   8.3467 (100.0%)  Total
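To cross-check the percentages quoted in the reply, the sketch below pulls the Wall Time column out of rows copied from the report and sums it per pass name. It is a hypothetical helper written for this illustration (`parseRow`, `totalWall` and `percentOf` are made-up names, not LLVM APIs), and it assumes every row has the four "seconds (percent%)" groups shown in the report.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One row of the pass execution timing report: only the Wall Time column
// (the fourth "seconds (percent%)" group) and the pass name are kept.
struct PassTiming {
  double wallSeconds;
  std::string name;
};

// Parses a row such as
//   0.2123 (  2.6%)   0.0001 (  0.1%)   0.2124 (  2.5%)   0.2161 (  2.6%)  Global Value Numbering
// Assumption: all four timing groups are present, as in the report above.
PassTiming parseRow(const std::string &row) {
  std::istringstream in(row);
  double columns[4] = {0.0, 0.0, 0.0, 0.0};  // User, System, User+System, Wall
  for (double &value : columns) {
    in >> value;
    // Skip the percentage, which is one token ("(100.0%)") or two ("(" and "2.6%)").
    std::string tok;
    while (in >> tok && tok.back() != ')') {
    }
  }
  std::string name;
  std::getline(in >> std::ws, name);  // the rest of the line is the pass name
  assert(!name.empty() && "each report row should end with a pass name");
  return {columns[3], name};
}

// Sums the wall time of every row whose pass name matches `pass` exactly.
double totalWall(const std::vector<std::string> &rows, const std::string &pass) {
  double sum = 0.0;
  for (const std::string &row : rows) {
    PassTiming t = parseRow(row);
    if (t.name == pass)
      sum += t.wallSeconds;
  }
  return sum;
}

// Share of `part` in `whole`, in percent.
double percentOf(double part, double whole) { return 100.0 * part / whole; }
```

For example, the three Global Value Numbering rows add up to 0.4002 + 0.2223 + 0.2161 = 0.8386 s of wall time, about 10.0% of the 8.3467 s total, while the 0.2161 s entry alone is about 2.6%, the figure quoted for the extra GVN.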
jingyue updated this revision to Diff 30071. Jul 17 2015, 11:06 PM
jingyue edited edge metadata.

run GVN only under -O3
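Diff 30071 restricts the extra GVN to -O3. A minimal sketch of that decision, written as a self-contained toy rather than against LLVM's real pass-configuration API: `OptLevel` is a hypothetical stand-in for LLVM's CodeGenOpt::Level (where -O3 corresponds to Aggressive), the pipeline is just a list of pass names, and both the EarlyCSE fallback at lower levels and skipping the pass at -O0 are assumptions, not something this review states.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for LLVM's CodeGenOpt::Level; -O3 corresponds to Aggressive.
enum class OptLevel { None, Less, Default, Aggressive };

// Picks the redundancy-elimination pass for the toy pipeline.
// GVN only at -O3; the EarlyCSE fallback is an assumption for illustration.
std::string cleanupPassFor(OptLevel level) {
  return level == OptLevel::Aggressive ? "GVN" : "EarlyCSE";
}

// Appends the cleanup pass to a toy pipeline (a list of pass names).
// Skipping it entirely at -O0 is likewise an assumption.
void addCleanupPass(std::vector<std::string> &pipeline, OptLevel level) {
  if (level == OptLevel::None)
    return;
  pipeline.push_back(cleanupPassFor(level));
}
```

Keeping the level check in one helper means the compile-time cost measured above (about 2.6% for the extra GVN) is only paid when the user has asked for -O3.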

jingyue updated this object. Jul 22 2015, 9:59 PM
This revision was automatically updated to reflect the committed changes.