This is an archive of the discontinued LLVM Phabricator instance.

[LoopDist] Distribute vectorizable loops
Needs Review · Public

Authored by sanwou01 on Mar 30 2021, 7:36 AM.

Details

Summary

Loop distribute bails out early if a loop is already vectorizable. As a
first attempt to make the LoopDistribute pass more generally
useful (with the eventual aim of enabling loop distribute by default at
-O3), this patch removes that restriction.

Originally, this pass was meant to separate the vectorizable parts of a loop
from its non-vectorizable parts, such that some of the resulting loops
can be vectorized. Loop distribution could be more generally useful, for
example, by improving cache locality of accesses in each loop.
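
As a source-level illustration of the transformation described above, the sketch below splits a loop that mixes a loop-carried recurrence with an independent element-wise computation into two loops. The pass itself operates on LLVM IR; the struct and function names here are hypothetical, and the sketch assumes the arrays do not alias.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the arrays in this illustration.
// A must hold one more element than the other arrays.
struct Arrays {
  std::vector<int> A, B, C, D, E;
};

// Original loop: the recurrence on A prevents vectorizing the loop as a whole.
void fusedLoop(Arrays &S) {
  const std::size_t N = S.B.size();
  for (std::size_t I = 0; I < N; ++I) {
    S.A[I + 1] = S.A[I] + S.B[I]; // loop-carried dependence on A
    S.C[I] = S.D[I] * S.E[I];     // independent of A
  }
}

// Hand-distributed equivalent: the second loop has no loop-carried
// dependence and is a candidate for vectorization.
void distributedLoops(Arrays &S) {
  const std::size_t N = S.B.size();
  for (std::size_t I = 0; I < N; ++I)
    S.A[I + 1] = S.A[I] + S.B[I];
  for (std::size_t I = 0; I < N; ++I)
    S.C[I] = S.D[I] * S.E[I];
}

// Builds a small deterministic input of length N (hypothetical helper).
Arrays makeInput(std::size_t N) {
  Arrays S;
  S.A.assign(N + 1, 1);
  S.B.resize(N);
  S.C.assign(N, 0);
  S.D.resize(N);
  S.E.assign(N, 2);
  for (std::size_t I = 0; I < N; ++I) {
    S.B[I] = static_cast<int>(I);
    S.D[I] = static_cast<int>(I) + 1;
  }
  return S;
}
```

Both functions compute the same results; only the loop structure differs. Whether the split pays off depends on the loop, which is what a cost model would have to decide.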

With this change, all vectorizable loads/stores end up in individual
partitions, only to be merged back together. With
--loop-distribute-merge-vectorizable-partitions=false, however, the pass
distributes as much as possible, allowing us to start iterating on the
cost model.

To prevent removeUnusedInsts() from creating undefs outside of the loop,
replace any uses of seed instructions. For each value used outside of
the loop there is exactly one partition that uses that instruction as a
seed, thanks to findDefsUsedOutsideOfLoop(). This guarantees that all
uses outside of the loop are mapped to the correct partition.
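
The live-out concern described above can be pictured at the source level. This is a minimal sketch, not the pass's implementation: the pass works on LLVM IR, and the function names below are hypothetical. The running sum is used after the loop, so once the loop is split, that outside use has to read the value from the one loop that still computes it.

```cpp
#include <cstddef>
#include <vector>

// Original loop: Sum is defined inside the loop and used outside it.
int fusedSum(const std::vector<int> &In, std::vector<int> &Out) {
  int Sum = 0;
  for (std::size_t I = 0; I < In.size(); ++I) {
    Out[I] = In[I] * 2;
    Sum += In[I];
  }
  return Sum; // use outside of the loop
}

// Hand-distributed equivalent: the first loop no longer computes Sum,
// so the use outside the loop must refer to the second loop's value.
int distributedSum(const std::vector<int> &In, std::vector<int> &Out) {
  for (std::size_t I = 0; I < In.size(); ++I)
    Out[I] = In[I] * 2;
  int Sum = 0;
  for (std::size_t I = 0; I < In.size(); ++I)
    Sum += In[I];
  return Sum; // mapped to the loop that still defines Sum
}
```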

This change, together with
--loop-distribute-merge-vectorizable-partitions=false (and
--enable-loop-distribute), distributes many more loops in the LLVM test
suite, with very mixed performance results.

Follow-up patches will work on a cost model to improve the performance
impact of the pass.

Event Timeline

sanwou01 created this revision. Mar 30 2021, 7:36 AM
sanwou01 requested review of this revision. Mar 30 2021, 7:36 AM
Herald added a project: Restricted Project. Mar 30 2021, 7:36 AM

with the eventual aim of enabling loop distribute by default at -O3

This would be great!

By just looking at this patch I find it a bit difficult to get an overview of all the moving parts involved. For example, this probably makes sense:

Loop distribute bails out early if a loop is already vectorizable.

but by not doing this, do we remove opportunities for the vectoriser? So, perhaps the easiest is to get some perf numbers on the table?

Then, we can think about the cost model too, and see if we can come up with some ideas about that. This pass is not enabled by default, so if perf numbers are okay and we don't make (downstream) users of this pass unhappy, it looks like a good step forward to me, but some ideas about steps after that would be good.

D100381 implements a simple heuristics-based cost model for loop distribute and flips the merge-vectorizable-partitions switch that this patch adds. We can talk about the performance there: it doesn't really make sense to do so here, as this patch is more of a preliminary step. Further ideas on the cost model are very welcome, though!

To illustrate the behaviour of this patch a bit more, here is the number of loops distributed in the test suite (including SPEC 2006/2017). The first column (old-distribute) is the number of loops distributed *before* this patch, with loop distribute enabled; the second column (new-distribute) is the same *after* this patch. The third column (new-distribute-no-merge) additionally passes --loop-distribute-merge-vectorizable-partitions=false, which corresponds to the maximum number of loops we *could* distribute.

There are a handful of cases where the first and second columns differ, which I wasn't entirely expecting. I'll have a look at what I've missed there.

Tests: 367
Metric: loop-distribute.NumLoopsDistributed

Program                                                                                              old-distribute new-distribute new-distribute-no-merge diff
 test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test                             2.00           2.00         235.00                  11650.0%
 test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test                              2.00           2.00         235.00                  11650.0%
 test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test                                        5.00           5.00         575.00                  11400.0%
 test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test                               30.00          30.00         1523.00                 4976.7%
 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test                                 2.00           2.00          61.00                  2950.0%
 test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test                                             4.00           4.00          97.00                  2325.0%
 test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test                        17.00          17.00         400.00                  2252.9%
 test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test                       17.00          17.00         400.00                  2252.9%
 test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test                                         3.00           4.00          70.00                  2233.3%
 test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test                             23.00          23.00         534.00                  2221.7%
 test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test                                         NaN             1.00          23.00                  2200.0%
 test-suite :: MultiSource/Applications/oggenc/oggenc.test                                            NaN             1.00          23.00                  2200.0%
 test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test                                    NaN             2.00          42.00                  2000.0%
 test-suite :: External/SPEC/CINT2006/456.hmmer/456.hmmer.test                                         1.00           1.00          13.00                  1200.0%
 test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test                                           1.00           1.00          13.00                  1200.0%
 test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test                                 2.00           2.00          24.00                  1100.0%
 test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test                                2.00           2.00          24.00                  1100.0%
 test-suite :: SingleSource/Benchmarks/Linpack/linpack-pc.test                                         1.00           1.00          10.00                  900.0%
 test-suite :: MicroBenchmarks/LCALS/SubsetCLambdaLoops/lcalsCLambda.test                              3.00           3.00          29.00                  866.7%
 test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test                                    3.00           3.00          29.00                  866.7%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test                        NaN             2.00          14.00                  600.0%
 test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test                              3.00           3.00          20.00                  566.7%
 test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test                                    3.00           3.00          20.00                  566.7%
 test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test                                   19.00          19.00         121.00                  536.8%
 test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test                                    19.00          19.00         121.00                  536.8%
 test-suite :: MicroBenchmarks/LCALS/SubsetBLambdaLoops/lcalsBLambda.test                              3.00           3.00          19.00                  533.3%
 test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test                                    3.00           3.00          19.00                  533.3%
 test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test                                            2.00           2.00           7.00                  250.0%
 test-suite :: MultiSource/Applications/siod/siod.test                                                 1.00           1.00           3.00                  200.0%
 test-suite :: External/SPEC/CINT2006/429.mcf/429.mcf.test                                             2.00           2.00           5.00                  150.0%
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test                       54.00          54.00          88.00                  63.0%
 test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test                        54.00          54.00          88.00                  63.0%
 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv.test           2.00           2.00           3.00                  50.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Expansion-dbl/Expansion-dbl.test                           NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt.test                           NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt.test                               NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test                       NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test                       NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl.test                         NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt.test                         NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl.test                           NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test       NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test                   NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test       NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Symbolics-dbl/Symbolics-dbl.test                           NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt.test                           NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/Trimaran/enc-3des/enc-3des.test                                 NaN            NaN             1.00                   0.0%
 test-suite :: MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url.test                         NaN            NaN             1.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Stanford/Oscar.test                                            NaN            NaN             3.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Stanford/FloatMM.test                                          NaN            NaN             3.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl.test                               NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl.test                   NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test                           NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test           NaN            NaN             4.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDataFlow-dbl.test                 NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt.test                 NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/IndirectAddressing-dbl/IndirectAddressing-dbl.test         NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt.test                   NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt.test         NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-dbl/InductionVariable-dbl.test           NaN            NaN             4.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl.test             NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Stanford/Quicksort.test                                        NaN            NaN             3.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt.test             NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl.test                   NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Stanford/RealMM.test                                           NaN            NaN             3.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt.test                   NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl.test           NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/VersaBench/beamformer/beamformer.test                           NaN            NaN            26.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test           NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test                                       NaN            NaN            94.00                   0.0%
 test-suite :: MultiSource/Benchmarks/VersaBench/bmm/bmm.test                                         NaN            NaN             1.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog/dynprog.test          NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Misc-C++/oopack_v1p8.test                                      NaN            NaN             3.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Misc/flops-2.test                                              NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Misc/flops.test                                                NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Misc/fp-convert.test                                           NaN            NaN             1.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Misc/pi.test                                                   NaN            NaN             1.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/durbin/durbin.test            NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Misc/revertBits.test                                           NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Misc/salsa20.test                                              NaN            NaN             1.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trmm/trmm.test                NaN            NaN             1.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Misc/whetstone.test                                            NaN            NaN             4.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syrk/syrk.test                NaN            NaN             1.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/bicg/bicg.test                 2.00           2.00           2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/cholesky/cholesky.test        NaN            NaN             1.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syr2k/syr2k.test              NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/doitgen/doitgen.test          NaN            NaN             1.00                   0.0%
 test-suite :: SingleSource/Benchmarks/McGill/queens.test                                             NaN            NaN             1.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt/gramschmidt.test  NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.test                NaN            NaN             3.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/stencils/adi/adi.test                                NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Stanford/Bubblesort.test                                       NaN            NaN             3.00                   0.0%
 test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test                              NaN            NaN            28.00                   0.0%
 test-suite :: MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode.test                       1.00           1.00           1.00                   0.0%
 test-suite :: MultiSource/Benchmarks/nbench/nbench.test                                              NaN            NaN             5.00                   0.0%
 test-suite :: MultiSource/Benchmarks/sim/sim.test                                                    NaN            NaN             6.00                   0.0%
 test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test                                      NaN            NaN           195.00                   0.0%
 test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test                                      NaN            NaN             1.00                   0.0%
 test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test                                      NaN            NaN             1.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper/jacobi-2d-imper.test        NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/CoyoteBench/huffbench.test                                     NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/CoyoteBench/lpbench.test                                       NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/stencils/jacobi-1d-imper/jacobi-1d-imper.test        NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/stencils/fdtd-apml/fdtd-apml.test                    NaN            NaN             4.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test         NaN            NaN             2.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Polybench/stencils/fdtd-2d/fdtd-2d.test                        NaN            NaN             1.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-dbl/Equivalencing-dbl.test                   NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test                        NaN            NaN            30.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test         NaN            NaN             2.00                   0.0%
 test-suite :: External/SPEC/CINT2017speed/657.xz_s/657.xz_s.test                                     NaN            NaN             7.00                   0.0%
 test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test                                  NaN            NaN            27.00                   0.0%
 test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test                        NaN            NaN             8.00                   0.0%
 test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test                                      NaN            NaN             7.00                   0.0%
 test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test                                   NaN            NaN             6.00                   0.0%
 test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test                           NaN            NaN            10.00                   0.0%
 test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test                                 NaN            NaN            27.00                   0.0%
 test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test                       NaN            NaN             8.00                   0.0%
 test-suite :: MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test         NaN            NaN             1.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/ControlLoops-flt/ControlLoops-flt.test                     NaN            NaN             2.00                   0.0%
 test-suite :: MicroBenchmarks/ImageProcessing/Dilate/Dilate.test                                     NaN            NaN             1.00                   0.0%
 test-suite :: MicroBenchmarks/ImageProcessing/Dither/Dither.test                                     NaN            NaN             1.00                   0.0%
 test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test                           NaN            NaN             4.00                   0.0%
 test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test                           NaN            NaN             4.00                   0.0%
 test-suite :: MultiSource/Applications/ClamAV/clamscan.test                                          NaN            NaN            50.00                   0.0%
 test-suite :: MultiSource/Applications/JM/lencod/lencod.test                                         NaN            NaN            19.00                   0.0%
 test-suite :: MultiSource/Applications/SPASS/SPASS.test                                               1.00           1.00           1.00                   0.0%
 test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test                            NaN            NaN            10.00                   0.0%
 test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test                                    NaN            NaN             6.00                   0.0%
 test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test                                NaN            NaN            64.00                   0.0%
 test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test                                        NaN            NaN             1.00                   0.0%
 test-suite :: External/SPEC/CFP2006/450.soplex/450.soplex.test                                       NaN            NaN            22.00                   0.0%
 test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test                                       NaN            NaN            36.00                   0.0%
 test-suite :: External/SPEC/CFP2006/470.lbm/470.lbm.test                                             NaN            NaN             2.00                   0.0%
 test-suite :: External/SPEC/CFP2006/482.sphinx3/482.sphinx3.test                                     NaN            NaN            15.00                   0.0%
 test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test                                   NaN            NaN            63.00                   0.0%
 test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test                               NaN            NaN            35.00                   0.0%
 test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test                                     NaN            NaN             2.00                   0.0%
 test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test                                     NaN            NaN            19.00                   0.0%
 test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test                                    NaN            NaN             2.00                   0.0%
 test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test                                    NaN            NaN            19.00                   0.0%
 test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test                                NaN            NaN            80.00                   0.0%
 test-suite :: External/SPEC/CINT2006/401.bzip2/401.bzip2.test                                        NaN            NaN             3.00                   0.0%
 test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test                                        NaN            NaN            11.00                   0.0%
 test-suite :: External/SPEC/CINT2006/458.sjeng/458.sjeng.test                                        NaN            NaN             1.00                   0.0%
 test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test                                    NaN            NaN             3.00                   0.0%
 test-suite :: MultiSource/Applications/d/make_dparser.test                                           NaN            NaN             1.00                   0.0%
 test-suite :: MultiSource/Applications/hbd/hbd.test                                                   1.00           1.00           1.00                   0.0%
 test-suite :: MultiSource/Applications/lua/lua.test                                                  NaN            NaN            12.00                   0.0%
 test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test                                      NaN            NaN             1.00                   0.0%
 test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test                              NaN            NaN             6.00                   0.0%
 test-suite :: MultiSource/Benchmarks/McCat/04-bisect/bisect.test                                     NaN            NaN             4.00                   0.0%
 test-suite :: MultiSource/Benchmarks/McCat/08-main/main.test                                         NaN            NaN             1.00                   0.0%
 test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test                                           NaN            NaN            21.00                   0.0%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test                  NaN            NaN             3.00                   0.0%
 test-suite :: MultiSource/Benchmarks/MiBench/network-dijkstra/network-dijkstra.test                  NaN            NaN             1.00                   0.0%
 test-suite :: MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft.test                          NaN            NaN             3.00                   0.0%
 test-suite :: MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test                                     NaN            NaN            12.00                   0.0%
 test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test                                   NaN            NaN            10.00                   0.0%
 test-suite :: MultiSource/Benchmarks/Ptrdist/bc/bc.test                                              NaN            NaN             8.00                   0.0%
 test-suite :: MultiSource/Benchmarks/Rodinia/hotspot/hotspot.test                                    NaN            NaN             1.00                   0.0%
 test-suite :: MultiSource/Benchmarks/SciMark2-C/scimark2.test                                        NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test                       NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test                       NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/TSVC/ControlLoops-dbl/ControlLoops-dbl.test                     NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test                                    NaN            NaN             6.00                   0.0%
 test-suite :: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2.test                            NaN            NaN             5.00                   0.0%
 test-suite :: MultiSource/Applications/minisat/minisat.test                                          NaN            NaN             1.00                   0.0%
 test-suite :: MultiSource/Benchmarks/FreeBench/analyzer/analyzer.test                                NaN            NaN             3.00                   0.0%
 test-suite :: MultiSource/Applications/obsequi/Obsequi.test                                          NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Applications/sgefa/sgefa.test                                              NaN            NaN             1.00                   0.0%
 test-suite :: MultiSource/Applications/sqlite3/sqlite3.test                                          NaN            NaN            13.00                   0.0%
 test-suite :: MultiSource/Applications/viterbi/viterbi.test                                          NaN            NaN             1.00                   0.0%
 test-suite :: MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test                                    NaN            NaN             5.00                   0.0%
 test-suite :: MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk.test                            NaN            NaN             5.00                   0.0%
 test-suite :: MultiSource/Benchmarks/BitBench/five11/five11.test                                     NaN            NaN             1.00                   0.0%
 test-suite :: MultiSource/Benchmarks/Bullet/bullet.test                                              NaN            NaN            32.00                   0.0%
 test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test                              NaN            NaN             8.00                   0.0%
 test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HACCKernels/HACCKernels.test                  NaN            NaN             2.00                   0.0%
 test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test                          NaN            NaN            15.00                   0.0%
 test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test                            NaN            NaN             3.00                   0.0%
 test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test                                  NaN            NaN             9.00                   0.0%
 test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/SimpleMOC/SimpleMOC.test                        NaN            NaN             4.00                   0.0%
 test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test                            NaN            NaN            12.00                   0.0%
 test-suite :: SingleSource/Benchmarks/Stanford/Treesort.test                                         NaN            NaN             2.00                   0.0%
sanwou01 updated this revision to Diff 338893. Apr 20 2021, 9:11 AM

Rebased, and addressed the discrepancy in the number of loops distributed. The difference hinges on loops that contain backward dependences which the loop vectorizer can handle, but which would frustrate loop distribution. In this case, we don't distribute the loop and leave it to the loop vectorizer.

sanwou01 added a comment. Edited Apr 20 2021, 9:15 AM

Now, there are no differences in distributed loops in the test suite and SPEC, before and after the patch, as intended.

sanwou01 retitled this revision from [RFC] [LoopDist] Distribute vectorizable loops to [LoopDist] Distribute vectorizable loops. Apr 20 2021, 9:15 AM
nikic resigned from this revision. Jun 9 2021, 1:48 PM
lebedev.ri resigned from this revision. Jan 12 2023, 5:21 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

Herald added a project: Restricted Project. Jan 12 2023, 5:21 PM