Introduced the split shuffle node kind. If a node is supposed to be
gathered, the compiler tries to detect the number of possibly vectorizable
scalar instructions in the whole node, and if there are > number / 2
compatible instructions, the whole node is split into 2 (number / 2)
subnodes, which are vectorized separately and then reshuffled into the
resulting vector.
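
A minimal sketch of the decision described above, not the actual
SLPVectorizer code. It assumes that "compatible" simply means "same
opcode" and models each scalar as an int opcode. SplitPlan and trySplit
are hypothetical names used only for illustration, and the sketch does not
force the two subnodes to be equal in size.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Hypothetical result type: lanes of each subnode plus the mask that restores
// the original lane order from the concatenation [first | second].
struct SplitPlan {
  std::vector<std::size_t> first;  // lanes holding the dominant opcode
  std::vector<std::size_t> second; // all remaining lanes
  std::vector<std::size_t> mask;   // mask[lane] = index into [first | second]
};

// Returns a plan only when more than half of the lanes share one opcode
// (assumption: "compatible" == "same opcode").
std::optional<SplitPlan> trySplit(const std::vector<int> &opcodes) {
  std::map<int, std::size_t> counts;
  for (int op : opcodes)
    ++counts[op];

  // Find the most frequent opcode.
  int dominant = 0;
  std::size_t best = 0;
  for (const auto &[op, n] : counts) {
    if (n > best) {
      dominant = op;
      best = n;
    }
  }

  // Not enough compatible lanes: the node stays a plain gather.
  if (best <= opcodes.size() / 2)
    return std::nullopt;

  // Partition the lanes into the two subnodes.
  SplitPlan plan;
  plan.mask.resize(opcodes.size());
  for (std::size_t lane = 0; lane < opcodes.size(); ++lane)
    (opcodes[lane] == dominant ? plan.first : plan.second).push_back(lane);

  // Record where each original lane lands in [first | second], so a single
  // shuffle can rebuild the resulting vector.
  std::size_t pos = 0;
  for (std::size_t lane : plan.first)
    plan.mask[lane] = pos++;
  for (std::size_t lane : plan.second)
    plan.mask[lane] = pos++;
  return plan;
}
```

For example, for opcodes {1, 1, 2, 1, 3, 1, 1, 2} five of the eight lanes
share opcode 1, so the node would be split. For {1, 1, 2, 2} no opcode
covers more than half of the lanes and the node would stay gathered.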
Metric: SLP.NumVectorInstructions
Program results results0 diff
test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 3.00 7.00 133.3%
test-suite :: MultiSource/Benchmarks/Prolangs-C/assembler/assembler.test 3.00 6.00 100.0%
test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 1202.00 1627.00 35.4%
test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 149.00 201.00 34.9%
test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 149.00 201.00 34.9%
test-suite :: MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode.test 37.00 46.00 24.3%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 256.00 282.00 10.2%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1624.00 1781.00 9.7%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1624.00 1781.00 9.7%
test-suite :: External/SPEC/CINT2017speed/657.xz_s/657.xz_s.test 115.00 126.00 9.6%
test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test 115.00 126.00 9.6%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 8737.00 9558.00 9.4%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1051.00 1143.00 8.8%
test-suite :: MultiSource/Benchmarks/VersaBench/dbms/dbms.test 25.00 27.00 8.0%
test-suite :: SingleSource/Benchmarks/Misc/ReedSolomon.test 14.00 15.00 7.1%
test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 30.00 32.00 6.7%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1671.00 1772.00 6.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3565.00 3773.00 5.8%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3565.00 3773.00 5.8%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 4527.00 4772.00 5.4%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4240.00 4464.00 5.3%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1996.00 2100.00 5.2%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 421.00 440.00 4.5%
test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test 299.00 311.00 4.0%
test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test 35.00 36.00 2.9%
test-suite :: MultiSource/Applications/SIBsim4/SIBsim4.test 39.00 40.00 2.6%
test-suite :: MultiSource/Applications/hbd/hbd.test 41.00 42.00 2.4%
test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 260.00 265.00 1.9%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9100.00 9220.00 1.3%
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 4573.00 4598.00 0.5%
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 709.00 711.00 0.3%
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 5623.00 5638.00 0.3%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 5623.00 5638.00 0.3%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 383.00 384.00 0.3%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test 624.00 623.00 -0.2%
test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 571.00 570.00 -0.2%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 854.00 851.00 -0.4%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 854.00 851.00 -0.4%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test 65.00 64.00 -1.5%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 783.00 767.00 -2.0%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 235.00 230.00 -2.1%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 235.00 230.00 -2.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/football/football.test 38.00 35.00 -7.9%
test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test 207.00 188.00 -9.2%
test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test 207.00 188.00 -9.2%
test-suite :: MultiSource/Benchmarks/Ptrdist/anagram/anagram.test 20.00 18.00 -10.0%
test-suite :: SingleSource/Benchmarks/Polybench/stencils/fdtd-2d/fdtd-2d.test 37.00 31.00 -16.2%
test-suite :: MultiSource/Benchmarks/MiBench/security-sha/security-sha.test 13.00 10.00 -23.1%
test-suite :: SingleSource/Benchmarks/Polybench/medley/floyd-warshall/floyd-warshall.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syrk/syrk.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/cholesky/cholesky.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemm/gemm.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syr2k/syr2k.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/doitgen/doitgen.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper/jacobi-2d-imper.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/lu/lu.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/mvt/mvt.test 16.00 12.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/stencils/jacobi-1d-imper/jacobi-1d-imper.test 8.00 6.00 -25.0%
test-suite :: SingleSource/Benchmarks/Polybench/stencils/seidel-2d/seidel-2d.test 8.00 6.00 -25.0%
test-suite :: MultiSource/Applications/aha/aha.test NaN 4.00 NaN

MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test
- actually, more vector instructions, fewer shuffles.
MultiSource/Benchmarks/mafft/pairlocalalign.test - fewer shuffles.
External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test - more vectorized code,
fewer shuffles.
External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test - same.
MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test - more
vectorized code, some inserts/shuffles were optimized.
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test - more
vectorized code, but some previously vectorized code is not vectorized
anymore, needs D116312.
MultiSource/Benchmarks/mediabench/gsm/toast/toast.test - fewer shuffles,
more vector code.
MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test - same.
MultiSource/Benchmarks/Prolangs-C/football/football.test - needs
non-power-of-2 support to improve further, but otherwise pretty much the
same.
External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test - more vector
instructions, fewer shuffles.
External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test - more vector
instructions, fewer shuffles.
MultiSource/Benchmarks/Ptrdist/anagram/anagram.test - fewer shuffles.
All others - same.