This is an archive of the discontinued LLVM Phabricator instance.

Differential D124284

[SLP]Try partial store vectorization if supported by target.
ClosedPublic

Authored by ABataev on Apr 22 2022, 11:17 AM.

Details

Reviewers

RKSimon
xbolva00
SjoerdMeijer
fhahn

Commits

rG9dc4ced204d1: [SLP]Try partial store vectorization if supported by target.

Summary

We can try to vectorize a number of stores smaller than MinVecRegSize
/ scalar_value_size if the target allows it. This gives an extra
opportunity for vectorization.

Fixes PR54985.
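
To make the threshold in the summary concrete, here is a small arithmetic sketch. It assumes a 128-bit minimum vector register size, and the helper name `minStoresForFullRegister` is hypothetical; neither comes from the patch itself.

```cpp
// Toy arithmetic only; not code from the patch.
// MinVecRegSizeBits = 128 is an assumed value for illustration.
constexpr unsigned minStoresForFullRegister(unsigned MinVecRegSizeBits,
                                            unsigned ScalarBits) {
  return MinVecRegSizeBits / ScalarBits;
}

// With 32-bit scalars, four consecutive stores fill a 128-bit register.
static_assert(minStoresForFullRegister(128, 32) == 4, "4 x i32 fills 128 bits");
// With 64-bit scalars, two stores already fill it.
static_assert(minStoresForFullRegister(128, 64) == 2, "2 x i64 fills 128 bits");
```

Under these assumptions, a group of only two i32 stores falls below the threshold of four; per the summary, the patch lets the vectorizer try such a group when the target allows it.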

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Apr 22 2022, 11:17 AM

Herald added a project: Restricted Project. Apr 22 2022, 11:17 AM

Herald added a subscriber: hiraditya.

ABataev requested review of this revision. Apr 22 2022, 11:17 AM

Herald added a project: Restricted Project. Apr 22 2022, 11:17 AM

Harbormaster completed remote builds in B160909: Diff 424544. Apr 22 2022, 12:40 PM

xbolva00 added inline comments. Apr 22 2022, 12:57 PM

llvm/test/Transforms/SLPVectorizer/X86/pr49933.ll
2–3	Remove please.
4	Great! This patch also fixes https://github.com/llvm/llvm-project/issues/49277

Address comment

Harbormaster completed remote builds in B160942: Diff 424590. Apr 22 2022, 2:34 PM

xbolva00 added reviewers: SjoerdMeijer, fhahn. Apr 23 2022, 12:11 PM

I’m going to do some extra testing next week for this patch (if the bugs with non-typed pointers are fixed).

Metric: SLP.NumVectorInstructions

Program                                                                                  SLP.NumVectorInstructions
                                                                                         results                   results0 diff
                    test-suite :: MultiSource/Benchmarks/Prolangs-C++/shapes/shapes.test     0.00                      6.00     inf%
                      test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test     0.00                     20.00     inf%
                        test-suite :: MultiSource/Benchmarks/Prolangs-C++/city/city.test     0.00                      4.00     inf%
              test-suite :: MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail.test     0.00                      2.00     inf%
                    test-suite :: MultiSource/Benchmarks/BitBench/uuencode/uuencode.test     0.00                      3.00     inf%
                              test-suite :: SingleSource/Benchmarks/Stanford/Towers.test     0.00                      1.00     inf%
                                        test-suite :: SingleSource/UnitTests/initp1.test     0.00                     20.00     inf%
                       test-suite :: SingleSource/UnitTests/ms_struct-bitfield-init.test     0.00                      1.00     inf%
     test-suite :: MultiSource/Benchmarks/MiBench/network-dijkstra/network-dijkstra.test     0.00                      2.00     inf%
     test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test     0.00                      4.00     inf%
                        test-suite :: MultiSource/Benchmarks/McCat/01-qbsort/qbsort.test     0.00                      2.00     inf%
                        test-suite :: MultiSource/Benchmarks/McCat/12-IOtest/iotest.test     0.00                      1.00     inf%
                                test-suite :: SingleSource/Benchmarks/Dhrystone/dry.test     0.00                      3.00     inf%
                                 test-suite :: MultiSource/Applications/sgefa/sgefa.test     0.00                      1.00     inf%
                       test-suite :: MultiSource/Benchmarks/Rodinia/hotspot/hotspot.test     0.00                      2.00     inf%
                 test-suite :: MultiSource/Benchmarks/Rodinia/pathfinder/pathfinder.test     0.00                      2.00     inf%
                       test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test     0.00                      4.00     inf%
                      test-suite :: MultiSource/Benchmarks/Trimaran/enc-rc4/enc-rc4.test     0.00                      1.00     inf%
                     test-suite :: MultiSource/Benchmarks/Rodinia/backprop/backprop.test     0.00                      3.00     inf%
                                     test-suite :: MultiSource/Applications/aha/aha.test     0.00                      4.00     inf%
                                 test-suite :: MultiSource/Applications/spiff/spiff.test     0.00                      5.00     inf%
      test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test     0.00                      4.00     inf%
                         test-suite :: MultiSource/Applications/lambda-0.1.3/lambda.test     0.00                      2.00     inf%
            test-suite :: MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url.test     0.00                      1.00     inf%
                           test-suite :: MultiSource/Applications/hexxagon/hexxagon.test     0.00                      8.00     inf%
                  test-suite :: MultiSource/Benchmarks/Prolangs-C/unix-tbl/unix-tbl.test     0.00                      2.00     inf%
                        test-suite :: MultiSource/Benchmarks/Prolangs-C++/life/life.test     0.00                     20.00     inf%
                  test-suite :: MultiSource/Benchmarks/Prolangs-C++/objects/objects.test     0.00                      3.00     inf%
           test-suite :: MultiSource/Benchmarks/MiBench/office-ispell/office-ispell.test     0.00                      5.00     inf%
                test-suite :: MultiSource/Benchmarks/Prolangs-C/assembler/assembler.test     0.00                      2.00     inf%
                                   test-suite :: MultiSource/Applications/siod/siod.test     2.00                    209.00 10350.0%
                                     test-suite :: MultiSource/Applications/lua/lua.test     1.00                     46.00  4500.0%
                             test-suite :: MultiSource/Applications/sqlite3/sqlite3.test    21.00                    438.00  1985.7%
                               test-suite :: SingleSource/Benchmarks/Stanford/Oscar.test     2.00                     30.00  1400.0%
                 test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test     3.00                     41.00  1266.7%
                      test-suite :: MultiSource/Benchmarks/Prolangs-C++/ocean/ocean.test     2.00                     26.00  1200.0%
              test-suite :: MultiSource/Benchmarks/VersaBench/beamformer/beamformer.test    32.00                    378.00  1081.2%
                                   test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test    10.00                     77.00   670.0%
                              test-suite :: MultiSource/Applications/d/make_dparser.test     2.00                     15.00   650.0%
                   test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test    20.00                    141.00   605.0%
                  test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test    20.00                    141.00   605.0%
              test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test     2.00                     13.00   550.0%
              test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test     2.00                     13.00   550.0%
             test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test     5.00                     32.00   540.0%
           test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test   100.00                    507.00   407.0%
          test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test   100.00                    507.00   407.0%
                               test-suite :: SingleSource/Benchmarks/McGill/exptree.test     1.00                      5.00   400.0%
           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/SimpleMOC/SimpleMOC.test     2.00                     10.00   400.0%
                              test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test    11.00                     44.00   300.0%
                                 test-suite :: MultiSource/Benchmarks/Ptrdist/bc/bc.test     5.00                     18.00   260.0%
                   test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test   131.00                    469.00   258.0%
                       test-suite :: MultiSource/Benchmarks/Fhourstones/fhourstones.test     8.00                     28.00   250.0%
                   test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test    73.00                    222.00   204.1%
         test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/Pathfinder/PathFinder.test     2.00                      6.00   200.0%
                                 test-suite :: MultiSource/Applications/lemon/lemon.test     5.00                     15.00   200.0%
                             test-suite :: MultiSource/Applications/SIBsim4/SIBsim4.test    12.00                     35.00   191.7%
                           test-suite :: External/SPEC/CINT2006/456.hmmer/456.hmmer.test    97.00                    282.00   190.7%
                test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 11194.00                  31786.00   184.0%
               test-suite :: MultiSource/Benchmarks/Prolangs-C/archie-client/archie.test     4.00                     11.00   175.0%
                test-suite :: MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1.test     7.00                     18.00   157.1%
     test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test    47.00                    120.00   155.3%
                              test-suite :: SingleSource/Benchmarks/Stanford/Puzzle.test    11.00                     28.00   154.5%
                                     test-suite :: MultiSource/Applications/hbd/hbd.test    41.00                    104.00   153.7%
                      test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   698.00                   1681.00   140.8%
                       test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   698.00                   1681.00   140.8%
                             test-suite :: MultiSource/Applications/ClamAV/clamscan.test    85.00                    195.00   129.4%
                           test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   396.00                    907.00   129.0%
                           test-suite :: External/SPEC/CINT2006/401.bzip2/401.bzip2.test    31.00                     71.00   129.0%
                           test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test    45.00                    101.00   124.4%
                           test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   101.00                    214.00   111.9%
                         test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1762.00                   3598.00   104.2%
                          test-suite :: External/SPEC/CFP2006/450.soplex/450.soplex.test    64.00                    130.00   103.1%
                                test-suite :: SingleSource/Benchmarks/Stanford/Perm.test     6.00                     12.00   100.0%
                              test-suite :: SingleSource/Benchmarks/Dhrystone/fldry.test     1.00                      2.00   100.0%
                             test-suite :: MultiSource/Applications/viterbi/viterbi.test     1.00                      2.00   100.0%
                             test-suite :: MultiSource/Applications/obsequi/Obsequi.test     2.00                      4.00   100.0%
                       test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test   120.00                    233.00    94.2%
                        test-suite :: External/SPEC/CFP2006/482.sphinx3/482.sphinx3.test    56.00                    107.00    91.1%
                               test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test   522.00                    976.00    87.0%
                             test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test   165.00                    295.00    78.8%
                                 test-suite :: MultiSource/Applications/SPASS/SPASS.test   176.00                    307.00    74.4%
                             test-suite :: MultiSource/Benchmarks/Rodinia/srad/srad.test     3.00                      5.00    66.7%
                                 test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  6965.00                  11467.00    64.6%
                  test-suite :: MultiSource/Benchmarks/Prolangs-C/football/football.test    45.00                     73.00    62.2%
                      test-suite :: SingleSource/Benchmarks/Misc/richards_benchmark.test    10.00                     16.00    60.0%
          test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   681.00                   1080.00    58.6%
                            test-suite :: MultiSource/Applications/JM/lencod/lencod.test  1175.00                   1814.00    54.4%
                        test-suite :: MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test    24.00                     37.00    54.2%
                       test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   980.00                   1508.00    53.9%
                           test-suite :: External/SPEC/CINT2006/458.sjeng/458.sjeng.test    32.00                     49.00    53.1%
               test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test    49.00                     74.00    51.0%
                 test-suite :: External/SPEC/CINT2006/462.libquantum/462.libquantum.test   107.00                    161.00    50.5%
                test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  3638.00                   5474.00    50.5%
               test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  3638.00                   5474.00    50.5%
                       test-suite :: SingleSource/Benchmarks/BenchmarkGame/fannkuch.test     2.00                      3.00    50.0%                                                                                               
                         test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test    88.00                    129.00    46.6%
                        test-suite :: External/SPEC/CINT2017speed/657.xz_s/657.xz_s.test    88.00                    129.00    46.6%
                       test-suite :: MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test    10.00                     14.00    40.0%
                           test-suite :: MultiSource/Benchmarks/Olden/health/health.test     5.00                      7.00    40.0%
                                test-suite :: MultiSource/Applications/kimwitu++/kc.test    58.00                     81.00    39.7%
           test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   271.00                    378.00    39.5%
                    test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   750.00                   1031.00    37.5%
                     test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   750.00                   1031.00    37.5%
             test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test    75.00                    103.00    37.3%
                    test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test    75.00                    103.00    37.3%
                 test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test   273.00                    372.00    36.3%
                   test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test  2078.00                   2778.00    33.7%
                               test-suite :: MultiSource/Applications/treecc/treecc.test    12.00                     16.00    33.3%
                                  test-suite :: SingleSource/Benchmarks/McGill/misr.test     6.00                      8.00    33.3%
                             test-suite :: MultiSource/Applications/minisat/minisat.test     3.00                      4.00    33.3%
               test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   814.00                   1084.00    33.2%
              test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   814.00                   1084.00    33.2%
                            test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   584.00                    777.00    33.0%
                               test-suite :: MultiSource/Applications/oggenc/oggenc.test   237.00                    311.00    31.2%
           test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test    65.00                     85.00    30.8%
          test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    65.00                     85.00    30.8%
         test-suite :: MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode.test    53.00                     67.00    26.4%
                              test-suite :: SingleSource/Benchmarks/Misc-C++/bigfib.test     4.00                      5.00    25.0%
                                 test-suite :: MultiSource/Benchmarks/nbench/nbench.test   218.00                    271.00    24.3%
           test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  3719.00                   4588.00    23.4%
          test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  3719.00                   4588.00    23.4%
           test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   670.00                    820.00    22.4%
                       test-suite :: MultiSource/Benchmarks/Ptrdist/anagram/anagram.test    32.00                     39.00    21.9%
                  test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  4980.00                   6059.00    21.7%
                          test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  4991.00                   6071.00    21.6%
                           test-suite :: MultiSource/Benchmarks/SciMark2-C/scimark2.test    10.00                     12.00    20.0%
                 test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test   490.00                    586.00    19.6%
                  test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 15903.00                  18607.00    17.0%
                          test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test  5982.00                   6994.00    16.9%
                       test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test   494.00                    553.00    11.9%
                        test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test   494.00                    553.00    11.9%
               test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/XSBench/XSBench.test    27.00                     30.00    11.1%
             test-suite :: MultiSource/Benchmarks/MiBench/security-sha/security-sha.test    18.00                     20.00    11.1%
                             test-suite :: SingleSource/Benchmarks/Misc/ReedSolomon.test    25.00                     27.00     8.0%
               test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test    86.00                     92.00     7.0%
                           test-suite :: SingleSource/Benchmarks/Misc-C++-EH/spirit.test    16.00                     17.00     6.2%
             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test   236.00                    247.00     4.7%
                      test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  6030.00                   6307.00     4.6%
               test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   285.00                    298.00     4.6%
                     test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test    89.00                     93.00     4.5%
               test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench.test   102.00                    106.00     3.9%
                              test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test  3098.00                   3198.00     3.2%
                             test-suite :: SingleSource/UnitTests/matrix-types-spec.test    31.00                     32.00     3.2%
                     test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test   143.00                    147.00     2.8%
                 test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test    49.00                     50.00     2.0%
                            test-suite :: MultiSource/Benchmarks/McCat/08-main/main.test    59.00                     60.00     1.7%
                          test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test  1023.00                   1038.00     1.5%
                test-suite :: MultiSource/Benchmarks/Prolangs-C/simulator/simulator.test    84.00                     85.00     1.2%
                              test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test  1020.00                   1029.00     0.9%
                              test-suite :: SingleSource/Benchmarks/SmallPT/smallpt.test   114.00                    115.00     0.9%
                         test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test  1560.00                   1564.00     0.3%

Statistics. All numbers are improvements.
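
The diff column appears to be the relative change from `results` to `results0`, with `inf%` where the first value is zero. A minimal sketch of that calculation, assuming this reading of the columns (the function name `diffPercent` is hypothetical):

```cpp
#include <limits>

// Relative change in percent from Before to After.
// Assumption: a zero baseline with a nonzero new value is reported as
// infinity, matching the "inf%" rows; 0 -> 0 is treated as 0% here.
constexpr double diffPercent(double Before, double After) {
  return Before == 0.0
             ? (After == 0.0 ? 0.0 : std::numeric_limits<double>::infinity())
             : (After - Before) / Before * 100.0;
}

// siod.test row: 2.00 -> 209.00 is listed as 10350.0%.
static_assert(diffPercent(2.0, 209.0) == 10350.0, "siod row");
// shapes.test row: 0.00 -> 6.00 is listed as inf%.
static_assert(diffPercent(0.0, 6.0) == std::numeric_limits<double>::infinity(),
              "shapes row");
```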

Looks great. Could you also observe some runtime perf improvements?

In D124284#3472395, @xbolva00 wrote:

Looks great. Could you also observe some runtime perf improvements?

My system is pretty busy so the perf numbers would not be correct most probably.

In D124284#3472397, @ABataev wrote:

In D124284#3472395, @xbolva00 wrote:

Looks great. Could you also observe some runtime perf improvements?

My system is pretty busy so the perf numbers would not be correct most probably.

Just an example:

test-suite :: MultiSource/Benchmarks/llubenchmark/llu.test  10.33     17.52     69.7%

The test is not affected at all.

As to some long run tests:

test-suite :: External/SPEC/CINT2006/401.bzip2/401.bzip2.test  28.09     29.60      5.4%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  32.65     34.00      4.1%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  96.71    100.30      3.7%
test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test  49.65     50.82      2.4%
test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test  50.21     50.93      1.4%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  33.50     31.27     -6.6%
test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test  60.73     55.16     -9.2%
test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test  36.94     32.76    -11.3%

Lower % is better. Geomean is -100%, but as I said, I would not trust these numbers.

Yeah, understood.

Increased priority, some numbers for long run tests:
Regressions

Metric: exec_time
test-suite :: External/SPEC/CINT2006/401.bzip2/401.bzip2.test  26.98     27.49      1.9%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  94.46     95.73      1.3%
test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test  63.61     64.22      1.0%

Gains

test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test 234.34    231.17     -1.4%
test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test  27.07     26.70     -1.4%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test  59.34     57.95     -2.3%
test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test  45.51     44.42     -2.4%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  36.46     35.40     -2.9%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test  57.70     55.93     -3.1%
test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test  57.02     53.86     -5.5%
test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test  45.46     40.61    -10.7%
test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test  51.19     41.87    -18.2%

Geomean is still -100%, which means the performance with the patch is still better than before.
Tested for O3 + LTO, generic CPU.

Very nice numbers!

Maybe @asbirlea or @fhahn want to run their own internal tests?

Noted, I will run some testing this week.

RKSimon added inline comments. Apr 26 2022, 5:21 AM

llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll
53 ↗	(On Diff #424590)	It's weird that none of the tests actually CHECK-NOT for loads any more - given it's the name of the test file :-\|

ABataev added inline comments. Apr 26 2022, 5:23 AM

llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll
53 ↗	(On Diff #424590)	I can keep CHECK-NOT, if needed

RKSimon added inline comments. Apr 26 2022, 5:36 AM

llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll
53 ↗	(On Diff #424590)	Either that or we move to auto-generating the test file checks - I have no strong preference

RKSimon mentioned this in rG6e078f980450: [GVN][NewGVN] Regenerate no_speculative_loads_with_asan.ll tests. Apr 27 2022, 2:58 AM

rebase?

Rebase

dmgreen added a subscriber: dmgreen. Apr 27 2022, 10:32 AM

dmgreen added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfo.h
1372	I don't think that exposing isLegalOrCustom to the midend is the right way to go - I feel it sets a bad precedent. I don't think that "Custom" means enough to base mid-end optimizations on. It can mean anything from "this can be custom lowered to a single instruction", to "this can _sometimes_ be custom lowered to a single instruction, in specific situations, otherwise it will expand", to "this has to be custom expanded into 150 instructions". The variance between them is just too large. It also creates a dependency between the mid-end and SDAG ISel lowering that isn't good to introduce - considering that there are other ISels like Global ISel, there might be a point in the future where SDAG is entirely unused in certain backends. From what I can tell (correct me if I'm wrong), what you want to add for this specific patch is a way to override/ignore getMinVectorRegisterBitWidth for stores that the target can efficiently handle. But you don't just want to change getMinVectorRegisterBitWidth? Can we add a method for doing that? `shouldOverrideMinStoreVectorRegisterBitwidth(Type *Ty)`. The default implementation can still be the same as the current BasicTTI::isLegalOrCustomInstruction method, but it allows the target to override it if desired, and doesn't expose LegalOrCustom to the midend. Which I think is better in the long run.
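
The shape of the hook suggested here can be sketched with a toy model. Everything below is hypothetical: `ToyTTIBase`, `ToyGenericTTI`, `ToyTargetTTI` and the policies they encode are illustrations, not LLVM classes, and the real hook would take a `Type *` rather than a bit width. The point is only that the mid-end asks one target-overridable question instead of inspecting Legal/Custom actions itself.

```cpp
// Toy CRTP model of a target-overridable TTI hook; not LLVM code.
template <typename Derived> struct ToyTTIBase {
  // Entry point the vectorizer would call.
  constexpr bool
  shouldOverrideMinStoreVectorRegisterBitwidth(unsigned StoreBits) const {
    return static_cast<const Derived *>(this)->overrideImpl(StoreBits);
  }
  // Default policy (an assumption): never go below the minimum width.
  constexpr bool overrideImpl(unsigned /*StoreBits*/) const { return false; }
};

// A target that keeps the default.
struct ToyGenericTTI : ToyTTIBase<ToyGenericTTI> {};

// A target that claims (hypothetically) cheap 64-bit vector stores.
struct ToyTargetTTI : ToyTTIBase<ToyTargetTTI> {
  constexpr bool overrideImpl(unsigned StoreBits) const {
    return StoreBits == 64;
  }
};

static_assert(!ToyGenericTTI{}.shouldOverrideMinStoreVectorRegisterBitwidth(64),
              "default says no");
static_assert(ToyTargetTTI{}.shouldOverrideMinStoreVectorRegisterBitwidth(64),
              "target opts in");
```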

dmgreen added inline comments. Apr 27 2022, 10:34 AM

llvm/include/llvm/CodeGen/BasicTTIImpl.h
318	Should this be getTypeLegalizationCost or getValueType? Otherwise we are asking for the isOperationLegalOrCustom on a legal type (LT.second) below, which you would hope was always Legal and won't really tell you much about how legal a store to Ty is.

ABataev added inline comments. Apr 27 2022, 10:35 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
1372	I'll try to invent something better.

Harbormaster completed remote builds in B161631: Diff 425557. Apr 27 2022, 11:21 AM

vporpo added a subscriber: vporpo. Apr 27 2022, 2:32 PM

Address comments

Fix test checks

RKSimon mentioned this in D103925: [X86][SSE] Support 64-bit vectorization (WIP). Apr 28 2022, 7:59 AM

Harbormaster completed remote builds in B161792: Diff 425776. Apr 28 2022, 8:46 AM

Thanks for the update, this looks better to me. The perf results I have were looking OK too, except for one fp16 case where it was choosing to use v2f16 vectorization. It could be OK, it's essentially trading unrolling vs vectorizing and one isn't obviously better or worse than the other. But small vectorization factors can be difficult at times.

I think the problem is that getOperationAction will get the data from these OpActions, which will all be initialized to 0 (=Legal), and targets do not usually override that for illegal types:
https://github.com/llvm/llvm-project/blob/a9d68a5524dea113cace5983697786599cbdce9a/llvm/lib/CodeGen/TargetLoweringBase.cpp#L733
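
A toy sketch of the pitfall described above: if the action table is zero-initialized and `Legal` is the zero value, any type the target never configured reads back as Legal. The enum, struct and helper names here are hypothetical stand-ins, not the real `TargetLoweringBase` members.

```cpp
// Toy model only; loosely inspired by the OpActions table, not LLVM code.
enum class ToyAction : unsigned char { Legal = 0, Promote, Expand, Custom };

struct ToyLowering {
  // Zero-initialized: every entry starts out as ToyAction::Legal.
  ToyAction StoreActions[4] = {};

  constexpr ToyAction getStoreAction(unsigned TypeIdx) const {
    return StoreActions[TypeIdx];
  }
};

// A hypothetical target that only configures the one type it cares about.
constexpr ToyLowering makeToyTarget() {
  ToyLowering TL{};
  TL.StoreActions[1] = ToyAction::Custom;
  return TL;
}

// The configured entry reports what the target set...
static_assert(makeToyTarget().getStoreAction(1) == ToyAction::Custom,
              "configured entry");
// ...but an entry the target never touched still claims to be Legal.
static_assert(makeToyTarget().getStoreAction(3) == ToyAction::Legal,
              "untouched entry");
```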

So it can pick up "Legal" stores just because of the default initialization, where the target has never had to set them to anything else in the past. There are TruncStoreActions that should be set though. What do you think of using something like this, based on whether there is a legal trunc store?

unsigned getStoreMinimumVF(unsigned VF, Type *ScalarTy) const {
  auto &&IsSupportedByTarget = [this, ScalarTy](unsigned VF) {
    auto *SrcTy = FixedVectorType::get(ScalarTy, VF / 2);
    EVT VT = getTLI()->getValueType(DL, SrcTy);
    TargetLowering::LegalizeAction LA =
        getTLI()->getOperationAction(ISD::STORE, VT);
    if (getTLI()->isTypeLegal(VT) &&
        (LA == TargetLowering::Legal || LA == TargetLowering::Custom))
      return true;

    auto LT = getTLI()->getTypeLegalizationCost(DL, SrcTy);
    LA = getTLI()->getTruncStoreAction(LT.second, VT);
    return LA == TargetLowering::Legal || LA == TargetLowering::Custom;
  };
  while (VF > 2 && IsSupportedByTarget(VF))
    VF /= 2;
  return VF;
}

Would that work for the cases you are interested in? The target can override it in any case, so at least it is controllable. But if that works for your use-cases it should hopefully match the target lowering a little better.
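
To see how the halving loop in the snippet above behaves, here is a self-contained toy version. The predicate `toyIsStoreSupported` and its 64-bit threshold are assumptions standing in for the real target queries; only the loop structure mirrors the suggestion.

```cpp
// Toy stand-in for the target query: assume (hypothetically) that the target
// handles vector stores of at least 64 bits in total.
constexpr bool toyIsStoreSupported(unsigned NumElts, unsigned ScalarBits) {
  return NumElts * ScalarBits >= 64;
}

// Halve VF while the half-width store is still supported, never below 2.
constexpr unsigned toyStoreMinimumVF(unsigned VF, unsigned ScalarBits) {
  while (VF > 2 && toyIsStoreSupported(VF / 2, ScalarBits))
    VF /= 2;
  return VF;
}

// i32: 4 -> 2, since a 2 x i32 store is 64 bits wide.
static_assert(toyStoreMinimumVF(4, 32) == 2, "i32 case");
// i8: 16 -> 8, then stops because a 4 x i8 store is only 32 bits wide.
static_assert(toyStoreMinimumVF(16, 8) == 8, "i8 case");
```

A result like this only nominates a vector factor to try; as the reply in this thread notes, the cost model is still expected to reject unprofitable candidates.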

I tried something similar already, it won't work. Plus, trunc store is not the case we're looking at here, it is different. This function just says to vectorizer that it might be worth trying this vector factor. The cost model should later inform that it is not profitable. If something is not correct in the TTI, it should be fixed in TTI.

If something is not correct in the TTI, it should be fixed in TTI.

Strong +1

But that issue should not be a blocker IMHO.

I tried something similar already, it won't work. Plus, trunc store is not the case we're looking at here, it is different. This function just says to vectorizer that it might be worth trying this vector factor. The cost model should later inform that it is not profitable. If something is not correct in the TTI, it should be fixed in TTI.

Oh OK, that's a shame. There may be something a little off with the f16 costmodel, it is not always perfect, but I don't see anything obvious from what it is printing. There are only v2f16 values, which can't go too wrong.
The issue isn't that SLP vectorization is worse than scalar, it's that in that particular case runtime unrolling is better. The SLP that happens can get in the way of something more profitable, and no pass in llvm operates in a vacuum.

That issue isn't too important though. My worry is that this is currently an expensive way of saying "return 2". I've no strong objection if you want to go with the current method, but perhaps the default should be more "correct" and we can override the targets that want something different/more aggressive? They can choose to spend the extra compile time on factors that might not be expected to be very profitable to other archs.

rebase? I'm not sure if rGc5e875f599c25c2ea5a5c3dc6396de17c0c80a45 will have changed due to this patch

I'm seeing some fairly big regressions with this patch, specifically on Rome (AMD) architecture.
A couple of examples that are public in the test suite: SingleSource/Benchmarks/Shootout: for sieve I'm seeing a 20% performance regression in an opt build and an xfdo one, and for MicroBenchmarks/ImageProcessing/Dither 10% regression (opt, thinlto and xfdo).
I'm also seeing a couple on Skylake, opt build, in the range of 5-13%, an example being eigen with a 13% regression; this may be harder to track down as it's in a specific configuration, but let me know if you want to reproduce this one.

As for performance improvements, I see a few on Skylake in the range of 3-6%. An example here is MicroBenchmarks/ImageProcessing/Blur, which ranges between 4-5% improvement.

Overall, the regressions outnumber the gains in the testing I've done so far and would likely block our compiler release.

I'm trying to improve it. But if we have perf regressions, there is something wrong with the cost model.

Reworked initial implementation to be more conservative. Also, now it is able to handle trunc stores.

Harbormaster completed remote builds in B162691: Diff 427012.May 4 2022, 9:05 AM

@asbirlea Do you have an update on the regressions you were seeing vs latest patch?

In D124284#3493607, @RKSimon wrote:

@asbirlea Do you have an update on the regressions you were seeing vs latest patch?

Performance testing still ongoing, should be completed by tomorrow.

I ran the same set of benchmarks again without issue this time (well, there was an issue, but it turned out that someone had changed the benchmark sources :) ). They might not be the most amazing SLP tests, but no remaining objections from me.

(The reason I suggested the truncate code the way I did is that, by default, the legalization rule for smaller-than-legal power-of-2 types is to promote integers to larger sizes. So under MVE, where we only have 128-bit vectors, a v4i8 vector will be promoted to a v4i32. We would then need a v4i32->v4i8 truncstore for it to be legal, which it does have! So it would allow some of the smaller-than-legal types to be vectorized, essentially treating the v4i8 operations as v4i32s. If we are going for something more conservative it might make sense to ignore that, though, and float types will always widen rather than promote by default.)
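
The promote-then-truncstore reasoning above can be sketched with a toy model. This is only an illustration under stated assumptions: `VecTy`, `ToyTarget`, `addTruncStore`, `promote` and `canStore` are hypothetical names, the model covers integer element promotion only (not the float widening case), and it does not use any LLVM API.

```cpp
#include <cassert>
#include <set>
#include <tuple>
#include <utility>

// Hypothetical vector type: element count and element width in bits.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
  bool operator<(const VecTy &O) const {
    return std::tie(NumElts, EltBits) < std::tie(O.NumElts, O.EltBits);
  }
};

// Toy target with a single legal vector width and a set of truncating
// stores it claims to support.
class ToyTarget {
public:
  explicit ToyTarget(unsigned Bits) : VectorBits(Bits) {}

  // Register a supported truncating store: register type -> memory type.
  void addTruncStore(VecTy RegTy, VecTy MemTy) {
    TruncStores.emplace(RegTy, MemTy);
  }

  bool isTypeLegal(VecTy T) const {
    return T.NumElts * T.EltBits == VectorBits;
  }

  // Integer promotion: keep the element count, grow each element until the
  // vector fills the legal width.
  VecTy promote(VecTy T) const {
    assert(T.NumElts != 0 && VectorBits % T.NumElts == 0 &&
           "element count must divide the vector width");
    return VecTy{T.NumElts, VectorBits / T.NumElts};
  }

  // A store is usable if the type is legal as-is, or if the promoted type
  // has a truncating store back down to the memory type.
  bool canStore(VecTy MemTy) const {
    if (isTypeLegal(MemTy))
      return true;
    return TruncStores.count(std::make_pair(promote(MemTy), MemTy)) != 0;
  }

private:
  unsigned VectorBits;
  std::set<std::pair<VecTy, VecTy>> TruncStores;
};
```

With a 128-bit toy target that registers a v4i32->v4i8 truncating store, `canStore({4, 8})` succeeds through promotion, while `canStore({8, 8})` fails because no v8i16->v8i8 entry was registered.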

@asbirlea How are you specifying the SSE/AVX level for your benchmark runs - are you running with -march=native, the x86-64-v* levels, or something else?

RKSimon mentioned this in rG96d2d2508e4d: [SLP][X86] Add test coverage for PR47491 / Issue #46835.May 8 2022, 3:25 AM

RKSimon mentioned this in rG2233a6150015: [SLP][X86] Add test coverage for PR49934 / Issue #49278.May 8 2022, 3:33 AM

In D124284#3496381, @RKSimon wrote:

@asbirlea How are you specifying the SSE/AVX level for your benchmark runs - are you running with -march=native, the x86-64-v* levels, or something else?

I believe the runs are using -target-cpu k8 and -target-cpu haswell.

The latest performance testing still shows one regression in a benchmark from singlesource, but there are many improvements that offset it in non-public benchmarks. So this latest diff is good to go from my side.

LG. Thanks for many perf improvements.

@RKSimon ?

Please can you rebase? I added a number of PR tests yesterday and I'm curious how many improve.

In D124284#3499844, @asbirlea wrote:

In D124284#3496381, @RKSimon wrote:

@asbirlea How are you specifying the SSE/AVX level for your benchmark runs - are you running with -march=native, the x86-64-v* levels, or something else?

I believe the runs are using -target-cpu k8 and -target-cpu haswell.

The latest performance testing still shows one regression in a benchmark from singlesource, but there are many improvements that offset it in non-public benchmarks. So this latest diff is good to go from my side.

Hi, thanks for the testing. What's the name of the regressed single source test?

In D124284#3500236, @RKSimon wrote:

Please can you rebase? I added a number of PR tests yesterday and I'm curious how many improve.

Sure, will do later today

Rebase

LGTM - naturally the test-suite regression needs further investigation but I think that can be performed post-commit

llvm/test/Transforms/SLPVectorizer/X86/arith-mul-load.ll
14	I'll investigate adding 32-bit vector load/store handling as well (it has the same costs as the codegen for 64-bit anyhow).

This revision is now accepted and ready to land.May 9 2022, 8:56 AM

ABataev added inline comments.May 9 2022, 9:03 AM

llvm/test/Transforms/SLPVectorizer/X86/arith-mul-load.ll
14	TTI does not report that it supports 32 bit stores.

RKSimon added inline comments.May 9 2022, 9:22 AM

llvm/test/Transforms/SLPVectorizer/X86/arith-mul-load.ll
14	We never bothered to add it - we mainly use the 64-bit vector load/store to handle f64-i64 handling on 32-bit targets

This revision was landed with ongoing or failed builds.May 9 2022, 9:49 AM

Closed by commit rG9dc4ced204d1: [SLP]Try partial store vectorization if supported by target. (authored by ABataev).

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG9dc4ced204d1: [SLP]Try partial store vectorization if supported by target..

Harbormaster completed remote builds in B163496: Diff 428098.May 9 2022, 9:58 AM

In D124284#3500259, @ABataev wrote:

In D124284#3499844, @asbirlea wrote:

In D124284#3496381, @RKSimon wrote:

@asbirlea How are you specifying the SSE/AVX level for your benchmark runs - are you running with -march=native, the x86-64-v* levels, or something else?

I believe the runs are using -target-cpu k8 and -target-cpu haswell.

The latest performance testing still shows one regression in a benchmark from singlesource, but there are many improvements that offset it in non-public benchmarks. So this latest diff is good to go from my side.

Hi, thanks for the testing. What's the name of the regressed single source test?

Shootout/sieve for xfdo configuration on Rome looks regressed by 20%, and Shootout/fib2 for opt configuration also on Rome by 10%.
Dither has one -10% (floyd 128) and one +10% (floyd 512) on Rome opt and xfdo, but the same floyd ones plus a few others are +15 to +20% on Skylake opt, thinlto and xfdo and +5% to +10% on Haswell opt, thinlto and xfdo. Net win overall.
Eigen, complex benchmark also has many configs with net wins, e.g. +10% to +30% haswell xfdo and +10 to +25% rome opt.

There are others, but I hope this gives a rough idea.

vdmitrie added a subscriber: vdmitrie.May 9 2022, 5:58 PM

vdmitrie added inline comments.

llvm/include/llvm/CodeGen/BasicTTIImpl.h
330	Hm, there is some discrepancy here. Let's assume we entered the loop with VF = 8 and target supports 8 but not 4. VF == 8 , VF >2 && IsSupportedByTarget(8) evaluates to true => VF = VF/2 VF == 4 , VF >2 && IsSupportedByTarget(4) evaluates to false hence we return 4 which is actually not supported by target. Is that what the intent was?

ABataev added inline comments.May 9 2022, 7:01 PM

llvm/include/llvm/CodeGen/BasicTTIImpl.h
330	IsSupportedByTarget performs checks for VF/2, not for VF, and only if it is supported, sets VF to VF/2.

vdmitrie added inline comments.May 9 2022, 7:04 PM

llvm/include/llvm/CodeGen/BasicTTIImpl.h
330	Ah, I overlooked that. Thanks.
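
The exchange above can be checked with a small standalone sketch of the halving loop. It is a simplification, not the patch itself: `storeMinimumVF` and `SupportsStoreVF` are hypothetical names, and the predicate here receives the already halved factor, whereas the lambda in the patch takes VF and tests VF / 2 internally.

```cpp
#include <cassert>
#include <functional>

// Hypothetical predicate: can the target store a vector of this many
// elements?
using SupportsStoreVF = std::function<bool(unsigned)>;

// Halve VF only while the *halved* factor is supported, so the returned
// value is either the initial VF or a factor the predicate accepted.
unsigned storeMinimumVF(unsigned VF, const SupportsStoreVF &Supports) {
  assert(VF != 0 && (VF & (VF - 1)) == 0 && "VF expected to be a power of two");
  while (VF > 2 && Supports(VF / 2))
    VF /= 2;
  return VF;
}
```

In the scenario from the question (8 supported, 4 not), entering with VF = 8 tests 4, fails, and returns 8 unchanged, so the loop never hands back a factor that the predicate rejected.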

xbolva00 added inline comments.May 13 2022, 5:26 AM

llvm/test/Transforms/SLPVectorizer/X86/arith-mul-load.ll
14	Do you plan to add them?

RKSimon added inline comments.May 13 2022, 5:31 AM

llvm/test/Transforms/SLPVectorizer/X86/arith-mul-load.ll
14	yes - got a few other blockers to deal with first though - that yak has to be shaved.......

dtemirbulatov added a subscriber: dtemirbulatov.May 17 2022, 5:58 AM

anna added a subscriber: anna.May 17 2022, 8:35 AM

xbolva00 added inline comments.May 29 2022, 10:13 AM

llvm/test/Transforms/SLPVectorizer/X86/arith-mul-load.ll
14	ok, thanks!

xbolva00 added inline comments.Jun 12 2022, 9:45 AM

llvm/test/Transforms/SLPVectorizer/X86/arith-mul-load.ll
14	any updates?
void pr(char* __restrict a, char* __restrict r){
  for (int i = 0; i < 4; i++){
    r[i] += a[i];
  }
}
gcc emits nicely paddb.
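
As a rough illustration of why a four-element char chain like this one is rejected, the sketch below mirrors the size check from vectorizeStoreChain in this diff. The helper names `defaultMinVF` and `passesSizeCheck` are hypothetical, and the 128-bit minimum register size used in the example is an assumed value for illustration, not something stated in this review.

```cpp
#include <cassert>

// Lower bound on the store count before the patch: minimum vector register
// size divided by the scalar size, both in bits.
unsigned defaultMinVF(unsigned MinVecRegBits, unsigned EltBits) {
  assert(EltBits != 0 && "element size must be non-zero");
  return MinVecRegBits / EltBits;
}

// Negation of the early-exit condition in vectorizeStoreChain:
//   !isPowerOf2_32(Sz) || !isPowerOf2_32(VF) || VF < 2 || VF < MinVF
bool passesSizeCheck(unsigned NumStores, unsigned EltBits, unsigned MinVF) {
  auto IsPow2 = [](unsigned X) { return X != 0 && (X & (X - 1)) == 0; };
  const unsigned VF = NumStores;
  return IsPow2(EltBits) && IsPow2(VF) && VF >= 2 && VF >= MinVF;
}
```

With 8-bit elements and the assumed 128-bit register, the default bound is 16 stores. Even if a target lowers the bound to 8 stores (a 64-bit vector store), a chain of 4 still fails the check; it only passes once 32-bit vector stores are reported as supported, which is the gap discussed in this thread.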

RKSimon mentioned this in D127604: [SLP][X86] Add 32-bit vector stores to help vectorization opportunities.Jun 12 2022, 10:31 AM

RKSimon added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/arith-mul-load.ll
14	https://reviews.llvm.org/D127604 - but I need someone to perf test the patch properly.

RKSimon mentioned this in rGe961e05d593c: [SLP][X86] Add 32-bit vector stores to help vectorization opportunities.Jun 30 2022, 12:46 PM

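Revision Contents

Path (Size)

llvm/include/llvm/Analysis/TargetTransformInfo.h (17 lines)
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (1 line)
llvm/include/llvm/CodeGen/BasicTTIImpl.h (20 lines)
llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h (2 lines)
llvm/lib/Analysis/TargetTransformInfo.cpp (5 lines)
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (18 lines)
llvm/test/Transforms/SLPVectorizer/X86/arith-add-load.ll (100 lines)
llvm/test/Transforms/SLPVectorizer/X86/arith-and-const-load.ll (82 lines)
llvm/test/Transforms/SLPVectorizer/X86/arith-mul-load.ll (100 lines)
llvm/test/Transforms/SLPVectorizer/X86/crash_7zip.ll (22 lines)
llvm/test/Transforms/SLPVectorizer/X86/crash_bullet.ll (34 lines)
llvm/test/Transforms/SLPVectorizer/X86/crash_bullet3.ll (25 lines)
llvm/test/Transforms/SLPVectorizer/X86/crash_sim4b1.ll (21 lines)
llvm/test/Transforms/SLPVectorizer/X86/fptosi-inseltpoison.ll (54 lines)
llvm/test/Transforms/SLPVectorizer/X86/fptosi.ll (54 lines)
llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll (54 lines)
llvm/test/Transforms/SLPVectorizer/X86/hadd-inseltpoison.ll (14 lines)
llvm/test/Transforms/SLPVectorizer/X86/hadd.ll (14 lines)
llvm/test/Transforms/SLPVectorizer/X86/insert-after-bundle.ll (178 lines)
llvm/test/Transforms/SLPVectorizer/X86/memory-runtime-checks.ll (22 lines)
llvm/test/Transforms/SLPVectorizer/X86/no_alternate_divrem.ll (18 lines)
llvm/test/Transforms/SLPVectorizer/X86/odd_store.ll (16 lines)
llvm/test/Transforms/SLPVectorizer/X86/pr49933.ll (62 lines)
llvm/test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll (38 lines)
llvm/test/Transforms/SLPVectorizer/X86/reorder_phi.ll (38 lines)
llvm/test/Transforms/SLPVectorizer/X86/saxpy.ll (14 lines)
llvm/test/Transforms/SLPVectorizer/X86/schedule-bundle.ll (12 lines)
llvm/test/Transforms/SLPVectorizer/X86/simple-loop.ll (48 lines)
llvm/test/Transforms/SLPVectorizer/X86/sitofp-inseltpoison.ll (37 lines)
llvm/test/Transforms/SLPVectorizer/X86/sitofp.ll (37 lines)
llvm/test/Transforms/SLPVectorizer/X86/uitofp.ll (46 lines)
llvm/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll (266 lines)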

Diff 428116

llvm/include/llvm/Analysis/TargetTransformInfo.h

@@ ... @@ public:
   /// If IsScalable is true, the returned ElementCount must be a scalable VF.
   ElementCount getMinimumVF(unsigned ElemWidth, bool IsScalable) const;
 
   /// \return The maximum vectorization factor for types of given element
   /// bit width and opcode, or 0 if there is no maximum VF.
   /// Currently only used by the SLP vectorizer.
   unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const;
 
+  /// \return The minimum vectorization factor for the store instruction. Given
+  /// the initial estimation of the minimum vector factor and store value type,
+  /// it tries to find possible lowest VF, which still might be profitable for
+  /// the vectorization.
+  /// \param VF Initial estimation of the minimum vector factor.
+  /// \param ScalarMemTy Scalar memory type of the store operation.
+  /// \param ScalarValTy Scalar type of the stored value.
+  /// Currently only used by the SLP vectorizer.
+  unsigned getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,
+                             Type *ScalarValTy) const;
+
   /// \return True if it should be considered for address type promotion.
   /// \p AllowPromotionWithoutCommonHeader Set true if promoting \p I is
   /// profitable without finding other extensions fed by the same input.
   bool shouldConsiderAddressTypePromotion(
       const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const;
 
   /// \return The size of a cache line in bytes.
   unsigned getCacheLineSize() const;
@@ ... @@ bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes, Align Alignment,
                                     unsigned AddrSpace) const;
 
   /// \returns True if it is legal to vectorize the given reduction kind.
   bool isLegalToVectorizeReduction(const RecurrenceDescriptor &RdxDesc,
                                    ElementCount VF) const;
 
   /// \returns True if the given type is supported for scalable vectors
   bool isElementTypeLegalForScalableVector(Type *Ty) const;
 
Inline comment (dmgreen, Not Done): I don't think that exposing isLegalOrCustom to the midend is the right way to go - I feel it sets a bad precedent.

I don't think that "Custom" means enough to base mid-end optimizations on. It can mean anything from "this can be custom lowered to a single instruction", to "this can _sometimes_ be custom lowered to a single instruction, in specific situations, otherwise it will expand", to "this has to be custom expanded into 150 instructions". The variance between them is just too large. It also created a dependency between the mid-end and SDAG ISel lowering that isn't good to introduce - considering that there are other ISel's like Global ISel, there might be a point in the future where SDAG is entirely unused in certain backends.

From what I can tell (correct me if I'm wrong), what you want to add for this specific patch is a way to override/ignore getMinVectorRegisterBitWidth for stores that the target can efficiently handle. But you don't just want to change getMinVectorRegisterBitWidth? Can we add a method for doing that? `shouldOverrideMinStoreVectorRegisterBitwidth(Type Ty)`. The default implementation can still be the same as the current BasicTTI::isLegalOrCustomInstruction method, but it allows the target to override it if desired, and doesn't expose LegalOrCustom to the midend. Which I think is better in the long run.

Inline reply (ABataev, author, Done): I'll try to invent something better.

   /// \returns The new vector factor value if the target doesn't support \p
   /// SizeInBytes loads or has a better vector factor.
   unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
                                unsigned ChainSizeInBytes,
                                VectorType *VecTy) const;
 
   /// \returns The new vector factor value if the target doesn't support \p
   /// SizeInBytes stores or has a better vector factor.
@@ ... @@ public:
   virtual TypeSize getRegisterBitWidth(RegisterKind K) const = 0;
   virtual unsigned getMinVectorRegisterBitWidth() const = 0;
   virtual Optional<unsigned> getMaxVScale() const = 0;
   virtual Optional<unsigned> getVScaleForTuning() const = 0;
   virtual bool shouldMaximizeVectorBandwidth() const = 0;
   virtual ElementCount getMinimumVF(unsigned ElemWidth,
                                     bool IsScalable) const = 0;
   virtual unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const = 0;
+  virtual unsigned getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,
+                                     Type *ScalarValTy) const = 0;
   virtual bool shouldConsiderAddressTypePromotion(
       const Instruction &I, bool &AllowPromotionWithoutCommonHeader) = 0;
   virtual unsigned getCacheLineSize() const = 0;
   virtual Optional<unsigned> getCacheSize(CacheLevel Level) const = 0;
   virtual Optional<unsigned> getCacheAssociativity(CacheLevel Level) const = 0;
 
   /// \return How much before a load we should place the prefetch
   /// instruction. This is currently measured in number of
@@ ... @@ public:
   }
   ElementCount getMinimumVF(unsigned ElemWidth,
                             bool IsScalable) const override {
     return Impl.getMinimumVF(ElemWidth, IsScalable);
   }
   unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const override {
     return Impl.getMaximumVF(ElemWidth, Opcode);
   }
+  unsigned getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,
+                             Type *ScalarValTy) const override {
+    return Impl.getStoreMinimumVF(VF, ScalarMemTy, ScalarValTy);
+  }
   bool shouldConsiderAddressTypePromotion(
       const Instruction &I, bool &AllowPromotionWithoutCommonHeader) override {
     return Impl.shouldConsiderAddressTypePromotion(
         I, AllowPromotionWithoutCommonHeader);
   }
   unsigned getCacheLineSize() const override { return Impl.getCacheLineSize(); }
   Optional<unsigned> getCacheSize(CacheLevel Level) const override {
     return Impl.getCacheSize(Level);

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ ... @@ public:
 
   bool shouldMaximizeVectorBandwidth() const { return false; }
 
   ElementCount getMinimumVF(unsigned ElemWidth, bool IsScalable) const {
     return ElementCount::get(0, IsScalable);
   }
 
   unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const { return 0; }
+  unsigned getStoreMinimumVF(unsigned VF, Type *, Type *) const { return VF; }
 
   bool shouldConsiderAddressTypePromotion(
       const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const {
     AllowPromotionWithoutCommonHeader = false;
     return false;
   }
 
   unsigned getCacheLineSize() const { return 0; }

llvm/include/llvm/CodeGen/BasicTTIImpl.h

@@ ... @@ bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
     TargetLoweringBase::AddrMode AM;
     AM.BaseGV = BaseGV;
     AM.BaseOffs = BaseOffset;
     AM.HasBaseReg = HasBaseReg;
     AM.Scale = Scale;
     return getTLI()->isLegalAddressingMode(DL, AM, Ty, AddrSpace, I);
   }
 
+  unsigned getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,
+                             Type *ScalarValTy) const {
+    auto &&IsSupportedByTarget = [this, ScalarMemTy, ScalarValTy](unsigned VF) {
+      auto *SrcTy = FixedVectorType::get(ScalarMemTy, VF / 2);
Inline comment (dmgreen, Not Done): Should this be getTypeLegalizationCost or getValueType? Otherwise we are asking for the isOperationLegalOrCustom on a legal type (LT.second) below, which you would hope was always Legal and won't really tell you much about how legal a store to Ty is.
+      EVT VT = getTLI()->getValueType(DL, SrcTy);
+      if (getTLI()->isOperationLegal(ISD::STORE, VT) ||
+          getTLI()->isOperationCustom(ISD::STORE, VT))
+        return true;
+
+      EVT ValVT =
+          getTLI()->getValueType(DL, FixedVectorType::get(ScalarValTy, VF / 2));
+      EVT LegalizedVT =
+          getTLI()->getTypeToTransformTo(ScalarMemTy->getContext(), VT);
+      return getTLI()->isTruncStoreLegal(LegalizedVT, ValVT);
+    };
+    while (VF > 2 && IsSupportedByTarget(VF))
Inline comment (vdmitrie, Not Done): Hm, there is some discrepancy here. Let's assume we entered the loop with VF = 8 and target supports 8 but not 4. VF == 8 , VF >2 && IsSupportedByTarget(8) evaluates to true => VF = VF/2 VF == 4 , VF >2 && IsSupportedByTarget(4) evaluates to false hence we return 4 which is actually not supported by target. Is that what the intent was?
Inline reply (ABataev, author, Done): IsSupportedByTarget performs checks for VF/2, not for VF, and only if it is supported, sets VF to VF/2.
Inline reply (vdmitrie, Not Done): Ah, I overlooked that. Thanks.
+      VF /= 2;
+    return VF;
+  }
+
   bool isIndexedLoadLegal(TTI::MemIndexedMode M, Type *Ty,
                           const DataLayout &DL) const {
     EVT VT = getTLI()->getValueType(DL, Ty);
     return getTLI()->isIndexedLoadLegal(getISDIndexedMode(M), VT);
   }
 
   bool isIndexedStoreLegal(TTI::MemIndexedMode M, Type *Ty,
                            const DataLayout &DL) const {

llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h

@@ ... @@ bool vectorizeSimpleInstructions(SmallVectorImpl<Instruction *> &Instructions,
                                    BasicBlock *BB, slpvectorizer::BoUpSLP &R,
                                    bool AtTerminator);
 
   /// Scan the basic block and look for patterns that are likely to start
   /// a vectorization chain.
   bool vectorizeChainsInBlock(BasicBlock *BB, slpvectorizer::BoUpSLP &R);
 
   bool vectorizeStoreChain(ArrayRef<Value *> Chain, slpvectorizer::BoUpSLP &R,
-                           unsigned Idx);
+                           unsigned Idx, unsigned MinVF);
 
   bool vectorizeStores(ArrayRef<StoreInst *> Stores, slpvectorizer::BoUpSLP &R);
 
   /// The store instructions in a basic block organized by base pointer.
   StoreListMap Stores;
 
   /// The getelementptr instructions in a basic block organized by base pointer.
   GEPListMap GEPs;
 };
 
 } // end namespace llvm
 
 #endif // LLVM_TRANSFORMS_VECTORIZE_SLPVECTORIZER_H

llvm/lib/Analysis/TargetTransformInfo.cpp

@@ ... @@ ElementCount TargetTransformInfo::getMinimumVF(unsigned ElemWidth,
   return TTIImpl->getMinimumVF(ElemWidth, IsScalable);
 }
 
 unsigned TargetTransformInfo::getMaximumVF(unsigned ElemWidth,
                                            unsigned Opcode) const {
   return TTIImpl->getMaximumVF(ElemWidth, Opcode);
 }
 
+unsigned TargetTransformInfo::getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,
+                                                Type *ScalarValTy) const {
+  return TTIImpl->getStoreMinimumVF(VF, ScalarMemTy, ScalarValTy);
+}
+
 bool TargetTransformInfo::shouldConsiderAddressTypePromotion(
     const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const {
   return TTIImpl->shouldConsiderAddressTypePromotion(
       I, AllowPromotionWithoutCommonHeader);
 }
 
 unsigned TargetTransformInfo::getCacheLineSize() const {
   return TTIImpl->getCacheLineSize();

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 8,731 Lines • ▼ Show 20 Lines	#endif
// Avoid duplicate scheduling of the block.		// Avoid duplicate scheduling of the block.
BS->ScheduleStart = nullptr;		BS->ScheduleStart = nullptr;
}		}

unsigned BoUpSLP::getVectorElementSize(Value *V) {		unsigned BoUpSLP::getVectorElementSize(Value *V) {
// If V is a store, just return the width of the stored value (or value		// If V is a store, just return the width of the stored value (or value
// truncated just before storing) without traversing the expression tree.		// truncated just before storing) without traversing the expression tree.
// This is the common case.		// This is the common case.
if (auto *Store = dyn_cast<StoreInst>(V)) {		if (auto *Store = dyn_cast<StoreInst>(V))
if (auto *Trunc = dyn_cast<TruncInst>(Store->getValueOperand()))
return DL->getTypeSizeInBits(Trunc->getSrcTy());
return DL->getTypeSizeInBits(Store->getValueOperand()->getType());		return DL->getTypeSizeInBits(Store->getValueOperand()->getType());
}

if (auto *IEI = dyn_cast<InsertElementInst>(V))		if (auto *IEI = dyn_cast<InsertElementInst>(V))
return getVectorElementSize(IEI->getOperand(1));		return getVectorElementSize(IEI->getOperand(1));

auto E = InstrElementSize.find(V);		auto E = InstrElementSize.find(V);
if (E != InstrElementSize.end())		if (E != InstrElementSize.end())
return E->second;		return E->second;

[... 417 lines not shown ...]	bool SLPVectorizerPass::runImpl(Function &F, ScalarEvolution *SE_,
if (Changed) {		if (Changed) {
R.optimizeGatherSequence();		R.optimizeGatherSequence();
LLVM_DEBUG(dbgs() << "SLP: vectorized \"" << F.getName() << "\"\n");		LLVM_DEBUG(dbgs() << "SLP: vectorized \"" << F.getName() << "\"\n");
}		}
return Changed;		return Changed;
}		}

bool SLPVectorizerPass::vectorizeStoreChain(ArrayRef<Value *> Chain, BoUpSLP &R,		bool SLPVectorizerPass::vectorizeStoreChain(ArrayRef<Value *> Chain, BoUpSLP &R,
unsigned Idx) {		unsigned Idx, unsigned MinVF) {
LLVM_DEBUG(dbgs() << "SLP: Analyzing a store chain of length " << Chain.size()		LLVM_DEBUG(dbgs() << "SLP: Analyzing a store chain of length " << Chain.size()
<< "\n");		<< "\n");
const unsigned Sz = R.getVectorElementSize(Chain[0]);		const unsigned Sz = R.getVectorElementSize(Chain[0]);
const unsigned MinVF = R.getMinVecRegSize() / Sz;
unsigned VF = Chain.size();		unsigned VF = Chain.size();

if (!isPowerOf2_32(Sz) || !isPowerOf2_32(VF) || VF < 2 || VF < MinVF)		if (!isPowerOf2_32(Sz) || !isPowerOf2_32(VF) || VF < 2 || VF < MinVF)
return false;		return false;

LLVM_DEBUG(dbgs() << "SLP: Analyzing " << VF << " stores at offset " << Idx		LLVM_DEBUG(dbgs() << "SLP: Analyzing " << VF << " stores at offset " << Idx
<< "\n");		<< "\n");

[... 122 lines not shown ...]	while (I != E && !VectorizedStores.count(Stores[I])) {
I = ConsecutiveChain[I].first;		I = ConsecutiveChain[I].first;
}		}
assert(!Operands.empty() && "Expected non-empty list of stores.");		assert(!Operands.empty() && "Expected non-empty list of stores.");

unsigned MaxVecRegSize = R.getMaxVecRegSize();		unsigned MaxVecRegSize = R.getMaxVecRegSize();
unsigned EltSize = R.getVectorElementSize(Operands[0]);		unsigned EltSize = R.getVectorElementSize(Operands[0]);
unsigned MaxElts = llvm::PowerOf2Floor(MaxVecRegSize / EltSize);		unsigned MaxElts = llvm::PowerOf2Floor(MaxVecRegSize / EltSize);

unsigned MinVF = R.getMinVF(EltSize);
unsigned MaxVF = std::min(R.getMaximumVF(EltSize, Instruction::Store),		unsigned MaxVF = std::min(R.getMaximumVF(EltSize, Instruction::Store),
MaxElts);		MaxElts);
		auto *Store = cast<StoreInst>(Operands[0]);
		Type *StoreTy = Store->getValueOperand()->getType();
		Type *ValueTy = StoreTy;
		if (auto *Trunc = dyn_cast<TruncInst>(Store->getValueOperand()))
		ValueTy = Trunc->getSrcTy();
		unsigned MinVF = TTI->getStoreMinimumVF(
		R.getMinVF(DL->getTypeSizeInBits(ValueTy)), StoreTy, ValueTy);

// FIXME: Is division-by-2 the correct step? Should we assert that the		// FIXME: Is division-by-2 the correct step? Should we assert that the
// register size is a power-of-2?		// register size is a power-of-2?
unsigned StartIdx = 0;		unsigned StartIdx = 0;
for (unsigned Size = MaxVF; Size >= MinVF; Size /= 2) {		for (unsigned Size = MaxVF; Size >= MinVF; Size /= 2) {
for (unsigned Cnt = StartIdx, E = Operands.size(); Cnt + Size <= E;) {		for (unsigned Cnt = StartIdx, E = Operands.size(); Cnt + Size <= E;) {
ArrayRef<Value *> Slice = makeArrayRef(Operands).slice(Cnt, Size);		ArrayRef<Value *> Slice = makeArrayRef(Operands).slice(Cnt, Size);
if (!VectorizedStores.count(Slice.front()) &&		if (!VectorizedStores.count(Slice.front()) &&
!VectorizedStores.count(Slice.back()) &&		!VectorizedStores.count(Slice.back()) &&
vectorizeStoreChain(Slice, R, Cnt)) {		vectorizeStoreChain(Slice, R, Cnt, MinVF)) {
// Mark the vectorized stores so that we don't vectorize them again.		// Mark the vectorized stores so that we don't vectorize them again.
VectorizedStores.insert(Slice.begin(), Slice.end());		VectorizedStores.insert(Slice.begin(), Slice.end());
Changed = true;		Changed = true;
// If we vectorized initial block, no need to try to vectorize it		// If we vectorized initial block, no need to try to vectorize it
// again.		// again.
if (Cnt == StartIdx)		if (Cnt == StartIdx)
StartIdx += Size;		StartIdx += Size;
Cnt += Size;		Cnt += Size;
[... 2,188 lines not shown ...]
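
The slice search in this hunk walks candidate sizes from `MaxVF` down to `MinVF`, halving each round, and slides a window over the store operands. The sketch below mirrors that control flow in a self-contained form. It is an approximation under stated assumptions: `findVectorizedSlices` and `tryVectorize` are hypothetical names, the callback stands in for `vectorizeStoreChain`, and the early exit after the inner loop is assumed because that part of the function is not visible in the diff.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the search loop: tries slices of consecutive
// stores, from `maxVF` elements down to `minVF`, halving the size each round.
// `tryVectorize(begin, size)` is an assumed callback that returns true when
// the slice [begin, begin + size) was vectorized.
std::vector<std::pair<unsigned, unsigned>> findVectorizedSlices(
    unsigned numStores, unsigned maxVF, unsigned minVF,
    const std::function<bool(unsigned, unsigned)> &tryVectorize) {
  std::vector<std::pair<unsigned, unsigned>> result; // (offset, size) pairs
  if (minVF == 0)
    return result; // guard: halving would never drop below a zero minimum
  std::vector<bool> done(numStores, false); // stores already vectorized
  unsigned startIdx = 0;
  for (unsigned size = maxVF; size >= minVF; size /= 2) {
    for (unsigned cnt = startIdx; cnt + size <= numStores;) {
      if (!done[cnt] && !done[cnt + size - 1] && tryVectorize(cnt, size)) {
        // Mark the slice so later rounds do not vectorize it again.
        for (unsigned i = cnt; i < cnt + size; ++i)
          done[i] = true;
        result.emplace_back(cnt, size);
        // If the leading block was vectorized, skip it in later rounds.
        if (cnt == startIdx)
          startIdx += size;
        cnt += size;
        continue;
      }
      ++cnt;
    }
    // Assumed early exit once every store has been covered.
    if (startIdx >= numStores)
      break;
  }
  return result;
}
```

With 8 stores, `maxVF = 16` and `minVF = 16`, no slice is ever tried, which is the situation before this change for `i8` stores and a 128-bit minimum register. Lowering `minVF` to 8 lets the single slice at offset 0 be attempted.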

llvm/test/Transforms/SLPVectorizer/X86/arith-add-load.ll

[... 87 lines not shown ...]	entry:
%add.3 = add i8 %7, %6		%add.3 = add i8 %7, %6
store i8 %add.3, ptr %arrayidx2.3, align 1		store i8 %add.3, ptr %arrayidx2.3, align 1
ret void		ret void
}		}

define void @add8(ptr noalias nocapture noundef %r, ptr noalias nocapture noundef readonly %a) {		define void @add8(ptr noalias nocapture noundef %r, ptr noalias nocapture noundef readonly %a) {
; SSE-LABEL: @add8(		; SSE-LABEL: @add8(
; SSE-NEXT: entry:		; SSE-NEXT: entry:
; SSE-NEXT: [[TMP0:%.*]] = load i8, ptr [[A:%.*]], align 1		; SSE-NEXT: [[TMP0:%.*]] = load <8 x i8>, ptr [[A:%.*]], align 1
; SSE-NEXT: [[TMP1:%.*]] = load i8, ptr [[R:%.*]], align 1		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i8>, ptr [[R:%.*]], align 1
; SSE-NEXT: [[ADD:%.*]] = add i8 [[TMP1]], [[TMP0]]		; SSE-NEXT: [[TMP2:%.*]] = add <8 x i8> [[TMP1]], [[TMP0]]
; SSE-NEXT: store i8 [[ADD]], ptr [[R]], align 1		; SSE-NEXT: store <8 x i8> [[TMP2]], ptr [[R]], align 1
; SSE-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 1
; SSE-NEXT: [[TMP2:%.*]] = load i8, ptr [[ARRAYIDX_1]], align 1
; SSE-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 1
; SSE-NEXT: [[TMP3:%.*]] = load i8, ptr [[ARRAYIDX2_1]], align 1
; SSE-NEXT: [[ADD_1:%.*]] = add i8 [[TMP3]], [[TMP2]]
; SSE-NEXT: store i8 [[ADD_1]], ptr [[ARRAYIDX2_1]], align 1
; SSE-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 2
; SSE-NEXT: [[TMP4:%.*]] = load i8, ptr [[ARRAYIDX_2]], align 1
; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 2
; SSE-NEXT: [[TMP5:%.*]] = load i8, ptr [[ARRAYIDX2_2]], align 1
; SSE-NEXT: [[ADD_2:%.*]] = add i8 [[TMP5]], [[TMP4]]
; SSE-NEXT: store i8 [[ADD_2]], ptr [[ARRAYIDX2_2]], align 1
; SSE-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 3
; SSE-NEXT: [[TMP6:%.*]] = load i8, ptr [[ARRAYIDX_3]], align 1
; SSE-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 3
; SSE-NEXT: [[TMP7:%.*]] = load i8, ptr [[ARRAYIDX2_3]], align 1
; SSE-NEXT: [[ADD_3:%.*]] = add i8 [[TMP7]], [[TMP6]]
; SSE-NEXT: store i8 [[ADD_3]], ptr [[ARRAYIDX2_3]], align 1
; SSE-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 4
; SSE-NEXT: [[TMP8:%.*]] = load i8, ptr [[ARRAYIDX_4]], align 1
; SSE-NEXT: [[ARRAYIDX2_4:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 4
; SSE-NEXT: [[TMP9:%.*]] = load i8, ptr [[ARRAYIDX2_4]], align 1
; SSE-NEXT: [[ADD_4:%.*]] = add i8 [[TMP9]], [[TMP8]]
; SSE-NEXT: store i8 [[ADD_4]], ptr [[ARRAYIDX2_4]], align 1
; SSE-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 5
; SSE-NEXT: [[TMP10:%.*]] = load i8, ptr [[ARRAYIDX_5]], align 1
; SSE-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 5
; SSE-NEXT: [[TMP11:%.*]] = load i8, ptr [[ARRAYIDX2_5]], align 1
; SSE-NEXT: [[ADD_5:%.*]] = add i8 [[TMP11]], [[TMP10]]
; SSE-NEXT: store i8 [[ADD_5]], ptr [[ARRAYIDX2_5]], align 1
; SSE-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 6
; SSE-NEXT: [[TMP12:%.*]] = load i8, ptr [[ARRAYIDX_6]], align 1
; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 6
; SSE-NEXT: [[TMP13:%.*]] = load i8, ptr [[ARRAYIDX2_6]], align 1
; SSE-NEXT: [[ADD_6:%.*]] = add i8 [[TMP13]], [[TMP12]]
; SSE-NEXT: store i8 [[ADD_6]], ptr [[ARRAYIDX2_6]], align 1
; SSE-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 7
; SSE-NEXT: [[TMP14:%.*]] = load i8, ptr [[ARRAYIDX_7]], align 1
; SSE-NEXT: [[ARRAYIDX2_7:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 7
; SSE-NEXT: [[TMP15:%.*]] = load i8, ptr [[ARRAYIDX2_7]], align 1
; SSE-NEXT: [[ADD_7:%.*]] = add i8 [[TMP15]], [[TMP14]]
; SSE-NEXT: store i8 [[ADD_7]], ptr [[ARRAYIDX2_7]], align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @add8(		; AVX-LABEL: @add8(
; AVX-NEXT: entry:		; AVX-NEXT: entry:
; AVX-NEXT: [[TMP0:%.*]] = load i8, ptr [[A:%.*]], align 1		; AVX-NEXT: [[TMP0:%.*]] = load <8 x i8>, ptr [[A:%.*]], align 1
; AVX-NEXT: [[TMP1:%.*]] = load i8, ptr [[R:%.*]], align 1		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i8>, ptr [[R:%.*]], align 1
; AVX-NEXT: [[ADD:%.*]] = add i8 [[TMP1]], [[TMP0]]		; AVX-NEXT: [[TMP2:%.*]] = add <8 x i8> [[TMP1]], [[TMP0]]
; AVX-NEXT: store i8 [[ADD]], ptr [[R]], align 1		; AVX-NEXT: store <8 x i8> [[TMP2]], ptr [[R]], align 1
; AVX-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 1
; AVX-NEXT: [[TMP2:%.*]] = load i8, ptr [[ARRAYIDX_1]], align 1
; AVX-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 1
; AVX-NEXT: [[TMP3:%.*]] = load i8, ptr [[ARRAYIDX2_1]], align 1
; AVX-NEXT: [[ADD_1:%.*]] = add i8 [[TMP3]], [[TMP2]]
; AVX-NEXT: store i8 [[ADD_1]], ptr [[ARRAYIDX2_1]], align 1
; AVX-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 2
; AVX-NEXT: [[TMP4:%.*]] = load i8, ptr [[ARRAYIDX_2]], align 1
; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 2
; AVX-NEXT: [[TMP5:%.*]] = load i8, ptr [[ARRAYIDX2_2]], align 1
; AVX-NEXT: [[ADD_2:%.*]] = add i8 [[TMP5]], [[TMP4]]
; AVX-NEXT: store i8 [[ADD_2]], ptr [[ARRAYIDX2_2]], align 1
; AVX-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 3
; AVX-NEXT: [[TMP6:%.*]] = load i8, ptr [[ARRAYIDX_3]], align 1
; AVX-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 3
; AVX-NEXT: [[TMP7:%.*]] = load i8, ptr [[ARRAYIDX2_3]], align 1
; AVX-NEXT: [[ADD_3:%.*]] = add i8 [[TMP7]], [[TMP6]]
; AVX-NEXT: store i8 [[ADD_3]], ptr [[ARRAYIDX2_3]], align 1
; AVX-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 4
; AVX-NEXT: [[TMP8:%.*]] = load i8, ptr [[ARRAYIDX_4]], align 1
; AVX-NEXT: [[ARRAYIDX2_4:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 4
; AVX-NEXT: [[TMP9:%.*]] = load i8, ptr [[ARRAYIDX2_4]], align 1
; AVX-NEXT: [[ADD_4:%.*]] = add i8 [[TMP9]], [[TMP8]]
; AVX-NEXT: store i8 [[ADD_4]], ptr [[ARRAYIDX2_4]], align 1
; AVX-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 5
; AVX-NEXT: [[TMP10:%.*]] = load i8, ptr [[ARRAYIDX_5]], align 1
; AVX-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 5
; AVX-NEXT: [[TMP11:%.*]] = load i8, ptr [[ARRAYIDX2_5]], align 1
; AVX-NEXT: [[ADD_5:%.*]] = add i8 [[TMP11]], [[TMP10]]
; AVX-NEXT: store i8 [[ADD_5]], ptr [[ARRAYIDX2_5]], align 1
; AVX-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 6
; AVX-NEXT: [[TMP12:%.*]] = load i8, ptr [[ARRAYIDX_6]], align 1
; AVX-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 6
; AVX-NEXT: [[TMP13:%.*]] = load i8, ptr [[ARRAYIDX2_6]], align 1
; AVX-NEXT: [[ADD_6:%.*]] = add i8 [[TMP13]], [[TMP12]]
; AVX-NEXT: store i8 [[ADD_6]], ptr [[ARRAYIDX2_6]], align 1
; AVX-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 7
; AVX-NEXT: [[TMP14:%.*]] = load i8, ptr [[ARRAYIDX_7]], align 1
; AVX-NEXT: [[ARRAYIDX2_7:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 7
; AVX-NEXT: [[TMP15:%.*]] = load i8, ptr [[ARRAYIDX2_7]], align 1
; AVX-NEXT: [[ADD_7:%.*]] = add i8 [[TMP15]], [[TMP14]]
; AVX-NEXT: store i8 [[ADD_7]], ptr [[ARRAYIDX2_7]], align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
entry:		entry:
%0 = load i8, ptr %a, align 1		%0 = load i8, ptr %a, align 1
%1 = load i8, ptr %r, align 1		%1 = load i8, ptr %r, align 1
%add = add i8 %1, %0		%add = add i8 %1, %0
store i8 %add, ptr %r, align 1		store i8 %add, ptr %r, align 1
%arrayidx.1 = getelementptr inbounds i8, ptr %a, i64 1		%arrayidx.1 = getelementptr inbounds i8, ptr %a, i64 1
[... 375 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/arith-and-const-load.ll

[... 75 lines not shown ...]	entry:
%arrayidx3.3 = getelementptr inbounds i8, ptr %dst, i64 3		%arrayidx3.3 = getelementptr inbounds i8, ptr %dst, i64 3
store i8 %7, ptr %arrayidx3.3, align 1		store i8 %7, ptr %arrayidx3.3, align 1
ret void		ret void
}		}

define void @and8(ptr noalias nocapture noundef writeonly %dst, ptr noalias nocapture noundef readonly %src) {		define void @and8(ptr noalias nocapture noundef writeonly %dst, ptr noalias nocapture noundef readonly %src) {
; SSE-LABEL: @and8(		; SSE-LABEL: @and8(
; SSE-NEXT: entry:		; SSE-NEXT: entry:
; SSE-NEXT: [[TMP0:%.*]] = load i8, ptr [[SRC:%.*]], align 1		; SSE-NEXT: [[TMP0:%.*]] = load <8 x i8>, ptr [[SRC:%.*]], align 1
; SSE-NEXT: [[TMP1:%.*]] = and i8 [[TMP0]], -64		; SSE-NEXT: [[TMP1:%.*]] = and <8 x i8> [[TMP0]], <i8 -64, i8 -64, i8 -64, i8 -64, i8 -64, i8 -64, i8 -64, i8 -64>
; SSE-NEXT: store i8 [[TMP1]], ptr [[DST:%.*]], align 1		; SSE-NEXT: store <8 x i8> [[TMP1]], ptr [[DST:%.*]], align 1
; SSE-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 1
; SSE-NEXT: [[TMP2:%.*]] = load i8, ptr [[ARRAYIDX_1]], align 1
; SSE-NEXT: [[TMP3:%.*]] = and i8 [[TMP2]], -64
; SSE-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 1
; SSE-NEXT: store i8 [[TMP3]], ptr [[ARRAYIDX3_1]], align 1
; SSE-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 2
; SSE-NEXT: [[TMP4:%.*]] = load i8, ptr [[ARRAYIDX_2]], align 1
; SSE-NEXT: [[TMP5:%.*]] = and i8 [[TMP4]], -64
; SSE-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 2
; SSE-NEXT: store i8 [[TMP5]], ptr [[ARRAYIDX3_2]], align 1
; SSE-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 3
; SSE-NEXT: [[TMP6:%.*]] = load i8, ptr [[ARRAYIDX_3]], align 1
; SSE-NEXT: [[TMP7:%.*]] = and i8 [[TMP6]], -64
; SSE-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 3
; SSE-NEXT: store i8 [[TMP7]], ptr [[ARRAYIDX3_3]], align 1
; SSE-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 4
; SSE-NEXT: [[TMP8:%.*]] = load i8, ptr [[ARRAYIDX_4]], align 1
; SSE-NEXT: [[TMP9:%.*]] = and i8 [[TMP8]], -64
; SSE-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 4
; SSE-NEXT: store i8 [[TMP9]], ptr [[ARRAYIDX3_4]], align 1
; SSE-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 5
; SSE-NEXT: [[TMP10:%.*]] = load i8, ptr [[ARRAYIDX_5]], align 1
; SSE-NEXT: [[TMP11:%.*]] = and i8 [[TMP10]], -64
; SSE-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 5
; SSE-NEXT: store i8 [[TMP11]], ptr [[ARRAYIDX3_5]], align 1
; SSE-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 6
; SSE-NEXT: [[TMP12:%.*]] = load i8, ptr [[ARRAYIDX_6]], align 1
; SSE-NEXT: [[TMP13:%.*]] = and i8 [[TMP12]], -64
; SSE-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 6
; SSE-NEXT: store i8 [[TMP13]], ptr [[ARRAYIDX3_6]], align 1
; SSE-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 7
; SSE-NEXT: [[TMP14:%.*]] = load i8, ptr [[ARRAYIDX_7]], align 1
; SSE-NEXT: [[TMP15:%.*]] = and i8 [[TMP14]], -64
; SSE-NEXT: [[ARRAYIDX3_7:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 7
; SSE-NEXT: store i8 [[TMP15]], ptr [[ARRAYIDX3_7]], align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @and8(		; AVX-LABEL: @and8(
; AVX-NEXT: entry:		; AVX-NEXT: entry:
; AVX-NEXT: [[TMP0:%.*]] = load i8, ptr [[SRC:%.*]], align 1		; AVX-NEXT: [[TMP0:%.*]] = load <8 x i8>, ptr [[SRC:%.*]], align 1
; AVX-NEXT: [[TMP1:%.*]] = and i8 [[TMP0]], -64		; AVX-NEXT: [[TMP1:%.*]] = and <8 x i8> [[TMP0]], <i8 -64, i8 -64, i8 -64, i8 -64, i8 -64, i8 -64, i8 -64, i8 -64>
; AVX-NEXT: store i8 [[TMP1]], ptr [[DST:%.*]], align 1		; AVX-NEXT: store <8 x i8> [[TMP1]], ptr [[DST:%.*]], align 1
; AVX-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 1
; AVX-NEXT: [[TMP2:%.*]] = load i8, ptr [[ARRAYIDX_1]], align 1
; AVX-NEXT: [[TMP3:%.*]] = and i8 [[TMP2]], -64
; AVX-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 1
; AVX-NEXT: store i8 [[TMP3]], ptr [[ARRAYIDX3_1]], align 1
; AVX-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 2
; AVX-NEXT: [[TMP4:%.*]] = load i8, ptr [[ARRAYIDX_2]], align 1
; AVX-NEXT: [[TMP5:%.*]] = and i8 [[TMP4]], -64
; AVX-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 2
; AVX-NEXT: store i8 [[TMP5]], ptr [[ARRAYIDX3_2]], align 1
; AVX-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 3
; AVX-NEXT: [[TMP6:%.*]] = load i8, ptr [[ARRAYIDX_3]], align 1
; AVX-NEXT: [[TMP7:%.*]] = and i8 [[TMP6]], -64
; AVX-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 3
; AVX-NEXT: store i8 [[TMP7]], ptr [[ARRAYIDX3_3]], align 1
; AVX-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 4
; AVX-NEXT: [[TMP8:%.*]] = load i8, ptr [[ARRAYIDX_4]], align 1
; AVX-NEXT: [[TMP9:%.*]] = and i8 [[TMP8]], -64
; AVX-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 4
; AVX-NEXT: store i8 [[TMP9]], ptr [[ARRAYIDX3_4]], align 1
; AVX-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 5
; AVX-NEXT: [[TMP10:%.*]] = load i8, ptr [[ARRAYIDX_5]], align 1
; AVX-NEXT: [[TMP11:%.*]] = and i8 [[TMP10]], -64
; AVX-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 5
; AVX-NEXT: store i8 [[TMP11]], ptr [[ARRAYIDX3_5]], align 1
; AVX-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 6
; AVX-NEXT: [[TMP12:%.*]] = load i8, ptr [[ARRAYIDX_6]], align 1
; AVX-NEXT: [[TMP13:%.*]] = and i8 [[TMP12]], -64
; AVX-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 6
; AVX-NEXT: store i8 [[TMP13]], ptr [[ARRAYIDX3_6]], align 1
; AVX-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 7
; AVX-NEXT: [[TMP14:%.*]] = load i8, ptr [[ARRAYIDX_7]], align 1
; AVX-NEXT: [[TMP15:%.*]] = and i8 [[TMP14]], -64
; AVX-NEXT: [[ARRAYIDX3_7:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 7
; AVX-NEXT: store i8 [[TMP15]], ptr [[ARRAYIDX3_7]], align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
entry:		entry:
%0 = load i8, ptr %src, align 1		%0 = load i8, ptr %src, align 1
%1 = and i8 %0, -64		%1 = and i8 %0, -64
store i8 %1, ptr %dst, align 1		store i8 %1, ptr %dst, align 1
%arrayidx.1 = getelementptr inbounds i8, ptr %src, i64 1		%arrayidx.1 = getelementptr inbounds i8, ptr %src, i64 1
%2 = load i8, ptr %arrayidx.1, align 1		%2 = load i8, ptr %arrayidx.1, align 1
[... 314 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/arith-mul-load.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=SSE		; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=SSE
; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64-v2 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=SSE		; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64-v2 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=SSE
; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64-v3 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX		; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64-v3 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX
; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64-v4 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX		; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64-v4 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX

; // PR47491		; // PR47491
; void pr(char* r, char* a){		; void pr(char* r, char* a){
; for (int i = 0; i < 8; i++){		; for (int i = 0; i < 8; i++){
; r[i] *= a[i];		; r[i] *= a[i];
; }		; }
; }		; }

define void @add4(ptr noalias nocapture noundef %r, ptr noalias nocapture noundef readonly %a) {		define void @add4(ptr noalias nocapture noundef %r, ptr noalias nocapture noundef readonly %a) {
		RKSimon: I'll investigate adding 32-bit vector load/store handling as well (it has the same costs as the codegen for 64-bit anyhow).
		ABataev (Author): TTI does not report that it supports 32 bit stores.
		RKSimon: We never bothered to add it - we mainly use the 64-bit vector load/store to handle f64-i64 handling on 32-bit targets
		xbolva00: Do you plan to add them?
		RKSimon: yes - got a few other blockers to deal with first though - that yak has to be shaved.......
		xbolva00: ok, thanks!
		xbolva00: any updates?
		    void pr(char* __restrict a, char* __restrict r){
		      for (int i = 0; i < 4; i++){
		        r[i] += a[i];
		      }
		    }
		gcc emits nicely paddb.
		RKSimon: https://reviews.llvm.org/D127604 - but I need someone to perf test the patch properly.
; SSE-LABEL: @add4(		; SSE-LABEL: @add4(
; SSE-NEXT: entry:		; SSE-NEXT: entry:
; SSE-NEXT: [[TMP0:%.*]] = load i8, ptr [[A:%.*]], align 1		; SSE-NEXT: [[TMP0:%.*]] = load i8, ptr [[A:%.*]], align 1
; SSE-NEXT: [[TMP1:%.*]] = load i8, ptr [[R:%.*]], align 1		; SSE-NEXT: [[TMP1:%.*]] = load i8, ptr [[R:%.*]], align 1
; SSE-NEXT: [[MUL:%.*]] = mul i8 [[TMP1]], [[TMP0]]		; SSE-NEXT: [[MUL:%.*]] = mul i8 [[TMP1]], [[TMP0]]
; SSE-NEXT: store i8 [[MUL]], ptr [[R]], align 1		; SSE-NEXT: store i8 [[MUL]], ptr [[R]], align 1
; SSE-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 1		; SSE-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 1
; SSE-NEXT: [[TMP2:%.*]] = load i8, ptr [[ARRAYIDX_1]], align 1		; SSE-NEXT: [[TMP2:%.*]] = load i8, ptr [[ARRAYIDX_1]], align 1
[... 65 lines not shown ...]	entry:
%mul.3 = mul i8 %7, %6		%mul.3 = mul i8 %7, %6
store i8 %mul.3, ptr %arrayidx2.3, align 1		store i8 %mul.3, ptr %arrayidx2.3, align 1
ret void		ret void
}		}

define void @add8(ptr noalias nocapture noundef %r, ptr noalias nocapture noundef readonly %a) {		define void @add8(ptr noalias nocapture noundef %r, ptr noalias nocapture noundef readonly %a) {
; SSE-LABEL: @add8(		; SSE-LABEL: @add8(
; SSE-NEXT: entry:		; SSE-NEXT: entry:
; SSE-NEXT: [[TMP0:%.*]] = load i8, ptr [[A:%.*]], align 1		; SSE-NEXT: [[TMP0:%.*]] = load <8 x i8>, ptr [[A:%.*]], align 1
; SSE-NEXT: [[TMP1:%.*]] = load i8, ptr [[R:%.*]], align 1		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i8>, ptr [[R:%.*]], align 1
; SSE-NEXT: [[MUL:%.*]] = mul i8 [[TMP1]], [[TMP0]]		; SSE-NEXT: [[TMP2:%.*]] = mul <8 x i8> [[TMP1]], [[TMP0]]
; SSE-NEXT: store i8 [[MUL]], ptr [[R]], align 1		; SSE-NEXT: store <8 x i8> [[TMP2]], ptr [[R]], align 1
; SSE-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 1
; SSE-NEXT: [[TMP2:%.*]] = load i8, ptr [[ARRAYIDX_1]], align 1
; SSE-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 1
; SSE-NEXT: [[TMP3:%.*]] = load i8, ptr [[ARRAYIDX2_1]], align 1
; SSE-NEXT: [[MUL_1:%.*]] = mul i8 [[TMP3]], [[TMP2]]
; SSE-NEXT: store i8 [[MUL_1]], ptr [[ARRAYIDX2_1]], align 1
; SSE-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 2
; SSE-NEXT: [[TMP4:%.*]] = load i8, ptr [[ARRAYIDX_2]], align 1
; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 2
; SSE-NEXT: [[TMP5:%.*]] = load i8, ptr [[ARRAYIDX2_2]], align 1
; SSE-NEXT: [[MUL_2:%.*]] = mul i8 [[TMP5]], [[TMP4]]
; SSE-NEXT: store i8 [[MUL_2]], ptr [[ARRAYIDX2_2]], align 1
; SSE-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 3
; SSE-NEXT: [[TMP6:%.*]] = load i8, ptr [[ARRAYIDX_3]], align 1
; SSE-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 3
; SSE-NEXT: [[TMP7:%.*]] = load i8, ptr [[ARRAYIDX2_3]], align 1
; SSE-NEXT: [[MUL_3:%.*]] = mul i8 [[TMP7]], [[TMP6]]
; SSE-NEXT: store i8 [[MUL_3]], ptr [[ARRAYIDX2_3]], align 1
; SSE-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 4
; SSE-NEXT: [[TMP8:%.*]] = load i8, ptr [[ARRAYIDX_4]], align 1
; SSE-NEXT: [[ARRAYIDX2_4:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 4
; SSE-NEXT: [[TMP9:%.*]] = load i8, ptr [[ARRAYIDX2_4]], align 1
; SSE-NEXT: [[MUL_4:%.*]] = mul i8 [[TMP9]], [[TMP8]]
; SSE-NEXT: store i8 [[MUL_4]], ptr [[ARRAYIDX2_4]], align 1
; SSE-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 5
; SSE-NEXT: [[TMP10:%.*]] = load i8, ptr [[ARRAYIDX_5]], align 1
; SSE-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 5
; SSE-NEXT: [[TMP11:%.*]] = load i8, ptr [[ARRAYIDX2_5]], align 1
; SSE-NEXT: [[MUL_5:%.*]] = mul i8 [[TMP11]], [[TMP10]]
; SSE-NEXT: store i8 [[MUL_5]], ptr [[ARRAYIDX2_5]], align 1
; SSE-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 6
; SSE-NEXT: [[TMP12:%.*]] = load i8, ptr [[ARRAYIDX_6]], align 1
; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 6
; SSE-NEXT: [[TMP13:%.*]] = load i8, ptr [[ARRAYIDX2_6]], align 1
; SSE-NEXT: [[MUL_6:%.*]] = mul i8 [[TMP13]], [[TMP12]]
; SSE-NEXT: store i8 [[MUL_6]], ptr [[ARRAYIDX2_6]], align 1
; SSE-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 7
; SSE-NEXT: [[TMP14:%.*]] = load i8, ptr [[ARRAYIDX_7]], align 1
; SSE-NEXT: [[ARRAYIDX2_7:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 7
; SSE-NEXT: [[TMP15:%.*]] = load i8, ptr [[ARRAYIDX2_7]], align 1
; SSE-NEXT: [[MUL_7:%.*]] = mul i8 [[TMP15]], [[TMP14]]
; SSE-NEXT: store i8 [[MUL_7]], ptr [[ARRAYIDX2_7]], align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @add8(		; AVX-LABEL: @add8(
; AVX-NEXT: entry:		; AVX-NEXT: entry:
; AVX-NEXT: [[TMP0:%.*]] = load i8, ptr [[A:%.*]], align 1		; AVX-NEXT: [[TMP0:%.*]] = load <8 x i8>, ptr [[A:%.*]], align 1
; AVX-NEXT: [[TMP1:%.*]] = load i8, ptr [[R:%.*]], align 1		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i8>, ptr [[R:%.*]], align 1
; AVX-NEXT: [[MUL:%.*]] = mul i8 [[TMP1]], [[TMP0]]		; AVX-NEXT: [[TMP2:%.*]] = mul <8 x i8> [[TMP1]], [[TMP0]]
; AVX-NEXT: store i8 [[MUL]], ptr [[R]], align 1		; AVX-NEXT: store <8 x i8> [[TMP2]], ptr [[R]], align 1
; AVX-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 1
; AVX-NEXT: [[TMP2:%.*]] = load i8, ptr [[ARRAYIDX_1]], align 1
; AVX-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 1
; AVX-NEXT: [[TMP3:%.*]] = load i8, ptr [[ARRAYIDX2_1]], align 1
; AVX-NEXT: [[MUL_1:%.*]] = mul i8 [[TMP3]], [[TMP2]]
; AVX-NEXT: store i8 [[MUL_1]], ptr [[ARRAYIDX2_1]], align 1
; AVX-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 2
; AVX-NEXT: [[TMP4:%.*]] = load i8, ptr [[ARRAYIDX_2]], align 1
; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 2
; AVX-NEXT: [[TMP5:%.*]] = load i8, ptr [[ARRAYIDX2_2]], align 1
; AVX-NEXT: [[MUL_2:%.*]] = mul i8 [[TMP5]], [[TMP4]]
; AVX-NEXT: store i8 [[MUL_2]], ptr [[ARRAYIDX2_2]], align 1
; AVX-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 3
; AVX-NEXT: [[TMP6:%.*]] = load i8, ptr [[ARRAYIDX_3]], align 1
; AVX-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 3
; AVX-NEXT: [[TMP7:%.*]] = load i8, ptr [[ARRAYIDX2_3]], align 1
; AVX-NEXT: [[MUL_3:%.*]] = mul i8 [[TMP7]], [[TMP6]]
; AVX-NEXT: store i8 [[MUL_3]], ptr [[ARRAYIDX2_3]], align 1
; AVX-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 4
; AVX-NEXT: [[TMP8:%.*]] = load i8, ptr [[ARRAYIDX_4]], align 1
; AVX-NEXT: [[ARRAYIDX2_4:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 4
; AVX-NEXT: [[TMP9:%.*]] = load i8, ptr [[ARRAYIDX2_4]], align 1
; AVX-NEXT: [[MUL_4:%.*]] = mul i8 [[TMP9]], [[TMP8]]
; AVX-NEXT: store i8 [[MUL_4]], ptr [[ARRAYIDX2_4]], align 1
; AVX-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 5
; AVX-NEXT: [[TMP10:%.*]] = load i8, ptr [[ARRAYIDX_5]], align 1
; AVX-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 5
; AVX-NEXT: [[TMP11:%.*]] = load i8, ptr [[ARRAYIDX2_5]], align 1
; AVX-NEXT: [[MUL_5:%.*]] = mul i8 [[TMP11]], [[TMP10]]
; AVX-NEXT: store i8 [[MUL_5]], ptr [[ARRAYIDX2_5]], align 1
; AVX-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 6
; AVX-NEXT: [[TMP12:%.*]] = load i8, ptr [[ARRAYIDX_6]], align 1
; AVX-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 6
; AVX-NEXT: [[TMP13:%.*]] = load i8, ptr [[ARRAYIDX2_6]], align 1
; AVX-NEXT: [[MUL_6:%.*]] = mul i8 [[TMP13]], [[TMP12]]
; AVX-NEXT: store i8 [[MUL_6]], ptr [[ARRAYIDX2_6]], align 1
; AVX-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 7
; AVX-NEXT: [[TMP14:%.*]] = load i8, ptr [[ARRAYIDX_7]], align 1
; AVX-NEXT: [[ARRAYIDX2_7:%.*]] = getelementptr inbounds i8, ptr [[R]], i64 7
; AVX-NEXT: [[TMP15:%.*]] = load i8, ptr [[ARRAYIDX2_7]], align 1
; AVX-NEXT: [[MUL_7:%.*]] = mul i8 [[TMP15]], [[TMP14]]
; AVX-NEXT: store i8 [[MUL_7]], ptr [[ARRAYIDX2_7]], align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
entry:		entry:
%0 = load i8, ptr %a, align 1		%0 = load i8, ptr %a, align 1
%1 = load i8, ptr %r, align 1		%1 = load i8, ptr %r, align 1
%mul = mul i8 %1, %0		%mul = mul i8 %1, %0
store i8 %mul, ptr %r, align 1		store i8 %mul, ptr %r, align 1
%arrayidx.1 = getelementptr inbounds i8, ptr %a, i64 1		%arrayidx.1 = getelementptr inbounds i8, ptr %a, i64 1
[... 375 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/crash_7zip.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	%struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334 = type { %struct._CLzmaProps.0.27.54.81.102.123.144.165.180.195.228.258.333, i16, i8, i8*, i32, i32, i64, i64, i32, i32, i32, [4 x i32], i32, i32, i32, i32, i32, [20 x i8] }			%struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334 = type { %struct._CLzmaProps.0.27.54.81.102.123.144.165.180.195.228.258.333, i16, i8, i8*, i32, i32, i64, i64, i32, i32, i32, [4 x i32], i32, i32, i32, i32, i32, [20 x i8] }
	%struct._CLzmaProps.0.27.54.81.102.123.144.165.180.195.228.258.333 = type { i32, i32, i32, i32 }			%struct._CLzmaProps.0.27.54.81.102.123.144.165.180.195.228.258.333 = type { i32, i32, i32, i32 }

	define fastcc void @LzmaDec_DecodeReal2(%struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* %p) {			define fastcc void @LzmaDec_DecodeReal2(%struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* %p) {
	; CHECK-LABEL: @LzmaDec_DecodeReal2(			; CHECK-LABEL: @LzmaDec_DecodeReal2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[RANGE20_I:%.*]] = getelementptr inbounds [[STRUCT_CLZMADEC_1_28_55_82_103_124_145_166_181_196_229_259_334:%.*]], %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* [[P:%.*]], i64 0, i32 4			; CHECK-NEXT: [[RANGE20_I:%.*]] = getelementptr inbounds [[STRUCT_CLZMADEC_1_28_55_82_103_124_145_166_181_196_229_259_334:%.*]], %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* [[P:%.*]], i64 0, i32 4
	; CHECK-NEXT: [[CODE21_I:%.*]] = getelementptr inbounds [[STRUCT_CLZMADEC_1_28_55_82_103_124_145_166_181_196_229_259_334]], %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* [[P]], i64 0, i32 5
	; CHECK-NEXT: br label [[DO_BODY66_I:%.*]]			; CHECK-NEXT: br label [[DO_BODY66_I:%.*]]
	; CHECK: do.body66.i:			; CHECK: do.body66.i:
	; CHECK-NEXT: [[RANGE_2_I:%.*]] = phi i32 [ [[RANGE_4_I:%.*]], [[DO_COND_I:%.*]] ], [ undef, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i32> [ [[TMP5:%.*]], [[DO_COND_I:%.*]] ], [ undef, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[CODE_2_I:%.*]] = phi i32 [ [[CODE_4_I:%.*]], [[DO_COND_I]] ], [ undef, [[ENTRY]] ]			; CHECK-NEXT: [[TMP1:%.*]] = select <2 x i1> undef, <2 x i32> undef, <2 x i32> [[TMP0]]
	; CHECK-NEXT: [[DOTRANGE_2_I:%.*]] = select i1 undef, i32 undef, i32 [[RANGE_2_I]]			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; CHECK-NEXT: [[DOTCODE_2_I:%.*]] = select i1 undef, i32 undef, i32 [[CODE_2_I]]			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> <i32 undef, i32 poison>, i32 [[TMP2]], i32 1
	; CHECK-NEXT: br i1 undef, label [[DO_COND_I]], label [[IF_ELSE_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[DO_COND_I]], label [[IF_ELSE_I:%.*]]
	; CHECK: if.else.i:			; CHECK: if.else.i:
	; CHECK-NEXT: [[SUB91_I:%.*]] = sub i32 [[DOTRANGE_2_I]], undef			; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i32> [[TMP1]], undef
	; CHECK-NEXT: [[SUB92_I:%.*]] = sub i32 [[DOTCODE_2_I]], undef
	; CHECK-NEXT: br label [[DO_COND_I]]			; CHECK-NEXT: br label [[DO_COND_I]]
	; CHECK: do.cond.i:			; CHECK: do.cond.i:
	; CHECK-NEXT: [[RANGE_4_I]] = phi i32 [ [[SUB91_I]], [[IF_ELSE_I]] ], [ undef, [[DO_BODY66_I]] ]			; CHECK-NEXT: [[TMP5]] = phi <2 x i32> [ [[TMP4]], [[IF_ELSE_I]] ], [ [[TMP3]], [[DO_BODY66_I]] ]
	; CHECK-NEXT: [[CODE_4_I]] = phi i32 [ [[SUB92_I]], [[IF_ELSE_I]] ], [ [[DOTCODE_2_I]], [[DO_BODY66_I]] ]
	; CHECK-NEXT: br i1 undef, label [[DO_BODY66_I]], label [[DO_END1006_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[DO_BODY66_I]], label [[DO_END1006_I:%.*]]
	; CHECK: do.end1006.i:			; CHECK: do.end1006.i:
	; CHECK-NEXT: [[DOTRANGE_4_I:%.*]] = select i1 undef, i32 undef, i32 [[RANGE_4_I]]			; CHECK-NEXT: [[TMP6:%.*]] = select <2 x i1> undef, <2 x i32> undef, <2 x i32> [[TMP5]]
	; CHECK-NEXT: [[DOTCODE_4_I:%.*]] = select i1 undef, i32 undef, i32 [[CODE_4_I]]			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[RANGE20_I]] to <2 x i32>*
	; CHECK-NEXT: store i32 [[DOTRANGE_4_I]], i32* [[RANGE20_I]], align 4			; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32>* [[TMP7]], align 4
	; CHECK-NEXT: store i32 [[DOTCODE_4_I]], i32* [[CODE21_I]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%range20.i = getelementptr inbounds %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334, %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* %p, i64 0, i32 4			%range20.i = getelementptr inbounds %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334, %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* %p, i64 0, i32 4
	%code21.i = getelementptr inbounds %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334, %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* %p, i64 0, i32 5			%code21.i = getelementptr inbounds %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334, %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* %p, i64 0, i32 5
	br label %do.body66.i			br label %do.body66.i

	do.body66.i: ; preds = %do.cond.i, %entry			do.body66.i: ; preds = %do.cond.i, %entry

llvm/test/Transforms/SLPVectorizer/X86/crash_bullet.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960" = type { i32, i32 }			%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960" = type { i32, i32 }

	define void @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* nocapture %info) {			define void @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* nocapture %info) {
	; CHECK-LABEL: @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(			; CHECK-LABEL: @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 undef, label [[IF_ELSE:%.*]], label [[IF_THEN:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_ELSE:%.*]], label [[IF_THEN:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: if.else:			; CHECK: if.else:
	; CHECK-NEXT: [[M_NUMCONSTRAINTROWS4:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO:%.*]], i64 0, i32 0			; CHECK-NEXT: [[M_NUMCONSTRAINTROWS4:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[NUB5:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO]], i64 0, i32 1
	; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE_I_1:%.*]], label [[IF_THEN7_1:%.*]]			; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE_I_1:%.*]], label [[IF_THEN7_1:%.*]]
	; CHECK: land.lhs.true.i.1:			; CHECK: land.lhs.true.i.1:
	; CHECK-NEXT: br i1 undef, label [[FOR_INC_1:%.*]], label [[IF_THEN7_1]]			; CHECK-NEXT: br i1 undef, label [[FOR_INC_1:%.*]], label [[IF_THEN7_1]]
	; CHECK: if.then7.1:			; CHECK: if.then7.1:
	; CHECK-NEXT: [[INC_1:%.*]] = add nsw i32 0, 1			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*
	; CHECK-NEXT: store i32 [[INC_1]], i32* [[M_NUMCONSTRAINTROWS4]], align 4			; CHECK-NEXT: store <2 x i32> <i32 1, i32 5>, <2 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[DEC_1:%.*]] = add nsw i32 6, -1
	; CHECK-NEXT: store i32 [[DEC_1]], i32* [[NUB5]], align 4
	; CHECK-NEXT: br label [[FOR_INC_1]]			; CHECK-NEXT: br label [[FOR_INC_1]]
	; CHECK: for.inc.1:			; CHECK: for.inc.1:
	; CHECK-NEXT: [[TMP0:%.*]] = phi i32 [ [[DEC_1]], [[IF_THEN7_1]] ], [ 6, [[LAND_LHS_TRUE_I_1]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ <i32 1, i32 5>, [[IF_THEN7_1]] ], [ <i32 0, i32 6>, [[LAND_LHS_TRUE_I_1]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = phi i32 [ [[INC_1]], [[IF_THEN7_1]] ], [ 0, [[LAND_LHS_TRUE_I_1]] ]			; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[TMP1]], <i32 1, i32 -1>
	; CHECK-NEXT: [[INC_2:%.*]] = add nsw i32 [[TMP1]], 1			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*
	; CHECK-NEXT: store i32 [[INC_2]], i32* [[M_NUMCONSTRAINTROWS4]], align 4			; CHECK-NEXT: store <2 x i32> [[TMP2]], <2 x i32>* [[TMP3]], align 4
	; CHECK-NEXT: [[DEC_2:%.*]] = add nsw i32 [[TMP0]], -1
	; CHECK-NEXT: store i32 [[DEC_2]], i32* [[NUB5]], align 4
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	br i1 undef, label %if.else, label %if.then			br i1 undef, label %if.else, label %if.then

	if.then: ; preds = %entry			if.then: ; preds = %entry
	ret void			ret void

	%class.btVector4.7.32.67.92.117.142.177.187.262.282.331 = type { %class.btVector3.5.30.65.90.115.140.175.185.260.280.330 }			%class.btVector4.7.32.67.92.117.142.177.187.262.282.331 = type { %class.btVector3.5.30.65.90.115.140.175.185.260.280.330 }

	define void @_ZN30GIM_TRIANGLE_CALCULATION_CACHE18triangle_collisionERK9btVector3S2_S2_fS2_S2_S2_fR25GIM_TRIANGLE_CONTACT_DATA(%class.GIM_TRIANGLE_CALCULATION_CACHE.9.34.69.94.119.144.179.189.264.284.332* %this) {			define void @_ZN30GIM_TRIANGLE_CALCULATION_CACHE18triangle_collisionERK9btVector3S2_S2_fS2_S2_S2_fR25GIM_TRIANGLE_CONTACT_DATA(%class.GIM_TRIANGLE_CALCULATION_CACHE.9.34.69.94.119.144.179.189.264.284.332* %this) {
	; CHECK-LABEL: @_ZN30GIM_TRIANGLE_CALCULATION_CACHE18triangle_collisionERK9btVector3S2_S2_fS2_S2_S2_fR25GIM_TRIANGLE_CONTACT_DATA(			; CHECK-LABEL: @_ZN30GIM_TRIANGLE_CALCULATION_CACHE18triangle_collisionERK9btVector3S2_S2_fS2_S2_S2_fR25GIM_TRIANGLE_CONTACT_DATA(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [[CLASS_GIM_TRIANGLE_CALCULATION_CACHE_9_34_69_94_119_144_179_189_264_284_332:%.*]], %class.GIM_TRIANGLE_CALCULATION_CACHE.9.34.69.94.119.144.179.189.264.284.332* [[THIS:%.*]], i64 0, i32 2, i64 0, i32 0, i64 1			; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [[CLASS_GIM_TRIANGLE_CALCULATION_CACHE_9_34_69_94_119_144_179_189_264_284_332:%.*]], %class.GIM_TRIANGLE_CALCULATION_CACHE.9.34.69.94.119.144.179.189.264.284.332* [[THIS:%.*]], i64 0, i32 2, i64 0, i32 0, i64 1
	; CHECK-NEXT: [[ARRAYIDX36:%.*]] = getelementptr inbounds [[CLASS_GIM_TRIANGLE_CALCULATION_CACHE_9_34_69_94_119_144_179_189_264_284_332]], %class.GIM_TRIANGLE_CALCULATION_CACHE.9.34.69.94.119.144.179.189.264.284.332* [[THIS]], i64 0, i32 2, i64 0, i32 0, i64 2			; CHECK-NEXT: [[ARRAYIDX36:%.*]] = getelementptr inbounds [[CLASS_GIM_TRIANGLE_CALCULATION_CACHE_9_34_69_94_119_144_179_189_264_284_332]], %class.GIM_TRIANGLE_CALCULATION_CACHE.9.34.69.94.119.144.179.189.264.284.332* [[THIS]], i64 0, i32 2, i64 0, i32 0, i64 2
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX36]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX36]], align 4
	; CHECK-NEXT: [[ADD587:%.*]] = fadd float undef, undef			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> <float undef, float poison>, float [[TMP0]], i32 1
	; CHECK-NEXT: [[SUB600:%.*]] = fsub float [[ADD587]], undef			; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x float> [[TMP1]], undef
	; CHECK-NEXT: store float [[SUB600]], float* undef, align 4			; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x float> [[TMP2]], undef
	; CHECK-NEXT: [[SUB613:%.*]] = fsub float [[ADD587]], [[SUB600]]			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
	; CHECK-NEXT: store float [[SUB613]], float* [[ARRAYIDX26]], align 4			; CHECK-NEXT: store float [[TMP4]], float* undef, align 4
	; CHECK-NEXT: [[ADD626:%.*]] = fadd float [[TMP0]], undef			; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x float> [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[SUB639:%.*]] = fsub float [[ADD626]], undef			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX26]] to <2 x float>*
	; CHECK-NEXT: [[SUB652:%.*]] = fsub float [[ADD626]], [[SUB639]]			; CHECK-NEXT: store <2 x float> [[TMP5]], <2 x float>* [[TMP6]], align 4
	; CHECK-NEXT: store float [[SUB652]], float* [[ARRAYIDX36]], align 4
	; CHECK-NEXT: br i1 undef, label [[IF_ELSE1609:%.*]], label [[IF_THEN1595:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_ELSE1609:%.*]], label [[IF_THEN1595:%.*]]
	; CHECK: if.then1595:			; CHECK: if.then1595:
	; CHECK-NEXT: br i1 undef, label [[RETURN:%.*]], label [[FOR_BODY_LR_PH_I_I1702:%.*]]			; CHECK-NEXT: br i1 undef, label [[RETURN:%.*]], label [[FOR_BODY_LR_PH_I_I1702:%.*]]
	; CHECK: for.body.lr.ph.i.i1702:			; CHECK: for.body.lr.ph.i.i1702:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.else1609:			; CHECK: if.else1609:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: return:			; CHECK: return:

llvm/test/Transforms/SLPVectorizer/X86/crash_bullet3.ll

	; CHECK-NEXT: br label [[FOR_BODY144:%.*]]			; CHECK-NEXT: br label [[FOR_BODY144:%.*]]
	; CHECK: for.body144:			; CHECK: for.body144:
	; CHECK-NEXT: br i1 undef, label [[FOR_END227:%.*]], label [[FOR_BODY144]]			; CHECK-NEXT: br i1 undef, label [[FOR_END227:%.*]], label [[FOR_BODY144]]
	; CHECK: for.end227:			; CHECK: for.end227:
	; CHECK-NEXT: br i1 undef, label [[FOR_END271:%.*]], label [[FOR_BODY233:%.*]]			; CHECK-NEXT: br i1 undef, label [[FOR_END271:%.*]], label [[FOR_BODY233:%.*]]
	; CHECK: for.body233:			; CHECK: for.body233:
	; CHECK-NEXT: br i1 undef, label [[FOR_BODY233]], label [[FOR_END271]]			; CHECK-NEXT: br i1 undef, label [[FOR_BODY233]], label [[FOR_END271]]
	; CHECK: for.end271:			; CHECK: for.end271:
	; CHECK-NEXT: [[TMP0:%.*]] = phi float [ 0x47EFFFFFE0000000, [[FOR_END227]] ], [ undef, [[FOR_BODY233]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x float> [ <float 0x47EFFFFFE0000000, float 0x47EFFFFFE0000000>, [[FOR_END227]] ], [ undef, [[FOR_BODY233]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = phi float [ 0x47EFFFFFE0000000, [[FOR_END227]] ], [ undef, [[FOR_BODY233]] ]			; CHECK-NEXT: [[TMP1:%.*]] = fsub <2 x float> undef, [[TMP0]]
	; CHECK-NEXT: [[SUB275:%.*]] = fsub float undef, [[TMP1]]
	; CHECK-NEXT: [[SUB279:%.*]] = fsub float undef, [[TMP0]]
	; CHECK-NEXT: br i1 undef, label [[IF_THEN291:%.*]], label [[RETURN]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN291:%.*]], label [[RETURN]]
	; CHECK: if.then291:			; CHECK: if.then291:
	; CHECK-NEXT: [[MUL292:%.*]] = fmul float [[SUB275]], 5.000000e-01			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 5.000000e-01, float 5.000000e-01>
	; CHECK-NEXT: [[ADD294:%.*]] = fadd float [[TMP1]], [[MUL292]]			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP0]], [[TMP2]]
	; CHECK-NEXT: [[MUL295:%.*]] = fmul float [[SUB279]], 5.000000e-01
	; CHECK-NEXT: [[ADD297:%.*]] = fadd float [[TMP0]], [[MUL295]]
	; CHECK-NEXT: br i1 undef, label [[IF_END332:%.*]], label [[IF_ELSE319:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_END332:%.*]], label [[IF_ELSE319:%.*]]
	; CHECK: if.else319:			; CHECK: if.else319:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN325:%.*]], label [[IF_END327:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN325:%.*]], label [[IF_END327:%.*]]
	; CHECK: if.then325:			; CHECK: if.then325:
	; CHECK-NEXT: br label [[IF_END327]]			; CHECK-NEXT: br label [[IF_END327]]
	; CHECK: if.end327:			; CHECK: if.end327:
				; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
				; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> <float poison, float undef>, float [[TMP4]], i32 0
	; CHECK-NEXT: br i1 undef, label [[IF_THEN329:%.*]], label [[IF_END332]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN329:%.*]], label [[IF_END332]]
	; CHECK: if.then329:			; CHECK: if.then329:
	; CHECK-NEXT: br label [[IF_END332]]			; CHECK-NEXT: br label [[IF_END332]]
	; CHECK: if.end332:			; CHECK: if.end332:
	; CHECK-NEXT: [[DX272_1:%.*]] = phi float [ [[SUB275]], [[IF_THEN329]] ], [ [[SUB275]], [[IF_END327]] ], [ 0x3F847AE140000000, [[IF_THEN291]] ]			; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x float> [ [[TMP5]], [[IF_THEN329]] ], [ [[TMP5]], [[IF_END327]] ], [ <float 0x3F847AE140000000, float 0x3F847AE140000000>, [[IF_THEN291]] ]
	; CHECK-NEXT: [[DY276_1:%.*]] = phi float [ undef, [[IF_THEN329]] ], [ undef, [[IF_END327]] ], [ 0x3F847AE140000000, [[IF_THEN291]] ]			; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x float> [[TMP3]], [[TMP6]]
	; CHECK-NEXT: [[SUB334:%.*]] = fsub float [[ADD294]], [[DX272_1]]
	; CHECK-NEXT: [[SUB338:%.*]] = fsub float [[ADD297]], [[DY276_1]]
	; CHECK-NEXT: [[ARRAYIDX_I_I606:%.*]] = getelementptr inbounds [[CLASS_BTVECTOR3_23_221_463_485_507_573_595_683_727_749_815_837_991_1585_1607_1629_1651_1849_2047_2069_2091_2113:%.*]], %class.btVector3.23.221.463.485.507.573.595.683.727.749.815.837.991.1585.1607.1629.1651.1849.2047.2069.2091.2113* [[VERTICES:%.*]], i64 0, i32 0, i64 0			; CHECK-NEXT: [[ARRAYIDX_I_I606:%.*]] = getelementptr inbounds [[CLASS_BTVECTOR3_23_221_463_485_507_573_595_683_727_749_815_837_991_1585_1607_1629_1651_1849_2047_2069_2091_2113:%.*]], %class.btVector3.23.221.463.485.507.573.595.683.727.749.815.837.991.1585.1607.1629.1651.1849.2047.2069.2091.2113* [[VERTICES:%.*]], i64 0, i32 0, i64 0
	; CHECK-NEXT: store float [[SUB334]], float* [[ARRAYIDX_I_I606]], align 4			; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[ARRAYIDX_I_I606]] to <2 x float>*
	; CHECK-NEXT: [[ARRAYIDX3_I607:%.*]] = getelementptr inbounds [[CLASS_BTVECTOR3_23_221_463_485_507_573_595_683_727_749_815_837_991_1585_1607_1629_1651_1849_2047_2069_2091_2113]], %class.btVector3.23.221.463.485.507.573.595.683.727.749.815.837.991.1585.1607.1629.1651.1849.2047.2069.2091.2113* [[VERTICES]], i64 0, i32 0, i64 1			; CHECK-NEXT: store <2 x float> [[TMP7]], <2 x float>* [[TMP8]], align 4
	; CHECK-NEXT: store float [[SUB338]], float* [[ARRAYIDX3_I607]], align 4
	; CHECK-NEXT: br label [[RETURN]]			; CHECK-NEXT: br label [[RETURN]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: if.then17.1:			; CHECK: if.then17.1:
	; CHECK-NEXT: br label [[IF_END22_1]]			; CHECK-NEXT: br label [[IF_END22_1]]
	; CHECK: if.end22.1:			; CHECK: if.end22.1:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN17_2:%.*]], label [[IF_END22_2:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN17_2:%.*]], label [[IF_END22_2:%.*]]
	; CHECK: if.then17.2:			; CHECK: if.then17.2:

llvm/test/Transforms/SLPVectorizer/X86/crash_sim4b1.ll

	; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE:%.*]], label [[LAND_LHS_TRUE167:%.*]]			; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE:%.*]], label [[LAND_LHS_TRUE167:%.*]]
	; CHECK: land.lhs.true:			; CHECK: land.lhs.true:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN17:%.*]], label [[LAND_LHS_TRUE167]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN17:%.*]], label [[LAND_LHS_TRUE167]]
	; CHECK: if.then17:			; CHECK: if.then17:
	; CHECK-NEXT: br i1 undef, label [[IF_END98:%.*]], label [[LAND_RHS_LR_PH:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_END98:%.*]], label [[LAND_RHS_LR_PH:%.*]]
	; CHECK: land.rhs.lr.ph:			; CHECK: land.rhs.lr.ph:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.end98:			; CHECK: if.end98:
	; CHECK-NEXT: [[FROM299:%.*]] = getelementptr inbounds [[STRUCT__EXON_T_12_103_220_363_480_649_740_857_1039_1065_1078_1091_1117_1130_1156_1169_1195_1221_1234_1286_1299_1312_1338_1429_1455_1468_1494_1520_1884_1897_1975_2066_2105_2170_2171:%.*]], %struct._exon_t.12.103.220.363.480.649.740.857.1039.1065.1078.1091.1117.1130.1156.1169.1195.1221.1234.1286.1299.1312.1338.1429.1455.1468.1494.1520.1884.1897.1975.2066.2105.2170.2171* undef, i64 0, i32 1
	; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE167]], label [[IF_THEN103:%.*]]			; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE167]], label [[IF_THEN103:%.*]]
	; CHECK: if.then103:			; CHECK: if.then103:
				; CHECK-NEXT: [[FROM1115:%.*]] = getelementptr inbounds [[STRUCT__EXON_T_12_103_220_363_480_649_740_857_1039_1065_1078_1091_1117_1130_1156_1169_1195_1221_1234_1286_1299_1312_1338_1429_1455_1468_1494_1520_1884_1897_1975_2066_2105_2170_2171:%.*]], %struct._exon_t.12.103.220.363.480.649.740.857.1039.1065.1078.1091.1117.1130.1156.1169.1195.1221.1234.1286.1299.1312.1338.1429.1455.1468.1494.1520.1884.1897.1975.2066.2105.2170.2171* undef, i64 0, i32 0
	; CHECK-NEXT: [[DOTSUB100:%.*]] = select i1 undef, i32 250, i32 undef			; CHECK-NEXT: [[DOTSUB100:%.*]] = select i1 undef, i32 250, i32 undef
	; CHECK-NEXT: [[MUL114:%.*]] = shl nsw i32 [[DOTSUB100]], 2			; CHECK-NEXT: [[MUL114:%.*]] = shl nsw i32 [[DOTSUB100]], 2
	; CHECK-NEXT: [[FROM1115:%.*]] = getelementptr inbounds [[STRUCT__EXON_T_12_103_220_363_480_649_740_857_1039_1065_1078_1091_1117_1130_1156_1169_1195_1221_1234_1286_1299_1312_1338_1429_1455_1468_1494_1520_1884_1897_1975_2066_2105_2170_2171]], %struct._exon_t.12.103.220.363.480.649.740.857.1039.1065.1078.1091.1117.1130.1156.1169.1195.1221.1234.1286.1299.1312.1338.1429.1455.1468.1494.1520.1884.1897.1975.2066.2105.2170.2171* undef, i64 0, i32 0
	; CHECK-NEXT: [[COND125:%.*]] = select i1 undef, i32 undef, i32 [[MUL114]]			; CHECK-NEXT: [[COND125:%.*]] = select i1 undef, i32 undef, i32 [[MUL114]]
				; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[COND125]], i32 0
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[DOTSUB100]], i32 1
	; CHECK-NEXT: br label [[FOR_COND_I:%.*]]			; CHECK-NEXT: br label [[FOR_COND_I:%.*]]
	; CHECK: for.cond.i:			; CHECK: for.cond.i:
	; CHECK-NEXT: [[ROW_0_I:%.*]] = phi i32 [ undef, [[LAND_RHS_I874:%.*]] ], [ [[DOTSUB100]], [[IF_THEN103]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x i32> [ undef, [[LAND_RHS_I874:%.*]] ], [ [[TMP1]], [[IF_THEN103]] ]
	; CHECK-NEXT: [[COL_0_I:%.*]] = phi i32 [ undef, [[LAND_RHS_I874]] ], [ [[COND125]], [[IF_THEN103]] ]
	; CHECK-NEXT: br i1 undef, label [[LAND_RHS_I874]], label [[FOR_END_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[LAND_RHS_I874]], label [[FOR_END_I:%.*]]
	; CHECK: land.rhs.i874:			; CHECK: land.rhs.i874:
	; CHECK-NEXT: br i1 undef, label [[FOR_COND_I]], label [[FOR_END_I]]			; CHECK-NEXT: br i1 undef, label [[FOR_COND_I]], label [[FOR_END_I]]
	; CHECK: for.end.i:			; CHECK: for.end.i:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN_I:%.*]], label [[IF_END_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN_I:%.*]], label [[IF_END_I:%.*]]
	; CHECK: if.then.i:			; CHECK: if.then.i:
	; CHECK-NEXT: [[ADD14_I:%.*]] = add nsw i32 [[ROW_0_I]], undef			; CHECK-NEXT: [[TMP3:%.*]] = add nsw <2 x i32> [[TMP2]], undef
	; CHECK-NEXT: [[ADD15_I:%.*]] = add nsw i32 [[COL_0_I]], undef
	; CHECK-NEXT: br label [[EXTEND_BW_EXIT:%.*]]			; CHECK-NEXT: br label [[EXTEND_BW_EXIT:%.*]]
	; CHECK: if.end.i:			; CHECK: if.end.i:
	; CHECK-NEXT: [[ADD16_I:%.*]] = add i32 [[COND125]], [[DOTSUB100]]			; CHECK-NEXT: [[ADD16_I:%.*]] = add i32 [[COND125]], [[DOTSUB100]]
	; CHECK-NEXT: [[CMP26514_I:%.*]] = icmp slt i32 [[ADD16_I]], 0			; CHECK-NEXT: [[CMP26514_I:%.*]] = icmp slt i32 [[ADD16_I]], 0
	; CHECK-NEXT: br i1 [[CMP26514_I]], label [[FOR_END33_I:%.*]], label [[FOR_BODY28_LR_PH_I:%.*]]			; CHECK-NEXT: br i1 [[CMP26514_I]], label [[FOR_END33_I:%.*]], label [[FOR_BODY28_LR_PH_I:%.*]]
	; CHECK: for.body28.lr.ph.i:			; CHECK: for.body28.lr.ph.i:
	; CHECK-NEXT: br label [[FOR_END33_I]]			; CHECK-NEXT: br label [[FOR_END33_I]]
	; CHECK: for.end33.i:			; CHECK: for.end33.i:
	; CHECK-NEXT: br i1 undef, label [[FOR_END58_I:%.*]], label [[FOR_BODY52_LR_PH_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[FOR_END58_I:%.*]], label [[FOR_BODY52_LR_PH_I:%.*]]
	; CHECK: for.body52.lr.ph.i:			; CHECK: for.body52.lr.ph.i:
	; CHECK-NEXT: br label [[FOR_END58_I]]			; CHECK-NEXT: br label [[FOR_END58_I]]
	; CHECK: for.end58.i:			; CHECK: for.end58.i:
	; CHECK-NEXT: br label [[WHILE_COND260_I:%.*]]			; CHECK-NEXT: br label [[WHILE_COND260_I:%.*]]
	; CHECK: while.cond260.i:			; CHECK: while.cond260.i:
	; CHECK-NEXT: br i1 undef, label [[LAND_RHS263_I:%.*]], label [[WHILE_END275_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[LAND_RHS263_I:%.*]], label [[WHILE_END275_I:%.*]]
	; CHECK: land.rhs263.i:			; CHECK: land.rhs263.i:
	; CHECK-NEXT: br i1 undef, label [[WHILE_COND260_I]], label [[WHILE_END275_I]]			; CHECK-NEXT: br i1 undef, label [[WHILE_COND260_I]], label [[WHILE_END275_I]]
	; CHECK: while.end275.i:			; CHECK: while.end275.i:
	; CHECK-NEXT: br label [[EXTEND_BW_EXIT]]			; CHECK-NEXT: br label [[EXTEND_BW_EXIT]]
	; CHECK: extend_bw.exit:			; CHECK: extend_bw.exit:
	; CHECK-NEXT: [[ADD14_I1262:%.*]] = phi i32 [ [[ADD14_I]], [[IF_THEN_I]] ], [ undef, [[WHILE_END275_I]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x i32> [ [[TMP3]], [[IF_THEN_I]] ], [ undef, [[WHILE_END275_I]] ]
	; CHECK-NEXT: [[ADD15_I1261:%.*]] = phi i32 [ [[ADD15_I]], [[IF_THEN_I]] ], [ undef, [[WHILE_END275_I]] ]
	; CHECK-NEXT: br i1 false, label [[IF_THEN157:%.*]], label [[LAND_LHS_TRUE167]]			; CHECK-NEXT: br i1 false, label [[IF_THEN157:%.*]], label [[LAND_LHS_TRUE167]]
	; CHECK: if.then157:			; CHECK: if.then157:
	; CHECK-NEXT: [[ADD158:%.*]] = add nsw i32 [[ADD14_I1262]], 1			; CHECK-NEXT: [[TMP5:%.*]] = add nsw <2 x i32> [[TMP4]], <i32 1, i32 1>
	; CHECK-NEXT: store i32 [[ADD158]], i32* [[FROM299]], align 4			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[FROM1115]] to <2 x i32>*
	; CHECK-NEXT: [[ADD160:%.*]] = add nsw i32 [[ADD15_I1261]], 1			; CHECK-NEXT: store <2 x i32> [[TMP5]], <2 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: store i32 [[ADD160]], i32* [[FROM1115]], align 4
	; CHECK-NEXT: br label [[LAND_LHS_TRUE167]]			; CHECK-NEXT: br label [[LAND_LHS_TRUE167]]
	; CHECK: land.lhs.true167:			; CHECK: land.lhs.true167:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: for.inc603:			; CHECK: for.inc603:
	; CHECK-NEXT: br i1 undef, label [[FOR_BODY]], label [[FOR_END605]]			; CHECK-NEXT: br i1 undef, label [[FOR_BODY]], label [[FOR_END605]]
	; CHECK: for.end605:			; CHECK: for.end605:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: return:			; CHECK: return:

llvm/test/Transforms/SLPVectorizer/X86/fptosi-inseltpoison.ll

store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptosi_8f64_8i8() #0 {		define void @fptosi_8f64_8i8() #0 {
; CHECK-LABEL: @fptosi_8f64_8i8(		; CHECK-LABEL: @fptosi_8f64_8i8(
; CHECK-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		; CHECK-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
; CHECK-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		; CHECK-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i8>
; CHECK-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		; CHECK-NEXT: store <8 x i8> [[TMP2]], <8 x i8>* bitcast ([64 x i8]* @dst8 to <8 x i8>*), align 1
; CHECK-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
; CHECK-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
; CHECK-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
; CHECK-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
; CHECK-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
; CHECK-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptosi_8f32_8i8() #0 {		define void @fptosi_8f32_8i8() #0 {
; CHECK-LABEL: @fptosi_8f32_8i8(		; CHECK-LABEL: @fptosi_8f32_8i8(
; CHECK-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; CHECK-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		; CHECK-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i8>
; CHECK-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		; CHECK-NEXT: store <8 x i8> [[TMP2]], <8 x i8>* bitcast ([64 x i8]* @dst8 to <8 x i8>*), align 1
; CHECK-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
; CHECK-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
; CHECK-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
; CHECK-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
; CHECK-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
; CHECK-NEXT: [[CVT0:%.*]] = fptosi float [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptosi float [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptosi float [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptosi float [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptosi float [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptosi float [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptosi float [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptosi float [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
...

llvm/test/Transforms/SLPVectorizer/X86/fptosi.ll

...
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptosi_8f64_8i8() #0 {		define void @fptosi_8f64_8i8() #0 {
; CHECK-LABEL: @fptosi_8f64_8i8(		; CHECK-LABEL: @fptosi_8f64_8i8(
; CHECK-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		; CHECK-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
; CHECK-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		; CHECK-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i8>
; CHECK-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		; CHECK-NEXT: store <8 x i8> [[TMP2]], <8 x i8>* bitcast ([64 x i8]* @dst8 to <8 x i8>*), align 1
; CHECK-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
; CHECK-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
; CHECK-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
; CHECK-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
; CHECK-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
; CHECK-NEXT: [[CVT0:%.*]] = fptosi double [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptosi double [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptosi double [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptosi double [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptosi double [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptosi double [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptosi double [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptosi double [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
...
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptosi_8f32_8i8() #0 {		define void @fptosi_8f32_8i8() #0 {
; CHECK-LABEL: @fptosi_8f32_8i8(		; CHECK-LABEL: @fptosi_8f32_8i8(
; CHECK-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; CHECK-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		; CHECK-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i8>
; CHECK-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		; CHECK-NEXT: store <8 x i8> [[TMP2]], <8 x i8>* bitcast ([64 x i8]* @dst8 to <8 x i8>*), align 1
; CHECK-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
; CHECK-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
; CHECK-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
; CHECK-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
; CHECK-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
; CHECK-NEXT: [[CVT0:%.*]] = fptosi float [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptosi float [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptosi float [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptosi float [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptosi float [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptosi float [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptosi float [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptosi float [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
...

llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll

...
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptoui_8f64_8i8() #0 {		define void @fptoui_8f64_8i8() #0 {
; CHECK-LABEL: @fptoui_8f64_8i8(		; CHECK-LABEL: @fptoui_8f64_8i8(
; CHECK-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		; CHECK-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
; CHECK-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		; CHECK-NEXT: [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i8>
; CHECK-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		; CHECK-NEXT: store <8 x i8> [[TMP2]], <8 x i8>* bitcast ([64 x i8]* @dst8 to <8 x i8>*), align 1
; CHECK-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
; CHECK-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
; CHECK-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
; CHECK-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
; CHECK-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
; CHECK-NEXT: [[CVT0:%.*]] = fptoui double [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptoui double [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptoui double [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptoui double [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptoui double [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptoui double [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptoui double [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptoui double [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8		%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
...
store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2		store i16 %cvt5, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 5), align 2
store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2		store i16 %cvt6, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 6), align 2
store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2		store i16 %cvt7, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @dst16, i32 0, i64 7), align 2
ret void		ret void
}		}

define void @fptoui_8f32_8i8() #0 {		define void @fptoui_8f32_8i8() #0 {
; CHECK-LABEL: @fptoui_8f32_8i8(		; CHECK-LABEL: @fptoui_8f32_8i8(
; CHECK-NEXT: [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; CHECK-NEXT: [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		; CHECK-NEXT: [[TMP2:%.*]] = fptoui <8 x float> [[TMP1]] to <8 x i8>
; CHECK-NEXT: [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		; CHECK-NEXT: store <8 x i8> [[TMP2]], <8 x i8>* bitcast ([64 x i8]* @dst8 to <8 x i8>*), align 1
; CHECK-NEXT: [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
; CHECK-NEXT: [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
; CHECK-NEXT: [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
; CHECK-NEXT: [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
; CHECK-NEXT: [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
; CHECK-NEXT: [[CVT0:%.*]] = fptoui float [[A0]] to i8
; CHECK-NEXT: [[CVT1:%.*]] = fptoui float [[A1]] to i8
; CHECK-NEXT: [[CVT2:%.*]] = fptoui float [[A2]] to i8
; CHECK-NEXT: [[CVT3:%.*]] = fptoui float [[A3]] to i8
; CHECK-NEXT: [[CVT4:%.*]] = fptoui float [[A4]] to i8
; CHECK-NEXT: [[CVT5:%.*]] = fptoui float [[A5]] to i8
; CHECK-NEXT: [[CVT6:%.*]] = fptoui float [[A6]] to i8
; CHECK-NEXT: [[CVT7:%.*]] = fptoui float [[A7]] to i8
; CHECK-NEXT: store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
; CHECK-NEXT: store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
; CHECK-NEXT: store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
; CHECK-NEXT: store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
; CHECK-NEXT: store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
; CHECK-NEXT: store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
; CHECK-NEXT: store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
; CHECK-NEXT: store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4		%a5 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
...

llvm/test/Transforms/SLPVectorizer/X86/hadd-inseltpoison.ll

...
%r06 = insertelement <8 x i16> %r05, i16 %r6, i32 6		%r06 = insertelement <8 x i16> %r05, i16 %r6, i32 6
%r07 = insertelement <8 x i16> %r06, i16 %r7, i32 7		%r07 = insertelement <8 x i16> %r06, i16 %r7, i32 7
ret <8 x i16> %r07		ret <8 x i16> %r07
}		}

; PR41892		; PR41892
define void @test_v4f32_v2f32_store(<4 x float> %f, float* %p){		define void @test_v4f32_v2f32_store(<4 x float> %f, float* %p){
; CHECK-LABEL: @test_v4f32_v2f32_store(		; CHECK-LABEL: @test_v4f32_v2f32_store(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x float> [[F:%.*]], i64 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[F:%.*]], <4 x float> undef, <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x float> [[F]], i64 1		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[F]], <4 x float> undef, <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[ADD01:%.*]] = fadd float [[X0]], [[X1]]		; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]
; CHECK-NEXT: store float [[ADD01]], float* [[P:%.*]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[P:%.*]] to <2 x float>*
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x float> [[F]], i64 2		; CHECK-NEXT: store <2 x float> [[TMP3]], <2 x float>* [[TMP4]], align 4
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x float> [[F]], i64 3
; CHECK-NEXT: [[ADD23:%.*]] = fadd float [[X2]], [[X3]]
; CHECK-NEXT: [[P23:%.*]] = getelementptr inbounds float, float* [[P]], i64 1
; CHECK-NEXT: store float [[ADD23]], float* [[P23]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%x0 = extractelement <4 x float> %f, i64 0		%x0 = extractelement <4 x float> %f, i64 0
%x1 = extractelement <4 x float> %f, i64 1		%x1 = extractelement <4 x float> %f, i64 1
%add01 = fadd float %x0, %x1		%add01 = fadd float %x0, %x1
store float %add01, float* %p, align 4		store float %add01, float* %p, align 4
%x2 = extractelement <4 x float> %f, i64 2		%x2 = extractelement <4 x float> %f, i64 2
%x3 = extractelement <4 x float> %f, i64 3		%x3 = extractelement <4 x float> %f, i64 3
...

llvm/test/Transforms/SLPVectorizer/X86/hadd.ll

...
%r06 = insertelement <8 x i16> %r05, i16 %r6, i32 6		%r06 = insertelement <8 x i16> %r05, i16 %r6, i32 6
%r07 = insertelement <8 x i16> %r06, i16 %r7, i32 7		%r07 = insertelement <8 x i16> %r06, i16 %r7, i32 7
ret <8 x i16> %r07		ret <8 x i16> %r07
}		}

; PR41892		; PR41892
define void @test_v4f32_v2f32_store(<4 x float> %f, float* %p){		define void @test_v4f32_v2f32_store(<4 x float> %f, float* %p){
; CHECK-LABEL: @test_v4f32_v2f32_store(		; CHECK-LABEL: @test_v4f32_v2f32_store(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x float> [[F:%.*]], i64 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[F:%.*]], <4 x float> undef, <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x float> [[F]], i64 1		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[F]], <4 x float> undef, <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[ADD01:%.*]] = fadd float [[X0]], [[X1]]		; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]
; CHECK-NEXT: store float [[ADD01]], float* [[P:%.*]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[P:%.*]] to <2 x float>*
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x float> [[F]], i64 2		; CHECK-NEXT: store <2 x float> [[TMP3]], <2 x float>* [[TMP4]], align 4
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x float> [[F]], i64 3
; CHECK-NEXT: [[ADD23:%.*]] = fadd float [[X2]], [[X3]]
; CHECK-NEXT: [[P23:%.*]] = getelementptr inbounds float, float* [[P]], i64 1
; CHECK-NEXT: store float [[ADD23]], float* [[P23]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%x0 = extractelement <4 x float> %f, i64 0		%x0 = extractelement <4 x float> %f, i64 0
%x1 = extractelement <4 x float> %f, i64 1		%x1 = extractelement <4 x float> %f, i64 1
%add01 = fadd float %x0, %x1		%add01 = fadd float %x0, %x1
store float %add01, float* %p, align 4		store float %add01, float* %p, align 4
%x2 = extractelement <4 x float> %f, i64 2		%x2 = extractelement <4 x float> %f, i64 2
%x3 = extractelement <4 x float> %f, i64 3		%x3 = extractelement <4 x float> %f, i64 3
...
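
The summary says a group of stores may now be vectorized when their count is below MinVecRegSize / scalar_value_size, provided the target allows it. The arithmetic behind the hadd.ll change can be sketched in C++ as follows. This is only an illustration under stated assumptions: it assumes a 128-bit minimum vector register size, and `fullRegisterVF`, `mayVectorizeStores`, and the `TargetMinVF` parameter are hypothetical names that are not taken from the patch or from LLVM's API.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): number of scalar elements needed to
// fill one vector register of MinVecRegSizeBits.
constexpr unsigned fullRegisterVF(unsigned MinVecRegSizeBits,
                                  unsigned ScalarBits) {
  return MinVecRegSizeBits / ScalarBits;
}

// Hypothetical model of the store-chain check. TargetMinVF stands in for
// whatever lower bound the target reports; passing the full-register VF
// models the behaviour before the patch.
constexpr bool mayVectorizeStores(unsigned NumStores,
                                  unsigned MinVecRegSizeBits,
                                  unsigned ScalarBits, unsigned TargetMinVF) {
  const unsigned FullVF = fullRegisterVF(MinVecRegSizeBits, ScalarBits);
  const unsigned MinVF = TargetMinVF < FullVF ? TargetMinVF : FullVF;
  // A vector needs at least two lanes, and the chain must reach the minimum.
  return NumStores >= 2 && NumStores >= MinVF;
}

// Worked example for test_v4f32_v2f32_store: two 32-bit float stores.
inline void checkExamples() {
  assert(fullRegisterVF(128, 32) == 4);       // four floats fill 128 bits
  assert(!mayVectorizeStores(2, 128, 32, 4)); // rejected at full register width
  assert(mayVectorizeStores(2, 128, 32, 2));  // accepted if the target allows VF 2
}
```

Under these assumptions, the two float stores in `test_v4f32_v2f32_store` cover only half a register, which is consistent with the `<2 x float>` store in the updated CHECK lines.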

llvm/test/Transforms/SLPVectorizer/X86/insert-after-bundle.ll

	...
	entry:			entry:
	%cmp = icmp slt i32 %x, %y			%cmp = icmp slt i32 %x, %y
	%b.a = select i1 %cmp, i32 %b, i32 %a			%b.a = select i1 %cmp, i32 %b, i32 %a
	%retval.0 = trunc i32 %b.a to i8			%retval.0 = trunc i32 %b.a to i8
	ret i8 %retval.0			ret i8 %retval.0
	}			}

	define void @bar(i8* noalias nocapture readonly %a, i8* noalias nocapture readonly %b, i8* noalias nocapture readonly %c, i8* noalias nocapture readonly %d, i8* noalias nocapture %e, i32 %w) local_unnamed_addr #1 {			define void @bar(i8* noalias nocapture readonly %a, i8* noalias nocapture readonly %b, i8* noalias nocapture readonly %c, i8* noalias nocapture readonly %d, i8* noalias nocapture %e, i32 %w) local_unnamed_addr #1 {
	; SSE-LABEL: @bar(			; CHECK-LABEL: @bar(
	; SSE-NEXT: entry:			; CHECK-NEXT: entry:
	; SSE-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[W:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[W:%.*]], i32 0
	; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP0]], <16 x i32> poison, <16 x i32> zeroinitializer
	; SSE-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[W]], i32 0			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; SSE-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK: for.body:
	; SSE-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[W]], i32 0			; CHECK-NEXT: [[I_0356:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
	; SSE-NEXT: [[SHUFFLE2:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[A_ADDR_0355:%.*]] = phi i8* [ [[A:%.*]], [[ENTRY]] ], [ [[ADD_PTR:%.*]], [[FOR_BODY]] ]
	; SSE-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[W]], i32 0			; CHECK-NEXT: [[E_ADDR_0354:%.*]] = phi i8* [ [[E:%.*]], [[ENTRY]] ], [ [[ADD_PTR192:%.*]], [[FOR_BODY]] ]
	; SSE-NEXT: [[SHUFFLE3:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[D_ADDR_0353:%.*]] = phi i8* [ [[D:%.*]], [[ENTRY]] ], [ [[ADD_PTR191:%.*]], [[FOR_BODY]] ]
	; SSE-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: [[C_ADDR_0352:%.*]] = phi i8* [ [[C:%.*]], [[ENTRY]] ], [ [[ADD_PTR190:%.*]], [[FOR_BODY]] ]
	; SSE: for.body:			; CHECK-NEXT: [[B_ADDR_0351:%.*]] = phi i8* [ [[B:%.*]], [[ENTRY]] ], [ [[ADD_PTR189:%.*]], [[FOR_BODY]] ]
	; SSE-NEXT: [[I_0356:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[C_ADDR_0352]] to <16 x i8>*
	; SSE-NEXT: [[A_ADDR_0355:%.*]] = phi i8* [ [[A:%.*]], [[ENTRY]] ], [ [[ADD_PTR:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[TMP1]], align 1
	; SSE-NEXT: [[E_ADDR_0354:%.*]] = phi i8* [ [[E:%.*]], [[ENTRY]] ], [ [[ADD_PTR192:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[D_ADDR_0353]] to <16 x i8>*
	; SSE-NEXT: [[D_ADDR_0353:%.*]] = phi i8* [ [[D:%.*]], [[ENTRY]] ], [ [[ADD_PTR191:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[TMP3]], align 1
	; SSE-NEXT: [[C_ADDR_0352:%.*]] = phi i8* [ [[C:%.*]], [[ENTRY]] ], [ [[ADD_PTR190:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i8* [[A_ADDR_0355]] to <16 x i8>*
	; SSE-NEXT: [[B_ADDR_0351:%.*]] = phi i8* [ [[B:%.*]], [[ENTRY]] ], [ [[ADD_PTR189:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* [[TMP5]], align 1
	; SSE-NEXT: [[TMP4:%.*]] = bitcast i8* [[C_ADDR_0352]] to <4 x i8>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i8* [[B_ADDR_0351]] to <16 x i8>*
	; SSE-NEXT: [[TMP5:%.*]] = load <4 x i8>, <4 x i8>* [[TMP4]], align 1			; CHECK-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* [[TMP7]], align 1
	; SSE-NEXT: [[TMP6:%.*]] = bitcast i8* [[D_ADDR_0353]] to <4 x i8>*			; CHECK-NEXT: [[TMP9:%.*]] = icmp ult <16 x i8> [[TMP2]], [[TMP4]]
	; SSE-NEXT: [[TMP7:%.*]] = load <4 x i8>, <4 x i8>* [[TMP6]], align 1			; CHECK-NEXT: [[TMP10:%.*]] = select <16 x i1> [[TMP9]], <16 x i8> [[TMP8]], <16 x i8> [[TMP6]]
	; SSE-NEXT: [[TMP8:%.*]] = bitcast i8* [[A_ADDR_0355]] to <4 x i8>*			; CHECK-NEXT: [[TMP11:%.*]] = zext <16 x i8> [[TMP10]] to <16 x i32>
	; SSE-NEXT: [[TMP9:%.*]] = load <4 x i8>, <4 x i8>* [[TMP8]], align 1			; CHECK-NEXT: [[TMP12:%.*]] = mul <16 x i32> [[TMP11]], [[SHUFFLE]]
	; SSE-NEXT: [[TMP10:%.*]] = bitcast i8* [[B_ADDR_0351]] to <4 x i8>*			; CHECK-NEXT: [[TMP13:%.*]] = trunc <16 x i32> [[TMP12]] to <16 x i8>
	; SSE-NEXT: [[TMP11:%.*]] = load <4 x i8>, <4 x i8>* [[TMP10]], align 1			; CHECK-NEXT: [[TMP14:%.*]] = bitcast i8* [[E_ADDR_0354]] to <16 x i8>*
	; SSE-NEXT: [[TMP12:%.*]] = icmp ult <4 x i8> [[TMP5]], [[TMP7]]			; CHECK-NEXT: store <16 x i8> [[TMP13]], <16 x i8>* [[TMP14]], align 1
	; SSE-NEXT: [[TMP13:%.*]] = select <4 x i1> [[TMP12]], <4 x i8> [[TMP11]], <4 x i8> [[TMP9]]			; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_0356]], 1
	; SSE-NEXT: [[TMP14:%.*]] = zext <4 x i8> [[TMP13]] to <4 x i32>			; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 16
	; SSE-NEXT: [[TMP15:%.*]] = mul <4 x i32> [[TMP14]], [[SHUFFLE]]			; CHECK-NEXT: [[ADD_PTR189]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 16
	; SSE-NEXT: [[TMP16:%.*]] = trunc <4 x i32> [[TMP15]] to <4 x i8>			; CHECK-NEXT: [[ADD_PTR190]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 16
	; SSE-NEXT: [[TMP17:%.]] = bitcast i8 [[E_ADDR_0354]] to <4 x i8>*			; CHECK-NEXT: [[ADD_PTR191]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 16
	; SSE-NEXT: store <4 x i8> [[TMP16]], <4 x i8>* [[TMP17]], align 1			; CHECK-NEXT: [[ADD_PTR192]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 16
	; SSE-NEXT: [[ARRAYIDX45:%.]] = getelementptr inbounds i8, i8 [[C_ADDR_0352]], i64 4			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 8
	; SSE-NEXT: [[ARRAYIDX47:%.]] = getelementptr inbounds i8, i8 [[D_ADDR_0353]], i64 4			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; SSE-NEXT: [[ARRAYIDX49:%.]] = getelementptr inbounds i8, i8 [[A_ADDR_0355]], i64 4			; CHECK: for.end:
	; SSE-NEXT: [[ARRAYIDX52:%.]] = getelementptr inbounds i8, i8 [[B_ADDR_0351]], i64 4			; CHECK-NEXT: ret void
	; SSE-NEXT: [[ARRAYIDX56:%.]] = getelementptr inbounds i8, i8 [[E_ADDR_0354]], i64 4
	; SSE-NEXT: [[TMP18:%.]] = bitcast i8 [[ARRAYIDX45]] to <4 x i8>*
	; SSE-NEXT: [[TMP19:%.]] = load <4 x i8>, <4 x i8> [[TMP18]], align 1
	; SSE-NEXT: [[TMP20:%.]] = bitcast i8 [[ARRAYIDX47]] to <4 x i8>*
	; SSE-NEXT: [[TMP21:%.]] = load <4 x i8>, <4 x i8> [[TMP20]], align 1
	; SSE-NEXT: [[TMP22:%.]] = bitcast i8 [[ARRAYIDX49]] to <4 x i8>*
	; SSE-NEXT: [[TMP23:%.]] = load <4 x i8>, <4 x i8> [[TMP22]], align 1
	; SSE-NEXT: [[TMP24:%.]] = bitcast i8 [[ARRAYIDX52]] to <4 x i8>*
	; SSE-NEXT: [[TMP25:%.]] = load <4 x i8>, <4 x i8> [[TMP24]], align 1
	; SSE-NEXT: [[TMP26:%.*]] = icmp ult <4 x i8> [[TMP19]], [[TMP21]]
	; SSE-NEXT: [[TMP27:%.*]] = select <4 x i1> [[TMP26]], <4 x i8> [[TMP25]], <4 x i8> [[TMP23]]
	; SSE-NEXT: [[TMP28:%.*]] = zext <4 x i8> [[TMP27]] to <4 x i32>
	; SSE-NEXT: [[TMP29:%.*]] = mul <4 x i32> [[TMP28]], [[SHUFFLE1]]
	; SSE-NEXT: [[TMP30:%.*]] = trunc <4 x i32> [[TMP29]] to <4 x i8>
	; SSE-NEXT: [[TMP31:%.]] = bitcast i8 [[ARRAYIDX56]] to <4 x i8>*
	; SSE-NEXT: store <4 x i8> [[TMP30]], <4 x i8>* [[TMP31]], align 1
	; SSE-NEXT: [[ARRAYIDX93:%.]] = getelementptr inbounds i8, i8 [[C_ADDR_0352]], i64 8
	; SSE-NEXT: [[ARRAYIDX95:%.]] = getelementptr inbounds i8, i8 [[D_ADDR_0353]], i64 8
	; SSE-NEXT: [[ARRAYIDX97:%.]] = getelementptr inbounds i8, i8 [[A_ADDR_0355]], i64 8
	; SSE-NEXT: [[ARRAYIDX100:%.]] = getelementptr inbounds i8, i8 [[B_ADDR_0351]], i64 8
	; SSE-NEXT: [[ARRAYIDX104:%.]] = getelementptr inbounds i8, i8 [[E_ADDR_0354]], i64 8
	; SSE-NEXT: [[TMP32:%.]] = bitcast i8 [[ARRAYIDX93]] to <4 x i8>*
	; SSE-NEXT: [[TMP33:%.]] = load <4 x i8>, <4 x i8> [[TMP32]], align 1
	; SSE-NEXT: [[TMP34:%.]] = bitcast i8 [[ARRAYIDX95]] to <4 x i8>*
	; SSE-NEXT: [[TMP35:%.]] = load <4 x i8>, <4 x i8> [[TMP34]], align 1
	; SSE-NEXT: [[TMP36:%.]] = bitcast i8 [[ARRAYIDX97]] to <4 x i8>*
	; SSE-NEXT: [[TMP37:%.]] = load <4 x i8>, <4 x i8> [[TMP36]], align 1
	; SSE-NEXT: [[TMP38:%.]] = bitcast i8 [[ARRAYIDX100]] to <4 x i8>*
	; SSE-NEXT: [[TMP39:%.]] = load <4 x i8>, <4 x i8> [[TMP38]], align 1
	; SSE-NEXT: [[TMP40:%.*]] = icmp ult <4 x i8> [[TMP33]], [[TMP35]]
	; SSE-NEXT: [[TMP41:%.*]] = select <4 x i1> [[TMP40]], <4 x i8> [[TMP39]], <4 x i8> [[TMP37]]
	; SSE-NEXT: [[TMP42:%.*]] = zext <4 x i8> [[TMP41]] to <4 x i32>
	; SSE-NEXT: [[TMP43:%.*]] = mul <4 x i32> [[TMP42]], [[SHUFFLE2]]
	; SSE-NEXT: [[TMP44:%.*]] = trunc <4 x i32> [[TMP43]] to <4 x i8>
	; SSE-NEXT: [[TMP45:%.]] = bitcast i8 [[ARRAYIDX104]] to <4 x i8>*
	; SSE-NEXT: store <4 x i8> [[TMP44]], <4 x i8>* [[TMP45]], align 1
	; SSE-NEXT: [[ARRAYIDX141:%.]] = getelementptr inbounds i8, i8 [[C_ADDR_0352]], i64 12
	; SSE-NEXT: [[ARRAYIDX143:%.]] = getelementptr inbounds i8, i8 [[D_ADDR_0353]], i64 12
	; SSE-NEXT: [[ARRAYIDX145:%.]] = getelementptr inbounds i8, i8 [[A_ADDR_0355]], i64 12
	; SSE-NEXT: [[ARRAYIDX148:%.]] = getelementptr inbounds i8, i8 [[B_ADDR_0351]], i64 12
	; SSE-NEXT: [[ARRAYIDX152:%.]] = getelementptr inbounds i8, i8 [[E_ADDR_0354]], i64 12
	; SSE-NEXT: [[TMP46:%.]] = bitcast i8 [[ARRAYIDX141]] to <4 x i8>*
	; SSE-NEXT: [[TMP47:%.]] = load <4 x i8>, <4 x i8> [[TMP46]], align 1
	; SSE-NEXT: [[TMP48:%.]] = bitcast i8 [[ARRAYIDX143]] to <4 x i8>*
	; SSE-NEXT: [[TMP49:%.]] = load <4 x i8>, <4 x i8> [[TMP48]], align 1
	; SSE-NEXT: [[TMP50:%.]] = bitcast i8 [[ARRAYIDX145]] to <4 x i8>*
	; SSE-NEXT: [[TMP51:%.]] = load <4 x i8>, <4 x i8> [[TMP50]], align 1
	; SSE-NEXT: [[TMP52:%.]] = bitcast i8 [[ARRAYIDX148]] to <4 x i8>*
	; SSE-NEXT: [[TMP53:%.]] = load <4 x i8>, <4 x i8> [[TMP52]], align 1
	; SSE-NEXT: [[TMP54:%.*]] = icmp ult <4 x i8> [[TMP47]], [[TMP49]]
	; SSE-NEXT: [[TMP55:%.*]] = select <4 x i1> [[TMP54]], <4 x i8> [[TMP53]], <4 x i8> [[TMP51]]
	; SSE-NEXT: [[TMP56:%.*]] = zext <4 x i8> [[TMP55]] to <4 x i32>
	; SSE-NEXT: [[TMP57:%.*]] = mul <4 x i32> [[TMP56]], [[SHUFFLE3]]
	; SSE-NEXT: [[TMP58:%.*]] = trunc <4 x i32> [[TMP57]] to <4 x i8>
	; SSE-NEXT: [[TMP59:%.]] = bitcast i8 [[ARRAYIDX152]] to <4 x i8>*
	; SSE-NEXT: store <4 x i8> [[TMP58]], <4 x i8>* [[TMP59]], align 1
	; SSE-NEXT: [[INC]] = add nuw nsw i32 [[I_0356]], 1
	; SSE-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 16
	; SSE-NEXT: [[ADD_PTR189]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 16
	; SSE-NEXT: [[ADD_PTR190]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 16
	; SSE-NEXT: [[ADD_PTR191]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 16
	; SSE-NEXT: [[ADD_PTR192]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 16
	; SSE-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 8
	; SSE-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; SSE: for.end:
	; SSE-NEXT: ret void
	;
	; AVX512-LABEL: @bar(
	; AVX512-NEXT: entry:
	; AVX512-NEXT: [[TMP0:%.]] = insertelement <16 x i32> poison, i32 [[W:%.]], i32 0
	; AVX512-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP0]], <16 x i32> poison, <16 x i32> zeroinitializer
	; AVX512-NEXT: br label [[FOR_BODY:%.*]]
	; AVX512: for.body:
	; AVX512-NEXT: [[I_0356:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
	; AVX512-NEXT: [[A_ADDR_0355:%.]] = phi i8 [ [[A:%.]], [[ENTRY]] ], [ [[ADD_PTR:%.]], [[FOR_BODY]] ]
	; AVX512-NEXT: [[E_ADDR_0354:%.]] = phi i8 [ [[E:%.]], [[ENTRY]] ], [ [[ADD_PTR192:%.]], [[FOR_BODY]] ]
	; AVX512-NEXT: [[D_ADDR_0353:%.]] = phi i8 [ [[D:%.]], [[ENTRY]] ], [ [[ADD_PTR191:%.]], [[FOR_BODY]] ]
	; AVX512-NEXT: [[C_ADDR_0352:%.]] = phi i8 [ [[C:%.]], [[ENTRY]] ], [ [[ADD_PTR190:%.]], [[FOR_BODY]] ]
	; AVX512-NEXT: [[B_ADDR_0351:%.]] = phi i8 [ [[B:%.]], [[ENTRY]] ], [ [[ADD_PTR189:%.]], [[FOR_BODY]] ]
	; AVX512-NEXT: [[TMP1:%.]] = bitcast i8 [[C_ADDR_0352]] to <16 x i8>*
	; AVX512-NEXT: [[TMP2:%.]] = load <16 x i8>, <16 x i8> [[TMP1]], align 1
	; AVX512-NEXT: [[TMP3:%.]] = bitcast i8 [[D_ADDR_0353]] to <16 x i8>*
	; AVX512-NEXT: [[TMP4:%.]] = load <16 x i8>, <16 x i8> [[TMP3]], align 1
	; AVX512-NEXT: [[TMP5:%.]] = bitcast i8 [[A_ADDR_0355]] to <16 x i8>*
	; AVX512-NEXT: [[TMP6:%.]] = load <16 x i8>, <16 x i8> [[TMP5]], align 1
	; AVX512-NEXT: [[TMP7:%.]] = bitcast i8 [[B_ADDR_0351]] to <16 x i8>*
	; AVX512-NEXT: [[TMP8:%.]] = load <16 x i8>, <16 x i8> [[TMP7]], align 1
	; AVX512-NEXT: [[TMP9:%.*]] = icmp ult <16 x i8> [[TMP2]], [[TMP4]]
	; AVX512-NEXT: [[TMP10:%.*]] = select <16 x i1> [[TMP9]], <16 x i8> [[TMP8]], <16 x i8> [[TMP6]]
	; AVX512-NEXT: [[TMP11:%.*]] = zext <16 x i8> [[TMP10]] to <16 x i32>
	; AVX512-NEXT: [[TMP12:%.*]] = mul <16 x i32> [[TMP11]], [[SHUFFLE]]
	; AVX512-NEXT: [[TMP13:%.*]] = trunc <16 x i32> [[TMP12]] to <16 x i8>
	; AVX512-NEXT: [[TMP14:%.]] = bitcast i8 [[E_ADDR_0354]] to <16 x i8>*
	; AVX512-NEXT: store <16 x i8> [[TMP13]], <16 x i8>* [[TMP14]], align 1
	; AVX512-NEXT: [[INC]] = add nuw nsw i32 [[I_0356]], 1
	; AVX512-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 16
	; AVX512-NEXT: [[ADD_PTR189]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 16
	; AVX512-NEXT: [[ADD_PTR190]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 16
	; AVX512-NEXT: [[ADD_PTR191]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 16
	; AVX512-NEXT: [[ADD_PTR192]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 16
	; AVX512-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 8
	; AVX512-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; AVX512: for.end:
	; AVX512-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.0356 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.0356 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%a.addr.0355 = phi i8* [ %a, %entry ], [ %add.ptr, %for.body ]			%a.addr.0355 = phi i8* [ %a, %entry ], [ %add.ptr, %for.body ]
	%e.addr.0354 = phi i8* [ %e, %entry ], [ %add.ptr192, %for.body ]			%e.addr.0354 = phi i8* [ %e, %entry ], [ %add.ptr192, %for.body ]

llvm/test/Transforms/SLPVectorizer/X86/memory-runtime-checks.ll

	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: then:			; CHECK: then:
	; CHECK-NEXT: [[A_8:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 8			; CHECK-NEXT: [[A_8:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 8
	; CHECK-NEXT: store float 0.000000e+00, float* [[A_8]], align 4			; CHECK-NEXT: store float 0.000000e+00, float* [[A_8]], align 4
	; CHECK-NEXT: [[L6:%.*]] = load float, float* [[B_14]], align 4			; CHECK-NEXT: [[L6:%.*]] = load float, float* [[B_14]], align 4
	; CHECK-NEXT: [[A_5:%.*]] = getelementptr inbounds float, float* [[A]], i64 5			; CHECK-NEXT: [[A_5:%.*]] = getelementptr inbounds float, float* [[A]], i64 5
	; CHECK-NEXT: store float [[L6]], float* [[A_5]], align 4			; CHECK-NEXT: store float [[L6]], float* [[A_5]], align 4
	; CHECK-NEXT: [[A_6:%.*]] = getelementptr inbounds float, float* [[A]], i64 6			; CHECK-NEXT: [[A_6:%.*]] = getelementptr inbounds float, float* [[A]], i64 6
	; CHECK-NEXT: store float 0.000000e+00, float* [[A_6]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[A_6]] to <2 x float>*
	; CHECK-NEXT: [[A_7:%.*]] = getelementptr inbounds float, float* [[A]], i64 7			; CHECK-NEXT: store <2 x float> zeroinitializer, <2 x float>* [[TMP0]], align 4
	; CHECK-NEXT: store float 0.000000e+00, float* [[A_7]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%b.10 = getelementptr inbounds float, float* %b, i64 10			%b.10 = getelementptr inbounds float, float* %b, i64 10
	%b.14 = getelementptr inbounds float, float* %b, i64 14			%b.14 = getelementptr inbounds float, float* %b, i64 14
	br i1 %c, label %then, label %else			br i1 %c, label %then, label %else

	else:			else:
	; CHECK: bb22:			; CHECK: bb22:
	; CHECK-NEXT: [[TMP23:%.]] = getelementptr inbounds float, float [[ARG4:%.*]], i32 0			; CHECK-NEXT: [[TMP23:%.]] = getelementptr inbounds float, float [[ARG4:%.*]], i32 0
	; CHECK-NEXT: [[TMP24:%.]] = getelementptr float, float [[TMP23]], i64 7			; CHECK-NEXT: [[TMP24:%.]] = getelementptr float, float [[TMP23]], i64 7
	; CHECK-NEXT: br i1 [[C_2:%.]], label [[BB25:%.]], label [[BB22]]			; CHECK-NEXT: br i1 [[C_2:%.]], label [[BB25:%.]], label [[BB22]]
	; CHECK: bb25:			; CHECK: bb25:
	; CHECK-NEXT: [[TMP26:%.]] = getelementptr float, float [[TMP23]], i64 6			; CHECK-NEXT: [[TMP26:%.]] = getelementptr float, float [[TMP23]], i64 6
	; CHECK-NEXT: store float 0.000000e+00, float* [[TMP24]], align 4			; CHECK-NEXT: store float 0.000000e+00, float* [[TMP24]], align 4
	; CHECK-NEXT: [[TMP27:%.]] = load float, float [[ARG5:%.*]], align 4			; CHECK-NEXT: [[TMP27:%.]] = load float, float [[ARG5:%.*]], align 4
	; CHECK-NEXT: [[TMP28:%.]] = getelementptr float, float [[TMP23]], i64 5
	; CHECK-NEXT: [[TMP29:%.*]] = fadd float 0.000000e+00, 0.000000e+00			; CHECK-NEXT: [[TMP29:%.*]] = fadd float 0.000000e+00, 0.000000e+00
	; CHECK-NEXT: store float 0.000000e+00, float* [[TMP26]], align 4			; CHECK-NEXT: store float 0.000000e+00, float* [[TMP26]], align 4
	; CHECK-NEXT: [[TMP30:%.]] = getelementptr float, float [[TMP23]], i64 4			; CHECK-NEXT: [[TMP30:%.]] = getelementptr float, float [[TMP23]], i64 4
	; CHECK-NEXT: store float 0.000000e+00, float* [[TMP28]], align 4
	; CHECK-NEXT: [[TMP31:%.*]] = fadd float 0.000000e+00, 0.000000e+00			; CHECK-NEXT: [[TMP31:%.*]] = fadd float 0.000000e+00, 0.000000e+00
	; CHECK-NEXT: store float 0.000000e+00, float* [[TMP30]], align 4			; CHECK-NEXT: [[TMP5:%.]] = bitcast float [[TMP30]] to <2 x float>*
				; CHECK-NEXT: store <2 x float> zeroinitializer, <2 x float>* [[TMP5]], align 4
	; CHECK-NEXT: [[TMP32:%.]] = getelementptr inbounds float, float [[ARG4]], i32 0			; CHECK-NEXT: [[TMP32:%.]] = getelementptr inbounds float, float [[ARG4]], i32 0
	; CHECK-NEXT: br label [[BB33:%.*]]			; CHECK-NEXT: br label [[BB33:%.*]]
	; CHECK: bb33:			; CHECK: bb33:
	; CHECK-NEXT: br label [[BB34:%.*]]			; CHECK-NEXT: br label [[BB34:%.*]]
	; CHECK: bb34:			; CHECK: bb34:
	; CHECK-NEXT: [[TMP35:%.]] = getelementptr float, float [[TMP32]], i64 3			; CHECK-NEXT: [[TMP35:%.]] = getelementptr float, float [[TMP32]], i64 3
	; CHECK-NEXT: [[TMP36:%.]] = getelementptr float, float [[TMP32]], i64 2
	; CHECK-NEXT: [[TMP37:%.]] = load float, float [[TMP35]], align 4			; CHECK-NEXT: [[TMP37:%.]] = load float, float [[TMP35]], align 4
	; CHECK-NEXT: [[TMP38:%.*]] = fadd float 0.000000e+00, [[TMP37]]			; CHECK-NEXT: [[TMP38:%.*]] = fadd float 0.000000e+00, [[TMP37]]
	; CHECK-NEXT: store float [[TMP38]], float* [[TMP35]], align 4			; CHECK-NEXT: store float [[TMP38]], float* [[TMP35]], align 4
	; CHECK-NEXT: [[TMP39:%.]] = getelementptr float, float [[TMP32]], i64 1			; CHECK-NEXT: [[TMP39:%.]] = getelementptr float, float [[TMP32]], i64 1
	; CHECK-NEXT: [[TMP40:%.]] = load float, float [[TMP36]], align 4			; CHECK-NEXT: [[TMP6:%.]] = bitcast float [[TMP39]] to <2 x float>*
	; CHECK-NEXT: [[TMP41:%.*]] = fadd float 0.000000e+00, [[TMP40]]			; CHECK-NEXT: [[TMP7:%.]] = load <2 x float>, <2 x float> [[TMP6]], align 4
	; CHECK-NEXT: store float [[TMP41]], float* [[TMP36]], align 4			; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> zeroinitializer, [[TMP7]]
	; CHECK-NEXT: [[TMP42:%.]] = load float, float [[TMP39]], align 4			; CHECK-NEXT: [[TMP9:%.]] = bitcast float [[TMP39]] to <2 x float>*
	; CHECK-NEXT: [[TMP43:%.*]] = fadd float 0.000000e+00, [[TMP42]]			; CHECK-NEXT: store <2 x float> [[TMP8]], <2 x float>* [[TMP9]], align 4
	; CHECK-NEXT: store float [[TMP43]], float* [[TMP39]], align 4
	; CHECK-NEXT: [[TMP44:%.]] = load float, float [[ARG3:%.*]], align 4			; CHECK-NEXT: [[TMP44:%.]] = load float, float [[ARG3:%.*]], align 4
	; CHECK-NEXT: [[TMP45:%.]] = load float, float [[TMP32]], align 4			; CHECK-NEXT: [[TMP45:%.]] = load float, float [[TMP32]], align 4
	; CHECK-NEXT: [[TMP46:%.*]] = fadd float 0.000000e+00, [[TMP45]]			; CHECK-NEXT: [[TMP46:%.*]] = fadd float 0.000000e+00, [[TMP45]]
	; CHECK-NEXT: store float [[TMP46]], float* [[TMP32]], align 4			; CHECK-NEXT: store float [[TMP46]], float* [[TMP32]], align 4
	; CHECK-NEXT: call void @quux()			; CHECK-NEXT: call void @quux()
	; CHECK-NEXT: br label [[BB47:%.*]]			; CHECK-NEXT: br label [[BB47:%.*]]
	; CHECK: bb47:			; CHECK: bb47:
	; CHECK-NEXT: br label [[BB17]]			; CHECK-NEXT: br label [[BB17]]

llvm/test/Transforms/SLPVectorizer/X86/no_alternate_divrem.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -slp-threshold=-200 -mtriple=x86_64-unknown-linux -mcpu=core-avx2 -S | FileCheck %s			; RUN: opt < %s -slp-vectorizer -slp-threshold=-200 -mtriple=x86_64-unknown-linux -mcpu=core-avx2 -S | FileCheck %s

	define void @test_add_sdiv(i32* %arr1, i32* %arr2, i32 %a0, i32 %a1, i32 %a2, i32 %a3) {			define void @test_add_sdiv(i32* %arr1, i32* %arr2, i32 %a0, i32 %a1, i32 %a2, i32 %a3) {
	; CHECK-LABEL: @test_add_sdiv(			; CHECK-LABEL: @test_add_sdiv(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[GEP1_0:%.*]] = getelementptr i32, i32* [[ARR1:%.*]], i32 0			; CHECK-NEXT: [[GEP1_0:%.*]] = getelementptr i32, i32* [[ARR1:%.*]], i32 0
	; CHECK-NEXT: [[GEP1_1:%.*]] = getelementptr i32, i32* [[ARR1]], i32 1
	; CHECK-NEXT: [[GEP1_2:%.*]] = getelementptr i32, i32* [[ARR1]], i32 2			; CHECK-NEXT: [[GEP1_2:%.*]] = getelementptr i32, i32* [[ARR1]], i32 2
	; CHECK-NEXT: [[GEP1_3:%.*]] = getelementptr i32, i32* [[ARR1]], i32 3			; CHECK-NEXT: [[GEP1_3:%.*]] = getelementptr i32, i32* [[ARR1]], i32 3
	; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr i32, i32* [[ARR2:%.*]], i32 0			; CHECK-NEXT: [[GEP2_0:%.*]] = getelementptr i32, i32* [[ARR2:%.*]], i32 0
	; CHECK-NEXT: [[GEP2_1:%.*]] = getelementptr i32, i32* [[ARR2]], i32 1
	; CHECK-NEXT: [[GEP2_2:%.*]] = getelementptr i32, i32* [[ARR2]], i32 2			; CHECK-NEXT: [[GEP2_2:%.*]] = getelementptr i32, i32* [[ARR2]], i32 2
	; CHECK-NEXT: [[GEP2_3:%.*]] = getelementptr i32, i32* [[ARR2]], i32 3			; CHECK-NEXT: [[GEP2_3:%.*]] = getelementptr i32, i32* [[ARR2]], i32 3
	; CHECK-NEXT: [[V0:%.*]] = load i32, i32* [[GEP1_0]], align 4
	; CHECK-NEXT: [[V1:%.*]] = load i32, i32* [[GEP1_1]], align 4
	; CHECK-NEXT: [[V2:%.*]] = load i32, i32* [[GEP1_2]], align 4			; CHECK-NEXT: [[V2:%.*]] = load i32, i32* [[GEP1_2]], align 4
	; CHECK-NEXT: [[V3:%.*]] = load i32, i32* [[GEP1_3]], align 4			; CHECK-NEXT: [[V3:%.*]] = load i32, i32* [[GEP1_3]], align 4
	; CHECK-NEXT: [[Y0:%.*]] = add nsw i32 [[A0:%.*]], 1146			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[A0:%.*]], i32 0
	; CHECK-NEXT: [[Y1:%.*]] = add nsw i32 [[A1:%.*]], 146			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[A1:%.*]], i32 1
				; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[TMP1]], <i32 1146, i32 146>
	; CHECK-NEXT: [[Y2:%.*]] = add nsw i32 [[A2:%.*]], 42			; CHECK-NEXT: [[Y2:%.*]] = add nsw i32 [[A2:%.*]], 42
	; CHECK-NEXT: [[Y3:%.*]] = add nsw i32 [[A3:%.*]], 0			; CHECK-NEXT: [[Y3:%.*]] = add nsw i32 [[A3:%.*]], 0
	; CHECK-NEXT: [[RES0:%.*]] = add nsw i32 [[V0]], [[Y0]]
	; CHECK-NEXT: [[RES1:%.*]] = add nsw i32 [[V1]], [[Y1]]
	; CHECK-NEXT: [[RES2:%.*]] = sdiv i32 [[V2]], [[Y2]]			; CHECK-NEXT: [[RES2:%.*]] = sdiv i32 [[V2]], [[Y2]]
	; CHECK-NEXT: [[RES3:%.*]] = add nsw i32 [[V3]], [[Y3]]			; CHECK-NEXT: [[RES3:%.*]] = add nsw i32 [[V3]], [[Y3]]
	; CHECK-NEXT: store i32 [[RES0]], i32* [[GEP2_0]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[GEP1_0]] to <2 x i32>*
	; CHECK-NEXT: store i32 [[RES1]], i32* [[GEP2_1]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i32>, <2 x i32>* [[TMP3]], align 4
				; CHECK-NEXT: [[TMP5:%.*]] = add nsw <2 x i32> [[TMP4]], [[TMP2]]
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[GEP2_0]] to <2 x i32>*
				; CHECK-NEXT: store <2 x i32> [[TMP5]], <2 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: store i32 [[RES2]], i32* [[GEP2_2]], align 4			; CHECK-NEXT: store i32 [[RES2]], i32* [[GEP2_2]], align 4
	; CHECK-NEXT: store i32 [[RES3]], i32* [[GEP2_3]], align 4			; CHECK-NEXT: store i32 [[RES3]], i32* [[GEP2_3]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%gep1.0 = getelementptr i32, i32* %arr1, i32 0			%gep1.0 = getelementptr i32, i32* %arr1, i32 0
	%gep1.1 = getelementptr i32, i32* %arr1, i32 1			%gep1.1 = getelementptr i32, i32* %arr1, i32 1
	%gep1.2 = getelementptr i32, i32* %arr1, i32 2			%gep1.2 = getelementptr i32, i32* %arr1, i32 2

llvm/test/Transforms/SLPVectorizer/X86/odd_store.ll

;		;
ret i32 undef		ret i32 undef
}		}

; PR41892		; PR41892
define void @test_v4f32_v2f32_store(<4 x float> %f, float* %p){		define void @test_v4f32_v2f32_store(<4 x float> %f, float* %p){
; CHECK-LABEL: @test_v4f32_v2f32_store(		; CHECK-LABEL: @test_v4f32_v2f32_store(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x float> [[F:%.*]], i64 0		; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x float> [[F:%.*]], i64 0
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x float> [[F]], i64 1		; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x float> [[F]], i64 1
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[X0]], i32 0
; CHECK-NEXT: store float [[X0]], float* [[P]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[X1]], i32 1
; CHECK-NEXT: store float [[X1]], float* [[P1]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[P:%.*]] to <2 x float>*
		; CHECK-NEXT: store <2 x float> [[TMP2]], <2 x float>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%x0 = extractelement <4 x float> %f, i64 0		%x0 = extractelement <4 x float> %f, i64 0
%x1 = extractelement <4 x float> %f, i64 1		%x1 = extractelement <4 x float> %f, i64 1
%p1 = getelementptr inbounds float, float* %p, i64 1		%p1 = getelementptr inbounds float, float* %p, i64 1
store float %x0, float* %p, align 4		store float %x0, float* %p, align 4
store float %x1, float* %p1, align 4		store float %x1, float* %p1, align 4
ret void		ret void
;		;
ret void		ret void
}		}

define void @test_v4f32_v3f32_store(<4 x float> %f, float* %p){		define void @test_v4f32_v3f32_store(<4 x float> %f, float* %p){
; CHECK-LABEL: @test_v4f32_v3f32_store(		; CHECK-LABEL: @test_v4f32_v3f32_store(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x float> [[F:%.*]], i64 0		; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x float> [[F:%.*]], i64 0
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x float> [[F]], i64 1		; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x float> [[F]], i64 1
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x float> [[F]], i64 2		; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x float> [[F]], i64 2
; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1		; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 2
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[X0]], i32 0
; CHECK-NEXT: store float [[X0]], float* [[P]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[X1]], i32 1
; CHECK-NEXT: store float [[X1]], float* [[P1]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[P]] to <2 x float>*
		; CHECK-NEXT: store <2 x float> [[TMP2]], <2 x float>* [[TMP3]], align 4
; CHECK-NEXT: store float [[X2]], float* [[P2]], align 4		; CHECK-NEXT: store float [[X2]], float* [[P2]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%x0 = extractelement <4 x float> %f, i64 0		%x0 = extractelement <4 x float> %f, i64 0
%x1 = extractelement <4 x float> %f, i64 1		%x1 = extractelement <4 x float> %f, i64 1
%x2 = extractelement <4 x float> %f, i64 2		%x2 = extractelement <4 x float> %f, i64 2
%p1 = getelementptr inbounds float, float* %p, i64 1		%p1 = getelementptr inbounds float, float* %p, i64 1
%p2 = getelementptr inbounds float, float* %p, i64 2		%p2 = getelementptr inbounds float, float* %p, i64 2

llvm/test/Transforms/SLPVectorizer/X86/pr49933.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-- -mcpu=skylake-avx512 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-- -mcpu=skylake-avx512 | FileCheck %s
	; These code should be fully vectorized by D57059 patch

				xbolva00: Remove please.
	define void @foo(i8* noalias nocapture %t0, i8* noalias nocapture readonly %t1) {			define void @foo(i8* noalias nocapture %t0, i8* noalias nocapture readonly %t1) {
				xbolva00: Great! This patch also fixes https://github.com/llvm/llvm-project/issues/49277
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: [[T3:%.*]] = load i8, i8* [[T1:%.*]], align 1, !tbaa [[TBAA0:![0-9]+]]			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[T1:%.*]] to <8 x i8>*
	; CHECK-NEXT: [[T4:%.*]] = icmp ult i8 [[T3]], 64			; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1, !tbaa [[TBAA0:![0-9]+]]
	; CHECK-NEXT: [[T5:%.*]] = sub i8 0, [[T3]]			; CHECK-NEXT: [[TMP3:%.*]] = icmp ult <8 x i8> [[TMP2]], <i8 64, i8 64, i8 64, i8 64, i8 64, i8 64, i8 64, i8 64>
	; CHECK-NEXT: [[T6:%.*]] = select i1 [[T4]], i8 [[T3]], i8 [[T5]]			; CHECK-NEXT: [[TMP4:%.*]] = sub <8 x i8> zeroinitializer, [[TMP2]]
	; CHECK-NEXT: store i8 [[T6]], i8* [[T0:%.*]], align 1, !tbaa [[TBAA0]]			; CHECK-NEXT: [[TMP5:%.*]] = select <8 x i1> [[TMP3]], <8 x i8> [[TMP2]], <8 x i8> [[TMP4]]
	; CHECK-NEXT: [[T7:%.*]] = getelementptr inbounds i8, i8* [[T1]], i64 1			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i8* [[T0:%.*]] to <8 x i8>*
	; CHECK-NEXT: [[T8:%.*]] = load i8, i8* [[T7]], align 1, !tbaa [[TBAA0]]			; CHECK-NEXT: store <8 x i8> [[TMP5]], <8 x i8>* [[TMP6]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T9:%.*]] = icmp ult i8 [[T8]], 64
	; CHECK-NEXT: [[T10:%.*]] = sub i8 0, [[T8]]
	; CHECK-NEXT: [[T11:%.*]] = select i1 [[T9]], i8 [[T8]], i8 [[T10]]
	; CHECK-NEXT: [[T12:%.*]] = getelementptr inbounds i8, i8* [[T0]], i64 1
	; CHECK-NEXT: store i8 [[T11]], i8* [[T12]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T13:%.*]] = getelementptr inbounds i8, i8* [[T1]], i64 2
	; CHECK-NEXT: [[T14:%.*]] = load i8, i8* [[T13]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T15:%.*]] = icmp ult i8 [[T14]], 64
	; CHECK-NEXT: [[T16:%.*]] = sub i8 0, [[T14]]
	; CHECK-NEXT: [[T17:%.*]] = select i1 [[T15]], i8 [[T14]], i8 [[T16]]
	; CHECK-NEXT: [[T18:%.*]] = getelementptr inbounds i8, i8* [[T0]], i64 2
	; CHECK-NEXT: store i8 [[T17]], i8* [[T18]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T19:%.*]] = getelementptr inbounds i8, i8* [[T1]], i64 3
	; CHECK-NEXT: [[T20:%.*]] = load i8, i8* [[T19]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T21:%.*]] = icmp ult i8 [[T20]], 64
	; CHECK-NEXT: [[T22:%.*]] = sub i8 0, [[T20]]
	; CHECK-NEXT: [[T23:%.*]] = select i1 [[T21]], i8 [[T20]], i8 [[T22]]
	; CHECK-NEXT: [[T24:%.*]] = getelementptr inbounds i8, i8* [[T0]], i64 3
	; CHECK-NEXT: store i8 [[T23]], i8* [[T24]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T25:%.*]] = getelementptr inbounds i8, i8* [[T1]], i64 4
	; CHECK-NEXT: [[T26:%.*]] = load i8, i8* [[T25]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T27:%.*]] = icmp ult i8 [[T26]], 64
	; CHECK-NEXT: [[T28:%.*]] = sub i8 0, [[T26]]
	; CHECK-NEXT: [[T29:%.*]] = select i1 [[T27]], i8 [[T26]], i8 [[T28]]
	; CHECK-NEXT: [[T30:%.*]] = getelementptr inbounds i8, i8* [[T0]], i64 4
	; CHECK-NEXT: store i8 [[T29]], i8* [[T30]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T31:%.*]] = getelementptr inbounds i8, i8* [[T1]], i64 5
	; CHECK-NEXT: [[T32:%.*]] = load i8, i8* [[T31]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T33:%.*]] = icmp ult i8 [[T32]], 64
	; CHECK-NEXT: [[T34:%.*]] = sub i8 0, [[T32]]
	; CHECK-NEXT: [[T35:%.*]] = select i1 [[T33]], i8 [[T32]], i8 [[T34]]
	; CHECK-NEXT: [[T36:%.*]] = getelementptr inbounds i8, i8* [[T0]], i64 5
	; CHECK-NEXT: store i8 [[T35]], i8* [[T36]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T37:%.*]] = getelementptr inbounds i8, i8* [[T1]], i64 6
	; CHECK-NEXT: [[T38:%.*]] = load i8, i8* [[T37]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T39:%.*]] = icmp ult i8 [[T38]], 64
	; CHECK-NEXT: [[T40:%.*]] = sub i8 0, [[T38]]
	; CHECK-NEXT: [[T41:%.*]] = select i1 [[T39]], i8 [[T38]], i8 [[T40]]
	; CHECK-NEXT: [[T42:%.*]] = getelementptr inbounds i8, i8* [[T0]], i64 6
	; CHECK-NEXT: store i8 [[T41]], i8* [[T42]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T43:%.*]] = getelementptr inbounds i8, i8* [[T1]], i64 7
	; CHECK-NEXT: [[T44:%.*]] = load i8, i8* [[T43]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[T45:%.*]] = icmp ult i8 [[T44]], 64
	; CHECK-NEXT: [[T46:%.*]] = sub i8 0, [[T44]]
	; CHECK-NEXT: [[T47:%.*]] = select i1 [[T45]], i8 [[T44]], i8 [[T46]]
	; CHECK-NEXT: [[T48:%.*]] = getelementptr inbounds i8, i8* [[T0]], i64 7
	; CHECK-NEXT: store i8 [[T47]], i8* [[T48]], align 1, !tbaa [[TBAA0]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%t3 = load i8, i8* %t1, align 1, !tbaa !3			%t3 = load i8, i8* %t1, align 1, !tbaa !3
	%t4 = icmp ult i8 %t3, 64			%t4 = icmp ult i8 %t3, 64
	%t5 = sub i8 0, %t3			%t5 = sub i8 0, %t3
	%t6 = select i1 %t4, i8 %t3, i8 %t5			%t6 = select i1 %t4, i8 %t3, i8 %t5
	store i8 %t6, i8* %t0, align 1, !tbaa !3			store i8 %t6, i8* %t0, align 1, !tbaa !3
	%t7 = getelementptr inbounds i8, i8* %t1, i64 1			%t7 = getelementptr inbounds i8, i8* %t1, i64 1
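For orientation, the scalar pattern checked on the left-hand side of pr49933.ll (eight consecutive `load` / `icmp ult 64` / `sub 0` / `select` / `store` groups on `i8`) corresponds roughly to the C++ loop below. This is only a sketch reconstructed from the CHECK lines; the name `foo_scalar` and the use of `uint8_t` are assumptions for illustration and do not come from the test's original source. With this patch, the right-hand side shows the eight stores merged into a single `<8 x i8>` store.

```cpp
#include <cstdint>

// Hypothetical scalar equivalent of @foo in pr49933.ll, reconstructed from
// the CHECK lines: each of the 8 bytes is kept if it is below 64 (unsigned
// compare), otherwise replaced by its two's-complement negation (0 - x).
void foo_scalar(std::uint8_t *t0, const std::uint8_t *t1) {
  for (int i = 0; i < 8; ++i) {
    std::uint8_t v = t1[i];
    std::uint8_t neg = static_cast<std::uint8_t>(0u - v); // sub i8 0, v
    t0[i] = (v < 64) ? v : neg;                           // select on icmp ult
  }
}
```

Eight `i8` lanes occupy 64 bits, narrower than a full 128-bit vector register, which is consistent with this being the kind of partial store sequence the patch summary describes.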

llvm/test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -slp-vectorizer -pass-remarks-output=%t < %s | FileCheck %s		; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -slp-vectorizer -pass-remarks-output=%t < %s | FileCheck %s
; RUN: FileCheck --input-file=%t --check-prefix=YAML %s		; RUN: FileCheck --input-file=%t --check-prefix=YAML %s

define i32 @foo(i32* nocapture readonly %diff) #0 {		define i32 @foo(i32* nocapture readonly %diff) #0 {
; CHECK-LABEL: @foo(		; CHECK-LABEL: @foo(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[M2:%.*]] = alloca [8 x [8 x i32]], align 16		; CHECK-NEXT: [[M2:%.*]] = alloca [8 x [8 x i32]], align 16
; CHECK-NEXT: [[TMP0:%.]] = bitcast [8 x [8 x i32]] [[M2]] to i8*		; CHECK-NEXT: [[TMP0:%.]] = bitcast [8 x [8 x i32]] [[M2]] to i8*
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]		; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[A_088:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[ADD24:%.*]], [[FOR_BODY]] ]		; CHECK-NEXT: [[A_088:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[ADD24:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[INDVARS_IV]], 3		; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[INDVARS_IV]], 3
; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[DIFF:%.*]], i64 [[TMP1]]		; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[DIFF:%.*]], i64 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[ARRAYIDX]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = or i64 [[TMP1]], 4
; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[TMP1]], 4		; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP2]]
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP3]]
; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[ARRAYIDX2]], align 4
; CHECK-NEXT: [[ADD3:%.*]] = add nsw i32 [[TMP4]], [[TMP2]]
; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 0		; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 0
; CHECK-NEXT: store i32 [[ADD3]], i32* [[ARRAYIDX6]], align 16		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[ARRAYIDX]] to <2 x i32>*
; CHECK-NEXT: [[ADD10:%.*]] = add nsw i32 [[ADD3]], [[A_088]]		; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i32>, <2 x i32>* [[TMP3]], align 4
; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[TMP1]], 1		; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[ARRAYIDX2]] to <2 x i32>*
; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP5]]		; CHECK-NEXT: [[TMP6:%.*]] = load <2 x i32>, <2 x i32>* [[TMP5]], align 4
; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[ARRAYIDX13]], align 4		; CHECK-NEXT: [[TMP7:%.*]] = add nsw <2 x i32> [[TMP6]], [[TMP4]]
; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[TMP1]], 5		; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i32> [[TMP7]], i32 0
; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP7]]		; CHECK-NEXT: [[ADD10:%.*]] = add nsw i32 [[TMP8]], [[A_088]]
; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[ARRAYIDX16]], align 4		; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[ARRAYIDX6]] to <2 x i32>*
; CHECK-NEXT: [[ADD17:%.*]] = add nsw i32 [[TMP8]], [[TMP6]]		; CHECK-NEXT: store <2 x i32> [[TMP7]], <2 x i32>* [[TMP9]], align 16
; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 1		; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i32> [[TMP7]], i32 1
; CHECK-NEXT: store i32 [[ADD17]], i32* [[ARRAYIDX20]], align 4		; CHECK-NEXT: [[ADD24]] = add nsw i32 [[ADD10]], [[TMP10]]
; CHECK-NEXT: [[ADD24]] = add nsw i32 [[ADD10]], [[ADD17]]
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1		; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
; CHECK: for.end:		; CHECK: for.end:
; CHECK-NEXT: [[ARRAYDECAY:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 0		; CHECK-NEXT: [[ARRAYDECAY:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 0
; CHECK-NEXT: ret i32 [[ADD24]]		; CHECK-NEXT: ret i32 [[ADD24]]
;		;
entry:		entry:
for.body: ; preds = %for.body, %entry
%arrayidx16 = getelementptr inbounds i32, i32* %diff, i64 %7		%arrayidx16 = getelementptr inbounds i32, i32* %diff, i64 %7
%8 = load i32, i32* %arrayidx16, align 4		%8 = load i32, i32* %arrayidx16, align 4
%add17 = add nsw i32 %8, %6		%add17 = add nsw i32 %8, %6
%arrayidx20 = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* %m2, i64 0, i64 %indvars.iv, i64 1		%arrayidx20 = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* %m2, i64 0, i64 %indvars.iv, i64 1
store i32 %add17, i32* %arrayidx20, align 4		store i32 %add17, i32* %arrayidx20, align 4
%add24 = add nsw i32 %add10, %add17		%add24 = add nsw i32 %add10, %add17

; YAML: Pass: slp-vectorizer		; YAML: Pass: slp-vectorizer
; YAML-NEXT: Name: NotPossible		; YAML-NEXT: Name: StoresVectorized
; YAML-NEXT: Function: foo		; YAML-NEXT: Function: foo
; YAML-NEXT: Args:		; YAML-NEXT: Args:
; YAML-NEXT: - String: 'Cannot SLP vectorize list: vectorization was impossible'		; YAML-NEXT: - String: 'Stores SLP vectorized with cost '
; YAML-NEXT: - String: ' with available vectorization factors'		; YAML-NEXT: - Cost: '-1'
		; YAML-NEXT: - String: ' and with tree size '
		; YAML-NEXT: - TreeSize: '4'

%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 8		%exitcond = icmp eq i64 %indvars.iv.next, 8
br i1 %exitcond, label %for.end, label %for.body		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
%arraydecay = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* %m2, i64 0, i64 0		%arraydecay = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* %m2, i64 0, i64 0
ret i32 %add24		ret i32 %add24
}		}

llvm/test/Transforms/SLPVectorizer/X86/reorder_phi.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=corei7-avx | FileCheck %s

	%struct.complex = type { float, float }			%struct.complex = type { float, float }

	define void @foo (%struct.complex* %A, %struct.complex* %B, %struct.complex* %Result) {			define void @foo (%struct.complex* %A, %struct.complex* %B, %struct.complex* %Result) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 256, 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 256, 0
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP20:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP20:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[TMP19:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP19:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[TMP18:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX:%.*]], %struct.complex* [[A:%.*]], i64 [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX:%.*]], %struct.complex* [[A:%.*]], i64 [[TMP1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B:%.*]], i64 [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4			; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[A]], i64 [[TMP1]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B]], i64 [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B:%.*]], i64 [[TMP1]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[TMP3]] to <2 x float>*
	; CHECK-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4			; CHECK-NEXT: [[TMP9:%.*]] = load <2 x float>, <2 x float>* [[TMP8]], align 4
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B]], i64 [[TMP1]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x float> [[TMP10]], float [[TMP5]], i32 1
	; CHECK-NEXT: [[TMP12:%.*]] = fmul float [[TMP5]], [[TMP9]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul <2 x float> [[TMP9]], [[TMP11]]
	; CHECK-NEXT: [[TMP13:%.*]] = fmul float [[TMP7]], [[TMP11]]			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP14:%.*]] = fsub float [[TMP12]], [[TMP13]]			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> [[TMP13]], float [[TMP7]], i32 1
	; CHECK-NEXT: [[TMP15:%.*]] = fmul float [[TMP7]], [[TMP9]]			; CHECK-NEXT: [[TMP15:%.*]] = fmul <2 x float> [[TMP9]], [[TMP14]]
	; CHECK-NEXT: [[TMP16:%.*]] = fmul float [[TMP5]], [[TMP11]]			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP15]], <2 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP17:%.*]] = fadd float [[TMP15]], [[TMP16]]			; CHECK-NEXT: [[TMP16:%.*]] = fsub <2 x float> [[TMP12]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP18]] = fadd float [[TMP3]], [[TMP14]]			; CHECK-NEXT: [[TMP17:%.*]] = fadd <2 x float> [[TMP12]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP19]] = fadd float [[TMP2]], [[TMP17]]			; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> [[TMP17]], <2 x i32> <i32 0, i32 3>
				; CHECK-NEXT: [[TMP19]] = fadd <2 x float> [[TMP2]], [[TMP18]]
	; CHECK-NEXT: [[TMP20]] = add nuw nsw i64 [[TMP1]], 1			; CHECK-NEXT: [[TMP20]] = add nuw nsw i64 [[TMP1]], 1
	; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[TMP20]], [[TMP0]]			; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[TMP20]], [[TMP0]]
	; CHECK-NEXT: br i1 [[TMP21]], label [[EXIT:%.*]], label [[LOOP]]			; CHECK-NEXT: br i1 [[TMP21]], label [[EXIT:%.*]], label [[LOOP]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT:%.*]], i32 0, i32 0			; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT:%.*]], i32 0, i32 0
	; CHECK-NEXT: store float [[TMP18]], float* [[TMP22]], align 4			; CHECK-NEXT: [[TMP23:%.*]] = bitcast float* [[TMP22]] to <2 x float>*
	; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT]], i32 0, i32 1			; CHECK-NEXT: store <2 x float> [[TMP19]], <2 x float>* [[TMP23]], align 4
	; CHECK-NEXT: store float [[TMP19]], float* [[TMP23]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = add i64 256, 0			%0 = add i64 256, 0
	br label %loop			br label %loop

	loop:			loop:
	%1 = phi i64 [ 0, %entry ], [ %20, %loop ]			%1 = phi i64 [ 0, %entry ], [ %20, %loop ]

llvm/test/Transforms/SLPVectorizer/X86/saxpy.ll

	}			}

	; Make sure we don't crash on this one.			; Make sure we don't crash on this one.
	define void @SAXPY_crash(i32* noalias nocapture %x, i32* noalias nocapture %y, i64 %i) {			define void @SAXPY_crash(i32* noalias nocapture %x, i32* noalias nocapture %y, i64 %i) {
	; CHECK-LABEL: @SAXPY_crash(			; CHECK-LABEL: @SAXPY_crash(
	; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[I:%.*]], 1			; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[I:%.*]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[Y:%.*]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[Y:%.*]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <2 x i32>*
	; CHECK-NEXT: [[TMP5:%.*]] = add nsw i32 undef, [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[TMP4]], align 4
	; CHECK-NEXT: store i32 [[TMP5]], i32* [[TMP2]], align 4			; CHECK-NEXT: [[TMP6:%.*]] = add nsw <2 x i32> undef, [[TMP5]]
	; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[I]], 2			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP2]] to <2 x i32>*
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 [[TMP6]]			; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32>* [[TMP7]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[Y]], i64 [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP8]], align 4
	; CHECK-NEXT: [[TMP10:%.*]] = add nsw i32 undef, [[TMP9]]
	; CHECK-NEXT: store i32 [[TMP10]], i32* [[TMP7]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%1 = add i64 %i, 1			%1 = add i64 %i, 1
	%2 = getelementptr inbounds i32, i32* %x, i64 %1			%2 = getelementptr inbounds i32, i32* %x, i64 %1
	%3 = getelementptr inbounds i32, i32* %y, i64 %1			%3 = getelementptr inbounds i32, i32* %y, i64 %1
	%4 = load i32, i32* %3, align 4			%4 = load i32, i32* %3, align 4
	%5 = add nsw i32 undef, %4			%5 = add nsw i32 undef, %4
	store i32 %5, i32* %2, align 4			store i32 %5, i32* %2, align 4
	%6 = add i64 %i, 2			%6 = add i64 %i, 2
	%7 = getelementptr inbounds i32, i32* %x, i64 %6			%7 = getelementptr inbounds i32, i32* %x, i64 %6
	%8 = getelementptr inbounds i32, i32* %y, i64 %6			%8 = getelementptr inbounds i32, i32* %y, i64 %6
	%9 = load i32, i32* %8, align 4			%9 = load i32, i32* %8, align 4
	%10 = add nsw i32 undef, %9			%10 = add nsw i32 undef, %9
	store i32 %10, i32* %7, align 4			store i32 %10, i32* %7, align 4
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/schedule-bundle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -slp-vectorizer -slp-vectorizer -mcpu=bdver1 < %s | FileCheck %s			; RUN: opt -S -slp-vectorizer -slp-vectorizer -mcpu=bdver1 < %s | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@a = common local_unnamed_addr global [1 x i32] zeroinitializer, align 4			@a = common local_unnamed_addr global [1 x i32] zeroinitializer, align 4
	@b = common local_unnamed_addr global [1 x i32] zeroinitializer, align 4			@b = common local_unnamed_addr global [1 x i32] zeroinitializer, align 4

	define i32 @slp_schedule_bundle() local_unnamed_addr #0 {			define i32 @slp_schedule_bundle() local_unnamed_addr #0 {
	; CHECK-LABEL: @slp_schedule_bundle(			; CHECK-LABEL: @slp_schedule_bundle(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([1 x i32]* @b to <4 x i32>*), align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([1 x i32]* @b to <4 x i32>*), align 4
	; CHECK-NEXT: [[TMP1:%.*]] = lshr <4 x i32> [[TMP0]], <i32 31, i32 31, i32 31, i32 31>			; CHECK-NEXT: [[TMP1:%.*]] = lshr <4 x i32> [[TMP0]], <i32 31, i32 31, i32 31, i32 31>
	; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[TMP1]], <i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[TMP1]], <i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([1 x i32]* @a to <4 x i32>*), align 4			; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([1 x i32]* @a to <4 x i32>*), align 4
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr ([1 x i32], [1 x i32]* @b, i64 4, i64 0), align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr ([1 x i32], [1 x i32]* @b, i64 4, i64 0) to <2 x i32>*), align 4
	; CHECK-NEXT: [[DOTLOBIT_4:%.*]] = lshr i32 [[TMP3]], 31			; CHECK-NEXT: [[TMP4:%.*]] = lshr <2 x i32> [[TMP3]], <i32 31, i32 31>
	; CHECK-NEXT: [[DOTLOBIT_NOT_4:%.*]] = xor i32 [[DOTLOBIT_4]], 1			; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[TMP4]], <i32 1, i32 1>
	; CHECK-NEXT: store i32 [[DOTLOBIT_NOT_4]], i32* getelementptr ([1 x i32], [1 x i32]* @a, i64 4, i64 0), align 4			; CHECK-NEXT: store <2 x i32> [[TMP5]], <2 x i32>* bitcast (i32* getelementptr ([1 x i32], [1 x i32]* @a, i64 4, i64 0) to <2 x i32>*), align 4
	; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr ([1 x i32], [1 x i32]* @b, i64 5, i64 0), align 4
	; CHECK-NEXT: [[DOTLOBIT_5:%.*]] = lshr i32 [[TMP4]], 31
	; CHECK-NEXT: [[DOTLOBIT_NOT_5:%.*]] = xor i32 [[DOTLOBIT_5]], 1
	; CHECK-NEXT: store i32 [[DOTLOBIT_NOT_5]], i32* getelementptr ([1 x i32], [1 x i32]* @a, i64 5, i64 0), align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%0 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @b, i64 0, i64 0), align 4			%0 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @b, i64 0, i64 0), align 4
	%.lobit = lshr i32 %0, 31			%.lobit = lshr i32 %0, 31
	%.lobit.not = xor i32 %.lobit, 1			%.lobit.not = xor i32 %.lobit, 1
	store i32 %.lobit.not, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @a, i64 0, i64 0), align 4			store i32 %.lobit.not, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @a, i64 0, i64 0), align 4
	%1 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @b, i64 1, i64 0), align 4			%1 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @b, i64 1, i64 0), align 4

llvm/test/Transforms/SLPVectorizer/X86/simple-loop.ll

._crit_edge: ; preds = %.lr.ph, %0
ret i32 undef		ret i32 undef
}		}

define i32 @unrollable(i32* %in, i32* %out, i64 %n) nounwind ssp uwtable {		define i32 @unrollable(i32* %in, i32* %out, i64 %n) nounwind ssp uwtable {
; CHECK-LABEL: @unrollable(		; CHECK-LABEL: @unrollable(
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i64 [[N:%.*]], 0		; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i64 [[N:%.*]], 0
; CHECK-NEXT: br i1 [[TMP1]], label [[DOT_CRIT_EDGE:%.*]], label [[DOTLR_PH:%.*]]		; CHECK-NEXT: br i1 [[TMP1]], label [[DOT_CRIT_EDGE:%.*]], label [[DOTLR_PH:%.*]]
; CHECK: .lr.ph:		; CHECK: .lr.ph:
; CHECK-NEXT: [[I_019:%.*]] = phi i64 [ [[TMP26:%.*]], [[DOTLR_PH]] ], [ 0, [[TMP0:%.*]] ]		; CHECK-NEXT: [[I_019:%.*]] = phi i64 [ [[TMP18:%.*]], [[DOTLR_PH]] ], [ 0, [[TMP0:%.*]] ]
; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[I_019]], 2		; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[I_019]], 2
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[TMP2]], 2
; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[TMP2]], 1		; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[IN]], i64 [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[IN]], i64 [[TMP5]]		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 [[TMP2]]
; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4		; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP3]] to <2 x i32>*
; CHECK-NEXT: [[TMP8:%.*]] = or i64 [[TMP2]], 2		; CHECK-NEXT: [[TMP8:%.*]] = load <2 x i32>, <2 x i32>* [[TMP7]], align 4
; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[IN]], i64 [[TMP8]]		; CHECK-NEXT: [[TMP9:%.*]] = mul <2 x i32> [[TMP8]], <i32 7, i32 7>
; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP9]], align 4		; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP9]], <i32 7, i32 14>
; CHECK-NEXT: [[TMP11:%.*]] = or i64 [[TMP2]], 3		; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP6]] to <2 x i32>*
; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[IN]], i64 [[TMP11]]		; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 [[TMP4]]
; CHECK-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4		; CHECK-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP5]] to <2 x i32>*
; CHECK-NEXT: [[TMP14:%.*]] = mul i32 [[TMP4]], 7		; CHECK-NEXT: [[TMP14:%.*]] = load <2 x i32>, <2 x i32>* [[TMP13]], align 4
; CHECK-NEXT: [[TMP15:%.*]] = add i32 [[TMP14]], 7		; CHECK-NEXT: [[TMP15:%.*]] = mul <2 x i32> [[TMP14]], <i32 7, i32 7>
; CHECK-NEXT: [[TMP16:%.*]] = mul i32 [[TMP7]], 7		; CHECK-NEXT: [[TMP16:%.*]] = add <2 x i32> [[TMP15]], <i32 21, i32 28>
; CHECK-NEXT: [[TMP17:%.*]] = add i32 [[TMP16]], 14		; CHECK-NEXT: store <2 x i32> [[TMP10]], <2 x i32>* [[TMP11]], align 4
; CHECK-NEXT: [[TMP18:%.*]] = mul i32 [[TMP10]], 7
; CHECK-NEXT: [[TMP19:%.*]] = add i32 [[TMP18]], 21
; CHECK-NEXT: [[TMP20:%.*]] = mul i32 [[TMP13]], 7
; CHECK-NEXT: [[TMP21:%.*]] = add i32 [[TMP20]], 28
; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 [[TMP2]]
; CHECK-NEXT: store i32 [[TMP15]], i32* [[TMP22]], align 4
; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 [[TMP5]]
; CHECK-NEXT: store i32 [[TMP17]], i32* [[TMP23]], align 4
; CHECK-NEXT: [[BARRIER:%.*]] = call i32 @goo(i32 0)		; CHECK-NEXT: [[BARRIER:%.*]] = call i32 @goo(i32 0)
; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 [[TMP8]]		; CHECK-NEXT: [[TMP17:%.*]] = bitcast i32* [[TMP12]] to <2 x i32>*
; CHECK-NEXT: store i32 [[TMP19]], i32* [[TMP24]], align 4		; CHECK-NEXT: store <2 x i32> [[TMP16]], <2 x i32>* [[TMP17]], align 4
; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 [[TMP11]]		; CHECK-NEXT: [[TMP18]] = add i64 [[I_019]], 1
; CHECK-NEXT: store i32 [[TMP21]], i32* [[TMP25]], align 4		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[TMP18]], [[N]]
; CHECK-NEXT: [[TMP26]] = add i64 [[I_019]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[TMP26]], [[N]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[DOT_CRIT_EDGE]], label [[DOTLR_PH]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[DOT_CRIT_EDGE]], label [[DOTLR_PH]]
; CHECK: ._crit_edge:		; CHECK: ._crit_edge:
; CHECK-NEXT: ret i32 undef		; CHECK-NEXT: ret i32 undef
;		;
%1 = icmp eq i64 %n, 0		%1 = icmp eq i64 %n, 0
br i1 %1, label %._crit_edge, label %.lr.ph		br i1 %1, label %._crit_edge, label %.lr.ph

.lr.ph: ; preds = %0, %.lr.ph		.lr.ph: ; preds = %0, %.lr.ph

llvm/test/Transforms/SLPVectorizer/X86/sitofp-inseltpoison.ll

ret void		ret void
}		}

;		;
; SITOFP to vXf32		; SITOFP to vXf32
;		;

define void @sitofp_2i64_2f32() #0 {		define void @sitofp_2i64_2f32() #0 {
; CHECK-LABEL: @sitofp_2i64_2f32(		; SSE-LABEL: @sitofp_2i64_2f32(
; CHECK-NEXT: [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64		; SSE-NEXT: [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
; CHECK-NEXT: [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8		; SSE-NEXT: [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
; CHECK-NEXT: [[CVT0:%.*]] = sitofp i64 [[LD0]] to float		; SSE-NEXT: [[CVT0:%.*]] = sitofp i64 [[LD0]] to float
; CHECK-NEXT: [[CVT1:%.*]] = sitofp i64 [[LD1]] to float		; SSE-NEXT: [[CVT1:%.*]] = sitofp i64 [[LD1]] to float
; CHECK-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64		; SSE-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
; CHECK-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4		; SSE-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX256NODQ-LABEL: @sitofp_2i64_2f32(
		; AVX256NODQ-NEXT: [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
		; AVX256NODQ-NEXT: [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
		; AVX256NODQ-NEXT: [[CVT0:%.*]] = sitofp i64 [[LD0]] to float
		; AVX256NODQ-NEXT: [[CVT1:%.*]] = sitofp i64 [[LD1]] to float
		; AVX256NODQ-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
		; AVX256NODQ-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
		; AVX256NODQ-NEXT: ret void
		;
		; AVX512-LABEL: @sitofp_2i64_2f32(
		; AVX512-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64
		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <2 x i64> [[TMP1]] to <2 x float>
		; AVX512-NEXT: store <2 x float> [[TMP2]], <2 x float>* bitcast ([16 x float]* @dst32 to <2 x float>*), align 64
		; AVX512-NEXT: ret void
		;
		; AVX256DQ-LABEL: @sitofp_2i64_2f32(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64
		; AVX256DQ-NEXT: [[TMP2:%.*]] = sitofp <2 x i64> [[TMP1]] to <2 x float>
		; AVX256DQ-NEXT: store <2 x float> [[TMP2]], <2 x float>* bitcast ([16 x float]* @dst32 to <2 x float>*), align 64
		; AVX256DQ-NEXT: ret void
;		;
%ld0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64		%ld0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
%ld1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8		%ld1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
%cvt0 = sitofp i64 %ld0 to float		%cvt0 = sitofp i64 %ld0 to float
%cvt1 = sitofp i64 %ld1 to float		%cvt1 = sitofp i64 %ld1 to float
store float %cvt0, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64		store float %cvt0, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
store float %cvt1, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4		store float %cvt1, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
ret void		ret void

llvm/test/Transforms/SLPVectorizer/X86/sitofp.ll

ret void		ret void
}		}

;		;
; SITOFP to vXf32		; SITOFP to vXf32
;		;

define void @sitofp_2i64_2f32() #0 {		define void @sitofp_2i64_2f32() #0 {
; CHECK-LABEL: @sitofp_2i64_2f32(		; SSE-LABEL: @sitofp_2i64_2f32(
; CHECK-NEXT: [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64		; SSE-NEXT: [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
; CHECK-NEXT: [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8		; SSE-NEXT: [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
; CHECK-NEXT: [[CVT0:%.*]] = sitofp i64 [[LD0]] to float		; SSE-NEXT: [[CVT0:%.*]] = sitofp i64 [[LD0]] to float
; CHECK-NEXT: [[CVT1:%.*]] = sitofp i64 [[LD1]] to float		; SSE-NEXT: [[CVT1:%.*]] = sitofp i64 [[LD1]] to float
; CHECK-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64		; SSE-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
; CHECK-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4		; SSE-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX256NODQ-LABEL: @sitofp_2i64_2f32(
		; AVX256NODQ-NEXT: [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
		; AVX256NODQ-NEXT: [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
		; AVX256NODQ-NEXT: [[CVT0:%.*]] = sitofp i64 [[LD0]] to float
		; AVX256NODQ-NEXT: [[CVT1:%.*]] = sitofp i64 [[LD1]] to float
		; AVX256NODQ-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
		; AVX256NODQ-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
		; AVX256NODQ-NEXT: ret void
		;
		; AVX512-LABEL: @sitofp_2i64_2f32(
		; AVX512-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64
		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <2 x i64> [[TMP1]] to <2 x float>
		; AVX512-NEXT: store <2 x float> [[TMP2]], <2 x float>* bitcast ([16 x float]* @dst32 to <2 x float>*), align 64
		; AVX512-NEXT: ret void
		;
		; AVX256DQ-LABEL: @sitofp_2i64_2f32(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64
		; AVX256DQ-NEXT: [[TMP2:%.*]] = sitofp <2 x i64> [[TMP1]] to <2 x float>
		; AVX256DQ-NEXT: store <2 x float> [[TMP2]], <2 x float>* bitcast ([16 x float]* @dst32 to <2 x float>*), align 64
		; AVX256DQ-NEXT: ret void
;		;
%ld0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64		%ld0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
%ld1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8		%ld1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
%cvt0 = sitofp i64 %ld0 to float		%cvt0 = sitofp i64 %ld0 to float
%cvt1 = sitofp i64 %ld1 to float		%cvt1 = sitofp i64 %ld1 to float
store float %cvt0, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64		store float %cvt0, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
store float %cvt1, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4		store float %cvt1, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
ret void		ret void

llvm/test/Transforms/SLPVectorizer/X86/uitofp.ll

ret void		ret void
}		}

;		;
; UITOFP to vXf32		; UITOFP to vXf32
;		;

define void @uitofp_2i64_2f32() #0 {		define void @uitofp_2i64_2f32() #0 {
; CHECK-LABEL: @uitofp_2i64_2f32(		; SSE-LABEL: @uitofp_2i64_2f32(
; CHECK-NEXT: [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64		; SSE-NEXT: [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
; CHECK-NEXT: [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8		; SSE-NEXT: [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
; CHECK-NEXT: [[CVT0:%.*]] = uitofp i64 [[LD0]] to float		; SSE-NEXT: [[CVT0:%.*]] = uitofp i64 [[LD0]] to float
; CHECK-NEXT: [[CVT1:%.*]] = uitofp i64 [[LD1]] to float		; SSE-NEXT: [[CVT1:%.*]] = uitofp i64 [[LD1]] to float
; CHECK-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64		; SSE-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
; CHECK-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4		; SSE-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
; CHECK-NEXT: ret void		; SSE-NEXT: ret void
		;
		; AVX1-LABEL: @uitofp_2i64_2f32(
		; AVX1-NEXT: [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
		; AVX1-NEXT: [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
		; AVX1-NEXT: [[CVT0:%.*]] = uitofp i64 [[LD0]] to float
		; AVX1-NEXT: [[CVT1:%.*]] = uitofp i64 [[LD1]] to float
		; AVX1-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
		; AVX1-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
		; AVX1-NEXT: ret void
		;
		; AVX2-LABEL: @uitofp_2i64_2f32(
		; AVX2-NEXT: [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
		; AVX2-NEXT: [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
		; AVX2-NEXT: [[CVT0:%.*]] = uitofp i64 [[LD0]] to float
		; AVX2-NEXT: [[CVT1:%.*]] = uitofp i64 [[LD1]] to float
		; AVX2-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
		; AVX2-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
		; AVX2-NEXT: ret void
		;
		; AVX512-LABEL: @uitofp_2i64_2f32(
		; AVX512-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64
		; AVX512-NEXT: [[TMP2:%.*]] = uitofp <2 x i64> [[TMP1]] to <2 x float>
		; AVX512-NEXT: store <2 x float> [[TMP2]], <2 x float>* bitcast ([16 x float]* @dst32 to <2 x float>*), align 64
		; AVX512-NEXT: ret void
		;
		; AVX256DQ-LABEL: @uitofp_2i64_2f32(
		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64
		; AVX256DQ-NEXT: [[TMP2:%.*]] = uitofp <2 x i64> [[TMP1]] to <2 x float>
		; AVX256DQ-NEXT: store <2 x float> [[TMP2]], <2 x float>* bitcast ([16 x float]* @dst32 to <2 x float>*), align 64
		; AVX256DQ-NEXT: ret void
;		;
%ld0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64		%ld0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
%ld1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8		%ld1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
%cvt0 = uitofp i64 %ld0 to float		%cvt0 = uitofp i64 %ld0 to float
%cvt1 = uitofp i64 %ld1 to float		%cvt1 = uitofp i64 %ld1 to float
store float %cvt0, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64		store float %cvt0, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
store float %cvt1, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4		store float %cvt1, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
ret void		ret void

llvm/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll

Show All 34 Lines

define void @add1(i32* noalias %dst, i32* noalias %src) {		define void @add1(i32* noalias %dst, i32* noalias %src) {
; CHECK-LABEL: @add1(		; CHECK-LABEL: @add1(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
; CHECK-NEXT: store i32 [[TMP0]], i32* [[DST]], align 4		; CHECK-NEXT: store i32 [[TMP0]], i32* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[ADD3:%.*]] = add nsw i32 [[TMP1]], 1
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
; CHECK-NEXT: store i32 [[ADD3]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[ADD6:%.*]] = add nsw i32 [[TMP2]], 2
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
; CHECK-NEXT: store i32 [[ADD6]], i32* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[INCDEC_PTR]] to <2 x i32>*
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR5]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 4
; CHECK-NEXT: [[ADD9:%.*]] = add nsw i32 [[TMP3]], 3		; CHECK-NEXT: [[TMP3:%.*]] = add nsw <2 x i32> [[TMP2]], <i32 1, i32 2>
		; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[INCDEC_PTR1]] to <2 x i32>*
		; CHECK-NEXT: store <2 x i32> [[TMP3]], <2 x i32>* [[TMP4]], align 4
		; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* [[INCDEC_PTR5]], align 4
		; CHECK-NEXT: [[ADD9:%.*]] = add nsw i32 [[TMP5]], 3
; CHECK-NEXT: store i32 [[ADD9]], i32* [[INCDEC_PTR7]], align 4		; CHECK-NEXT: store i32 [[ADD9]], i32* [[INCDEC_PTR7]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
store i32 %0, i32* %dst, align 4		store i32 %0, i32* %dst, align 4
Show All 20 Lines
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1		; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4		; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
; CHECK-NEXT: store i32 [[TMP1]], i32* [[INCDEC_PTR1]], align 4		; CHECK-NEXT: store i32 [[TMP1]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[INCDEC_PTR2]] to <2 x i32>*
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4
; CHECK-NEXT: [[SUB5:%.*]] = add nsw i32 [[TMP2]], -2		; CHECK-NEXT: [[TMP4:%.*]] = add nsw <2 x i32> [[TMP3]], <i32 -2, i32 -3>
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[INCDEC_PTR3]] to <2 x i32>*
; CHECK-NEXT: store i32 [[SUB5]], i32* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: store <2 x i32> [[TMP4]], <2 x i32>* [[TMP5]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4
; CHECK-NEXT: [[SUB8:%.*]] = add nsw i32 [[TMP3]], -3
; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%sub = add nsw i32 %0, -1		%sub = add nsw i32 %0, -1
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
store i32 %sub, i32* %dst, align 4		store i32 %sub, i32* %dst, align 4
▲ Show 20 Lines • Show All 83 Lines • ▼ Show 20 Lines
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1		; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4		; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
; CHECK-NEXT: store i32 [[TMP1]], i32* [[INCDEC_PTR1]], align 4		; CHECK-NEXT: store i32 [[TMP1]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[INCDEC_PTR2]] to <2 x i32>*
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4
; CHECK-NEXT: [[SUB5:%.*]] = add nsw i32 [[TMP2]], -2		; CHECK-NEXT: [[TMP4:%.*]] = add nsw <2 x i32> [[TMP3]], <i32 -2, i32 -3>
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[TMP5:%.*]] = sub nsw <2 x i32> [[TMP3]], <i32 -2, i32 -3>
; CHECK-NEXT: store i32 [[SUB5]], i32* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[INCDEC_PTR3]] to <2 x i32>*
; CHECK-NEXT: [[SUB8:%.*]] = sub nsw i32 [[TMP3]], -3		; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32>* [[TMP7]], align 4
; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%sub = add nsw i32 %0, -1		%sub = add nsw i32 %0, -1
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
store i32 %sub, i32* %dst, align 4		store i32 %sub, i32* %dst, align 4
Show All 10 Lines	entry:
%sub8 = sub nsw i32 %3, -3		%sub8 = sub nsw i32 %3, -3
store i32 %sub8, i32* %incdec.ptr6, align 4		store i32 %sub8, i32* %incdec.ptr6, align 4
ret void		ret void
}		}

define void @addsub1(i32* noalias %dst, i32* noalias %src) {		define void @addsub1(i32* noalias %dst, i32* noalias %src) {
; CHECK-LABEL: @addsub1(		; CHECK-LABEL: @addsub1(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 2
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 2
; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <2 x i32>*
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4
; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[TMP1]], <i32 -1, i32 -1>
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2		; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <2 x i32> [[TMP1]], <i32 -1, i32 -1>
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[SUB1:%.*]] = sub nsw i32 [[TMP1]], -1		; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[DST]] to <2 x i32>*
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2		; CHECK-NEXT: store <2 x i32> [[TMP4]], <2 x i32>* [[TMP5]], align 4
; CHECK-NEXT: store i32 [[SUB1]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4		; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
; CHECK-NEXT: store i32 [[TMP2]], i32* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: store i32 [[TMP6]], i32* [[INCDEC_PTR3]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4
; CHECK-NEXT: [[SUB8:%.*]] = sub nsw i32 [[TMP3]], -3		; CHECK-NEXT: [[SUB8:%.*]] = sub nsw i32 [[TMP7]], -3
; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4		; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%sub = add nsw i32 %0, -1		%sub = add nsw i32 %0, -1
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
Show All 11 Lines	entry:
%sub8 = sub nsw i32 %3, -3		%sub8 = sub nsw i32 %3, -3
store i32 %sub8, i32* %incdec.ptr6, align 4		store i32 %sub8, i32* %incdec.ptr6, align 4
ret void		ret void
}		}

define void @mul(i32* noalias %dst, i32* noalias %src) {		define void @mul(i32* noalias %dst, i32* noalias %src) {
; CHECK-LABEL: @mul(		; CHECK-LABEL: @mul(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 2
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 2
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 257		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <2 x i32>*
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4
; CHECK-NEXT: store i32 [[MUL]], i32* [[DST]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = mul nsw <2 x i32> [[TMP1]], <i32 257, i32 -3>
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[DST]] to <2 x i32>*
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4		; CHECK-NEXT: store <2 x i32> [[TMP2]], <2 x i32>* [[TMP3]], align 4
; CHECK-NEXT: [[MUL3:%.*]] = mul nsw i32 [[TMP1]], -3
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
; CHECK-NEXT: store i32 [[MUL3]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
; CHECK-NEXT: store i32 [[TMP2]], i32* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: store i32 [[TMP4]], i32* [[INCDEC_PTR4]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR5]], align 4		; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* [[INCDEC_PTR5]], align 4
; CHECK-NEXT: [[MUL9:%.*]] = mul nsw i32 [[TMP3]], -9		; CHECK-NEXT: [[MUL9:%.*]] = mul nsw i32 [[TMP5]], -9
; CHECK-NEXT: store i32 [[MUL9]], i32* [[INCDEC_PTR7]], align 4		; CHECK-NEXT: store i32 [[MUL9]], i32* [[INCDEC_PTR7]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%mul = mul nsw i32 %0, 257		%mul = mul nsw i32 %0, 257
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
Show All 15 Lines

define void @shl0(i32* noalias %dst, i32* noalias %src) {		define void @shl0(i32* noalias %dst, i32* noalias %src) {
; CHECK-LABEL: @shl0(		; CHECK-LABEL: @shl0(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
; CHECK-NEXT: store i32 [[TMP0]], i32* [[DST]], align 4		; CHECK-NEXT: store i32 [[TMP0]], i32* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[TMP1]], 1
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
; CHECK-NEXT: store i32 [[SHL]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[SHL5:%.*]] = shl i32 [[TMP2]], 2
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
; CHECK-NEXT: store i32 [[SHL5]], i32* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[INCDEC_PTR]] to <2 x i32>*
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 4
; CHECK-NEXT: [[SHL8:%.*]] = shl i32 [[TMP3]], 3		; CHECK-NEXT: [[TMP3:%.*]] = shl <2 x i32> [[TMP2]], <i32 1, i32 2>
		; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[INCDEC_PTR1]] to <2 x i32>*
		; CHECK-NEXT: store <2 x i32> [[TMP3]], <2 x i32>* [[TMP4]], align 4
		; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4
		; CHECK-NEXT: [[SHL8:%.*]] = shl i32 [[TMP5]], 3
; CHECK-NEXT: store i32 [[SHL8]], i32* [[INCDEC_PTR6]], align 4		; CHECK-NEXT: store i32 [[SHL8]], i32* [[INCDEC_PTR6]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
store i32 %0, i32* %dst, align 4		store i32 %0, i32* %dst, align 4
▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines

define void @add1f(float* noalias %dst, float* noalias %src) {		define void @add1f(float* noalias %dst, float* noalias %src) {
; CHECK-LABEL: @add1f(		; CHECK-LABEL: @add1f(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
; CHECK-NEXT: store float [[TMP0]], float* [[DST]], align 4		; CHECK-NEXT: store float [[TMP0]], float* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[ADD3:%.*]] = fadd fast float [[TMP1]], 1.000000e+00
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[ADD3]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[ADD6:%.*]] = fadd fast float [[TMP2]], 2.000000e+00
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
; CHECK-NEXT: store float [[ADD6]], float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[INCDEC_PTR]] to <2 x float>*
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
; CHECK-NEXT: [[ADD9:%.*]] = fadd fast float [[TMP3]], 3.000000e+00		; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <2 x float> [[TMP2]], <float 1.000000e+00, float 2.000000e+00>
		; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[INCDEC_PTR1]] to <2 x float>*
		; CHECK-NEXT: store <2 x float> [[TMP3]], <2 x float>* [[TMP4]], align 4
		; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
		; CHECK-NEXT: [[ADD9:%.*]] = fadd fast float [[TMP5]], 3.000000e+00
; CHECK-NEXT: store float [[ADD9]], float* [[INCDEC_PTR7]], align 4		; CHECK-NEXT: store float [[ADD9]], float* [[INCDEC_PTR7]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
store float %0, float* %dst, align 4		store float %0, float* %dst, align 4
Show All 20 Lines
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP0]], -1.000000e+00		; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP0]], -1.000000e+00
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
; CHECK-NEXT: store float [[ADD]], float* [[DST]], align 4		; CHECK-NEXT: store float [[ADD]], float* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[TMP1]], float* [[INCDEC_PTR1]], align 4		; CHECK-NEXT: store float [[TMP1]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[INCDEC_PTR2]] to <2 x float>*
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 4
; CHECK-NEXT: [[ADD6:%.*]] = fadd fast float [[TMP2]], -2.000000e+00		; CHECK-NEXT: [[TMP4:%.*]] = fadd fast <2 x float> [[TMP3]], <float -2.000000e+00, float -3.000000e+00>
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[INCDEC_PTR4]] to <2 x float>*
; CHECK-NEXT: store float [[ADD6]], float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: store <2 x float> [[TMP4]], <2 x float>* [[TMP5]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
; CHECK-NEXT: [[ADD9:%.*]] = fadd fast float [[TMP3]], -3.000000e+00
; CHECK-NEXT: store float [[ADD9]], float* [[INCDEC_PTR7]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%add = fadd fast float %0, -1.000000e+00		%add = fadd fast float %0, -1.000000e+00
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
store float %add, float* %dst, align 4		store float %add, float* %dst, align 4
▲ Show 20 Lines • Show All 83 Lines • ▼ Show 20 Lines
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4
; CHECK-NEXT: [[SUB:%.*]] = fadd fast float [[TMP0]], -1.000000e+00		; CHECK-NEXT: [[SUB:%.*]] = fadd fast float [[TMP0]], -1.000000e+00
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
; CHECK-NEXT: store float [[SUB]], float* [[DST]], align 4		; CHECK-NEXT: store float [[SUB]], float* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[TMP1]], float* [[INCDEC_PTR1]], align 4		; CHECK-NEXT: store float [[TMP1]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[INCDEC_PTR2]] to <2 x float>*
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 4
; CHECK-NEXT: [[SUB5:%.*]] = fadd fast float [[TMP2]], -2.000000e+00		; CHECK-NEXT: [[TMP4:%.*]] = fadd fast <2 x float> [[TMP3]], <float -2.000000e+00, float -3.000000e+00>
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[TMP5:%.*]] = fsub fast <2 x float> [[TMP3]], <float -2.000000e+00, float -3.000000e+00>
; CHECK-NEXT: store float [[SUB5]], float* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP5]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[INCDEC_PTR3]] to <2 x float>*
; CHECK-NEXT: [[SUB8:%.*]] = fsub fast float [[TMP3]], -3.000000e+00		; CHECK-NEXT: store <2 x float> [[TMP6]], <2 x float>* [[TMP7]], align 4
; CHECK-NEXT: store float [[SUB8]], float* [[INCDEC_PTR6]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%sub = fadd fast float %0, -1.000000e+00		%sub = fadd fast float %0, -1.000000e+00
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
store float %sub, float* %dst, align 4		store float %sub, float* %dst, align 4
Show All 10 Lines	entry:
%sub8 = fsub fast float %3, -3.000000e+00		%sub8 = fsub fast float %3, -3.000000e+00
store float %sub8, float* %incdec.ptr6, align 4		store float %sub8, float* %incdec.ptr6, align 4
ret void		ret void
}		}

define void @addsub1f(float* noalias %dst, float* noalias %src) {		define void @addsub1f(float* noalias %dst, float* noalias %src) {
; CHECK-LABEL: @addsub1f(		; CHECK-LABEL: @addsub1f(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 2
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 2
; CHECK-NEXT: [[SUB:%.*]] = fadd fast float [[TMP0]], -1.000000e+00		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <2 x float>*
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* [[TMP0]], align 4
; CHECK-NEXT: store float [[SUB]], float* [[DST]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = fadd fast <2 x float> [[TMP1]], <float -1.000000e+00, float -1.000000e+00>
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2		; CHECK-NEXT: [[TMP3:%.*]] = fsub fast <2 x float> [[TMP1]], <float -1.000000e+00, float -1.000000e+00>
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> [[TMP3]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[SUB1:%.*]] = fsub fast float [[TMP1]], -1.000000e+00		; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[DST]] to <2 x float>*
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2		; CHECK-NEXT: store <2 x float> [[TMP4]], <2 x float>* [[TMP5]], align 4
; CHECK-NEXT: store float [[SUB1]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4		; CHECK-NEXT: [[TMP6:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
; CHECK-NEXT: store float [[TMP2]], float* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: store float [[TMP6]], float* [[INCDEC_PTR3]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[INCDEC_PTR4]], align 4
; CHECK-NEXT: [[SUB8:%.*]] = fsub fast float [[TMP3]], -3.000000e+00		; CHECK-NEXT: [[SUB8:%.*]] = fsub fast float [[TMP7]], -3.000000e+00
; CHECK-NEXT: store float [[SUB8]], float* [[INCDEC_PTR6]], align 4		; CHECK-NEXT: store float [[SUB8]], float* [[INCDEC_PTR6]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%sub = fadd fast float %0, -1.000000e+00		%sub = fadd fast float %0, -1.000000e+00
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
[... 11 unchanged lines not shown ...]
%sub8 = fsub fast float %3, -3.000000e+00		%sub8 = fsub fast float %3, -3.000000e+00
store float %sub8, float* %incdec.ptr6, align 4		store float %sub8, float* %incdec.ptr6, align 4
ret void		ret void
}		}

define void @mulf(float* noalias %dst, float* noalias %src) {		define void @mulf(float* noalias %dst, float* noalias %src) {
; CHECK-LABEL: @mulf(		; CHECK-LABEL: @mulf(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 2
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 2
; CHECK-NEXT: [[SUB:%.*]] = fmul fast float [[TMP0]], 2.570000e+02		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <2 x float>*
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* [[TMP0]], align 4
; CHECK-NEXT: store float [[SUB]], float* [[DST]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <2 x float> [[TMP1]], <float 2.570000e+02, float -3.000000e+00>
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2		; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[DST]] to <2 x float>*
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4		; CHECK-NEXT: store <2 x float> [[TMP2]], <2 x float>* [[TMP3]], align 4
; CHECK-NEXT: [[SUB3:%.*]] = fmul fast float [[TMP1]], -3.000000e+00
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[SUB3]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
; CHECK-NEXT: store float [[TMP2]], float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: store float [[TMP4]], float* [[INCDEC_PTR4]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4		; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
; CHECK-NEXT: [[SUB9:%.*]] = fmul fast float [[TMP3]], -9.000000e+00		; CHECK-NEXT: [[SUB9:%.*]] = fmul fast float [[TMP5]], -9.000000e+00
; CHECK-NEXT: store float [[SUB9]], float* [[INCDEC_PTR7]], align 4		; CHECK-NEXT: store float [[SUB9]], float* [[INCDEC_PTR7]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%sub = fmul fast float %0, 2.570000e+02		%sub = fmul fast float %0, 2.570000e+02
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
[... 47 unchanged lines not shown ...]

define void @add1fn(float* noalias %dst, float* noalias %src) {		define void @add1fn(float* noalias %dst, float* noalias %src) {
; CHECK-LABEL: @add1fn(		; CHECK-LABEL: @add1fn(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
; CHECK-NEXT: store float [[TMP0]], float* [[DST]], align 4		; CHECK-NEXT: store float [[TMP0]], float* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[ADD3:%.*]] = fadd float [[TMP1]], 1.000000e+00
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[ADD3]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[ADD6:%.*]] = fadd float [[TMP2]], 2.000000e+00
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
; CHECK-NEXT: store float [[ADD6]], float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[INCDEC_PTR]] to <2 x float>*
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
; CHECK-NEXT: [[ADD9:%.*]] = fadd float [[TMP3]], 3.000000e+00		; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP2]], <float 1.000000e+00, float 2.000000e+00>
		; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[INCDEC_PTR1]] to <2 x float>*
		; CHECK-NEXT: store <2 x float> [[TMP3]], <2 x float>* [[TMP4]], align 4
		; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
		; CHECK-NEXT: [[ADD9:%.*]] = fadd float [[TMP5]], 3.000000e+00
; CHECK-NEXT: store float [[ADD9]], float* [[INCDEC_PTR7]], align 4		; CHECK-NEXT: store float [[ADD9]], float* [[INCDEC_PTR7]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
store float %0, float* %dst, align 4		store float %0, float* %dst, align 4
[... 20 unchanged lines not shown ...]
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP0]], -1.000000e+00		; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP0]], -1.000000e+00
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
; CHECK-NEXT: store float [[ADD]], float* [[DST]], align 4		; CHECK-NEXT: store float [[ADD]], float* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[TMP1]], float* [[INCDEC_PTR1]], align 4		; CHECK-NEXT: store float [[TMP1]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[INCDEC_PTR2]] to <2 x float>*
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 4
; CHECK-NEXT: [[ADD6:%.*]] = fadd float [[TMP2]], -2.000000e+00		; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x float> [[TMP3]], <float -2.000000e+00, float -3.000000e+00>
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[INCDEC_PTR4]] to <2 x float>*
; CHECK-NEXT: store float [[ADD6]], float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: store <2 x float> [[TMP4]], <2 x float>* [[TMP5]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
; CHECK-NEXT: [[ADD9:%.*]] = fadd float [[TMP3]], -3.000000e+00
; CHECK-NEXT: store float [[ADD9]], float* [[INCDEC_PTR7]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%add = fadd fast float %0, -1.000000e+00		%add = fadd fast float %0, -1.000000e+00
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
store float %add, float* %dst, align 4		store float %add, float* %dst, align 4
[... 74 unchanged lines not shown ...]
%sub9 = fadd float %3, -3.000000e+00		%sub9 = fadd float %3, -3.000000e+00
store float %sub9, float* %incdec.ptr7, align 4		store float %sub9, float* %incdec.ptr7, align 4
ret void		ret void
}		}

define void @mulfn(float* noalias %dst, float* noalias %src) {		define void @mulfn(float* noalias %dst, float* noalias %src) {
; CHECK-LABEL: @mulfn(		; CHECK-LABEL: @mulfn(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 2
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 2
; CHECK-NEXT: [[SUB:%.*]] = fmul float [[TMP0]], 2.570000e+02		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <2 x float>*
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* [[TMP0]], align 4
; CHECK-NEXT: store float [[SUB]], float* [[DST]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.570000e+02, float -3.000000e+00>
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2		; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[DST]] to <2 x float>*
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4		; CHECK-NEXT: store <2 x float> [[TMP2]], <2 x float>* [[TMP3]], align 4
; CHECK-NEXT: [[SUB3:%.*]] = fmul float [[TMP1]], -3.000000e+00
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[SUB3]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
; CHECK-NEXT: store float [[TMP2]], float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: store float [[TMP4]], float* [[INCDEC_PTR4]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4		; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
; CHECK-NEXT: [[SUB9:%.*]] = fmul fast float [[TMP3]], -9.000000e+00		; CHECK-NEXT: [[SUB9:%.*]] = fmul fast float [[TMP5]], -9.000000e+00
; CHECK-NEXT: store float [[SUB9]], float* [[INCDEC_PTR7]], align 4		; CHECK-NEXT: store float [[SUB9]], float* [[INCDEC_PTR7]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%sub = fmul float %0, 2.570000e+02		%sub = fmul float %0, 2.570000e+02
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
[... 15 unchanged lines not shown ...]

Revision Contents

Diff 428116

llvm/include/llvm/Analysis/TargetTransformInfo.h

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

llvm/include/llvm/CodeGen/BasicTTIImpl.h

llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h

llvm/lib/Analysis/TargetTransformInfo.cpp

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/X86/arith-add-load.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-and-const-load.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-mul-load.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_7zip.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_bullet.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_bullet3.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_sim4b1.ll

llvm/test/Transforms/SLPVectorizer/X86/fptosi-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/fptosi.ll

llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll

llvm/test/Transforms/SLPVectorizer/X86/hadd-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/hadd.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-after-bundle.ll

llvm/test/Transforms/SLPVectorizer/X86/memory-runtime-checks.ll

llvm/test/Transforms/SLPVectorizer/X86/no_alternate_divrem.ll

llvm/test/Transforms/SLPVectorizer/X86/odd_store.ll

llvm/test/Transforms/SLPVectorizer/X86/pr49933.ll

llvm/test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll

llvm/test/Transforms/SLPVectorizer/X86/reorder_phi.ll

llvm/test/Transforms/SLPVectorizer/X86/saxpy.ll

llvm/test/Transforms/SLPVectorizer/X86/schedule-bundle.ll

llvm/test/Transforms/SLPVectorizer/X86/simple-loop.ll

llvm/test/Transforms/SLPVectorizer/X86/sitofp-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/sitofp.ll

llvm/test/Transforms/SLPVectorizer/X86/uitofp.ll

llvm/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll
