This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll

Differential D76434

[SCEV] Query expanded immediate cost at minsize
ClosedPublic

Authored by samparker on Mar 19 2020, 9:17 AM.

Details

Reviewers

lebedev.ri
reames
echristo
mkazantsev
hfinkel
craig.topper
arsenm
dmgreen
uweigand
SjoerdMeijer

Commits

rG0bdf8c912724: [SCEV] Constant expansion cost at minsize

Summary

As code size is the only thing we care about at minsize, query the cost of materialising immediates when calculating the cost of a SCEV expansion. We also modify the CostKind to TCK_CodeSize for minsize, instead of TCK_RecipThroughput. This gives a 0.1% geomean code size reduction on the llvm test suite at -Oz for both thumbv7a and aarch64.
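To make the summary concrete, here is a minimal, self-contained sketch of the idea. It does not use LLVM headers: `ToyFunction`, `ToyOp`, `selectCostKind` and `expansionCost` are hypothetical stand-ins invented for illustration, and the real patch works on SCEV expressions through TargetTransformInfo rather than on a flat list of operations.

```cpp
#include <vector>

// Toy stand-ins for illustration only; these are NOT LLVM's types.
enum class CostKind { RecipThroughput, CodeSize };

struct ToyFunction {
  bool HasMinSize; // models a function carrying the minsize attribute
};

// At minsize only code size matters, so pick the code size cost kind.
CostKind selectCostKind(const ToyFunction &F) {
  return F.HasMinSize ? CostKind::CodeSize : CostKind::RecipThroughput;
}

struct ToyOp {
  int OpCost;  // cost of the operation itself
  int ImmCost; // assumed cost of materialising its immediate operand
};

// Sum the cost of an expansion. Immediate costs are only counted at
// minsize, mirroring the final shape of the patch described in this review.
int expansionCost(const ToyFunction &F, const std::vector<ToyOp> &Ops) {
  int Total = 0;
  for (const ToyOp &Op : Ops) {
    Total += Op.OpCost;
    if (F.HasMinSize)
      Total += Op.ImmCost;
  }
  return Total;
}

// An expansion is considered too expensive once it exceeds the budget.
bool isHighCostExpansion(const ToyFunction &F, const std::vector<ToyOp> &Ops,
                         int Budget) {
  return expansionCost(F, Ops) > Budget;
}
```

With a budget of 3, the same two operations would be accepted in a normal function but rejected at minsize once the immediate cost is included.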

Event Timeline

samparker created this revision. Mar 19 2020, 9:17 AM

Herald added subscribers: danielkiss, zzheng, hiraditya and 3 others. Mar 19 2020, 9:18 AM

Hmm, i'm not a ventilator. Some initial thoughts.

This is rather pessimistic. If we really want to do this, we need to use TargetTransformInfo::getIntImmCostInst().
What cost does that model? I'm under the impression that TargetTransformInfo::getIntImmCost*() models TargetCostKind::TCK_CodeSize, since it is mainly used in ConstantHoistingPass. Here we model TargetCostKind::TCK_RecipThroughput.
Assuming that TargetTransformInfo::getIntImmCost*() actually models TargetCostKind::TCK_RecipThroughput, i believe, all X86 changes should not be here. Either most of them will get fixed via using TargetTransformInfo::getIntImmCostInst(), or the llvm::SCEVCheapExpansionBudget will need to be bumped.

Herald added a subscriber: wuzish. Mar 19 2020, 12:52 PM

@samparker are you trying to mitigate perf impact for in-order CPU?
I wonder if D73501 simply is counter-productive then.
Though yes, vectorizers need to be taught that trick.

Thanks for taking a look.

This is rather pessimistic. If we really want to do this, we need to use TargetTransformInfo::getIntImmCostInst().

This sounds good and what I was also considering.

What cost does that model?

I was under the impression that throughput/codesize costs for immediates would be highly correlated, where a 'high cost' constant would introduce instruction(s) to generate it, increasing code size and reducing throughput. As @spatel said in D76124, the lines have become blurred but we should be modelling at least something here.

I wonder if D73501 simply is counter-productive then.

I would be lying if I said that the patch didn't cause a whole world of pain :) But I would like to try to resolve this, if we can, by modelling costs better. The SCEV changes may have just broken how we do unrolling for our little microcontrollers, so I'll be looking at ARM TTI too.

Herald added a reviewer: aartbik. Mar 20 2020, 1:29 AM

Do we model rthroughput for constants elsewhere?
I may be wrong, but i don't recall seeing any such modelling previously.
Let's take a step back here. How about we simply revert D73501?

Do we model rthroughput for constants elsewhere?

I guess that depends on who 'we' are because, again, different backends will be modelling different things. Why the focus on throughput now anyway? When this code was using getOperationCost, it would have been getting some performance/code size cost, which I expect most backends will be modelling for constants. And I'm not really sure why throughput is specifically important here, we're not concerning ourselves with the throughput of casts and compares, right?

Let's take a step back here. How about we simply revert D73501?

Sounds good to me, although I still need to prod around here to see if codegen risks being erratic in the future.

In D76434#1933359, @samparker wrote:

Do we model rthroughput for constants elsewhere?

I guess that depends on who 'we' are because, again, different backends will be modelling different things.

Sorry, i meant llvm transforms in general, not backends.

Why the focus on throughput now anyway? When this code was using getOperationCost,
it would have been getting some performance/code size cost,
which I expect most backends will be modelling for constants.
And I'm not really sure why throughput is specifically important here,
we're not concerning ourselves with the throughput of casts and compares, right?

We have 3 cost models - latency, size and rthroughput.
My aspirational goal in this scevexpander budget was: "how much more computations are we willing to do without it being too much of a burden?"
Size model isn't really applicable here - we *could* lower any sequence into a libcall, regardless of its native instruction count.
Likewise i'm not sure we're really after latency here, which leaves us with rthroughput.
But we can't subtract oranges from cucumbers, so all cost modelling should be consistently using rthroughput cost model.
Thus i'm asking, what does getIntImmCost() model? rthroughput or size?

Let's take a step back here. How about we simply revert D73501?

Sounds good to me, although I still need to prod around here to see if codegen risks being erratic in the future.

I'm pretty sure there's only two APIs that were designed around throughput: 'getInstructionThroughput' and 'getArithmeticInstrCost' and maybe the latter should be named more explicitly. The code size calls are generally a bit more explicit, so there's 'getIntImmCodeSizeCost', which is not the one I've used.

"how much more computations are we willing to do without it being too much of a burden?"

And this also depends what we're considering as a 'burden', and another angle that I'll probably need to look at in the near future is the burden of code size... Either way, this code shouldn't assume that constants are free (in any sense of the term).

Introduced a lambda to look at an expression's operand and, if it's a constant, query for cost using getIntImmCostInst.
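The shape of that lambda can be sketched without LLVM as follows. Everything here is a hypothetical stand-in: `ToyOperand`, `ToyExpr` and `ImmCostFn` are invented for illustration, and `ImmCostFn` only plays the role that getIntImmCostInst has in the patch (it takes an opcode, an operand index and the immediate value, but its signature is an assumption, not LLVM's).

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Toy stand-ins for illustration only; these are NOT LLVM's types.
struct ToyOperand {
  bool IsConstant;
  int64_t Value; // only meaningful when IsConstant is true
};

struct ToyExpr {
  unsigned Opcode;
  std::vector<ToyOperand> Operands;
};

// Hypothetical target hook standing in for an immediate cost query.
using ImmCostFn =
    std::function<int(unsigned Opcode, unsigned Idx, int64_t Imm)>;

int constantOperandCost(const ToyExpr &E, const ImmCostFn &ImmCost) {
  // Look at one operand; non-constant operands contribute nothing here,
  // constants are costed by asking the target hook.
  auto CostOperand = [&](unsigned Idx) -> int {
    const ToyOperand &Op = E.Operands[Idx];
    if (!Op.IsConstant)
      return 0;
    return ImmCost(E.Opcode, Idx, Op.Value);
  };

  int Total = 0;
  for (unsigned I = 0; I < E.Operands.size(); ++I)
    Total += CostOperand(I);
  return Total;
}
```

Passing the operand index to the hook matters because a target may be able to fold a constant in one operand position but not in another.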

Herald added a subscriber: javed.absar. Mar 26 2020, 7:05 AM

In D76434#1933498, @lebedev.ri wrote:

<...>
Thus i'm asking, what does getIntImmCost() model? rthroughput or size?

This revision now requires changes to proceed. Mar 26 2020, 7:53 AM

The API is there to figure out whether the constant will be folded into the given instruction, otherwise there will be some 'cost' to materialize it. Having to generate instruction(s) for the materialization is likely to increase code size, but more importantly, reduce throughput and increase latency - which is why it's sometimes beneficial to hoist expensive constants out of loops.
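As a rough illustration of "folded for free, otherwise pay to materialise", consider a made-up target whose instructions can encode an 8-bit unsigned immediate directly and which otherwise needs one move instruction per non-zero 16-bit chunk of the constant. These encoding rules are an assumption chosen to keep the sketch small; they are not the rules of ARM, AArch64 or X86, and `toyImmCost` and `materialisationsInLoop` are hypothetical helpers.

```cpp
#include <cstdint>

// Hypothetical target: constants up to 0xFF fold into the instruction for
// free; anything larger costs one instruction per non-zero 16-bit chunk.
unsigned toyImmCost(uint64_t Imm) {
  if (Imm <= 0xFF)
    return 0; // folded into the using instruction
  unsigned Chunks = 0;
  for (unsigned Shift = 0; Shift < 64; Shift += 16)
    if ((Imm >> Shift) & 0xFFFF)
      ++Chunks;
  return Chunks;
}

// Number of materialisation instructions executed by a loop that uses the
// constant once per iteration. Hoisting pays the cost once, outside the loop.
unsigned materialisationsInLoop(uint64_t Imm, unsigned TripCount,
                                bool Hoisted) {
  return toyImmCost(Imm) * (Hoisted ? 1u : TripCount);
}
```

In this model a cheap constant costs nothing whether or not it is hoisted, while an expensive one is paid for on every iteration unless it is hoisted, which is the trade-off described above.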

Ping.

Rebased. Thanks for reverting rewriteLoopExitValues @lebedev.ri, but I'm still seeing value in this patch. I'm not seeing any changes when running the test suite on my X86 box, but for Arm's DSP suite this affects 42/166 benchmarks for Thumb1 and 30/166 for Thumb2. Out of those changes I'm seeing a 1.6% geomean improvement for both targets.
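For readers unfamiliar with the geomean figures quoted in this review, the following sketch shows one common way such a number is computed from per-benchmark measurements: take the ratio of the new value to the baseline for each benchmark and average the ratios geometrically. The helper name `geomeanRatio` is hypothetical, and this is an assumption about the method, not a description of the exact script used to produce the numbers here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of After[i] / Before[i]. A result below 1.0 means the
// measurements shrank on average; 0.984 would be a 1.6% reduction.
// Assumes both vectors have the same length and hold positive values.
double geomeanRatio(const std::vector<double> &Before,
                    const std::vector<double> &After) {
  if (Before.empty())
    return 1.0; // no data: report "no change"
  double LogSum = 0.0;
  for (std::size_t I = 0; I < Before.size(); ++I)
    LogSum += std::log(After[I] / Before[I]);
  return std::exp(LogSum / static_cast<double>(Before.size()));
}
```

Summing logarithms instead of multiplying the ratios directly avoids overflow or underflow when there are many benchmarks.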

aartbik removed a reviewer: aartbik. Apr 14 2020, 4:14 PM

Now only performing the checks when optimising for minsize.
thumbv7a results:

Metric: size..text

Program                                                                                              master  scev-expander diff
                                    test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   47596   46528       -2.2%
                                   test-suite :: MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test    6484    6340       -2.2%
                                        test-suite :: MultiSource/Benchmarks/VersaBench/bmm/bmm.test     924     908       -1.7%
                                 test-suite :: MultiSource/Benchmarks/Rodinia/backprop/backprop.test    2916    2868       -1.6%
                                          test-suite :: MultiSource/Benchmarks/McCat/05-eks/eks.test    4748    4696       -1.1%
                        test-suite :: MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url.test    3120    3088       -1.0%
                                        test-suite :: SingleSource/Benchmarks/Misc/himenobmtxpa.test    2408    2384       -1.0%
                          test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test   19920   19748       -0.9%
                          test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test   19920   19748       -0.9%
                         test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test   32516   32236       -0.9%
                                  test-suite :: SingleSource/Benchmarks/Shootout/Shootout-lists.test     956     964        0.8%
                                             test-suite :: MultiSource/Benchmarks/Ptrdist/ft/ft.test    2920    2896       -0.8%
                 test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test   14600   14480       -0.8%
                                            test-suite :: SingleSource/Benchmarks/McGill/queens.test     976     968       -0.8%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test   31508   31252       -0.8%
                             test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   68296   67752       -0.8%
                                           test-suite :: SingleSource/Benchmarks/Misc/whetstone.test    1604    1592       -0.7%
                                       test-suite :: MultiSource/Benchmarks/SciMark2-C/scimark2.test    4936    4904       -0.6%
                                    test-suite :: MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test   23836   23684       -0.6%
                                  test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test   27552   27384       -0.6%
                                     test-suite :: SingleSource/Benchmarks/BenchmarkGame/puzzle.test    1320    1312       -0.6%
                                           test-suite :: SingleSource/Benchmarks/Stanford/Oscar.test    1348    1340       -0.6%
                                          test-suite :: SingleSource/Benchmarks/Stanford/Puzzle.test    1376    1368       -0.6%
                                         test-suite :: MultiSource/Applications/minisat/minisat.test    9980    9924       -0.6%
                                         test-suite :: SingleSource/Benchmarks/Misc/ReedSolomon.test    3008    3024        0.5%
                       test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   69828   69460       -0.5%
                                 test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test   20116   20012       -0.5%
                                          test-suite :: SingleSource/Benchmarks/Misc-C++/bigfib.test    3332    3316       -0.5%
                                      test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test  168600  167800       -0.5%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test   26104   25984       -0.5%
                                   test-suite :: MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk.test    3592    3576       -0.4%
                                           test-suite :: MultiSource/Applications/oggenc/oggenc.test   90092   89700       -0.4%
                          test-suite :: MultiSource/Benchmarks/VersaBench/beamformer/beamformer.test    2788    2776       -0.4%
                                              test-suite :: SingleSource/Benchmarks/McGill/misr.test    1864    1872        0.4%
                                          test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test    7540    7508       -0.4%
                                        test-suite :: MultiSource/Applications/JM/lencod/lencod.test  318860  317620       -0.4%
                           test-suite :: MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk.test    2176    2168       -0.4%
                                   test-suite :: MultiSource/Benchmarks/Fhourstones/fhourstones.test    4828    4812       -0.3%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   29164   29068       -0.3%
                                     test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test   24400   24320       -0.3%
                                            test-suite :: SingleSource/Benchmarks/Misc/oourafft.test    5344    5328       -0.3%
                                             test-suite :: MultiSource/Applications/spiff/spiff.test   12184   12152       -0.3%
                                             test-suite :: MultiSource/Applications/sgefa/sgefa.test    6096    6080       -0.3%
                                             test-suite :: MultiSource/Applications/SPASS/SPASS.test  201512  201000       -0.3%
                                        test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  128336  128016       -0.2%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench.test    9640    9616       -0.2%
                                         test-suite :: MultiSource/Applications/ClamAV/clamscan.test  246116  245556       -0.2%
                                                 test-suite :: MultiSource/Applications/lua/lua.test   61520   61392       -0.2%
                                               test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test   40916   40832       -0.2%
               test-suite :: MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael.test    7796    7780       -0.2%
                                               test-suite :: SingleSource/Benchmarks/Misc/flops.test    3936    3928       -0.2%
                                         test-suite :: MultiSource/Applications/obsequi/Obsequi.test   17468   17436       -0.2%
                               test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test   89716   89552       -0.2%
                                       test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  324352  323820       -0.2%
                                             test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  296700  296220       -0.2%
                                     test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test    2524    2520       -0.2%
                                             test-suite :: MultiSource/Benchmarks/Ptrdist/bc/bc.test   20760   20728       -0.2%
                                             test-suite :: MultiSource/Benchmarks/nbench/nbench.test   16872   16848       -0.1%
                            test-suite :: MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1.test    2948    2952        0.1%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/XSBench/XSBench.test    6084    6076       -0.1%
                                          test-suite :: MultiSource/Applications/d/make_dparser.test   43812   43756       -0.1%
                                         test-suite :: MultiSource/Applications/SIBsim4/SIBsim4.test   20152   20176        0.1%
                                       test-suite :: MultiSource/Applications/hexxagon/hexxagon.test    7252    7244       -0.1%
                       test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/SimpleMOC/SimpleMOC.test   14828   14812       -0.1%
                            test-suite :: MicroBenchmarks/LCALS/SubsetCLambdaLoops/lcalsCLambda.test   92020   91924       -0.1%
                                      test-suite :: MultiSource/Benchmarks/VersaBench/dbms/dbms.test    8228    8220       -0.1%
                                                   test-suite :: MultiSource/Benchmarks/sim/sim.test    8704    8696       -0.1%
                                         test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  168288  168136       -0.1%
                                  test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test   91924   91844       -0.1%
                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test  218764  218580       -0.1%
                                                    test-suite :: MicroBenchmarks/harris/harris.test   62584   62536       -0.1%
                                       test-suite :: MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test   11224   11216       -0.1%
                            test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test   90788   90724       -0.1%
                                   test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test   12352   12344       -0.1%
                             test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test   62016   61976       -0.1%
                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test    7120    7116       -0.1%
               test-suite :: MicroBenchmarks/ImageProcessing/BilateralFiltering/BilateralFilter.test   61768   61736       -0.1%
        test-suite :: MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test   61800   61768       -0.1%
                                        test-suite :: MicroBenchmarks/ImageProcessing/Blur/blur.test   62136   62104       -0.1%
                                    test-suite :: MicroBenchmarks/ImageProcessing/Dither/Dither.test   62440   62408       -0.1%
                                               test-suite :: MultiSource/Applications/siod/siod.test   47092   47068       -0.1%
                       test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   60088   60064       -0.0%

aarch64 results:

Metric: size..text

Program                                                                                              master  scev-expander diff
                                    test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   72100   68220       -5.4%
                                           test-suite :: SingleSource/Benchmarks/Misc/whetstone.test    2092    2004       -4.2%
                                 test-suite :: MultiSource/Benchmarks/Rodinia/backprop/backprop.test    2588    2500       -3.4%
                                   test-suite :: MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test    8420    8196       -2.7%
                                          test-suite :: SingleSource/Benchmarks/Stanford/Puzzle.test    2212    2156       -2.5%
                                           test-suite :: SingleSource/Benchmarks/Stanford/Oscar.test    1652    1620       -1.9%
                        test-suite :: MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url.test    2948    2908       -1.4%
                         test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test   43972   43428       -1.2%
                                             test-suite :: MultiSource/Benchmarks/Ptrdist/ft/ft.test    3668    3628       -1.1%
                                          test-suite :: MultiSource/Benchmarks/McCat/05-eks/eks.test    5876    5812       -1.1%
                                 test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test   22276   22068       -0.9%
                                           test-suite :: MultiSource/Benchmarks/Olden/em3d/em3d.test    2812    2836        0.9%
                                         test-suite :: MultiSource/Applications/SIBsim4/SIBsim4.test   26804   26580       -0.8%
                                     test-suite :: SingleSource/Benchmarks/BenchmarkGame/puzzle.test     980     972       -0.8%
                                          test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test    9132    9060       -0.8%
                          test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test   26220   26020       -0.8%
                          test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test   26220   26020       -0.8%
                                        test-suite :: SingleSource/Benchmarks/Misc/himenobmtxpa.test    3156    3132       -0.8%
                                   test-suite :: MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk.test    3156    3132       -0.8%
                                       test-suite :: MultiSource/Benchmarks/SciMark2-C/scimark2.test    5460    5420       -0.7%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test   34852   34628       -0.6%
                             test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   99124   98540       -0.6%
                                           test-suite :: MultiSource/Applications/oggenc/oggenc.test  107024  106456       -0.5%
                                            test-suite :: SingleSource/Benchmarks/McGill/queens.test    1516    1508       -0.5%
                                  test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test   36012   35836       -0.5%
                       test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   82264   81864       -0.5%
                                              test-suite :: SingleSource/Benchmarks/McGill/misr.test    1668    1660       -0.5%
                            test-suite :: MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1.test    3372    3356       -0.5%
                                         test-suite :: MultiSource/Applications/minisat/minisat.test   13836   13772       -0.5%
                                         test-suite :: SingleSource/Benchmarks/Misc/ReedSolomon.test    3460    3444       -0.5%
                                   test-suite :: MultiSource/Benchmarks/Fhourstones/fhourstones.test    3996    3980       -0.4%
                                               test-suite :: SingleSource/Benchmarks/Misc/flops.test    4116    4100       -0.4%
                                     test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test   27396   27292       -0.4%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench.test    8852    8820       -0.4%
                                    test-suite :: MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test   32168   32056       -0.3%
                                          test-suite :: SingleSource/Benchmarks/Misc-C++/bigfib.test    4892    4876       -0.3%
                 test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test   19708   19644       -0.3%
                                             test-suite :: MultiSource/Benchmarks/nbench/nbench.test   20916   20852       -0.3%
                                             test-suite :: MultiSource/Applications/spiff/spiff.test   16028   15980       -0.3%
                                      test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test  203964  203460       -0.2%
                               test-suite :: SingleSource/Benchmarks/Misc-C++/Large/sphereflake.test    3380    3372       -0.2%
                                             test-suite :: MultiSource/Applications/SPASS/SPASS.test  288136  287464       -0.2%
                                     test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test    3596    3588       -0.2%
                                        test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  161852  161492       -0.2%
                                                   test-suite :: MultiSource/Benchmarks/sim/sim.test   11044   11020       -0.2%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   41156   41068       -0.2%
                                        test-suite :: MultiSource/Applications/JM/lencod/lencod.test  383100  382340       -0.2%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test   28852   28796       -0.2%
                                             test-suite :: MultiSource/Benchmarks/Ptrdist/bc/bc.test   27236   27188       -0.2%
               test-suite :: MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael.test    9132    9116       -0.2%
                                            test-suite :: SingleSource/Benchmarks/Misc/oourafft.test    5444    5436       -0.1%
                                   test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test   16428   16404       -0.1%
                                       test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  471400  470712       -0.1%
                                      test-suite :: MultiSource/Benchmarks/VersaBench/dbms/dbms.test   11324   11308       -0.1%
                                             test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  378596  378116       -0.1%
                                                    test-suite :: MicroBenchmarks/harris/harris.test  104836  104708       -0.1%
                                             test-suite :: MultiSource/Applications/sgefa/sgefa.test    6812    6804       -0.1%
                                         test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  244840  244568       -0.1%
                                         test-suite :: MultiSource/Applications/obsequi/Obsequi.test   21980   21956       -0.1%
                                       test-suite :: MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test   15124   15108       -0.1%
                               test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  116900  116780       -0.1%
                                                 test-suite :: MultiSource/Applications/lua/lua.test   90508   90420       -0.1%
                                         test-suite :: MultiSource/Applications/ClamAV/clamscan.test  331984  331680       -0.1%
                                               test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test   54364   54316       -0.1%
                                               test-suite :: MultiSource/Applications/siod/siod.test   64448   64392       -0.1%
                       test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   83348   83276       -0.1%
                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test    9548    9540       -0.1%
                             test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test   86428   86356       -0.1%
                                       test-suite :: MultiSource/Applications/hexxagon/hexxagon.test   10180   10172       -0.1%
                                  test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test  142752  142656       -0.1%
                            test-suite :: MicroBenchmarks/LCALS/SubsetCLambdaLoops/lcalsCLambda.test  142792  142712       -0.1%
                                          test-suite :: MultiSource/Applications/d/make_dparser.test   62912   62880       -0.1%
                                  test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test  139296  139232       -0.0%

Herald added a project: Restricted Project. Aug 12 2020, 1:06 AM

Those are really nice code size savings!

As only the improvements are shown, just curious if there are no regressions? Thus, this is overall an improvement too?
And for extra confidence and as the numbers are easy to obtain, probably best to get numbers for x86 too?

Thanks! That's the full set of results; there are a few little regressions for thumb and I think only one for aarch64... I'll get the X86 numbers now.

Similar story on X86 as well:

Metric: size..text

Program                                                                                              master  scev-expander diff 
                                           test-suite :: SingleSource/Benchmarks/Misc/whetstone.test    2437    2325       -4.6%
                                           test-suite :: SingleSource/Benchmarks/Stanford/Oscar.test    1637    1573       -3.9%
                                    test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   58885   56645       -3.8%
                                     test-suite :: SingleSource/Benchmarks/BenchmarkGame/puzzle.test     821     805       -1.9%
                                          test-suite :: MultiSource/Benchmarks/McCat/05-eks/eks.test    5845    5733       -1.9%
                                   test-suite :: MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test    8597    8437       -1.9%
                        test-suite :: MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url.test    2661    2613       -1.8%
                                          test-suite :: SingleSource/Benchmarks/Stanford/Puzzle.test    1781    1749       -1.8%
                            test-suite :: MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1.test    2917    2869       -1.6%
                                        test-suite :: SingleSource/Benchmarks/Misc/himenobmtxpa.test    3253    3205       -1.5%
                         test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test   47093   46469       -1.3%
                                 test-suite :: MultiSource/Benchmarks/Rodinia/backprop/backprop.test    2485    2453       -1.3%
                                            test-suite :: SingleSource/Benchmarks/McGill/queens.test    1301    1285       -1.2%
                 test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test   20485   20245       -1.2%
                                              test-suite :: SingleSource/Benchmarks/McGill/misr.test    1525    1509       -1.0%
                                          test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test    9221    9125       -1.0%
                                             test-suite :: MultiSource/Benchmarks/Ptrdist/ft/ft.test    3205    3173       -1.0%
                                   test-suite :: MultiSource/Benchmarks/Fhourstones/fhourstones.test    3365    3333       -1.0%
                       test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   88756   88148       -0.7%
                          test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test   28709   28517       -0.7%
                          test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test   28709   28517       -0.7%
                           test-suite :: MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk.test    2501    2517        0.6%
                                     test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test   33429   33221       -0.6%
                                       test-suite :: MultiSource/Benchmarks/SciMark2-C/scimark2.test    5445    5413       -0.6%
                                  test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test   34293   34101       -0.6%
                                               test-suite :: SingleSource/Benchmarks/Misc/flops.test    6117    6085       -0.5%
                                             test-suite :: MultiSource/Applications/spiff/spiff.test   15413   15333       -0.5%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test   37109   36917       -0.5%
                                         test-suite :: SingleSource/Benchmarks/Misc/ReedSolomon.test    3109    3125        0.5%
                                            test-suite :: SingleSource/Benchmarks/Misc/oourafft.test    6453    6421       -0.5%
                                    test-suite :: MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test   29668   29524       -0.5%
                          test-suite :: MultiSource/Benchmarks/VersaBench/beamformer/beamformer.test    3461    3445       -0.5%
                                     test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test    3493    3477       -0.5%
                             test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   95733   95349       -0.4%
                               test-suite :: SingleSource/Benchmarks/Misc-C++/Large/sphereflake.test    4181    4165       -0.4%
                                         test-suite :: MultiSource/Applications/minisat/minisat.test   13413   13365       -0.4%
                                        test-suite :: MultiSource/Applications/JM/lencod/lencod.test  417589  416293       -0.3%
                                      test-suite :: MultiSource/Benchmarks/VersaBench/dbms/dbms.test   10645   10613       -0.3%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench.test   11173   11205        0.3%
                                           test-suite :: MultiSource/Applications/oggenc/oggenc.test  115716  115396       -0.3%
                                        test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  171989  171525       -0.3%
                                                   test-suite :: MultiSource/Benchmarks/sim/sim.test   11957   11925       -0.3%
                                      test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test  221205  220629       -0.3%
                                             test-suite :: MultiSource/Benchmarks/Ptrdist/bc/bc.test   25061   24997       -0.3%
                                         test-suite :: MultiSource/Applications/obsequi/Obsequi.test   21141   21093       -0.2%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   43445   43349       -0.2%
                                       test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  444644  443812       -0.2%
                                         test-suite :: MultiSource/Applications/SIBsim4/SIBsim4.test   27477   27525        0.2%
                                                 test-suite :: MultiSource/Applications/lua/lua.test   83317   83173       -0.2%
               test-suite :: MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael.test   10389   10373       -0.2%
                                             test-suite :: MultiSource/Benchmarks/nbench/nbench.test   20981   20949       -0.2%
                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test   10821   10837        0.1%
                                 test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test   23077   23045       -0.1%
                                             test-suite :: MultiSource/Applications/SPASS/SPASS.test  284754  284386       -0.1%
                                             test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  400821  400325       -0.1%
                                       test-suite :: MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test   14341   14325       -0.1%
                                         test-suite :: MultiSource/Applications/ClamAV/clamscan.test  352852  352484       -0.1%
                            test-suite :: MicroBenchmarks/LCALS/SubsetCLambdaLoops/lcalsCLambda.test  141716  141572       -0.1%
                                   test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test   16037   16021       -0.1%
                                  test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test  141636  141508       -0.1%
                                               test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test   56085   56037       -0.1%
                                               test-suite :: MultiSource/Applications/siod/siod.test   60580   60532       -0.1%
                                         test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  241315  241139       -0.1%
                                                    test-suite :: MicroBenchmarks/harris/harris.test   99381   99317       -0.1%
                            test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test  139956  139876       -0.1%
                                          test-suite :: MultiSource/Applications/d/make_dparser.test   61347   61315       -0.1%
                               test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  138581  138645        0.0%
                                  test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test  138932  138868       -0.0%
                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test   36805   36789       -0.0%
                                        test-suite :: MicroBenchmarks/ImageProcessing/Blur/blur.test   98981   98949       -0.0%
                                    test-suite :: MicroBenchmarks/ImageProcessing/Dither/Dither.test   99285   99253       -0.0%
                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test  321508  321412       -0.0%
                                  test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test  134756  134724       -0.0%
                             test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test   84661   84677        0.0%
        test-suite :: MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test   98453   98437       -0.0%
                      test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test   99397   99381       -0.0%
                                         test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test  102437  102421       -0.0%
                 test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test  391620  391572       -0.0%
                            test-suite :: MicroBenchmarks/LCALS/SubsetBLambdaLoops/lcalsBLambda.test  134788  134772       -0.0%
                                        test-suite :: MicroBenchmarks/MemFunctions/MemFunctions.test  177893  177877       -0.0%
                                     test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  296165  296149       -0.0%
                                            test-suite :: MultiSource/Applications/kimwitu++/kc.test  297476  297460       -0.0%
                                                                                  Geomean difference                       -0.1%

Nice one!

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2408	This change here looks like an exact duplication of the change above (lines 2355 - 2362). Can this be in a helper?

samparker added inline comments.Aug 12 2020, 1:41 AM

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2408	Yeah, that would be nicer.

Apologies if i'm missing the point here.

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2204	s/get/accountFor/
2214–2217	I'm not convinced this modelling is correct. (which is why i didn't respond, but apparently i forgot to actually post that) If we have SCEV `x + y + 42`, `42` will be modelled as-if it's at index `2`, but we should model this as `(x + y) + 42`, because there's usually some form of an `add` that takes an immediate as second param.
2218–2221	Same
2222–2229	And again, if we have `umin(x, y, 42)`, it's lowered as `z = (x u< y) ? x : y ; z u< 42 ? z : 42`, so you can't possibly have third operand here.
2356–2357	enumerate(NAry->operands())
2409–2410	enumerate(NAry->operands())
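
To illustrate the operand-index concern raised in the comments above, here is a minimal sketch. It assumes an n-ary SCEV expression is expanded as a left-to-right chain of binary instructions; `irOperandIndex` is a hypothetical helper written for this note, not an LLVM API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of LLVM): given the position of an operand in
// an n-ary SCEV expression such as `x + y + 42`, return the operand index it
// would occupy in the chain of binary instructions `(x + y) + 42`.
// Operand 0 is the left-hand side of the first instruction; every later
// operand is the right-hand side (index 1) of some instruction in the chain.
inline std::size_t irOperandIndex(std::size_t ScevIdx,
                                  std::size_t NumOperands) {
  assert(ScevIdx < NumOperands && "operand index out of range");
  return ScevIdx == 0 ? 0 : 1;
}
```

Under this assumption the constant `42` in `x + y + 42` is queried at index 1, never at index 2.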

This revision now requires changes to proceed.Aug 12 2020, 1:50 AM

samparker added inline comments.Aug 12 2020, 2:04 AM

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2214–2217	Ah! Right, I'll try to fix that.

Changed how operands are visited and costed, hopefully now translating the SCEV operand index correctly to an Instruction index.

With @lebedev.ri's last comment fixed, this now looks good to me. Please wait a day or so just in case there are more comments.

I think this is going in the wrong direction.
I think the worklist needs to be changed, to be a struct { unsigned ParentOpcode; int OperandIdx; SCEV* S; },
then everything should suddenly become less convoluted/hand-wavey.
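
A rough sketch of what such a worklist entry could look like. `Expr` is a stand-in for `SCEV`, and `pushOperands` is a hypothetical helper; neither is taken from the patch.

```cpp
#include <cassert>
#include <vector>

// Stand-in for an expression node; the real type in LLVM is SCEV.
struct Expr {
  unsigned Opcode = 0;                // opcode this node would expand to
  std::vector<const Expr *> Operands; // child expressions
};

// Worklist entry shaped like the struct suggested above.
struct WorkItem {
  unsigned ParentOpcode; // opcode of the instruction that uses S
  int OperandIdx;        // position of S among that instruction's operands
  const Expr *S;         // expression still to be costed
};

// Hypothetical helper: queue every operand of Parent together with the
// context (parent opcode and operand position) needed to cost it later.
inline void pushOperands(const Expr &Parent, std::vector<WorkItem> &Worklist) {
  int Idx = 0;
  for (const Expr *Op : Parent.Operands) {
    assert(Op && "operands must not be null");
    Worklist.push_back({Parent.Opcode, Idx++, Op});
  }
}
```

Carrying the parent opcode and operand index with each entry means a constant can be costed in context when it is popped, without re-deriving where it came from.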

This revision now requires changes to proceed.Aug 14 2020, 2:30 AM

Yeah, that sounds better, I'll put together a separate patch. Are you happy with the operand index clamping though?

In D76434#2217823, @samparker wrote:

Yeah, that sounds better, I'll put together a separate patch.

Are you happy with the operand index clamping though?

Looks about right i guess.

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2195	Can we still get a constant here?
2279–2282	We are not taxing constants for right-shifts.

samparker added inline comments.Aug 14 2020, 6:59 AM

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2364	Now I notice that I wouldn't have been handling AddRec expressions... So should these operands be added to the worklist for both Add and Mul or would just Add be okay?

All this is (becoming?) so incredibly fragile...
Have you checked what happens if you simply make isHighCostExpansionHelper() always return true for -Oz? :)
This really needs refactoring/generalization.

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2364	Given `A + Bx`, you'd want to model `A` as being at index 1, and `B` as being part of the multiplier (again, at index 1). And for higher orders `A + Bx + Cx^2`, again, `B` and `C` are part of a multiply, and it should be modelled as `(Bx + C*x^2) + A`. So i think the generalization is that all nary operands except the first one are at index 1 of `mul`, and the first nary operand is at index 1 of `add`.
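
The generalization described above can be written down as a small mapping. This is only a sketch: the `Opcode` enum, `ImmSlot`, and `addRecImmSlot` are hypothetical names, not code from the patch.

```cpp
#include <cassert>
#include <cstddef>

enum class Opcode { Add, Mul }; // stand-in for the IR opcodes involved

// Where an AddRec operand would sit if it were an immediate.
struct ImmSlot {
  Opcode Op;
  int OperandIdx;
};

// For {A,+,B,+,C}, i.e. A + B*x + C*x^2 modelled as (B*x + C*x^2) + A:
// operand 0 (the start value A) is index 1 of the final add, and every
// other operand (a coefficient) is index 1 of a mul.
inline ImmSlot addRecImmSlot(std::size_t ScevIdx) {
  return ScevIdx == 0 ? ImmSlot{Opcode::Add, 1} : ImmSlot{Opcode::Mul, 1};
}
```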

This really needs refactoring/generalization.

I will try...

Have you checked what happens if you simply make isHighCostExpansionHelper() always return true for -Oz? :)

I thought I did, but actually it was just for rewriting loop exit values... I will run the numbers though! But I'm also interested in whether this can still be beneficial for execution speed too.

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2364	Okay, thanks.

I just ran some numbers for when -Oz == HighCost and it's interesting... For Arm it's not good:

              master  minsize-high-cost-expand        diff
count     310.000000                310.000000  310.000000
mean    19710.077419              19662.567742    0.000666
std     48882.246111              48773.870505    0.010779
min       292.000000                292.000000   -0.042781
25%      1090.000000               1084.000000   -0.002887
50%      2776.000000               2784.000000    0.000000
75%     10185.000000              10138.000000    0.002424
max    324060.000000             323508.000000    0.058974
Geomean difference                                    0.1%

For AArch64 it's okay, but there are still plenty of large regressions:

              master  minsize-high-cost-expand        diff
count     310.000000                310.000000  310.000000
mean    26712.283871              26650.709677   -0.000930
std     66147.455977              66033.469245    0.010996
min       476.000000                476.000000   -0.050374
25%      1294.000000               1304.000000   -0.003676
50%      3168.000000               3168.000000    0.000000
75%     11252.000000              11216.000000    0.001597
max    471648.000000             470888.000000    0.051485
Geomean difference                                   -0.1%

But for X86, it's great:

              master  minsize-high-cost-expand        diff
count     313.000000                313.000000  313.000000
mean    27472.361022              27354.431310   -0.005084
std     68350.278069              68101.639463    0.013028
min       389.000000                389.000000   -0.065654
25%      1157.000000               1157.000000   -0.007393
50%      2933.000000               2949.000000    0.000000
75%     11093.000000              11077.000000    0.000000
max    444676.000000             442660.000000    0.039900
Geomean difference                                   -0.5%
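
For reference, a "geomean difference" figure like the ones above can be computed as the geometric mean of the per-benchmark size ratios, minus one. This is a minimal sketch; `geomeanDifference` is a hypothetical name, and the exact formula used by the script that produced these tables is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Each pair is (size before, size after) for one benchmark.
// Returns the geometric mean of after/before, expressed as a relative
// difference (for example, -0.005 for a 0.5% reduction). Averaging the
// logarithms avoids overflow from multiplying many ratios together.
inline double
geomeanDifference(const std::vector<std::pair<double, double>> &Sizes) {
  assert(!Sizes.empty() && "need at least one benchmark");
  double LogSum = 0.0;
  for (const auto &S : Sizes) {
    assert(S.first > 0 && S.second > 0 && "sizes must be positive");
    LogSum += std::log(S.second / S.first);
  }
  return std::exp(LogSum / static_cast<double>(Sizes.size())) - 1.0;
}
```

Because the geometric mean weights every benchmark equally, one very large binary does not dominate the result, which is why the `mean` rows and the geomean rows above need not agree.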

samparker mentioned this in D86050: [SCEV] Refactor isHighCostExpansionHelper.Aug 17 2020, 1:41 AM

samparker mentioned this in D86072: [SCEV] Cost Add and Mul Expr consistently.Aug 17 2020, 7:09 AM

lebedev.ri added inline comments.Aug 17 2020, 7:15 AM

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2406	In D86072#2221367, @samparker wrote:
	In D76434 you highlighted that SCEVNAry expressions can have more than two operands, which would expand to a chain of operations, and the existing costs for AddRecExprs try to account for that. But this was missing for normal Add and Mul expressions. Have I misunderstood you?
	Doesn't look missing to me?

samparker added inline comments.Aug 17 2020, 7:26 AM

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2406	Ah, thanks! I'm getting lost amongst the patches.

Rebased on top of D86050 and introduced a struct to map operands for the worklist, instead of just storing the opcode.

samparker added a parent revision: D86050: [SCEV] Refactor isHighCostExpansionHelper.Aug 18 2020, 4:49 AM

Ping

Rebased now that the refactor patch is in.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 10 2020, 12:23 AM

Closed by commit rG0bdf8c912724: [SCEV] Constant expansion cost at minsize (authored by samparker).

This revision was automatically updated to reflect the committed changes.

samparker added a commit: rG0bdf8c912724: [SCEV] Constant expansion cost at minsize.

Let's discuss any further desired tweaks in a post-commit review.

@samparker why did you commit this?
I have not finished reviewing this, and i've requested changes to the previous revisions.

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2187	This needs a better comment. Min/Max Idx are pretty magical on the first look.
2189	Not llvm coding style in any case. These should likely be `Min`/`Max`
2192–2193	`SCEVOperand::OperandIdx` is `int`
2195	Please do mark done comments as such.
2344	Consider constants to be free unless we are optforsize
2347–2348	These are only used in a single place
2352	This is inconsistent with every other return here.
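
The constant handling these comments refer to can be sketched in isolation. This is a simplified model, not the committed code: `CostKind`, `pickCostKind`, and `chargeConstant` are stand-ins, and `ImmCost` represents whatever the target reports for materialising the immediate.

```cpp
#include <cassert>

enum class CostKind { RecipThroughput, CodeSize }; // stand-in for TCK_*

// Pick the cost kind from the function's minsize attribute.
inline CostKind pickCostKind(bool HasMinSize) {
  return HasMinSize ? CostKind::CodeSize : CostKind::RecipThroughput;
}

// Returns true when the expansion has become too expensive.
// Constants are treated as free unless we are optimizing for size.
inline bool chargeConstant(int &BudgetRemaining, int ImmCost, CostKind Kind) {
  assert(ImmCost >= 0 && "costs are non-negative in this sketch");
  if (Kind != CostKind::CodeSize)
    return false; // a bool result, matching the other early exits
  BudgetRemaining -= ImmCost;
  return BudgetRemaining < 0;
}
```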

We had gone through many revisions where I've addressed all of your (helpful) comments and I hadn't heard anything else for three weeks. I sincerely expected the remaining issues to be style changes which seem appropriate for a post-commit review.

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2192–2193	But when we enumerate the SCEV operands, the index is size_t and the types need to be compatible for std::max and min.

lebedev.ri added inline comments.Sep 10 2020, 1:01 AM

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2192–2193	int MinIdx = std::max((int)SCEVOp.index(), CostOp.MinIdx); int OpIdx = std::min(MinIdx, CostOp.MaxIdx);
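
A self-contained version of the suggested clamp, for illustration. `clampOperandIdx` is a hypothetical wrapper; the point is that once the enumeration index is cast to `int`, `std::max` and `std::min` both see a single argument type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Clamp the SCEV operand index into [MinIdx, MaxIdx], the range of operand
// positions the expanded instruction can actually have.
inline int clampOperandIdx(std::size_t ScevIdx, int MinIdx, int MaxIdx) {
  assert(MinIdx <= MaxIdx && "invalid index range");
  int Lower = std::max(static_cast<int>(ScevIdx), MinIdx);
  return std::min(Lower, MaxIdx);
}
```

With the ranges used in the diff, a third add operand (index 2, range 0 to 1) clamps to 1, and the start value of an AddRec (index 0, range 1 to 1) also clamps to 1.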

Addressed comments.

Revision Contents

Path	Size
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp	62 lines
llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll	462 lines

Diff 286253

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp


	template<typename T> static int costAndCollectOperands(			template<typename T> static int costAndCollectOperands(
	SCEVOperand &WorkItem, const TargetTransformInfo &TTI,			SCEVOperand &WorkItem, const TargetTransformInfo &TTI,
	TargetTransformInfo::TargetCostKind CostKind,			TargetTransformInfo::TargetCostKind CostKind,
	SmallVectorImpl<SCEVOperand> &Worklist) {			SmallVectorImpl<SCEVOperand> &Worklist) {

	const T *S = cast<T>(WorkItem.S);			const T *S = cast<T>(WorkItem.S);
	int Cost = 0;			int Cost = 0;
	SmallVector<unsigned, 2> Opcodes;			// Object to help map SCEV operands to expanded IR operands.
				struct OperationIndices {
				OperationIndices(unsigned Opc, size_t min, size_t max) :
				Opcode(Opc), MinIdx(min), MaxIdx(max) { }
				unsigned Opcode;
				size_t MinIdx;
				size_t MaxIdx;
				};
				SmallVector<OperationIndices, 2> Operations;

	// In this polynominal, we may have some zero operands, and we shouldn't			// In this polynominal, we may have some zero operands, and we shouldn't
	// really charge for those. So how many non-zero coeffients are there?			// really charge for those. So how many non-zero coeffients are there?
	int NumTerms = llvm::count_if(S->operands(), [](const SCEV *Op) {			int NumTerms = llvm::count_if(S->operands(), [](const SCEV *Op) {
	return !Op->isZero();			return !Op->isZero();
	});			});

	// Ignoring constant term (operand 0), how many of the coeffients are u> 1?			// Ignoring constant term (operand 0), how many of the coeffients are u> 1?
	int NumNonZeroDegreeNonOneTerms =			int NumNonZeroDegreeNonOneTerms =
	llvm::count_if(S->operands(), [](const SCEV *Op) {			llvm::count_if(S->operands(), [](const SCEV *Op) {
	auto *SConst = dyn_cast<SCEVConstant>(Op);			auto *SConst = dyn_cast<SCEVConstant>(Op);
	return !SConst || SConst->getAPInt().ugt(1);			return !SConst || SConst->getAPInt().ugt(1);
	});			});

	auto CastCost = [&](unsigned Opcode) {			auto CastCost = [&](unsigned Opcode) {
	Opcodes.push_back(Opcode);			Operations.emplace_back(Opcode, 0, 0);
	return TTI.getCastInstrCost(Opcode, S->getType(),			return TTI.getCastInstrCost(Opcode, S->getType(),
	S->getOperand(0)->getType(),			S->getOperand(0)->getType(),
	TTI::CastContextHint::None, CostKind);			TTI::CastContextHint::None, CostKind);
	};			};

	auto ArithCost = [&](unsigned Opcode, unsigned NumRequired) {			auto ArithCost = [&](unsigned Opcode, unsigned NumRequired,
	Opcodes.push_back(Opcode);			unsigned MinIdx = 0, unsigned MaxIdx = 1) {
				Operations.emplace_back(Opcode, MinIdx, MaxIdx);
	return NumRequired *			return NumRequired *
	TTI.getArithmeticInstrCost(Opcode, S->getType(), CostKind);			TTI.getArithmeticInstrCost(Opcode, S->getType(), CostKind);
	};			};

	switch (S->getSCEVType()) {			switch (S->getSCEVType()) {
	default:			default:
	return 0;			return 0;
	case scTruncate:			case scTruncate:
	Cost = CastCost(Instruction::Trunc);			Cost = CastCost(Instruction::Trunc);
	break;			break;
	case scZeroExtend:			case scZeroExtend:
	Cost = CastCost(Instruction::ZExt);			Cost = CastCost(Instruction::ZExt);
	break;			break;
	case scSignExtend:			case scSignExtend:
	Cost = CastCost(Instruction::SExt);			Cost = CastCost(Instruction::SExt);
	break;			break;
	case scUDivExpr: {			case scUDivExpr: {
	unsigned Opcode = Instruction::UDiv;			unsigned Opcode = Instruction::UDiv;
	if (auto *SC = dyn_cast<SCEVConstant>(S->getOperand(1)))			if (auto *SC = dyn_cast<SCEVConstant>(S->getOperand(1)))
	if (SC->getAPInt().isPowerOf2())			if (SC->getAPInt().isPowerOf2())
	Opcode = Instruction::LShr;			Opcode = Instruction::LShr;
	Cost = ArithCost(Opcode, 1);			Cost = ArithCost(Opcode, 1);
	break;			break;
	}			}
	case scAddExpr:			case scAddExpr:
	Cost = ArithCost(Instruction::Add, NumTerms - 1);			Cost = ArithCost(Instruction::Add, NumTerms - 1);
	break;			break;
	case scMulExpr:			case scMulExpr:
	// TODO: this is a very pessimistic cost modelling for Mul,			// TODO: this is a very pessimistic cost modelling for Mul,
	// because of Bin Pow algorithm actually used by the expander,			// because of Bin Pow algorithm actually used by the expander,
	// see SCEVExpander::visitMulExpr(), ExpandOpBinPowN().			// see SCEVExpander::visitMulExpr(), ExpandOpBinPowN().
	Cost = ArithCost(Instruction::Mul, NumNonZeroDegreeNonOneTerms);			Cost = ArithCost(Instruction::Mul, NumNonZeroDegreeNonOneTerms);
	break;			break;
	case scSMaxExpr:			case scSMaxExpr:
	case scUMaxExpr:			case scUMaxExpr:
	case scSMinExpr:			case scSMinExpr:
	case scUMinExpr: {			case scUMinExpr: {
	Type *OpType = S->getOperand(0)->getType();			Type *OpType = S->getOperand(0)->getType();
	Opcodes.push_back(Instruction::ICmp);			Operations.emplace_back(Instruction::ICmp, 0, 1);
	Opcodes.push_back(Instruction::Select);			Operations.emplace_back(Instruction::Select, 0, 2);
	Cost = TTI.getCmpSelInstrCost(Instruction::ICmp, OpType,			Cost = TTI.getCmpSelInstrCost(Instruction::ICmp, OpType,
	CmpInst::makeCmpResultType(OpType),			CmpInst::makeCmpResultType(OpType),
	CostKind) +			CostKind) +
	TTI.getCmpSelInstrCost(Instruction::Select, OpType,			TTI.getCmpSelInstrCost(Instruction::Select, OpType,
	CmpInst::makeCmpResultType(OpType),			CmpInst::makeCmpResultType(OpType),
	CostKind);			CostKind);
	Cost *= (S->getNumOperands() - 1);			Cost *= (S->getNumOperands() - 1);
	break;			break;
	}			}
	case scAddRecExpr: {			case scAddRecExpr: {
	assert(NumTerms >= 1 && "Polynominal should have at least one term.");			assert(NumTerms >= 1 && "Polynominal should have at least one term.");
	assert(!(*std::prev(S->operands().end()))->isZero() &&			assert(!(*std::prev(S->operands().end()))->isZero() &&
	"Last operand should not be zero");			"Last operand should not be zero");

	// Much like with normal add expr, the polynominal will require			// Much like with normal add expr, the polynominal will require
	// one less addition than the number of it's terms.			// one less addition than the number of it's terms.
	int AddCost = ArithCost(Instruction::Add, NumTerms - 1);			int AddCost = ArithCost(Instruction::Add, NumTerms - 1,
				/*MinIdx*/1, /*MaxIdx*/1);
	// Here, each one of those will require a multiplication.			// Here, each one of those will require a multiplication.
	int MulCost = ArithCost(Instruction::Mul, NumNonZeroDegreeNonOneTerms);			int MulCost = ArithCost(Instruction::Mul, NumNonZeroDegreeNonOneTerms);
	Cost = AddCost + MulCost;			Cost = AddCost + MulCost;

	// What is the degree of this polynominal?			// What is the degree of this polynominal?
	int PolyDegree = S->getNumOperands() - 1;			int PolyDegree = S->getNumOperands() - 1;
	assert(PolyDegree >= 1 && "Should be at least affine.");			assert(PolyDegree >= 1 && "Should be at least affine.");

	// The final term will be:			// The final term will be:
	// Op_{PolyDegree} * x ^ {PolyDegree}			// Op_{PolyDegree} * x ^ {PolyDegree}
	// Where x ^ {PolyDegree} will again require PolyDegree-1 mul operations.			// Where x ^ {PolyDegree} will again require PolyDegree-1 mul operations.
	// Note that x ^ {PolyDegree} = x * x ^ {PolyDegree-1} so charging for			// Note that x ^ {PolyDegree} = x * x ^ {PolyDegree-1} so charging for
	// x ^ {PolyDegree} will give us x ^ {2} .. x ^ {PolyDegree-1} for free.			// x ^ {PolyDegree} will give us x ^ {2} .. x ^ {PolyDegree-1} for free.
	// FIXME: this is conservatively correct, but might be overly pessimistic.			// FIXME: this is conservatively correct, but might be overly pessimistic.
	Cost += MulCost * (PolyDegree - 1);			Cost += MulCost * (PolyDegree - 1);
				break;
	}			}
	}			}

	for (unsigned Opc : Opcodes) {			for (auto &CostOp : Operations) {
	for (unsigned OpIdx = 0; OpIdx < S->getNumOperands(); ++OpIdx) {			for (auto SCEVOp : enumerate(S->operands())) {
	Worklist.emplace_back(Opc, OpIdx, S->getOperand(OpIdx));			// Clamp the index to account for multiple IR operations being chained.
				size_t MinIdx = std::max(SCEVOp.index(), CostOp.MinIdx);
				size_t OpIdx = std::min(MinIdx, CostOp.MaxIdx);
				Worklist.emplace_back(CostOp.Opcode, OpIdx, SCEVOp.value());
	}			}
	}			}
	return Cost;			return Cost;
	}			}

	bool SCEVExpander::isHighCostExpansionHelper(			bool SCEVExpander::isHighCostExpansionHelper(
	SCEVOperand &WorkItem, Loop *L, const Instruction &At, int &BudgetRemaining,			SCEVOperand &WorkItem, Loop *L, const Instruction &At, int &BudgetRemaining,
	const TargetTransformInfo &TTI, SmallPtrSetImpl<const SCEV *> &Processed,			const TargetTransformInfo &TTI, SmallPtrSetImpl<const SCEV *> &Processed,
	SmallVectorImpl<SCEVOperand> &Worklist) {			SmallVectorImpl<SCEVOperand> &Worklist) {
	if (BudgetRemaining < 0)			if (BudgetRemaining < 0)
	return true; // Already run out of budget, give up.			return true; // Already run out of budget, give up.

	const SCEV *S = WorkItem.S;			const SCEV *S = WorkItem.S;
	// Was the cost of expansion of this expression already accounted for?			// Was the cost of expansion of this expression already accounted for?
	if (!Processed.insert(S).second)			if (!isa<SCEVConstant>(S) && !Processed.insert(S).second)
	return false; // We have already accounted for this expression.			return false; // We have already accounted for this expression.

	// If we can find an existing value for this scev available at the point "At"			// If we can find an existing value for this scev available at the point "At"
	// then consider the expression cheap.			// then consider the expression cheap.
	if (getRelatedExistingExpansion(S, &At, L))			if (getRelatedExistingExpansion(S, &At, L))
	return false; // Consider the expression to be free.			return false; // Consider the expression to be free.

	switch (S->getSCEVType()) {			// Assume to be zero-cost.
	case scUnknown:			if (isa<SCEVUnknown>(S))
	case scConstant:			return false;
	return false; // Assume to be zero-cost.
	}

	TargetTransformInfo::TargetCostKind CostKind =			TargetTransformInfo::TargetCostKind CostKind =
	TargetTransformInfo::TCK_RecipThroughput;			L->getHeader()->getParent()->hasMinSize()
				? TargetTransformInfo::TCK_CodeSize
	if (isa<SCEVCastExpr>(S)) {			: TargetTransformInfo::TCK_RecipThroughput;

				if (auto *Constant = dyn_cast<SCEVConstant>(S)) {
				// Only evalulate the costs of constants when optimizing for size.
				if (CostKind != TargetTransformInfo::TCK_CodeSize)
				return 0;
				const APInt &Imm = Constant->getAPInt();
				Type *Ty = S->getType();
				BudgetRemaining -=
				TTI.getIntImmCostInst(WorkItem.ParentOpcode, WorkItem.OperandIdx,
				Imm, Ty, CostKind);
				return BudgetRemaining < 0;
				} else if (isa<SCEVCastExpr>(S)) {
	int Cost =			int Cost =
	costAndCollectOperands<SCEVCastExpr>(WorkItem, TTI, CostKind, Worklist);			costAndCollectOperands<SCEVCastExpr>(WorkItem, TTI, CostKind, Worklist);
	BudgetRemaining -= Cost;			BudgetRemaining -= Cost;
	return false; // Will answer upon next entry into this function.			return false; // Will answer upon next entry into this function.
	} else if (isa<SCEVUDivExpr>(S)) {			} else if (isa<SCEVUDivExpr>(S)) {
				lebedev.ri: Consider constants to be free unless we are optforsize
	int Cost =			int Cost =
	costAndCollectOperands<SCEVUDivExpr>(WorkItem, TTI, CostKind, Worklist);			costAndCollectOperands<SCEVUDivExpr>(WorkItem, TTI, CostKind, Worklist);

	// UDivExpr is very likely a UDiv that ScalarEvolution's HowFarToZero or			// UDivExpr is very likely a UDiv that ScalarEvolution's HowFarToZero or
				lebedev.ri: These are only used in a single place
	// HowManyLessThans produced to compute a precise expression, rather than a			// HowManyLessThans produced to compute a precise expression, rather than a
	// UDiv from the user's code. If we can't find a UDiv in the code with some			// UDiv from the user's code. If we can't find a UDiv in the code with some
	// simple searching, we need to account for it's cost.			// simple searching, we need to account for it's cost.

				lebedev.ri: This is inconsistent with every other return here.
	// At the beginning of this function we already tried to find existing			// At the beginning of this function we already tried to find existing
	// value for plain 'S'. Now try to lookup 'S + 1' since it is common			// value for plain 'S'. Now try to lookup 'S + 1' since it is common
	// pattern involving division. This is just a simple search heuristic.			// pattern involving division. This is just a simple search heuristic.
	if (getRelatedExistingExpansion(			if (getRelatedExistingExpansion(
	SE.getAddExpr(S, SE.getConstant(S->getType(), 1)), &At, L))			SE.getAddExpr(S, SE.getConstant(S->getType(), 1)), &At, L))
				lebedev.ri: enumerate(NAry->operands())
	return false; // Consider it to be free.			return false; // Consider it to be free.

	// Need to count the cost of this UDiv.			// Need to count the cost of this UDiv.
	BudgetRemaining -= Cost;			BudgetRemaining -= Cost;
	return false; // Will answer upon next entry into this function.			return false; // Will answer upon next entry into this function.
	} else if (const SCEVNAryExpr *NAry = dyn_cast<SCEVNAryExpr>(S)) {			} else if (const SCEVNAryExpr *NAry = dyn_cast<SCEVNAryExpr>(S)) {
	assert(NAry->getNumOperands() > 1 &&			assert(NAry->getNumOperands() > 1 &&
				samparker: Now I notice that I wouldn't have been handling AddRec expressions... So should these operands be added to the worklist for both Add and Mul, or would just Add be okay?
				lebedev.ri: Given `A + B*x`, you'd want to model `A` as being at index 1, and `B` as being part of the multiplier (again, at index 1). And for higher orders `A + B*x + C*x^2`, again, `B` and `C` are part of the multiply, and it should be modelled as `(B*x + C*x^2) + A`. So I think the generalization is that all nary operands except the first one are at index 1 of `mul`, and the first nary operand is at index 1 of `add`.
				samparker: Okay, thanks.
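The indexing rule from the comment above can be sketched as a small standalone C++ model. This is only an illustration under stated assumptions: `Opcode`, `OperandSlot`, `slotForAddRecOperand` and `slotsForAddRec` are hypothetical names invented here, and none of them appear in SCEVExpander or in this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for instruction opcodes; not LLVM's enum.
enum class Opcode { Add, Mul };

// Where an operand is assumed to sit in the expanded instruction.
struct OperandSlot {
  Opcode ParentOpcode;
  unsigned OperandIdx;
};

// For an add recurrence A + B*x + C*x^2 + ... with operands {A, B, C, ...}:
// the first operand is modelled as operand 1 of an add, and every later
// operand as operand 1 of a mul.
OperandSlot slotForAddRecOperand(std::size_t Index) {
  return Index == 0 ? OperandSlot{Opcode::Add, 1}
                    : OperandSlot{Opcode::Mul, 1};
}

// Collect the assumed slot for each operand of the recurrence.
std::vector<OperandSlot> slotsForAddRec(std::size_t NumOperands) {
  assert(NumOperands > 1 && "Nary expr should have more than 1 operand.");
  std::vector<OperandSlot> Slots;
  for (std::size_t I = 0; I < NumOperands; ++I)
    Slots.push_back(slotForAddRecOperand(I));
  return Slots;
}
```

For `A + B*x + C*x^2`, `slotsForAddRec(3)` yields one add slot for `A` followed by two mul slots for `B` and `C`, each at operand index 1.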
	"Nary expr should have more than 1 operand.");			"Nary expr should have more than 1 operand.");
	// The simple nary expr will require one less op (or pair of ops)			// The simple nary expr will require one less op (or pair of ops)
	// than the number of it's terms.			// than the number of it's terms.
	int Cost =			int Cost =
	costAndCollectOperands<SCEVNAryExpr>(WorkItem, TTI, CostKind, Worklist);			costAndCollectOperands<SCEVNAryExpr>(WorkItem, TTI, CostKind, Worklist);
	BudgetRemaining -= Cost;			BudgetRemaining -= Cost;
	return BudgetRemaining < 0;			return BudgetRemaining < 0;
	} else if (const auto *NAry = dyn_cast<SCEVAddRecExpr>(S)) {			} else if (const auto *NAry = dyn_cast<SCEVAddRecExpr>(S)) {
	[25 lines not shown]
	Value *SCEVExpander::expandEqualPredicate(const SCEVEqualPredicate *Pred,			Value *SCEVExpander::expandEqualPredicate(const SCEVEqualPredicate *Pred,
	Instruction *IP) {			Instruction *IP) {
	Value *Expr0 =			Value *Expr0 =
	expandCodeForImpl(Pred->getLHS(), Pred->getLHS()->getType(), IP, false);			expandCodeForImpl(Pred->getLHS(), Pred->getLHS()->getType(), IP, false);
	Value *Expr1 =			Value *Expr1 =
	expandCodeForImpl(Pred->getRHS(), Pred->getRHS()->getType(), IP, false);			expandCodeForImpl(Pred->getRHS(), Pred->getRHS()->getType(), IP, false);

	Builder.SetInsertPoint(IP);			Builder.SetInsertPoint(IP);
	auto *I = Builder.CreateICmpNE(Expr0, Expr1, "ident.check");			auto *I = Builder.CreateICmpNE(Expr0, Expr1, "ident.check");
				lebedev.ri: In D86072#2221367, @samparker wrote: "In D76434 you highlighted that SCEVNAry expressions can have more than two operands, which would expand to a chain of operations, and the existing costs for AddRecExprs tries to account for that. But this was missing for normal Add and Mul expressions. Have I misunderstood you?" Doesn't look missing to me?
				samparker: Ah, thanks! I'm getting lost amongst the patches.
	return I;			return I;
	}			}
				SjoerdMeijer: This change here looks like an exact duplication of the change above (lines 2355 - 2362). Can this be in a helper?
				samparker: Yeah, that would be nicer.

	Value *SCEVExpander::generateOverflowCheck(const SCEVAddRecExpr *AR,			Value *SCEVExpander::generateOverflowCheck(const SCEVAddRecExpr *AR,
				lebedev.ri: enumerate(NAry->operands())
	Instruction *Loc, bool Signed) {			Instruction *Loc, bool Signed) {
	assert(AR->isAffine() && "Cannot generate RT check for "			assert(AR->isAffine() && "Cannot generate RT check for "
	"non-affine expression");			"non-affine expression");

	SCEVUnionPredicate Pred;			SCEVUnionPredicate Pred;
	const SCEV *ExitCount =			const SCEV *ExitCount =
	SE.getPredicatedBackedgeTakenCount(AR->getLoop(), Pred);			SE.getPredicatedBackedgeTakenCount(AR->getLoop(), Pred);

	[234 lines not shown]
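The constant-handling branch in the `ScalarEvolutionExpander.cpp` diff above charges constants against the expansion budget only when optimizing for minimum size. A minimal standalone sketch of that control flow follows. It does not use LLVM: `CostKind`, `toyImmCost` and `chargeConstant` are hypothetical names, and the "immediates in 0..255 are free" rule is an invented placeholder, not any target's real `getIntImmCostInst` result.

```cpp
#include <cstdint>

// Hypothetical mirror of the two cost kinds the patch chooses between.
enum class CostKind { RecipThroughput, CodeSize };

// Invented placeholder cost model: pretend small immediates encode for free
// and anything else needs one extra instruction to materialise.
int toyImmCost(std::int64_t Imm) {
  return (Imm >= 0 && Imm < 256) ? 0 : 1;
}

// Charge a constant operand against the remaining budget.
// Returns true once the budget has been exhausted.
bool chargeConstant(std::int64_t Imm, bool HasMinSize, int &BudgetRemaining) {
  CostKind Kind =
      HasMinSize ? CostKind::CodeSize : CostKind::RecipThroughput;
  // Constants are treated as free unless we are optimizing for size.
  if (Kind != CostKind::CodeSize)
    return false;
  BudgetRemaining -= toyImmCost(Imm);
  return BudgetRemaining < 0;
}
```

With a budget of 1, a large immediate is free without minsize, consumes the budget on the first minsize query, and exceeds it on the second.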

llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll

	[12 lines not shown]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_092:%.*]] = phi i32 [ [[INC42:%.*]], [[FOR_END40:%.*]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; CHECK-NEXT: [[I_092:%.*]] = phi i32 [ [[INC42:%.*]], [[FOR_END40:%.*]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; CHECK-NEXT: [[PDEST_ADDR_091:%.*]] = phi i32* [ [[PDEST_ADDR_2_LCSSA:%.*]], [[FOR_END40]] ], [ [[PDEST:%.*]], [[FOR_BODY_PREHEADER]] ]			; CHECK-NEXT: [[PDEST_ADDR_091:%.*]] = phi i32* [ [[PDEST_ADDR_2_LCSSA:%.*]], [[FOR_END40]] ], [ [[PDEST:%.*]], [[FOR_BODY_PREHEADER]] ]
	; CHECK-NEXT: [[PSRCA_ADDR_090:%.*]] = phi i16* [ [[PSRCA_ADDR_2_LCSSA:%.*]], [[FOR_END40]] ], [ [[PSRCA:%.*]], [[FOR_BODY_PREHEADER]] ]			; CHECK-NEXT: [[PSRCA_ADDR_090:%.*]] = phi i16* [ [[PSRCA_ADDR_2_LCSSA:%.*]], [[FOR_END40]] ], [ [[PSRCA:%.*]], [[FOR_BODY_PREHEADER]] ]
	; CHECK-NEXT: [[PSRCB_ADDR_089:%.*]] = phi i16* [ [[PSRCB_ADDR_2_LCSSA:%.*]], [[FOR_END40]] ], [ [[PSRCB:%.*]], [[FOR_BODY_PREHEADER]] ]			; CHECK-NEXT: [[PSRCB_ADDR_089:%.*]] = phi i16* [ [[PSRCB_ADDR_2_LCSSA:%.*]], [[FOR_END40]] ], [ [[PSRCB:%.*]], [[FOR_BODY_PREHEADER]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = lshr i32 [[I_092]], 2			; CHECK-NEXT: [[TMP0:%.*]] = lshr i32 [[I_092]], 2
	; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[TMP0]], -1			; CHECK-NEXT: [[TMP1:%.*]] = add nuw nsw i32 [[TMP0]], 3
	; CHECK-NEXT: [[TMP2:%.*]] = lshr i32 [[TMP1]], 2			; CHECK-NEXT: [[TMP2:%.*]] = and i32 [[TMP1]], 2147483644
	; CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i32 [[TMP2]], 1			; CHECK-NEXT: [[CMP272:%.*]] = icmp eq i32 [[TMP0]], 0
	; CHECK-NEXT: [[TMP4:%.*]] = lshr i32 [[I_092]], 2
	; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i32 [[TMP4]], 3
	; CHECK-NEXT: [[TMP6:%.*]] = and i32 [[TMP5]], 2147483644
	; CHECK-NEXT: [[CMP272:%.*]] = icmp eq i32 [[TMP4]], 0
	; CHECK-NEXT: br i1 [[CMP272]], label [[FOR_END:%.*]], label [[FOR_BODY3_PREHEADER:%.*]]			; CHECK-NEXT: br i1 [[CMP272]], label [[FOR_END:%.*]], label [[FOR_BODY3_PREHEADER:%.*]]
	; CHECK: for.body3.preheader:			; CHECK: for.body3.preheader:
	; CHECK-NEXT: [[XTRAITER:%.*]] = and i32 [[TMP3]], 3
	; CHECK-NEXT: [[TMP7:%.*]] = icmp ult i32 [[TMP2]], 3
	; CHECK-NEXT: br i1 [[TMP7]], label [[FOR_END_LOOPEXIT_UNR_LCSSA:%.*]], label [[FOR_BODY3_PREHEADER_NEW:%.*]]
	; CHECK: for.body3.preheader.new:
	; CHECK-NEXT: [[UNROLL_ITER:%.*]] = sub i32 [[TMP3]], [[XTRAITER]]
	; CHECK-NEXT: br label [[FOR_BODY3:%.*]]			; CHECK-NEXT: br label [[FOR_BODY3:%.*]]
	; CHECK: for.body3:			; CHECK: for.body3:
	; CHECK-NEXT: [[J_076:%.*]] = phi i32 [ 0, [[FOR_BODY3_PREHEADER_NEW]] ], [ [[ADD24_3:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[J_076:%.*]] = phi i32 [ [[ADD24:%.*]], [[FOR_BODY3]] ], [ 0, [[FOR_BODY3_PREHEADER]] ]
	; CHECK-NEXT: [[PDEST_ADDR_175:%.*]] = phi i32* [ [[PDEST_ADDR_091]], [[FOR_BODY3_PREHEADER_NEW]] ], [ [[INCDEC_PTR_3:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[PDEST_ADDR_175:%.*]] = phi i32* [ [[INCDEC_PTR:%.*]], [[FOR_BODY3]] ], [ [[PDEST_ADDR_091]], [[FOR_BODY3_PREHEADER]] ]
	; CHECK-NEXT: [[PSRCA_ADDR_174:%.*]] = phi i16* [ [[PSRCA_ADDR_090]], [[FOR_BODY3_PREHEADER_NEW]] ], [ [[ADD_PTR_3:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[PSRCA_ADDR_174:%.*]] = phi i16* [ [[ADD_PTR:%.*]], [[FOR_BODY3]] ], [ [[PSRCA_ADDR_090]], [[FOR_BODY3_PREHEADER]] ]
	; CHECK-NEXT: [[PSRCB_ADDR_173:%.*]] = phi i16* [ [[PSRCB_ADDR_089]], [[FOR_BODY3_PREHEADER_NEW]] ], [ [[ADD_PTR23_3:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[PSRCB_ADDR_173:%.*]] = phi i16* [ [[ADD_PTR23:%.*]], [[FOR_BODY3]] ], [ [[PSRCB_ADDR_089]], [[FOR_BODY3_PREHEADER]] ]
	; CHECK-NEXT: [[NITER:%.*]] = phi i32 [ [[UNROLL_ITER]], [[FOR_BODY3_PREHEADER_NEW]] ], [ [[NITER_NSUB_3:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[TMP3:%.*]] = load i16, i16* [[PSRCA_ADDR_174]], align 2
	; CHECK-NEXT: [[TMP8:%.*]] = load i16, i16* [[PSRCA_ADDR_174]], align 2			; CHECK-NEXT: [[CONV:%.*]] = sext i16 [[TMP3]] to i32
	; CHECK-NEXT: [[CONV:%.*]] = sext i16 [[TMP8]] to i32			; CHECK-NEXT: [[TMP4:%.*]] = load i16, i16* [[PSRCB_ADDR_173]], align 2
	; CHECK-NEXT: [[TMP9:%.*]] = load i16, i16* [[PSRCB_ADDR_173]], align 2			; CHECK-NEXT: [[CONV5:%.*]] = sext i16 [[TMP4]] to i32
	; CHECK-NEXT: [[CONV5:%.*]] = sext i16 [[TMP9]] to i32
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[CONV5]], [[CONV]]			; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[CONV5]], [[CONV]]
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_174]], i32 1			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_174]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = load i16, i16* [[ARRAYIDX6]], align 2			; CHECK-NEXT: [[TMP5:%.*]] = load i16, i16* [[ARRAYIDX6]], align 2
	; CHECK-NEXT: [[CONV7:%.*]] = sext i16 [[TMP10]] to i32			; CHECK-NEXT: [[CONV7:%.*]] = sext i16 [[TMP5]] to i32
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_173]], i32 1			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_173]], i32 1
	; CHECK-NEXT: [[TMP11:%.*]] = load i16, i16* [[ARRAYIDX8]], align 2			; CHECK-NEXT: [[TMP6:%.*]] = load i16, i16* [[ARRAYIDX8]], align 2
	; CHECK-NEXT: [[CONV9:%.*]] = sext i16 [[TMP11]] to i32			; CHECK-NEXT: [[CONV9:%.*]] = sext i16 [[TMP6]] to i32
	; CHECK-NEXT: [[MUL10:%.*]] = mul nsw i32 [[CONV9]], [[CONV7]]			; CHECK-NEXT: [[MUL10:%.*]] = mul nsw i32 [[CONV9]], [[CONV7]]
	; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_174]], i32 2			; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_174]], i32 2
	; CHECK-NEXT: [[TMP12:%.*]] = load i16, i16* [[ARRAYIDX11]], align 2			; CHECK-NEXT: [[TMP7:%.*]] = load i16, i16* [[ARRAYIDX11]], align 2
	; CHECK-NEXT: [[CONV12:%.*]] = sext i16 [[TMP12]] to i32			; CHECK-NEXT: [[CONV12:%.*]] = sext i16 [[TMP7]] to i32
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_173]], i32 3			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_173]], i32 3
	; CHECK-NEXT: [[TMP13:%.*]] = load i16, i16* [[ARRAYIDX13]], align 2			; CHECK-NEXT: [[TMP8:%.*]] = load i16, i16* [[ARRAYIDX13]], align 2
	; CHECK-NEXT: [[CONV14:%.*]] = sext i16 [[TMP13]] to i32			; CHECK-NEXT: [[CONV14:%.*]] = sext i16 [[TMP8]] to i32
	; CHECK-NEXT: [[MUL15:%.*]] = mul nsw i32 [[CONV14]], [[CONV12]]			; CHECK-NEXT: [[MUL15:%.*]] = mul nsw i32 [[CONV14]], [[CONV12]]
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_174]], i32 3			; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_174]], i32 3
	; CHECK-NEXT: [[TMP14:%.*]] = load i16, i16* [[ARRAYIDX17]], align 2			; CHECK-NEXT: [[TMP9:%.*]] = load i16, i16* [[ARRAYIDX17]], align 2
	; CHECK-NEXT: [[CONV18:%.*]] = sext i16 [[TMP14]] to i32			; CHECK-NEXT: [[CONV18:%.*]] = sext i16 [[TMP9]] to i32
	; CHECK-NEXT: [[ADD21:%.*]] = add i32 [[MUL10]], [[MUL]]			; CHECK-NEXT: [[ADD21:%.*]] = add i32 [[MUL10]], [[MUL]]
	; CHECK-NEXT: [[ADD:%.*]] = add i32 [[ADD21]], [[CONV14]]			; CHECK-NEXT: [[ADD:%.*]] = add i32 [[ADD21]], [[CONV14]]
	; CHECK-NEXT: [[ADD16:%.*]] = add i32 [[ADD]], [[MUL15]]			; CHECK-NEXT: [[ADD16:%.*]] = add i32 [[ADD]], [[MUL15]]
	; CHECK-NEXT: [[ADD22:%.*]] = add i32 [[ADD16]], [[CONV18]]			; CHECK-NEXT: [[ADD22:%.*]] = add i32 [[ADD16]], [[CONV18]]
	; CHECK-NEXT: store i32 [[ADD22]], i32* [[PDEST_ADDR_175]], align 4			; CHECK-NEXT: store i32 [[ADD22]], i32* [[PDEST_ADDR_175]], align 4
	; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_174]], i32 4			; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_174]], i32 4
	; CHECK-NEXT: [[ADD_PTR23:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_173]], i32 4			; CHECK-NEXT: [[ADD_PTR23]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_173]], i32 4
	; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[PDEST_ADDR_175]], i32 1			; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i32, i32* [[PDEST_ADDR_175]], i32 1
	; CHECK-NEXT: [[ADD24:%.*]] = add nuw nsw i32 [[J_076]], 4			; CHECK-NEXT: [[ADD24]] = add nuw nsw i32 [[J_076]], 4
	; CHECK-NEXT: [[NITER_NSUB:%.*]] = sub i32 [[NITER]], 1			; CHECK-NEXT: [[CMP2:%.*]] = icmp ult i32 [[ADD24]], [[TMP0]]
	; CHECK-NEXT: [[TMP15:%.*]] = load i16, i16* [[ADD_PTR]], align 2			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_BODY3]], label [[FOR_END_LOOPEXIT:%.*]]
	; CHECK-NEXT: [[CONV_1:%.*]] = sext i16 [[TMP15]] to i32
	; CHECK-NEXT: [[TMP16:%.*]] = load i16, i16* [[ADD_PTR23]], align 2
	; CHECK-NEXT: [[CONV5_1:%.*]] = sext i16 [[TMP16]] to i32
	; CHECK-NEXT: [[MUL_1:%.*]] = mul nsw i32 [[CONV5_1]], [[CONV_1]]
	; CHECK-NEXT: [[ARRAYIDX6_1:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR]], i32 1
	; CHECK-NEXT: [[TMP17:%.*]] = load i16, i16* [[ARRAYIDX6_1]], align 2
	; CHECK-NEXT: [[CONV7_1:%.*]] = sext i16 [[TMP17]] to i32
	; CHECK-NEXT: [[ARRAYIDX8_1:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR23]], i32 1
	; CHECK-NEXT: [[TMP18:%.*]] = load i16, i16* [[ARRAYIDX8_1]], align 2
	; CHECK-NEXT: [[CONV9_1:%.*]] = sext i16 [[TMP18]] to i32
	; CHECK-NEXT: [[MUL10_1:%.*]] = mul nsw i32 [[CONV9_1]], [[CONV7_1]]
	; CHECK-NEXT: [[ARRAYIDX11_1:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR]], i32 2
	; CHECK-NEXT: [[TMP19:%.*]] = load i16, i16* [[ARRAYIDX11_1]], align 2
	; CHECK-NEXT: [[CONV12_1:%.*]] = sext i16 [[TMP19]] to i32
	; CHECK-NEXT: [[ARRAYIDX13_1:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR23]], i32 3
	; CHECK-NEXT: [[TMP20:%.*]] = load i16, i16* [[ARRAYIDX13_1]], align 2
	; CHECK-NEXT: [[CONV14_1:%.*]] = sext i16 [[TMP20]] to i32
	; CHECK-NEXT: [[MUL15_1:%.*]] = mul nsw i32 [[CONV14_1]], [[CONV12_1]]
	; CHECK-NEXT: [[ARRAYIDX17_1:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR]], i32 3
	; CHECK-NEXT: [[TMP21:%.*]] = load i16, i16* [[ARRAYIDX17_1]], align 2
	; CHECK-NEXT: [[CONV18_1:%.*]] = sext i16 [[TMP21]] to i32
	; CHECK-NEXT: [[ADD21_1:%.*]] = add i32 [[MUL10_1]], [[MUL_1]]
	; CHECK-NEXT: [[ADD_1:%.*]] = add i32 [[ADD21_1]], [[CONV14_1]]
	; CHECK-NEXT: [[ADD16_1:%.*]] = add i32 [[ADD_1]], [[MUL15_1]]
	; CHECK-NEXT: [[ADD22_1:%.*]] = add i32 [[ADD16_1]], [[CONV18_1]]
	; CHECK-NEXT: store i32 [[ADD22_1]], i32* [[INCDEC_PTR]], align 4
	; CHECK-NEXT: [[ADD_PTR_1:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR]], i32 4
	; CHECK-NEXT: [[ADD_PTR23_1:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR23]], i32 4
	; CHECK-NEXT: [[INCDEC_PTR_1:%.*]] = getelementptr inbounds i32, i32* [[INCDEC_PTR]], i32 1
	; CHECK-NEXT: [[ADD24_1:%.*]] = add nuw nsw i32 [[ADD24]], 4
	; CHECK-NEXT: [[NITER_NSUB_1:%.*]] = sub i32 [[NITER_NSUB]], 1
	; CHECK-NEXT: [[TMP22:%.*]] = load i16, i16* [[ADD_PTR_1]], align 2
	; CHECK-NEXT: [[CONV_2:%.*]] = sext i16 [[TMP22]] to i32
	; CHECK-NEXT: [[TMP23:%.*]] = load i16, i16* [[ADD_PTR23_1]], align 2
	; CHECK-NEXT: [[CONV5_2:%.*]] = sext i16 [[TMP23]] to i32
	; CHECK-NEXT: [[MUL_2:%.*]] = mul nsw i32 [[CONV5_2]], [[CONV_2]]
	; CHECK-NEXT: [[ARRAYIDX6_2:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_1]], i32 1
	; CHECK-NEXT: [[TMP24:%.*]] = load i16, i16* [[ARRAYIDX6_2]], align 2
	; CHECK-NEXT: [[CONV7_2:%.*]] = sext i16 [[TMP24]] to i32
	; CHECK-NEXT: [[ARRAYIDX8_2:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR23_1]], i32 1
	; CHECK-NEXT: [[TMP25:%.*]] = load i16, i16* [[ARRAYIDX8_2]], align 2
	; CHECK-NEXT: [[CONV9_2:%.*]] = sext i16 [[TMP25]] to i32
	; CHECK-NEXT: [[MUL10_2:%.*]] = mul nsw i32 [[CONV9_2]], [[CONV7_2]]
	; CHECK-NEXT: [[ARRAYIDX11_2:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_1]], i32 2
	; CHECK-NEXT: [[TMP26:%.*]] = load i16, i16* [[ARRAYIDX11_2]], align 2
	; CHECK-NEXT: [[CONV12_2:%.*]] = sext i16 [[TMP26]] to i32
	; CHECK-NEXT: [[ARRAYIDX13_2:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR23_1]], i32 3
	; CHECK-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX13_2]], align 2
	; CHECK-NEXT: [[CONV14_2:%.*]] = sext i16 [[TMP27]] to i32
	; CHECK-NEXT: [[MUL15_2:%.*]] = mul nsw i32 [[CONV14_2]], [[CONV12_2]]
	; CHECK-NEXT: [[ARRAYIDX17_2:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_1]], i32 3
	; CHECK-NEXT: [[TMP28:%.*]] = load i16, i16* [[ARRAYIDX17_2]], align 2
	; CHECK-NEXT: [[CONV18_2:%.*]] = sext i16 [[TMP28]] to i32
	; CHECK-NEXT: [[ADD21_2:%.*]] = add i32 [[MUL10_2]], [[MUL_2]]
	; CHECK-NEXT: [[ADD_2:%.*]] = add i32 [[ADD21_2]], [[CONV14_2]]
	; CHECK-NEXT: [[ADD16_2:%.*]] = add i32 [[ADD_2]], [[MUL15_2]]
	; CHECK-NEXT: [[ADD22_2:%.*]] = add i32 [[ADD16_2]], [[CONV18_2]]
	; CHECK-NEXT: store i32 [[ADD22_2]], i32* [[INCDEC_PTR_1]], align 4
	; CHECK-NEXT: [[ADD_PTR_2:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_1]], i32 4
	; CHECK-NEXT: [[ADD_PTR23_2:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR23_1]], i32 4
	; CHECK-NEXT: [[INCDEC_PTR_2:%.*]] = getelementptr inbounds i32, i32* [[INCDEC_PTR_1]], i32 1
	; CHECK-NEXT: [[ADD24_2:%.*]] = add nuw nsw i32 [[ADD24_1]], 4
	; CHECK-NEXT: [[NITER_NSUB_2:%.*]] = sub i32 [[NITER_NSUB_1]], 1
	; CHECK-NEXT: [[TMP29:%.*]] = load i16, i16* [[ADD_PTR_2]], align 2
	; CHECK-NEXT: [[CONV_3:%.*]] = sext i16 [[TMP29]] to i32
	; CHECK-NEXT: [[TMP30:%.*]] = load i16, i16* [[ADD_PTR23_2]], align 2
	; CHECK-NEXT: [[CONV5_3:%.*]] = sext i16 [[TMP30]] to i32
	; CHECK-NEXT: [[MUL_3:%.*]] = mul nsw i32 [[CONV5_3]], [[CONV_3]]
	; CHECK-NEXT: [[ARRAYIDX6_3:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_2]], i32 1
	; CHECK-NEXT: [[TMP31:%.*]] = load i16, i16* [[ARRAYIDX6_3]], align 2
	; CHECK-NEXT: [[CONV7_3:%.*]] = sext i16 [[TMP31]] to i32
	; CHECK-NEXT: [[ARRAYIDX8_3:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR23_2]], i32 1
	; CHECK-NEXT: [[TMP32:%.*]] = load i16, i16* [[ARRAYIDX8_3]], align 2
	; CHECK-NEXT: [[CONV9_3:%.*]] = sext i16 [[TMP32]] to i32
	; CHECK-NEXT: [[MUL10_3:%.*]] = mul nsw i32 [[CONV9_3]], [[CONV7_3]]
	; CHECK-NEXT: [[ARRAYIDX11_3:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_2]], i32 2
	; CHECK-NEXT: [[TMP33:%.*]] = load i16, i16* [[ARRAYIDX11_3]], align 2
	; CHECK-NEXT: [[CONV12_3:%.*]] = sext i16 [[TMP33]] to i32
	; CHECK-NEXT: [[ARRAYIDX13_3:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR23_2]], i32 3
	; CHECK-NEXT: [[TMP34:%.*]] = load i16, i16* [[ARRAYIDX13_3]], align 2
	; CHECK-NEXT: [[CONV14_3:%.*]] = sext i16 [[TMP34]] to i32
	; CHECK-NEXT: [[MUL15_3:%.*]] = mul nsw i32 [[CONV14_3]], [[CONV12_3]]
	; CHECK-NEXT: [[ARRAYIDX17_3:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_2]], i32 3
	; CHECK-NEXT: [[TMP35:%.*]] = load i16, i16* [[ARRAYIDX17_3]], align 2
	; CHECK-NEXT: [[CONV18_3:%.*]] = sext i16 [[TMP35]] to i32
	; CHECK-NEXT: [[ADD21_3:%.*]] = add i32 [[MUL10_3]], [[MUL_3]]
	; CHECK-NEXT: [[ADD_3:%.*]] = add i32 [[ADD21_3]], [[CONV14_3]]
	; CHECK-NEXT: [[ADD16_3:%.*]] = add i32 [[ADD_3]], [[MUL15_3]]
	; CHECK-NEXT: [[ADD22_3:%.*]] = add i32 [[ADD16_3]], [[CONV18_3]]
	; CHECK-NEXT: store i32 [[ADD22_3]], i32* [[INCDEC_PTR_2]], align 4
	; CHECK-NEXT: [[ADD_PTR_3]] = getelementptr inbounds i16, i16* [[ADD_PTR_2]], i32 4
	; CHECK-NEXT: [[ADD_PTR23_3]] = getelementptr inbounds i16, i16* [[ADD_PTR23_2]], i32 4
	; CHECK-NEXT: [[INCDEC_PTR_3]] = getelementptr inbounds i32, i32* [[INCDEC_PTR_2]], i32 1
	; CHECK-NEXT: [[ADD24_3]] = add nuw nsw i32 [[ADD24_2]], 4
	; CHECK-NEXT: [[NITER_NSUB_3]] = sub i32 [[NITER_NSUB_2]], 1
	; CHECK-NEXT: [[NITER_NCMP_3:%.*]] = icmp ne i32 [[NITER_NSUB_3]], 0
	; CHECK-NEXT: br i1 [[NITER_NCMP_3]], label [[FOR_BODY3]], label [[FOR_END_LOOPEXIT_UNR_LCSSA_LOOPEXIT:%.*]]
	; CHECK: for.end.loopexit.unr-lcssa.loopexit:
	; CHECK-NEXT: [[ADD_PTR_LCSSA_PH_PH:%.*]] = phi i16* [ [[ADD_PTR_3]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[ADD_PTR23_LCSSA_PH_PH:%.*]] = phi i16* [ [[ADD_PTR23_3]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[INCDEC_PTR_LCSSA_PH_PH:%.*]] = phi i32* [ [[INCDEC_PTR_3]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[J_076_UNR_PH:%.*]] = phi i32 [ [[ADD24_3]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[PDEST_ADDR_175_UNR_PH:%.*]] = phi i32* [ [[INCDEC_PTR_3]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[PSRCA_ADDR_174_UNR_PH:%.*]] = phi i16* [ [[ADD_PTR_3]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[PSRCB_ADDR_173_UNR_PH:%.*]] = phi i16* [ [[ADD_PTR23_3]], [[FOR_BODY3]] ]
	; CHECK-NEXT: br label [[FOR_END_LOOPEXIT_UNR_LCSSA]]
	; CHECK: for.end.loopexit.unr-lcssa:
	; CHECK-NEXT: [[ADD_PTR_LCSSA_PH:%.*]] = phi i16* [ undef, [[FOR_BODY3_PREHEADER]] ], [ [[ADD_PTR_LCSSA_PH_PH]], [[FOR_END_LOOPEXIT_UNR_LCSSA_LOOPEXIT]] ]
	; CHECK-NEXT: [[ADD_PTR23_LCSSA_PH:%.*]] = phi i16* [ undef, [[FOR_BODY3_PREHEADER]] ], [ [[ADD_PTR23_LCSSA_PH_PH]], [[FOR_END_LOOPEXIT_UNR_LCSSA_LOOPEXIT]] ]
	; CHECK-NEXT: [[INCDEC_PTR_LCSSA_PH:%.*]] = phi i32* [ undef, [[FOR_BODY3_PREHEADER]] ], [ [[INCDEC_PTR_LCSSA_PH_PH]], [[FOR_END_LOOPEXIT_UNR_LCSSA_LOOPEXIT]] ]
	; CHECK-NEXT: [[J_076_UNR:%.*]] = phi i32 [ 0, [[FOR_BODY3_PREHEADER]] ], [ [[J_076_UNR_PH]], [[FOR_END_LOOPEXIT_UNR_LCSSA_LOOPEXIT]] ]
	; CHECK-NEXT: [[PDEST_ADDR_175_UNR:%.*]] = phi i32* [ [[PDEST_ADDR_091]], [[FOR_BODY3_PREHEADER]] ], [ [[PDEST_ADDR_175_UNR_PH]], [[FOR_END_LOOPEXIT_UNR_LCSSA_LOOPEXIT]] ]
	; CHECK-NEXT: [[PSRCA_ADDR_174_UNR:%.*]] = phi i16* [ [[PSRCA_ADDR_090]], [[FOR_BODY3_PREHEADER]] ], [ [[PSRCA_ADDR_174_UNR_PH]], [[FOR_END_LOOPEXIT_UNR_LCSSA_LOOPEXIT]] ]
	; CHECK-NEXT: [[PSRCB_ADDR_173_UNR:%.*]] = phi i16* [ [[PSRCB_ADDR_089]], [[FOR_BODY3_PREHEADER]] ], [ [[PSRCB_ADDR_173_UNR_PH]], [[FOR_END_LOOPEXIT_UNR_LCSSA_LOOPEXIT]] ]
	; CHECK-NEXT: [[LCMP_MOD:%.*]] = icmp ne i32 [[XTRAITER]], 0
	; CHECK-NEXT: br i1 [[LCMP_MOD]], label [[FOR_BODY3_EPIL_PREHEADER:%.*]], label [[FOR_END_LOOPEXIT:%.*]]
	; CHECK: for.body3.epil.preheader:
	; CHECK-NEXT: br label [[FOR_BODY3_EPIL:%.*]]
	; CHECK: for.body3.epil:
	; CHECK-NEXT: [[TMP36:%.*]] = load i16, i16* [[PSRCA_ADDR_174_UNR]], align 2
	; CHECK-NEXT: [[CONV_EPIL:%.*]] = sext i16 [[TMP36]] to i32
	; CHECK-NEXT: [[TMP37:%.*]] = load i16, i16* [[PSRCB_ADDR_173_UNR]], align 2
	; CHECK-NEXT: [[CONV5_EPIL:%.*]] = sext i16 [[TMP37]] to i32
	; CHECK-NEXT: [[MUL_EPIL:%.*]] = mul nsw i32 [[CONV5_EPIL]], [[CONV_EPIL]]
	; CHECK-NEXT: [[ARRAYIDX6_EPIL:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_174_UNR]], i32 1
	; CHECK-NEXT: [[TMP38:%.*]] = load i16, i16* [[ARRAYIDX6_EPIL]], align 2
	; CHECK-NEXT: [[CONV7_EPIL:%.*]] = sext i16 [[TMP38]] to i32
	; CHECK-NEXT: [[ARRAYIDX8_EPIL:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_173_UNR]], i32 1
	; CHECK-NEXT: [[TMP39:%.*]] = load i16, i16* [[ARRAYIDX8_EPIL]], align 2
	; CHECK-NEXT: [[CONV9_EPIL:%.*]] = sext i16 [[TMP39]] to i32
	; CHECK-NEXT: [[MUL10_EPIL:%.*]] = mul nsw i32 [[CONV9_EPIL]], [[CONV7_EPIL]]
	; CHECK-NEXT: [[ARRAYIDX11_EPIL:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_174_UNR]], i32 2
	; CHECK-NEXT: [[TMP40:%.*]] = load i16, i16* [[ARRAYIDX11_EPIL]], align 2
	; CHECK-NEXT: [[CONV12_EPIL:%.*]] = sext i16 [[TMP40]] to i32
	; CHECK-NEXT: [[ARRAYIDX13_EPIL:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_173_UNR]], i32 3
	; CHECK-NEXT: [[TMP41:%.*]] = load i16, i16* [[ARRAYIDX13_EPIL]], align 2
	; CHECK-NEXT: [[CONV14_EPIL:%.*]] = sext i16 [[TMP41]] to i32
	; CHECK-NEXT: [[MUL15_EPIL:%.*]] = mul nsw i32 [[CONV14_EPIL]], [[CONV12_EPIL]]
	; CHECK-NEXT: [[ARRAYIDX17_EPIL:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_174_UNR]], i32 3
	; CHECK-NEXT: [[TMP42:%.*]] = load i16, i16* [[ARRAYIDX17_EPIL]], align 2
	; CHECK-NEXT: [[CONV18_EPIL:%.*]] = sext i16 [[TMP42]] to i32
	; CHECK-NEXT: [[ADD21_EPIL:%.*]] = add i32 [[MUL10_EPIL]], [[MUL_EPIL]]
	; CHECK-NEXT: [[ADD_EPIL:%.*]] = add i32 [[ADD21_EPIL]], [[CONV14_EPIL]]
	; CHECK-NEXT: [[ADD16_EPIL:%.*]] = add i32 [[ADD_EPIL]], [[MUL15_EPIL]]
	; CHECK-NEXT: [[ADD22_EPIL:%.*]] = add i32 [[ADD16_EPIL]], [[CONV18_EPIL]]
	; CHECK-NEXT: store i32 [[ADD22_EPIL]], i32* [[PDEST_ADDR_175_UNR]], align 4
	; CHECK-NEXT: [[ADD_PTR_EPIL:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_174_UNR]], i32 4
	; CHECK-NEXT: [[ADD_PTR23_EPIL:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_173_UNR]], i32 4
	; CHECK-NEXT: [[INCDEC_PTR_EPIL:%.*]] = getelementptr inbounds i32, i32* [[PDEST_ADDR_175_UNR]], i32 1
	; CHECK-NEXT: [[ADD24_EPIL:%.*]] = add nuw nsw i32 [[J_076_UNR]], 4
	; CHECK-NEXT: [[EPIL_ITER_SUB:%.*]] = sub i32 [[XTRAITER]], 1
	; CHECK-NEXT: [[EPIL_ITER_CMP:%.*]] = icmp ne i32 [[EPIL_ITER_SUB]], 0
	; CHECK-NEXT: br i1 [[EPIL_ITER_CMP]], label [[FOR_BODY3_EPIL_1:%.*]], label [[FOR_END_LOOPEXIT_EPILOG_LCSSA:%.*]]
	; CHECK: for.end.loopexit.epilog-lcssa:
	; CHECK-NEXT: [[ADD_PTR_LCSSA_PH1:%.*]] = phi i16* [ [[ADD_PTR_EPIL]], [[FOR_BODY3_EPIL]] ], [ [[ADD_PTR_EPIL_1:%.*]], [[FOR_BODY3_EPIL_1]] ], [ [[ADD_PTR_EPIL_2:%.*]], [[FOR_BODY3_EPIL_2:%.*]] ]
	; CHECK-NEXT: [[ADD_PTR23_LCSSA_PH2:%.*]] = phi i16* [ [[ADD_PTR23_EPIL]], [[FOR_BODY3_EPIL]] ], [ [[ADD_PTR23_EPIL_1:%.*]], [[FOR_BODY3_EPIL_1]] ], [ [[ADD_PTR23_EPIL_2:%.*]], [[FOR_BODY3_EPIL_2]] ]
	; CHECK-NEXT: [[INCDEC_PTR_LCSSA_PH3:%.*]] = phi i32* [ [[INCDEC_PTR_EPIL]], [[FOR_BODY3_EPIL]] ], [ [[INCDEC_PTR_EPIL_1:%.*]], [[FOR_BODY3_EPIL_1]] ], [ [[INCDEC_PTR_EPIL_2:%.*]], [[FOR_BODY3_EPIL_2]] ]
	; CHECK-NEXT: br label [[FOR_END_LOOPEXIT]]
	; CHECK: for.end.loopexit:			; CHECK: for.end.loopexit:
	; CHECK-NEXT: [[ADD_PTR_LCSSA:%.*]] = phi i16* [ [[ADD_PTR_LCSSA_PH]], [[FOR_END_LOOPEXIT_UNR_LCSSA]] ], [ [[ADD_PTR_LCSSA_PH1]], [[FOR_END_LOOPEXIT_EPILOG_LCSSA]] ]			; CHECK-NEXT: [[ADD_PTR_LCSSA:%.*]] = phi i16* [ [[ADD_PTR]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[ADD_PTR23_LCSSA:%.*]] = phi i16* [ [[ADD_PTR23_LCSSA_PH]], [[FOR_END_LOOPEXIT_UNR_LCSSA]] ], [ [[ADD_PTR23_LCSSA_PH2]], [[FOR_END_LOOPEXIT_EPILOG_LCSSA]] ]			; CHECK-NEXT: [[ADD_PTR23_LCSSA:%.*]] = phi i16* [ [[ADD_PTR23]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[INCDEC_PTR_LCSSA:%.*]] = phi i32* [ [[INCDEC_PTR_LCSSA_PH]], [[FOR_END_LOOPEXIT_UNR_LCSSA]] ], [ [[INCDEC_PTR_LCSSA_PH3]], [[FOR_END_LOOPEXIT_EPILOG_LCSSA]] ]			; CHECK-NEXT: [[INCDEC_PTR_LCSSA:%.*]] = phi i32* [ [[INCDEC_PTR]], [[FOR_BODY3]] ]
	; CHECK-NEXT: br label [[FOR_END]]			; CHECK-NEXT: br label [[FOR_END]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[PSRCB_ADDR_1_LCSSA:%.*]] = phi i16* [ [[PSRCB_ADDR_089]], [[FOR_BODY]] ], [ [[ADD_PTR23_LCSSA]], [[FOR_END_LOOPEXIT]] ]			; CHECK-NEXT: [[PSRCB_ADDR_1_LCSSA:%.*]] = phi i16* [ [[PSRCB_ADDR_089]], [[FOR_BODY]] ], [ [[ADD_PTR23_LCSSA]], [[FOR_END_LOOPEXIT]] ]
	; CHECK-NEXT: [[PSRCA_ADDR_1_LCSSA:%.*]] = phi i16* [ [[PSRCA_ADDR_090]], [[FOR_BODY]] ], [ [[ADD_PTR_LCSSA]], [[FOR_END_LOOPEXIT]] ]			; CHECK-NEXT: [[PSRCA_ADDR_1_LCSSA:%.*]] = phi i16* [ [[PSRCA_ADDR_090]], [[FOR_BODY]] ], [ [[ADD_PTR_LCSSA]], [[FOR_END_LOOPEXIT]] ]
	; CHECK-NEXT: [[PDEST_ADDR_1_LCSSA:%.*]] = phi i32* [ [[PDEST_ADDR_091]], [[FOR_BODY]] ], [ [[INCDEC_PTR_LCSSA]], [[FOR_END_LOOPEXIT]] ]			; CHECK-NEXT: [[PDEST_ADDR_1_LCSSA:%.*]] = phi i32* [ [[PDEST_ADDR_091]], [[FOR_BODY]] ], [ [[INCDEC_PTR_LCSSA]], [[FOR_END_LOOPEXIT]] ]
	; CHECK-NEXT: [[J_0_LCSSA:%.*]] = phi i32 [ 0, [[FOR_BODY]] ], [ [[TMP6]], [[FOR_END_LOOPEXIT]] ]			; CHECK-NEXT: [[J_0_LCSSA:%.*]] = phi i32 [ 0, [[FOR_BODY]] ], [ [[TMP2]], [[FOR_END_LOOPEXIT]] ]
	; CHECK-NEXT: [[REM:%.*]] = and i32 [[TMP4]], 3			; CHECK-NEXT: [[REM:%.*]] = and i32 [[TMP0]], 3
	; CHECK-NEXT: [[ADD25:%.*]] = or i32 [[J_0_LCSSA]], [[REM]]			; CHECK-NEXT: [[ADD25:%.*]] = or i32 [[J_0_LCSSA]], [[REM]]
	; CHECK-NEXT: [[CMP2780:%.*]] = icmp ugt i32 [[ADD25]], [[J_0_LCSSA]]			; CHECK-NEXT: [[CMP2780:%.*]] = icmp ugt i32 [[ADD25]], [[J_0_LCSSA]]
	; CHECK-NEXT: br i1 [[CMP2780]], label [[FOR_BODY29_PREHEADER:%.*]], label [[FOR_END40]]			; CHECK-NEXT: br i1 [[CMP2780]], label [[FOR_BODY29_PREHEADER:%.*]], label [[FOR_END40]]
	; CHECK: for.body29.preheader:			; CHECK: for.body29.preheader:
	; CHECK-NEXT: [[TMP43:%.*]] = sub nsw i32 [[ADD25]], [[J_0_LCSSA]]			; CHECK-NEXT: [[TMP10:%.*]] = sub nsw i32 [[ADD25]], [[J_0_LCSSA]]
	; CHECK-NEXT: [[TMP44:%.*]] = sub i32 [[ADD25]], [[J_0_LCSSA]]
	; CHECK-NEXT: [[TMP45:%.*]] = add i32 [[ADD25]], -1
	; CHECK-NEXT: [[TMP46:%.*]] = sub i32 [[TMP45]], [[J_0_LCSSA]]
	; CHECK-NEXT: [[XTRAITER4:%.*]] = and i32 [[TMP44]], 3
	; CHECK-NEXT: [[LCMP_MOD5:%.*]] = icmp ne i32 [[XTRAITER4]], 0
	; CHECK-NEXT: br i1 [[LCMP_MOD5]], label [[FOR_BODY29_PROL_PREHEADER:%.*]], label [[FOR_BODY29_PROL_LOOPEXIT:%.*]]
	; CHECK: for.body29.prol.preheader:
	; CHECK-NEXT: br label [[FOR_BODY29_PROL:%.*]]
	; CHECK: for.body29.prol:
	; CHECK-NEXT: [[ARRAYIDX30_PROL:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_1_LCSSA]], i32 [[J_0_LCSSA]]
	; CHECK-NEXT: [[TMP47:%.*]] = load i16, i16* [[ARRAYIDX30_PROL]], align 2
	; CHECK-NEXT: [[CONV31_PROL:%.*]] = sext i16 [[TMP47]] to i32
	; CHECK-NEXT: [[ARRAYIDX32_PROL:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_1_LCSSA]], i32 [[J_0_LCSSA]]
	; CHECK-NEXT: [[TMP48:%.*]] = load i16, i16* [[ARRAYIDX32_PROL]], align 2
	; CHECK-NEXT: [[CONV33_PROL:%.*]] = sext i16 [[TMP48]] to i32
	; CHECK-NEXT: [[MUL34_PROL:%.*]] = mul nsw i32 [[CONV33_PROL]], [[CONV31_PROL]]
	; CHECK-NEXT: [[TMP49:%.*]] = load i32, i32* [[PDEST_ADDR_1_LCSSA]], align 4
	; CHECK-NEXT: [[ADD35_PROL:%.*]] = add nsw i32 [[MUL34_PROL]], [[TMP49]]
	; CHECK-NEXT: store i32 [[ADD35_PROL]], i32* [[PDEST_ADDR_1_LCSSA]], align 4
	; CHECK-NEXT: [[INCDEC_PTR36_PROL:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_1_LCSSA]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR37_PROL:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_1_LCSSA]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR38_PROL:%.*]] = getelementptr inbounds i32, i32* [[PDEST_ADDR_1_LCSSA]], i32 1
	; CHECK-NEXT: [[INC_PROL:%.*]] = add nuw i32 [[J_0_LCSSA]], 1
	; CHECK-NEXT: [[PROL_ITER_SUB:%.*]] = sub i32 [[XTRAITER4]], 1
	; CHECK-NEXT: [[PROL_ITER_CMP:%.*]] = icmp ne i32 [[PROL_ITER_SUB]], 0
	; CHECK-NEXT: br i1 [[PROL_ITER_CMP]], label [[FOR_BODY29_PROL_1:%.*]], label [[FOR_BODY29_PROL_LOOPEXIT_UNR_LCSSA:%.*]]
	; CHECK: for.body29.prol.loopexit.unr-lcssa:
	; CHECK-NEXT: [[J_184_UNR_PH:%.*]] = phi i32 [ [[INC_PROL]], [[FOR_BODY29_PROL]] ], [ [[INC_PROL_1:%.*]], [[FOR_BODY29_PROL_1]] ], [ [[INC_PROL_2:%.*]], [[FOR_BODY29_PROL_2:%.*]] ]
	; CHECK-NEXT: [[PDEST_ADDR_283_UNR_PH:%.*]] = phi i32* [ [[INCDEC_PTR38_PROL]], [[FOR_BODY29_PROL]] ], [ [[INCDEC_PTR38_PROL_1:%.*]], [[FOR_BODY29_PROL_1]] ], [ [[INCDEC_PTR38_PROL_2:%.*]], [[FOR_BODY29_PROL_2]] ]
	; CHECK-NEXT: [[PSRCA_ADDR_282_UNR_PH:%.*]] = phi i16* [ [[INCDEC_PTR36_PROL]], [[FOR_BODY29_PROL]] ], [ [[INCDEC_PTR36_PROL_1:%.*]], [[FOR_BODY29_PROL_1]] ], [ [[INCDEC_PTR36_PROL_2:%.*]], [[FOR_BODY29_PROL_2]] ]
	; CHECK-NEXT: [[PSRCB_ADDR_281_UNR_PH:%.*]] = phi i16* [ [[INCDEC_PTR37_PROL]], [[FOR_BODY29_PROL]] ], [ [[INCDEC_PTR37_PROL_1:%.*]], [[FOR_BODY29_PROL_1]] ], [ [[INCDEC_PTR37_PROL_2:%.*]], [[FOR_BODY29_PROL_2]] ]
	; CHECK-NEXT: br label [[FOR_BODY29_PROL_LOOPEXIT]]
	; CHECK: for.body29.prol.loopexit:
	; CHECK-NEXT: [[J_184_UNR:%.*]] = phi i32 [ [[J_0_LCSSA]], [[FOR_BODY29_PREHEADER]] ], [ [[J_184_UNR_PH]], [[FOR_BODY29_PROL_LOOPEXIT_UNR_LCSSA]] ]
	; CHECK-NEXT: [[PDEST_ADDR_283_UNR:%.*]] = phi i32* [ [[PDEST_ADDR_1_LCSSA]], [[FOR_BODY29_PREHEADER]] ], [ [[PDEST_ADDR_283_UNR_PH]], [[FOR_BODY29_PROL_LOOPEXIT_UNR_LCSSA]] ]
	; CHECK-NEXT: [[PSRCA_ADDR_282_UNR:%.*]] = phi i16* [ [[PSRCA_ADDR_1_LCSSA]], [[FOR_BODY29_PREHEADER]] ], [ [[PSRCA_ADDR_282_UNR_PH]], [[FOR_BODY29_PROL_LOOPEXIT_UNR_LCSSA]] ]
	; CHECK-NEXT: [[PSRCB_ADDR_281_UNR:%.*]] = phi i16* [ [[PSRCB_ADDR_1_LCSSA]], [[FOR_BODY29_PREHEADER]] ], [ [[PSRCB_ADDR_281_UNR_PH]], [[FOR_BODY29_PROL_LOOPEXIT_UNR_LCSSA]] ]
	; CHECK-NEXT: [[TMP50:%.*]] = icmp ult i32 [[TMP46]], 3
	; CHECK-NEXT: br i1 [[TMP50]], label [[FOR_END40_LOOPEXIT:%.*]], label [[FOR_BODY29_PREHEADER_NEW:%.*]]
	; CHECK: for.body29.preheader.new:
	; CHECK-NEXT: br label [[FOR_BODY29:%.*]]			; CHECK-NEXT: br label [[FOR_BODY29:%.*]]
	; CHECK: for.body29:			; CHECK: for.body29:
	; CHECK-NEXT: [[J_184:%.*]] = phi i32 [ [[J_184_UNR]], [[FOR_BODY29_PREHEADER_NEW]] ], [ [[INC_3:%.*]], [[FOR_BODY29]] ]			; CHECK-NEXT: [[J_184:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY29]] ], [ [[J_0_LCSSA]], [[FOR_BODY29_PREHEADER]] ]
	; CHECK-NEXT: [[PDEST_ADDR_283:%.*]] = phi i32* [ [[PDEST_ADDR_283_UNR]], [[FOR_BODY29_PREHEADER_NEW]] ], [ [[INCDEC_PTR38_3:%.*]], [[FOR_BODY29]] ]			; CHECK-NEXT: [[PDEST_ADDR_283:%.*]] = phi i32* [ [[INCDEC_PTR38:%.*]], [[FOR_BODY29]] ], [ [[PDEST_ADDR_1_LCSSA]], [[FOR_BODY29_PREHEADER]] ]
	; CHECK-NEXT: [[PSRCA_ADDR_282:%.*]] = phi i16* [ [[PSRCA_ADDR_282_UNR]], [[FOR_BODY29_PREHEADER_NEW]] ], [ [[INCDEC_PTR36_3:%.*]], [[FOR_BODY29]] ]			; CHECK-NEXT: [[PSRCA_ADDR_282:%.*]] = phi i16* [ [[INCDEC_PTR36:%.*]], [[FOR_BODY29]] ], [ [[PSRCA_ADDR_1_LCSSA]], [[FOR_BODY29_PREHEADER]] ]
	; CHECK-NEXT: [[PSRCB_ADDR_281:%.*]] = phi i16* [ [[PSRCB_ADDR_281_UNR]], [[FOR_BODY29_PREHEADER_NEW]] ], [ [[INCDEC_PTR37_3:%.*]], [[FOR_BODY29]] ]			; CHECK-NEXT: [[PSRCB_ADDR_281:%.*]] = phi i16* [ [[INCDEC_PTR37:%.*]], [[FOR_BODY29]] ], [ [[PSRCB_ADDR_1_LCSSA]], [[FOR_BODY29_PREHEADER]] ]
	; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_282]], i32 [[J_184]]			; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_282]], i32 [[J_184]]
	; CHECK-NEXT: [[TMP51:%.*]] = load i16, i16* [[ARRAYIDX30]], align 2			; CHECK-NEXT: [[TMP11:%.*]] = load i16, i16* [[ARRAYIDX30]], align 2
	; CHECK-NEXT: [[CONV31:%.*]] = sext i16 [[TMP51]] to i32			; CHECK-NEXT: [[CONV31:%.*]] = sext i16 [[TMP11]] to i32
	; CHECK-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_281]], i32 [[J_184]]			; CHECK-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_281]], i32 [[J_184]]
	; CHECK-NEXT: [[TMP52:%.*]] = load i16, i16* [[ARRAYIDX32]], align 2			; CHECK-NEXT: [[TMP12:%.*]] = load i16, i16* [[ARRAYIDX32]], align 2
	; CHECK-NEXT: [[CONV33:%.*]] = sext i16 [[TMP52]] to i32			; CHECK-NEXT: [[CONV33:%.*]] = sext i16 [[TMP12]] to i32
	; CHECK-NEXT: [[MUL34:%.*]] = mul nsw i32 [[CONV33]], [[CONV31]]			; CHECK-NEXT: [[MUL34:%.*]] = mul nsw i32 [[CONV33]], [[CONV31]]
	; CHECK-NEXT: [[TMP53:%.*]] = load i32, i32* [[PDEST_ADDR_283]], align 4			; CHECK-NEXT: [[TMP13:%.*]] = load i32, i32* [[PDEST_ADDR_283]], align 4
	; CHECK-NEXT: [[ADD35:%.*]] = add nsw i32 [[MUL34]], [[TMP53]]			; CHECK-NEXT: [[ADD35:%.*]] = add nsw i32 [[MUL34]], [[TMP13]]
	; CHECK-NEXT: store i32 [[ADD35]], i32* [[PDEST_ADDR_283]], align 4			; CHECK-NEXT: store i32 [[ADD35]], i32* [[PDEST_ADDR_283]], align 4
	; CHECK-NEXT: [[INCDEC_PTR36:%.*]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_282]], i32 1			; CHECK-NEXT: [[INCDEC_PTR36]] = getelementptr inbounds i16, i16* [[PSRCA_ADDR_282]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR37:%.*]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_281]], i32 1			; CHECK-NEXT: [[INCDEC_PTR37]] = getelementptr inbounds i16, i16* [[PSRCB_ADDR_281]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR38:%.*]] = getelementptr inbounds i32, i32* [[PDEST_ADDR_283]], i32 1			; CHECK-NEXT: [[INCDEC_PTR38]] = getelementptr inbounds i32, i32* [[PDEST_ADDR_283]], i32 1
	; CHECK-NEXT: [[INC:%.*]] = add nuw i32 [[J_184]], 1			; CHECK-NEXT: [[INC]] = add nuw i32 [[J_184]], 1
	; CHECK-NEXT: [[ARRAYIDX30_1:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR36]], i32 [[INC]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[ADD25]]
	; CHECK-NEXT: [[TMP54:%.*]] = load i16, i16* [[ARRAYIDX30_1]], align 2			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END40_LOOPEXIT:%.*]], label [[FOR_BODY29]]
	; CHECK-NEXT: [[CONV31_1:%.*]] = sext i16 [[TMP54]] to i32
	; CHECK-NEXT: [[ARRAYIDX32_1:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR37]], i32 [[INC]]
	; CHECK-NEXT: [[TMP55:%.*]] = load i16, i16* [[ARRAYIDX32_1]], align 2
	; CHECK-NEXT: [[CONV33_1:%.*]] = sext i16 [[TMP55]] to i32
	; CHECK-NEXT: [[MUL34_1:%.*]] = mul nsw i32 [[CONV33_1]], [[CONV31_1]]
	; CHECK-NEXT: [[TMP56:%.*]] = load i32, i32* [[INCDEC_PTR38]], align 4
	; CHECK-NEXT: [[ADD35_1:%.*]] = add nsw i32 [[MUL34_1]], [[TMP56]]
	; CHECK-NEXT: store i32 [[ADD35_1]], i32* [[INCDEC_PTR38]], align 4
	; CHECK-NEXT: [[INCDEC_PTR36_1:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR36]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR37_1:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR37]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR38_1:%.*]] = getelementptr inbounds i32, i32* [[INCDEC_PTR38]], i32 1
	; CHECK-NEXT: [[INC_1:%.*]] = add nuw i32 [[INC]], 1
	; CHECK-NEXT: [[ARRAYIDX30_2:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR36_1]], i32 [[INC_1]]
	; CHECK-NEXT: [[TMP57:%.*]] = load i16, i16* [[ARRAYIDX30_2]], align 2
	; CHECK-NEXT: [[CONV31_2:%.*]] = sext i16 [[TMP57]] to i32
	; CHECK-NEXT: [[ARRAYIDX32_2:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR37_1]], i32 [[INC_1]]
	; CHECK-NEXT: [[TMP58:%.*]] = load i16, i16* [[ARRAYIDX32_2]], align 2
	; CHECK-NEXT: [[CONV33_2:%.*]] = sext i16 [[TMP58]] to i32
	; CHECK-NEXT: [[MUL34_2:%.*]] = mul nsw i32 [[CONV33_2]], [[CONV31_2]]
	; CHECK-NEXT: [[TMP59:%.*]] = load i32, i32* [[INCDEC_PTR38_1]], align 4
	; CHECK-NEXT: [[ADD35_2:%.*]] = add nsw i32 [[MUL34_2]], [[TMP59]]
	; CHECK-NEXT: store i32 [[ADD35_2]], i32* [[INCDEC_PTR38_1]], align 4
	; CHECK-NEXT: [[INCDEC_PTR36_2:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR36_1]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR37_2:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR37_1]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR38_2:%.*]] = getelementptr inbounds i32, i32* [[INCDEC_PTR38_1]], i32 1
	; CHECK-NEXT: [[INC_2:%.*]] = add nuw i32 [[INC_1]], 1
	; CHECK-NEXT: [[ARRAYIDX30_3:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR36_2]], i32 [[INC_2]]
	; CHECK-NEXT: [[TMP60:%.*]] = load i16, i16* [[ARRAYIDX30_3]], align 2
	; CHECK-NEXT: [[CONV31_3:%.*]] = sext i16 [[TMP60]] to i32
	; CHECK-NEXT: [[ARRAYIDX32_3:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR37_2]], i32 [[INC_2]]
	; CHECK-NEXT: [[TMP61:%.*]] = load i16, i16* [[ARRAYIDX32_3]], align 2
	; CHECK-NEXT: [[CONV33_3:%.*]] = sext i16 [[TMP61]] to i32
	; CHECK-NEXT: [[MUL34_3:%.*]] = mul nsw i32 [[CONV33_3]], [[CONV31_3]]
	; CHECK-NEXT: [[TMP62:%.*]] = load i32, i32* [[INCDEC_PTR38_2]], align 4
	; CHECK-NEXT: [[ADD35_3:%.*]] = add nsw i32 [[MUL34_3]], [[TMP62]]
	; CHECK-NEXT: store i32 [[ADD35_3]], i32* [[INCDEC_PTR38_2]], align 4
	; CHECK-NEXT: [[INCDEC_PTR36_3]] = getelementptr inbounds i16, i16* [[INCDEC_PTR36_2]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR37_3]] = getelementptr inbounds i16, i16* [[INCDEC_PTR37_2]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR38_3]] = getelementptr inbounds i32, i32* [[INCDEC_PTR38_2]], i32 1
	; CHECK-NEXT: [[INC_3]] = add nuw i32 [[INC_2]], 1
	; CHECK-NEXT: [[EXITCOND_3:%.*]] = icmp eq i32 [[INC_3]], [[ADD25]]
	; CHECK-NEXT: br i1 [[EXITCOND_3]], label [[FOR_END40_LOOPEXIT_UNR_LCSSA:%.*]], label [[FOR_BODY29]]
	; CHECK: for.end40.loopexit.unr-lcssa:
	; CHECK-NEXT: br label [[FOR_END40_LOOPEXIT]]
	; CHECK: for.end40.loopexit:			; CHECK: for.end40.loopexit:
	; CHECK-NEXT: [[SCEVGEP93:%.*]] = getelementptr i16, i16* [[PSRCB_ADDR_1_LCSSA]], i32 [[TMP43]]			; CHECK-NEXT: [[SCEVGEP93:%.*]] = getelementptr i16, i16* [[PSRCB_ADDR_1_LCSSA]], i32 [[TMP10]]
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i16, i16* [[PSRCA_ADDR_1_LCSSA]], i32 [[TMP43]]			; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i16, i16* [[PSRCA_ADDR_1_LCSSA]], i32 [[TMP10]]
	; CHECK-NEXT: [[SCEVGEP94:%.*]] = getelementptr i32, i32* [[PDEST_ADDR_1_LCSSA]], i32 [[TMP43]]			; CHECK-NEXT: [[SCEVGEP94:%.*]] = getelementptr i32, i32* [[PDEST_ADDR_1_LCSSA]], i32 [[TMP10]]
	; CHECK-NEXT: br label [[FOR_END40]]			; CHECK-NEXT: br label [[FOR_END40]]
	; CHECK: for.end40:			; CHECK: for.end40:
	; CHECK-NEXT: [[PSRCB_ADDR_2_LCSSA]] = phi i16* [ [[PSRCB_ADDR_1_LCSSA]], [[FOR_END]] ], [ [[SCEVGEP93]], [[FOR_END40_LOOPEXIT]] ]			; CHECK-NEXT: [[PSRCB_ADDR_2_LCSSA]] = phi i16* [ [[PSRCB_ADDR_1_LCSSA]], [[FOR_END]] ], [ [[SCEVGEP93]], [[FOR_END40_LOOPEXIT]] ]
	; CHECK-NEXT: [[PSRCA_ADDR_2_LCSSA]] = phi i16* [ [[PSRCA_ADDR_1_LCSSA]], [[FOR_END]] ], [ [[SCEVGEP]], [[FOR_END40_LOOPEXIT]] ]			; CHECK-NEXT: [[PSRCA_ADDR_2_LCSSA]] = phi i16* [ [[PSRCA_ADDR_1_LCSSA]], [[FOR_END]] ], [ [[SCEVGEP]], [[FOR_END40_LOOPEXIT]] ]
	; CHECK-NEXT: [[PDEST_ADDR_2_LCSSA]] = phi i32* [ [[PDEST_ADDR_1_LCSSA]], [[FOR_END]] ], [ [[SCEVGEP94]], [[FOR_END40_LOOPEXIT]] ]			; CHECK-NEXT: [[PDEST_ADDR_2_LCSSA]] = phi i32* [ [[PDEST_ADDR_1_LCSSA]], [[FOR_END]] ], [ [[SCEVGEP94]], [[FOR_END40_LOOPEXIT]] ]
	; CHECK-NEXT: [[INC42]] = add nuw i32 [[I_092]], 1			; CHECK-NEXT: [[INC42]] = add nuw i32 [[I_092]], 1
	; CHECK-NEXT: [[EXITCOND95:%.*]] = icmp eq i32 [[INC42]], [[BLKCNT]]			; CHECK-NEXT: [[EXITCOND95:%.*]] = icmp eq i32 [[INC42]], [[BLKCNT]]
	; CHECK-NEXT: br i1 [[EXITCOND95]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND95]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[FOR_BODY]]
	; CHECK: for.body3.epil.1:
	; CHECK-NEXT: [[TMP63:%.*]] = load i16, i16* [[ADD_PTR_EPIL]], align 2
	; CHECK-NEXT: [[CONV_EPIL_1:%.*]] = sext i16 [[TMP63]] to i32
	; CHECK-NEXT: [[TMP64:%.*]] = load i16, i16* [[ADD_PTR23_EPIL]], align 2
	; CHECK-NEXT: [[CONV5_EPIL_1:%.*]] = sext i16 [[TMP64]] to i32
	; CHECK-NEXT: [[MUL_EPIL_1:%.*]] = mul nsw i32 [[CONV5_EPIL_1]], [[CONV_EPIL_1]]
	; CHECK-NEXT: [[ARRAYIDX6_EPIL_1:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_EPIL]], i32 1
	; CHECK-NEXT: [[TMP65:%.*]] = load i16, i16* [[ARRAYIDX6_EPIL_1]], align 2
	; CHECK-NEXT: [[CONV7_EPIL_1:%.*]] = sext i16 [[TMP65]] to i32
	; CHECK-NEXT: [[ARRAYIDX8_EPIL_1:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR23_EPIL]], i32 1
	; CHECK-NEXT: [[TMP66:%.*]] = load i16, i16* [[ARRAYIDX8_EPIL_1]], align 2
	; CHECK-NEXT: [[CONV9_EPIL_1:%.*]] = sext i16 [[TMP66]] to i32
	; CHECK-NEXT: [[MUL10_EPIL_1:%.*]] = mul nsw i32 [[CONV9_EPIL_1]], [[CONV7_EPIL_1]]
	; CHECK-NEXT: [[ARRAYIDX11_EPIL_1:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_EPIL]], i32 2
	; CHECK-NEXT: [[TMP67:%.*]] = load i16, i16* [[ARRAYIDX11_EPIL_1]], align 2
	; CHECK-NEXT: [[CONV12_EPIL_1:%.*]] = sext i16 [[TMP67]] to i32
	; CHECK-NEXT: [[ARRAYIDX13_EPIL_1:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR23_EPIL]], i32 3
	; CHECK-NEXT: [[TMP68:%.*]] = load i16, i16* [[ARRAYIDX13_EPIL_1]], align 2
	; CHECK-NEXT: [[CONV14_EPIL_1:%.*]] = sext i16 [[TMP68]] to i32
	; CHECK-NEXT: [[MUL15_EPIL_1:%.*]] = mul nsw i32 [[CONV14_EPIL_1]], [[CONV12_EPIL_1]]
	; CHECK-NEXT: [[ARRAYIDX17_EPIL_1:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_EPIL]], i32 3
	; CHECK-NEXT: [[TMP69:%.*]] = load i16, i16* [[ARRAYIDX17_EPIL_1]], align 2
	; CHECK-NEXT: [[CONV18_EPIL_1:%.*]] = sext i16 [[TMP69]] to i32
	; CHECK-NEXT: [[ADD21_EPIL_1:%.*]] = add i32 [[MUL10_EPIL_1]], [[MUL_EPIL_1]]
	; CHECK-NEXT: [[ADD_EPIL_1:%.*]] = add i32 [[ADD21_EPIL_1]], [[CONV14_EPIL_1]]
	; CHECK-NEXT: [[ADD16_EPIL_1:%.*]] = add i32 [[ADD_EPIL_1]], [[MUL15_EPIL_1]]
	; CHECK-NEXT: [[ADD22_EPIL_1:%.*]] = add i32 [[ADD16_EPIL_1]], [[CONV18_EPIL_1]]
	; CHECK-NEXT: store i32 [[ADD22_EPIL_1]], i32* [[INCDEC_PTR_EPIL]], align 4
	; CHECK-NEXT: [[ADD_PTR_EPIL_1]] = getelementptr inbounds i16, i16* [[ADD_PTR_EPIL]], i32 4
	; CHECK-NEXT: [[ADD_PTR23_EPIL_1]] = getelementptr inbounds i16, i16* [[ADD_PTR23_EPIL]], i32 4
	; CHECK-NEXT: [[INCDEC_PTR_EPIL_1]] = getelementptr inbounds i32, i32* [[INCDEC_PTR_EPIL]], i32 1
	; CHECK-NEXT: [[ADD24_EPIL_1:%.*]] = add nuw nsw i32 [[ADD24_EPIL]], 4
	; CHECK-NEXT: [[EPIL_ITER_SUB_1:%.*]] = sub i32 [[EPIL_ITER_SUB]], 1
	; CHECK-NEXT: [[EPIL_ITER_CMP_1:%.*]] = icmp ne i32 [[EPIL_ITER_SUB_1]], 0
	; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_1]], label [[FOR_BODY3_EPIL_2]], label [[FOR_END_LOOPEXIT_EPILOG_LCSSA]]
	; CHECK: for.body3.epil.2:
	; CHECK-NEXT: [[TMP70:%.*]] = load i16, i16* [[ADD_PTR_EPIL_1]], align 2
	; CHECK-NEXT: [[CONV_EPIL_2:%.*]] = sext i16 [[TMP70]] to i32
	; CHECK-NEXT: [[TMP71:%.*]] = load i16, i16* [[ADD_PTR23_EPIL_1]], align 2
	; CHECK-NEXT: [[CONV5_EPIL_2:%.*]] = sext i16 [[TMP71]] to i32
	; CHECK-NEXT: [[MUL_EPIL_2:%.*]] = mul nsw i32 [[CONV5_EPIL_2]], [[CONV_EPIL_2]]
	; CHECK-NEXT: [[ARRAYIDX6_EPIL_2:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_EPIL_1]], i32 1
	; CHECK-NEXT: [[TMP72:%.*]] = load i16, i16* [[ARRAYIDX6_EPIL_2]], align 2
	; CHECK-NEXT: [[CONV7_EPIL_2:%.*]] = sext i16 [[TMP72]] to i32
	; CHECK-NEXT: [[ARRAYIDX8_EPIL_2:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR23_EPIL_1]], i32 1
	; CHECK-NEXT: [[TMP73:%.*]] = load i16, i16* [[ARRAYIDX8_EPIL_2]], align 2
	; CHECK-NEXT: [[CONV9_EPIL_2:%.*]] = sext i16 [[TMP73]] to i32
	; CHECK-NEXT: [[MUL10_EPIL_2:%.*]] = mul nsw i32 [[CONV9_EPIL_2]], [[CONV7_EPIL_2]]
	; CHECK-NEXT: [[ARRAYIDX11_EPIL_2:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_EPIL_1]], i32 2
	; CHECK-NEXT: [[TMP74:%.*]] = load i16, i16* [[ARRAYIDX11_EPIL_2]], align 2
	; CHECK-NEXT: [[CONV12_EPIL_2:%.*]] = sext i16 [[TMP74]] to i32
	; CHECK-NEXT: [[ARRAYIDX13_EPIL_2:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR23_EPIL_1]], i32 3
	; CHECK-NEXT: [[TMP75:%.*]] = load i16, i16* [[ARRAYIDX13_EPIL_2]], align 2
	; CHECK-NEXT: [[CONV14_EPIL_2:%.*]] = sext i16 [[TMP75]] to i32
	; CHECK-NEXT: [[MUL15_EPIL_2:%.*]] = mul nsw i32 [[CONV14_EPIL_2]], [[CONV12_EPIL_2]]
	; CHECK-NEXT: [[ARRAYIDX17_EPIL_2:%.*]] = getelementptr inbounds i16, i16* [[ADD_PTR_EPIL_1]], i32 3
	; CHECK-NEXT: [[TMP76:%.*]] = load i16, i16* [[ARRAYIDX17_EPIL_2]], align 2
	; CHECK-NEXT: [[CONV18_EPIL_2:%.*]] = sext i16 [[TMP76]] to i32
	; CHECK-NEXT: [[ADD21_EPIL_2:%.*]] = add i32 [[MUL10_EPIL_2]], [[MUL_EPIL_2]]
	; CHECK-NEXT: [[ADD_EPIL_2:%.*]] = add i32 [[ADD21_EPIL_2]], [[CONV14_EPIL_2]]
	; CHECK-NEXT: [[ADD16_EPIL_2:%.*]] = add i32 [[ADD_EPIL_2]], [[MUL15_EPIL_2]]
	; CHECK-NEXT: [[ADD22_EPIL_2:%.*]] = add i32 [[ADD16_EPIL_2]], [[CONV18_EPIL_2]]
	; CHECK-NEXT: store i32 [[ADD22_EPIL_2]], i32* [[INCDEC_PTR_EPIL_1]], align 4
	; CHECK-NEXT: [[ADD_PTR_EPIL_2]] = getelementptr inbounds i16, i16* [[ADD_PTR_EPIL_1]], i32 4
	; CHECK-NEXT: [[ADD_PTR23_EPIL_2]] = getelementptr inbounds i16, i16* [[ADD_PTR23_EPIL_1]], i32 4
	; CHECK-NEXT: [[INCDEC_PTR_EPIL_2]] = getelementptr inbounds i32, i32* [[INCDEC_PTR_EPIL_1]], i32 1
	; CHECK-NEXT: [[ADD24_EPIL_2:%.*]] = add nuw nsw i32 [[ADD24_EPIL_1]], 4
	; CHECK-NEXT: [[EPIL_ITER_SUB_2:%.*]] = sub i32 [[EPIL_ITER_SUB_1]], 1
	; CHECK-NEXT: br label [[FOR_END_LOOPEXIT_EPILOG_LCSSA]]
	; CHECK: for.body29.prol.1:
	; CHECK-NEXT: [[ARRAYIDX30_PROL_1:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR36_PROL]], i32 [[INC_PROL]]
	; CHECK-NEXT: [[TMP77:%.*]] = load i16, i16* [[ARRAYIDX30_PROL_1]], align 2
	; CHECK-NEXT: [[CONV31_PROL_1:%.*]] = sext i16 [[TMP77]] to i32
	; CHECK-NEXT: [[ARRAYIDX32_PROL_1:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR37_PROL]], i32 [[INC_PROL]]
	; CHECK-NEXT: [[TMP78:%.*]] = load i16, i16* [[ARRAYIDX32_PROL_1]], align 2
	; CHECK-NEXT: [[CONV33_PROL_1:%.*]] = sext i16 [[TMP78]] to i32
	; CHECK-NEXT: [[MUL34_PROL_1:%.*]] = mul nsw i32 [[CONV33_PROL_1]], [[CONV31_PROL_1]]
	; CHECK-NEXT: [[TMP79:%.*]] = load i32, i32* [[INCDEC_PTR38_PROL]], align 4
	; CHECK-NEXT: [[ADD35_PROL_1:%.*]] = add nsw i32 [[MUL34_PROL_1]], [[TMP79]]
	; CHECK-NEXT: store i32 [[ADD35_PROL_1]], i32* [[INCDEC_PTR38_PROL]], align 4
	; CHECK-NEXT: [[INCDEC_PTR36_PROL_1]] = getelementptr inbounds i16, i16* [[INCDEC_PTR36_PROL]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR37_PROL_1]] = getelementptr inbounds i16, i16* [[INCDEC_PTR37_PROL]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR38_PROL_1]] = getelementptr inbounds i32, i32* [[INCDEC_PTR38_PROL]], i32 1
	; CHECK-NEXT: [[INC_PROL_1]] = add nuw i32 [[INC_PROL]], 1
	; CHECK-NEXT: [[PROL_ITER_SUB_1:%.*]] = sub i32 [[PROL_ITER_SUB]], 1
	; CHECK-NEXT: [[PROL_ITER_CMP_1:%.*]] = icmp ne i32 [[PROL_ITER_SUB_1]], 0
	; CHECK-NEXT: br i1 [[PROL_ITER_CMP_1]], label [[FOR_BODY29_PROL_2]], label [[FOR_BODY29_PROL_LOOPEXIT_UNR_LCSSA]]
	; CHECK: for.body29.prol.2:
	; CHECK-NEXT: [[ARRAYIDX30_PROL_2:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR36_PROL_1]], i32 [[INC_PROL_1]]
	; CHECK-NEXT: [[TMP80:%.*]] = load i16, i16* [[ARRAYIDX30_PROL_2]], align 2
	; CHECK-NEXT: [[CONV31_PROL_2:%.*]] = sext i16 [[TMP80]] to i32
	; CHECK-NEXT: [[ARRAYIDX32_PROL_2:%.*]] = getelementptr inbounds i16, i16* [[INCDEC_PTR37_PROL_1]], i32 [[INC_PROL_1]]
	; CHECK-NEXT: [[TMP81:%.*]] = load i16, i16* [[ARRAYIDX32_PROL_2]], align 2
	; CHECK-NEXT: [[CONV33_PROL_2:%.*]] = sext i16 [[TMP81]] to i32
	; CHECK-NEXT: [[MUL34_PROL_2:%.*]] = mul nsw i32 [[CONV33_PROL_2]], [[CONV31_PROL_2]]
	; CHECK-NEXT: [[TMP82:%.*]] = load i32, i32* [[INCDEC_PTR38_PROL_1]], align 4
	; CHECK-NEXT: [[ADD35_PROL_2:%.*]] = add nsw i32 [[MUL34_PROL_2]], [[TMP82]]
	; CHECK-NEXT: store i32 [[ADD35_PROL_2]], i32* [[INCDEC_PTR38_PROL_1]], align 4
	; CHECK-NEXT: [[INCDEC_PTR36_PROL_2]] = getelementptr inbounds i16, i16* [[INCDEC_PTR36_PROL_1]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR37_PROL_2]] = getelementptr inbounds i16, i16* [[INCDEC_PTR37_PROL_1]], i32 1
	; CHECK-NEXT: [[INCDEC_PTR38_PROL_2]] = getelementptr inbounds i32, i32* [[INCDEC_PTR38_PROL_1]], i32 1
	; CHECK-NEXT: [[INC_PROL_2]] = add nuw i32 [[INC_PROL_1]], 1
	; CHECK-NEXT: [[PROL_ITER_SUB_2:%.*]] = sub i32 [[PROL_ITER_SUB_1]], 1
	; CHECK-NEXT: br label [[FOR_BODY29_PROL_LOOPEXIT_UNR_LCSSA]]
	;			;
	entry:			entry:
	%cmp88 = icmp eq i32 %blkCnt, 0			%cmp88 = icmp eq i32 %blkCnt, 0
	br i1 %cmp88, label %for.cond.cleanup, label %for.body			br i1 %cmp88, label %for.cond.cleanup, label %for.body

	for.cond.cleanup: ; preds = %for.end40, %entry			for.cond.cleanup: ; preds = %for.end40, %entry
	ret void			ret void

	for.end40: ; preds = %for.end40.loopexit, %for.end			for.end40: ; preds = %for.end40.loopexit, %for.end
	%pSrcB.addr.2.lcssa = phi i16* [ %pSrcB.addr.1.lcssa, %for.end ], [ %scevgep93, %for.end40.loopexit ]			%pSrcB.addr.2.lcssa = phi i16* [ %pSrcB.addr.1.lcssa, %for.end ], [ %scevgep93, %for.end40.loopexit ]
	%pSrcA.addr.2.lcssa = phi i16* [ %pSrcA.addr.1.lcssa, %for.end ], [ %scevgep, %for.end40.loopexit ]			%pSrcA.addr.2.lcssa = phi i16* [ %pSrcA.addr.1.lcssa, %for.end ], [ %scevgep, %for.end40.loopexit ]
	%pDest.addr.2.lcssa = phi i32* [ %pDest.addr.1.lcssa, %for.end ], [ %scevgep94, %for.end40.loopexit ]			%pDest.addr.2.lcssa = phi i32* [ %pDest.addr.1.lcssa, %for.end ], [ %scevgep94, %for.end40.loopexit ]
	%inc42 = add nuw i32 %i.092, 1			%inc42 = add nuw i32 %i.092, 1
	%exitcond95 = icmp eq i32 %inc42, %blkCnt			%exitcond95 = icmp eq i32 %inc42, %blkCnt
	br i1 %exitcond95, label %for.cond.cleanup, label %for.body			br i1 %exitcond95, label %for.cond.cleanup, label %for.body
	}			}

				attributes #0 = { minsize optsize }