As code size is the only thing we care about at minsize, query the cost of materialising immediates when calculating the cost of a SCEV expansion. We also modify the CostKind to TCK_CodeSize for minsize, instead of RecipThroughput. This gives -0.1% geomean reduction of the llvm test suite at -Oz for both thumbv7a and aarch64.
Details
Diff Detail
Event Timeline
Hmm, i'm not a ventillator. Some initial thoughts.
- This is rather pessimistic. If we really want to do this, we need to use TargetTransformInfo::getIntImmCostInst().
- What cost does that model? I'm under impression that TargetTransformInfo::getIntImmCost*() model TargetCostKind::TCK_CodeSize, since it is mainly used in ConstantHoistingPass. Here we model TargetCostKind::TCK_RecipThroughput.
- Assuming that TargetTransformInfo::getIntImmCost*() actually models TargetCostKind::TCK_RecipThroughput, i believe, all X86 changes should not be here. Either most of them will get fixed via using TargetTransformInfo::getIntImmCostInst(), or the llvm::SCEVCheapExpansionBudget will need to be bumped.
@samparker are you trying to mitigate perf impact for in-order CPU?
I wonder if D73501 simply is counter-productive then.
Though yes, vectorizers need to be taught that trick.
Thanks for taking a look.
This is rather pessimistic. If we really want to do this, we need to use TargetTransformInfo::getIntImmCostInst().
This sounds good and what I was also considering.
What cost does that model?
I was under the impression that throughput/codesize costs for immediates would be highly correlated, where a 'high cost' constant would introduce instruction(s) to generate it, increasing code size and reducing throughput. As @spatel said in D76124, the lines have become blurred but we should be modelling at least something here.
I wonder if D73501 simply is counter-productive then.
I would be lying if I said that the patch didn't cause a whole world of pain :) But I would like to try to resolve this, if we can, by modelling costs better. The SCEV changes may have just broken our we do unrolling for our little microcontrollers, so I'll be looking at ARM TTI too.
Do we model rthrougput for constants elsewhere?
I may be wrong, but i don't recall seeing any such modelling previously.
Let's take a step back here. How about we simply revert D73501?
Do we model rthrougput for constants elsewhere?
I guess that depends on who 'we' are because, again, different backends will be modelling different things. Why the focus on throughput now anyway? When this code was using getOperationCost, it would have been getting some performance/code size cost, which I expect most backends will be modelling for constants. And I'm not really sure why throughput is specifically important here, we're not concerning ourselves with the throughput of casts and compares, right?
Let's take a step back here. How about we simply revert D73501?
Sounds good to me, although I still need to prod around here to see if codegen risk being erratic in the future.
Sorry, i meant llvm transforms in general, not backends.
Why the focus on throughput now anyway? When this code was using getOperationCost,
it would have been getting some performance/code size cost,
which I expect most backends will be modelling for constants.
And I'm not really sure why throughput is specifically important here,
we're not concerning ourselves with the throughput of casts and compares, right?
We have 3 cost models - latency, size and rthroughput.
My aspirational goal in this scevexpander budget was: "how much more computations are we willing to do without it being too much of a burden?"
Size model isn't really applicable here - we *could* lower any sequence into a libcall, regardless of it's native instruction count.
Likewise i'm not sure we're really after latency here, which leaves us with rthroughput.
But we can't subtract oranges from cucumbers, so all cost modelling should be consistently using rthroughput cost model.
Thus i'm asking, what does getIntImmCost() model? rthroughput or size?
Let's take a step back here. How about we simply revert D73501?
Sounds good to me, although I still need to prod around here to see if codegen risk being erratic in the future.
I'm pretty sure there's only two APIs that were designed around throughput: 'getInstructionThroughput' and 'getArithmeticInstrCost' and maybe the latter should be named more explicitly. The code size calls are generally a bit more explicit, so there's 'getIntImmCodeSizeCost', which is not the one I've used.
"how much more computations are we willing to do without it being too much of a burden?"
And this also depends what we're considering as a 'burden', and another angle that I'll probably need to look at in the near future is the burden of code size... Either way, this code shouldn't assume that constants are free (in any sense of the term).
Introduced a lambda to look at an expressions operand and, if it's a constant, query for cost using getIntImmCostInst.
The API is there to figure out whether the constant will be folded into the given instruction, otherwise there will be some 'cost' to materialize it. Having to generate instruction(s) for the materialization is likely to increase code size, but more importantly, reduce throughput and increase latency - which is why it's sometimes beneficial to hoist expensive constants out of loops.
Rebased. Thanks for reverting rewriteLoopExitValues @lebedev.ri, but I'm still seeing value in this patch. I'm not seeing any changes when running the test suite on my X86 box, but for Arm's DSP suite this affects 42/166 benchmarks for Thumb1 and 30/166 for Thumb2. Out of those changes I'm seeing a 1.6% geomean improvement for both targets.
Now only performing the checks when optimising for minsize.
thumbv7a results:
Metric: size..text Program master scev-expander diff test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 47596 46528 -2.2% test-suite :: MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test 6484 6340 -2.2% test-suite :: MultiSource/Benchmarks/VersaBench/bmm/bmm.test 924 908 -1.7% test-suite :: MultiSource/Benchmarks/Rodinia/backprop/backprop.test 2916 2868 -1.6% test-suite :: MultiSource/Benchmarks/McCat/05-eks/eks.test 4748 4696 -1.1% test-suite :: MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url.test 3120 3088 -1.0% test-suite :: SingleSource/Benchmarks/Misc/himenobmtxpa.test 2408 2384 -1.0% test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test 19920 19748 -0.9% test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test 19920 19748 -0.9% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test 32516 32236 -0.9% test-suite :: SingleSource/Benchmarks/Shootout/Shootout-lists.test 956 964 0.8% test-suite :: MultiSource/Benchmarks/Ptrdist/ft/ft.test 2920 2896 -0.8% test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test 14600 14480 -0.8% test-suite :: SingleSource/Benchmarks/McGill/queens.test 976 968 -0.8% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test 31508 31252 -0.8% test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 68296 67752 -0.8% test-suite :: SingleSource/Benchmarks/Misc/whetstone.test 1604 1592 -0.7% test-suite :: MultiSource/Benchmarks/SciMark2-C/scimark2.test 4936 4904 -0.6% test-suite :: MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test 23836 23684 -0.6% test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test 27552 27384 -0.6% test-suite :: SingleSource/Benchmarks/BenchmarkGame/puzzle.test 1320 1312 -0.6% test-suite :: SingleSource/Benchmarks/Stanford/Oscar.test 1348 1340 -0.6% test-suite :: SingleSource/Benchmarks/Stanford/Puzzle.test 1376 1368 -0.6% test-suite :: MultiSource/Applications/minisat/minisat.test 9980 9924 -0.6% test-suite :: SingleSource/Benchmarks/Misc/ReedSolomon.test 3008 3024 0.5% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 69828 69460 -0.5% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test 20116 20012 -0.5% test-suite :: SingleSource/Benchmarks/Misc-C++/bigfib.test 3332 3316 -0.5% test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 168600 167800 -0.5% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test 26104 25984 -0.5% test-suite :: MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk.test 3592 3576 -0.4% test-suite :: MultiSource/Applications/oggenc/oggenc.test 90092 89700 -0.4% test-suite :: MultiSource/Benchmarks/VersaBench/beamformer/beamformer.test 2788 2776 -0.4% test-suite :: SingleSource/Benchmarks/McGill/misr.test 1864 1872 0.4% test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test 7540 7508 -0.4% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 318860 317620 -0.4% test-suite :: MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk.test 2176 2168 -0.4% test-suite :: MultiSource/Benchmarks/Fhourstones/fhourstones.test 4828 4812 -0.3% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 29164 29068 -0.3% test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 24400 24320 -0.3% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 5344 5328 -0.3% test-suite :: MultiSource/Applications/spiff/spiff.test 12184 12152 -0.3% test-suite :: MultiSource/Applications/sgefa/sgefa.test 6096 6080 -0.3% test-suite :: MultiSource/Applications/SPASS/SPASS.test 201512 201000 -0.3% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 128336 128016 -0.2% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench.test 9640 9616 -0.2% test-suite :: MultiSource/Applications/ClamAV/clamscan.test 246116 245556 -0.2% test-suite :: MultiSource/Applications/lua/lua.test 61520 61392 -0.2% test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test 40916 40832 -0.2% test-suite :: MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael.test 7796 7780 -0.2% test-suite :: SingleSource/Benchmarks/Misc/flops.test 3936 3928 -0.2% test-suite :: MultiSource/Applications/obsequi/Obsequi.test 17468 17436 -0.2% test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 89716 89552 -0.2% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 324352 323820 -0.2% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 296700 296220 -0.2% test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test 2524 2520 -0.2% test-suite :: MultiSource/Benchmarks/Ptrdist/bc/bc.test 20760 20728 -0.2% test-suite :: MultiSource/Benchmarks/nbench/nbench.test 16872 16848 -0.1% test-suite :: MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1.test 2948 2952 0.1% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/XSBench/XSBench.test 6084 6076 -0.1% test-suite :: MultiSource/Applications/d/make_dparser.test 43812 43756 -0.1% test-suite :: MultiSource/Applications/SIBsim4/SIBsim4.test 20152 20176 0.1% test-suite :: MultiSource/Applications/hexxagon/hexxagon.test 7252 7244 -0.1% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/SimpleMOC/SimpleMOC.test 14828 14812 -0.1% test-suite :: MicroBenchmarks/LCALS/SubsetCLambdaLoops/lcalsCLambda.test 92020 91924 -0.1% test-suite :: MultiSource/Benchmarks/VersaBench/dbms/dbms.test 8228 8220 -0.1% test-suite :: MultiSource/Benchmarks/sim/sim.test 8704 8696 -0.1% test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 168288 168136 -0.1% test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test 91924 91844 -0.1% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test 218764 218580 -0.1% test-suite :: MicroBenchmarks/harris/harris.test 62584 62536 -0.1% test-suite :: MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test 11224 11216 -0.1% test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test 90788 90724 -0.1% test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test 12352 12344 -0.1% test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 62016 61976 -0.1% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test 7120 7116 -0.1% test-suite :: MicroBenchmarks/ImageProcessing/BilateralFiltering/BilateralFilter.test 61768 61736 -0.1% test-suite :: MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test 61800 61768 -0.1% test-suite :: MicroBenchmarks/ImageProcessing/Blur/blur.test 62136 62104 -0.1% test-suite :: MicroBenchmarks/ImageProcessing/Dither/Dither.test 62440 62408 -0.1% test-suite :: MultiSource/Applications/siod/siod.test 47092 47068 -0.1% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 60088 60064 -0.0%
aarch64 results:
Metric: size..text Program master scev-expander diff test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 72100 68220 -5.4% test-suite :: SingleSource/Benchmarks/Misc/whetstone.test 2092 2004 -4.2% test-suite :: MultiSource/Benchmarks/Rodinia/backprop/backprop.test 2588 2500 -3.4% test-suite :: MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test 8420 8196 -2.7% test-suite :: SingleSource/Benchmarks/Stanford/Puzzle.test 2212 2156 -2.5% test-suite :: SingleSource/Benchmarks/Stanford/Oscar.test 1652 1620 -1.9% test-suite :: MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url.test 2948 2908 -1.4% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test 43972 43428 -1.2% test-suite :: MultiSource/Benchmarks/Ptrdist/ft/ft.test 3668 3628 -1.1% test-suite :: MultiSource/Benchmarks/McCat/05-eks/eks.test 5876 5812 -1.1% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test 22276 22068 -0.9% test-suite :: MultiSource/Benchmarks/Olden/em3d/em3d.test 2812 2836 0.9% test-suite :: MultiSource/Applications/SIBsim4/SIBsim4.test 26804 26580 -0.8% test-suite :: SingleSource/Benchmarks/BenchmarkGame/puzzle.test 980 972 -0.8% test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test 9132 9060 -0.8% test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test 26220 26020 -0.8% test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test 26220 26020 -0.8% test-suite :: SingleSource/Benchmarks/Misc/himenobmtxpa.test 3156 3132 -0.8% test-suite :: MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk.test 3156 3132 -0.8% test-suite :: MultiSource/Benchmarks/SciMark2-C/scimark2.test 5460 5420 -0.7% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test 34852 34628 -0.6% test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 99124 98540 -0.6% test-suite :: MultiSource/Applications/oggenc/oggenc.test 107024 106456 -0.5% test-suite :: SingleSource/Benchmarks/McGill/queens.test 1516 1508 -0.5% test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test 36012 35836 -0.5% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 82264 81864 -0.5% test-suite :: SingleSource/Benchmarks/McGill/misr.test 1668 1660 -0.5% test-suite :: MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1.test 3372 3356 -0.5% test-suite :: MultiSource/Applications/minisat/minisat.test 13836 13772 -0.5% test-suite :: SingleSource/Benchmarks/Misc/ReedSolomon.test 3460 3444 -0.5% test-suite :: MultiSource/Benchmarks/Fhourstones/fhourstones.test 3996 3980 -0.4% test-suite :: SingleSource/Benchmarks/Misc/flops.test 4116 4100 -0.4% test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 27396 27292 -0.4% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench.test 8852 8820 -0.4% test-suite :: MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test 32168 32056 -0.3% test-suite :: SingleSource/Benchmarks/Misc-C++/bigfib.test 4892 4876 -0.3% test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test 19708 19644 -0.3% test-suite :: MultiSource/Benchmarks/nbench/nbench.test 20916 20852 -0.3% test-suite :: MultiSource/Applications/spiff/spiff.test 16028 15980 -0.3% test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 203964 203460 -0.2% test-suite :: SingleSource/Benchmarks/Misc-C++/Large/sphereflake.test 3380 3372 -0.2% test-suite :: MultiSource/Applications/SPASS/SPASS.test 288136 287464 -0.2% test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test 3596 3588 -0.2% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 161852 161492 -0.2% test-suite :: MultiSource/Benchmarks/sim/sim.test 11044 11020 -0.2% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 41156 41068 -0.2% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 383100 382340 -0.2% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test 28852 28796 -0.2% test-suite :: MultiSource/Benchmarks/Ptrdist/bc/bc.test 27236 27188 -0.2% test-suite :: MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael.test 9132 9116 -0.2% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 5444 5436 -0.1% test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test 16428 16404 -0.1% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 471400 470712 -0.1% test-suite :: MultiSource/Benchmarks/VersaBench/dbms/dbms.test 11324 11308 -0.1% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 378596 378116 -0.1% test-suite :: MicroBenchmarks/harris/harris.test 104836 104708 -0.1% test-suite :: MultiSource/Applications/sgefa/sgefa.test 6812 6804 -0.1% test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 244840 244568 -0.1% test-suite :: MultiSource/Applications/obsequi/Obsequi.test 21980 21956 -0.1% test-suite :: MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test 15124 15108 -0.1% test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 116900 116780 -0.1% test-suite :: MultiSource/Applications/lua/lua.test 90508 90420 -0.1% test-suite :: MultiSource/Applications/ClamAV/clamscan.test 331984 331680 -0.1% test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test 54364 54316 -0.1% test-suite :: MultiSource/Applications/siod/siod.test 64448 64392 -0.1% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 83348 83276 -0.1% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test 9548 9540 -0.1% test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 86428 86356 -0.1% test-suite :: MultiSource/Applications/hexxagon/hexxagon.test 10180 10172 -0.1% test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test 142752 142656 -0.1% test-suite :: MicroBenchmarks/LCALS/SubsetCLambdaLoops/lcalsCLambda.test 142792 142712 -0.1% test-suite :: MultiSource/Applications/d/make_dparser.test 62912 62880 -0.1% test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test 139296 139232 -0.0%
That are really nice code size savings!
As only the improvements are shown, just curious if there are no regressions? Thus, this is overall an improvement too?
And for extra confidence and as the numbers are easy to obtain, probably best to get numbers for x86 too?
Thanks!. That's the full set of results, there's a few little regressions for thumb and I think only one for aarch64... I'll get the X86 numbers now.
Similar story on X86 as well:
Metric: size..text Program master scev-expander diff test-suite :: SingleSource/Benchmarks/Misc/whetstone.test 2437 2325 -4.6% test-suite :: SingleSource/Benchmarks/Stanford/Oscar.test 1637 1573 -3.9% test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 58885 56645 -3.8% test-suite :: SingleSource/Benchmarks/BenchmarkGame/puzzle.test 821 805 -1.9% test-suite :: MultiSource/Benchmarks/McCat/05-eks/eks.test 5845 5733 -1.9% test-suite :: MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test 8597 8437 -1.9% test-suite :: MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url.test 2661 2613 -1.8% test-suite :: SingleSource/Benchmarks/Stanford/Puzzle.test 1781 1749 -1.8% test-suite :: MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1.test 2917 2869 -1.6% test-suite :: SingleSource/Benchmarks/Misc/himenobmtxpa.test 3253 3205 -1.5% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test 47093 46469 -1.3% test-suite :: MultiSource/Benchmarks/Rodinia/backprop/backprop.test 2485 2453 -1.3% test-suite :: SingleSource/Benchmarks/McGill/queens.test 1301 1285 -1.2% test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test 20485 20245 -1.2% test-suite :: SingleSource/Benchmarks/McGill/misr.test 1525 1509 -1.0% test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test 9221 9125 -1.0% test-suite :: MultiSource/Benchmarks/Ptrdist/ft/ft.test 3205 3173 -1.0% test-suite :: MultiSource/Benchmarks/Fhourstones/fhourstones.test 3365 3333 -1.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 88756 88148 -0.7% test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test 28709 28517 -0.7% test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test 28709 28517 -0.7% test-suite :: MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk.test 2501 2517 0.6% test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 33429 33221 -0.6% test-suite :: MultiSource/Benchmarks/SciMark2-C/scimark2.test 5445 5413 -0.6% test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test 34293 34101 -0.6% test-suite :: SingleSource/Benchmarks/Misc/flops.test 6117 6085 -0.5% test-suite :: MultiSource/Applications/spiff/spiff.test 15413 15333 -0.5% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test 37109 36917 -0.5% test-suite :: SingleSource/Benchmarks/Misc/ReedSolomon.test 3109 3125 0.5% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 6453 6421 -0.5% test-suite :: MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test 29668 29524 -0.5% test-suite :: MultiSource/Benchmarks/VersaBench/beamformer/beamformer.test 3461 3445 -0.5% test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test 3493 3477 -0.5% test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 95733 95349 -0.4% test-suite :: SingleSource/Benchmarks/Misc-C++/Large/sphereflake.test 4181 4165 -0.4% test-suite :: MultiSource/Applications/minisat/minisat.test 13413 13365 -0.4% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 417589 416293 -0.3% test-suite :: MultiSource/Benchmarks/VersaBench/dbms/dbms.test 10645 10613 -0.3% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench.test 11173 11205 0.3% test-suite :: MultiSource/Applications/oggenc/oggenc.test 115716 115396 -0.3% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 171989 171525 -0.3% test-suite :: MultiSource/Benchmarks/sim/sim.test 11957 11925 -0.3% test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 221205 220629 -0.3% test-suite :: MultiSource/Benchmarks/Ptrdist/bc/bc.test 25061 24997 -0.3% test-suite :: MultiSource/Applications/obsequi/Obsequi.test 21141 21093 -0.2% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 43445 43349 -0.2% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 444644 443812 -0.2% test-suite :: MultiSource/Applications/SIBsim4/SIBsim4.test 27477 27525 0.2% test-suite :: MultiSource/Applications/lua/lua.test 83317 83173 -0.2% test-suite :: MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael.test 10389 10373 -0.2% test-suite :: MultiSource/Benchmarks/nbench/nbench.test 20981 20949 -0.2% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test 10821 10837 0.1% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test 23077 23045 -0.1% test-suite :: MultiSource/Applications/SPASS/SPASS.test 284754 284386 -0.1% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 400821 400325 -0.1% test-suite :: MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test 14341 14325 -0.1% test-suite :: MultiSource/Applications/ClamAV/clamscan.test 352852 352484 -0.1% test-suite :: MicroBenchmarks/LCALS/SubsetCLambdaLoops/lcalsCLambda.test 141716 141572 -0.1% test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test 16037 16021 -0.1% test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test 141636 141508 -0.1% test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test 56085 56037 -0.1% test-suite :: MultiSource/Applications/siod/siod.test 60580 60532 -0.1% test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 241315 241139 -0.1% test-suite :: MicroBenchmarks/harris/harris.test 99381 99317 -0.1% test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test 139956 139876 -0.1% test-suite :: MultiSource/Applications/d/make_dparser.test 61347 61315 -0.1% test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 138581 138645 0.0% test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test 138932 138868 -0.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test 36805 36789 -0.0% test-suite :: MicroBenchmarks/ImageProcessing/Blur/blur.test 98981 98949 -0.0% test-suite :: MicroBenchmarks/ImageProcessing/Dither/Dither.test 99285 99253 -0.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test 321508 321412 -0.0% test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test 134756 134724 -0.0% test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 84661 84677 0.0% test-suite :: MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test 98453 98437 -0.0% test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test 99397 99381 -0.0% test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test 102437 102421 -0.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test 391620 391572 -0.0% test-suite :: MicroBenchmarks/LCALS/SubsetBLambdaLoops/lcalsBLambda.test 134788 134772 -0.0% test-suite :: MicroBenchmarks/MemFunctions/MemFunctions.test 177893 177877 -0.0% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 296165 296149 -0.0% test-suite :: MultiSource/Applications/kimwitu++/kc.test 297476 297460 -0.0% Geomean difference -0.1%
Nice one!
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp | ||
---|---|---|
2383–2384 | This change here looks like an exact duplication of the change above (lines 2355 - 2362). Can this be in a helper? |
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp | ||
---|---|---|
2383–2384 | Yeah, that would be nicer. |
Apologies if i'm missing the point here.
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp | ||
---|---|---|
2272 | s/get/accountFor/ | |
2282–2285 | I'm not convinced this modelling is correct. If we have SCEV x + y + 42, 42 will be modelled as-if it's at index 2, | |
2286–2289 | Same | |
2290–2297 | And again, if we have umin(x, y, 42), it's lowered as z = (x u< y) ? x : y ; z u< 42 ? z : 42, | |
2305–2306 | enumerate(NAry->operands()) | |
2384–2385 | enumerate(NAry->operands()) |
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp | ||
---|---|---|
2282–2285 | Ah! Right, I'll try to fix that. |
Changed how operands are visited and costed, hopefully now translating the SCEV operand index correctly to an Instruction index.
With @lebedev.ri last comment fixed, this now looks good to me. Please wait a day or so just in case there are more comments.
I think this is going in the wrong direction.
I think the worklist needs to be changed, to be a struct { unsigned ParentOpcode; int OperandIdx; SCEV* S; },
then everything should suddenly become less convoluted/hand-wavey.
Yeah, that sounds better, I'll put together a separate patch. Are you happy with the operand index clamping though?
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp | ||
---|---|---|
2305 | Now I notice that I wouldn't have been handling AddRec expressions... So should these operands be added to the worklist for both Add and Mul or would just Add be okay? |
All this is (becoming?) so incredibly fragile...
Have you checked what happens if you simply make isHighCostExpansionHelper() always return true for -Oz? :)
This really needs refactoring/generalization.
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp | ||
---|---|---|
2305 | Given A + B*x, you'd want to model A as being at index 1, And for higher orders A + B*x + C*x^2, again, B and C are part of multiply, So i think the generalization is that all nary operands except the first one are at index 1 of mul, |
This really needs refactoring/generalization.
I will try...
Have you checked what happens if you simply make isHighCostExpansionHelper() always return true for -Oz? :)
I thought I did, but actually it was just for rewriting loop exit values... I will run the numbers though! But I'm also interested in whether this can still be beneficial for execution speed too.
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp | ||
---|---|---|
2305 | Okay, thanks. |
I just ran some numbers for when -Oz == HighCost and it's interesting... For Arm it's not good:
master minsize-high-cost-expand diff count 310.000000 310.000000 310.000000 mean 19710.077419 19662.567742 0.000666 std 48882.246111 48773.870505 0.010779 min 292.000000 292.000000 -0.042781 25% 1090.000000 1084.000000 -0.002887 50% 2776.000000 2784.000000 0.000000 75% 10185.000000 10138.000000 0.002424 max 324060.000000 323508.000000 0.058974 Geomean difference 0.1%
For AArch64 it's okay, but there are still plenty of large regressions:
master minsize-high-cost-expand diff count 310.000000 310.000000 310.000000 mean 26712.283871 26650.709677 -0.000930 std 66147.455977 66033.469245 0.010996 min 476.000000 476.000000 -0.050374 25% 1294.000000 1304.000000 -0.003676 50% 3168.000000 3168.000000 0.000000 75% 11252.000000 11216.000000 0.001597 max 471648.000000 470888.000000 0.051485 Geomean difference -0.1%
But for X86, it's great:
master minsize-high-cost-expand diff count 313.000000 313.000000 313.000000 mean 27472.361022 27354.431310 -0.005084 std 68350.278069 68101.639463 0.013028 min 389.000000 389.000000 -0.065654 25% 1157.000000 1157.000000 -0.007393 50% 2933.000000 2949.000000 0.000000 75% 11093.000000 11077.000000 0.000000 max 444676.000000 442660.000000 0.039900 Geomean difference -0.5%
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp | ||
---|---|---|
2383–2384 | Doesn't look missing to me? |
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp | ||
---|---|---|
2383–2384 | Ah, thanks! I'm getting lost amongst the patches. |
Rebased on top of D86050 and introduced a struct to map operands for the worklist, instead of just storing the opcode.
@samparker why did you commit this?
I have not finished reviewing this, and i've requested changes to the previous revisions.
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp | ||
---|---|---|
2187–2204 | This needs a better comment. | |
2189 | Not llvm coding style in any case. | |
2192–2193 | SCEVOperand::OperandIdx is int | |
2233 | Please do mark done comments as such. | |
2349 | Consider constants to be free unless we are optforsize | |
2352–2353 | These are only used in a single place | |
2357 | This is inconsistent with every other return here. |
We had gone through many revisions where I've addressed all of your (helpful) comments and I hadn't heard anything else for three weeks. I sincerely expected the remaining issues to be style changes which seem appropriate for a post-commit review.
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp | ||
---|---|---|
2192–2193 | But when we enumerate the SCEV operands, the index is size_t and the types need to be compatible for std::max and min. |
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp | ||
---|---|---|
2192–2193 | int MinIdx = std::max((int)SCEVOp.index(), CostOp.MinIdx); int OpIdx = std::min(MinIdx, CostOp.MaxIdx); |
Not llvm coding style in any case.
These should likely be Min/Max