Adds a simple fast-path check for the pattern:
v = load ptr store v to ptr
I took the tests from the bugzilla post, I can add more if needed (but I think these should be sufficent).
Differential D79391
[DSE] Remove noop stores in MSSA. zoecarver on May 4 2020, 9:01 PM. Authored by
Details Adds a simple fast-path check for the pattern: v = load ptr store v to ptr I took the tests from the bugzilla post, I can add more if needed (but I think these should be sufficent).
Diff Detail
Event Timeline
Comment Actions Thanks for working on this! Some comments inline. Another transform that might be worth exploring would be propagating a stored value to loads (users of the MemoryDef that exactly alias the def). Removing loads might enable further store elimination, if done early I usually try to evaluate the impact by comparing the stats for DeadStoreElimination on the test suite with and without the patch. It would probably be good to do this for this patch as well :)
Comment Actions
I'm happy to work on this next.
Comment Actions
Sounds good. What's the best way to do this? It looks like I can pass the -stats option to opt but, is there a way to get this behavior across the whole test suit? Thanks for all the help with this patch :) Comment Actions Yes, when you build the test-suite, you can set TEST_SUITE_COLLECT_STATS=On to collect stats. The final .json file after running the test-suite will contain the statistics collected. For more details on how to use the test-suite, see https://llvm.org/docs/TestSuiteGuide.html I usually use something like cmake -G Ninja \ -DCMAKE_C_COMPILER=/path/to/llvm/build/bin/clang \ -DCMAKE_CXX_COMPILER=/path/to/llvm/build/bin/clang++ \ -DCMAKE_C_FLAGS="-O3" -DCMAKE_CXX_FLAGS="-O3 " \ -DTEST_SUITE_COLLECT_STATS=On \ /path/to/test-suite ninja -j 16
Comment Actions Accidentally hit send too early. .... I usually use something like cmake -G Ninja \ -DCMAKE_C_COMPILER=/path/to/llvm/build/bin/clang \ -DCMAKE_CXX_COMPILER=/path/to/llvm/build/bin/clang++ \ -DCMAKE_C_FLAGS="-O3" -DCMAKE_CXX_FLAGS="-O3 " \ -DTEST_SUITE_SUBDIRS=MultiSource;External \ -DTEST_SUITE_COLLECT_STATS=On \ /path/to/test-suite ninja -j 16 /path/to/llvm-lit -j16 . -o stats.json
Comment Actions
Thanks that's super helpful. Here are the results: Redundant Stores: (no change) Metric: dse.NumRedundantStores Program stats_opt stats_master diff test-suite...lications/ClamAV/clamscan.test 2.00 2.00 0.0% test-suite...ications/JM/ldecod/ldecod.test 3.00 3.00 0.0% test-suite...ications/JM/lencod/lencod.test 4.00 4.00 0.0% test-suite.../Applications/SPASS/SPASS.test 2.00 2.00 0.0% test-suite.../Applications/lemon/lemon.test 2.00 2.00 0.0% test-suite...pplications/oggenc/oggenc.test 8.00 8.00 0.0% test-suite...pplications/treecc/treecc.test 4.00 4.00 0.0% test-suite...marks/7zip/7zip-benchmark.test 6.00 6.00 0.0% test-suite...ProxyApps-C++/CLAMR/CLAMR.test 1.00 1.00 0.0% test-suite...hmarks/McCat/15-trie/trie.test 1.00 1.00 0.0% test-suite...nsumer-lame/consumer-lame.test 3.00 3.00 0.0% test-suite...TimberWolfMC/timberwolfmc.test 1.00 1.00 0.0% test-suite.../Prolangs-C/loader/loader.test 1.00 1.00 0.0% test-suite.../Benchmarks/Ptrdist/bc/bc.test 1.00 1.00 0.0% test-suite...arks/mafft/pairlocalalign.test 6.00 6.00 0.0% Geomean difference nan% stats_opt stats_master diff count 15.000000 15.000000 15.0 mean 3.000000 3.000000 0.0 std 2.203893 2.203893 0.0 min 1.000000 1.000000 0.0 25% 1.000000 1.000000 0.0 50% 2.000000 2.000000 0.0 75% 4.000000 4.000000 0.0 max 8.000000 8.000000 0.0 Execution Time: (slight improvement) Metric: exec_time Program stats_opt stats_master diff test-suite...urce/Applications/hbd/hbd.test 0.00 0.00 -28.1% test-suite...adpcm/rawcaudio/rawcaudio.test 0.00 0.00 -23.1% test-suite...itBench/uudecode/uudecode.test 0.10 0.12 20.6% test-suite...urce/Applications/aha/aha.test 1.26 1.02 -19.1% test-suite.../Benchmarks/Ptrdist/bc/bc.test 0.69 0.83 18.9% test-suite...flt/LoopRestructuring-flt.test 4.32 3.57 -17.4% test-suite...count/automotive-bitcount.test 0.09 0.08 -16.9% test-suite...ks/McCat/12-IOtest/iotest.test 0.28 0.33 16.4% test-suite.../Benchmarks/Ptrdist/ft/ft.test 0.88 1.01 14.7% test-suite...chmarks/MallocBench/gs/gs.test 0.04 0.05 14.6% test-suite...arching-dbl/Searching-dbl.test 4.16 3.55 -14.6% test-suite...ve-susan/automotive-susan.test 0.04 0.04 -14.4% test-suite...t/StatementReordering-flt.test 3.36 2.89 -14.0% test-suite...mbolics-flt/Symbolics-flt.test 1.26 1.09 -13.8% test-suite...mbolics-dbl/Symbolics-dbl.test 3.16 2.73 -13.7% Geomean difference -1.4% stats_opt stats_master diff count 198.000000 198.000000 198.000000 mean 2.136898 2.102667 -0.010418 std 4.097594 4.069014 0.079837 min 0.000800 0.000900 -0.281250 25% 0.036075 0.035250 -0.068807 50% 0.616700 0.650100 0.000000 75% 3.317450 3.089325 0.045511 max 32.665900 31.948900 0.205853 Execution Time (filter short): (mostly improved) Tests: 198 Short Running: 96 (filtered out) Remaining: 102 Metric: exec_time Program stats_opt stats_master diff test-suite...urce/Applications/aha/aha.test 1.26 1.02 -19.1% test-suite.../Benchmarks/Ptrdist/bc/bc.test 0.69 0.83 18.9% test-suite...flt/LoopRestructuring-flt.test 4.32 3.57 -17.4% test-suite.../Benchmarks/Ptrdist/ft/ft.test 0.88 1.01 14.7% test-suite...arching-dbl/Searching-dbl.test 4.16 3.55 -14.6% test-suite...t/StatementReordering-flt.test 3.36 2.89 -14.0% test-suite...mbolics-flt/Symbolics-flt.test 1.26 1.09 -13.8% test-suite...mbolics-dbl/Symbolics-dbl.test 3.16 2.73 -13.7% test-suite...C/Packing-flt/Packing-flt.test 4.71 4.10 -13.0% test-suite...ctions-dbl/Reductions-dbl.test 4.06 3.54 -12.7% test-suite...rimaran/enc-3des/enc-3des.test 1.90 1.66 -12.6% test-suite.../Trimaran/enc-md5/enc-md5.test 1.69 1.48 -12.4% test-suite...yApps-C++/PENNANT/PENNANT.test 0.62 0.70 12.3% test-suite...arching-flt/Searching-flt.test 3.98 3.49 -12.3% test-suite...s/Ptrdist/anagram/anagram.test 0.64 0.72 12.2% Geomean difference nan% stats_opt stats_master diff count 102.000000 101.000000 101.000000 mean 4.070663 4.036584 -0.015677 std 4.994630 4.988885 0.080786 min 0.604600 0.635000 -0.191148 25% 1.660050 1.549400 -0.083075 50% 3.237450 3.072300 -0.005657 75% 4.359050 4.313500 0.046250 max 32.665900 31.948900 0.189496 Compile Time: (unexpectedly improved but, that might just be noise) Metric: compile_time Program stats_opt stats_master diff test-suite...rks/tramp3d-v4/tramp3d-v4.test 45.98 52.93 15.1% test-suite.../Trimaran/enc-pc1/enc-pc1.test 0.18 0.20 9.6% test-suite...netbench-crc/netbench-crc.test 0.20 0.21 9.0% test-suite...hmarks/VersaBench/bmm/bmm.test 0.14 0.15 8.4% test-suite...marks/SciMark2-C/scimark2.test 0.94 1.02 8.1% test-suite.../Benchmarks/Ptrdist/ft/ft.test 0.57 0.61 7.6% test-suite...marks/7zip/7zip-benchmark.test 122.38 131.64 7.6% test-suite.../Trimaran/enc-rc4/enc-rc4.test 0.13 0.14 7.5% test-suite.../Trimaran/enc-md5/enc-md5.test 0.29 0.31 7.5% test-suite...ngs-C/fixoutput/fixoutput.test 0.16 0.18 7.2% test-suite...netbench-url/netbench-url.test 0.50 0.53 7.1% test-suite.../Prolangs-C/loader/loader.test 0.73 0.78 7.1% test-suite...arks/VersaBench/dbms/dbms.test 2.30 2.47 7.0% test-suite...langs-C/allroots/allroots.test 0.20 0.22 7.0% test-suite...ngs-C/assembler/assembler.test 1.38 1.48 6.9% Geomean difference 0.3% stats_opt stats_master diff count 198.000000 198.000000 198.000000 mean 5.392878 5.432042 0.003577 std 13.798743 14.112571 0.039915 min 0.053700 0.052200 -0.067272 25% 0.318475 0.314125 -0.027627 50% 1.606700 1.594650 -0.000062 75% 3.390850 3.419850 0.031306 max 122.376700 131.644500 0.151016
Comment Actions
Regressed kinda a lot ? Comment Actions Yup! 7%-15% is huge. I'm hoping this is due to the unnecessary processing I commented on, but won't know until tested with the changes. I'm pretty curios which part led to that big of a regression. Comment Actions @asbirlea thank you for the review! Sorry I didn't get to this sooner, I was super busy last week.
I'm pretty sure it's because I recursively walk *every* store's accesses. I suspect checking Load->getPointerOperand() != Store->getOperand(1) *first* will fix this. Running the stats again.
Comment Actions Update based on review:
Comment Actions Execution time: Tests: 198 Short Running: 97 (filtered out) Remaining: 101 Metric: exec_time Program stats_opt stats_master diff test-suite...ks/Prolangs-C++/life/life.test 2.30 3.01 30.9% test-suite...arks/mafft/pairlocalalign.test 12.66 15.31 20.9% test-suite.../Benchmarks/Ptrdist/bc/bc.test 0.68 0.83 20.7% test-suite...Source/Benchmarks/sim/sim.test 2.81 3.38 20.2% test-suite...urce/Applications/aha/aha.test 1.25 1.02 -18.4% test-suite...s/Ptrdist/anagram/anagram.test 0.62 0.72 16.9% test-suite.../Benchmarks/nbench/nbench.test 1.32 1.55 16.9% test-suite...ks/VersaBench/8b10b/8b10b.test 4.31 5.02 16.4% test-suite...nchmarks/llubenchmark/llu.test 3.19 3.68 15.1% test-suite.../Applications/lemon/lemon.test 0.87 0.75 -14.1% test-suite...tions/lambda-0.1.3/lambda.test 3.50 3.04 -13.1% test-suite.../VersaBench/ecbdes/ecbdes.test 1.81 2.02 11.7% test-suite...netbench-url/netbench-url.test 2.97 3.31 11.3% test-suite...hmarks/VersaBench/bmm/bmm.test 2.34 2.60 11.3% test-suite...marks/SciMark2-C/scimark2.test 28.48 31.68 11.2% Geomean difference nan% stats_opt stats_master diff count 99.000000 101.000000 99.000000 mean 3.921419 4.036584 0.035598 std 4.557934 4.988885 0.071592 min 0.617100 0.635000 -0.184291 25% 1.575900 1.549400 -0.003149 50% 3.090300 3.072300 0.022620 75% 4.272200 4.313500 0.070729 max 28.946700 31.948900 0.308558 Compile time: Tests: 198 Metric: compile_time Program stats_opt stats_master diff test-suite...rks/tramp3d-v4/tramp3d-v4.test 43.40 52.93 22.0% test-suite...arks/mafft/pairlocalalign.test 26.05 30.58 17.4% test-suite...ce/Benchmarks/PAQ8p/paq8p.test 4.32 4.97 15.1% test-suite...frame_layout/frame_layout.test 2.47 2.81 13.5% test-suite...marks/7zip/7zip-benchmark.test 122.67 131.64 7.3% test-suite...fice-ispell/office-ispell.test 3.99 4.28 7.2% test-suite...earch/office-stringsearch.test 0.44 0.47 6.8% test-suite...netbench-crc/netbench-crc.test 0.20 0.21 6.8% test-suite...nsumer-lame/consumer-lame.test 11.69 12.47 6.6% test-suite...math/automotive-basicmath.test 0.34 0.32 -6.3% test-suite...lowfish/security-blowfish.test 0.64 0.68 6.2% test-suite...comm-adpcm/telecomm-adpcm.test 0.15 0.16 6.2% test-suite.../Prolangs-C++/ocean/ocean.test 0.38 0.35 -6.0% test-suite.../Trimaran/enc-pc1/enc-pc1.test 0.19 0.20 5.4% test-suite...DOE-ProxyApps-C/CoMD/CoMD.test 2.84 2.69 -5.3% Geomean difference 0.4% stats_opt stats_master diff count 198.000000 198.000000 198.000000 mean 5.349209 5.432042 0.004814 std 13.699984 14.112571 0.038243 min 0.053100 0.052200 -0.063188 25% 0.312425 0.314125 -0.025223 50% 1.604600 1.594650 0.001472 75% 3.347775 3.419850 0.026751 max 122.668900 131.644500 0.219606 Question: it would seem like the compile-time actually got a lot better (which is odd). Am I reading the data incorrectly? Shouldn't smaller numbers be better? There may be some noise here so I'm going to try running the benchmarks a few more times to make sure they line up. I'm also going to add a statistic for noop stores. Comment Actions Runtimes/execution times depend a lot on the system and can be quite noisy. For comparable numbers you have to make sure that the system noise is reduced to a minimum (and you don't build & run the benchmarks in parallel). I would suggest to focus on the statistics about removed stores and how the patch impacts them. With respect to compile-time, I don't think there's much to worry about with the patch in its current form.. Also, I recently found out that EarlyCSE also does some simple form of load-store forwarding (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/EarlyCSE.cpp#L1214). It might be worth consolidating the DSE logic in the DeadStoreElimination pass in the future, but that's a discussion for the future.
Comment Actions Thanks for the review.
What do you think about adding a statistic for noop stores? Or should I increment NumRedundantStores or a different statistic? I think as-is this patch doesn't change any statistics.
Comment Actions
Comment Actions Also, it looks like the test suit does not have any noop stores (either that or I implemented the NumNoopStores statistic incorrectly). Comment Actions I think it may be that currently most of those stores are eliminated earlier by EarlyCSE (see earlier comment). Might be interesting to see what happens with MSSA disabled for EarlyCSE. (just to double check: MSSA based DSE is not enabled by default, so for your measurements you have to make sure it is turned on via -mllvm -enable-dse-memoryssa=true or by setting EnableMemorySSA) I think it is almost ready, just a question about some of the test changes remaining.
Comment Actions
I passed CXXFLAGS="-mllvm -enable-dse-memoryssa=true -mllvm -earlycse-mssa-optimization-cap=0" to cmake and re-built and ran the tests. It doesn't look like that had any impact, though. I still get Unknown metric 'dse.NumNoopStores' which I assume means none were recorded. Comment Actions Hm I think cmake might not pick up CXXFLAGS, if it's not the initial run. Probably better to pass -DCMAKE_C_FLAGS=/-DCMAKE_CXX_FLAGS=. Or just flip the default for EnableMemorySSA (which I usually do :P) and rebuild clang/libLTO. To verify, you can check dse.NumFastStores. There should be a lot of changes with/without MemorySSA. For example, with -O3 -flto: ~/projects/test-suites/test-suite/utils/compare.py --filter-hash -m dse.NumFastStores base.json mssa.json Tests: 237 Same hash: 104 (filtered out) Remaining: 133 Metric: dse.NumFastStores Program base mssa diff test-suite...-typeset/consumer-typeset.test 162.00 891.00 450.0% test-suite...INT2000/164.gzip/164.gzip.test 15.00 67.00 346.7% test-suite...ks/Prolangs-C/agrep/agrep.test 2.00 8.00 300.0% test-suite...000/183.equake/183.equake.test 10.00 40.00 300.0% test-suite...lications/sqlite3/sqlite3.test 22.00 76.00 245.5% test-suite...CFP2006/433.milc/433.milc.test 64.00 195.00 204.7% test-suite...000/197.parser/197.parser.test 3.00 9.00 200.0% test-suite...chmarks/Olden/power/power.test 11.00 33.00 200.0% test-suite...lications/ClamAV/clamscan.test 72.00 210.00 191.7% test-suite.../Applications/SPASS/SPASS.test 48.00 139.00 189.6% test-suite...T2006/445.gobmk/445.gobmk.test 19.00 52.00 173.7% test-suite...Source/Benchmarks/sim/sim.test 2.00 5.00 150.0% test-suite...abench/jpeg/jpeg-6a/cjpeg.test 9.00 22.00 144.4% test-suite...ks/McCat/12-IOtest/iotest.test 6.00 14.00 133.3% test-suite...nsumer-jpeg/consumer-jpeg.test 12.00 27.00 125.0% Geomean difference nan% For noop stores, it looks like there are the following nook-store changes for the configuration above: test-suite...:: External/Povray/povray.test NaN 1.00 nan% test-suite...CFP2000/177.mesa/177.mesa.test NaN 1.00 nan% test-suite...000/183.equake/183.equake.test NaN 11.00 nan% test-suite...CFP2006/433.milc/433.milc.test NaN 1.00 nan% test-suite...006/447.dealII/447.dealII.test NaN 2.00 nan% test-suite...006/453.povray/453.povray.test NaN 13.00 nan% test-suite...6/482.sphinx3/482.sphinx3.test NaN 1.00 nan% test-suite.../CINT2000/176.gcc/176.gcc.test NaN 6.00 nan% test-suite...0/253.perlbmk/253.perlbmk.test NaN 21.00 nan% test-suite...0.perlbench/400.perlbench.test NaN 24.00 nan% test-suite...T2006/445.gobmk/445.gobmk.test NaN 25.00 nan% test-suite...lications/ClamAV/clamscan.test NaN 2.00 nan% test-suite.../Applications/SPASS/SPASS.test NaN 2.00 nan% test-suite...pplications/oggenc/oggenc.test NaN 2.00 nan% test-suite...s/MallocBench/cfrac/cfrac.test NaN 6.00 nan% test-suite...chmarks/MallocBench/gs/gs.test NaN 1.00 nan% test-suite...TimberWolfMC/timberwolfmc.test NaN 1.00 nan% test-suite.../Prolangs-C/loader/loader.test NaN 1.00 nan% As mentioned earlier, I think this looks good, once the comments for the tests are addressed :)
Comment Actions @fhahn I really appreciate all your help with this patch. The tests are updated.
I was passing the flags to the llvm-test-suite build but, I also tried when building the mono-repo. I tried both with CXXFLAGS and with DCMAKE_C_FLAGS as suggested and made sure to export EnableMemorySSA=1 (and even tried a clean build of both) but I still get the same error when running utils/compare.py. I'll take another look at the docs and keep playing around with it to see if I can get it working tomorrow. Comment Actions LGTM, thanks! This should only remove stores which we couldn't kill other stores, as the load would block the store killing any other stores.
Oh right, I meant flipping the cl::init value of the variable https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp#L100 and rebuilding ;) Comment Actions
Ah, there we go! That works now :) |
Forgot about the naming convention, my bad. Will fix.