Adds a simple fast-path check for the pattern:
v = load ptr store v to ptr
I took the tests from the bugzilla post, I can add more if needed (but I think these should be sufficent).
|  Differential  D79391  
[DSE] Remove noop stores in MSSA. Authored by zoecarver on May 4 2020, 9:01 PM. 
Details Adds a simple fast-path check for the pattern: v = load ptr store v to ptr I took the tests from the bugzilla post, I can add more if needed (but I think these should be sufficent). 
Diff Detail 
 Event Timeline
 
 
 Comment Actions Thanks for working on this! Some comments inline. Another transform that might be worth exploring would be propagating a stored value to loads (users of the MemoryDef that exactly alias the def). Removing loads might enable further store elimination, if done early I usually try to evaluate the impact by comparing the stats for DeadStoreElimination on the test suite with and without the patch. It would probably be good to do this for this patch as well :) 
 
 
 Comment Actions 
 I'm happy to work on this next. 
 Comment Actions 
 Sounds good. What's the best way to do this? It looks like I can pass the -stats option to opt but, is there a way to get this behavior across the whole test suit? Thanks for all the help with this patch :) Comment Actions Yes, when you build the test-suite, you can set TEST_SUITE_COLLECT_STATS=On to collect stats. The final .json file after running the test-suite will contain the statistics collected. For more details on how to use the test-suite, see https://llvm.org/docs/TestSuiteGuide.html I usually use something like cmake -G Ninja \
    -DCMAKE_C_COMPILER=/path/to/llvm/build/bin/clang \
    -DCMAKE_CXX_COMPILER=/path/to/llvm/build/bin/clang++ \
    -DCMAKE_C_FLAGS="-O3" -DCMAKE_CXX_FLAGS="-O3 " \
    -DTEST_SUITE_COLLECT_STATS=On \
    /path/to/test-suite
ninja -j 16
 Comment Actions Accidentally hit send too early. .... I usually use something like cmake -G Ninja \
    -DCMAKE_C_COMPILER=/path/to/llvm/build/bin/clang \
    -DCMAKE_CXX_COMPILER=/path/to/llvm/build/bin/clang++ \
    -DCMAKE_C_FLAGS="-O3" -DCMAKE_CXX_FLAGS="-O3 " \
   -DTEST_SUITE_SUBDIRS=MultiSource;External \
    -DTEST_SUITE_COLLECT_STATS=On \
    /path/to/test-suite
ninja -j 16
/path/to/llvm-lit -j16 . -o  stats.json
 Comment Actions 
 Thanks that's super helpful. Here are the results: Redundant Stores: (no change) Metric: dse.NumRedundantStores
Program                                        stats_opt stats_master diff
test-suite...lications/ClamAV/clamscan.test     2.00      2.00        0.0%
test-suite...ications/JM/ldecod/ldecod.test     3.00      3.00        0.0%
test-suite...ications/JM/lencod/lencod.test     4.00      4.00        0.0%
test-suite.../Applications/SPASS/SPASS.test     2.00      2.00        0.0%
test-suite.../Applications/lemon/lemon.test     2.00      2.00        0.0%
test-suite...pplications/oggenc/oggenc.test     8.00      8.00        0.0%
test-suite...pplications/treecc/treecc.test     4.00      4.00        0.0%
test-suite...marks/7zip/7zip-benchmark.test     6.00      6.00        0.0%
test-suite...ProxyApps-C++/CLAMR/CLAMR.test     1.00      1.00        0.0%
test-suite...hmarks/McCat/15-trie/trie.test     1.00      1.00        0.0%
test-suite...nsumer-lame/consumer-lame.test     3.00      3.00        0.0%
test-suite...TimberWolfMC/timberwolfmc.test     1.00      1.00        0.0%
test-suite.../Prolangs-C/loader/loader.test     1.00      1.00        0.0%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     1.00      1.00        0.0%
test-suite...arks/mafft/pairlocalalign.test     6.00      6.00        0.0%
Geomean difference                                                    nan%
       stats_opt  stats_master  diff
count  15.000000  15.000000     15.0
mean   3.000000   3.000000      0.0
std    2.203893   2.203893      0.0
min    1.000000   1.000000      0.0
25%    1.000000   1.000000      0.0
50%    2.000000   2.000000      0.0
75%    4.000000   4.000000      0.0
max    8.000000   8.000000      0.0Execution Time: (slight improvement) Metric: exec_time
Program                                        stats_opt stats_master diff
test-suite...urce/Applications/hbd/hbd.test     0.00      0.00       -28.1%
test-suite...adpcm/rawcaudio/rawcaudio.test     0.00      0.00       -23.1%
test-suite...itBench/uudecode/uudecode.test     0.10      0.12       20.6%
test-suite...urce/Applications/aha/aha.test     1.26      1.02       -19.1%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     0.69      0.83       18.9%
test-suite...flt/LoopRestructuring-flt.test     4.32      3.57       -17.4%
test-suite...count/automotive-bitcount.test     0.09      0.08       -16.9%
test-suite...ks/McCat/12-IOtest/iotest.test     0.28      0.33       16.4%
test-suite.../Benchmarks/Ptrdist/ft/ft.test     0.88      1.01       14.7%
test-suite...chmarks/MallocBench/gs/gs.test     0.04      0.05       14.6%
test-suite...arching-dbl/Searching-dbl.test     4.16      3.55       -14.6%
test-suite...ve-susan/automotive-susan.test     0.04      0.04       -14.4%
test-suite...t/StatementReordering-flt.test     3.36      2.89       -14.0%
test-suite...mbolics-flt/Symbolics-flt.test     1.26      1.09       -13.8%
test-suite...mbolics-dbl/Symbolics-dbl.test     3.16      2.73       -13.7%
Geomean difference                                                   -1.4%
        stats_opt  stats_master        diff
count  198.000000  198.000000    198.000000
mean   2.136898    2.102667     -0.010418
std    4.097594    4.069014      0.079837
min    0.000800    0.000900     -0.281250
25%    0.036075    0.035250     -0.068807
50%    0.616700    0.650100      0.000000
75%    3.317450    3.089325      0.045511
max    32.665900   31.948900     0.205853Execution Time (filter short): (mostly improved) Tests: 198
Short Running: 96 (filtered out)
Remaining: 102
Metric: exec_time
Program                                        stats_opt stats_master diff
test-suite...urce/Applications/aha/aha.test     1.26      1.02       -19.1%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     0.69      0.83       18.9%
test-suite...flt/LoopRestructuring-flt.test     4.32      3.57       -17.4%
test-suite.../Benchmarks/Ptrdist/ft/ft.test     0.88      1.01       14.7%
test-suite...arching-dbl/Searching-dbl.test     4.16      3.55       -14.6%
test-suite...t/StatementReordering-flt.test     3.36      2.89       -14.0%
test-suite...mbolics-flt/Symbolics-flt.test     1.26      1.09       -13.8%
test-suite...mbolics-dbl/Symbolics-dbl.test     3.16      2.73       -13.7%
test-suite...C/Packing-flt/Packing-flt.test     4.71      4.10       -13.0%
test-suite...ctions-dbl/Reductions-dbl.test     4.06      3.54       -12.7%
test-suite...rimaran/enc-3des/enc-3des.test     1.90      1.66       -12.6%
test-suite.../Trimaran/enc-md5/enc-md5.test     1.69      1.48       -12.4%
test-suite...yApps-C++/PENNANT/PENNANT.test     0.62      0.70       12.3%
test-suite...arching-flt/Searching-flt.test     3.98      3.49       -12.3%
test-suite...s/Ptrdist/anagram/anagram.test     0.64      0.72       12.2%
Geomean difference                                                    nan%
        stats_opt  stats_master        diff
count  102.000000  101.000000    101.000000
mean   4.070663    4.036584     -0.015677
std    4.994630    4.988885      0.080786
min    0.604600    0.635000     -0.191148
25%    1.660050    1.549400     -0.083075
50%    3.237450    3.072300     -0.005657
75%    4.359050    4.313500      0.046250
max    32.665900   31.948900     0.189496Compile Time: (unexpectedly improved but, that might just be noise) Metric: compile_time
Program                                        stats_opt stats_master diff
test-suite...rks/tramp3d-v4/tramp3d-v4.test    45.98     52.93       15.1%
test-suite.../Trimaran/enc-pc1/enc-pc1.test     0.18      0.20        9.6%
test-suite...netbench-crc/netbench-crc.test     0.20      0.21        9.0%
test-suite...hmarks/VersaBench/bmm/bmm.test     0.14      0.15        8.4%
test-suite...marks/SciMark2-C/scimark2.test     0.94      1.02        8.1%
test-suite.../Benchmarks/Ptrdist/ft/ft.test     0.57      0.61        7.6%
test-suite...marks/7zip/7zip-benchmark.test   122.38    131.64        7.6%
test-suite.../Trimaran/enc-rc4/enc-rc4.test     0.13      0.14        7.5%
test-suite.../Trimaran/enc-md5/enc-md5.test     0.29      0.31        7.5%
test-suite...ngs-C/fixoutput/fixoutput.test     0.16      0.18        7.2%
test-suite...netbench-url/netbench-url.test     0.50      0.53        7.1%
test-suite.../Prolangs-C/loader/loader.test     0.73      0.78        7.1%
test-suite...arks/VersaBench/dbms/dbms.test     2.30      2.47        7.0%
test-suite...langs-C/allroots/allroots.test     0.20      0.22        7.0%
test-suite...ngs-C/assembler/assembler.test     1.38      1.48        6.9%
Geomean difference                                                    0.3%
        stats_opt  stats_master        diff
count  198.000000  198.000000    198.000000
mean   5.392878    5.432042      0.003577
std    13.798743   14.112571     0.039915
min    0.053700    0.052200     -0.067272
25%    0.318475    0.314125     -0.027627
50%    1.606700    1.594650     -0.000062
75%    3.390850    3.419850      0.031306
max    122.376700  131.644500    0.151016
 Comment Actions 
 Regressed kinda a lot ? Comment Actions Yup! 7%-15% is huge. I'm hoping this is due to the unnecessary processing I commented on, but won't know until tested with the changes. I'm pretty curios which part led to that big of a regression. Comment Actions @asbirlea thank you for the review! Sorry I didn't get to this sooner, I was super busy last week. 
 I'm pretty sure it's because I recursively walk *every* store's accesses. I suspect checking Load->getPointerOperand() != Store->getOperand(1) *first* will fix this. Running the stats again. 
 Comment Actions Update based on review: 
 Comment Actions Execution time: Tests: 198
Short Running: 97 (filtered out)
Remaining: 101
Metric: exec_time
Program                                        stats_opt stats_master diff
test-suite...ks/Prolangs-C++/life/life.test     2.30      3.01       30.9%
test-suite...arks/mafft/pairlocalalign.test    12.66     15.31       20.9%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     0.68      0.83       20.7%
test-suite...Source/Benchmarks/sim/sim.test     2.81      3.38       20.2%
test-suite...urce/Applications/aha/aha.test     1.25      1.02       -18.4%
test-suite...s/Ptrdist/anagram/anagram.test     0.62      0.72       16.9%
test-suite.../Benchmarks/nbench/nbench.test     1.32      1.55       16.9%
test-suite...ks/VersaBench/8b10b/8b10b.test     4.31      5.02       16.4%
test-suite...nchmarks/llubenchmark/llu.test     3.19      3.68       15.1%
test-suite.../Applications/lemon/lemon.test     0.87      0.75       -14.1%
test-suite...tions/lambda-0.1.3/lambda.test     3.50      3.04       -13.1%
test-suite.../VersaBench/ecbdes/ecbdes.test     1.81      2.02       11.7%
test-suite...netbench-url/netbench-url.test     2.97      3.31       11.3%
test-suite...hmarks/VersaBench/bmm/bmm.test     2.34      2.60       11.3%
test-suite...marks/SciMark2-C/scimark2.test    28.48     31.68       11.2%
Geomean difference                                                    nan%
       stats_opt  stats_master       diff
count  99.000000  101.000000    99.000000
mean   3.921419   4.036584      0.035598
std    4.557934   4.988885      0.071592
min    0.617100   0.635000     -0.184291
25%    1.575900   1.549400     -0.003149
50%    3.090300   3.072300      0.022620
75%    4.272200   4.313500      0.070729
max    28.946700  31.948900     0.308558Compile time: Tests: 198
Metric: compile_time
Program                                        stats_opt stats_master diff
test-suite...rks/tramp3d-v4/tramp3d-v4.test    43.40     52.93       22.0%
test-suite...arks/mafft/pairlocalalign.test    26.05     30.58       17.4%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test     4.32      4.97       15.1%
test-suite...frame_layout/frame_layout.test     2.47      2.81       13.5%
test-suite...marks/7zip/7zip-benchmark.test   122.67    131.64        7.3%
test-suite...fice-ispell/office-ispell.test     3.99      4.28        7.2%
test-suite...earch/office-stringsearch.test     0.44      0.47        6.8%
test-suite...netbench-crc/netbench-crc.test     0.20      0.21        6.8%
test-suite...nsumer-lame/consumer-lame.test    11.69     12.47        6.6%
test-suite...math/automotive-basicmath.test     0.34      0.32       -6.3%
test-suite...lowfish/security-blowfish.test     0.64      0.68        6.2%
test-suite...comm-adpcm/telecomm-adpcm.test     0.15      0.16        6.2%
test-suite.../Prolangs-C++/ocean/ocean.test     0.38      0.35       -6.0%
test-suite.../Trimaran/enc-pc1/enc-pc1.test     0.19      0.20        5.4%
test-suite...DOE-ProxyApps-C/CoMD/CoMD.test     2.84      2.69       -5.3%
Geomean difference                                                    0.4%
        stats_opt  stats_master        diff
count  198.000000  198.000000    198.000000
mean   5.349209    5.432042      0.004814
std    13.699984   14.112571     0.038243
min    0.053100    0.052200     -0.063188
25%    0.312425    0.314125     -0.025223
50%    1.604600    1.594650      0.001472
75%    3.347775    3.419850      0.026751
max    122.668900  131.644500    0.219606Question: it would seem like the compile-time actually got a lot better (which is odd). Am I reading the data incorrectly? Shouldn't smaller numbers be better? There may be some noise here so I'm going to try running the benchmarks a few more times to make sure they line up. I'm also going to add a statistic for noop stores. Comment Actions Runtimes/execution times depend a lot on the system and can be quite noisy. For comparable numbers you have to make sure that the system noise is reduced to a minimum (and you don't build & run the benchmarks in parallel). I would suggest to focus on the statistics about removed stores and how the patch impacts them. With respect to compile-time, I don't think there's much to worry about with the patch in its current form.. Also, I recently found out that EarlyCSE also does some simple form of load-store forwarding (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/EarlyCSE.cpp#L1214). It might be worth consolidating the DSE logic in the DeadStoreElimination pass in the future, but that's a discussion for the future. 
 Comment Actions Thanks for the review. 
 What do you think about adding a statistic for noop stores? Or should I increment NumRedundantStores or a different statistic? I think as-is this patch doesn't change any statistics. 
 Comment Actions 
 Comment Actions Also, it looks like the test suit does not have any noop stores (either that or I implemented the NumNoopStores statistic incorrectly). Comment Actions I think it may be that currently most of those stores are eliminated earlier by EarlyCSE (see earlier comment). Might be interesting to see what happens with MSSA disabled for EarlyCSE. (just to double check: MSSA based DSE is not enabled by default, so for your measurements you have to make sure it is turned on via -mllvm -enable-dse-memoryssa=true or by setting EnableMemorySSA) I think it is almost ready, just a question about some of the test changes remaining. 
 
 Comment Actions 
 I passed CXXFLAGS="-mllvm -enable-dse-memoryssa=true -mllvm -earlycse-mssa-optimization-cap=0" to cmake and re-built and ran the tests. It doesn't look like that had any impact, though. I still get Unknown metric 'dse.NumNoopStores' which I assume means none were recorded. Comment Actions Hm I think cmake might not pick up CXXFLAGS, if it's not the initial run. Probably better to pass -DCMAKE_C_FLAGS=/-DCMAKE_CXX_FLAGS=. Or just flip the default for EnableMemorySSA (which I usually do :P) and rebuild clang/libLTO. To verify, you can check dse.NumFastStores. There should be a lot of changes with/without MemorySSA. For example, with -O3 -flto: ~/projects/test-suites/test-suite/utils/compare.py --filter-hash -m dse.NumFastStores base.json mssa.json Tests: 237 Same hash: 104 (filtered out) Remaining: 133 Metric: dse.NumFastStores Program base mssa diff test-suite...-typeset/consumer-typeset.test 162.00 891.00 450.0% test-suite...INT2000/164.gzip/164.gzip.test 15.00 67.00 346.7% test-suite...ks/Prolangs-C/agrep/agrep.test 2.00 8.00 300.0% test-suite...000/183.equake/183.equake.test 10.00 40.00 300.0% test-suite...lications/sqlite3/sqlite3.test 22.00 76.00 245.5% test-suite...CFP2006/433.milc/433.milc.test 64.00 195.00 204.7% test-suite...000/197.parser/197.parser.test 3.00 9.00 200.0% test-suite...chmarks/Olden/power/power.test 11.00 33.00 200.0% test-suite...lications/ClamAV/clamscan.test 72.00 210.00 191.7% test-suite.../Applications/SPASS/SPASS.test 48.00 139.00 189.6% test-suite...T2006/445.gobmk/445.gobmk.test 19.00 52.00 173.7% test-suite...Source/Benchmarks/sim/sim.test 2.00 5.00 150.0% test-suite...abench/jpeg/jpeg-6a/cjpeg.test 9.00 22.00 144.4% test-suite...ks/McCat/12-IOtest/iotest.test 6.00 14.00 133.3% test-suite...nsumer-jpeg/consumer-jpeg.test 12.00 27.00 125.0% Geomean difference nan% For noop stores, it looks like there are the following nook-store changes for the configuration above: test-suite...:: External/Povray/povray.test NaN 1.00 nan% test-suite...CFP2000/177.mesa/177.mesa.test NaN 1.00 nan% test-suite...000/183.equake/183.equake.test NaN 11.00 nan% test-suite...CFP2006/433.milc/433.milc.test NaN 1.00 nan% test-suite...006/447.dealII/447.dealII.test NaN 2.00 nan% test-suite...006/453.povray/453.povray.test NaN 13.00 nan% test-suite...6/482.sphinx3/482.sphinx3.test NaN 1.00 nan% test-suite.../CINT2000/176.gcc/176.gcc.test NaN 6.00 nan% test-suite...0/253.perlbmk/253.perlbmk.test NaN 21.00 nan% test-suite...0.perlbench/400.perlbench.test NaN 24.00 nan% test-suite...T2006/445.gobmk/445.gobmk.test NaN 25.00 nan% test-suite...lications/ClamAV/clamscan.test NaN 2.00 nan% test-suite.../Applications/SPASS/SPASS.test NaN 2.00 nan% test-suite...pplications/oggenc/oggenc.test NaN 2.00 nan% test-suite...s/MallocBench/cfrac/cfrac.test NaN 6.00 nan% test-suite...chmarks/MallocBench/gs/gs.test NaN 1.00 nan% test-suite...TimberWolfMC/timberwolfmc.test NaN 1.00 nan% test-suite.../Prolangs-C/loader/loader.test NaN 1.00 nan% As mentioned earlier, I think this looks good, once the comments for the tests are addressed :) 
 Comment Actions @fhahn I really appreciate all your help with this patch. The tests are updated. 
 I was passing the flags to the llvm-test-suite build but, I also tried when building the mono-repo. I tried both with CXXFLAGS and with DCMAKE_C_FLAGS as suggested and made sure to export EnableMemorySSA=1 (and even tried a clean build of both) but I still get the same error when running utils/compare.py. I'll take another look at the docs and keep playing around with it to see if I can get it working tomorrow. Comment Actions LGTM, thanks! This should only remove stores which we couldn't kill other stores, as the load would block the store killing any other stores. 
 Oh right, I meant flipping the cl::init value of the variable https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp#L100 and rebuilding ;) Comment Actions 
 Ah, there we go! That works now :) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Forgot about the naming convention, my bad. Will fix.