This is an archive of the discontinued LLVM Phabricator instance.

llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll
588 ↗	(On Diff #262001)	Can you add the reverse: %a = load i32, i32* %Q store volatile i32 %a, i32* %Q This one we can't eliminate. As well as, can you add these two cases with `atomic` instead? We can do some optimizations there but need to be careful around release sequences, see https://wg21.link/n4455

zoecarver marked an inline comment as done.May 5 2020, 11:19 AM

zoecarver added inline comments.

llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll
588 ↗	(On Diff #262001)	Good call. Luckily this is done when building the `DSEState` so I won't have to add checks for it in this patch. But I have added both the atomic and volatile tests.

Volatile and atomic tests
Local var spelling

Harbormaster completed remote builds in B55830: Diff 262179.May 5 2020, 11:53 AM

Thanks for working on this! Some comments inline.

Another transform that might be worth exploring would be propagating a stored value to loads (users of the MemoryDef that exactly alias the def). Removing loads might enable further store elimination, if done early

I usually try to evaluate the impact by comparing the stats for DeadStoreElimination on the test suite with and without the patch. It would probably be good to do this for this patch as well :)

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857	I think the check is not strict enough at the moment. I think we have to ensure that there are no memory-writes between the load and the store that may alias them. In the simple example below, IIUC the patch would currently incorrectly eliminate `store i32 %a, i32* %Q`. define void @test(i32* %Q) { %a = load i32, i32* %Q store i32 10, i32* %Q store i32 %a, i32* %Q ret void } In terms of MemorySSA, I think we are looking for loads that use the same MemoryDef as `KillingDef`. One simple way to check this would be getting the memoryDef of KillingDef and check if it dominates the load I think.

This revision now requires changes to proceed.May 6 2020, 3:16 AM

asbirlea added inline comments.May 8 2020, 10:51 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857	+1. Please add a testcase similar to the example fhahn@ gave.

zoecarver marked an inline comment as done.May 8 2020, 11:17 AM

zoecarver added inline comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857	Will do. I tried the test case and it fails. I'll need to re-write most of this patch (minus the tests). I think @fhahn's solution will work. I probably won't get to this today but I should have time this weekend.

fhahn added inline comments.May 11 2020, 10:31 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857	Awesome, thanks! Feel free to let us know if you have any questions.

Analyze previous writes when trying to remove noop store

Another transform that might be worth exploring would be propagating a stored value to loads (users of the MemoryDef that exactly alias the def). Removing loads might enable further store elimination, if done early

I'm happy to work on this next.

zoecarver marked an inline comment as done.May 12 2020, 4:31 PM

zoecarver added inline comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857	I think we are looking for loads that use the same MemoryDef as KillingDef. I tried playing around with getting the memory access of `Load` but, I'm not sure how it should be the "same" as KillingDef (I don't know a whole lot about MSSA). I also tried analyzing `ToCheck` but, I ended up essentially having to implement this check lower down which meant missing some optimizations.

Harbormaster failed remote builds in B56512: Diff 263563!May 12 2020, 5:48 PM

Fix tests.

I usually try to evaluate the impact by comparing the stats for DeadStoreElimination on the test suite with and without the patch. It would probably be good to do this for this patch as well :)

Sounds good. What's the best way to do this? It looks like I can pass the -stats option to opt but, is there a way to get this behavior across the whole test suit?

Thanks for all the help with this patch :)

Harbormaster completed remote builds in B56525: Diff 263588.May 12 2020, 7:24 PM

In D79391#2033000, @zoecarver wrote:

I usually try to evaluate the impact by comparing the stats for DeadStoreElimination on the test suite with and without the patch. It would probably be good to do this for this patch as well :)

Sounds good. What's the best way to do this? It looks like I can pass the -stats option to opt but, is there a way to get this behavior across the whole test suit?

Yes, when you build the test-suite, you can set TEST_SUITE_COLLECT_STATS=On to collect stats. The final .json file after running the test-suite will contain the statistics collected. For more details on how to use the test-suite, see https://llvm.org/docs/TestSuiteGuide.html

I usually use something like

cmake -G Ninja \
    -DCMAKE_C_COMPILER=/path/to/llvm/build/bin/clang \
    -DCMAKE_CXX_COMPILER=/path/to/llvm/build/bin/clang++ \
    -DCMAKE_C_FLAGS="-O3" -DCMAKE_CXX_FLAGS="-O3 " \
    -DTEST_SUITE_COLLECT_STATS=On \
    /path/to/test-suite

ninja -j 16

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857	I was referring to MemoryDef/MemoryUse in MemorySSA (see http://llvm.org/docs/MemorySSA.html for a bit more details) The problem with directly looking at the def-use chains in the IR is that it does not necessarily include all memory instructions that might overwrite a the same memory location. MemorySSA provides an 'virtual SSA IR' for memory operations. I think you can make sure that there are no writes may-alias writes in between Load and SI by getting their defining accesses using MemorySSA (https://llvm.org/doxygen/classllvm_1_1MemoryUseOrDef.html#abc4e1b6cc483cfd57ca75c4ab92bd360).

Accidentally hit send too early.

....

I usually use something like

cmake -G Ninja \
    -DCMAKE_C_COMPILER=/path/to/llvm/build/bin/clang \
    -DCMAKE_CXX_COMPILER=/path/to/llvm/build/bin/clang++ \
    -DCMAKE_C_FLAGS="-O3" -DCMAKE_CXX_FLAGS="-O3 " \
   -DTEST_SUITE_SUBDIRS=MultiSource;External \
    -DTEST_SUITE_COLLECT_STATS=On \
    /path/to/test-suite

ninja -j 16

/path/to/llvm-lit -j16 . -o  stats.json

zoecarver marked an inline comment as done.May 19 2020, 9:45 PM

zoecarver added inline comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857	Heh, I probably should have spent some time reading the docs before diving right in. I get it now. This is actually way simpler (and probably safer).

Rebase off master
Re-implement with MSSA

Formatting

Harbormaster failed remote builds in B57324: Diff 265130!May 19 2020, 10:32 PM

Harbormaster completed remote builds in B57325: Diff 265131.May 19 2020, 11:02 PM

Yes, when you build the test-suite, you can set TEST_SUITE_COLLECT_STATS=On to collect stats.

Thanks that's super helpful.

Here are the results:

Redundant Stores: (no change)

Metric: dse.NumRedundantStores

Program                                        stats_opt stats_master diff

test-suite...lications/ClamAV/clamscan.test     2.00      2.00        0.0%
test-suite...ications/JM/ldecod/ldecod.test     3.00      3.00        0.0%
test-suite...ications/JM/lencod/lencod.test     4.00      4.00        0.0%
test-suite.../Applications/SPASS/SPASS.test     2.00      2.00        0.0%
test-suite.../Applications/lemon/lemon.test     2.00      2.00        0.0%
test-suite...pplications/oggenc/oggenc.test     8.00      8.00        0.0%
test-suite...pplications/treecc/treecc.test     4.00      4.00        0.0%
test-suite...marks/7zip/7zip-benchmark.test     6.00      6.00        0.0%
test-suite...ProxyApps-C++/CLAMR/CLAMR.test     1.00      1.00        0.0%
test-suite...hmarks/McCat/15-trie/trie.test     1.00      1.00        0.0%
test-suite...nsumer-lame/consumer-lame.test     3.00      3.00        0.0%
test-suite...TimberWolfMC/timberwolfmc.test     1.00      1.00        0.0%
test-suite.../Prolangs-C/loader/loader.test     1.00      1.00        0.0%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     1.00      1.00        0.0%
test-suite...arks/mafft/pairlocalalign.test     6.00      6.00        0.0%
Geomean difference                                                    nan%
       stats_opt  stats_master  diff
count  15.000000  15.000000     15.0
mean   3.000000   3.000000      0.0
std    2.203893   2.203893      0.0
min    1.000000   1.000000      0.0
25%    1.000000   1.000000      0.0
50%    2.000000   2.000000      0.0
75%    4.000000   4.000000      0.0
max    8.000000   8.000000      0.0

Execution Time: (slight improvement)

Metric: exec_time

Program                                        stats_opt stats_master diff

test-suite...urce/Applications/hbd/hbd.test     0.00      0.00       -28.1%
test-suite...adpcm/rawcaudio/rawcaudio.test     0.00      0.00       -23.1%
test-suite...itBench/uudecode/uudecode.test     0.10      0.12       20.6%
test-suite...urce/Applications/aha/aha.test     1.26      1.02       -19.1%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     0.69      0.83       18.9%
test-suite...flt/LoopRestructuring-flt.test     4.32      3.57       -17.4%
test-suite...count/automotive-bitcount.test     0.09      0.08       -16.9%
test-suite...ks/McCat/12-IOtest/iotest.test     0.28      0.33       16.4%
test-suite.../Benchmarks/Ptrdist/ft/ft.test     0.88      1.01       14.7%
test-suite...chmarks/MallocBench/gs/gs.test     0.04      0.05       14.6%
test-suite...arching-dbl/Searching-dbl.test     4.16      3.55       -14.6%
test-suite...ve-susan/automotive-susan.test     0.04      0.04       -14.4%
test-suite...t/StatementReordering-flt.test     3.36      2.89       -14.0%
test-suite...mbolics-flt/Symbolics-flt.test     1.26      1.09       -13.8%
test-suite...mbolics-dbl/Symbolics-dbl.test     3.16      2.73       -13.7%
Geomean difference                                                   -1.4%
        stats_opt  stats_master        diff
count  198.000000  198.000000    198.000000
mean   2.136898    2.102667     -0.010418
std    4.097594    4.069014      0.079837
min    0.000800    0.000900     -0.281250
25%    0.036075    0.035250     -0.068807
50%    0.616700    0.650100      0.000000
75%    3.317450    3.089325      0.045511
max    32.665900   31.948900     0.205853

Execution Time (filter short): (mostly improved)

Tests: 198
Short Running: 96 (filtered out)
Remaining: 102
Metric: exec_time

Program                                        stats_opt stats_master diff

test-suite...urce/Applications/aha/aha.test     1.26      1.02       -19.1%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     0.69      0.83       18.9%
test-suite...flt/LoopRestructuring-flt.test     4.32      3.57       -17.4%
test-suite.../Benchmarks/Ptrdist/ft/ft.test     0.88      1.01       14.7%
test-suite...arching-dbl/Searching-dbl.test     4.16      3.55       -14.6%
test-suite...t/StatementReordering-flt.test     3.36      2.89       -14.0%
test-suite...mbolics-flt/Symbolics-flt.test     1.26      1.09       -13.8%
test-suite...mbolics-dbl/Symbolics-dbl.test     3.16      2.73       -13.7%
test-suite...C/Packing-flt/Packing-flt.test     4.71      4.10       -13.0%
test-suite...ctions-dbl/Reductions-dbl.test     4.06      3.54       -12.7%
test-suite...rimaran/enc-3des/enc-3des.test     1.90      1.66       -12.6%
test-suite.../Trimaran/enc-md5/enc-md5.test     1.69      1.48       -12.4%
test-suite...yApps-C++/PENNANT/PENNANT.test     0.62      0.70       12.3%
test-suite...arching-flt/Searching-flt.test     3.98      3.49       -12.3%
test-suite...s/Ptrdist/anagram/anagram.test     0.64      0.72       12.2%
Geomean difference                                                    nan%
        stats_opt  stats_master        diff
count  102.000000  101.000000    101.000000
mean   4.070663    4.036584     -0.015677
std    4.994630    4.988885      0.080786
min    0.604600    0.635000     -0.191148
25%    1.660050    1.549400     -0.083075
50%    3.237450    3.072300     -0.005657
75%    4.359050    4.313500      0.046250
max    32.665900   31.948900     0.189496

Compile Time: (unexpectedly improved but, that might just be noise)

Metric: compile_time

Program                                        stats_opt stats_master diff

test-suite...rks/tramp3d-v4/tramp3d-v4.test    45.98     52.93       15.1%
test-suite.../Trimaran/enc-pc1/enc-pc1.test     0.18      0.20        9.6%
test-suite...netbench-crc/netbench-crc.test     0.20      0.21        9.0%
test-suite...hmarks/VersaBench/bmm/bmm.test     0.14      0.15        8.4%
test-suite...marks/SciMark2-C/scimark2.test     0.94      1.02        8.1%
test-suite.../Benchmarks/Ptrdist/ft/ft.test     0.57      0.61        7.6%
test-suite...marks/7zip/7zip-benchmark.test   122.38    131.64        7.6%
test-suite.../Trimaran/enc-rc4/enc-rc4.test     0.13      0.14        7.5%
test-suite.../Trimaran/enc-md5/enc-md5.test     0.29      0.31        7.5%
test-suite...ngs-C/fixoutput/fixoutput.test     0.16      0.18        7.2%
test-suite...netbench-url/netbench-url.test     0.50      0.53        7.1%
test-suite.../Prolangs-C/loader/loader.test     0.73      0.78        7.1%
test-suite...arks/VersaBench/dbms/dbms.test     2.30      2.47        7.0%
test-suite...langs-C/allroots/allroots.test     0.20      0.22        7.0%
test-suite...ngs-C/assembler/assembler.test     1.38      1.48        6.9%
Geomean difference                                                    0.3%
        stats_opt  stats_master        diff
count  198.000000  198.000000    198.000000
mean   5.392878    5.432042      0.003577
std    13.798743   14.112571     0.039915
min    0.053700    0.052200     -0.067272
25%    0.318475    0.314125     -0.027627
50%    1.606700    1.594650     -0.000062
75%    3.390850    3.419850      0.031306
max    122.376700  131.644500    0.151016

asbirlea added inline comments.May 20 2020, 5:09 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1859	I don't entirely understand the logic here, it would help a lot to add some comments to elaborate. What happens if `IsDead` is true? Seems this would be a good case to eliminate the store if the load/store are from/to the same address. Next, check the condition `Load->getPointerOperand() != SI->getOperand(1)` first, and exit early if the equality condition is not met. I see this roughly as: if (Load->getPointerOperand() == SI->getOperand(1)) { if (LoadAccess == KillingDef->getDefiningAccess()) { State.deleteDeadInstruction(SI); } else { //Note: LoadAccess may not be optimized, if it is the walker call is a no-op. auto *LoadAccessOptimized = MSSAWalker->getClobberingMemoryAccess(MSSA.getMemoryAccess(Load)); if (MSSAWalker->getClobberingMemoryAccess(KillingDef) == LoadAccessOptimized) { State.deleteDeadInstruction(SI); } } } Please let me know if I misunderstood the intention here.
1865	Why is this inserting two accesses then starting the checks from 1? Updates above comment with rough replacement.

Compile Time: (unexpectedly improved but, that might just be noise)

Regressed kinda a lot ?

In D79391#2048050, @xbolva00 wrote:

Compile Time: (unexpectedly improved but, that might just be noise)

Regressed kinda a lot ?

Yup! 7%-15% is huge. I'm hoping this is due to the unnecessary processing I commented on, but won't know until tested with the changes. I'm pretty curios which part led to that big of a regression.

@asbirlea thank you for the review! Sorry I didn't get to this sooner, I was super busy last week.

Yup! 7%-15% is huge. I'm hoping this is due to the unnecessary processing I commented on, but won't know until tested with the changes. I'm pretty curios which part led to that big of a regression.

I'm pretty sure it's because I recursively walk *every* store's accesses. I suspect checking Load->getPointerOperand() != Store->getOperand(1) *first* will fix this. Running the stats again.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1859	I've re-factored this a bit and added comments. It should be much clearer now. Let me know if not, and I can explain further.

Update based on review:

Fix performance issues.
Factor into its own function.
Add comments.

Execution time:

Tests: 198
Short Running: 97 (filtered out)
Remaining: 101
Metric: exec_time

Program                                        stats_opt stats_master diff

test-suite...ks/Prolangs-C++/life/life.test     2.30      3.01       30.9%
test-suite...arks/mafft/pairlocalalign.test    12.66     15.31       20.9%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     0.68      0.83       20.7%
test-suite...Source/Benchmarks/sim/sim.test     2.81      3.38       20.2%
test-suite...urce/Applications/aha/aha.test     1.25      1.02       -18.4%
test-suite...s/Ptrdist/anagram/anagram.test     0.62      0.72       16.9%
test-suite.../Benchmarks/nbench/nbench.test     1.32      1.55       16.9%
test-suite...ks/VersaBench/8b10b/8b10b.test     4.31      5.02       16.4%
test-suite...nchmarks/llubenchmark/llu.test     3.19      3.68       15.1%
test-suite.../Applications/lemon/lemon.test     0.87      0.75       -14.1%
test-suite...tions/lambda-0.1.3/lambda.test     3.50      3.04       -13.1%
test-suite.../VersaBench/ecbdes/ecbdes.test     1.81      2.02       11.7%
test-suite...netbench-url/netbench-url.test     2.97      3.31       11.3%
test-suite...hmarks/VersaBench/bmm/bmm.test     2.34      2.60       11.3%
test-suite...marks/SciMark2-C/scimark2.test    28.48     31.68       11.2%
Geomean difference                                                    nan%
       stats_opt  stats_master       diff
count  99.000000  101.000000    99.000000
mean   3.921419   4.036584      0.035598
std    4.557934   4.988885      0.071592
min    0.617100   0.635000     -0.184291
25%    1.575900   1.549400     -0.003149
50%    3.090300   3.072300      0.022620
75%    4.272200   4.313500      0.070729
max    28.946700  31.948900     0.308558

Compile time:

Tests: 198
Metric: compile_time

Program                                        stats_opt stats_master diff

test-suite...rks/tramp3d-v4/tramp3d-v4.test    43.40     52.93       22.0%
test-suite...arks/mafft/pairlocalalign.test    26.05     30.58       17.4%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test     4.32      4.97       15.1%
test-suite...frame_layout/frame_layout.test     2.47      2.81       13.5%
test-suite...marks/7zip/7zip-benchmark.test   122.67    131.64        7.3%
test-suite...fice-ispell/office-ispell.test     3.99      4.28        7.2%
test-suite...earch/office-stringsearch.test     0.44      0.47        6.8%
test-suite...netbench-crc/netbench-crc.test     0.20      0.21        6.8%
test-suite...nsumer-lame/consumer-lame.test    11.69     12.47        6.6%
test-suite...math/automotive-basicmath.test     0.34      0.32       -6.3%
test-suite...lowfish/security-blowfish.test     0.64      0.68        6.2%
test-suite...comm-adpcm/telecomm-adpcm.test     0.15      0.16        6.2%
test-suite.../Prolangs-C++/ocean/ocean.test     0.38      0.35       -6.0%
test-suite.../Trimaran/enc-pc1/enc-pc1.test     0.19      0.20        5.4%
test-suite...DOE-ProxyApps-C/CoMD/CoMD.test     2.84      2.69       -5.3%
Geomean difference                                                    0.4%
        stats_opt  stats_master        diff
count  198.000000  198.000000    198.000000
mean   5.349209    5.432042      0.004814
std    13.699984   14.112571     0.038243
min    0.053100    0.052200     -0.063188
25%    0.312425    0.314125     -0.025223
50%    1.604600    1.594650      0.001472
75%    3.347775    3.419850      0.026751
max    122.668900  131.644500    0.219606

Question: it would seem like the compile-time actually got a lot better (which is odd). Am I reading the data incorrectly? Shouldn't smaller numbers be better?

There may be some noise here so I'm going to try running the benchmarks a few more times to make sure they line up. I'm also going to add a statistic for noop stores.

Harbormaster completed remote builds in B57726: Diff 265881.May 23 2020, 2:19 PM

In D79391#2052317, @zoecarver wrote:

Execution time:

Tests: 198
Short Running: 97 (filtered out)
Remaining: 101
Metric: exec_time

Program                                        stats_opt stats_master diff

test-suite...ks/Prolangs-C++/life/life.test     2.30      3.01       30.9%
test-suite...arks/mafft/pairlocalalign.test    12.66     15.31       20.9%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     0.68      0.83       20.7%
test-suite...Source/Benchmarks/sim/sim.test     2.81      3.38       20.2%
test-suite...urce/Applications/aha/aha.test     1.25      1.02       -18.4%
test-suite...s/Ptrdist/anagram/anagram.test     0.62      0.72       16.9%
test-suite.../Benchmarks/nbench/nbench.test     1.32      1.55       16.9%
test-suite...ks/VersaBench/8b10b/8b10b.test     4.31      5.02       16.4%
test-suite...nchmarks/llubenchmark/llu.test     3.19      3.68       15.1%
test-suite.../Applications/lemon/lemon.test     0.87      0.75       -14.1%
test-suite...tions/lambda-0.1.3/lambda.test     3.50      3.04       -13.1%
test-suite.../VersaBench/ecbdes/ecbdes.test     1.81      2.02       11.7%
test-suite...netbench-url/netbench-url.test     2.97      3.31       11.3%
test-suite...hmarks/VersaBench/bmm/bmm.test     2.34      2.60       11.3%
test-suite...marks/SciMark2-C/scimark2.test    28.48     31.68       11.2%
Geomean difference                                                    nan%
       stats_opt  stats_master       diff
count  99.000000  101.000000    99.000000
mean   3.921419   4.036584      0.035598
std    4.557934   4.988885      0.071592
min    0.617100   0.635000     -0.184291
25%    1.575900   1.549400     -0.003149
50%    3.090300   3.072300      0.022620
75%    4.272200   4.313500      0.070729
max    28.946700  31.948900     0.308558

Compile time:

Tests: 198
Metric: compile_time

Program                                        stats_opt stats_master diff

test-suite...rks/tramp3d-v4/tramp3d-v4.test    43.40     52.93       22.0%
test-suite...arks/mafft/pairlocalalign.test    26.05     30.58       17.4%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test     4.32      4.97       15.1%
test-suite...frame_layout/frame_layout.test     2.47      2.81       13.5%
test-suite...marks/7zip/7zip-benchmark.test   122.67    131.64        7.3%
test-suite...fice-ispell/office-ispell.test     3.99      4.28        7.2%
test-suite...earch/office-stringsearch.test     0.44      0.47        6.8%
test-suite...netbench-crc/netbench-crc.test     0.20      0.21        6.8%
test-suite...nsumer-lame/consumer-lame.test    11.69     12.47        6.6%
test-suite...math/automotive-basicmath.test     0.34      0.32       -6.3%
test-suite...lowfish/security-blowfish.test     0.64      0.68        6.2%
test-suite...comm-adpcm/telecomm-adpcm.test     0.15      0.16        6.2%
test-suite.../Prolangs-C++/ocean/ocean.test     0.38      0.35       -6.0%
test-suite.../Trimaran/enc-pc1/enc-pc1.test     0.19      0.20        5.4%
test-suite...DOE-ProxyApps-C/CoMD/CoMD.test     2.84      2.69       -5.3%
Geomean difference                                                    0.4%
        stats_opt  stats_master        diff
count  198.000000  198.000000    198.000000
mean   5.349209    5.432042      0.004814
std    13.699984   14.112571     0.038243
min    0.053100    0.052200     -0.063188
25%    0.312425    0.314125     -0.025223
50%    1.604600    1.594650      0.001472
75%    3.347775    3.419850      0.026751
max    122.668900  131.644500    0.219606

Question: it would seem like the compile-time actually got a lot better (which is odd). Am I reading the data incorrectly? Shouldn't smaller numbers be better?

Runtimes/execution times depend a lot on the system and can be quite noisy. For comparable numbers you have to make sure that the system noise is reduced to a minimum (and you don't build & run the benchmarks in parallel).

I would suggest to focus on the statistics about removed stores and how the patch impacts them. With respect to compile-time, I don't think there's much to worry about with the patch in its current form..

Also, I recently found out that EarlyCSE also does some simple form of load-store forwarding (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/EarlyCSE.cpp#L1214). It might be worth consolidating the DSE logic in the DeadStoreElimination pass in the future, but that's a discussion for the future.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1847	IIUC the MemoryPHI handling should be optional, right? I would recommend splitting it off in a separate patch, as smaller patches make it generally easier to review/submit.
1861	shouldn't that be `continue`? (Test case would be a scenario where we first remove a no-op store and then some other stores by another def in State.MemDefs.

Thanks for the review.

I would suggest to focus on the statistics about removed stores and how the patch impacts them.

What do you think about adding a statistic for noop stores? Or should I increment NumRedundantStores or a different statistic? I think as-is this patch doesn't change any statistics.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1847	Yes, it's optional. I'll make this a separate patch.
1861	Yes, you're right.

Rebase off master.
Continue instead of break.
Add noop store statistic.
Remove phi argument support

Also, it looks like the test suit does not have any noop stores (either that or I implemented the NumNoopStores statistic incorrectly).

Harbormaster completed remote builds in B57967: Diff 266390.May 26 2020, 7:05 PM

In D79391#2056342, @zoecarver wrote:

Also, it looks like the test suit does not have any noop stores (either that or I implemented the NumNoopStores statistic incorrectly).

I think it may be that currently most of those stores are eliminated earlier by EarlyCSE (see earlier comment). Might be interesting to see what happens with MSSA disabled for EarlyCSE. (just to double check: MSSA based DSE is not enabled by default, so for your measurements you have to make sure it is turned on via -mllvm -enable-dse-memoryssa=true or by setting EnableMemorySSA)

I think it is almost ready, just a question about some of the test changes remaining.

llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll
34 ↗	(On Diff #266390)	Is there a reason for this change in the source (and some of the ones below)? I think we should keep the existing tests as is and add additional cases as required.

zoecarver marked an inline comment as done.May 28 2020, 8:39 PM

zoecarver added inline comments.

llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll
34 ↗	(On Diff #266390)	The load and store here would be removed. I wasn't sure if the noop store was inadvertent and wanted to keep the test as similar as possible to what I thought it was testing (that neither store is removed). If you'd rather I can remove the changes and update the `CHECK` comments instead.

Might be interesting to see what happens with MSSA disabled for EarlyCSE.

I passed CXXFLAGS="-mllvm -enable-dse-memoryssa=true -mllvm -earlycse-mssa-optimization-cap=0" to cmake and re-built and ran the tests. It doesn't look like that had any impact, though. I still get Unknown metric 'dse.NumNoopStores' which I assume means none were recorded.

In D79391#2062120, @zoecarver wrote:

Might be interesting to see what happens with MSSA disabled for EarlyCSE.

I passed CXXFLAGS="-mllvm -enable-dse-memoryssa=true -mllvm -earlycse-mssa-optimization-cap=0" to cmake and re-built and ran the tests. It doesn't look like that had any impact, though. I still get Unknown metric 'dse.NumNoopStores' which I assume means none were recorded.

Hm I think cmake might not pick up CXXFLAGS, if it's not the initial run. Probably better to pass -DCMAKE_C_FLAGS=/-DCMAKE_CXX_FLAGS=. Or just flip the default for EnableMemorySSA (which I usually do :P) and rebuild clang/libLTO.

To verify, you can check dse.NumFastStores. There should be a lot of changes with/without MemorySSA. For example, with -O3 -flto:

~/projects/test-suites/test-suite/utils/compare.py --filter-hash -m dse.NumFastStores  base.json mssa.json
Tests: 237
Same hash: 104 (filtered out)
Remaining: 133
Metric: dse.NumFastStores

Program                                        base   mssa   diff
 test-suite...-typeset/consumer-typeset.test   162.00 891.00 450.0%
 test-suite...INT2000/164.gzip/164.gzip.test    15.00  67.00 346.7%
 test-suite...ks/Prolangs-C/agrep/agrep.test     2.00   8.00 300.0%
 test-suite...000/183.equake/183.equake.test    10.00  40.00 300.0%
 test-suite...lications/sqlite3/sqlite3.test    22.00  76.00 245.5%
 test-suite...CFP2006/433.milc/433.milc.test    64.00 195.00 204.7%
 test-suite...000/197.parser/197.parser.test     3.00   9.00 200.0%
 test-suite...chmarks/Olden/power/power.test    11.00  33.00 200.0%
 test-suite...lications/ClamAV/clamscan.test    72.00 210.00 191.7%
 test-suite.../Applications/SPASS/SPASS.test    48.00 139.00 189.6%
 test-suite...T2006/445.gobmk/445.gobmk.test    19.00  52.00 173.7%
 test-suite...Source/Benchmarks/sim/sim.test     2.00   5.00 150.0%
 test-suite...abench/jpeg/jpeg-6a/cjpeg.test     9.00  22.00 144.4%
 test-suite...ks/McCat/12-IOtest/iotest.test     6.00  14.00 133.3%
 test-suite...nsumer-jpeg/consumer-jpeg.test    12.00  27.00 125.0%
 Geomean difference                                           nan%

For noop stores, it looks like there are the following nook-store changes for the configuration above:

test-suite...:: External/Povray/povray.test    NaN   1.00  nan%
test-suite...CFP2000/177.mesa/177.mesa.test    NaN   1.00  nan%
test-suite...000/183.equake/183.equake.test    NaN  11.00  nan%
test-suite...CFP2006/433.milc/433.milc.test    NaN   1.00  nan%
test-suite...006/447.dealII/447.dealII.test    NaN   2.00  nan%
test-suite...006/453.povray/453.povray.test    NaN  13.00  nan%
test-suite...6/482.sphinx3/482.sphinx3.test    NaN   1.00  nan%
test-suite.../CINT2000/176.gcc/176.gcc.test    NaN   6.00  nan%
test-suite...0/253.perlbmk/253.perlbmk.test    NaN  21.00  nan%
test-suite...0.perlbench/400.perlbench.test    NaN  24.00  nan%
test-suite...T2006/445.gobmk/445.gobmk.test    NaN  25.00  nan%
test-suite...lications/ClamAV/clamscan.test    NaN   2.00  nan%
test-suite.../Applications/SPASS/SPASS.test    NaN   2.00  nan%
test-suite...pplications/oggenc/oggenc.test    NaN   2.00  nan%
test-suite...s/MallocBench/cfrac/cfrac.test    NaN   6.00  nan%
test-suite...chmarks/MallocBench/gs/gs.test    NaN   1.00  nan%
test-suite...TimberWolfMC/timberwolfmc.test    NaN   1.00  nan%
test-suite.../Prolangs-C/loader/loader.test    NaN   1.00  nan%

As mentioned earlier, I think this looks good, once the comments for the tests are addressed :)

llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll
34 ↗	(On Diff #266390)	I don't think the current version of the patch would remove anything for this test case (and likely the other modified ones). `%g_addr` and `@g` may alias I think, so the defining access of `store i32 %g_value, i32* %g_addr, align 4` would be `store i32 -1, i32* @g, align 4` and the defining access of `%g_value = load i32, i32* %g_addr, align 4` would be `entry`. Most existing tests intentionally cover cases to exercise various code paths/guard against problematic scenarios encountered in the past. Therefor it would be good to preserve the original input and update the check lines, if there are any changes (those can then be checked during the review). I am not sure if we could remove anything here, as it would require proving it is safe to remove them in all possible alias scenarios for `%g_addr` and `@g`. If they don't alias, we can remove the store to `g_addr`. If they alias exactly (= complete overwrite), we could remove `store i32 %g_value, i32* %g_addr, align 4`, but only if we also remove `store i32 -1, i32* @g, align 4`. But we cannot remove that store if they don't alias.
627 ↗	(On Diff #266390)	Might be good to move the new tests for coop-stores into a separate test file (and please remove the tests from simple-todo.ll)

Update tests based on review

@fhahn I really appreciate all your help with this patch. The tests are updated.

Hm I think cmake might not pick up CXXFLAGS...

I was passing the flags to the llvm-test-suite build but, I also tried when building the mono-repo. I tried both with CXXFLAGS and with DCMAKE_C_FLAGS as suggested and made sure to export EnableMemorySSA=1 (and even tried a clean build of both) but I still get the same error when running utils/compare.py. I'll take another look at the docs and keep playing around with it to see if I can get it working tomorrow.

Harbormaster completed remote builds in B58529: Diff 267436.May 29 2020, 11:57 PM

LGTM, thanks!

This should only remove stores which we couldn't kill other stores, as the load would block the store killing any other stores.

In D79391#2064706, @zoecarver wrote:

Hm I think cmake might not pick up CXXFLAGS...

suggested and made sure to export EnableMemorySSA=1 (and even tried a clean build of both)

Oh right, I meant flipping the cl::init value of the variable https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp#L100 and rebuilding ;)

This revision is now accepted and ready to land.May 30 2020, 4:33 AM

Oh right, I meant flipping the cl::init value of the variable https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp#L100 and rebuilding ;)

Ah, there we go! That works now :)

Closed by commit rG065bf124fde8: [DSE] Remove noop stores in MSSA. (authored by zoecarver). · Explain WhyMay 30 2020, 10:03 AM

This revision was automatically updated to reflect the committed changes.

zoecarver mentioned this in D82588: [DSE] Look through memory PHI arguments when removing noop stores in MSSA..Jun 25 2020, 11:09 AM

zoecarver mentioned this in rG6c25816d7b68: [DSE] Look through memory PHI arguments when removing noop stores in MSSA..Oct 1 2020, 10:42 AM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Scalar/

DeadStoreElimination.cpp

28 lines

test/

Transforms/

DeadStoreElimination/

MSSA/

atomic.ll

34 lines

noop-stores.ll

171 lines

simple-todo.ll

119 lines

Diff 267462

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp

Show First 20 Lines • Show All 76 Lines • ▼ Show 20 Lines
#define DEBUG_TYPE "dse"		#define DEBUG_TYPE "dse"

STATISTIC(NumRemainingStores, "Number of stores remaining after DSE");		STATISTIC(NumRemainingStores, "Number of stores remaining after DSE");
STATISTIC(NumRedundantStores, "Number of redundant stores deleted");		STATISTIC(NumRedundantStores, "Number of redundant stores deleted");
STATISTIC(NumFastStores, "Number of stores deleted");		STATISTIC(NumFastStores, "Number of stores deleted");
STATISTIC(NumFastOther, "Number of other instrs removed");		STATISTIC(NumFastOther, "Number of other instrs removed");
STATISTIC(NumCompletePartials, "Number of stores dead by later partials");		STATISTIC(NumCompletePartials, "Number of stores dead by later partials");
STATISTIC(NumModifiedStores, "Number of stores modified");		STATISTIC(NumModifiedStores, "Number of stores modified");
		STATISTIC(NumNoopStores, "Number of noop stores deleted");

DEBUG_COUNTER(MemorySSACounter, "dse-memoryssa",		DEBUG_COUNTER(MemorySSACounter, "dse-memoryssa",
"Controls which MemoryDefs are eliminated.");		"Controls which MemoryDefs are eliminated.");

static cl::opt<bool>		static cl::opt<bool>
EnablePartialOverwriteTracking("enable-dse-partial-overwrite-tracking",		EnablePartialOverwriteTracking("enable-dse-partial-overwrite-tracking",
cl::init(true), cl::Hidden,		cl::init(true), cl::Hidden,
cl::desc("Enable partial-overwrite tracking in DSE"));		cl::desc("Enable partial-overwrite tracking in DSE"));
▲ Show 20 Lines • Show All 1,723 Lines • ▼ Show 20 Lines	if (NI->isAtomic()) {
llvm_unreachable(		llvm_unreachable(
"Other instructions should be modeled/skipped in MemorySSA");		"Other instructions should be modeled/skipped in MemorySSA");
}		}

return false;		return false;
}		}
};		};

		/// \returns true if \p KillingDef stores the result of \p Load to the source of
		/// \p Load.
		static bool storeIsNoop(MemorySSA &MSSA, LoadInst *Load,
		MemoryDef *KillingDef) {
		Instruction *Store = KillingDef->getMemoryInst();
		// If the load's operand isn't the destination of the store, bail.
		if (Load->getPointerOperand() != Store->getOperand(1))
		return false;

		// Get the defining access for the load.
		auto *LoadAccess = MSSA.getMemoryAccess(Load)->getDefiningAccess();
		// The store is dead if the defining accesses are the same.
		return LoadAccess == KillingDef->getDefiningAccess();
		}

bool eliminateDeadStoresMemorySSA(Function &F, AliasAnalysis &AA,		bool eliminateDeadStoresMemorySSA(Function &F, AliasAnalysis &AA,
MemorySSA &MSSA, DominatorTree &DT,		MemorySSA &MSSA, DominatorTree &DT,
PostDominatorTree &PDT,		PostDominatorTree &PDT,
const TargetLibraryInfo &TLI) {		const TargetLibraryInfo &TLI) {
const DataLayout &DL = F.getParent()->getDataLayout();		const DataLayout &DL = F.getParent()->getDataLayout();
bool MadeChange = false;		bool MadeChange = false;

DSEState State = DSEState::get(F, AA, MSSA, DT, PDT, TLI);		DSEState State = DSEState::get(F, AA, MSSA, DT, PDT, TLI);
		fhahnUnsubmitted Not Done Reply Inline Actions IIUC the MemoryPHI handling should be optional, right? I would recommend splitting it off in a separate patch, as smaller patches make it generally easier to review/submit. fhahn: IIUC the MemoryPHI handling should be optional, right? I would recommend splitting it off in a…
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Yes, it's optional. I'll make this a separate patch. zoecarver: Yes, it's optional. I'll make this a separate patch.
// For each store:		// For each store:
for (unsigned I = 0; I < State.MemDefs.size(); I++) {		for (unsigned I = 0; I < State.MemDefs.size(); I++) {
MemoryDef *KillingDef = State.MemDefs[I];		MemoryDef *KillingDef = State.MemDefs[I];
if (State.SkipStores.count(KillingDef))		if (State.SkipStores.count(KillingDef))
continue;		continue;
Instruction *SI = KillingDef->getMemoryInst();		Instruction *SI = KillingDef->getMemoryInst();

		// Check if we're storing a value that we just loaded.
		if (auto *Load = dyn_cast<LoadInst>(SI->getOperand(0))) {
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Forgot about the naming convention, my bad. Will fix. zoecarver: Forgot about the naming convention, my bad. Will fix.
		if (storeIsNoop(MSSA, Load, KillingDef)) {
		fhahnUnsubmitted Not Done Reply Inline Actions I think the check is not strict enough at the moment. I think we have to ensure that there are no memory-writes between the load and the store that may alias them. In the simple example below, IIUC the patch would currently incorrectly eliminate `store i32 %a, i32* %Q`. define void @test(i32* %Q) { %a = load i32, i32* %Q store i32 10, i32* %Q store i32 %a, i32* %Q ret void } In terms of MemorySSA, I think we are looking for loads that use the same MemoryDef as `KillingDef`. One simple way to check this would be getting the memoryDef of KillingDef and check if it dominates the load I think. fhahn: I think the check is not strict enough at the moment. I think we have to ensure that there are…
		asbirleaUnsubmitted Not Done Reply Inline Actions +1. Please add a testcase similar to the example fhahn@ gave. asbirlea: +1. Please add a testcase similar to the example fhahn@ gave.
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Will do. I tried the test case and it fails. I'll need to re-write most of this patch (minus the tests). I think @fhahn's solution will work. I probably won't get to this today but I should have time this weekend. zoecarver: Will do. I tried the test case and it fails. I'll need to re-write most of this patch (minus…
		fhahnUnsubmitted Not Done Reply Inline Actions Awesome, thanks! Feel free to let us know if you have any questions. fhahn: Awesome, thanks! Feel free to let us know if you have any questions.
		zoecarverAuthorUnsubmitted Done Reply Inline Actions I think we are looking for loads that use the same MemoryDef as KillingDef. I tried playing around with getting the memory access of `Load` but, I'm not sure how it should be the "same" as KillingDef (I don't know a whole lot about MSSA). I also tried analyzing `ToCheck` but, I ended up essentially having to implement this check lower down which meant missing some optimizations. zoecarver: > I think we are looking for loads that use the same MemoryDef as KillingDef. I tried playing…
		fhahnUnsubmitted Not Done Reply Inline Actions I was referring to MemoryDef/MemoryUse in MemorySSA (see http://llvm.org/docs/MemorySSA.html for a bit more details) The problem with directly looking at the def-use chains in the IR is that it does not necessarily include all memory instructions that might overwrite a the same memory location. MemorySSA provides an 'virtual SSA IR' for memory operations. I think you can make sure that there are no writes may-alias writes in between Load and SI by getting their defining accesses using MemorySSA (https://llvm.org/doxygen/classllvm_1_1MemoryUseOrDef.html#abc4e1b6cc483cfd57ca75c4ab92bd360). fhahn: I was referring to MemoryDef/MemoryUse in MemorySSA (see http://llvm.org/docs/MemorySSA.html…
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Heh, I probably should have spent some time reading the docs before diving right in. I get it now. This is actually way simpler (and probably safer). zoecarver: Heh, I probably should have spent some time reading the docs before diving right in. I get it…
		State.deleteDeadInstruction(SI);
		LLVM_DEBUG(dbgs() << "DSE: Remove Dead Store:\n DEAD: " << *SI
		asbirleaUnsubmitted Not Done Reply Inline Actions I don't entirely understand the logic here, it would help a lot to add some comments to elaborate. What happens if `IsDead` is true? Seems this would be a good case to eliminate the store if the load/store are from/to the same address. Next, check the condition `Load->getPointerOperand() != SI->getOperand(1)` first, and exit early if the equality condition is not met. I see this roughly as: if (Load->getPointerOperand() == SI->getOperand(1)) { if (LoadAccess == KillingDef->getDefiningAccess()) { State.deleteDeadInstruction(SI); } else { //Note: LoadAccess may not be optimized, if it is the walker call is a no-op. auto LoadAccessOptimized = MSSAWalker->getClobberingMemoryAccess(MSSA.getMemoryAccess(Load)); if (MSSAWalker->getClobberingMemoryAccess(KillingDef) == LoadAccessOptimized) { State.deleteDeadInstruction(SI); } } } Please let me know if I misunderstood the intention here. asbirlea:* I don't entirely understand the logic here, it would help a lot to add some comments to…
		zoecarverAuthorUnsubmitted Done Reply Inline Actions I've re-factored this a bit and added comments. It should be much clearer now. Let me know if not, and I can explain further. zoecarver: I've re-factored this a bit and added comments. It should be much clearer now. Let me know if…
		<< '\n');
		NumNoopStores++;
		fhahnUnsubmitted Not Done Reply Inline Actions shouldn't that be `continue`? (Test case would be a scenario where we first remove a no-op store and then some other stores by another def in State.MemDefs. fhahn: shouldn't that be `continue`? (Test case would be a scenario where we first remove a no-op…
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Yes, you're right. zoecarver: Yes, you're right.
		continue;
		}
		}

		asbirleaUnsubmitted Not Done Reply Inline Actions Why is this inserting two accesses then starting the checks from 1? Updates above comment with rough replacement. asbirlea: Why is this inserting two accesses then starting the checks from 1? Updates above comment…
auto MaybeSILoc = State.getLocForWriteEx(SI);		auto MaybeSILoc = State.getLocForWriteEx(SI);
if (!MaybeSILoc) {		if (!MaybeSILoc) {
LLVM_DEBUG(dbgs() << "Failed to find analyzable write location for "		LLVM_DEBUG(dbgs() << "Failed to find analyzable write location for "
<< *SI << "\n");		<< *SI << "\n");
continue;		continue;
}		}
MemoryLocation SILoc = *MaybeSILoc;		MemoryLocation SILoc = *MaybeSILoc;
assert(SILoc.Ptr && "SILoc should not be null");		assert(SILoc.Ptr && "SILoc should not be null");
▲ Show 20 Lines • Show All 236 Lines • Show Last 20 Lines

llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll

	Show First 20 Lines • Show All 92 Lines • ▼ Show 20 Lines
	; CHECK: store i32 0			; CHECK: store i32 0
	; CHECK: store i32 1			; CHECK: store i32 1
	store i32 0, i32* @x			store i32 0, i32* @x
	store atomic i32 0, i32* @y release, align 4			store atomic i32 0, i32* @y release, align 4
	%x = load atomic i32, i32* @y acquire, align 4			%x = load atomic i32, i32* @y acquire, align 4
	store i32 1, i32* @x			store i32 1, i32* @x
	ret i32 %x			ret i32 %x
	}			}

				; ** Noop load->store tests ************************************************

				; We can optimize unordered atomic loads or stores.
				define void @test_load_atomic(i32* %Q) {
				; CHECK-LABEL: @test_load_atomic(
				; CHECK-NEXT: ret void
				;
				%a = load atomic i32, i32* %Q unordered, align 4
				store atomic i32 %a, i32* %Q unordered, align 4
				ret void
				}

				; We can optimize unordered atomic loads or stores.
				define void @test_store_atomic(i32* %Q) {
				; CHECK-LABEL: @test_store_atomic(
				; CHECK-NEXT: ret void
				;
				%a = load i32, i32* %Q
				store atomic i32 %a, i32* %Q unordered, align 4
				ret void
				}

				; We can NOT optimize release atomic loads or stores.
				define void @test_store_atomic_release(i32* %Q) {
				; CHECK-LABEL: @test_store_atomic_release(
				; CHECK-NEXT: load
				; CHECK-NEXT: store atomic
				; CHECK-NEXT: ret void
				;
				%a = load i32, i32* %Q
				store atomic i32 %a, i32* %Q release, align 4
				ret void
				}

llvm/test/Transforms/DeadStoreElimination/MSSA/noop-stores.ll

This file was added.

				; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S \| FileCheck %s
				; RUN: opt < %s -aa-pipeline=basic-aa -passes=dse -enable-dse-memoryssa -S \| FileCheck %s
				target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"

				declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1) nounwind
				declare void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* nocapture, i8, i64, i32) nounwind
				declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind
				declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32) nounwind
				declare void @llvm.init.trampoline(i8, i8, i8*)

				; ** Noop load->store tests ************************************************

				; We CAN optimize volatile loads.
				define void @test_load_volatile(i32* %Q) {
				; CHECK-LABEL: @test_load_volatile(
				; CHECK-NEXT: [[A:%.]] = load volatile i32, i32 [[Q:%.*]]
				; CHECK-NEXT: store i32 [[A]], i32* [[Q]]
				; CHECK-NEXT: ret void
				;
				%a = load volatile i32, i32* %Q
				store i32 %a, i32* %Q
				ret void
				}

				; We can NOT optimize volatile stores.
				define void @test_store_volatile(i32* %Q) {
				; CHECK-LABEL: @test_store_volatile(
				; CHECK-NEXT: [[A:%.]] = load i32, i32 [[Q:%.*]]
				; CHECK-NEXT: store volatile i32 [[A]]
				; CHECK-NEXT: ret void
				;
				%a = load i32, i32* %Q
				store volatile i32 %a, i32* %Q
				ret void
				}

				; PR2599 - load -> store to same address.
				define void @test12({ i32, i32 }* %x) nounwind {
				; CHECK-LABEL: @test12(
				; CHECK-NEXT: [[TMP7:%.]] = getelementptr { i32, i32 }, { i32, i32 } [[X:%.*]], i32 0, i32 1
				; CHECK-NEXT: [[TMP8:%.]] = load i32, i32 [[TMP7]], align 4
				; CHECK-NEXT: [[TMP17:%.*]] = sub i32 0, [[TMP8]]
				; CHECK-NEXT: store i32 [[TMP17]], i32* [[TMP7]], align 4
				; CHECK-NEXT: ret void
				;
				%tmp4 = getelementptr { i32, i32 }, { i32, i32 }* %x, i32 0, i32 0
				%tmp5 = load i32, i32* %tmp4, align 4
				%tmp7 = getelementptr { i32, i32 }, { i32, i32 }* %x, i32 0, i32 1
				%tmp8 = load i32, i32* %tmp7, align 4
				%tmp17 = sub i32 0, %tmp8
				store i32 %tmp5, i32* %tmp4, align 4
				store i32 %tmp17, i32* %tmp7, align 4
				ret void
				}

				; Remove redundant store if loaded value is in another block.
				define i32 @test26(i1 %c, i32* %p) {
				; CHECK-LABEL: @test26(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 [[C:%.]], label [[BB1:%.]], label [[BB2:%.*]]
				; CHECK: bb1:
				; CHECK-NEXT: br label [[BB3:%.*]]
				; CHECK: bb2:
				; CHECK-NEXT: br label [[BB3]]
				; CHECK: bb3:
				; CHECK-NEXT: ret i32 0
				;
				entry:
				%v = load i32, i32* %p, align 4
				br i1 %c, label %bb1, label %bb2
				bb1:
				br label %bb3
				bb2:
				store i32 %v, i32* %p, align 4
				br label %bb3
				bb3:
				ret i32 0
				}

				; Remove redundant store if loaded value is in another block.
				define i32 @test27(i1 %c, i32* %p) {
				; CHECK-LABEL: @test27(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 [[C:%.]], label [[BB1:%.]], label [[BB2:%.*]]
				; CHECK: bb1:
				; CHECK-NEXT: br label [[BB3:%.*]]
				; CHECK: bb2:
				; CHECK-NEXT: br label [[BB3]]
				; CHECK: bb3:
				; CHECK-NEXT: ret i32 0
				;
				entry:
				%v = load i32, i32* %p, align 4
				br i1 %c, label %bb1, label %bb2
				bb1:
				br label %bb3
				bb2:
				br label %bb3
				bb3:
				store i32 %v, i32* %p, align 4
				ret i32 0
				}

				declare void @unknown_func()

				; Remove redundant store, which is in the lame loop as the load.
				define i32 @test33(i1 %c, i32* %p, i32 %i) {
				; CHECK-LABEL: @test33(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[BB1:%.*]]
				; CHECK: bb1:
				; CHECK-NEXT: br label [[BB2:%.*]]
				; CHECK: bb2:
				; CHECK-NEXT: call void @unknown_func()
				; CHECK-NEXT: br i1 undef, label [[BB1]], label [[BB3:%.*]]
				; CHECK: bb3:
				; CHECK-NEXT: ret i32 0
				;
				entry:
				br label %bb1
				bb1:
				%v = load i32, i32* %p, align 4
				br label %bb2
				bb2:
				store i32 %v, i32* %p, align 4
				; Might read and overwrite value at %p, but doesn't matter.
				call void @unknown_func()
				br i1 undef, label %bb1, label %bb3
				bb3:
				ret i32 0
				}

				declare void @unkown_write(i32*)

				; We can't remove the "noop" store around an unkown write.
				define void @test43(i32* %Q) {
				; CHECK-LABEL: @test43(
				; CHECK-NEXT: load
				; CHECK-NEXT: call
				; CHECK-NEXT: store
				; CHECK-NEXT: ret void
				;
				%a = load i32, i32* %Q
				call void @unkown_write(i32* %Q)
				store i32 %a, i32* %Q
				ret void
				}

				; We CAN remove it when the unkown write comes AFTER.
				define void @test44(i32* %Q) {
				; CHECK-LABEL: @test44(
				; CHECK-NEXT: call
				; CHECK-NEXT: ret void
				%a = load i32, i32* %Q
				store i32 %a, i32* %Q
				call void @unkown_write(i32* %Q)
				ret void
				}

				define void @test45(i32* %Q) {
				; CHECK-LABEL: @test45(
				; CHECK-NEXT: [[A:%.*]] = load
				; CHECK-NEXT: store i32 [[A]]
				; CHECK-NEXT: ret void
				%a = load i32, i32* %Q
				store i32 10, i32* %Q
				store i32 %a, i32* %Q
				ret void
				}

llvm/test/Transforms/DeadStoreElimination/MSSA/simple-todo.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; XFAIL: *			; XFAIL: *
	; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S \| FileCheck %s			; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S \| FileCheck %s
	; RUN: opt < %s -aa-pipeline=basic-aa -passes=dse -enable-dse-memoryssa -S \| FileCheck %s			; RUN: opt < %s -aa-pipeline=basic-aa -passes=dse -enable-dse-memoryssa -S \| FileCheck %s
	target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"			target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"

	declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1) nounwind			declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1) nounwind
	declare void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* nocapture, i8, i64, i32) nounwind			declare void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* nocapture, i8, i64, i32) nounwind
	declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind			declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind
	declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32) nounwind			declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32) nounwind
	declare void @llvm.init.trampoline(i8, i8, i8*)			declare void @llvm.init.trampoline(i8, i8, i8*)

	define void @test5(i32* %Q) {
	; CHECK-LABEL: @test5(
	; CHECK-NEXT: [[A:%.]] = load volatile i32, i32 [[Q:%.*]]
	; CHECK-NEXT: ret void
	;
	%a = load volatile i32, i32* %Q
	store i32 %a, i32* %Q
	ret void
	}

	; Do not delete stores that are only partially killed.			; Do not delete stores that are only partially killed.
	define i32 @test8() {			define i32 @test8() {
	; CHECK-LABEL: @test8(			; CHECK-LABEL: @test8(
	; CHECK-NEXT: [[V:%.*]] = alloca i32			; CHECK-NEXT: [[V:%.*]] = alloca i32
	; CHECK-NEXT: store i32 1234567, i32* [[V]]			; CHECK-NEXT: store i32 1234567, i32* [[V]]
	; CHECK-NEXT: [[X:%.]] = load i32, i32 [[V]]			; CHECK-NEXT: [[X:%.]] = load i32, i32 [[V]]
	; CHECK-NEXT: ret i32 [[X]]			; CHECK-NEXT: ret i32 [[X]]
	;			;
	▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%storage = alloca [10 x i8], align 16 ; <[10 x i8]*> [#uses=1]			%storage = alloca [10 x i8], align 16 ; <[10 x i8]*> [#uses=1]
	%cast = getelementptr [10 x i8], [10 x i8]* %storage, i32 0, i32 0 ; <i8*> [#uses=1]			%cast = getelementptr [10 x i8], [10 x i8]* %storage, i32 0, i32 0 ; <i8*> [#uses=1]
	call void @llvm.init.trampoline( i8* %cast, i8* bitcast (void ()* @test11f to i8), i8 null ) ; <i8*> [#uses=1]			call void @llvm.init.trampoline( i8* %cast, i8* bitcast (void ()* @test11f to i8), i8 null ) ; <i8*> [#uses=1]
	ret void			ret void
	}			}

	; PR2599 - load -> store to same address.
	define void @test12({ i32, i32 }* %x) nounwind {
	; CHECK-LABEL: @test12(
	; CHECK-NEXT: [[TMP7:%.]] = getelementptr { i32, i32 }, { i32, i32 } [[X:%.*]], i32 0, i32 1
	; CHECK-NEXT: [[TMP8:%.]] = load i32, i32 [[TMP7]], align 4
	; CHECK-NEXT: [[TMP17:%.*]] = sub i32 0, [[TMP8]]
	; CHECK-NEXT: store i32 [[TMP17]], i32* [[TMP7]], align 4
	; CHECK-NEXT: ret void
	;
	%tmp4 = getelementptr { i32, i32 }, { i32, i32 }* %x, i32 0, i32 0
	%tmp5 = load i32, i32* %tmp4, align 4
	%tmp7 = getelementptr { i32, i32 }, { i32, i32 }* %x, i32 0, i32 1
	%tmp8 = load i32, i32* %tmp7, align 4
	%tmp17 = sub i32 0, %tmp8
	store i32 %tmp5, i32* %tmp4, align 4
	store i32 %tmp17, i32* %tmp7, align 4
	ret void
	}


	declare noalias i8* @malloc(i32)			declare noalias i8* @malloc(i32)
	declare noalias i8* @calloc(i32, i32)			declare noalias i8* @calloc(i32, i32)

	define void @test14(i32* %Q) {			define void @test14(i32* %Q) {
	; CHECK-LABEL: @test14(			; CHECK-LABEL: @test14(
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	Show All 28 Lines
	;			;
	%k.addr = alloca i32			%k.addr = alloca i32
	%m.addr = alloca i32			%m.addr = alloca i32
	%k.addr.m.addr = select i1 %i, i32* %k.addr, i32* %m.addr			%k.addr.m.addr = select i1 %i, i32* %k.addr, i32* %m.addr
	store i32 0, i32* %k.addr.m.addr, align 4			store i32 0, i32* %k.addr.m.addr, align 4
	ret void			ret void
	}			}

	; Remove redundant store if loaded value is in another block.
	define i32 @test26(i1 %c, i32* %p) {
	; CHECK-LABEL: @test26(
	; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 [[C:%.]], label [[BB1:%.]], label [[BB2:%.*]]
	; CHECK: bb1:
	; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2:
	; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:
	; CHECK-NEXT: ret i32 0
	;
	entry:
	%v = load i32, i32* %p, align 4
	br i1 %c, label %bb1, label %bb2
	bb1:
	br label %bb3
	bb2:
	store i32 %v, i32* %p, align 4
	br label %bb3
	bb3:
	ret i32 0
	}

	; Remove redundant store if loaded value is in another block.
	define i32 @test27(i1 %c, i32* %p) {
	; CHECK-LABEL: @test27(
	; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 [[C:%.]], label [[BB1:%.]], label [[BB2:%.*]]
	; CHECK: bb1:
	; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2:
	; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:
	; CHECK-NEXT: ret i32 0
	;
	entry:
	%v = load i32, i32* %p, align 4
	br i1 %c, label %bb1, label %bb2
	bb1:
	br label %bb3
	bb2:
	br label %bb3
	bb3:
	store i32 %v, i32* %p, align 4
	ret i32 0
	}

	declare void @unknown_func()			declare void @unknown_func()

	; Remove redundant store if loaded value is in another block inside a loop.			; Remove redundant store if loaded value is in another block inside a loop.
	define i32 @test31(i1 %c, i32* %p, i32 %i) {			define i32 @test31(i1 %c, i32* %p, i32 %i) {
	; CHECK-LABEL: @test31(			; CHECK-LABEL: @test31(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: br i1 undef, label [[BB1]], label [[BB2:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB1]], label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%v = load i32, i32* %p, align 4			%v = load i32, i32* %p, align 4
	br label %bb1			br label %bb1
	bb1:			bb1:
	store i32 %v, i32* %p, align 4			store i32 %v, i32* %p, align 4
	br i1 undef, label %bb1, label %bb2			br i1 undef, label %bb1, label %bb2
	bb2:			bb2:
	ret i32 0			ret i32 0
	}			}

	; Remove redundant store, which is in the lame loop as the load.
	define i32 @test33(i1 %c, i32* %p, i32 %i) {
	; CHECK-LABEL: @test33(
	; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:
	; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:
	; CHECK-NEXT: call void @unknown_func()
	; CHECK-NEXT: br i1 undef, label [[BB1]], label [[BB3:%.*]]
	; CHECK: bb3:
	; CHECK-NEXT: ret i32 0
	;
	entry:
	br label %bb1
	bb1:
	%v = load i32, i32* %p, align 4
	br label %bb2
	bb2:
	store i32 %v, i32* %p, align 4
	; Might read and overwrite value at %p, but doesn't matter.
	call void @unknown_func()
	br i1 undef, label %bb1, label %bb3
	bb3:
	ret i32 0
	}

	define void @test43(i32* %P, i32* noalias %Q) {
	; CHECK-LABEL: @test43(
	; CHECK-NEXT: entry:
	; CHECK-NEXT: store i32 50331649, i32* [[P:%.*]]
	; CHECK-NEXT: store i32 2, i32* [[Q:%.*]]
	; CHECK-NEXT: ret void
	;
	entry:
	store i32 1, i32* %P
	%P2 = bitcast i32* %P to i8*
	store i32 2, i32* %Q
	store i8 3, i8* %P2
	ret void
	}

	define void @test43a(i32* %P, i32* noalias %Q) {			define void @test43a(i32* %P, i32* noalias %Q) {
	; CHECK-LABEL: @test43a(			; CHECK-LABEL: @test43a(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: store atomic i32 50331649, i32* [[P:%.*]] unordered, align 4			; CHECK-NEXT: store atomic i32 50331649, i32* [[P:%.*]] unordered, align 4
	; CHECK-NEXT: store atomic i32 2, i32* [[Q:%.*]] unordered, align 4			; CHECK-NEXT: store atomic i32 2, i32* [[Q:%.*]] unordered, align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	store atomic i32 1, i32* %P unordered, align 4			store atomic i32 1, i32* %P unordered, align 4
	%P2 = bitcast i32* %P to i8*			%P2 = bitcast i32* %P to i8*
	store atomic i32 2, i32* %Q unordered, align 4			store atomic i32 2, i32* %Q unordered, align 4
	store atomic i8 3, i8* %P2 unordered, align 4			store atomic i8 3, i8* %P2 unordered, align 4
	ret void			ret void
	}			}

This is an archive of the discontinued LLVM Phabricator instance.

[DSE] Remove noop stores in MSSA.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 267462

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp

llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll

llvm/test/Transforms/DeadStoreElimination/MSSA/noop-stores.ll

llvm/test/Transforms/DeadStoreElimination/MSSA/simple-todo.ll

[DSE] Remove noop stores in MSSA.
ClosedPublic