This is an archive of the discontinued LLVM Phabricator instance.

[DSE] Remove noop stores in MSSA.
ClosedPublic

Authored by zoecarver on May 4 2020, 9:01 PM.

Details

Summary

Adds a simple fast-path check for the pattern:

v = load ptr
store v to ptr

I took the tests from the bugzilla post, I can add more if needed (but I think these should be sufficent).

Refs: https://bugs.llvm.org/show_bug.cgi?id=45795

Diff Detail

Event Timeline

zoecarver created this revision.May 4 2020, 9:01 PM
zoecarver marked an inline comment as done.May 4 2020, 9:03 PM
zoecarver added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1856

Forgot about the naming convention, my bad. Will fix.

jfb added inline comments.May 5 2020, 8:21 AM
llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll
588 ↗(On Diff #262001)

Can you add the reverse:

%a = load i32, i32* %Q
store volatile i32 %a, i32* %Q

This one we can't eliminate.

As well as, can you add these two cases with atomic instead? We can do some optimizations there but need to be careful around release sequences, see https://wg21.link/n4455

zoecarver marked an inline comment as done.May 5 2020, 11:19 AM
zoecarver added inline comments.
llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll
588 ↗(On Diff #262001)

Good call. Luckily this is done when building the DSEState so I won't have to add checks for it in this patch. But I have added both the atomic and volatile tests.

zoecarver updated this revision to Diff 262179.May 5 2020, 11:20 AM
  • Volatile and atomic tests
  • Local var spelling
fhahn requested changes to this revision.May 6 2020, 3:16 AM
fhahn added reviewers: asbirlea, george.burgess.iv.

Thanks for working on this! Some comments inline.

Another transform that might be worth exploring would be propagating a stored value to loads (users of the MemoryDef that exactly alias the def). Removing loads might enable further store elimination, if done early

I usually try to evaluate the impact by comparing the stats for DeadStoreElimination on the test suite with and without the patch. It would probably be good to do this for this patch as well :)

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857

I think the check is not strict enough at the moment. I think we have to ensure that there are no memory-writes between the load and the store that may alias them.

In the simple example below, IIUC the patch would currently incorrectly eliminate store i32 %a, i32* %Q.

define void @test(i32* %Q) {
  %a = load i32, i32* %Q
  store i32 10, i32* %Q
  store i32 %a, i32* %Q
  ret void
}

In terms of MemorySSA, I think we are looking for loads that use the same MemoryDef as KillingDef. One simple way to check this would be getting the memoryDef of KillingDef and check if it dominates the load I think.

This revision now requires changes to proceed.May 6 2020, 3:16 AM
asbirlea added inline comments.May 8 2020, 10:51 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857

+1.
Please add a testcase similar to the example fhahn@ gave.

zoecarver marked an inline comment as done.May 8 2020, 11:17 AM
zoecarver added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857

Will do. I tried the test case and it fails. I'll need to re-write most of this patch (minus the tests). I think @fhahn's solution will work. I probably won't get to this today but I should have time this weekend.

fhahn added inline comments.May 11 2020, 10:31 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857

Awesome, thanks! Feel free to let us know if you have any questions.

zoecarver updated this revision to Diff 263563.May 12 2020, 4:18 PM
  • Analyze previous writes when trying to remove noop store

Another transform that might be worth exploring would be propagating a stored value to loads (users of the MemoryDef that exactly alias the def). Removing loads might enable further store elimination, if done early

I'm happy to work on this next.

zoecarver marked an inline comment as done.May 12 2020, 4:31 PM
zoecarver added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857

I think we are looking for loads that use the same MemoryDef as KillingDef.

I tried playing around with getting the memory access of Load but, I'm not sure how it should be the "same" as KillingDef (I don't know a whole lot about MSSA).

I also tried analyzing ToCheck but, I ended up essentially having to implement this check lower down which meant missing some optimizations.

zoecarver updated this revision to Diff 263588.May 12 2020, 6:44 PM
  • Fix tests.

I usually try to evaluate the impact by comparing the stats for DeadStoreElimination on the test suite with and without the patch. It would probably be good to do this for this patch as well :)

Sounds good. What's the best way to do this? It looks like I can pass the -stats option to opt but, is there a way to get this behavior across the whole test suit?

Thanks for all the help with this patch :)

fhahn added a comment.May 15 2020, 1:43 PM

I usually try to evaluate the impact by comparing the stats for DeadStoreElimination on the test suite with and without the patch. It would probably be good to do this for this patch as well :)

Sounds good. What's the best way to do this? It looks like I can pass the -stats option to opt but, is there a way to get this behavior across the whole test suit?

Yes, when you build the test-suite, you can set TEST_SUITE_COLLECT_STATS=On to collect stats. The final .json file after running the test-suite will contain the statistics collected. For more details on how to use the test-suite, see https://llvm.org/docs/TestSuiteGuide.html

I usually use something like

cmake -G Ninja \
    -DCMAKE_C_COMPILER=/path/to/llvm/build/bin/clang \
    -DCMAKE_CXX_COMPILER=/path/to/llvm/build/bin/clang++ \
    -DCMAKE_C_FLAGS="-O3" -DCMAKE_CXX_FLAGS="-O3 " \
    -DTEST_SUITE_COLLECT_STATS=On \
    /path/to/test-suite

ninja -j 16
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857

I was referring to MemoryDef/MemoryUse in MemorySSA (see http://llvm.org/docs/MemorySSA.html for a bit more details)

The problem with directly looking at the def-use chains in the IR is that it does not necessarily include all memory instructions that might overwrite a the same memory location. MemorySSA provides an 'virtual SSA IR' for memory operations.

I think you can make sure that there are no writes may-alias writes in between Load and SI by getting their defining accesses using MemorySSA (https://llvm.org/doxygen/classllvm_1_1MemoryUseOrDef.html#abc4e1b6cc483cfd57ca75c4ab92bd360).

fhahn added a comment.May 15 2020, 1:46 PM

Accidentally hit send too early.

....

I usually use something like

cmake -G Ninja \
    -DCMAKE_C_COMPILER=/path/to/llvm/build/bin/clang \
    -DCMAKE_CXX_COMPILER=/path/to/llvm/build/bin/clang++ \
    -DCMAKE_C_FLAGS="-O3" -DCMAKE_CXX_FLAGS="-O3 " \
   -DTEST_SUITE_SUBDIRS=MultiSource;External \
    -DTEST_SUITE_COLLECT_STATS=On \
    /path/to/test-suite

ninja -j 16

/path/to/llvm-lit -j16 . -o  stats.json
zoecarver marked an inline comment as done.May 19 2020, 9:45 PM
zoecarver added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1857

Heh, I probably should have spent some time reading the docs before diving right in. I get it now. This is actually way simpler (and probably safer).

zoecarver updated this revision to Diff 265130.May 19 2020, 9:45 PM
  • Rebase off master
  • Re-implement with MSSA
zoecarver updated this revision to Diff 265131.May 19 2020, 9:47 PM
  • Formatting

Yes, when you build the test-suite, you can set TEST_SUITE_COLLECT_STATS=On to collect stats.

Thanks that's super helpful.


Here are the results:

Redundant Stores: (no change)

Metric: dse.NumRedundantStores

Program                                        stats_opt stats_master diff

test-suite...lications/ClamAV/clamscan.test     2.00      2.00        0.0%
test-suite...ications/JM/ldecod/ldecod.test     3.00      3.00        0.0%
test-suite...ications/JM/lencod/lencod.test     4.00      4.00        0.0%
test-suite.../Applications/SPASS/SPASS.test     2.00      2.00        0.0%
test-suite.../Applications/lemon/lemon.test     2.00      2.00        0.0%
test-suite...pplications/oggenc/oggenc.test     8.00      8.00        0.0%
test-suite...pplications/treecc/treecc.test     4.00      4.00        0.0%
test-suite...marks/7zip/7zip-benchmark.test     6.00      6.00        0.0%
test-suite...ProxyApps-C++/CLAMR/CLAMR.test     1.00      1.00        0.0%
test-suite...hmarks/McCat/15-trie/trie.test     1.00      1.00        0.0%
test-suite...nsumer-lame/consumer-lame.test     3.00      3.00        0.0%
test-suite...TimberWolfMC/timberwolfmc.test     1.00      1.00        0.0%
test-suite.../Prolangs-C/loader/loader.test     1.00      1.00        0.0%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     1.00      1.00        0.0%
test-suite...arks/mafft/pairlocalalign.test     6.00      6.00        0.0%
Geomean difference                                                    nan%
       stats_opt  stats_master  diff
count  15.000000  15.000000     15.0
mean   3.000000   3.000000      0.0
std    2.203893   2.203893      0.0
min    1.000000   1.000000      0.0
25%    1.000000   1.000000      0.0
50%    2.000000   2.000000      0.0
75%    4.000000   4.000000      0.0
max    8.000000   8.000000      0.0

Execution Time: (slight improvement)

Metric: exec_time

Program                                        stats_opt stats_master diff

test-suite...urce/Applications/hbd/hbd.test     0.00      0.00       -28.1%
test-suite...adpcm/rawcaudio/rawcaudio.test     0.00      0.00       -23.1%
test-suite...itBench/uudecode/uudecode.test     0.10      0.12       20.6%
test-suite...urce/Applications/aha/aha.test     1.26      1.02       -19.1%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     0.69      0.83       18.9%
test-suite...flt/LoopRestructuring-flt.test     4.32      3.57       -17.4%
test-suite...count/automotive-bitcount.test     0.09      0.08       -16.9%
test-suite...ks/McCat/12-IOtest/iotest.test     0.28      0.33       16.4%
test-suite.../Benchmarks/Ptrdist/ft/ft.test     0.88      1.01       14.7%
test-suite...chmarks/MallocBench/gs/gs.test     0.04      0.05       14.6%
test-suite...arching-dbl/Searching-dbl.test     4.16      3.55       -14.6%
test-suite...ve-susan/automotive-susan.test     0.04      0.04       -14.4%
test-suite...t/StatementReordering-flt.test     3.36      2.89       -14.0%
test-suite...mbolics-flt/Symbolics-flt.test     1.26      1.09       -13.8%
test-suite...mbolics-dbl/Symbolics-dbl.test     3.16      2.73       -13.7%
Geomean difference                                                   -1.4%
        stats_opt  stats_master        diff
count  198.000000  198.000000    198.000000
mean   2.136898    2.102667     -0.010418
std    4.097594    4.069014      0.079837
min    0.000800    0.000900     -0.281250
25%    0.036075    0.035250     -0.068807
50%    0.616700    0.650100      0.000000
75%    3.317450    3.089325      0.045511
max    32.665900   31.948900     0.205853

Execution Time (filter short): (mostly improved)

Tests: 198
Short Running: 96 (filtered out)
Remaining: 102
Metric: exec_time

Program                                        stats_opt stats_master diff

test-suite...urce/Applications/aha/aha.test     1.26      1.02       -19.1%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     0.69      0.83       18.9%
test-suite...flt/LoopRestructuring-flt.test     4.32      3.57       -17.4%
test-suite.../Benchmarks/Ptrdist/ft/ft.test     0.88      1.01       14.7%
test-suite...arching-dbl/Searching-dbl.test     4.16      3.55       -14.6%
test-suite...t/StatementReordering-flt.test     3.36      2.89       -14.0%
test-suite...mbolics-flt/Symbolics-flt.test     1.26      1.09       -13.8%
test-suite...mbolics-dbl/Symbolics-dbl.test     3.16      2.73       -13.7%
test-suite...C/Packing-flt/Packing-flt.test     4.71      4.10       -13.0%
test-suite...ctions-dbl/Reductions-dbl.test     4.06      3.54       -12.7%
test-suite...rimaran/enc-3des/enc-3des.test     1.90      1.66       -12.6%
test-suite.../Trimaran/enc-md5/enc-md5.test     1.69      1.48       -12.4%
test-suite...yApps-C++/PENNANT/PENNANT.test     0.62      0.70       12.3%
test-suite...arching-flt/Searching-flt.test     3.98      3.49       -12.3%
test-suite...s/Ptrdist/anagram/anagram.test     0.64      0.72       12.2%
Geomean difference                                                    nan%
        stats_opt  stats_master        diff
count  102.000000  101.000000    101.000000
mean   4.070663    4.036584     -0.015677
std    4.994630    4.988885      0.080786
min    0.604600    0.635000     -0.191148
25%    1.660050    1.549400     -0.083075
50%    3.237450    3.072300     -0.005657
75%    4.359050    4.313500      0.046250
max    32.665900   31.948900     0.189496

Compile Time: (unexpectedly improved but, that might just be noise)

Metric: compile_time

Program                                        stats_opt stats_master diff

test-suite...rks/tramp3d-v4/tramp3d-v4.test    45.98     52.93       15.1%
test-suite.../Trimaran/enc-pc1/enc-pc1.test     0.18      0.20        9.6%
test-suite...netbench-crc/netbench-crc.test     0.20      0.21        9.0%
test-suite...hmarks/VersaBench/bmm/bmm.test     0.14      0.15        8.4%
test-suite...marks/SciMark2-C/scimark2.test     0.94      1.02        8.1%
test-suite.../Benchmarks/Ptrdist/ft/ft.test     0.57      0.61        7.6%
test-suite...marks/7zip/7zip-benchmark.test   122.38    131.64        7.6%
test-suite.../Trimaran/enc-rc4/enc-rc4.test     0.13      0.14        7.5%
test-suite.../Trimaran/enc-md5/enc-md5.test     0.29      0.31        7.5%
test-suite...ngs-C/fixoutput/fixoutput.test     0.16      0.18        7.2%
test-suite...netbench-url/netbench-url.test     0.50      0.53        7.1%
test-suite.../Prolangs-C/loader/loader.test     0.73      0.78        7.1%
test-suite...arks/VersaBench/dbms/dbms.test     2.30      2.47        7.0%
test-suite...langs-C/allroots/allroots.test     0.20      0.22        7.0%
test-suite...ngs-C/assembler/assembler.test     1.38      1.48        6.9%
Geomean difference                                                    0.3%
        stats_opt  stats_master        diff
count  198.000000  198.000000    198.000000
mean   5.392878    5.432042      0.003577
std    13.798743   14.112571     0.039915
min    0.053700    0.052200     -0.067272
25%    0.318475    0.314125     -0.027627
50%    1.606700    1.594650     -0.000062
75%    3.390850    3.419850      0.031306
max    122.376700  131.644500    0.151016
asbirlea added inline comments.May 20 2020, 5:09 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1859

I don't entirely understand the logic here, it would help a lot to add some comments to elaborate.

What happens if IsDead is true? Seems this would be a good case to eliminate the store if the load/store are from/to the same address.

Next, check the condition Load->getPointerOperand() != SI->getOperand(1) first, and exit early if the equality condition is not met.

I see this roughly as:

if (Load->getPointerOperand() == SI->getOperand(1)) {
  if (LoadAccess == KillingDef->getDefiningAccess()) {
    State.deleteDeadInstruction(SI);
  } else {
    //Note: LoadAccess may not be optimized, if it is the walker call is a no-op.
    auto *LoadAccessOptimized = MSSAWalker->getClobberingMemoryAccess(MSSA.getMemoryAccess(Load));
    if (MSSAWalker->getClobberingMemoryAccess(KillingDef) == LoadAccessOptimized) {
      State.deleteDeadInstruction(SI);
    }
  }
}

Please let me know if I misunderstood the intention here.

1865

Why is this inserting two accesses then starting the checks from 1?

Updates above comment with rough replacement.

Compile Time: (unexpectedly improved but, that might just be noise)

Regressed kinda a lot ?

Compile Time: (unexpectedly improved but, that might just be noise)

Regressed kinda a lot ?

Yup! 7%-15% is huge. I'm hoping this is due to the unnecessary processing I commented on, but won't know until tested with the changes. I'm pretty curios which part led to that big of a regression.

zoecarver marked an inline comment as done.May 23 2020, 12:39 PM

@asbirlea thank you for the review! Sorry I didn't get to this sooner, I was super busy last week.

Yup! 7%-15% is huge. I'm hoping this is due to the unnecessary processing I commented on, but won't know until tested with the changes. I'm pretty curios which part led to that big of a regression.

I'm pretty sure it's because I recursively walk *every* store's accesses. I suspect checking Load->getPointerOperand() != Store->getOperand(1) *first* will fix this. Running the stats again.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1859

I've re-factored this a bit and added comments. It should be much clearer now. Let me know if not, and I can explain further.

Update based on review:

  • Fix performance issues.
  • Factor into its own function.
  • Add comments.

Execution time:

Tests: 198
Short Running: 97 (filtered out)
Remaining: 101
Metric: exec_time

Program                                        stats_opt stats_master diff

test-suite...ks/Prolangs-C++/life/life.test     2.30      3.01       30.9%
test-suite...arks/mafft/pairlocalalign.test    12.66     15.31       20.9%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     0.68      0.83       20.7%
test-suite...Source/Benchmarks/sim/sim.test     2.81      3.38       20.2%
test-suite...urce/Applications/aha/aha.test     1.25      1.02       -18.4%
test-suite...s/Ptrdist/anagram/anagram.test     0.62      0.72       16.9%
test-suite.../Benchmarks/nbench/nbench.test     1.32      1.55       16.9%
test-suite...ks/VersaBench/8b10b/8b10b.test     4.31      5.02       16.4%
test-suite...nchmarks/llubenchmark/llu.test     3.19      3.68       15.1%
test-suite.../Applications/lemon/lemon.test     0.87      0.75       -14.1%
test-suite...tions/lambda-0.1.3/lambda.test     3.50      3.04       -13.1%
test-suite.../VersaBench/ecbdes/ecbdes.test     1.81      2.02       11.7%
test-suite...netbench-url/netbench-url.test     2.97      3.31       11.3%
test-suite...hmarks/VersaBench/bmm/bmm.test     2.34      2.60       11.3%
test-suite...marks/SciMark2-C/scimark2.test    28.48     31.68       11.2%
Geomean difference                                                    nan%
       stats_opt  stats_master       diff
count  99.000000  101.000000    99.000000
mean   3.921419   4.036584      0.035598
std    4.557934   4.988885      0.071592
min    0.617100   0.635000     -0.184291
25%    1.575900   1.549400     -0.003149
50%    3.090300   3.072300      0.022620
75%    4.272200   4.313500      0.070729
max    28.946700  31.948900     0.308558

Compile time:

Tests: 198
Metric: compile_time

Program                                        stats_opt stats_master diff

test-suite...rks/tramp3d-v4/tramp3d-v4.test    43.40     52.93       22.0%
test-suite...arks/mafft/pairlocalalign.test    26.05     30.58       17.4%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test     4.32      4.97       15.1%
test-suite...frame_layout/frame_layout.test     2.47      2.81       13.5%
test-suite...marks/7zip/7zip-benchmark.test   122.67    131.64        7.3%
test-suite...fice-ispell/office-ispell.test     3.99      4.28        7.2%
test-suite...earch/office-stringsearch.test     0.44      0.47        6.8%
test-suite...netbench-crc/netbench-crc.test     0.20      0.21        6.8%
test-suite...nsumer-lame/consumer-lame.test    11.69     12.47        6.6%
test-suite...math/automotive-basicmath.test     0.34      0.32       -6.3%
test-suite...lowfish/security-blowfish.test     0.64      0.68        6.2%
test-suite...comm-adpcm/telecomm-adpcm.test     0.15      0.16        6.2%
test-suite.../Prolangs-C++/ocean/ocean.test     0.38      0.35       -6.0%
test-suite.../Trimaran/enc-pc1/enc-pc1.test     0.19      0.20        5.4%
test-suite...DOE-ProxyApps-C/CoMD/CoMD.test     2.84      2.69       -5.3%
Geomean difference                                                    0.4%
        stats_opt  stats_master        diff
count  198.000000  198.000000    198.000000
mean   5.349209    5.432042      0.004814
std    13.699984   14.112571     0.038243
min    0.053100    0.052200     -0.063188
25%    0.312425    0.314125     -0.025223
50%    1.604600    1.594650      0.001472
75%    3.347775    3.419850      0.026751
max    122.668900  131.644500    0.219606

Question: it would seem like the compile-time actually got a lot better (which is odd). Am I reading the data incorrectly? Shouldn't smaller numbers be better?

There may be some noise here so I'm going to try running the benchmarks a few more times to make sure they line up. I'm also going to add a statistic for noop stores.

fhahn added a comment.May 24 2020, 1:33 PM

Execution time:

Tests: 198
Short Running: 97 (filtered out)
Remaining: 101
Metric: exec_time

Program                                        stats_opt stats_master diff

test-suite...ks/Prolangs-C++/life/life.test     2.30      3.01       30.9%
test-suite...arks/mafft/pairlocalalign.test    12.66     15.31       20.9%
test-suite.../Benchmarks/Ptrdist/bc/bc.test     0.68      0.83       20.7%
test-suite...Source/Benchmarks/sim/sim.test     2.81      3.38       20.2%
test-suite...urce/Applications/aha/aha.test     1.25      1.02       -18.4%
test-suite...s/Ptrdist/anagram/anagram.test     0.62      0.72       16.9%
test-suite.../Benchmarks/nbench/nbench.test     1.32      1.55       16.9%
test-suite...ks/VersaBench/8b10b/8b10b.test     4.31      5.02       16.4%
test-suite...nchmarks/llubenchmark/llu.test     3.19      3.68       15.1%
test-suite.../Applications/lemon/lemon.test     0.87      0.75       -14.1%
test-suite...tions/lambda-0.1.3/lambda.test     3.50      3.04       -13.1%
test-suite.../VersaBench/ecbdes/ecbdes.test     1.81      2.02       11.7%
test-suite...netbench-url/netbench-url.test     2.97      3.31       11.3%
test-suite...hmarks/VersaBench/bmm/bmm.test     2.34      2.60       11.3%
test-suite...marks/SciMark2-C/scimark2.test    28.48     31.68       11.2%
Geomean difference                                                    nan%
       stats_opt  stats_master       diff
count  99.000000  101.000000    99.000000
mean   3.921419   4.036584      0.035598
std    4.557934   4.988885      0.071592
min    0.617100   0.635000     -0.184291
25%    1.575900   1.549400     -0.003149
50%    3.090300   3.072300      0.022620
75%    4.272200   4.313500      0.070729
max    28.946700  31.948900     0.308558

Compile time:

Tests: 198
Metric: compile_time

Program                                        stats_opt stats_master diff

test-suite...rks/tramp3d-v4/tramp3d-v4.test    43.40     52.93       22.0%
test-suite...arks/mafft/pairlocalalign.test    26.05     30.58       17.4%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test     4.32      4.97       15.1%
test-suite...frame_layout/frame_layout.test     2.47      2.81       13.5%
test-suite...marks/7zip/7zip-benchmark.test   122.67    131.64        7.3%
test-suite...fice-ispell/office-ispell.test     3.99      4.28        7.2%
test-suite...earch/office-stringsearch.test     0.44      0.47        6.8%
test-suite...netbench-crc/netbench-crc.test     0.20      0.21        6.8%
test-suite...nsumer-lame/consumer-lame.test    11.69     12.47        6.6%
test-suite...math/automotive-basicmath.test     0.34      0.32       -6.3%
test-suite...lowfish/security-blowfish.test     0.64      0.68        6.2%
test-suite...comm-adpcm/telecomm-adpcm.test     0.15      0.16        6.2%
test-suite.../Prolangs-C++/ocean/ocean.test     0.38      0.35       -6.0%
test-suite.../Trimaran/enc-pc1/enc-pc1.test     0.19      0.20        5.4%
test-suite...DOE-ProxyApps-C/CoMD/CoMD.test     2.84      2.69       -5.3%
Geomean difference                                                    0.4%
        stats_opt  stats_master        diff
count  198.000000  198.000000    198.000000
mean   5.349209    5.432042      0.004814
std    13.699984   14.112571     0.038243
min    0.053100    0.052200     -0.063188
25%    0.312425    0.314125     -0.025223
50%    1.604600    1.594650      0.001472
75%    3.347775    3.419850      0.026751
max    122.668900  131.644500    0.219606

Question: it would seem like the compile-time actually got a lot better (which is odd). Am I reading the data incorrectly? Shouldn't smaller numbers be better?

Runtimes/execution times depend a lot on the system and can be quite noisy. For comparable numbers you have to make sure that the system noise is reduced to a minimum (and you don't build & run the benchmarks in parallel).

I would suggest to focus on the statistics about removed stores and how the patch impacts them. With respect to compile-time, I don't think there's much to worry about with the patch in its current form..

Also, I recently found out that EarlyCSE also does some simple form of load-store forwarding (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/EarlyCSE.cpp#L1214). It might be worth consolidating the DSE logic in the DeadStoreElimination pass in the future, but that's a discussion for the future.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1847

IIUC the MemoryPHI handling should be optional, right? I would recommend splitting it off in a separate patch, as smaller patches make it generally easier to review/submit.

1861

shouldn't that be continue? (Test case would be a scenario where we first remove a no-op store and then some other stores by another def in State.MemDefs.

zoecarver marked 2 inline comments as done.May 26 2020, 9:09 AM

Thanks for the review.

I would suggest to focus on the statistics about removed stores and how the patch impacts them.

What do you think about adding a statistic for noop stores? Or should I increment NumRedundantStores or a different statistic? I think as-is this patch doesn't change any statistics.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1847

Yes, it's optional. I'll make this a separate patch.

1861

Yes, you're right.

zoecarver updated this revision to Diff 266390.May 26 2020, 6:20 PM
  • Rebase off master.
  • Continue instead of break.
  • Add noop store statistic.
  • Remove phi argument support

Also, it looks like the test suit does not have any noop stores (either that or I implemented the NumNoopStores statistic incorrectly).

Also, it looks like the test suit does not have any noop stores (either that or I implemented the NumNoopStores statistic incorrectly).

I think it may be that currently most of those stores are eliminated earlier by EarlyCSE (see earlier comment). Might be interesting to see what happens with MSSA disabled for EarlyCSE. (just to double check: MSSA based DSE is not enabled by default, so for your measurements you have to make sure it is turned on via -mllvm -enable-dse-memoryssa=true or by setting EnableMemorySSA)

I think it is almost ready, just a question about some of the test changes remaining.

llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll
34 ↗(On Diff #266390)

Is there a reason for this change in the source (and some of the ones below)? I think we should keep the existing tests as is and add additional cases as required.

zoecarver marked an inline comment as done.May 28 2020, 8:39 PM
zoecarver added inline comments.
llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll
34 ↗(On Diff #266390)

The load and store here would be removed. I wasn't sure if the noop store was inadvertent and wanted to keep the test as similar as possible to what I thought it was testing (that neither store is removed). If you'd rather I can remove the changes and update the CHECK comments instead.

Might be interesting to see what happens with MSSA disabled for EarlyCSE.

I passed CXXFLAGS="-mllvm -enable-dse-memoryssa=true -mllvm -earlycse-mssa-optimization-cap=0" to cmake and re-built and ran the tests. It doesn't look like that had any impact, though. I still get Unknown metric 'dse.NumNoopStores' which I assume means none were recorded.

fhahn added a comment.May 29 2020, 3:01 AM

Might be interesting to see what happens with MSSA disabled for EarlyCSE.

I passed CXXFLAGS="-mllvm -enable-dse-memoryssa=true -mllvm -earlycse-mssa-optimization-cap=0" to cmake and re-built and ran the tests. It doesn't look like that had any impact, though. I still get Unknown metric 'dse.NumNoopStores' which I assume means none were recorded.

Hm I think cmake might not pick up CXXFLAGS, if it's not the initial run. Probably better to pass -DCMAKE_C_FLAGS=/-DCMAKE_CXX_FLAGS=. Or just flip the default for EnableMemorySSA (which I usually do :P) and rebuild clang/libLTO.

To verify, you can check dse.NumFastStores. There should be a lot of changes with/without MemorySSA. For example, with -O3 -flto:

~/projects/test-suites/test-suite/utils/compare.py --filter-hash -m dse.NumFastStores  base.json mssa.json
Tests: 237
Same hash: 104 (filtered out)
Remaining: 133
Metric: dse.NumFastStores

Program                                        base   mssa   diff
 test-suite...-typeset/consumer-typeset.test   162.00 891.00 450.0%
 test-suite...INT2000/164.gzip/164.gzip.test    15.00  67.00 346.7%
 test-suite...ks/Prolangs-C/agrep/agrep.test     2.00   8.00 300.0%
 test-suite...000/183.equake/183.equake.test    10.00  40.00 300.0%
 test-suite...lications/sqlite3/sqlite3.test    22.00  76.00 245.5%
 test-suite...CFP2006/433.milc/433.milc.test    64.00 195.00 204.7%
 test-suite...000/197.parser/197.parser.test     3.00   9.00 200.0%
 test-suite...chmarks/Olden/power/power.test    11.00  33.00 200.0%
 test-suite...lications/ClamAV/clamscan.test    72.00 210.00 191.7%
 test-suite.../Applications/SPASS/SPASS.test    48.00 139.00 189.6%
 test-suite...T2006/445.gobmk/445.gobmk.test    19.00  52.00 173.7%
 test-suite...Source/Benchmarks/sim/sim.test     2.00   5.00 150.0%
 test-suite...abench/jpeg/jpeg-6a/cjpeg.test     9.00  22.00 144.4%
 test-suite...ks/McCat/12-IOtest/iotest.test     6.00  14.00 133.3%
 test-suite...nsumer-jpeg/consumer-jpeg.test    12.00  27.00 125.0%
 Geomean difference                                           nan%

For noop stores, it looks like there are the following nook-store changes for the configuration above:

test-suite...:: External/Povray/povray.test    NaN   1.00  nan%
test-suite...CFP2000/177.mesa/177.mesa.test    NaN   1.00  nan%
test-suite...000/183.equake/183.equake.test    NaN  11.00  nan%
test-suite...CFP2006/433.milc/433.milc.test    NaN   1.00  nan%
test-suite...006/447.dealII/447.dealII.test    NaN   2.00  nan%
test-suite...006/453.povray/453.povray.test    NaN  13.00  nan%
test-suite...6/482.sphinx3/482.sphinx3.test    NaN   1.00  nan%
test-suite.../CINT2000/176.gcc/176.gcc.test    NaN   6.00  nan%
test-suite...0/253.perlbmk/253.perlbmk.test    NaN  21.00  nan%
test-suite...0.perlbench/400.perlbench.test    NaN  24.00  nan%
test-suite...T2006/445.gobmk/445.gobmk.test    NaN  25.00  nan%
test-suite...lications/ClamAV/clamscan.test    NaN   2.00  nan%
test-suite.../Applications/SPASS/SPASS.test    NaN   2.00  nan%
test-suite...pplications/oggenc/oggenc.test    NaN   2.00  nan%
test-suite...s/MallocBench/cfrac/cfrac.test    NaN   6.00  nan%
test-suite...chmarks/MallocBench/gs/gs.test    NaN   1.00  nan%
test-suite...TimberWolfMC/timberwolfmc.test    NaN   1.00  nan%
test-suite.../Prolangs-C/loader/loader.test    NaN   1.00  nan%

As mentioned earlier, I think this looks good, once the comments for the tests are addressed :)

llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll
34 ↗(On Diff #266390)

I don't think the current version of the patch would remove anything for this test case (and likely the other modified ones). %g_addr and @g may alias I think, so the defining access of store i32 %g_value, i32* %g_addr, align 4 would be store i32 -1, i32* @g, align 4 and the defining access of %g_value = load i32, i32* %g_addr, align 4 would be entry.

Most existing tests intentionally cover cases to exercise various code paths/guard against problematic scenarios encountered in the past. Therefor it would be good to preserve the original input and update the check lines, if there are any changes (those can then be checked during the review).

I am not sure if we could remove anything here, as it would require proving it is safe to remove them in all possible alias scenarios for %g_addr and @g. If they don't alias, we can remove the store to g_addr. If they alias exactly (= complete overwrite), we could remove store i32 %g_value, i32* %g_addr, align 4, but only if we also remove store i32 -1, i32* @g, align 4. But we cannot remove that store if they don't alias.

627 ↗(On Diff #266390)

Might be good to move the new tests for coop-stores into a separate test file (and please remove the tests from simple-todo.ll)

  • Update tests based on review

@fhahn I really appreciate all your help with this patch. The tests are updated.


Hm I think cmake might not pick up CXXFLAGS...

I was passing the flags to the llvm-test-suite build but, I also tried when building the mono-repo. I tried both with CXXFLAGS and with DCMAKE_C_FLAGS as suggested and made sure to export EnableMemorySSA=1 (and even tried a clean build of both) but I still get the same error when running utils/compare.py. I'll take another look at the docs and keep playing around with it to see if I can get it working tomorrow.

fhahn accepted this revision.May 30 2020, 4:33 AM

LGTM, thanks!

This should only remove stores which we couldn't kill other stores, as the load would block the store killing any other stores.

Hm I think cmake might not pick up CXXFLAGS...

suggested and made sure to export EnableMemorySSA=1 (and even tried a clean build of both)

Oh right, I meant flipping the cl::init value of the variable https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp#L100 and rebuilding ;)

This revision is now accepted and ready to land.May 30 2020, 4:33 AM

Oh right, I meant flipping the cl::init value of the variable https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp#L100 and rebuilding ;)

Ah, there we go! That works now :)

This revision was automatically updated to reflect the committed changes.