This is an archive of the discontinued LLVM Phabricator instance.

[DSE] Support looking through memory phis at end of function.
ClosedPublic

Authored by fhahn on Aug 22 2022, 3:06 AM.

Details

Summary

Update isWriteAtEndOfFunction to look through MemoryPhis. The reason
MemoryPhis were skipped so far was the known AliasAnalysis issue with it
missing loop-carried dependences.

This problem is already addressed in other parts of the code by skipping
MemoryDefs that may be in difference loops. I think the same logic can
be applied here.

This can have a substantial impact on the number of stores removed in
some cases. For MultiSource/SPEC2006/SPEC2017 with -O3:

Metric: dse.NumFastStores

Program                                       dse.NumFastStores
                                              base              patch   diff
External/S...CINT2017rate/557.xz_r/557.xz_r     14.00             45.00 221.4%
External/S...te/538.imagick_r/538.imagick_r    439.00           1267.00 188.6%
MultiSourc...e/Applications/SIBsim4/SIBsim4      6.00             15.00 150.0%
MultiSourc...Prolangs-C/simulator/simulator      3.00              7.00 133.3%
MultiSource/Applications/siod/siod               3.00              7.00 133.3%
MultiSourc...arks/FreeBench/distray/distray      6.00              9.00  50.0%
MultiSourc...e/Applications/obsequi/Obsequi     22.00             30.00  36.4%
MultiSource/Benchmarks/Ptrdist/bc/bc            23.00             28.00  21.7%
External/S...NT2017rate/502.gcc_r/502.gcc_r   1258.00           1512.00  20.2%
External/S...te/520.omnetpp_r/520.omnetpp_r    954.00           1143.00  19.8%
External/S...rate/510.parest_r/510.parest_r   5961.00           7122.00  19.5%
External/S...C/CINT2006/445.gobmk/445.gobmk     47.00             56.00  19.1%
External/S...00.perlbench_r/500.perlbench_r    241.00            286.00  18.7%
External/S...NT2006/471.omnetpp/471.omnetpp     36.00             42.00  16.7%
External/S...06/400.perlbench/400.perlbench    183.00            210.00  14.8%
MultiSource/Applications/SPASS/SPASS            72.00             81.00  12.5%
External/S...17rate/541.leela_r/541.leela_r     72.00             80.00  11.1%
External/SPEC/CINT2006/403.gcc/403.gcc         585.00            642.00   9.7%
MultiSourc...e/Applications/sqlite3/sqlite3    120.00            131.00   9.2%
MultiSourc...Applications/hexxagon/hexxagon     11.00             12.00   9.1%
External/S.../CFP2006/453.povray/453.povray    566.00            615.00   8.7%
External/S...rate/511.povray_r/511.povray_r    578.00            627.00   8.5%
External/S...FP2006/482.sphinx3/482.sphinx3     12.00             13.00   8.3%
MultiSource/Applications/oggenc/oggenc         130.00            140.00   7.7%
MultiSourc...e/Applications/ClamAV/clamscan    250.00            268.00   7.2%
MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg     19.00             20.00   5.3%
MultiSourc...ch/consumer-jpeg/consumer-jpeg     19.00             20.00   5.3%
External/S...te/526.blender_r/526.blender_r   3747.00           3928.00   4.8%
MultiSourc...OE-ProxyApps-C++/miniFE/miniFE    104.00            108.00   3.8%
MultiSourc...ch/consumer-lame/consumer-lame     54.00             56.00   3.7%
MultiSource/Benchmarks/Bullet/bullet          1222.00           1264.00   3.4%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4    973.00           1005.00   3.3%
External/S.../CFP2006/447.dealII/447.dealII   2699.00           2780.00   3.0%
External/S...06/483.xalancbmk/483.xalancbmk    788.00            810.00   2.8%
External/S.../CFP2006/450.soplex/450.soplex    180.00            185.00   2.8%
MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR    338.00            345.00   2.1%
MultiSourc...Benchmarks/7zip/7zip-benchmark    685.00            699.00   2.0%
External/S...FP2017rate/544.nab_r/544.nab_r    158.00            160.00   1.3%
MultiSourc...sumer-typeset/consumer-typeset    772.00            781.00   1.2%
External/S...2017rate/525.x264_r/525.x264_r    410.00            414.00   1.0%
External/S...23.xalancbmk_r/523.xalancbmk_r    998.00           1002.00   0.4%

Compile-time is almost neutral:

https://llvm-compile-time-tracker.com/compare.php?from=b3125ad3d60531a97eea20009cc9629a87755862&to=84007eee59004f43464eda7f5ba8263ed5158df8&stat=instructions

NewPM-O3: +0.03%
NewPM-ReleaseThinLTO: -0.01%
NewPM-ReleaseLTO-g: +0.03%

Diff Detail

Event Timeline

fhahn created this revision.Aug 22 2022, 3:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2022, 3:06 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
fhahn requested review of this revision.Aug 22 2022, 3:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2022, 3:06 AM
fhahn updated this revision to Diff 454449.Aug 22 2022, 4:04 AM

Update memcpy.ll test.

fhahn edited the summary of this revision. (Show Details)Aug 22 2022, 4:48 AM

pardon my ignorance, but could you give an example of something where naively adding all MemoryPhi users misses something but using isGuaranteedLoopIndependent works around that?

nikic added inline comments.Aug 29 2022, 3:55 AM
llvm/test/Transforms/DeadStoreElimination/multiblock-loop-carried-dependence.ll
9 ↗(On Diff #454449)

Comment is confusing because there is no %l.2.

34 ↗(On Diff #454449)

Isn't this exactly the miscompile we want to prevent? The store is to A[iv.2 + 1], which will be read as A[iv.2] on the next iteration?

llvm/test/Transforms/DeadStoreElimination/phi-translation.ll
159

Drop TODO

fhahn updated this revision to Diff 456368.Aug 29 2022, 9:08 AM
fhahn edited the summary of this revision. (Show Details)

Update to use isGuaranteedLoopInvariant instead of isGuaranteedLoopIndependent. isGuaranteedLoopIndependent is too loose here because we are not looking for must-alias (as we do for elimination candidates). Instead we must ensure we do not miss any potential reads.

pardon my ignorance, but could you give an example of something where naively adding all MemoryPhi users misses something but using isGuaranteedLoopIndependent works around that?

The tests in multiblock-loop-carried-dependence.ll should provide such cases, e.g. @test.1 which was incorrectly optimized by the earlier version of the test.

fhahn marked 3 inline comments as done.Aug 29 2022, 9:09 AM
fhahn added inline comments.
llvm/test/Transforms/DeadStoreElimination/multiblock-loop-carried-dependence.ll
9 ↗(On Diff #454449)

thanks, the comment should be updated in 197332a1f80db7cbe2bb43d3f5bd45282245dd28

34 ↗(On Diff #454449)

Yeah I missed that in the original version! Updated to use isGuaranteedLoopInvariant

llvm/test/Transforms/DeadStoreElimination/phi-translation.ll
159

Thanks, should be removed in the latest version.

fhahn marked 3 inline comments as done.Aug 29 2022, 10:54 AM

(The shared stats in the description stay effectively the same after the recent update)

asbirlea accepted this revision.Aug 29 2022, 12:12 PM

Thank you for adding this. The diff looks good to me.
The stats and compile-time look very good. Tested for run time performance impact and can't see noticeable improvement.
Overall LGTM from my side (with one inline comment).

llvm/test/Transforms/DeadStoreElimination/phi-translation.ll
159

The TODO is unrelated to this patch and should stay.
The store in the %entry block could also be removed. It's writing to an alloca that it's either never used later in this function or overwritten and then used in the call. Either way it's a dead store.

This revision is now accepted and ready to land.Aug 29 2022, 12:12 PM
This revision was landed with ongoing or failed builds.Aug 30 2022, 5:28 AM
This revision was automatically updated to reflect the committed changes.