This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Use EarliestEscapeInfo instead of callCapturesBefore.
AbandonedPublic

Authored by fhahn on Sep 27 2021, 6:07 AM.

Details

Reviewers
asbirlea
nikic
Summary

D110368 exposed 'earliest escape' handling in BatchAA. We can use this
via BatchAA instead of the single call of callCapturesBefore in
MemCpyOpt to improve the results in most cases.

On SPEC2006/SPEC2017/MultiSource we get the following changes:

Metric: memcpyopt.NumCallSlot

test-suite.../CINT2006/403.gcc/403.gcc.test    17.00  26.00       52.9%
test-suite...510.parest_r/510.parest_r.test   718.00 773.00        7.7%
test-suite...006/447.dealII/447.dealII.test    12.00  12.00        0.0%
test-suite...yApps-C++/PENNANT/PENNANT.test     9.00   9.00        0.0%
test-suite...speed/602.gcc_s/602.gcc_s.test    15.00  14.00       -6.7%
test-suite...7rate/502.gcc_r/502.gcc_r.test    15.00  14.00       -6.7%

Metric: memcpyopt.NumMemCpyInstr
test-suite.../CINT2006/403.gcc/403.gcc.test     47.00   62.00      31.9%
test-suite...510.parest_r/510.parest_r.test   1600.00 1654.00       3.4%
test-suite...006/447.dealII/447.dealII.test    108.00  108.00       0.0%
test-suite...marks/7zip/7zip-benchmark.test     10.00   10.00       0.0%
test-suite...yApps-C++/PENNANT/PENNANT.test     13.00   13.00       0.0%
test-suite...speed/602.gcc_s/602.gcc_s.test     91.00   90.00      -1.1%
test-suite...7rate/502.gcc_r/502.gcc_r.test     91.00   90.00      -1.1%

Metric: memcpyopt.NumMoveToCpy
test-suite...marks/7zip/7zip-benchmark.test     4.00  19.00       375.0%
test-suite...006/447.dealII/447.dealII.test   122.00 150.00        23.0%
test-suite...510.parest_r/510.parest_r.test   322.00 394.00        22.4%
test-suite...speed/602.gcc_s/602.gcc_s.test      NaN    NaN          NaN
test-suite...7rate/502.gcc_r/502.gcc_r.test      NaN    NaN          NaN
test-suite.../CINT2006/403.gcc/403.gcc.test      NaN    NaN          NaN
test-suite...yApps-C++/PENNANT/PENNANT.test      NaN   4.00          NaN

No changes for memcpyopt.NumMemSetInfer

Compile-time impact

NewPM-O3: +0.02 (outlier mafft +0.31%)
NewPM-ReleaseThinLTO: +0.01% (no notable outliers)
NewPM-ReleaseLTO-g: +0.03% (no notable outliers)

Diff Detail

Event Timeline

fhahn created this revision.Sep 27 2021, 6:07 AM
fhahn requested review of this revision.Sep 27 2021, 6:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2021, 6:07 AM
nikic requested changes to this revision.Sep 27 2021, 6:36 AM

The global use of BatchAA in DSE is based on the justification that DSE does not add new instructions while BatchAA is in use -- this means that even though there may be stale entries in the alias cache (stale in the sense of having the location pointing to a deleted instruction), these will not get accidentally reused by newly allocated instructions.

I don't believe that this logic holds up for MemCpyOpt, which does create new instructions. I also suspect that it may modify existing instructions in ways that could render cached alias information invalid (not sure on that though).

This revision now requires changes to proceed.Sep 27 2021, 6:36 AM

+1 on Nikita's comment that BatchAA cannot be used with a single instance for MemCpyOpt.

However, perhaps it's worth considering using it on the call paths when no memory instructions are created, and at least some of the improved results will remain.
The question is what's the cost of creating a BatchAA instance every time one of the calls in iterateOnFunction returns true (either all those adding instructions, or change all APIs to return a {MadeChange, AddedMemInstr} pair).

I'd compare the cost of the extra memory for keeping the EI and BAA alongside AA, plus compile for creating the BAA instance on every mem instruction insert, with the improved metrics obtained, and see if it's worth pursuing.

fhahn abandoned this revision.Nov 3 2021, 6:23 AM

Thanks for raising the issue @nikic & @asbirlea!

Unfortunately I do not really have time to dig deeper here for now. I don't think trying to invalidate BatchAA/EI would be worth the effort, but trying @asbirlea seems interesting. I might pick that up some time in the future. If anyone is interested in picking this up in the meantime that would be great as well :)