Page MenuHomePhabricator

[MachineSinking] sink more profitable loads
Needs ReviewPublic

Authored by shchenz on Aug 31 2020, 2:11 AM.

Details

Reviewers
MatzeB
echristo
efriedma
qcolombet
Group Reviewers
Restricted Project
Summary

Instead of setting Store to true conservatively, handle some cases where Store can be determined by some analysis if MI.parent() dom SuccToSinkTo and SuccToSinkTo post dom MI.parent().

Thus we can get more profitable loads to be sunk.

Diff Detail

Unit TestsFailed

TimeTest
380 mslinux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp

Event Timeline

shchenz created this revision.Aug 31 2020, 2:11 AM
shchenz requested review of this revision.Aug 31 2020, 2:11 AM
shchenz edited the summary of this revision. (Show Details)Aug 31 2020, 5:06 AM
shchenz updated this revision to Diff 289112.Sep 1 2020, 2:57 AM

handle further more cases to sink profitable loads

Do we need some threshold to limit the analysis? I'm worried this could get very expensive.

llvm/lib/CodeGen/MachineSink.cpp
942

Are the post-dominance checks necessary? I'm not sure what invariants you're trying to enforce.

shchenz updated this revision to Diff 289389.Sep 2 2020, 3:46 AM

1: can not check alias for call instruction by using mayAlias
2: add one more cache to save compiling time.

Do we need some threshold to limit the analysis? I'm worried this could get very expensive.

I add a new cache to save compiling time. Could you be more specific on how to add the threshold? Thanks.

llvm/lib/CodeGen/MachineSink.cpp
942

they are not necessary. I add the post-dominance check for the convenience of finding all blocks which are in the path from block From to To.
With these checks, it is easy to judge if a block BB(reachable from From) is in the path from From to To. Just check PDT->dominates(To, BB))
`

lenary removed a subscriber: lenary.Sep 2 2020, 3:57 AM

gentle ping

Hi @efriedma,

Given you've already taken a look, can you do the review or do you want me to step in?

Cheers,
-Quentin

I'd appreciate it if you'd step in, Quentin. I have a backlog of reviews, and I still haven't quite wrapped my head around the use of post-dominance here.

shchenz added a comment.EditedWed, Sep 30, 6:58 PM

@efriedma @qcolombet haha, thanks for your review. Please take your time. ^-^ . I want to make the alias check simple(find all basic blocks in all paths from block From to block To), so I guard the dominance between From and To as From dominates To and To post dominates From. This will make us still miss some cases in which load instruction can be sunk. But it can make the complexity simple.

gentle ping ^^

Hi,

Looks reasonable but I'd like to have an idea of the compile time impact of this patch.

Make sure to fix the Lint comments as well.

Cheers,
-Quentin

llvm/lib/CodeGen/MachineSink.cpp
347

Would it make sense to preserve the cache as long as we don't move stores, calls and loads with ordered memory ref?

I'm worried about the compile time impact of this patch so I am wondering if we want to be smarter about invalidating the cache.

965

While we traverse BB, should we create an entry for the pair (To, BB).

Put differently, how is the compile time impact of this patch and do we have to do more to avoid computations?

shchenz updated this revision to Diff 299890.Thu, Oct 22, 1:46 AM

1: clear the cache just before function returns
2: cache handled bb while calling depth_first() for a block

Thanks very much for your review. @qcolombet Update accordingly. I collected some compiling time numbers for llvm test-suite. No obivious degradation found.

llvm/lib/CodeGen/MachineSink.cpp
347

Good idea. But seems we still need to clear the cache before we return from MachineSinking::runOnMachineFunction. (MachineSink instance is shared with different callings to MachineSinking::runOnMachineFunction for different functions), otherwise I meet some weird memory issue.

965

I collected some data from PowerPC, seems the compiling time difference is not obivious in llvm test-suite. the tool hides some small diff tests. Among them, I saw some tests takes 10s and some of them takes more than 30s, but no big reg for them. The biggest diff 9.6% and 7.4%, compile time are around 0.1s. And these two degradations can not reproduciable in other runs.

./utils/compare.py -m compile_time base.json fix.json  
Tests: 309
Metric: compile_time

Program                                        base  fix   diff 
                                                                  
 test-suite...enchmarks/Stanford/Queens.test       9.6%
 test-suite...d-warshall/floyd-warshall.test       7.4%
 test-suite...ImageProcessing/Blur/blur.test      -4.7%
 test-suite...SubsetBRawLoops/lcalsBRaw.test       2.7%
 test-suite...math/automotive-basicmath.test     -2.5%
 test-suite.../Benchmarks/Stanford/Perm.test      -2.4%
 test-suite...rks/tramp3d-v4/tramp3d-v4.test     -2.2%
 test-suite...ow-dbl/GlobalDataFlow-dbl.test      -2.2%
 test-suite...enchmarks/Stanford/RealMM.test      -2.0%
 test-suite...ctions-flt/Reductions-flt.test      -1.8%
 test-suite...t/StatementReordering-flt.test      -1.7%
 test-suite.../Trimaran/enc-rc4/enc-rc4.test      -1.7%
 test-suite...l/StatementReordering-dbl.test      1.6%
 test-suite...pansion-dbl/Expansion-dbl.test      -1.4%
 test-suite...bl/IndirectAddressing-dbl.test      -1.2%
 Geomean difference                                           nan%
            base        fix        diff
count  309.000000  309.000000  277.000000
mean   2.696964    2.685603   -0.003232  
std    6.283106    6.251353    0.009456  
min    0.000000    0.000000   -0.047321  
25%    0.077944    0.077862   -0.005599  
50%    0.291912    0.289780   -0.003404  
75%    2.475318    2.478657   -0.001450  
max    64.257411   64.088470   0.095601
shchenz updated this revision to Diff 299903.Thu, Oct 22, 2:43 AM

1: address Lint suggestion

shchenz added a subscriber: nikic.EditedFri, Oct 23, 7:07 AM

Ah, I used @nikic perf testing tool for this patch on the X86 target(?). (Thanks for this great tool ^-^) Here is the result of the compiling time: https://llvm-compile-time-tracker.com/compare.php?from=2d25004a137223a02aa06e8bfd512a648f3b3f94&to=abf39366779bed8b9f2d276cd199c3b78471e3b1&stat=instructions

GEO degradates about 0.02% and no obivious outlier. I guess the compiling time increase on these benchmarks should be acceptable?