Page MenuHomePhabricator

[DFAJumpThreading] Relax analysis to handle unpredictable initial values
ClosedPublic

Authored by alexey.zhikhar on Apr 25 2022, 8:30 AM.

Details

Summary

Responding to a feature request from the Rust community:

https://github.com/rust-lang/rust/issues/80630

void foo(X) {
  for (...)
    switch (X)
      case A
        X = B
      case B
        X = C
}

Even though the initial switch value is non-constant, the switch statement can still be threaded:
the initial value will hit the switch statement but the rest of the state changes will proceed by
jumping unconditionally.

The early predictability check is relaxed to allow unpredictable values anywhere, but later,
after the paths through the switch statement have been enumerated, no non-constant state
values are allowed along the paths. Any state value not along a path will be an initial switch value,
which can be safely ignored.

Diff Detail

Event Timeline

alexey.zhikhar created this revision.Apr 25 2022, 8:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 8:30 AM
alexey.zhikhar requested review of this revision.Apr 25 2022, 8:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 8:30 AM
alexey.zhikhar edited the summary of this revision. (Show Details)Apr 25 2022, 8:31 AM

Code size increase is very small.

Tests: 3887
Metric: size..text

Program                                        results-base results-init diff

test-suite...TimberWolfMC/timberwolfmc.test   215116.00    216284.00     0.5%
test-suite...lications/ClamAV/clamscan.test   485520.00    485968.00     0.1%
test-suite...marks/7zip/7zip-benchmark.test   895972.00    896396.00     0.0%
test-suite...lications/sqlite3/sqlite3.test   464044.00    464052.00     0.0%
test-suite...te/GCC-C-execute-920618-1.test   420.00       420.00        0.0%
test-suite...te/GCC-C-execute-920501-5.test   508.00       508.00        0.0%
test-suite...te/GCC-C-execute-920501-6.test   972.00       972.00        0.0%
test-suite...te/GCC-C-execute-920501-8.test   1484.00      1484.00       0.0%
test-suite...te/GCC-C-execute-920501-9.test   836.00       836.00        0.0%
test-suite...te/GCC-C-execute-920506-1.test   436.00       436.00        0.0%
test-suite...te/GCC-C-execute-920520-1.test   436.00       436.00        0.0%
test-suite...te/GCC-C-execute-920603-1.test   444.00       444.00        0.0%
test-suite...te/GCC-C-execute-920604-1.test   428.00       428.00        0.0%
test-suite...te/GCC-C-execute-920612-1.test   436.00       436.00        0.0%
test-suite...simd_ops_test_op_abs_1026.test   27548.00     27548.00      0.0%
Geomean difference                                                       nan%
       results-base  results-init          diff
count  3.412000e+03  3.412000e+03  3.412000e+03
mean   1.749191e+04  1.749155e+04  3.031513e-08
std    4.639147e+04  4.638708e+04  1.444742e-04
min    4.120000e+02  4.120000e+02 -6.363928e-03
25%    4.840000e+02  4.840000e+02  0.000000e+00
50%    2.012000e+03  2.012000e+03  0.000000e+00
75%    2.768600e+04  2.768600e+04  0.000000e+00
max    1.029404e+06  1.029404e+06  5.429629e-03

I'm getting compile time numbers, will submit them soon. We're also planning on adding more test cases before the patch is merged.

Limit the number of paths explored to contain compile time.

Compare the number of switches threaded before and after this patch.

BMBeforeAfter
MultiSource/Applications/ClamAV/CMakeFiles/clamscan.dir/libclamav\_htmlnorm.stats01
MultiSource/Applications/ClamAV/CMakeFiles/clamscan.dir/libclamav\_message.stats33
MultiSource/Applications/ClamAV/CMakeFiles/clamscan.dir/libclamav\_nsis\_LZMADecode.stats01
MultiSource/Applications/ClamAV/CMakeFiles/clamscan.dir/libclamav\_nsis\_bzlib.stats01
MultiSource/Applications/ClamAV/CMakeFiles/clamscan.dir/libclamav\_untar.stats11
MultiSource/Applications/SPASS/CMakeFiles/SPASS.dir/dfgscanner.stats11
MultiSource/Applications/SPASS/CMakeFiles/SPASS.dir/iascanner.stats11
MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/kimwl.stats11
MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/kimwy.stats01
MultiSource/Applications/lua/CMakeFiles/lua.dir/lvm.stats11
MultiSource/Applications/obsequi/CMakeFiles/Obsequi.dir/does\_x\_win.stats11
MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.stats77
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/C/BwtSort.stats11
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/C/Lzma2Dec.stats01
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/C/LzmaEnc.stats11
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/C/XzDec.stats11
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/Archive/7z/7zUpdate.stats22
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/Archive/Cab/CabHandler.stats33
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/Archive/GzHandler.stats11
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/Archive/Tar/TarHandler.stats12
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/Archive/Zip/ZipHandler.stats01
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/Archive/Zip/ZipHandlerOut.stats11
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/Archive/Zip/ZipIn.stats11
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/Archive/Zip/ZipUpdate.stats22
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/Compress/ShrinkDecoder.stats11
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/UI/Common/EnumDirItems.stats01
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/UI/Common/Extract.stats12
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/UI/Common/OpenArchive.stats11
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/UI/Common/Update.stats11
MultiSource/Benchmarks/7zip/CMakeFiles/7zip-benchmark.dir/CPP/7zip/UI/Console/List.stats11
MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CMakeFiles/CLAMR.dir/Cmd.stats11
MultiSource/Benchmarks/MallocBench/gs/CMakeFiles/gs.dir/interp.stats11
MultiSource/Benchmarks/McCat/01-qbsort/CMakeFiles/qbsort.dir/readlist.stats01
MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jddctmgr.stats11
MultiSource/Benchmarks/MiBench/consumer-typeset/CMakeFiles/consumer-typeset.dir/z12.stats01
MultiSource/Benchmarks/MiBench/consumer-typeset/CMakeFiles/consumer-typeset.dir/z14.stats01
MultiSource/Benchmarks/MiBench/consumer-typeset/CMakeFiles/consumer-typeset.dir/z36.stats11
MultiSource/Benchmarks/MiBench/consumer-typeset/CMakeFiles/consumer-typeset.dir/z38.stats11
MultiSource/Benchmarks/MiBench/consumer-typeset/CMakeFiles/consumer-typeset.dir/z49.stats11
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/genorient.stats01
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/parser.stats01
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/unbust.stats11
MultiSource/Benchmarks/Prolangs-C/football/CMakeFiles/football.dir/sort.stats11
MultiSource/Benchmarks/Ptrdist/bc/CMakeFiles/bc.dir/scan.stats11
MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/CMakeFiles/cjpeg.dir/jddctmgr.stats11
4559
+14

No noticeable compile time impact.

Tests: 10
Metric: compile_time

Program                                        compile_time
                                              base-ct      limit200-ct diff
kimwitu++/kc                                   25.65        25.83       0.7%
sqlite3/sqlite3                                29.09        29.21       0.4%
SPASS/SPASS                                    26.73        26.82       0.4%
ClamAV/clamscan                                29.22        29.21      -0.0%
Bullet/bullet                                  51.96        51.91      -0.1%
consumer-typeset/consumer-typeset              19.90        19.86      -0.2%
lencod/lencod                                  36.56        36.41      -0.4%
mafft/pairlocalalign                           19.60        19.47      -0.7%
7zip/7zip-benchmark                            79.28        78.62      -0.8%
tramp3d-v4/tramp3d-v4                          58.38        56.99      -2.4%
Geomean difference                               nan          nan      -0.3%
      compile_time
run        base-ct limit200-ct       diff
count  10.000000    10.000000   10.000000
mean   37.636450    37.432840  -0.003150
std    19.486189    19.158558   0.008754
min    19.604300    19.467300  -0.023786
25%    25.916200    26.077900  -0.006306
50%    29.154400    29.212200  -0.001405
75%    48.110275    48.032550   0.002560
max    79.283200    78.622400   0.007186

@SjoerdMeijer @jaykang10

Hi Sjoerd and JinGu, I remember from your comments on the initial DFAJumpThreading patch, that you guys are interested in the opportunities in SPEC2017 perlbench. I wanted to let you know that this patch introduces a limit on the number of threaded paths per switch. The limit is 200 paths, that means that if a switch has 1,000 potential path candidates, only 200 of them will get threaded, the rest of them will circle back to the switch instruction the usual way. The thing is that perlbench has several switches optimized by DFAJumpThreading, and a couple of them have the number of paths above the limit (in the thousands). Due to the limit, these switches are threaded partially, which might lead to performance regressions.

Initially I wanted to check whether this patch leads to a regression in perlbench myself but I don't see any performance gains from DFAJumpThreading on perlbench even before introducing the limit, so I don't know which switches you guys are interested in. To be sure that this patch doesn't slow down perlbench for you, please experiment with it on your end, thanks.

amehsan accepted this revision.May 25 2022, 12:36 PM

LGTM. I have looked at the code here, but before that we have discussed the approach here and also alternatives with Alex. Since we haven't received any comments from other reviewers, I suggest we merge it.

This revision is now accepted and ready to land.May 25 2022, 12:36 PM