Page MenuHomePhabricator

[JumpThreading] Reduce duplicate threshold at Oz
ClosedPublic

Authored by samparker on Jan 17 2020, 4:58 AM.

Details

Summary

Reducing the number of instructions duplicated can be good for reducing code size.

Diff Detail

Event Timeline

samparker created this revision.Jan 17 2020, 4:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2020, 4:58 AM

I don't know enough about jump-threading to approve, but if this is correct, then we should keep the new pass manager in sync with this change?

It has existing checks like this in PassBuilder.cpp:

// Rotate Loop - disable header duplication at -Oz
LPM1.addPass(LoopRotatePass(Level != OptimizationLevel::Oz));

I feel that this should be an decision made inside the pass, not necessarily in the pass manager. We may have a single module with both functions that are and are not optsize. The inliner/unroller would change thresholds based on this, for example.

Also do you have some size numbers? (Jump threading can sometimes simplify code more than you would expect, although I think in theory this should be a sensible change.)

kazu added a comment.Jan 17 2020, 8:46 AM

I echo what other have said.

  • I'm curious about numbers.
  • The new pass manager should be kept in sync.
  • Jump threading could reduce code size in certain cases. If we have:

    if (cond1) { A; } else { B; } if (cond1) { C; } else { D; }

    Then jump threading could eliminate the merge point (that is, the second "if") and produce:

    if (cond1) { A; C; } else { B; D; }

    It should just be matter of adjusting the cost calculation to disallow size growth.
  • That said, I'm OK with this change if numbers justify. We can always revisit this pass and look into jump threading opportunities with size in mind.

Thanks, you both are indeed right that it can reduce code size... I've now tried reducing BBDuplicateThreshold and it looks like halving this to 3 is the best option (for aarch64 at least). There's a couple of outliers, but most changes are -0.1%:
test-suite...otout/Shootout-nestedloop.test 484 468 -3.3%
test-suite...+/Shootout-C++-nestedloop.test 532 516 -3.0%
test-suite...Olden/perimeter/perimeter.test 2108 2148 1.9%
test-suite...urce/Applications/aha/aha.test 2964 2932 -1.1%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test 53748 53964 0.4%
test-suite...ch/g721/g721encode/encode.test 5468 5452 -0.3%
test-suite...marks/Ptrdist/yacr2/yacr2.test 15204 15236 0.2%
test-suite...ications/JM/ldecod/ldecod.test 161116 160788 -0.2%
test-suite...marks/7zip/7zip-benchmark.test 469784 470728 0.2%
test-suite...chmarks/MallocBench/gs/gs.test 104228 104428 0.2%
test-suite...oxyApps-C/miniAMR/miniAMR.test 34260 34196 -0.2%
test-suite...ications/JM/lencod/lencod.test 380324 380996 0.2%
test-suite...plications/d/make_dparser.test 62816 62912 0.2%
test-suite...s-C/Pathfinder/PathFinder.test 15996 15972 -0.2%
test-suite...ks/Prolangs-C++/city/city.test 5428 5436 0.1%
test-suite.../Benchmarks/Ptrdist/bc/bc.test 27300 27260 -0.1%
test-suite...pplications/oggenc/oggenc.test 105776 105632 -0.1%
test-suite...-typeset/consumer-typeset.test 327944 327520 -0.1%
test-suite...lications/sqlite3/sqlite3.test 244704 244400 -0.1%
test-suite...tions/lambda-0.1.3/lambda.test 20092 20068 -0.1%
test-suite.../Prolangs-C/bison/mybison.test 35828 35788 -0.1%
test-suite...peg2/mpeg2dec/mpeg2decode.test 29916 29884 -0.1%
test-suite.../Applications/SPASS/SPASS.test 287832 288136 0.1%
test-suite...arching-flt/Searching-flt.test 8292 8284 -0.1%
test-suite...arching-dbl/Searching-dbl.test 8500 8492 -0.1%
test-suite...C/Packing-flt/Packing-flt.test 8548 8540 -0.1%
test-suite...ences-flt/Recurrences-flt.test 8644 8636 -0.1%
test-suite...t/StatementReordering-flt.test 8660 8652 -0.1%
test-suite...encode/alacconvert-encode.test 26044 26020 -0.1%
test-suite...decode/alacconvert-decode.test 26044 26020 -0.1%
test-suite...C/Packing-dbl/Packing-dbl.test 8756 8748 -0.1%
test-suite...lications/SIBsim4/SIBsim4.test 26476 26500 0.1%
test-suite...ences-dbl/Recurrences-dbl.test 8852 8844 -0.1%
test-suite...l/StatementReordering-dbl.test 8868 8860 -0.1%
test-suite...ing-flt/LoopRerolling-flt.test 9116 9108 -0.1%
test-suite...ing-flt/Equivalencing-flt.test 9268 9260 -0.1%
test-suite...ing-flt/NodeSplitting-flt.test 9276 9268 -0.1%
test-suite...ing-dbl/LoopRerolling-dbl.test 9340 9332 -0.1%
test-suite...mbolics-flt/Symbolics-flt.test 9364 9356 -0.1%
test-suite...ing-dbl/Equivalencing-dbl.test 9420 9412 -0.1%
test-suite...ing-dbl/NodeSplitting-dbl.test 9476 9468 -0.1%
test-suite...mbolics-dbl/Symbolics-dbl.test 9588 9580 -0.1%
test-suite...lt/IndirectAddressing-flt.test 9708 9700 -0.1%
test-suite...bl/IndirectAddressing-dbl.test 9924 9916 -0.1%
test-suite...lt/CrossingThresholds-flt.test 9940 9932 -0.1%
test-suite...cations/hexxagon/hexxagon.test 10124 10132 0.1%
test-suite...bl/CrossingThresholds-dbl.test 10148 10140 -0.1%
test-suite...flt/InductionVariable-flt.test 10204 10196 -0.1%
test-suite...flt/LoopRestructuring-flt.test 10436 10428 -0.1%
test-suite...ow-flt/GlobalDataFlow-flt.test 10452 10444 -0.1%
test-suite...dbl/InductionVariable-dbl.test 10468 10460 -0.1%
test-suite...oops-flt/ControlLoops-flt.test 10588 10580 -0.1%
test-suite...dbl/LoopRestructuring-dbl.test 10644 10636 -0.1%
test-suite...ow-dbl/GlobalDataFlow-dbl.test 10660 10652 -0.1%
test-suite...oops-dbl/ControlLoops-dbl.test 10780 10772 -0.1%
test-suite...pansion-flt/Expansion-flt.test 10996 10988 -0.1%
test-suite...-flt/LinearDependence-flt.test 11148 11140 -0.1%
test-suite...pansion-dbl/Expansion-dbl.test 11196 11188 -0.1%
test-suite...ctions-flt/Reductions-flt.test 11316 11308 -0.1%
test-suite...-dbl/LinearDependence-dbl.test 11364 11356 -0.1%
test-suite...ctions-dbl/Reductions-dbl.test 11524 11516 -0.1%
test-suite...sc-C++/stepanov_container.test 13132 13140 0.1%
test-suite...lFlow-flt/ControlFlow-flt.test 13764 13756 -0.1%
test-suite...lFlow-dbl/ControlFlow-dbl.test 13972 13964 -0.1%
Geomean difference -0.0%

default-results    threshold-3        diff

count 310.000000 310.000000 310.000000
mean 26375.587097 26377.445161 -0.000273
std 65478.305060 65501.068235 0.002870
min 460.000000 460.000000 -0.033058
25% 1286.000000 1286.000000 0.000000
50% 3128.000000 3128.000000 0.000000
75% 11262.000000 11260.000000 0.000000
max 469784.000000 470728.000000 0.018975

I agree that adjusting BBDuplicateThreshold for code size is better choice than disabling the whole pass.

I will proceed with getting some tests together and upload a patch for controlling the threshold. The numbers look even nicer for thumb2 targets:

test-suite...rks/FreeBench/mason/mason.test 1372 1232 -10.2%
test-suite...otout/Shootout-nestedloop.test 312 304 -2.6%
test-suite...+/Shootout-C++-nestedloop.test 352 344 -2.3%
test-suite...Olden/perimeter/perimeter.test 1388 1404 1.2%
test-suite.../Applications/spiff/spiff.test 10980 10932 -0.4%
test-suite...lications/SIBsim4/SIBsim4.test 18816 18736 -0.4%
test-suite...ch/g721/g721encode/encode.test 3808 3792 -0.4%
test-suite...encode/alacconvert-encode.test 19944 19888 -0.3%
test-suite...decode/alacconvert-decode.test 19944 19888 -0.3%
test-suite...marks/Ptrdist/yacr2/yacr2.test 11008 11036 0.3%
test-suite...ications/JM/ldecod/ldecod.test 126648 126352 -0.2%
test-suite...s-C/Pathfinder/PathFinder.test 10540 10516 -0.2%
test-suite...rks/FreeBench/pifft/pifft.test 23600 23560 -0.2%
test-suite...oxyApps-C/miniAMR/miniAMR.test 30588 30548 -0.1%
test-suite.../Prolangs-C/bison/mybison.test 26536 26504 -0.1%
test-suite...-typeset/consumer-typeset.test 280016 279696 -0.1%
test-suite.../Benchmarks/Ptrdist/bc/bc.test 19524 19504 -0.1%
test-suite...Source/Benchmarks/sim/sim.test 7936 7928 -0.1%
test-suite...ks/Prolangs-C++/city/city.test 4212 4216 0.1%
test-suite...chmarks/MallocBench/gs/gs.test 76848 76920 0.1%
test-suite...sc-C++/stepanov_container.test 9048 9056 0.1%
test-suite...nsumer-lame/consumer-lame.test 67628 67684 0.1%
test-suite...CI_Purple/SMG2000/smg2000.test 88552 88480 -0.1%
test-suite...lications/ClamAV/clamscan.test 245252 245060 -0.1%
test-suite.../Applications/SPASS/SPASS.test 201956 201828 -0.1%
test-suite...ce/Applications/siod/siod.test 46036 46008 -0.1%
test-suite...cations/hexxagon/hexxagon.test 7236 7232 -0.1%
test-suite...peg2/mpeg2dec/mpeg2decode.test 22512 22500 -0.1%
test-suite...arching-dbl/Searching-dbl.test 7560 7556 -0.1%
test-suite...arching-flt/Searching-flt.test 7764 7760 -0.1%
test-suite...C/Packing-dbl/Packing-dbl.test 7768 7764 -0.1%
test-suite...ences-dbl/Recurrences-dbl.test 7800 7796 -0.1%
test-suite...l/StatementReordering-dbl.test 7808 7804 -0.1%
Geomean difference -0.1%

count 310.000000 310.000000 310.000000
mean 19255.638710 19250.812903 -0.000593
std 48432.908955 48424.032899 0.006159
min 292.000000 292.000000 -0.102041
25% 971.000000 971.000000 0.000000
50% 2376.000000 2376.000000 0.000000
75% 9927.000000 9926.000000 0.000000
max 323644.000000 323748.000000 0.011527

samparker retitled this revision from [IPO] Don't run jump threading at Oz to [JumpThreading] Reduce duplicate threshold at Oz.
samparker edited the summary of this revision. (Show Details)

Changing the threshold within the pass.

My numbers agree that this is a nice improvement (but will be ARM/AArch64 again).

llvm/lib/Transforms/Scalar/JumpThreading.cpp
379–380

Can you move the other code for setting up BBDupThreshold here too. There may be multiple functions, some with and some without MinSize.

samparker updated this revision to Diff 241075.Wed, Jan 29, 2:22 AM

@dmgreen Something like this? We now store the default threshold in the constructor and then make a decision for each function.

dmgreen accepted this revision.Sun, Feb 2, 1:19 PM

Aye, looks good. I'm not sure the Threshold being passed to the JumpThread pass is really serving anyone at the moment, but best to keep it around.

We only have ARM/AArch64size results, but I think at this level these changes will be more about general code structure than individual architecture details. If now, we can add a target hook in.

LGTM. Thanks.

This revision is now accepted and ready to land.Sun, Feb 2, 1:19 PM
This revision was automatically updated to reflect the committed changes.