This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnswitch] Perform loop unswitching on select instructions
AbandonedPublic

Authored by kachkov98 on Dec 1 2022, 6:29 AM.

Details

Summary

Try to convert select instruction with loop-invariant condition into
control-flow equivalent (if-then branch and phi node) and unswitch
this branch. This unswitching is always non-trivial since the successors
of created branch are not loop exits, so it requires cloning of loop.

Diff Detail

Event Timeline

kachkov98 created this revision.Dec 1 2022, 6:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 1 2022, 6:29 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
kachkov98 requested review of this revision.Dec 1 2022, 6:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 1 2022, 6:29 AM
kachkov98 added a comment.EditedDec 1 2022, 6:30 AM

test-suite statistics:

Program                                                                             simple-loop-unswitch.NumSelects
                                                                                    selects                        
           test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  52.00                         
                 test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  16.00                         
                  test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  16.00                         
                        test-suite :: MultiSource/Applications/ClamAV/clamscan.test   7.00                         
                      test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   6.00                         
                       test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   6.00                         
test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test   4.00                         
                   test-suite :: External/SPEC/CINT2017speed/657.xz_s/657.xz_s.test   3.00                         
                          test-suite :: MultiSource/Applications/oggenc/oggenc.test   3.00                         
      test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test   3.00                         
     test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test   3.00                         
                    test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test   3.00                         
             test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test   3.00                         
                       test-suite :: MultiSource/Applications/JM/lencod/lencod.test   2.00                         
            test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test   2.00                         
      test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   2.00                         
      test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test   2.00                         
     test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test   2.00                         
                            test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   2.00                         
                            test-suite :: MultiSource/Applications/spiff/spiff.test   2.00                         
                            test-suite :: MultiSource/Benchmarks/nbench/nbench.test   1.00                         
        test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test   1.00                         
          test-suite :: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2.test   1.00                         
                        test-suite :: MultiSource/Applications/sqlite3/sqlite3.test   1.00                         
test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test   1.00                         
          test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test   1.00                         
                  test-suite :: MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test   1.00                         
      test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   1.00                         
                     test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   1.00                         
      test-suite :: MultiSource/Benchmarks/MiBench/office-ispell/office-ispell.test   1.00                         
           test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test   1.00                         
             test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test   1.00                         
                                  test-suite :: MultiSource/Benchmarks/sim/sim.test   1.00

I don't observe any measurable execution time change on SPEC2017 with this change, but in some cases unswitching of select looks desirable (like https://godbolt.org/z/1rPTPan7G)

mkazantsev added inline comments.Dec 4 2022, 11:22 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3104

It is not an equivalent transform in sense of UB. If the condition was posion, then select by this condition is also poison. If you turn it into a branch, branch by poisoned condition is immediate UB. How do you account for that?

kachkov98 updated this revision to Diff 480082.Dec 5 2022, 5:54 AM

Insert freeze of select condition

kachkov98 added inline comments.Dec 5 2022, 5:59 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3104

Accidentionally it was correct in some tests, because unswitchNontrivialInvariants() inserts freeze for branch conditions if the branch is not guaranteed to execute at least once in loop body; for other loops it will not work, so now we insert freeze of select condition explicitly.

mkazantsev requested changes to this revision.Dec 6 2022, 11:47 PM
mkazantsev added inline comments.
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2619

Can we change the insertion point to right after the cond? Presumeably it's outside of loop.

2628

This is wrong, you can't insert a Phi before SI if there is another instruction before it. Your insertion point should be smth like block.begin().

2720–2721

This condition is growing weird, can we rewrite it to make it explicit that we choose between guard, select, terminator etc? Maybe factor out into a lambda.

2806–2820

Just for record: this will cost us compile time, turning the algorithm from O(number of blocks) to O(number of instructions) (if guards are off). Keep that in mind if you see compile time regressions.

2812

I would also filter out such obviously useless cases as:

  • Select is not used
  • Select is only used outside of the loop (and therefore can be sunk)

maybe smth else, give it some more thought.

llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll
4393

To see what your tests are doing, I propose the following steps:

  • Regenerate all checks in this file using update_test_checks.py script;
  • Check in your new tests with checks as is (without your change in code)
  • Then rebase your patch on top of it, to see what exactly your patch has changed.

Without this, it's hard to tell what's right or wrong here.

This revision now requires changes to proceed.Dec 6 2022, 11:47 PM
mkazantsev added inline comments.Dec 6 2022, 11:48 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2628

I also propose to give this Phi a name.

fhahn added a subscriber: caojoshua.Dec 7 2022, 7:10 AM

It looks like this tries to address the same issue as D138526. Might be goo to sync up with @caojoshua

It looks like this tries to address the same issue as D138526. Might be goo to sync up with @caojoshua

Can confirm this is addressing the same issue. Our implementations are quite a bit different. It seems you are converting select's into branch's and mostly using existing logic. In my review, I try to directly unswitch the select. Not sure which approach is preferred. I'll let people who have been working on LLVM longer give their thoughts.

Either way, I think we can share some test cases. In my review, I pulled in select tests from the old PM LoopUnswitch pass, and added some checks to existing tests.

i had previously attempted to solve the same problem in https://reviews.llvm.org/D138526. @kachkov98 are you still working on this? If not, can I take over?

kachkov98 abandoned this revision.Apr 10 2023, 2:22 AM

@caojoshua Sure, I prefer your approach without select -> branch convertion more (the weak point here is that unswitching can fail after this conversion, and subsequent passes should fold branch to select back). You can share LIT test if it will be helpful.

caojoshua added a comment.EditedApr 10 2023, 9:53 PM

@caojoshua Sure, I prefer your approach without select -> branch convertion more (the weak point here is that unswitching can fail after this conversion, and subsequent passes should fold branch to select back). You can share LIT test if it will be helpful.

Ha, that's funny. I was actually going to say I prefer your approach and would have liked to work on top of this patch. It does not look possible that a select is converted, and then unswitching fails. By the time you convert a select, it has already been determined a legal unswitch target with the best cost.

This patch uses already existing unswitching logic, while my patch adds a whole bunch of new logic. This patch is much easier to understand.