Try to convert select instruction with loop-invariant condition into
control-flow equivalent (if-then branch and phi node) and unswitch
this branch. This unswitching is always non-trivial since the successors
of created branch are not loop exits, so it requires cloning of loop.
Details
- Reviewers
fhahn mkazantsev nikic
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
test-suite statistics:
Program simple-loop-unswitch.NumSelects selects test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 52.00 test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 16.00 test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 16.00 test-suite :: MultiSource/Applications/ClamAV/clamscan.test 7.00 test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 6.00 test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 6.00 test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test 4.00 test-suite :: External/SPEC/CINT2017speed/657.xz_s/657.xz_s.test 3.00 test-suite :: MultiSource/Applications/oggenc/oggenc.test 3.00 test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 3.00 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 3.00 test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test 3.00 test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 3.00 test-suite :: MultiSource/Applications/JM/lencod/lencod.test 2.00 test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 2.00 test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 2.00 test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 2.00 test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 2.00 test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 2.00 test-suite :: MultiSource/Applications/spiff/spiff.test 2.00 test-suite :: MultiSource/Benchmarks/nbench/nbench.test 1.00 test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test 1.00 test-suite :: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2.test 1.00 test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 1.00 test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test 1.00 test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1.00 test-suite :: MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test 1.00 test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 1.00 test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 1.00 test-suite :: MultiSource/Benchmarks/MiBench/office-ispell/office-ispell.test 1.00 test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1.00 test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 1.00 test-suite :: MultiSource/Benchmarks/sim/sim.test 1.00
I don't observe any measurable execution time change on SPEC2017 with this change, but in some cases unswitching of select looks desirable (like https://godbolt.org/z/1rPTPan7G)
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
3104 | It is not an equivalent transform in sense of UB. If the condition was posion, then select by this condition is also poison. If you turn it into a branch, branch by poisoned condition is immediate UB. How do you account for that? |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
3104 | Accidentionally it was correct in some tests, because unswitchNontrivialInvariants() inserts freeze for branch conditions if the branch is not guaranteed to execute at least once in loop body; for other loops it will not work, so now we insert freeze of select condition explicitly. |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2619 | Can we change the insertion point to right after the cond? Presumeably it's outside of loop. | |
2628 | This is wrong, you can't insert a Phi before SI if there is another instruction before it. Your insertion point should be smth like block.begin(). | |
2720–2721 | This condition is growing weird, can we rewrite it to make it explicit that we choose between guard, select, terminator etc? Maybe factor out into a lambda. | |
2806–2820 | Just for record: this will cost us compile time, turning the algorithm from O(number of blocks) to O(number of instructions) (if guards are off). Keep that in mind if you see compile time regressions. | |
2812 | I would also filter out such obviously useless cases as:
maybe smth else, give it some more thought. | |
llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll | ||
4393 | To see what your tests are doing, I propose the following steps:
Without this, it's hard to tell what's right or wrong here. |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2628 | I also propose to give this Phi a name. |
It looks like this tries to address the same issue as D138526. Might be goo to sync up with @caojoshua
Can confirm this is addressing the same issue. Our implementations are quite a bit different. It seems you are converting select's into branch's and mostly using existing logic. In my review, I try to directly unswitch the select. Not sure which approach is preferred. I'll let people who have been working on LLVM longer give their thoughts.
Either way, I think we can share some test cases. In my review, I pulled in select tests from the old PM LoopUnswitch pass, and added some checks to existing tests.
i had previously attempted to solve the same problem in https://reviews.llvm.org/D138526. @kachkov98 are you still working on this? If not, can I take over?
@caojoshua Sure, I prefer your approach without select -> branch convertion more (the weak point here is that unswitching can fail after this conversion, and subsequent passes should fold branch to select back). You can share LIT test if it will be helpful.
Ha, that's funny. I was actually going to say I prefer your approach and would have liked to work on top of this patch. It does not look possible that a select is converted, and then unswitching fails. By the time you convert a select, it has already been determined a legal unswitch target with the best cost.
This patch uses already existing unswitching logic, while my patch adds a whole bunch of new logic. This patch is much easier to understand.
Can we change the insertion point to right after the cond? Presumeably it's outside of loop.