Test thanks to Michael Kuklinski from #llvm: https://godbolt.org/z/bdrah5Goo
It was originally inspired by Daniel Lemire's post: https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/
We manage to deduce that the answer does not require looping,
but we do that after the last LoopDeletion pass run,
so we end up being stuck with a dead loop.
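
For illustration, here is a minimal sketch of the kind of code in question. It is not the linked godbolt test; the `Node` type and both function names are hypothetical. The count is only ever compared with zero, so the answer is just `head == nullptr` and the loop itself is not needed:

```cpp
#include <cstddef>

// Hypothetical singly linked list node, for illustration only.
struct Node {
  Node *next;
};

// Counts the nodes by walking the whole list: linear time.
std::size_t count_nodes(const Node *head) {
  std::size_t n = 0;
  for (const Node *p = head; p != nullptr; p = p->next)
    ++n;
  return n;
}

// Only asks whether the count is zero. That is equivalent to
// `head == nullptr`, so the result does not depend on running the loop.
bool is_empty(const Node *head) { return count_nodes(head) == 0; }
```

Once the comparison is folded to a null check, the counting loop has no remaining users, which is exactly the dead loop that a later LoopDeletion run could remove.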
While we are perhaps (clearly) also missing a LoopDeletion run
in the buildModuleOptimizationPipeline() part of the pipeline,
we *should* add this one here, to reduce the function size
before inlining. I believe we want to have a new loop pass manager for it,
so that LoopSimplifyPass/LCSSAPass are guaranteed to be computed once?
Now, as with all things SCEV, this comes with an entirely expected ~+0.5% compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=e5df0a5a6f412965eb1be495d2672b4164c6c3d5&to=4af31d7accb368597bc8b4fc8f125d7c2fdbae6a&stat=instructions
Looking at the transformation stats over the vanilla test-suite, I think it's rather expected:

| statistic name | baseline | proposed | Δ | % | \|%\| |
|--------------------------------------------------|----------:|----------:|-------:|-------:|-------:|
| scalar-evolution.NumTripCountsNotComputed | 105592 | 137458 | 31866 | 30.18% | 30.18% |
| scalar-evolution.NumBruteForceTripCountsComputed | 789 | 955 | 166 | 21.04% | 21.04% |
| scalar-evolution.NumTripCountsComputed | 299759 | 348901 | 49142 | 16.39% | 16.39% |
| loop-delete.NumBackedgesBroken | 542 | 557 | 15 | 2.77% | 2.77% |
| regalloc.numExtends | 81 | 79 | -2 | -2.47% | 2.47% |
| licm.NumSunk | 12167 | 11875 | -292 | -2.40% | 2.40% |
| indvars.NumFoldedUser | 408 | 400 | -8 | -1.96% | 1.96% |
| indvars.NumElimCmp | 3831 | 3757 | -74 | -1.93% | 1.93% |
| loop-delete.NumDeleted | 8055 | 8096 | 41 | 0.51% | 0.51% |
| codegenprepare.NumSelectsExpanded | 277 | 278 | 1 | 0.36% | 0.36% |
| assume-queries.NumAssumeQueries | 120910554 | 121324983 | 414429 | 0.34% | 0.34% |
| loop-unroll.NumRuntimeUnrolled | 13841 | 13796 | -45 | -0.33% | 0.33% |
| phi-node-elimination.NumCriticalEdgesSplit | 83054 | 82804 | -250 | -0.30% | 0.30% |
| branch-folder.NumBranchOpts | 108122 | 107827 | -295 | -0.27% | 0.27% |
| correlated-value-propagation.NumCmps | 1461 | 1465 | 4 | 0.27% | 0.27% |
| branch-folder.NumDeadBlocks | 130818 | 130490 | -328 | -0.25% | 0.25% |
| branch-folder.NumTailMerge | 72501 | 72347 | -154 | -0.21% | 0.21% |
| correlated-value-propagation.NumAShrs | 514 | 515 | 1 | 0.19% | 0.19% |
| loop-unroll.NumUnrolled | 40136 | 40069 | -67 | -0.17% | 0.17% |
| licm.NumHoisted | 388827 | 388221 | -606 | -0.16% | 0.16% |
| machine-cse.NumPREs | 3085 | 3080 | -5 | -0.16% | 0.16% |
| loop-unroll.NumCompletelyUnrolled | 9236 | 9222 | -14 | -0.15% | 0.15% |

Note that I'm only changing the current PM, and not touching the obsolete PM.
In that particular case after this one.