This changes the dataflow solver into a two-phase approach:
- In a first pass visit every block in a reverse post-order resulting in a topological ordering for loop-free code. Put "backedges" into worklist.
- Revisit blocks with worklist dataflow algorithm until fixpoint is reached.
This should be optimal for loop-free code and lead to faster convergence with loops present (compare to the current code).
This is inspired by https://reviews.llvm.org/D154695
Performance of artifical testcase in https://github.com/llvm/llvm-project/issues/62348 without this patch:
n: 20000 6.27user 0.22system 0:06.50elapsed 99%CPU (0avgtext+0avgdata 498664maxresident)k 0inputs+8824outputs (0major+115980minor)pagefaults 0swaps n: 40000 18.21user 0.87system 0:19.10elapsed 99%CPU (0avgtext+0avgdata 1722016maxresident)k 0inputs+17576outputs (0major+425395minor)pagefaults 0swaps n: 60000 35.72user 1.74system 0:37.47elapsed 99%CPU (0avgtext+0avgdata 3742560maxresident)k 0inputs+26456outputs (0major+936001minor)pagefaults 0swaps n: 80000 58.58user 3.06system 1:01.65elapsed 99%CPU (0avgtext+0avgdata 6513700maxresident)k 0inputs+35328outputs (0major+1631166minor)pagefaults 0swaps n: 100000 82.78user 4.64system 1:27.44elapsed 99%CPU (0avgtext+0avgdata 10117168maxresident)k 0inputs+44016outputs (0major+2552478minor)pagefaults 0swaps
with this change:
n: 20000 3.59user 0.22system 0:03.82elapsed 99%CPU (0avgtext+0avgdata 498804maxresident)k 0inputs+9016outputs (0major+119143minor)pagefaults 0swaps n: 40000 8.57user 0.81system 0:09.39elapsed 99%CPU (0avgtext+0avgdata 1722108maxresident)k 0inputs+17832outputs (0major+431756minor)pagefaults 0swaps n: 60000 12.70user 1.79system 0:14.50elapsed 99%CPU (0avgtext+0avgdata 3742640maxresident)k 0inputs+26328outputs (0major+945180minor)pagefaults 0swaps n: 80000 19.23user 3.07system 0:22.32elapsed 99%CPU (0avgtext+0avgdata 6513620maxresident)k 0inputs+35200outputs (0major+1643901minor)pagefaults 0swaps n: 100000 22.48user 4.71system 0:27.20elapsed 99%CPU (0avgtext+0avgdata 10117160maxresident)k 0inputs+43952outputs (0major+2571138minor)pagefaults 0swaps
and with https://reviews.llvm.org/D154695 which performs the same within measurement noise:
n: 20000 3.84user 0.22system 0:04.07elapsed 99%CPU (0avgtext+0avgdata 498536maxresident)k 0inputs+8824outputs (0major+116001minor)pagefaults 0swaps n: 40000 8.33user 0.83system 0:09.18elapsed 99%CPU (0avgtext+0avgdata 1721764maxresident)k 0inputs+17704outputs (0major+425381minor)pagefaults 0swaps n: 60000 12.68user 1.78system 0:14.47elapsed 99%CPU (0avgtext+0avgdata 3742440maxresident)k 0inputs+26328outputs (0major+936004minor)pagefaults 0swaps n: 80000 18.10user 3.02system 0:21.13elapsed 99%CPU (0avgtext+0avgdata 6513456maxresident)k 0inputs+35328outputs (0major+1631150minor)pagefaults 0swaps n: 100000 22.15user 4.56system 0:26.73elapsed 99%CPU (0avgtext+0avgdata 10116828maxresident)k 0inputs+43824outputs (0major+2552466minor)pagefaults 0swaps
Is it better to have only one BasicBlock arguments? Then we can get the BlockData and the Block index by Mapping and Block.