- Move the s_and of exec to its correct position, before the body of the waterfall loop
- Use the SI_WATERFALL pseudo instruction, as is already done for SelectionDAG, to benefit from the same optimizations
- Add support for indirect function calls

To support indirect calls, add a G_SI_CALL instruction without register
class restrictions and insert a waterfall loop when applying register
banks.
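
The waterfall loop inserted for a call target that may differ per lane can be modelled in plain C++. This is only a simulation under stated assumptions, not the AMDGPU lowering itself: the 8-lane wave, `waterfallCall`, `addOne` and `twice` are hypothetical, and the hardware steps (reading the first active lane's target, the s_and of exec, the loop back-edge) are emulated with ordinary loops over a bit mask.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical wave size; real AMDGPU waves have 32 or 64 lanes.
constexpr int kWaveSize = 8;
using ExecMask = std::uint32_t;  // bit i set => lane i is active
using Target = int (*)(int);     // per-lane (possibly divergent) call target

// Two hypothetical callees used only to give lanes different targets.
int addOne(int x) { return x + 1; }
int twice(int x) { return x * 2; }

// Emulates a waterfall loop around an indirect call whose target may differ
// per lane. The call needs a uniform (scalar) target, so each iteration:
//   1. reads the target of the first active lane (like V_READFIRSTLANE_B32),
//   2. narrows exec to the lanes holding that same target (the s_and of exec),
//   3. performs the call for those lanes only,
//   4. retires those lanes and loops until no active lane is left.
// Returns the number of iterations, i.e. the number of distinct targets.
int waterfallCall(const std::array<Target, kWaveSize> &targets,
                  std::array<int, kWaveSize> &values, ExecMask exec) {
  assert((exec >> kWaveSize) == 0 && "exec has bits beyond the wave");
  int iterations = 0;
  while (exec != 0) {
    // Step 1: find the first active lane and read its target.
    int first = 0;
    while (((exec >> first) & 1u) == 0)
      ++first;
    const Target uniformTarget = targets[first];

    // Step 2: mask of active lanes that agree with that target.
    ExecMask matching = 0;
    for (int lane = 0; lane < kWaveSize; ++lane)
      if (((exec >> lane) & 1u) != 0 && targets[lane] == uniformTarget)
        matching |= ExecMask{1} << lane;

    // Step 3: the call now runs with a uniform target for these lanes.
    for (int lane = 0; lane < kWaveSize; ++lane)
      if (((matching >> lane) & 1u) != 0)
        values[lane] = uniformTarget(values[lane]);

    // Step 4: retire the handled lanes and go around the loop again.
    exec &= ~matching;
    ++iterations;
  }
  return iterations;
}
```

The loop runs once per distinct target among the active lanes, and it must regain control after every call to continue with the remaining lanes. A call that never returns into the loop, such as a tail call, does not fit this shape when the target may be divergent.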

Adjust how RegBankSelect handles newly created basic blocks so that it copes
with the blocks inserted for indirect calls.

Tail calls are not supported: since CallLowering cannot know whether the call target is uniform, I think we can't emit tail calls there.