Register indexing of 64-bit elements is possible on the SALU, but not
on the VALU. Handle the VALU case by splitting the access into two
32-bit indexes. Extend the waterfall loop handling to allow moving a
range of instructions.
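
For illustration only, the index split can be sketched in plain C++
rather than in terms of the actual selection code. The names
SplitIndex, splitIndex64 and readElement64 are hypothetical, and the
sketch assumes the low half of each 64-bit element sits in the
lower-numbered 32-bit register:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical pair of 32-bit register indexes covering one 64-bit element.
struct SplitIndex {
  uint32_t Lo;
  uint32_t Hi;
};

// Map an index into a vector of 64-bit elements onto the two 32-bit
// indexes that cover the same element (assumption: low half first).
SplitIndex splitIndex64(uint32_t Idx) {
  return SplitIndex{2 * Idx, 2 * Idx + 1};
}

// Read a 64-bit element from storage viewed as 32-bit registers by
// performing two 32-bit indexed reads and recombining the halves.
uint64_t readElement64(const std::vector<uint32_t> &Regs, uint32_t Idx) {
  SplitIndex S = splitIndex64(Idx);
  assert(S.Hi < Regs.size() && "index out of range");
  return (static_cast<uint64_t>(Regs[S.Hi]) << 32) | Regs[S.Lo];
}
```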

I realized after implementing this that it would probably be better to
select the indexing instructions directly here, but this is a first
step.