Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.
One concern I have is that there may now be COPY instructions
which do not have the necessary implicit exec uses
if they will be lowered to v_mov_b32.
This has a positive effect on SGPR usage in shader-db: https://ghostbin.com/paste/rvn5x