AMDGPU: Don't sometimes allow instructions before lowered si_end_cf
Since 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f, this would sometimes
not emit the or to exec at the beginning of the block, where it really
has to be. If there is an instruction that defines one of the source
operands, split the block and turn the si_end_cf into a terminator.
This avoids regressions when regalloc fast is switched to inserting
reloads at the beginning of the block, instead of spills at the end of
In a future change, this should always split the block.