So far, we haven't exposed the allocation of whole-wave
registers to regalloc. We hand-picked them for various
whole wave mode operations. With a future patch, we
want the allocator to efficiently allocate them rather
than using the custom pre-allocation pass.
Any liverange split of virtual registers involved in
whole-wave operations require the resulting COPY
introduced with the split to be performed for all
lanes. It isn't implemented in the compiler yet.
This patch would identify all such copies and
manipulate the exec mask around them to enable all
lanes without affecting the value of exec mask