This adjusts the cost of cross-register-bank copy instructions in MachineLICM, no longer treating them as cheap, so that they can be hoisted out of inner loops. This can be especially important for MVE code, where we sink VDUPs into blocks in an attempt to fold them into the register variants of vector instructions. Where the scalar is a float value, we are left with a COPY from SPR to GPR, which then needs to be hoisted.
This is an alternative to D76024.
The isCrossCopy code was taken from DetectDeadLanes with some adjustment and cleanup. I am not an expert in some of these areas, such as subreg copies.
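As a rough illustration of the cost change described above, here is a toy model of the hoisting decision. It is only a sketch: `ToyInstr`, `RegBank`, `isCrossBankCopy`, `isCheap` and `shouldHoist` are hypothetical names and not the LLVM API, and the real check works on register classes and subreg indices rather than a two-value bank enum.

```cpp
#include <cassert>

// Toy model only: these types are hypothetical stand-ins, not LLVM classes.
enum class RegBank { GPR, SPR };

struct ToyInstr {
  bool IsCopy;     // true for a plain register-to-register copy
  RegBank DstBank; // bank of the defined register
  RegBank SrcBank; // bank of the source register (only meaningful for copies)
};

// Assumption: a copy counts as "cross-bank" when its two banks differ.
bool isCrossBankCopy(const ToyInstr &MI) {
  return MI.IsCopy && MI.DstBank != MI.SrcBank;
}

// In this sketch, copies are cheap unless they cross register banks.
bool isCheap(const ToyInstr &MI) {
  return MI.IsCopy && !isCrossBankCopy(MI);
}

// Simplified heuristic: a cheap instruction stays in the loop if hoisting it
// would raise register pressure; everything else is hoisted.
bool shouldHoist(const ToyInstr &MI, bool RaisesPressure) {
  if (isCheap(MI) && RaisesPressure)
    return false;
  return true;
}
```

With this model, an SPR to GPR copy is hoisted even when it raises register pressure, while a GPR to GPR copy in the same situation stays in the loop.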
I'm not 100% sure all these test diffs are better, but most of them look OK to me.
Is there any reason for this to be virtual?