It was discovered that an extra register COPY remained when expanding a (var-len) memory operation with a loop and there was another use of the involved address register(s) afterwards.
A simple fix for this is to COPY the address registers before the loop and use that new vreg instead. This handles the test cases and also seems clearly beneficial on SPEC:
Spill|Reload :   613173   613123     -50
Copies       :  1018500  1016022   -2478
Doing this COPY in all cases doesn't seem to make a difference, even though it is only useful in the case of a register loop with other uses of the register.