For Thumb2, we prefer low registers (costPerUse = 0) because they allow narrow encodings. However, the current allocation order is:
R0-R3, R12, LR, R4-R11
As a result, many instructions that use R12/LR end up with wide encodings.
This patch changes the allocation order to:
R0-R7, R12, LR, R8-R11
for Thumb2 when compiling at -Oz.
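
To illustrate the intended reordering, here is a minimal standalone sketch. It is not the code in this patch: `isLowReg` and `preferLowRegs` are hypothetical helpers, and registers are modeled as plain strings rather than LLVM register IDs. It only shows that a stable partition on "is a low register" turns the current order into the proposed one.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM API): true for the Thumb low
// registers R0-R7, which are the ones that allow narrow encodings.
static bool isLowReg(const std::string &Reg) {
  return Reg.size() == 2 && Reg[0] == 'R' && Reg[1] >= '0' && Reg[1] <= '7';
}

// Hypothetical helper: move low registers to the front of the allocation
// order while preserving the relative order inside each group.
std::vector<std::string> preferLowRegs(std::vector<std::string> Order) {
  std::stable_partition(Order.begin(), Order.end(), isLowReg);
  return Order;
}
```

Because the partition is stable, R12 and LR stay ahead of R8-R11, so the only change is that R4-R7 move in front of them.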
In most cases, there are no extra push/pop instructions, as they are folded into
existing ones. There might be a slight performance impact due to more stack
usage, so we only enable this when optimizing for minimum size.
For an embedded application with 83K of code, this patch saves 430 bytes (0.5%).
I'm not very knowledgeable in this part of the code, but it seems you're destroying everything the above code was trying to do with the RCI order. It looks to me as though you should try to add the logic into the loop above, rather than splitting and discarding.