We currently have problems with the way that low overhead loops are specified, with LR sometimes being spilled between the t2LoopDec and the t2LoopEnd forcing the entire loop to be reverted late in the backend. As they will eventually become a single instruction, this patch introduces a t2LoopEndDec which is the combination of the two, combined before registry allocation to make sure this does not fail.
Unfortunately this instruction is a terminator that produces a value (and also branches - it only produces the value around the branching edge). So this needs some adjustment to phi elimination and the register allocator to make sure that we do not spill this LR def around the loop (needing to put a spill after the terminator). We treat the loop very carefully, making sure that there is nothing else like calls that would break it's ability to use LR. For that, this adds a isUnspillableTerminator to opt in the new behaviour.
This might obviously be a little contentious, so I would like to get opinions from people who know what they are talking about. There is a chance that this could cause problems, and so I have added an escape option incase. But I have not seen any problems in the testing that I've tried, and not reverting Low overhead loops is important for our performance. If this does work then we can hopefully do the same for t2WhileLoopStart and t2DoLoopStart instructions.
This patch also contains the code needed to convert or revert the t2LoopEndDec in the backend (which just needs a subs; bne) and the code pre-ra to create them.