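Normally, if any registers are spilled, we prefer to also spill lr on Thumb1 so we can fold the "bx lr" into the "pop" (by popping directly into pc). However, if the function contains tail calls, lr has to be restored before the tail call, which is expensive on Thumb1 because "pop" cannot target lr directly, so skip the optimization in that case.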
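To make the intended decision concrete, here is a rough sketch of the heuristic as a standalone predicate. It is not the actual ARMFrameLowering code: `FunctionFacts` and `shouldForceSpillLR` are hypothetical names invented for illustration, and the real implementation works on MachineFunction state rather than a plain struct.

```cpp
// Hypothetical, simplified stand-in for the per-function facts the
// heuristic looks at. These are NOT real LLVM data structures.
struct FunctionFacts {
  bool IsThumb1;           // compiling for Thumb1
  bool SpillsAnyRegister;  // at least one callee-saved register is spilled
  bool HasTailCall;        // the function contains at least one tail call
};

// Returns true when lr should be spilled purely so that the return can be
// folded into the epilogue "pop" (hypothetical helper, for illustration).
bool shouldForceSpillLR(const FunctionFacts &F) {
  // The folding trick only applies on Thumb1, and only pays off when a
  // push/pop pair is going to be emitted anyway.
  if (!F.IsThumb1 || !F.SpillsAnyRegister)
    return false;
  // With a tail call, lr must be restored before the call, so spilling it
  // costs more than the folded return saves.
  return !F.HasTailCall;
}
```

With this model, a Thumb1 function that spills r4 and has no tail calls still gets lr spilled, while the same function with a tail call does not.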
The spill of r7 in the new test also isn't necessary, but that's mostly orthogonal to this patch. (It's the same code in ARMFrameLowering, but it's not related to tail calls.)
Instead of traversing all the blocks, would it be possible (or make sense) to set a flag in ARMMachineFunctionInfo, indicating that the function contains tail calls, whenever we create a TC_RETURN in LowerCall,