mflr is relatively expensive on Power processors before Power10, so we should schedule the store that consumes the mflr's def away from the mflr.
In the epilogue, nothing uses the def of the expensive mtlr, so it doesn't matter that the load and the mtlr are back-to-back.
The cost of an mflr is a characteristic of an implementation, not of the architecture, so a future processor might also have a slow mflr. I think a new subtarget feature is needed for this rather than a check on the Power version.