This patch is in preparation for the async unwind CFI.
Put the first LDP at the end, so that the load-store optimizer can run
and merge the LDP and the ADD into a post-index LDP.
Do this unconditionally, as early as the initial creation of
the CSR restore instructions, even if that LDP is not guaranteed to
be mergeable with a subsequent SP increment.
This greatly simplifies the CFI generation for the epilogue, as otherwise
we have to take extra steps to ensure reordering does not cross CFI
instructions.
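
For illustration, a sketch of the intended effect on a hypothetical
restore sequence (the registers and the 32-byte frame size are assumptions,
not taken from this patch). With the first LDP placed last, it sits
directly before the SP increment:

```asm
ldp x20, x19, [sp, #16]
ldp x29, x30, [sp]          // first LDP, now at the end
add sp, sp, #32             // SP increment
```

The load-store optimizer can then fold the last two instructions into a
single post-index LDP:

```asm
ldp x20, x19, [sp, #16]
ldp x29, x30, [sp], #32     // post-index: load the pair, then SP += 32
```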