This patch teaches the ARM Load Store Optimizer to generate ldrd/strd instructions for V7-M class cores. The implementation is adapted from the AArch64 Load Store Optimizer, and in some places I've kept the comments unchanged.
This patch only collapses ldr/str pairs into ldrd/strd for V7-M; a follow-up patch will add support for generating these instructions for V7-AR class cores.
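For readers unfamiliar with the transformation, here is a rough sketch of the kind of legality check involved in pairing two word accesses. It is not the code in this patch: `MemOp`, `isLegalT2PairOffset` and `canPair` are hypothetical names, and the only hardware fact it relies on is that the Thumb-2 immediate form of ldrd/strd encodes the offset as an 8-bit value scaled by 4. A real pass also has to check the instructions between the two accesses, which this sketch ignores.

```cpp
#include <cstdint>

// Hypothetical, simplified description of a word-sized load or store.
// This is an illustration only, not an LLVM data structure.
struct MemOp {
  unsigned BaseReg;  // base address register
  unsigned DataReg;  // register being loaded or stored
  int32_t Offset;    // byte offset from the base register
  bool IsLoad;       // true for ldr, false for str
};

// Thumb-2 ldrd/strd (immediate) encodes the offset as imm8 * 4,
// so it must be a multiple of 4 in the range [-1020, 1020].
bool isLegalT2PairOffset(int32_t Offset) {
  return Offset % 4 == 0 && Offset >= -1020 && Offset <= 1020;
}

// Returns true if First followed immediately by Second could be
// merged into a single ldrd/strd addressing First.Offset.
bool canPair(const MemOp &First, const MemOp &Second) {
  if (First.IsLoad != Second.IsLoad)
    return false;                       // cannot mix a load with a store
  if (First.BaseReg != Second.BaseReg)
    return false;                       // must share the base register
  if (Second.Offset != First.Offset + 4)
    return false;                       // must be adjacent, ascending words
  if (First.IsLoad && First.DataReg == Second.DataReg)
    return false;                       // ldrd needs two distinct destinations
  if (First.IsLoad && First.DataReg == First.BaseReg)
    return false;                       // first load clobbers the base
  return isLegalT2PairOffset(First.Offset);
}
```

With this sketch, `ldr r2, [r1, #8]` followed by `ldr r3, [r1, #12]` is a candidate, while the same pair at offsets 1024 and 1028 is rejected because the offset does not fit the encoding.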
Using ldrd/strd by default, without considering whether it is actually faster on the target CPU, sounds like a bad idea. For example, according to the Cortex-M3 TRM, LDRD takes 3 cycles, but LDM takes 2 + (number of registers - 1).
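To make the arithmetic concrete, here is a small sketch that plugs the figures quoted above into a comparison. The function names are made up for illustration, the cycle counts are the ones cited in this comment rather than independently checked, and the model assumes consecutive LDRDs simply add up with no overlap.

```cpp
// Cycle counts as quoted in the comment above (Cortex-M3 TRM figures
// as cited there; not independently verified here).
constexpr unsigned ldrdCycles() { return 3; }

constexpr unsigned ldmCycles(unsigned NumRegs) {
  return 2 + (NumRegs - 1);
}

// Cost of loading NumRegs words (NumRegs even) with back-to-back LDRDs,
// assuming the individual LDRD costs simply add up.
constexpr unsigned ldrdSequenceCycles(unsigned NumRegs) {
  return (NumRegs / 2) * ldrdCycles();
}

// Under this model a single pair is a tie, but longer runs favour LDM.
static_assert(ldmCycles(2) == 3 && ldrdSequenceCycles(2) == 3, "tie for 2");
static_assert(ldmCycles(4) == 5 && ldrdSequenceCycles(4) == 6, "LDM wins for 4");
```

So for a single pair the two forms cost the same under these numbers, and the concern is mainly about runs of four or more registers, where splitting into LDRDs would be slower than one LDM.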