A Thumb-2 post-indexed LDR instruction such as:
ldr.w r0, [r1], #4
can be rewritten as:
ldm.n r1!, {r0}
The 16-bit LDM encoding is two bytes smaller than the 32-bit LDR, but LDMs can be more expensive than LDRs on some cores, so the rewrite is enabled only in minsize mode. This trick was learned from ARM Compiler 5 ("armcc").
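For illustration, the conditions under which the rewrite is legal might be sketched as below. This is a minimal model, not the code from the patch: `PostIndexedLoad`, `isLowReg`, and `canUseNarrowLDM` are hypothetical names, and the checks assume the usual constraints of the 16-bit LDM encoding (low registers only, one word loaded so the base advances by exactly 4, and writeback only when the base is not in the register list).

```cpp
// Hypothetical model of a Thumb-2 post-indexed load:
//   ldr.w Rt, [Rn], #Offset
struct PostIndexedLoad {
  unsigned Rt; // destination register number (0..15)
  unsigned Rn; // base register number (0..15)
  int Offset;  // post-increment applied to Rn, in bytes
};

// 16-bit Thumb encodings can only name the low registers r0-r7.
constexpr bool isLowReg(unsigned Reg) { return Reg < 8; }

// Returns true if the load could be emitted as "ldm.n Rn!, {Rt}".
// Assumption: the caller passes MinSize = true only when the function
// is being optimized for minimum size.
constexpr bool canUseNarrowLDM(const PostIndexedLoad &L, bool MinSize) {
  return MinSize              // only profitable when optimizing for size
         && isLowReg(L.Rt)    // destination must be r0-r7
         && isLowReg(L.Rn)    // base must be r0-r7
         && L.Rt != L.Rn      // base in the list would suppress writeback
         && L.Offset == 4;    // a one-register LDM advances the base by 4
}
```

With this sketch, `canUseNarrowLDM({0, 1, 4}, true)` accepts the `ldr.w r0, [r1], #4` example above, while a high register, a different offset, or a non-minsize function is rejected.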
Shouldn't you also update the statistics here (++NumLdSts)?