Modify GenerateConstantOffsetsImpl to create offsets that can be used by indexed addressing modes. If formulae can be generated in which the constant offset is the same size as the recurrence step, we can generate an indexed access.
In the resulting code, at least for Arm, pre-indexed loads are usually used for the last access, but sometimes for the first.
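As a rough, source-level illustration of the idea above (not LSR's actual implementation), the C++ sketch below models a pre-indexed load with writeback using element indices instead of raw pointers. It assumes that "the same size as the recurrence" means the constant offset equals the recurrence's per-iteration step. The function names `canFoldIntoPreIndexed`, `loadPreIndexed`, `sumWithSeparateIncrement` and `sumWithPreIndexed` are hypothetical. Starting the recurrence one step early lets every access use an offset equal to the step, so the separate increment is absorbed into the load's writeback.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check: an access can become pre-indexed when the constant
// offset in its address formula equals the step of the recurrence.
bool canFoldIntoPreIndexed(std::ptrdiff_t constantOffset, std::ptrdiff_t step) {
  return constantOffset == step;
}

// Model of a pre-indexed load with writeback, using an element index as the
// "address": the access uses base + offset, and base keeps that new value.
int32_t loadPreIndexed(const std::vector<int32_t>& mem, std::ptrdiff_t& base,
                       std::ptrdiff_t offset) {
  base += offset;                              // writeback folded into the access
  return mem[static_cast<std::size_t>(base)];  // load from the updated base
}

// Baseline: a plain load followed by a separate add of the recurrence step.
int64_t sumWithSeparateIncrement(const std::vector<int32_t>& mem,
                                 std::ptrdiff_t step, std::size_t count) {
  int64_t sum = 0;
  std::ptrdiff_t base = 0;
  for (std::size_t i = 0; i < count; ++i) {
    sum += mem[static_cast<std::size_t>(base)];
    base += step;  // explicit increment
  }
  return sum;
}

// Rewritten: the recurrence starts one step early, and each access uses a
// constant offset equal to the step, so no separate increment is needed.
int64_t sumWithPreIndexed(const std::vector<int32_t>& mem,
                          std::ptrdiff_t step, std::size_t count) {
  int64_t sum = 0;
  std::ptrdiff_t base = -step;  // an index, so starting "before" element 0 is safe
  for (std::size_t i = 0; i < count; ++i) {
    sum += loadPreIndexed(mem, base, step);
  }
  return sum;
}
```

Both loops visit elements `0, step, 2*step, ...`, so they compute the same sum; the difference is only where the increment lives, which is what lets the backend select an indexed addressing mode for the access.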
@kparzysz Would you be able to provide feedback on how this affects Hexagon? I don't build that target and haven't looked at its tests, but I assume this would interest you.
Is this optimization inherently code-size unfriendly for ARM? (The patch actually reduces the instruction count in LSR's Cost when this optimization kicks in.)