Work around the shortcomings of LoopStrengthReduce and ScalarEvolutionExpander so that inbounds getelementptr instructions in unrolled loops reach the backend. This enables greater use of memory operations with immediate offsets.
The pass pattern-matches GEPs of the form (getelementptr ptr %base (or i32 %reg_offset, i32 constant)) and creates a new base from %base and %reg_offset using pointer arithmetic. This new base is then shared among the GEPs, all of which can then use an immediate index.
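A minimal Python sketch of the rewrite described above. It models each instruction as a small record rather than real LLVM IR, and the names `OrGep`, `RebasedGep`, `share_bases` and `or_is_add` are hypothetical, not part of LLVM's API. It assumes, as is typical after unrolling, that the low bits of %reg_offset are known to be zero. In that case `or` with a small constant gives the same result as `add`, which is what makes folding the constant into an immediate offset sound.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OrGep:
    """Hypothetical model of: getelementptr ptr %base, (or i32 %reg, i32 imm)."""
    base: str
    reg: str
    imm: int


@dataclass(frozen=True)
class RebasedGep:
    """Hypothetical model of: getelementptr inbounds ptr %new_base, i32 imm."""
    new_base: str
    imm: int


def or_is_add(reg_value: int, imm: int) -> bool:
    # `reg | imm == reg + imm` holds exactly when the two share no set bits.
    return (reg_value & imm) == 0


def share_bases(
    geps: List[OrGep],
) -> Tuple[Dict[Tuple[str, str], str], List[RebasedGep]]:
    """Create one new base per (base, reg) pair and reuse it across GEPs."""
    new_bases: Dict[Tuple[str, str], str] = {}
    rebased: List[RebasedGep] = []
    for gep in geps:
        key = (gep.base, gep.reg)
        if key not in new_bases:
            # Stands in for the pointer arithmetic %base + %reg, emitted once.
            new_bases[key] = f"{gep.base}.{gep.reg}"
        # Every matching GEP keeps only its constant, now an immediate index.
        rebased.append(RebasedGep(new_bases[key], gep.imm))
    return new_bases, rebased


if __name__ == "__main__":
    # Four accesses from a loop unrolled by 4: a[i|0], a[i|1], a[i|2], a[i|3].
    bases, geps = share_bases([OrGep("a", "i", k) for k in range(4)])
    print(bases)
    print(geps)
```

With a 4x unroll, `i` is a multiple of 4, so `or_is_add(i, k)` holds for `k` in 0..3, and all four accesses share the single base `a.i` with immediates 0 to 3.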
Can you add a motivating example, with small code snippets, showing what this solves that the status quo doesn't?