We can often fold an ADDI into the offset of load/store instructions:
  (load (addi base, off1), off2) -> (load base, off1+off2)
  (store val, (addi base, off1), off2) -> (store val, base, off1+off2)
This is possible when off1+off2 still fits in the 12-bit immediate. This patch removes the previous restriction where we would never fold the ADDI if the load/store had a nonzero offset. We now do the fold if the resulting constant still fits a 12-bit immediate, or if off1 is a variable's address and we know, based on that variable's alignment, that off1+off2 won't overflow. The first case doesn't currently seem to be exercised by the backend, but the code change is simple and easy to reason about, and handling it specially was making the code and the surrounding comments harder to understand.
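For reviewers, the two fold conditions can be sketched roughly as follows. This is a standalone illustration, not the code in the patch: the helper names (fitsSImm12, canFoldConstant, canFoldAddress) are hypothetical, and the alignment check assumes that an address aligned to A has its low log2(A) bits clear, so adding an offset in [0, A) cannot carry out of the low part.

```cpp
#include <cstdint>

// Hypothetical helper: true if V fits a signed 12-bit immediate,
// i.e. lies in [-2048, 2047].
constexpr bool fitsSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Case 1: off1 and off2 are both constants. The fold is legal when
// their sum is still encodable as a 12-bit immediate.
constexpr bool canFoldConstant(int64_t Off1, int64_t Off2) {
  return fitsSImm12(Off1 + Off2);
}

// Case 2 (assumed form of the alignment check): off1 is the low part
// of a variable's address. If the variable is aligned to Alignment
// bytes, any offset in [0, Alignment) stays within the aligned block
// and so cannot overflow the low part of the address.
constexpr bool canFoldAddress(uint64_t Alignment, int64_t Off2) {
  return fitsSImm12(Off2) && Off2 >= 0 &&
         static_cast<uint64_t>(Off2) < Alignment;
}
```

For example, under these assumptions canFoldConstant(2000, 40) holds while canFoldConstant(2000, 48) does not, and an 8-byte-aligned variable allows offsets 0 through 7.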
clang-tidy: warning: 'auto GO' can be declared as 'const auto *GO' [llvm-qualified-auto]