For an addition with an immediate in specific ranges, a pair of
addi-addi can be generated instead of the ordinary lui-addi-add serial.
for (add i32 %a, 4094), it can be simplfied to
addi a0, a0, 2047
addi a0, a0, 2047.
The original assembly is
lui a1, 1
addi a1, a1, -2
add a0, a0, a1
However, this optimization can only be performed if the value generated by lui-addi has no other use.
Given that you can have an ADDI of -2048, that should be -4096, not -4095. (Please update the comment, the code and the test results accordingly).
This is a matter of style so it's not very important, but I would lean towards implementing this check with a comparison logic that matches the comment about the required range, as that would be make it easier to check at a glance.
Nitpick: make it the other way around, so that the second addi is the "smaller" one. That generated code seems more intuitive to me that way :-)
When you have i32 and the i64 tests they should both have riscv32 and riscv64 test CHECKs. You can have operations on i64s in riscv32, they just doesn't correspond to a single native instruction. But in this case I don't see much advantage in having test variants for both types, as this optimization should be orthogonal to that. I suggest keeping one i32+i64 test (to show that it works with both types), and trimming the remaining tests down to just the i32 variant.
The patch seems correct to me. I'm not quite sure this is the best place to do the this transformation (guidance on that from other people would be welcome), but I don't have a better solution to propose right now. I'm going to approve this patch, but I suggest you wait a couple more days to give other people an opportunity to comment on that issue.
Regarding the tests, as a nitpicking issue, I would only suggest giving the test functions slightly more semantic names (maybe something like negative_bound_reject, etc.).