In many cases, the cpi instruction can be skipped because a previous instruction already sets the needed flags.
This saves around 1% in binary size.
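
As a rough illustration of the idea (not the code in this patch), the sketch below models a peephole pass over a toy instruction list. The `Insn` class, the `SETS_Z_FROM_RESULT` set, and the `drop_redundant_cpi` helper are hypothetical names made up for this example. It assumes the compare is against zero and that the only consumer of the flags is a `breq`/`brne`, which reads just the Z flag. A real pass would also have to prove that no later instruction depends on the other flags written by `cpi`.

```python
from dataclasses import dataclass


# Hypothetical toy instruction model; this is not LLVM's MachineInstr.
@dataclass(frozen=True)
class Insn:
    op: str           # mnemonic, e.g. "andi"
    reg: str = ""     # destination (or compared) register
    imm: int = 0      # immediate operand, if any
    target: str = ""  # branch target, if any


# Assumed subset of instructions whose Z flag means "result == 0".
SETS_Z_FROM_RESULT = {"andi", "ori", "subi"}
# Branches that read only the Z flag.
READS_ONLY_Z = {"breq", "brne"}


def drop_redundant_cpi(code):
    """Drop `cpi reg, 0` when the preceding instruction already set Z for reg."""
    out = []
    for i, insn in enumerate(code):
        prev = code[i - 1] if i > 0 else None
        nxt = code[i + 1] if i + 1 < len(code) else None
        redundant = (
            insn.op == "cpi"
            and insn.imm == 0
            and prev is not None
            and prev.op in SETS_Z_FROM_RESULT
            and prev.reg == insn.reg
            and nxt is not None
            and nxt.op in READS_ONLY_Z
        )
        if not redundant:
            out.append(insn)
    return out


before = [
    Insn("andi", "r24", 1),
    Insn("cpi", "r24", 0),      # Z is already set by the andi above
    Insn("breq", target="done"),
]
after = drop_redundant_cpi(before)
print([i.op for i in after])
```

In this model the `cpi` is kept whenever the registers differ, the immediate is not zero, or the following instruction is not a Z-only branch.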
Future improvements:
- remove the cp in sequences such as sub r1, r2 followed by cp r1, r2
- rewrite andi + breq/brne as sbrs/sbrc + rjmp, as avr-gcc does (this avoids clobbering a register and should therefore produce better code)
- maybe do the same optimization for other flags too?
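
The andi + breq/brne rewrite mentioned in the list could look roughly like the sketch below. The helper name `to_skip_branch` and its string-based output are assumptions for illustration only. It applies only when the mask has exactly one bit set, and it assumes the result of the `andi` is not needed afterwards. `sbrs` skips the next instruction when the bit is set, so it pairs with `breq` (branch when the bit is clear). `sbrc` skips when the bit is clear, so it pairs with `brne`.

```python
def to_skip_branch(reg, mask, branch, target):
    """Return a replacement for `andi reg, mask` + `branch target`, or None.

    Hypothetical helper: only single-bit 8-bit masks and breq/brne qualify.
    """
    single_bit = 0 < mask <= 0xFF and (mask & (mask - 1)) == 0
    if not single_bit or branch not in ("breq", "brne"):
        return None
    bit = mask.bit_length() - 1  # index of the one set bit
    # breq jumps when the bit is clear, so skip the rjmp when the bit is set.
    skip = "sbrs" if branch == "breq" else "sbrc"
    return [f"{skip} {reg}, {bit}", f"rjmp {target}"]


print(to_skip_branch("r24", 0x04, "breq", ".L1"))
print(to_skip_branch("r24", 0x06, "brne", ".L1"))
```

The second call returns `None` because the mask 0x06 tests two bits at once, which a single `sbrs`/`sbrc` cannot express.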
It would be better to add a range check here.