The register r1 is defined to have the constant value 0 in the avr-gcc calling convention (which we follow). Unfortunately, we don't really make use of it. This patch replaces LDI 0 instructions with a copy from r1.
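To make the idea concrete, here is a minimal sketch of the rewrite on a toy instruction list. The `Inst` struct, the `Opcode` enum, and `replaceLdiZero` are hypothetical names invented for this illustration; they are not LLVM's `MachineInstr` API and not the code in this patch. The sketch also assumes r1 really holds 0 at every rewritten instruction.

```cpp
#include <cstdint>
#include <vector>

// Toy instruction model. These names are illustrative only and are
// not part of LLVM or of this patch.
enum class Opcode { LDI, MOV, Other };

struct Inst {
  Opcode Op;
  uint8_t Dst; // destination register number
  uint8_t Src; // source register number (meaningful for MOV only)
  uint8_t Imm; // immediate value (meaningful for LDI only)
};

// r1 is the zero register in the avr-gcc calling convention.
constexpr uint8_t ZeroReg = 1;

// Rewrites every "LDI Rd, 0" into "MOV Rd, r1" in place and returns
// the number of instructions that were rewritten.
// Assumption: r1 holds 0 at each rewritten instruction.
int replaceLdiZero(std::vector<Inst> &Insts) {
  int Rewritten = 0;
  for (Inst &I : Insts) {
    if (I.Op == Opcode::LDI && I.Imm == 0) {
      I = Inst{Opcode::MOV, I.Dst, ZeroReg, 0};
      ++Rewritten;
    }
  }
  return Rewritten;
}
```

Loads of non-zero immediates are left untouched, so only the zero case depends on the r1 invariant.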
This reduces code size: my AVR build of compiler-rt shrinks from 50660 to 50240 bytes, a 0.8% reduction. Presumably it also improves execution speed, although I didn't measure this.
This patch took me a long time, with many failed attempts, but I finally have something that works. It can still be improved, but that can happen in follow-up patches. A 0.8% decrease in code size is already pretty significant.
I think we need to check the device family: AFAIK, r17 is used as zero_reg on the avr-tiny family.
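A device-family check could look roughly like the sketch below. The `Family` enum and `zeroRegFor` are hypothetical names for illustration, not existing LLVM subtarget APIs, and the r17 value for avr-tiny is the assumption stated above and should be confirmed against the avr-gcc ABI before relying on it.

```cpp
// Hypothetical device-family tag; a real backend would query its
// subtarget information instead.
enum class Family { Avr, AvrTiny };

// Returns the register number assumed to hold constant 0.
// r1 follows the avr-gcc calling convention; r17 for avr-tiny is an
// assumption taken from the comment above.
constexpr unsigned zeroRegFor(Family F) {
  return F == Family::AvrTiny ? 17u : 1u;
}
```

Any code that hard-codes r1 as the zero source would then call something like `zeroRegFor` instead.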