The source changes are fairly straightforward. The most interesting change is the new code at lines 489-499 that promotes a 32-bit value to 64 bits. I'd appreciate a careful review to make sure I am doing that correctly.
I would also appreciate any suggestions for improving the unit test. I modeled the test after the existing movtopush.ll that was written for 32-bit targets. I am testing all the same things, though naturally the details are different due to differences between the calling conventions.
Not surprisingly, the code size improvements are small compared to IA-32 due to the in-register 64-bit calling convention. I measured a 0.2% code size improvement across cpu2k. Performance is basically flat. There are further improvement opportunities, e.g. adding support for scheduling the pushes and relaxing the early bail-out at line 351.