Instead of inserting all new instructions after the 'root' of the reduction, insert each instruction as close to its operands as possible. This shortens the operands' live ranges, which can help reduce register pressure.
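As a rough illustration of why placement matters, the toy model below counts the peak number of simultaneously live values for two orderings of the same four-input reduction. It is only a sketch under simplifying assumptions, not the code in this patch: the function `max_live_values`, the schedule format, and the value names are all hypothetical, and a value is treated as live from its definition until its last use.

```python
def max_live_values(schedule):
    """Return the peak number of simultaneously live values.

    schedule: list of (name, operands) tuples in program order.
    A value is live from its definition until its last use; values
    that are never used are not counted.
    """
    # Record the index of the last instruction that reads each value.
    last_use = {}
    for index, (_, operands) in enumerate(schedule):
        for operand in operands:
            last_use[operand] = index

    peak = 0
    live = set()
    for index, (name, operands) in enumerate(schedule):
        # Operands whose last use is this instruction die here.
        for operand in operands:
            if last_use[operand] == index:
                live.discard(operand)
        # The result becomes live only if something reads it later.
        if name in last_use:
            live.add(name)
        peak = max(peak, len(live))
    return peak


# All reduction instructions placed at the root (end of the sequence).
at_root = [
    ("a", []), ("b", []), ("c", []), ("d", []),
    ("t0", ["a", "b"]),
    ("t1", ["c", "d"]),
    ("r", ["t0", "t1"]),
]

# Each reduction instruction placed right after its operands are defined.
near_operands = [
    ("a", []), ("b", []),
    ("t0", ["a", "b"]),
    ("c", []), ("d", []),
    ("t1", ["c", "d"]),
    ("r", ["t0", "t1"]),
]

print(max_live_values(at_root))        # 4
print(max_live_values(near_operands))  # 3
```

In this model the second ordering lets `a` and `b` die before `c` and `d` are defined, so the peak drops from 4 to 3. The gap would be expected to grow with wider reductions, though real register pressure also depends on the surrounding code and the register allocator.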
Note: I don't know why git reports that I've changed an MC test.