The idea is to go over all calls in the MachineFunction and compute:
a) For each callsite that cannot use pushes, the penalty of not having a reserved call frame.
b) For each callsite that can use pushes, the gain of actually replacing the movs with pushes (and the potential penalty of having to readjust the stack).
This could be made more precise (e.g. by looking at the size of the constants, or even by constructing the potential instruction and asking the MC layer for its encoding size, not to mention trying to figure out the gains from folding), but this should be a decent first approximation.
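The accounting described above can be sketched roughly as follows. This is only an illustration of the shape of the heuristic, not the actual pass: `CallSiteInfo`, `estimatePushGain`, and the three byte-cost constants are hypothetical names and placeholder values chosen for the example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-callsite summary; not an LLVM type.
struct CallSiteInfo {
  bool CanUsePush;     // can the argument movs be replaced with pushes?
  unsigned NumArgMovs; // number of movs that would become pushes
  bool NeedsReadjust;  // would the stack have to be readjusted afterwards?
};

// Assumed costs in bytes; placeholder values for illustration only.
constexpr int64_t NoReservedFramePenalty = 6;
constexpr int64_t GainPerPush = 3;
constexpr int64_t ReadjustPenalty = 3;

// Sums the estimated size change over all callsites in a function.
// A positive result suggests that using pushes is profitable overall.
int64_t estimatePushGain(const std::vector<CallSiteInfo> &Calls) {
  int64_t Gain = 0;
  for (const CallSiteInfo &CS : Calls) {
    if (!CS.CanUsePush) {
      // (a) This callsite keeps its movs but loses the reserved call frame.
      Gain -= NoReservedFramePenalty;
      continue;
    }
    // (b) Each mov replaced by a push saves some bytes...
    Gain += GainPerPush * static_cast<int64_t>(CS.NumArgMovs);
    // ...minus the potential cost of readjusting the stack.
    if (CS.NeedsReadjust)
      Gain -= ReadjustPenalty;
  }
  return Gain;
}
```

With these placeholder costs, one pushable callsite with two argument movs and no readjustment yields a positive estimate, while a single non-pushable callsite yields a negative one.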
Not if the calling convention is callee-pop. In fact, if the convention is callee-pop, using a reserved call frame requires a sub, which should give the 'mov' lowering a penalty.
Anyway, not a blocking issue, just a heuristic worth adding.
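For concreteness, the suggested adjustment might look something like this. It is a rough sketch under my own assumptions: `PushCandidate`, `pushGainForCall`, and the cost constants are hypothetical, and it assumes that a caller-cleanup callsite always needs a stack readjustment after pushes.

```cpp
#include <cstdint>

// Hypothetical description of one callsite that can use pushes.
struct PushCandidate {
  unsigned NumArgMovs; // movs that would be replaced by pushes
  bool CalleePop;      // true if the callee pops its own arguments
};

// Assumed costs in bytes; placeholder values for illustration only.
constexpr int64_t kGainPerPush = 3;
constexpr int64_t kStackAdjustCost = 3;

// Estimated gain of the push lowering over the mov lowering for one call.
int64_t pushGainForCall(const PushCandidate &C) {
  int64_t Gain = kGainPerPush * static_cast<int64_t>(C.NumArgMovs);
  if (C.CalleePop) {
    // With a reserved call frame, the mov lowering needs a sub after the
    // call to restore the frame, so the penalty falls on 'mov', not 'push'.
    Gain += kStackAdjustCost;
  } else {
    // Caller-cleanup: the push lowering pays for readjusting the stack.
    Gain -= kStackAdjustCost;
  }
  return Gain;
}
```

The only point here is the sign flip: for callee-pop conventions the stack adjustment counts in favor of pushes instead of against them.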