When calling a function and passing large argument by value, and when argument size is above certain threshold, memcpy is used to copy argument on stack instead of sequence of loads and stores. In that case callseq* nodes for memcpy are nested inside callseq* nodes for the called function. This patch corrects this behavior by moving callseq_start of the called function after arguments calculation to temporary registers, so that callseq* nodes in resulting DAG are linear.
This is a second try to fix this problem. Previously reverted here rL316215.