[ModuloSchedule] Fix modulo expansion for data loop carried dependencies.
The new experimental expansion has a problem when a value has a data
dependency with an instruction from a previous stage. This is due to
the way we peel out the kernel. To fix that I'm changing the way we
peel out the kernel. We now peel the kernel NumberStage - 1 times.
The code would be correct at this point if we didn't have to handle
cases where the loop iteration is smaller than the number of stages.
To handle this case we move instructions between different epilogues
based on their stage and remap the PHI instructions correctly.
Differential Revision: https://reviews.llvm.org/D69538