This change fixes issue found by Markus: https://reviews.llvm.org/rG11338e998df1
Before this patch following code was transformed to memmove:
for (int i = 15; i >= 1; i--) { p[i] = p[i-1]; sum += p[i-1]; }
However load from p[i-1] is used not only by store to p[i] but also by sum computation.
Therefore we cannot emit memmove in loop header.
The new bailout needs some similar comment.