Follow-up to https://reviews.llvm.org/D104464. In addition to load and store
instructions, also handle memcpy intrinsics in the loop body. The constraints
are the same as before (same base pointer, etc.).
This fixes the FIXME added in D104464 and adds two more tests.
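
For illustration, here is a minimal sketch of the kind of loop this change
appears to target: each iteration issues a memcpy whose source and destination
are derived from the same base pointer. The function names and the one-element
shift are hypothetical and are not taken from the patch or its tests. Whether
the transform actually fires for this exact input depends on the constraints
checked in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical input loop: shift the buffer left by one element.
 * Each iteration copies a single int with memcpy; the source (&buf[i + 1])
 * and the destination (&buf[i]) share the base pointer `buf` and never
 * overlap within one call. */
static void shift_left_memcpy(int *buf, size_t n) {
    for (size_t i = 0; i + 1 < n; ++i)
        memcpy(&buf[i], &buf[i + 1], sizeof(int));
}

/* Assumed equivalent form: one memmove over the whole range. memmove is
 * required here because the overall source and destination ranges overlap. */
static void shift_left_memmove(int *buf, size_t n) {
    if (n > 1)
        memmove(buf, buf + 1, (n - 1) * sizeof(int));
}
```

Both functions leave the buffer in the same state, which is the property that
folding the loop into a single memmove relies on.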