Follow-up to https://reviews.llvm.org/D104464. In addition to load and store
instructions, this patch also handles memcpy intrinsics in the loop body. The
constraints are the same as before (same base pointer, etc.).
Fixed the FIXME added in D104464 and added two more tests.
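A minimal C sketch of the kind of source loop this change is meant to cover. It assumes "same base pointer" means that the memcpy source and destination are both offsets from one pointer `p`. Because the ranges overlap across iterations, the equivalent single call has to be a memmove and not a memcpy. The function names `shift_chunks_loop` / `shift_chunks_memmove` and the chunk layout are hypothetical. The sketch only demonstrates the semantic equivalence and does not show how the pass actually matches the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical loop: each iteration copies chunk i+1 down into chunk i.
 * Within one call the ranges do not overlap, so memcpy is valid here.
 * Across iterations, however, the ranges do overlap.
 * Requires p to hold at least (n + 1) * chunk bytes. */
void shift_chunks_loop(unsigned char *p, size_t n, size_t chunk) {
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        memcpy(p + i * chunk, p + (i + 1) * chunk, chunk);
}

/* The single call the loop is equivalent to. The destination is below the
 * source, so the forward loop reads each chunk before it is overwritten,
 * which is exactly memmove semantics over n * chunk bytes. */
void shift_chunks_memmove(unsigned char *p, size_t n, size_t chunk) {
    assert(p != NULL || n == 0);
    memmove(p, p + chunk, n * chunk);
}
```

The equivalence depends on the loop running forward with the destination below the source. If the loop ran the other way while keeping the destination below the source, it would read chunks that had already been overwritten, and the loop would no longer match a single memmove.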
IIRC, I had a somewhat similar idea: use a fourth mayLoopAccessLocation call. Since that seemed expensive, I ended up with three calls instead: https://reviews.llvm.org/D107075. I admit I didn't measure the impact on compile time, so maybe it's not so bad.