This is a continuation of https://reviews.llvm.org/D45098. This patch gangs up loads and stores so that they can be paired (in the case of memcpy). We address only memcpy, and not other memory operations, because in memcpy the source and destination addresses are disjoint, so no alias analysis is required to disambiguate the loads from the stores.
Each target defines the number of loads and stores to be ganged up. For AArch64 it is set to 4. Other targets default to 0, so the patch has no effect on them.
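A minimal C++ sketch of the ordering change, assuming the memcpy has already been split into fixed-size chunks. `MemOp`, `expandMemcpy`, and `toString` are hypothetical names, not LLVM APIs. The patch itself operates on SelectionDAG nodes rather than a flat list, and this model forms groups from the start of the copy, which may differ from how the patch handles a leftover partial group. A `GangSize` of 0 models targets that keep the default.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one load or store produced when a memcpy is
// expanded into fixed-size chunks (not an LLVM type).
struct MemOp {
  bool IsLoad;
  std::size_t Chunk; // index of the chunk being copied
};

// Emits the load/store order for a memcpy of NumChunks chunks.
// GangSize <= 1 models the ungrouped order (load i, store i, ...).
// Otherwise up to GangSize loads are emitted back to back, followed by
// their stores, so adjacent accesses can be combined into paired
// instructions. Reordering is safe only because memcpy's source and
// destination do not overlap.
std::vector<MemOp> expandMemcpy(std::size_t NumChunks, std::size_t GangSize) {
  std::vector<MemOp> Ops;
  if (GangSize <= 1) {
    for (std::size_t I = 0; I < NumChunks; ++I) {
      Ops.push_back({true, I});
      Ops.push_back({false, I});
    }
    return Ops;
  }
  for (std::size_t Begin = 0; Begin < NumChunks; Begin += GangSize) {
    std::size_t End = std::min(Begin + GangSize, NumChunks);
    for (std::size_t I = Begin; I < End; ++I)
      Ops.push_back({true, I}); // all loads of this gang first
    for (std::size_t I = Begin; I < End; ++I)
      Ops.push_back({false, I}); // then the matching stores
  }
  return Ops;
}

// Renders the sequence as e.g. "L0 L1 S0 S1" for inspection.
std::string toString(const std::vector<MemOp> &Ops) {
  std::string S;
  for (const MemOp &Op : Ops) {
    if (!S.empty())
      S += ' ';
    S += Op.IsLoad ? 'L' : 'S';
    S += std::to_string(Op.Chunk);
  }
  return S;
}
```

With a gang size of 4 (the AArch64 setting), a six-chunk copy becomes four loads, four stores, then two loads and two stores. With 0, the order is unchanged.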
s/inline/inlined/