This patch shows the benefit of converting masked vector loads to regular vector loads for x86 AVX.
I've raised the question on llvm-dev of whether it is legal to read the extra bytes of memory that the masked-off lanes would not have touched.
- x86 already does this kind of optimization when it merges multiple scalar loads into a single vector load.
- If other targets have the same flexibility, we could move this transform up to CodeGenPrepare (CGP) or DAGCombiner.
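
For reference, here is a scalar C++ model of the two forms being compared. This is only a sketch, not the patch's code: `masked_load`, `load_then_select`, `forms_agree`, and `kLanes` are names made up for illustration, and the passthru operand is modeled on LLVM's masked load intrinsic. It assumes the whole vector-sized range at `ptr` is readable, which is exactly the legality question above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumption: 8 lanes, e.g. 8 x float in a 256-bit AVX register.
constexpr std::size_t kLanes = 8;

using Vec = std::array<float, kLanes>;
using Mask = std::array<bool, kLanes>;

// Model of a masked load: only lanes whose mask bit is set read memory;
// the remaining lanes take their value from passthru.
Vec masked_load(const float *ptr, const Mask &mask, const Vec &passthru) {
  assert(ptr != nullptr);
  Vec out = passthru;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      out[i] = ptr[i];
  return out;
}

// Model of the converted form: a regular full-width load followed by a
// select. This reads all kLanes elements, including the masked-off ones,
// so it is only valid if the whole range is readable.
Vec load_then_select(const float *ptr, const Mask &mask, const Vec &passthru) {
  assert(ptr != nullptr);
  Vec full;
  for (std::size_t i = 0; i < kLanes; ++i)
    full[i] = ptr[i]; // unconditional read of every lane
  Vec out;
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = mask[i] ? full[i] : passthru[i];
  return out;
}

// Returns true when both forms produce the same vector for these inputs.
bool forms_agree(const float *ptr, const Mask &mask, const Vec &passthru) {
  return masked_load(ptr, mask, passthru) ==
         load_then_select(ptr, mask, passthru);
}
```

When every element is readable, the two functions return the same vector. They differ only in which memory locations are touched.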