The bug is caused by the absence of the in-order uses of the scalars, which must be available to the VectorizeTree() API. The API uses them to compute the proper mask for the "shufflevector" IR instruction.
The fix is to compute the mask for out-of-order memory accesses while building the vectorizable tree, instead of during the actual vectorization of that tree.
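
As an illustration only, the mask computation described above can be sketched as follows. This is not the code from the patch: computeShuffleMask is a hypothetical helper, and it assumes each scalar is identified by its element offset from the base address of the wide load. Under that assumption, lane i of the shuffled result must read the loaded lane whose offset matches the i-th scalar in use order, which is how "shufflevector" interprets its mask.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not part of LLVM): given the element offsets of the
// scalars in the order they are used, return a shufflevector-style mask that
// reorders a wide load (lanes in memory order) into use order.
// Returns std::nullopt if the offsets are not a permutation of 0..N-1,
// i.e. the accesses do not cover one contiguous vector exactly once.
std::optional<std::vector<int>>
computeShuffleMask(const std::vector<int> &useOrderOffsets) {
  const std::size_t n = useOrderOffsets.size();
  std::vector<bool> seen(n, false);
  std::vector<int> mask;
  mask.reserve(n);
  for (int off : useOrderOffsets) {
    // Reject out-of-range or repeated offsets.
    if (off < 0 || static_cast<std::size_t>(off) >= n || seen[off])
      return std::nullopt;
    seen[off] = true;
    // Result lane i takes loaded lane `off`.
    mask.push_back(off);
  }
  return mask;
}
```

Computing the mask at this point, from the use order of the scalars, means it is already known when the tree is later vectorized and does not have to be recovered from uses that may no longer be in order.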