Implement the last set of TODOs from rL371452. There are three cases:
- A loop which loads N out of every M bytes. (i.e. there are gaps)
- A loop which loads N bytes every M bytes where N > M. (i.e. the loads overlap, and the alignment must be less than natural alignment)
- An exact constant trip count is not known, but an upper bound is, and the entire region is dereferenceable.
The SCEV change is needed for the third case above. I'm going to separate that into its own review (with its own tests) and then rebase once that's landed.
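To make the three cases above concrete, here is a minimal standalone sketch of the bounds reasoning. It is not the code in this patch and uses no LLVM API. The helper `accessesFitInRegion` and its parameter names are hypothetical, and it assumes a non-negative step and accesses that start at offset 0 from the base pointer.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns true if every access made
// by a loop that reads AccessSize bytes at offset i * Step, for
// i in [0, MaxTripCount), stays inside a region of DerefBytes bytes that is
// known to be dereferenceable.
//
// AccessSize < Step models the "gaps" case, AccessSize > Step models the
// "overlapping loads" case, and MaxTripCount may be an upper bound rather
// than the exact trip count.
bool accessesFitInRegion(uint64_t DerefBytes, uint64_t AccessSize,
                         uint64_t Step, uint64_t MaxTripCount) {
  if (MaxTripCount == 0)
    return true; // The loop body never runs, so nothing is accessed.
  if (AccessSize > DerefBytes)
    return false; // Even the first access does not fit.

  uint64_t LastIter = MaxTripCount - 1;
  if (LastIter == 0 || Step == 0)
    return true; // Only offset 0 is ever accessed, and that access fits.

  // The last access starts at LastIter * Step and must end at or before
  // DerefBytes. Dividing instead of multiplying avoids unsigned overflow.
  uint64_t Room = DerefBytes - AccessSize;
  return LastIter <= Room / Step;
}
```

Using the upper bound on the trip count is conservative: if the accesses fit for the maximum number of iterations, they also fit for any smaller actual trip count.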
Note that StepC may be negative; LV can vectorize accesses such as A[N-i], and also group together accesses with negative non-unit stride into "reversed" interleave groups.
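As an illustration of the negative-step concern, the sketch below shows the loop shape in question and how the touched byte range flips when the step is negative. `sumReversed` and `touchedRange` are hypothetical names introduced only for this example. The helper assumes a trip count of at least 1 and no signed overflow.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// The kind of loop referred to above: the address decreases each iteration,
// so the byte step between consecutive accesses is -sizeof(int).
int sumReversed(const int *A, int N) {
  int Sum = 0;
  for (int i = 1; i <= N; ++i)
    Sum += A[N - i];
  return Sum;
}

// Hypothetical helper (not an LLVM API): byte range [Low, High), relative
// to the base pointer, touched by TripCount accesses of AccessSize bytes
// starting at offset Start and advancing by StepC bytes per iteration.
// Assumes TripCount >= 1 and no signed overflow.
std::pair<int64_t, int64_t> touchedRange(int64_t Start, int64_t StepC,
                                         int64_t AccessSize,
                                         int64_t TripCount) {
  int64_t First = Start;
  int64_t Last = Start + (TripCount - 1) * StepC;
  // With a negative StepC the lowest address is touched on the last
  // iteration, so take min/max rather than assuming First <= Last.
  int64_t Low = std::min(First, Last);
  int64_t High = std::max(First, Last) + AccessSize;
  return {Low, High};
}
```

For four `int` accesses starting at byte offset 12 with a step of -4, the range is [0, 16), the same bytes as the forward loop but reached in the opposite order. A check that only looks at the first access and assumes later ones lie above it would get this wrong.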