This is an archive of the discontinued LLVM Phabricator instance.

Differential D154075

[LoopVectorize] Add pre-commit tests for D152366
ClosedPublic

Authored by david-arm on Jun 29 2023, 5:37 AM.

Details

Reviewers

fhahn
Ayal

Commits

rG494d28ec07dd: [LoopVectorize] Add pre-commit tests for D152366

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. · Jun 29 2023, 5:37 AM

Herald added a project: Restricted Project. · Jun 29 2023, 5:37 AM

Herald added subscribers: artagnon, shiva0217, StephenFan.

david-arm requested review of this revision. · Jun 29 2023, 5:37 AM

Herald added a project: Restricted Project. · Jun 29 2023, 5:37 AM

Herald added a subscriber: llvm-commits.

david-arm added a child revision: D152366: [LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop. · Jun 29 2023, 5:38 AM

Harbormaster completed remote builds in B242060: Diff 535745. · Jun 29 2023, 6:48 AM

LGTM, thanks!

This revision is now accepted and ready to land. · Jul 19 2023, 10:25 AM

Herald added a subscriber: wangpc. · Jul 19 2023, 10:25 AM

Add a new test case with a triple-nested loop, following comments on D152366.

Harbormaster completed remote builds in B247197: Diff 542899. · Jul 21 2023, 9:58 AM

Taking another look, I think we need a few more tests for cases not covered by the current tests:

- uncomputable backedge-taken count (BTC) in the outer loop
- inner and outer inductions decreasing (constant step)
- inner and outer inductions incremented by a non-constant step
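The three loop shapes listed above might look roughly like this in C. This is a hypothetical sketch for illustration only. The function names and the keys, ostep and istep parameters are made up here and are not the tests that were eventually added. The access pattern reuses the dst[(i * (n + 1)) + j] / src[(i * n) + j] shape from the existing tests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketches of the requested loop shapes (names are illustrative). */

/* Outer loop with an uncomputable BTC: it exits on a loaded sentinel value,
   so the number of outer iterations is not known when the loop is entered. */
void outer_btc_unknown(int32_t *dst, const int32_t *src, const int32_t *keys,
                       int n) {
  for (int i = 0; keys[i] != 0; i++)
    for (int j = 0; j < n; j++)
      dst[(i * (n + 1)) + j] += src[(i * n) + j];
}

/* Inner and outer inductions both decreasing by a constant step of 1. */
void both_ivs_decreasing(int32_t *dst, const int32_t *src, int m, int n) {
  for (int i = m - 1; i >= 0; i--)
    for (int j = n - 1; j >= 0; j--)
      dst[(i * (n + 1)) + j] += src[(i * n) + j];
}

/* Inner and outer inductions incremented by runtime (non-constant) steps. */
void non_constant_steps(int32_t *dst, const int32_t *src, int m, int n,
                        int ostep, int istep) {
  assert(ostep > 0 && istep > 0); /* non-positive steps would not terminate */
  for (int i = 0; i < m; i += ostep)
    for (int j = 0; j < n; j += istep)
      dst[(i * (n + 1)) + j] += src[(i * n) + j];
}
```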

Inline comment on llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll, line 746:
nit: remove redundant indvars prefixes.

inner and outer inductions decreasing (constant step)

Hi @fhahn, perhaps I'm missing something, but the runtime checks already look broken to me for decreasing IVs in the inner loop:

#include <stdint.h>

void decreasing_inner_iv(int32_t *dst, int32_t *src, int m, int n) {
  for (int i = 0; i < m; i++) {
    for (int j = n - 1; j >= 0; j--) {
      dst[(i * (n + 1)) + j] += src[(i * n) + j];
    }
  }
}

These are the SCEVs for the checks:

LAA: Adding RT check for range:
Start: {%dst,+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.cond1.preheader.us> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.cond1.preheader.us>
LAA: Adding RT check for range:
Start: {%src,+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%for.cond1.preheader.us> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %src),+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%for.cond1.preheader.us>
Calculating cost of runtime checks:
  1  for   %bound0 = icmp ult ptr %scevgep, %scevgep36
  1  for   %bound1 = icmp ult ptr %scevgep35, %scevgep34
  1  for   %found.conflict = and i1 %bound0, %bound1

I don't see how the Start and End values correspond to the actual range of addresses accessed in the inner loop?
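The three instructions in the cost listing above form a standard overlap test on two half-open address ranges. Read %bound0 as "dst start < src end" and %bound1 as "src start < dst end", as in the vector.memcheck blocks of @full_checks and @full_checks_diff_strides in the test file. Under that reading, a minimal C model looks like the following. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [start, end) covered by one pointer group. */
typedef struct {
  uintptr_t start;
  uintptr_t end;
} addr_range;

/* Models bound0 / bound1 / found.conflict: two half-open ranges overlap
   iff each one starts before the other one ends. */
static bool found_conflict(addr_range a, addr_range b) {
  assert(a.start <= a.end && b.start <= b.end);
  bool bound0 = a.start < b.end; /* icmp ult a.start, b.end */
  bool bound1 = b.start < a.end; /* icmp ult b.start, a.end */
  return bound0 && bound1;       /* and i1 bound0, bound1 */
}
```

If found_conflict is true, the generated code branches to the scalar loop. Otherwise it enters the vector loop.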

Added more test cases for an uncomputable trip count in the outer loop, a decreasing inner IV, and a decreasing outer IV.

In D154075#4531095, @fhahn wrote:

Taking another look, I think we need a few more tests for cases not covered by the current tests:

inner and outer inductions incremented by a non-constant step

I've added the other tests you suggested, but I'm not sure of the value of having tests for non-constant IV increments. For the inner loop, a non-constant IV increment will lead to SCEV checks that ensure we only enter the vector loop if the stride is 1, in which case it is a constant anyway. For the outer loop, the tests already use non-constant memory access strides, e.g. tests like this:

for (int i = m - 1; i >= 0; i--) {
  for (int j = 0; j <= n; j++) {
    dst[(i * stride1) + j] += src[(i * stride2) + j];
  }
}

where the outer-loop stride is stride1, so by making the outer IV increment non-constant all I'm doing is transforming one unknown stride into another unknown stride, unless I'm missing something?
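A minimal, hypothetical C sketch of the stride versioning described above follows. The loop is duplicated, and the unit-stride copy is entered only after a runtime check that the stride is 1, so inside that copy the stride is effectively a constant. The function name and parameters are illustrative. LoopVectorize performs the equivalent transformation on IR, guarded by SCEV checks, not in source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration of versioning a loop on "stride == 1". */
void add_strided(int32_t *dst, const int32_t *src, int n, int stride) {
  assert(n >= 0);
  if (stride == 1) {
    /* Versioned path: unit stride, contiguous accesses (vectorizable). */
    for (int j = 0; j < n; j++)
      dst[j] += src[j];
  } else {
    /* Fallback path: the original loop with an arbitrary runtime stride. */
    for (int j = 0; j < n; j++)
      dst[j * stride] += src[j * stride];
  }
}
```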

Harbormaster completed remote builds in B248834: Diff 545140. · Jul 28 2023, 8:22 AM

In D154075#4542401, @david-arm wrote:

I've added the other tests you suggested, but I'm not sure of the value of having tests for non-constant IV increments. [...]

Thanks for the update, I'll take a closer look in the middle of the week when I am back from traveling.

In D154075#4532246, @david-arm wrote:

I don't see how the Start and End values correspond to the actual range of addresses accessed in the inner loop?

IIUC: {%dst,+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.cond1.preheader.us> is effectively %dst + i * (4 * (1 + n)), i.e. the address for j = 0 (the lower bound), and {((4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%for.cond1.preheader.us> is %dst + 4 * n + i * (4 * (1 + n)), i.e. the end of the access for j = n - 1 (the upper bound).

Does that make sense?
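To check this reading of the SCEV expressions, a small C sketch can compare two byte ranges for one outer iteration i of decreasing_inner_iv. The first is the range that the dst Start/End expressions describe. The second is the range the inner loop actually touches.

The sketch makes these assumptions:
- elements are 4-byte int32_t;
- End is exclusive;
- offsets are taken relative to %dst;
- the zext and no-wrap flags are ignored, by using 64-bit arithmetic with small non-negative n.

The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open byte range [start, end), as an offset from %dst. */
typedef struct {
  int64_t start;
  int64_t end;
} byte_range;

/* Value of the dst AddRecs at outer iteration i (offsets from %dst):
     Start = {0,+,4 * (1 + n)}      -> 4 * (1 + n) * i
     End   = {4 * n,+,4 * (1 + n)}  -> 4 * n + 4 * (1 + n) * i */
static byte_range scev_dst_range(int64_t i, int64_t n) {
  byte_range r = {4 * (1 + n) * i, 4 * n + 4 * (1 + n) * i};
  return r;
}

/* Byte range actually touched by dst[(i * (n + 1)) + j] while the inner
   IV counts down from n - 1 to 0. */
static byte_range touched_dst_range(int64_t i, int64_t n) {
  assert(i >= 0 && n > 0);
  int64_t lo = INT64_MAX;
  int64_t hi = INT64_MIN;
  for (int64_t j = n - 1; j >= 0; j--) {
    int64_t off = 4 * (i * (n + 1) + j); /* offset of the 4-byte element */
    if (off < lo)
      lo = off;
    if (off + 4 > hi)
      hi = off + 4; /* exclusive end of this access */
  }
  byte_range r = {lo, hi};
  return r;
}
```

The two ranges agree. The direction in which j is traversed changes only the order of the accesses, not the set of addresses covered.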

In D154075#4542401, @david-arm wrote:

For the inner loop, a non-constant IV increment will lead to SCEV checks that ensure we only enter the vector loop if the stride is 1, in which case it is a constant anyway. [...]

Even though we currently version on a stride of 1, I think the additional test coverage is still valuable in case the versioning (or something else) changes in a way that affects the correctness of the generated RT checks.

Closed by commit rG494d28ec07dd: [LoopVectorize] Add pre-commit tests for D152366 (authored by david-arm). · Aug 24 2023, 3:52 AM

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rG494d28ec07dd: [LoopVectorize] Add pre-commit tests for D152366.

Herald added a subscriber: sunshaoce. · Aug 24 2023, 3:52 AM

Hi @fhahn, I added another test case for unknown inner strides; see @unknown_inner_stride.

Revision Contents

llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll (1391 lines)

Diff 553067

llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll

This file was added.
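Several tests in the file below use a "diff check" instead of full range checks; examples are the vector.memcheck blocks of @diff_checks and @diff_checks_src_start_invariant. The check subtracts the src start address from the dst start address for the current outer iteration. It branches to the scalar loop if the unsigned result is below 16.

A minimal C model of that check follows. It assumes the threshold is VF * interleave count * element size, which gives 4 * 1 * 4 = 16 with the RUN-line options. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the vector.memcheck diff check: returns true when
   the vector loop must be skipped (branch to scalar.ph). */
static bool diff_check_fails(uintptr_t dst_start, uintptr_t src_start,
                             uint64_t vf, uint64_t ic, uint64_t elt_size) {
  assert(vf > 0 && ic > 0 && elt_size > 0);
  /* Unsigned subtraction: if dst starts below src, the result wraps to a
     large value and the check passes. */
  uint64_t diff = (uint64_t)(dst_start - src_start);
  return diff < vf * ic * elt_size; /* icmp ult diff, 16 in these tests */
}
```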

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
				; REQUIRES: asserts
				; RUN: opt < %s -p 'loop-vectorize' -force-vector-interleave=1 -S \
				; RUN: -force-vector-width=4 -debug-only=loop-accesses,loop-vectorize,loop-utils 2> %t | FileCheck %s
				; RUN: cat %t | FileCheck %s --check-prefix=DEBUG


				; Equivalent example in C:
				; void diff_checks(int32_t *dst, int32_t *src, int m, int n) {
				;   for (int i = 0; i < m; i++) {
				;     for (int j = 0; j < n; j++) {
				;       dst[(i * (n + 1)) + j] = src[(i * n) + j];
				;     }
				;   }
				; }
				; NOTE: The strides of the starting address values in the inner loop differ, i.e.
				; '(i * (n + 1))' vs '(i * n)'.

				; DEBUG-LABEL: LAA: Found a loop in diff_checks:
				; DEBUG-NOT: LAA: Adding RT check for range:

				define void @diff_checks(ptr nocapture noundef writeonly %dst, ptr nocapture noundef readonly %src, i32 noundef %m, i32 noundef %n) #0 {
				; CHECK-LABEL: define void @diff_checks
				; CHECK-SAME: (ptr nocapture noundef writeonly [[DST:%.*]], ptr nocapture noundef readonly [[SRC:%.*]], i32 noundef [[M:%.*]], i32 noundef [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[SRC2:%.*]] = ptrtoint ptr [[SRC]] to i64
				; CHECK-NEXT: [[DST1:%.*]] = ptrtoint ptr [[DST]] to i64
				; CHECK-NEXT: [[ADD5:%.*]] = add nuw i32 [[N]], 1
				; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[TMP1:%.*]] = sext i32 [[ADD5]] to i64
				; CHECK-NEXT: [[WIDE_M:%.*]] = zext i32 [[M]] to i64
				; CHECK-NEXT: [[WIDE_N:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[TMP1]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[WIDE_N]], 2
				; CHECK-NEXT: br label [[OUTER_LOOP:%.*]]
				; CHECK: outer.loop:
				; CHECK-NEXT: [[IV_OUTER:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_OUTER_NEXT:%.*]], [[INNER_EXIT:%.*]] ]
				; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP2]], [[IV_OUTER]]
				; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[DST1]], [[TMP4]]
				; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP3]], [[IV_OUTER]]
				; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[SRC2]], [[TMP6]]
				; CHECK-NEXT: [[TMP8:%.*]] = mul nsw i64 [[IV_OUTER]], [[TMP0]]
				; CHECK-NEXT: [[TMP9:%.*]] = mul nsw i64 [[IV_OUTER]], [[TMP1]]
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_N]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[TMP10:%.*]] = sub i64 [[TMP5]], [[TMP7]]
				; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP10]], 16
				; CHECK-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_N]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_N]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP11:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP12:%.*]] = add nuw nsw i64 [[TMP11]], [[TMP8]]
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP12]]
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[TMP13]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP14]], align 4
				; CHECK-NEXT: [[TMP15:%.*]] = add nsw i64 [[TMP11]], [[TMP9]]
				; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP15]]
				; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[TMP16]], i32 0
				; CHECK-NEXT: store <4 x i32> [[WIDE_LOAD]], ptr [[TMP17]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_N]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[INNER_EXIT]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[OUTER_LOOP]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[INNER_LOOP:%.*]]
				; CHECK: inner.loop:
				; CHECK-NEXT: [[IV_INNER:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_INNER_NEXT:%.*]], [[INNER_LOOP]] ]
				; CHECK-NEXT: [[TMP19:%.*]] = add nuw nsw i64 [[IV_INNER]], [[TMP8]]
				; CHECK-NEXT: [[ARRAYIDX_US:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP19]]
				; CHECK-NEXT: [[TMP20:%.*]] = load i32, ptr [[ARRAYIDX_US]], align 4
				; CHECK-NEXT: [[TMP21:%.*]] = add nsw i64 [[IV_INNER]], [[TMP9]]
				; CHECK-NEXT: [[ARRAYIDX9_US:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP21]]
				; CHECK-NEXT: store i32 [[TMP20]], ptr [[ARRAYIDX9_US]], align 4
				; CHECK-NEXT: [[IV_INNER_NEXT]] = add nuw nsw i64 [[IV_INNER]], 1
				; CHECK-NEXT: [[INNER_EXIT_COND:%.*]] = icmp eq i64 [[IV_INNER_NEXT]], [[WIDE_N]]
				; CHECK-NEXT: br i1 [[INNER_EXIT_COND]], label [[INNER_EXIT]], label [[INNER_LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
				; CHECK: inner.exit:
				; CHECK-NEXT: [[IV_OUTER_NEXT]] = add nuw nsw i64 [[IV_OUTER]], 1
				; CHECK-NEXT: [[OUTER_EXIT_COND:%.*]] = icmp eq i64 [[IV_OUTER_NEXT]], [[WIDE_M]]
				; CHECK-NEXT: br i1 [[OUTER_EXIT_COND]], label [[OUTER_EXIT:%.*]], label [[OUTER_LOOP]]
				; CHECK: outer.exit:
				; CHECK-NEXT: ret void
				;
				entry:
				%add5 = add nuw i32 %n, 1
				%0 = zext i32 %n to i64
				%1 = sext i32 %add5 to i64
				%wide.m = zext i32 %m to i64
				%wide.n = zext i32 %n to i64
				br label %outer.loop

				outer.loop:
				%iv.outer = phi i64 [ 0, %entry ], [ %iv.outer.next, %inner.exit ]
				%2 = mul nsw i64 %iv.outer, %0
				%3 = mul nsw i64 %iv.outer, %1
				br label %inner.loop

				inner.loop:
				%iv.inner = phi i64 [ 0, %outer.loop ], [ %iv.inner.next, %inner.loop ]
				%4 = add nuw nsw i64 %iv.inner, %2
				%arrayidx.us = getelementptr inbounds i32, ptr %src, i64 %4
				%5 = load i32, ptr %arrayidx.us, align 4
				%6 = add nsw i64 %iv.inner, %3
				%arrayidx9.us = getelementptr inbounds i32, ptr %dst, i64 %6
				store i32 %5, ptr %arrayidx9.us, align 4
				%iv.inner.next = add nuw nsw i64 %iv.inner, 1
				%inner.exit.cond = icmp eq i64 %iv.inner.next, %wide.n
				br i1 %inner.exit.cond, label %inner.exit, label %inner.loop

				inner.exit:
				%iv.outer.next = add nuw nsw i64 %iv.outer, 1
				%outer.exit.cond = icmp eq i64 %iv.outer.next, %wide.m
				br i1 %outer.exit.cond, label %outer.exit, label %outer.loop

				outer.exit:
				ret void
				}


				; Equivalent example in C:
				; void full_checks(int32_t *dst, int32_t *src, int m, int n) {
				;   for (int i = 0; i < m; i++) {
				;     for (int j = 0; j < n; j++) {
				;       dst[(i * n) + j] += src[(i * n) + j];
				;     }
				;   }
				; }
				; We decide to do full runtime checks here (as opposed to diff checks) due to
				; the additional load of 'dst[(i * n) + j]' in the loop.

				; DEBUG-LABEL: LAA: Found a loop in full_checks:
				; DEBUG: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {%dst,+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%outer.loop> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%outer.loop>
				; DEBUG-NEXT: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {%src,+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%outer.loop> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %src),+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%outer.loop>

				define void @full_checks(ptr nocapture noundef %dst, ptr nocapture noundef readonly %src, i32 noundef %m, i32 noundef %n) #0 {
				; CHECK-LABEL: define void @full_checks
				; CHECK-SAME: (ptr nocapture noundef [[DST:%.*]], ptr nocapture noundef readonly [[SRC:%.*]], i32 noundef [[M:%.*]], i32 noundef [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[WIDE_M:%.*]] = zext i32 [[M]] to i64
				; CHECK-NEXT: [[WIDE_N:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[WIDE_N]], 2
				; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i64 [[WIDE_N]], 2
				; CHECK-NEXT: br label [[OUTER_LOOP:%.*]]
				; CHECK: outer.loop:
				; CHECK-NEXT: [[OUTER_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[OUTER_IV_NEXT:%.*]], [[INNER_EXIT:%.*]] ]
				; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP1]], [[OUTER_IV]]
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP3]]
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP2]], [[TMP3]]
				; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP4]]
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP3]]
				; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP5:%.*]] = mul nsw i64 [[OUTER_IV]], [[TMP0]]
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_N]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_N]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_N]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP7:%.*]] = add nuw nsw i64 [[TMP6]], [[TMP5]]
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP7]]
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP8]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP9]], align 4, !alias.scope !4
				; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP7]]
				; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[TMP10]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x i32>, ptr [[TMP11]], align 4, !alias.scope !7, !noalias !4
				; CHECK-NEXT: [[TMP12:%.*]] = add nsw <4 x i32> [[WIDE_LOAD4]], [[WIDE_LOAD]]
				; CHECK-NEXT: store <4 x i32> [[TMP12]], ptr [[TMP11]], align 4, !alias.scope !7, !noalias !4
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_N]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[INNER_EXIT]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[OUTER_LOOP]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[INNER_LOOP:%.*]]
				; CHECK: inner.loop:
				; CHECK-NEXT: [[IV_INNER:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_INNER_NEXT:%.*]], [[INNER_LOOP]] ]
				; CHECK-NEXT: [[TMP14:%.*]] = add nuw nsw i64 [[IV_INNER]], [[TMP5]]
				; CHECK-NEXT: [[ARRAYIDX_US:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP14]]
				; CHECK-NEXT: [[TMP15:%.*]] = load i32, ptr [[ARRAYIDX_US]], align 4
				; CHECK-NEXT: [[ARRAYIDX8_US:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP14]]
				; CHECK-NEXT: [[TMP16:%.*]] = load i32, ptr [[ARRAYIDX8_US]], align 4
				; CHECK-NEXT: [[ADD9_US:%.*]] = add nsw i32 [[TMP16]], [[TMP15]]
				; CHECK-NEXT: store i32 [[ADD9_US]], ptr [[ARRAYIDX8_US]], align 4
				; CHECK-NEXT: [[IV_INNER_NEXT]] = add nuw nsw i64 [[IV_INNER]], 1
				; CHECK-NEXT: [[INNER_EXIT_COND:%.*]] = icmp eq i64 [[IV_INNER_NEXT]], [[WIDE_N]]
				; CHECK-NEXT: br i1 [[INNER_EXIT_COND]], label [[INNER_EXIT]], label [[INNER_LOOP]], !llvm.loop [[LOOP10:![0-9]+]]
				; CHECK: inner.exit:
				; CHECK-NEXT: [[OUTER_IV_NEXT]] = add nuw nsw i64 [[OUTER_IV]], 1
				; CHECK-NEXT: [[OUTER_EXIT_COND:%.*]] = icmp eq i64 [[OUTER_IV_NEXT]], [[WIDE_M]]
				; CHECK-NEXT: br i1 [[OUTER_EXIT_COND]], label [[OUTER_EXIT:%.*]], label [[OUTER_LOOP]]
				; CHECK: outer.exit:
				; CHECK-NEXT: ret void
				;
				entry:
				%0 = zext i32 %n to i64
				%wide.m = zext i32 %m to i64
				%wide.n = zext i32 %n to i64
				br label %outer.loop

				outer.loop:
				%outer.iv = phi i64 [ 0, %entry ], [ %outer.iv.next, %inner.exit ]
				%1 = mul nsw i64 %outer.iv, %0
				br label %inner.loop

				inner.loop:
				%iv.inner = phi i64 [ 0, %outer.loop ], [ %iv.inner.next, %inner.loop ]
				%2 = add nuw nsw i64 %iv.inner, %1
				%arrayidx.us = getelementptr inbounds i32, ptr %src, i64 %2
				%3 = load i32, ptr %arrayidx.us, align 4
				%arrayidx8.us = getelementptr inbounds i32, ptr %dst, i64 %2
				%4 = load i32, ptr %arrayidx8.us, align 4
				%add9.us = add nsw i32 %4, %3
				store i32 %add9.us, ptr %arrayidx8.us, align 4
				%iv.inner.next = add nuw nsw i64 %iv.inner, 1
				%inner.exit.cond = icmp eq i64 %iv.inner.next, %wide.n
				br i1 %inner.exit.cond, label %inner.exit, label %inner.loop

				inner.exit:
				%outer.iv.next = add nuw nsw i64 %outer.iv, 1
				%outer.exit.cond = icmp eq i64 %outer.iv.next, %wide.m
				br i1 %outer.exit.cond, label %outer.exit, label %outer.loop

				outer.exit:
				ret void
				}


				; Equivalent example in C:
				; void full_checks_diff_strides(int32_t *dst, int32_t *src, int m, int n) {
				;   for (int i = 0; i < m; i++) {
				;     for (int j = 0; j < n; j++) {
				;       dst[(i * (n + 1)) + j] += src[(i * n) + j];
				;     }
				;   }
				; }
				; We decide to do full runtime checks here (as opposed to diff checks) due to
				; the additional load of 'dst[(i * (n + 1)) + j]' in the loop.
				; NOTE: This is different from the test above (@full_checks) because the dst
				; array is accessed with a higher stride than src, and therefore the inner loop
				; runtime checks will vary for each outer loop iteration.

				; DEBUG-LABEL: LAA: Found a loop in full_checks_diff_strides:
				; DEBUG: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {%dst,+,(4 + (4 * (zext i32 %n to i64))<nuw><nsw>)<nuw><nsw>}<%outer.loop> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 + (4 * (zext i32 %n to i64))<nuw><nsw>)<nuw><nsw>}<%outer.loop>
				; DEBUG-NEXT: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {%src,+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%outer.loop> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %src),+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%outer.loop>


				define void @full_checks_diff_strides(ptr nocapture noundef %dst, ptr nocapture noundef readonly %src, i32 noundef %m, i32 noundef %n) #0 {
				; CHECK-LABEL: define void @full_checks_diff_strides
				; CHECK-SAME: (ptr nocapture noundef [[DST:%.*]], ptr nocapture noundef readonly [[SRC:%.*]], i32 noundef [[M:%.*]], i32 noundef [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[WIDE_M:%.*]] = zext i32 [[M]] to i64
				; CHECK-NEXT: [[WIDE_N:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[TMP0:%.*]] = shl nuw nsw i64 [[WIDE_N]], 2
				; CHECK-NEXT: [[TMP1:%.*]] = add nuw nsw i64 [[TMP0]], 4
				; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[WIDE_N]], 2
				; CHECK-NEXT: br label [[OUTER_LOOP:%.*]]
				; CHECK: outer.loop:
				; CHECK-NEXT: [[OUTER_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[OUTER_IV_NEXT:%.*]], [[INNER_EXIT:%.*]] ]
				; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP1]], [[OUTER_IV]]
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP3]]
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP0]], [[TMP3]]
				; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP2]], [[OUTER_IV]]
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP5]]
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[TMP0]], [[TMP5]]
				; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP6]]
				; CHECK-NEXT: [[NPLUS1:%.*]] = add nuw nsw i32 [[N]], 1
				; CHECK-NEXT: [[WIDE_NPLUS1:%.*]] = zext i32 [[NPLUS1]] to i64
				; CHECK-NEXT: [[TMP7:%.*]] = mul nsw i64 [[OUTER_IV]], [[WIDE_N]]
				; CHECK-NEXT: [[TMP8:%.*]] = mul nsw i64 [[OUTER_IV]], [[WIDE_NPLUS1]]
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_N]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_N]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_N]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP9:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP10:%.*]] = add nuw nsw i64 [[TMP9]], [[TMP7]]
				; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP10]]
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[TMP11]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP12]], align 4, !alias.scope !11
				; CHECK-NEXT: [[TMP13:%.*]] = add nuw nsw i64 [[TMP9]], [[TMP8]]
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP13]]
				; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[TMP14]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x i32>, ptr [[TMP15]], align 4, !alias.scope !14, !noalias !11
				; CHECK-NEXT: [[TMP16:%.*]] = add nsw <4 x i32> [[WIDE_LOAD4]], [[WIDE_LOAD]]
				; CHECK-NEXT: store <4 x i32> [[TMP16]], ptr [[TMP15]], align 4, !alias.scope !14, !noalias !11
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_N]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[INNER_EXIT]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[OUTER_LOOP]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[INNER_LOOP:%.*]]
				; CHECK: inner.loop:
				; CHECK-NEXT: [[IV_INNER:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_INNER_NEXT:%.*]], [[INNER_LOOP]] ]
				; CHECK-NEXT: [[TMP18:%.*]] = add nuw nsw i64 [[IV_INNER]], [[TMP7]]
				; CHECK-NEXT: [[ARRAYIDX_US:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP18]]
				; CHECK-NEXT: [[TMP19:%.*]] = load i32, ptr [[ARRAYIDX_US]], align 4
				; CHECK-NEXT: [[TMP20:%.*]] = add nuw nsw i64 [[IV_INNER]], [[TMP8]]
				; CHECK-NEXT: [[ARRAYIDX8_US:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP20]]
				; CHECK-NEXT: [[TMP21:%.*]] = load i32, ptr [[ARRAYIDX8_US]], align 4
				; CHECK-NEXT: [[ADD9_US:%.*]] = add nsw i32 [[TMP21]], [[TMP19]]
				; CHECK-NEXT: store i32 [[ADD9_US]], ptr [[ARRAYIDX8_US]], align 4
				; CHECK-NEXT: [[IV_INNER_NEXT]] = add nuw nsw i64 [[IV_INNER]], 1
				; CHECK-NEXT: [[INNER_EXIT_COND:%.*]] = icmp eq i64 [[IV_INNER_NEXT]], [[WIDE_N]]
				; CHECK-NEXT: br i1 [[INNER_EXIT_COND]], label [[INNER_EXIT]], label [[INNER_LOOP]], !llvm.loop [[LOOP17:![0-9]+]]
				; CHECK: inner.exit:
				; CHECK-NEXT: [[OUTER_IV_NEXT]] = add nuw nsw i64 [[OUTER_IV]], 1
				; CHECK-NEXT: [[OUTER_EXIT_COND:%.*]] = icmp eq i64 [[OUTER_IV_NEXT]], [[WIDE_M]]
				; CHECK-NEXT: br i1 [[OUTER_EXIT_COND]], label [[OUTER_EXIT:%.*]], label [[OUTER_LOOP]]
				; CHECK: outer.exit:
				; CHECK-NEXT: ret void
				;
				entry:
				%wide.m = zext i32 %m to i64
				%wide.n = zext i32 %n to i64
				br label %outer.loop

				outer.loop:
				%outer.iv = phi i64 [ 0, %entry ], [ %outer.iv.next, %inner.exit ]
				%nplus1 = add nuw nsw i32 %n, 1
				%wide.nplus1 = zext i32 %nplus1 to i64
				%0 = mul nsw i64 %outer.iv, %wide.n
				%1 = mul nsw i64 %outer.iv, %wide.nplus1
				br label %inner.loop

				inner.loop:
				%iv.inner = phi i64 [ 0, %outer.loop ], [ %iv.inner.next, %inner.loop ]
				%2 = add nuw nsw i64 %iv.inner, %0
				%arrayidx.us = getelementptr inbounds i32, ptr %src, i64 %2
				%3 = load i32, ptr %arrayidx.us, align 4
				%4 = add nuw nsw i64 %iv.inner, %1
				%arrayidx8.us = getelementptr inbounds i32, ptr %dst, i64 %4
				%5 = load i32, ptr %arrayidx8.us, align 4
				%add9.us = add nsw i32 %5, %3
				store i32 %add9.us, ptr %arrayidx8.us, align 4
				%iv.inner.next = add nuw nsw i64 %iv.inner, 1
				%inner.exit.cond = icmp eq i64 %iv.inner.next, %wide.n
				br i1 %inner.exit.cond, label %inner.exit, label %inner.loop

				inner.exit:
				%outer.iv.next = add nuw nsw i64 %outer.iv, 1
				%outer.exit.cond = icmp eq i64 %outer.iv.next, %wide.m
				br i1 %outer.exit.cond, label %outer.exit, label %outer.loop

				outer.exit:
				ret void
				}


				; Equivalent example in C:
				; void diff_checks_src_start_invariant(int32_t *dst, int32_t *src, int m, int n) {
				;   for (int i = 0; i < m; i++) {
				;     for (int j = 0; j < n; j++) {
				;       dst[(i * n) + j] = src[j];
				;     }
				;   }
				; }

				; DEBUG-LABEL: LAA: Found a loop in diff_checks_src_start_invariant:

				define void @diff_checks_src_start_invariant(ptr nocapture noundef writeonly %dst, ptr nocapture noundef readonly %src, i32 noundef %m, i32 noundef %n) {
				; CHECK-LABEL: define void @diff_checks_src_start_invariant
				; CHECK-SAME: (ptr nocapture noundef writeonly [[DST:%.*]], ptr nocapture noundef readonly [[SRC:%.*]], i32 noundef [[M:%.*]], i32 noundef [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[SRC2:%.*]] = ptrtoint ptr [[SRC]] to i64
				; CHECK-NEXT: [[DST1:%.*]] = ptrtoint ptr [[DST]] to i64
				; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[WIDE_M:%.*]] = zext i32 [[M]] to i64
				; CHECK-NEXT: [[WIDE_N:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[WIDE_N]], 2
				; CHECK-NEXT: br label [[OUTER_LOOP:%.*]]
				; CHECK: outer.loop:
				; CHECK-NEXT: [[IV_OUTER:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_OUTER_NEXT:%.*]], [[INNER_LOOP_EXIT:%.*]] ]
				; CHECK-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], [[IV_OUTER]]
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[DST1]], [[TMP2]]
				; CHECK-NEXT: [[TMP4:%.*]] = mul nsw i64 [[IV_OUTER]], [[TMP0]]
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_N]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[TMP5:%.*]] = sub i64 [[TMP3]], [[SRC2]]
				; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP5]], 16
				; CHECK-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_N]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_N]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP6]]
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[TMP7]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP8]], align 4
				; CHECK-NEXT: [[TMP9:%.*]] = add nuw nsw i64 [[TMP6]], [[TMP4]]
				; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP9]]
				; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[TMP10]], i32 0
				; CHECK-NEXT: store <4 x i32> [[WIDE_LOAD]], ptr [[TMP11]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_N]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[INNER_LOOP_EXIT]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[OUTER_LOOP]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[INNER_LOOP:%.*]]
				; CHECK: inner.loop:
				; CHECK-NEXT: [[IV_INNER:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_INNER_NEXT:%.*]], [[INNER_LOOP]] ]
				; CHECK-NEXT: [[ARRAYIDX_US:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[IV_INNER]]
				; CHECK-NEXT: [[TMP13:%.*]] = load i32, ptr [[ARRAYIDX_US]], align 4
				; CHECK-NEXT: [[TMP14:%.*]] = add nuw nsw i64 [[IV_INNER]], [[TMP4]]
				; CHECK-NEXT: [[ARRAYIDX6_US:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP14]]
				; CHECK-NEXT: store i32 [[TMP13]], ptr [[ARRAYIDX6_US]], align 4
				; CHECK-NEXT: [[IV_INNER_NEXT]] = add nuw nsw i64 [[IV_INNER]], 1
				; CHECK-NEXT: [[INNER_EXIT_COND:%.*]] = icmp eq i64 [[IV_INNER_NEXT]], [[WIDE_N]]
				; CHECK-NEXT: br i1 [[INNER_EXIT_COND]], label [[INNER_LOOP_EXIT]], label [[INNER_LOOP]], !llvm.loop [[LOOP19:![0-9]+]]
				; CHECK: inner.loop.exit:
				; CHECK-NEXT: [[IV_OUTER_NEXT]] = add nuw nsw i64 [[IV_OUTER]], 1
				; CHECK-NEXT: [[OUTER_EXIT_COND:%.*]] = icmp eq i64 [[IV_OUTER_NEXT]], [[WIDE_M]]
				; CHECK-NEXT: br i1 [[OUTER_EXIT_COND]], label [[OUTER_LOOP_EXIT:%.*]], label [[OUTER_LOOP]]
				; CHECK: outer.loop.exit:
				; CHECK-NEXT: ret void
				;
				entry:
				%0 = zext i32 %n to i64
				%wide.m = zext i32 %m to i64
				%wide.n = zext i32 %n to i64
				br label %outer.loop

				outer.loop:
				%iv.outer = phi i64 [ 0, %entry ], [ %iv.outer.next, %inner.loop.exit ]
				%1 = mul nsw i64 %iv.outer, %0
				br label %inner.loop

				inner.loop:
				%iv.inner = phi i64 [ 0, %outer.loop ], [ %iv.inner.next, %inner.loop ]
				%arrayidx.us = getelementptr inbounds i32, ptr %src, i64 %iv.inner
				%2 = load i32, ptr %arrayidx.us, align 4
				%3 = add nuw nsw i64 %iv.inner, %1
				%arrayidx6.us = getelementptr inbounds i32, ptr %dst, i64 %3
				store i32 %2, ptr %arrayidx6.us, align 4
				%iv.inner.next = add nuw nsw i64 %iv.inner, 1
				%inner.exit.cond = icmp eq i64 %iv.inner.next, %wide.n
				br i1 %inner.exit.cond, label %inner.loop.exit, label %inner.loop

				inner.loop.exit:
				%iv.outer.next = add nuw nsw i64 %iv.outer, 1
				%outer.exit.cond = icmp eq i64 %iv.outer.next, %wide.m
				br i1 %outer.exit.cond, label %outer.loop.exit, label %outer.loop

				outer.loop.exit:
				ret void
				}


				; Equivalent example in C:
				; void full_checks_src_start_invariant(int32_t *dst, int32_t *src, int m, int n) {
				; for (int i = 0; i < m; i++) {
				; for (int j = 0; j < n; j++) {
				; dst[(i * n) + j] += src[j];
				; }
				; }
				; }

				; DEBUG-LABEL: LAA: Found a loop in full_checks_src_start_invariant:
				; DEBUG: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {%dst,+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%outer.loop> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%outer.loop>
				; DEBUG-NEXT: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: %src End: ((4 * (zext i32 %n to i64))<nuw><nsw> + %src)
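				; Sketch (illustrative C, not part of the test) of the overlap test that
				; vector.memcheck performs for outer iteration i, using the two ranges LAA
				; reports above. The dst range is an add-recurrence in %outer.loop, so its
				; bounds are currently recomputed in outer.loop on every outer iteration.
				; The src range is loop-invariant and is expanded once in entry. Names are
				; hypothetical.
				;   bool found_conflict(uintptr_t dst, uintptr_t src, uint64_t n, uint64_t i) {
				;     uintptr_t dst_start = dst + 4 * n * i;        // SCEVGEP
				;     uintptr_t dst_end   = dst + 4 * n * (i + 1);  // SCEVGEP1
				;     uintptr_t src_end   = src + 4 * n;            // SCEVGEP2
				;     return dst_start < src_end && src < dst_end;  // BOUND0 && BOUND1
				;   }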

				define void @full_checks_src_start_invariant(ptr nocapture noundef %dst, ptr nocapture noundef readonly %src, i32 noundef %m, i32 noundef %n) {
				; CHECK-LABEL: define void @full_checks_src_start_invariant
				; CHECK-SAME: (ptr nocapture noundef [[DST:%.*]], ptr nocapture noundef readonly [[SRC:%.*]], i32 noundef [[M:%.*]], i32 noundef [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[WIDE_M:%.*]] = zext i32 [[M]] to i64
				; CHECK-NEXT: [[WIDE_N:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[WIDE_N]], 2
				; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i64 [[WIDE_N]], 2
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP2]]
				; CHECK-NEXT: br label [[OUTER_LOOP:%.*]]
				; CHECK: outer.loop:
				; CHECK-NEXT: [[IV_OUTER:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_OUTER_NEXT:%.*]], [[INNER_LOOP_EXIT:%.*]] ]
				; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP1]], [[IV_OUTER]]
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP3]]
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP2]], [[TMP3]]
				; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP5:%.*]] = mul nsw i64 [[IV_OUTER]], [[TMP0]]
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_N]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP2]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SRC]], [[SCEVGEP1]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_N]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_N]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP6]]
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[TMP7]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP8]], align 4, !alias.scope !20
				; CHECK-NEXT: [[TMP9:%.*]] = add nuw nsw i64 [[TMP6]], [[TMP5]]
				; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP9]]
				; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[TMP10]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD3:%.*]] = load <4 x i32>, ptr [[TMP11]], align 4, !alias.scope !23, !noalias !20
				; CHECK-NEXT: [[TMP12:%.*]] = add nsw <4 x i32> [[WIDE_LOAD3]], [[WIDE_LOAD]]
				; CHECK-NEXT: store <4 x i32> [[TMP12]], ptr [[TMP11]], align 4, !alias.scope !23, !noalias !20
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP25:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_N]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[INNER_LOOP_EXIT]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[OUTER_LOOP]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[INNER_LOOP:%.*]]
				; CHECK: inner.loop:
				; CHECK-NEXT: [[IV_INNER:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_INNER_NEXT:%.*]], [[INNER_LOOP]] ]
				; CHECK-NEXT: [[ARRAYIDX_US:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[IV_INNER]]
				; CHECK-NEXT: [[TMP14:%.*]] = load i32, ptr [[ARRAYIDX_US]], align 4
				; CHECK-NEXT: [[TMP15:%.*]] = add nuw nsw i64 [[IV_INNER]], [[TMP5]]
				; CHECK-NEXT: [[ARRAYIDX6_US:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP15]]
				; CHECK-NEXT: [[TMP16:%.*]] = load i32, ptr [[ARRAYIDX6_US]], align 4
				; CHECK-NEXT: [[ADD7_US:%.*]] = add nsw i32 [[TMP16]], [[TMP14]]
				; CHECK-NEXT: store i32 [[ADD7_US]], ptr [[ARRAYIDX6_US]], align 4
				; CHECK-NEXT: [[IV_INNER_NEXT]] = add nuw nsw i64 [[IV_INNER]], 1
				; CHECK-NEXT: [[INNER_EXIT_COND:%.*]] = icmp eq i64 [[IV_INNER_NEXT]], [[WIDE_N]]
				; CHECK-NEXT: br i1 [[INNER_EXIT_COND]], label [[INNER_LOOP_EXIT]], label [[INNER_LOOP]], !llvm.loop [[LOOP26:![0-9]+]]
				; CHECK: inner.loop.exit:
				; CHECK-NEXT: [[IV_OUTER_NEXT]] = add nuw nsw i64 [[IV_OUTER]], 1
				; CHECK-NEXT: [[OUTER_EXIT_COND:%.*]] = icmp eq i64 [[IV_OUTER_NEXT]], [[WIDE_M]]
				; CHECK-NEXT: br i1 [[OUTER_EXIT_COND]], label [[OUTER_LOOP_EXIT:%.*]], label [[OUTER_LOOP]]
				; CHECK: outer.loop.exit:
				; CHECK-NEXT: ret void
				;
				entry:
				%0 = zext i32 %n to i64
				%wide.m = zext i32 %m to i64
				%wide.n = zext i32 %n to i64
				br label %outer.loop

				outer.loop:
				%iv.outer = phi i64 [ 0, %entry ], [ %iv.outer.next, %inner.loop.exit ]
				%1 = mul nsw i64 %iv.outer, %0
				br label %inner.loop

				inner.loop:
				%iv.inner = phi i64 [ 0, %outer.loop ], [ %iv.inner.next, %inner.loop ]
				%arrayidx.us = getelementptr inbounds i32, ptr %src, i64 %iv.inner
				%2 = load i32, ptr %arrayidx.us, align 4
				%3 = add nuw nsw i64 %iv.inner, %1
				%arrayidx6.us = getelementptr inbounds i32, ptr %dst, i64 %3
				%4 = load i32, ptr %arrayidx6.us, align 4
				%add7.us = add nsw i32 %4, %2
				store i32 %add7.us, ptr %arrayidx6.us, align 4
				%iv.inner.next = add nuw nsw i64 %iv.inner, 1
				%inner.exit.cond = icmp eq i64 %iv.inner.next, %wide.n
				br i1 %inner.exit.cond, label %inner.loop.exit, label %inner.loop

				inner.loop.exit:
				%iv.outer.next = add nuw nsw i64 %iv.outer, 1
				%outer.exit.cond = icmp eq i64 %iv.outer.next, %wide.m
				br i1 %outer.exit.cond, label %outer.loop.exit, label %outer.loop

				outer.loop.exit:
				ret void
				}


				; Equivalent example in C:
				; void triple_nested_loop_mixed_access(int *dst, int *src, int m, int n, int o) {
				; for (int i = 0; i < m; i++) {
				; for (int j = 0; j < n; j++) {
				; for (int l = 0; l < o; l++) {
				; dst[(i * n * (o + 1)) + (j * o) + l] += src[(i * n * o) + l];
				; }
				; }
				; }
				; }


				; DEBUG-LABEL: LAA: Found a loop in triple_nested_loop_mixed_access:
				; DEBUG: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {{[{][{]}}%dst,+,(4 * (zext i32 (1 + %o)<nsw> to i64) * (zext i32 %n to i64))}<%outer.outer.loop>,+,(4 * (zext i32 %o to i64))<nuw><nsw>}<%outer.loop> End: {{[{][{]}}((4 * (zext i32 %o to i64))<nuw><nsw> + %dst),+,(4 * (zext i32 (1 + %o)<nsw> to i64) * (zext i32 %n to i64))}<%outer.outer.loop>,+,(4 * (zext i32 %o to i64))<nuw><nsw>}<%outer.loop>
				; DEBUG-NEXT: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {%src,+,(4 * (zext i32 %n to i64) * (zext i32 %o to i64))}<%outer.outer.loop> End: {((4 * (zext i32 %o to i64))<nuw><nsw> + %src),+,(4 * (zext i32 %n to i64) * (zext i32 %o to i64))}<%outer.outer.loop>
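				; Sketch (illustrative C, not part of the test) of the bounds compared in
				; vector.memcheck for outer.outer iteration i and outer iteration j. They are
				; read off the LAA ranges above; all offsets are in bytes and the names are
				; hypothetical.
				;   dst_start = dst + 4 * (i * n * (o + 1) + j * o);   // SCEVGEP
				;   dst_end   = dst_start + 4 * o;                     // SCEVGEP1
				;   src_start = src + 4 * (i * n * o);                 // SCEVGEP2
				;   src_end   = src_start + 4 * o;                     // SCEVGEP3
				;   conflict  = dst_start < src_end && src_start < dst_end;
				; The src bounds depend only on i, so they are expanded in outer.outer.loop.
				; The dst bounds also depend on j, so they are expanded in outer.loop.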

				define void @triple_nested_loop_mixed_access(ptr nocapture noundef %dst, ptr nocapture noundef readonly %src, i32 noundef %m, i32 noundef %n, i32 noundef %o) {
				; CHECK-LABEL: define void @triple_nested_loop_mixed_access
				; CHECK-SAME: (ptr nocapture noundef [[DST:%.*]], ptr nocapture noundef readonly [[SRC:%.*]], i32 noundef [[M:%.*]], i32 noundef [[N:%.*]], i32 noundef [[O:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[O]], 1
				; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[O]] to i64
				; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[TMP2:%.*]] = zext i32 [[ADD11]] to i64
				; CHECK-NEXT: [[WIDE_TRIP_COUNT68:%.*]] = zext i32 [[M]] to i64
				; CHECK-NEXT: [[WIDE_TRIP_COUNT60:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[O]] to i64
				; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP1]], [[TMP2]]
				; CHECK-NEXT: [[TMP4:%.*]] = shl i64 [[TMP3]], 2
				; CHECK-NEXT: [[TMP5:%.*]] = shl i64 [[WIDE_TRIP_COUNT]], 2
				; CHECK-NEXT: [[TMP6:%.*]] = shl nuw nsw i64 [[WIDE_TRIP_COUNT]], 2
				; CHECK-NEXT: [[TMP7:%.*]] = mul i64 [[WIDE_TRIP_COUNT]], [[TMP1]]
				; CHECK-NEXT: [[TMP8:%.*]] = shl i64 [[TMP7]], 2
				; CHECK-NEXT: br label [[OUTER_OUTER_LOOP:%.*]]
				; CHECK: outer.outer.loop:
				; CHECK-NEXT: [[OUTER_OUTER_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[OUTER_OUTER_IV_NEXT:%.*]], [[OUTER_LOOP_END:%.*]] ]
				; CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[TMP4]], [[OUTER_OUTER_IV]]
				; CHECK-NEXT: [[TMP10:%.*]] = add i64 [[TMP6]], [[TMP9]]
				; CHECK-NEXT: [[TMP11:%.*]] = mul i64 [[TMP8]], [[OUTER_OUTER_IV]]
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP11]]
				; CHECK-NEXT: [[TMP12:%.*]] = add i64 [[TMP6]], [[TMP11]]
				; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP12]]
				; CHECK-NEXT: [[TMP13:%.*]] = mul nsw i64 [[OUTER_OUTER_IV]], [[TMP1]]
				; CHECK-NEXT: [[TMP14:%.*]] = mul nsw i64 [[TMP13]], [[TMP0]]
				; CHECK-NEXT: [[TMP15:%.*]] = mul nsw i64 [[TMP13]], [[TMP2]]
				; CHECK-NEXT: br label [[OUTER_LOOP:%.*]]
				; CHECK: outer.loop:
				; CHECK-NEXT: [[OUTER_IV:%.*]] = phi i64 [ [[OUTER_IV_NEXT:%.*]], [[INNER_LOOP_END:%.*]] ], [ 0, [[OUTER_OUTER_LOOP]] ]
				; CHECK-NEXT: [[TMP16:%.*]] = mul i64 [[TMP5]], [[OUTER_IV]]
				; CHECK-NEXT: [[TMP17:%.*]] = add i64 [[TMP9]], [[TMP16]]
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP17]]
				; CHECK-NEXT: [[TMP18:%.*]] = add i64 [[TMP10]], [[TMP16]]
				; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP18]]
				; CHECK-NEXT: [[TMP19:%.*]] = mul nsw i64 [[OUTER_IV]], [[TMP0]]
				; CHECK-NEXT: [[TMP20:%.*]] = add nuw nsw i64 [[TMP19]], [[TMP15]]
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP21:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP22:%.*]] = add nuw nsw i64 [[TMP21]], [[TMP14]]
				; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP22]]
				; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds i32, ptr [[TMP23]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP24]], align 4, !alias.scope !27
				; CHECK-NEXT: [[TMP25:%.*]] = add nuw nsw i64 [[TMP20]], [[TMP21]]
				; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP25]]
				; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, ptr [[TMP26]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x i32>, ptr [[TMP27]], align 4, !alias.scope !30, !noalias !27
				; CHECK-NEXT: [[TMP28:%.*]] = add nsw <4 x i32> [[WIDE_LOAD4]], [[WIDE_LOAD]]
				; CHECK-NEXT: store <4 x i32> [[TMP28]], ptr [[TMP27]], align 4, !alias.scope !30, !noalias !27
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP29:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP29]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP32:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[INNER_LOOP_END]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[OUTER_LOOP]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[INNER_LOOP:%.*]]
				; CHECK: inner.loop:
				; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ [[INNER_IV_NEXT:%.*]], [[INNER_LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
				; CHECK-NEXT: [[TMP30:%.*]] = add nuw nsw i64 [[INNER_IV]], [[TMP14]]
				; CHECK-NEXT: [[ARRAYIDX_US_US_US:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP30]]
				; CHECK-NEXT: [[TMP31:%.*]] = load i32, ptr [[ARRAYIDX_US_US_US]], align 4
				; CHECK-NEXT: [[TMP32:%.*]] = add nuw nsw i64 [[TMP20]], [[INNER_IV]]
				; CHECK-NEXT: [[ARRAYIDX17_US_US_US:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP32]]
				; CHECK-NEXT: [[TMP33:%.*]] = load i32, ptr [[ARRAYIDX17_US_US_US]], align 4
				; CHECK-NEXT: [[ADD18_US_US_US:%.*]] = add nsw i32 [[TMP33]], [[TMP31]]
				; CHECK-NEXT: store i32 [[ADD18_US_US_US]], ptr [[ARRAYIDX17_US_US_US]], align 4
				; CHECK-NEXT: [[INNER_IV_NEXT]] = add nuw nsw i64 [[INNER_IV]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INNER_IV_NEXT]], [[WIDE_TRIP_COUNT]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[INNER_LOOP_END]], label [[INNER_LOOP]], !llvm.loop [[LOOP33:![0-9]+]]
				; CHECK: inner.loop.end:
				; CHECK-NEXT: [[OUTER_IV_NEXT]] = add nuw nsw i64 [[OUTER_IV]], 1
				; CHECK-NEXT: [[EXIT_OUTER:%.*]] = icmp eq i64 [[OUTER_IV_NEXT]], [[WIDE_TRIP_COUNT60]]
				; CHECK-NEXT: br i1 [[EXIT_OUTER]], label [[OUTER_LOOP_END]], label [[OUTER_LOOP]]
				; CHECK: outer.loop.end:
				; CHECK-NEXT: [[OUTER_OUTER_IV_NEXT]] = add nuw nsw i64 [[OUTER_OUTER_IV]], 1
				; CHECK-NEXT: [[EXIT_OUTER_OUTER:%.*]] = icmp eq i64 [[OUTER_OUTER_IV_NEXT]], [[WIDE_TRIP_COUNT68]]
				; CHECK-NEXT: br i1 [[EXIT_OUTER_OUTER]], label [[EXIT:%.*]], label [[OUTER_OUTER_LOOP]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				entry:
				%add11 = add nsw i32 %o, 1
				%0 = zext i32 %o to i64
				%1 = zext i32 %n to i64
				%2 = zext i32 %add11 to i64
				%wide.trip.count68 = zext i32 %m to i64
				%wide.trip.count60 = zext i32 %n to i64
				%wide.trip.count = zext i32 %o to i64
				br label %outer.outer.loop

				outer.outer.loop:
				%outer.outer.iv = phi i64 [ 0, %entry ], [ %outer.outer.iv.next, %outer.loop.end ]
				%3 = mul nsw i64 %outer.outer.iv, %1
				%4 = mul nsw i64 %3, %0
				%5 = mul nsw i64 %3, %2
				br label %outer.loop

				outer.loop:
				%outer.iv = phi i64 [ %outer.iv.next, %inner.loop.end ], [ 0, %outer.outer.loop ]
				%6 = mul nsw i64 %outer.iv, %0
				%7 = add nuw nsw i64 %6, %5
				br label %inner.loop

				inner.loop:
				%inner.iv = phi i64 [ %inner.iv.next, %inner.loop ], [ 0, %outer.loop ]
				%8 = add nuw nsw i64 %inner.iv, %4
				%arrayidx.us.us.us = getelementptr inbounds i32, ptr %src, i64 %8
				%9 = load i32, ptr %arrayidx.us.us.us, align 4
				%10 = add nuw nsw i64 %7, %inner.iv
				%arrayidx17.us.us.us = getelementptr inbounds i32, ptr %dst, i64 %10
				%11 = load i32, ptr %arrayidx17.us.us.us, align 4
				%add18.us.us.us = add nsw i32 %11, %9
				store i32 %add18.us.us.us, ptr %arrayidx17.us.us.us, align 4
				%inner.iv.next = add nuw nsw i64 %inner.iv, 1
				%exitcond.not = icmp eq i64 %inner.iv.next, %wide.trip.count
				br i1 %exitcond.not, label %inner.loop.end, label %inner.loop

				inner.loop.end:
				%outer.iv.next = add nuw nsw i64 %outer.iv, 1
				%exit.outer = icmp eq i64 %outer.iv.next, %wide.trip.count60
				br i1 %exit.outer, label %outer.loop.end, label %outer.loop

				outer.loop.end:
				%outer.outer.iv.next = add nuw nsw i64 %outer.outer.iv, 1
				%exit.outer.outer = icmp eq i64 %outer.outer.iv.next, %wide.trip.count68
				br i1 %exit.outer.outer, label %exit, label %outer.outer.loop

				exit:
				ret void
				}


				; Equivalent example in C:
				; void uncomputable_outer_tc(int32_t *dst, int32_t *src, char *str, int n) {
				; int i = 0;
				; while (str[i] != '\0') {
				; for (int j = 0; j < n; j++) {
				; dst[(i * (n + 1)) + j] += src[(i * n) + j];
				; }
				; i++;
				; }
				; }

				; DEBUG-LABEL: LAA: Found a loop in uncomputable_outer_tc:
				; DEBUG: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {%dst,+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%outer.loop> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%outer.loop>
				; DEBUG-NEXT: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {%src,+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%outer.loop> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %src),+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%outer.loop>
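				; Explanatory note, not part of the test. The outer loop exits when it reads
				; a zero byte from %str, so ScalarEvolution has no computable backedge-taken
				; count for %outer.loop. The per-iteration bounds can still be expanded inside
				; outer.loop, with i = OUTER_IV (names hypothetical):
				;   dst_start = dst + 4 * (n + 1) * i;   dst_end = dst_start + 4 * n;   // SCEVGEP, SCEVGEP1
				;   src_start = src + 4 * n * i;         src_end = src_start + 4 * n;   // SCEVGEP2, SCEVGEP3
				; A check hoisted above the outer loop would have to cover every outer
				; iteration. That presumably requires the final value of i, which is unknown
				; here.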

				define void @uncomputable_outer_tc(ptr nocapture noundef %dst, ptr nocapture noundef readonly %src, ptr nocapture noundef readonly %str, i32 noundef %n) {
				; CHECK-LABEL: define void @uncomputable_outer_tc
				; CHECK-SAME: (ptr nocapture noundef [[DST:%.*]], ptr nocapture noundef readonly [[SRC:%.*]], ptr nocapture noundef readonly [[STR:%.*]], i32 noundef [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[STR]], align 1
				; CHECK-NEXT: [[CMP_NOT23:%.*]] = icmp ne i8 [[TMP0]], 0
				; CHECK-NEXT: [[CMP221:%.*]] = icmp sgt i32 [[N]], 0
				; CHECK-NEXT: [[OR_COND:%.*]] = and i1 [[CMP_NOT23]], [[CMP221]]
				; CHECK-NEXT: br i1 [[OR_COND]], label [[OUTER_LOOP_PREHEADER:%.*]], label [[WHILE_END:%.*]]
				; CHECK: outer.loop.preheader:
				; CHECK-NEXT: [[ADD6:%.*]] = add nuw nsw i32 [[N]], 1
				; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[TMP2:%.*]] = zext i32 [[ADD6]] to i64
				; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[TMP2]], 2
				; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i64 [[WIDE_TRIP_COUNT]], 2
				; CHECK-NEXT: [[TMP5:%.*]] = shl i64 [[WIDE_TRIP_COUNT]], 2
				; CHECK-NEXT: br label [[OUTER_LOOP:%.*]]
				; CHECK: outer.loop:
				; CHECK-NEXT: [[OUTER_IV:%.*]] = phi i64 [ 0, [[OUTER_LOOP_PREHEADER]] ], [ [[OUTER_IV_NEXT:%.*]], [[INNER_LOOP_EXIT:%.*]] ]
				; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP3]], [[OUTER_IV]]
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP6]]
				; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[TMP4]], [[TMP6]]
				; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP7]]
				; CHECK-NEXT: [[TMP8:%.*]] = mul i64 [[TMP5]], [[OUTER_IV]]
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP8]]
				; CHECK-NEXT: [[TMP9:%.*]] = add i64 [[TMP4]], [[TMP8]]
				; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP9]]
				; CHECK-NEXT: [[TMP10:%.*]] = mul nsw i64 [[OUTER_IV]], [[TMP1]]
				; CHECK-NEXT: [[TMP11:%.*]] = mul nsw i64 [[OUTER_IV]], [[TMP2]]
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP12:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP13:%.*]] = add nsw i64 [[TMP12]], [[TMP10]]
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP13]]
				; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[TMP14]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP15]], align 4, !alias.scope !34
				; CHECK-NEXT: [[TMP16:%.*]] = add nsw i64 [[TMP12]], [[TMP11]]
				; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP16]]
				; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, ptr [[TMP17]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x i32>, ptr [[TMP18]], align 4, !alias.scope !37, !noalias !34
				; CHECK-NEXT: [[TMP19:%.*]] = add nsw <4 x i32> [[WIDE_LOAD4]], [[WIDE_LOAD]]
				; CHECK-NEXT: store <4 x i32> [[TMP19]], ptr [[TMP18]], align 4, !alias.scope !37, !noalias !34
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP39:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[INNER_LOOP_EXIT]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[OUTER_LOOP]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[INNER_LOOP:%.*]]
				; CHECK: inner.loop:
				; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INNER_IV_NEXT:%.*]], [[INNER_LOOP]] ]
				; CHECK-NEXT: [[TMP21:%.*]] = add nsw i64 [[INNER_IV]], [[TMP10]]
				; CHECK-NEXT: [[ARRAYIDX5_US:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP21]]
				; CHECK-NEXT: [[TMP22:%.*]] = load i32, ptr [[ARRAYIDX5_US]], align 4
				; CHECK-NEXT: [[TMP23:%.*]] = add nsw i64 [[INNER_IV]], [[TMP11]]
				; CHECK-NEXT: [[ARRAYIDX10_US:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP23]]
				; CHECK-NEXT: [[TMP24:%.*]] = load i32, ptr [[ARRAYIDX10_US]], align 4
				; CHECK-NEXT: [[ADD11_US:%.*]] = add nsw i32 [[TMP24]], [[TMP22]]
				; CHECK-NEXT: store i32 [[ADD11_US]], ptr [[ARRAYIDX10_US]], align 4
				; CHECK-NEXT: [[INNER_IV_NEXT]] = add nuw nsw i64 [[INNER_IV]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INNER_IV_NEXT]], [[WIDE_TRIP_COUNT]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[INNER_LOOP_EXIT]], label [[INNER_LOOP]], !llvm.loop [[LOOP40:![0-9]+]]
				; CHECK: inner.loop.exit:
				; CHECK-NEXT: [[OUTER_IV_NEXT]] = add i64 [[OUTER_IV]], 1
				; CHECK-NEXT: [[ARRAYIDX_US:%.*]] = getelementptr inbounds i8, ptr [[STR]], i64 [[OUTER_IV_NEXT]]
				; CHECK-NEXT: [[TMP25:%.*]] = load i8, ptr [[ARRAYIDX_US]], align 1
				; CHECK-NEXT: [[CMP_NOT_US:%.*]] = icmp eq i8 [[TMP25]], 0
				; CHECK-NEXT: br i1 [[CMP_NOT_US]], label [[WHILE_END_LOOPEXIT:%.*]], label [[OUTER_LOOP]]
				; CHECK: while.end.loopexit:
				; CHECK-NEXT: br label [[WHILE_END]]
				; CHECK: while.end:
				; CHECK-NEXT: ret void
				;
				entry:
				%0 = load i8, ptr %str, align 1
				%cmp.not23 = icmp ne i8 %0, 0
				%cmp221 = icmp sgt i32 %n, 0
				%or.cond = and i1 %cmp.not23, %cmp221
				br i1 %or.cond, label %outer.loop.preheader, label %while.end

				outer.loop.preheader:
				%add6 = add nuw nsw i32 %n, 1
				%1 = zext i32 %n to i64
				%2 = zext i32 %add6 to i64
				%wide.trip.count = zext i32 %n to i64
				br label %outer.loop

				outer.loop:
				%outer.iv = phi i64 [ 0, %outer.loop.preheader ], [ %outer.iv.next, %inner.loop.exit ]
				%3 = mul nsw i64 %outer.iv, %1
				%4 = mul nsw i64 %outer.iv, %2
				br label %inner.loop

				inner.loop:
				%inner.iv = phi i64 [ 0, %outer.loop ], [ %inner.iv.next, %inner.loop ]
				%5 = add nsw i64 %inner.iv, %3
				%arrayidx5.us = getelementptr inbounds i32, ptr %src, i64 %5
				%6 = load i32, ptr %arrayidx5.us, align 4
				%7 = add nsw i64 %inner.iv, %4
				%arrayidx10.us = getelementptr inbounds i32, ptr %dst, i64 %7
				%8 = load i32, ptr %arrayidx10.us, align 4
				%add11.us = add nsw i32 %8, %6
				store i32 %add11.us, ptr %arrayidx10.us, align 4
				%inner.iv.next = add nuw nsw i64 %inner.iv, 1
				%exitcond.not = icmp eq i64 %inner.iv.next, %wide.trip.count
				br i1 %exitcond.not, label %inner.loop.exit, label %inner.loop

				inner.loop.exit:
				%outer.iv.next = add i64 %outer.iv, 1
				%arrayidx.us = getelementptr inbounds i8, ptr %str, i64 %outer.iv.next
				%9 = load i8, ptr %arrayidx.us, align 1
				%cmp.not.us = icmp eq i8 %9, 0
				br i1 %cmp.not.us, label %while.end, label %outer.loop

				while.end:
				ret void
				}


				; Equivalent example in C:
				; void decreasing_inner_iv(int32_t *dst, int32_t *src, int stride1, int stride2, int m, int n) {
				; for (int i = 0; i < m; i++) {
				; for (int j = n; j >= 0; j--) {
				; dst[(i * stride1) + j] += src[(i * stride2) + j];
				; }
				; }
				; }

				; DEBUG-LABEL: LAA: Found a loop in decreasing_inner_iv:
				; DEBUG: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {%dst,+,(4 * (sext i32 %stride1 to i64))<nsw>}<%outer.loop> End: {(4 + (4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 * (sext i32 %stride1 to i64))<nsw>}<%outer.loop>
				; DEBUG-NEXT: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {%src,+,(4 * (sext i32 %stride2 to i64))<nsw>}<%outer.loop> End: {(4 + (4 * (zext i32 %n to i64))<nuw><nsw> + %src),+,(4 * (sext i32 %stride2 to i64))<nsw>}<%outer.loop>
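				; Sketch (illustrative C, not part of the test). Although j counts down from n
				; to 0, the LAA ranges above are ascending byte intervals that cover elements
				; 0..n, i.e. n + 1 elements per outer iteration i (names hypothetical):
				;   dst_start = dst + 4 * stride1 * i;   dst_end = dst_start + 4 * (n + 1);   // SCEVGEP, SCEVGEP1
				;   src_start = src + 4 * stride2 * i;   src_end = src_start + 4 * (n + 1);   // SCEVGEP2, SCEVGEP3
				;   conflict  = dst_start < src_end && src_start < dst_end;
				; The strides are sign-extended, so the per-row offsets may be negative.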

				define void @decreasing_inner_iv(ptr nocapture noundef %dst, ptr nocapture noundef readonly %src, i32 noundef %stride1, i32 noundef %stride2, i32 noundef %m, i32 noundef %n) {
				; CHECK-LABEL: define void @decreasing_inner_iv
				; CHECK-SAME: (ptr nocapture noundef [[DST:%.*]], ptr nocapture noundef readonly [[SRC:%.*]], i32 noundef [[STRIDE1:%.*]], i32 noundef [[STRIDE2:%.*]], i32 noundef [[M:%.*]], i32 noundef [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP20:%.*]] = icmp sgt i32 [[M]], 0
				; CHECK-NEXT: [[CMP218:%.*]] = icmp sgt i32 [[N]], -1
				; CHECK-NEXT: [[OR_COND:%.*]] = and i1 [[CMP20]], [[CMP218]]
				; CHECK-NEXT: br i1 [[OR_COND]], label [[OUTER_LOOP_PRE:%.*]], label [[EXIT:%.*]]
				; CHECK: outer.loop.pre:
				; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[TMP1:%.*]] = sext i32 [[STRIDE2]] to i64
				; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[STRIDE1]] to i64
				; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[M]] to i64
				; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[TMP2]], 2
				; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i64 [[TMP0]], 2
				; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], 4
				; CHECK-NEXT: [[TMP6:%.*]] = shl i64 [[TMP1]], 2
				; CHECK-NEXT: [[TMP7:%.*]] = add nuw nsw i64 [[TMP0]], 1
				; CHECK-NEXT: br label [[OUTER_LOOP:%.*]]
				; CHECK: outer.loop:
				; CHECK-NEXT: [[OUTER_IV:%.*]] = phi i64 [ 0, [[OUTER_LOOP_PRE]] ], [ [[OUTER_IV_NEXT:%.*]], [[INNER_LOOP_EXIT:%.*]] ]
				; CHECK-NEXT: [[TMP8:%.*]] = mul i64 [[TMP3]], [[OUTER_IV]]
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP8]]
				; CHECK-NEXT: [[TMP9:%.*]] = add i64 [[TMP5]], [[TMP8]]
				; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP9]]
				; CHECK-NEXT: [[TMP10:%.*]] = mul i64 [[TMP6]], [[OUTER_IV]]
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP10]]
				; CHECK-NEXT: [[TMP11:%.*]] = add i64 [[TMP5]], [[TMP10]]
				; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP11]]
				; CHECK-NEXT: [[TMP12:%.*]] = mul nsw i64 [[OUTER_IV]], [[TMP1]]
				; CHECK-NEXT: [[TMP13:%.*]] = mul nsw i64 [[OUTER_IV]], [[TMP2]]
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP7]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP7]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP7]], [[N_MOD_VF]]
				; CHECK-NEXT: [[IND_END:%.*]] = sub i64 [[TMP0]], [[N_VEC]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[OFFSET_IDX:%.*]] = sub i64 [[TMP0]], [[INDEX]]
				; CHECK-NEXT: [[TMP14:%.*]] = add i64 [[OFFSET_IDX]], 0
				; CHECK-NEXT: [[TMP15:%.*]] = add nsw i64 [[TMP14]], [[TMP12]]
				; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP15]]
				; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[TMP16]], i32 0
				; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, ptr [[TMP17]], i32 -3
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP18]], align 4, !alias.scope !41
				; CHECK-NEXT: [[REVERSE:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
				; CHECK-NEXT: [[TMP19:%.*]] = add nsw i64 [[TMP14]], [[TMP13]]
				; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP19]]
				; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, ptr [[TMP20]], i32 0
				; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, ptr [[TMP21]], i32 -3
				; CHECK-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x i32>, ptr [[TMP22]], align 4, !alias.scope !44, !noalias !41
				; CHECK-NEXT: [[REVERSE5:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD4]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
				; CHECK-NEXT: [[TMP23:%.*]] = add nsw <4 x i32> [[REVERSE5]], [[REVERSE]]
				; CHECK-NEXT: [[REVERSE6:%.*]] = shufflevector <4 x i32> [[TMP23]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
				; CHECK-NEXT: store <4 x i32> [[REVERSE6]], ptr [[TMP22]], align 4, !alias.scope !44, !noalias !41
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP46:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP7]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[INNER_LOOP_EXIT]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[TMP0]], [[OUTER_LOOP]] ], [ [[TMP0]], [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[INNER_LOOP:%.*]]
				; CHECK: inner.loop:
				; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INNER_IV_NEXT:%.*]], [[INNER_LOOP]] ]
				; CHECK-NEXT: [[TMP25:%.*]] = add nsw i64 [[INNER_IV]], [[TMP12]]
				; CHECK-NEXT: [[ARRAYIDX_US:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP25]]
				; CHECK-NEXT: [[TMP26:%.*]] = load i32, ptr [[ARRAYIDX_US]], align 4
				; CHECK-NEXT: [[TMP27:%.*]] = add nsw i64 [[INNER_IV]], [[TMP13]]
				; CHECK-NEXT: [[ARRAYIDX8_US:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP27]]
				; CHECK-NEXT: [[TMP28:%.*]] = load i32, ptr [[ARRAYIDX8_US]], align 4
				; CHECK-NEXT: [[ADD9_US:%.*]] = add nsw i32 [[TMP28]], [[TMP26]]
				; CHECK-NEXT: store i32 [[ADD9_US]], ptr [[ARRAYIDX8_US]], align 4
				; CHECK-NEXT: [[INNER_IV_NEXT]] = add nsw i64 [[INNER_IV]], -1
				; CHECK-NEXT: [[CMP2_US:%.*]] = icmp sgt i64 [[INNER_IV]], 0
				; CHECK-NEXT: br i1 [[CMP2_US]], label [[INNER_LOOP]], label [[INNER_LOOP_EXIT]], !llvm.loop [[LOOP47:![0-9]+]]
				; CHECK: inner.loop.exit:
				; CHECK-NEXT: [[OUTER_IV_NEXT]] = add nuw nsw i64 [[OUTER_IV]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[OUTER_IV_NEXT]], [[WIDE_TRIP_COUNT]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[OUTER_LOOP_EXIT:%.*]], label [[OUTER_LOOP]]
				; CHECK: outer.loop.exit:
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				entry:
				%cmp20 = icmp sgt i32 %m, 0
				%cmp218 = icmp sgt i32 %n, -1
				%or.cond = and i1 %cmp20, %cmp218
				br i1 %or.cond, label %outer.loop.pre, label %exit

				outer.loop.pre:
				%0 = zext i32 %n to i64
				%1 = sext i32 %stride2 to i64
				%2 = sext i32 %stride1 to i64
				%wide.trip.count = zext i32 %m to i64
				br label %outer.loop

				outer.loop:
				%outer.iv = phi i64 [ 0, %outer.loop.pre ], [ %outer.iv.next, %inner.loop.exit ]
				%3 = mul nsw i64 %outer.iv, %1
				%4 = mul nsw i64 %outer.iv, %2
				br label %inner.loop

				inner.loop:
				%inner.iv = phi i64 [ %0, %outer.loop ], [ %inner.iv.next, %inner.loop ]
				%5 = add nsw i64 %inner.iv, %3
				%arrayidx.us = getelementptr inbounds i32, ptr %src, i64 %5
				%6 = load i32, ptr %arrayidx.us, align 4
				%7 = add nsw i64 %inner.iv, %4
				%arrayidx8.us = getelementptr inbounds i32, ptr %dst, i64 %7
				%8 = load i32, ptr %arrayidx8.us, align 4
				%add9.us = add nsw i32 %8, %6
				store i32 %add9.us, ptr %arrayidx8.us, align 4
				%inner.iv.next = add nsw i64 %inner.iv, -1
				%cmp2.us = icmp sgt i64 %inner.iv, 0
				br i1 %cmp2.us, label %inner.loop, label %inner.loop.exit

				inner.loop.exit:
				%outer.iv.next = add nuw nsw i64 %outer.iv, 1
				%exitcond.not = icmp eq i64 %outer.iv.next, %wide.trip.count
				br i1 %exitcond.not, label %outer.loop.exit, label %outer.loop

				outer.loop.exit:
				br label %exit

				exit:
				ret void
				}


				; Equivalent example in C:
				; void decreasing_outer_iv(int32_t *dst, int32_t *src, int stride1, int stride2, int m, int n) {
				;   for (int i = m; i > 0; i--) {
				;     for (int j = 0; j <= n; j++) {
				;       dst[(i * stride1) + j] += src[(i * stride2) + j];
				;     }
				;   }
				; }

				; DEBUG-LABEL: LAA: Found a loop in decreasing_outer_iv:
				; DEBUG: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {((4 * (zext i32 %m to i64) * (sext i32 %stride1 to i64)) + %dst),+,(-4 * (sext i32 %stride1 to i64))<nsw>}<%outer.loop> End: {((4 * (zext i32 (1 + %n) to i64))<nuw><nsw> + (4 * (zext i32 %m to i64) * (sext i32 %stride1 to i64)) + %dst),+,(-4 * (sext i32 %stride1 to i64))<nsw>}<%outer.loop>
				; DEBUG-NEXT: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {((4 * (zext i32 %m to i64) * (sext i32 %stride2 to i64)) + %src),+,(-4 * (sext i32 %stride2 to i64))<nsw>}<%outer.loop> End: {((4 * (zext i32 (1 + %n) to i64))<nuw><nsw> + (4 * (zext i32 %m to i64) * (sext i32 %stride2 to i64)) + %src),+,(-4 * (sext i32 %stride2 to i64))<nsw>}<%outer.loop>
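				; Informally, the two ranges above describe what one outer iteration touches.
				; On the iteration with outer IV value i (i runs from m down to 1), the inner
				; loop accesses, in bytes:
				;   dst: [dst + 4*i*stride1, dst + 4*(i*stride1 + n + 1))
				;   src: [src + 4*i*stride2, src + 4*(i*stride2 + n + 1))
				; so Start and End are AddRecs in %outer.loop with step -4*stride1 (resp.
				; -4*stride2). For example, with m = 2, n = 3 and stride1 = 8, i = 2 gives dst
				; offsets [64, 80) and i = 1 gives [32, 48). Because the bounds vary with the
				; outer IV, the expected output below still recomputes them (SCEVGEP to
				; SCEVGEP3) in outer.loop on every outer iteration; D152366 aims to hoist such
				; checks above the outer loop.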

				define void @decreasing_outer_iv(ptr nocapture noundef %dst, ptr nocapture noundef readonly %src, i32 noundef %stride1, i32 noundef %stride2, i32 noundef %m, i32 noundef %n) {
				; CHECK-LABEL: define void @decreasing_outer_iv
				; CHECK-SAME: (ptr nocapture noundef [[DST:%.*]], ptr nocapture noundef readonly [[SRC:%.*]], i32 noundef [[STRIDE1:%.*]], i32 noundef [[STRIDE2:%.*]], i32 noundef [[M:%.*]], i32 noundef [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP21:%.*]] = icmp slt i32 [[M]], 1
				; CHECK-NEXT: [[CMP2_NOT18:%.*]] = icmp slt i32 [[N]], 0
				; CHECK-NEXT: [[OR_COND:%.*]] = or i1 [[CMP21]], [[CMP2_NOT18]]
				; CHECK-NEXT: br i1 [[OR_COND]], label [[EXIT:%.*]], label [[OUTER_LOOP_PRE:%.*]]
				; CHECK: outer.loop.pre:
				; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[N]], 1
				; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[M]] to i64
				; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[STRIDE1]] to i64
				; CHECK-NEXT: [[TMP3:%.*]] = sext i32 [[STRIDE2]] to i64
				; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[TMP0]] to i64
				; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP2]], [[TMP1]]
				; CHECK-NEXT: [[TMP5:%.*]] = shl i64 [[TMP4]], 2
				; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP2]], -4
				; CHECK-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[WIDE_TRIP_COUNT]], 2
				; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[TMP5]], [[TMP7]]
				; CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[TMP3]], [[TMP1]]
				; CHECK-NEXT: [[TMP10:%.*]] = shl i64 [[TMP9]], 2
				; CHECK-NEXT: [[TMP11:%.*]] = mul i64 [[TMP3]], -4
				; CHECK-NEXT: [[TMP12:%.*]] = add i64 [[TMP10]], [[TMP7]]
				; CHECK-NEXT: br label [[OUTER_LOOP:%.*]]
				; CHECK: outer.loop:
				; CHECK-NEXT: [[INDVAR:%.*]] = phi i64 [ [[INDVAR_NEXT:%.*]], [[INNER_LOOP_EXIT:%.*]] ], [ 0, [[OUTER_LOOP_PRE]] ]
				; CHECK-NEXT: [[OUTER_IV:%.*]] = phi i64 [ [[TMP1]], [[OUTER_LOOP_PRE]] ], [ [[OUTER_IV_NEXT:%.*]], [[INNER_LOOP_EXIT]] ]
				; CHECK-NEXT: [[TMP13:%.*]] = mul i64 [[TMP6]], [[INDVAR]]
				; CHECK-NEXT: [[TMP14:%.*]] = add i64 [[TMP5]], [[TMP13]]
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP14]]
				; CHECK-NEXT: [[TMP15:%.*]] = add i64 [[TMP8]], [[TMP13]]
				; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP15]]
				; CHECK-NEXT: [[TMP16:%.*]] = mul i64 [[TMP11]], [[INDVAR]]
				; CHECK-NEXT: [[TMP17:%.*]] = add i64 [[TMP10]], [[TMP16]]
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP17]]
				; CHECK-NEXT: [[TMP18:%.*]] = add i64 [[TMP12]], [[TMP16]]
				; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP18]]
				; CHECK-NEXT: [[OUTER_IV_NEXT]] = add nsw i64 [[OUTER_IV]], -1
				; CHECK-NEXT: [[TMP19:%.*]] = mul nsw i64 [[OUTER_IV]], [[TMP3]]
				; CHECK-NEXT: [[TMP20:%.*]] = mul nsw i64 [[OUTER_IV]], [[TMP2]]
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP21:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP22:%.*]] = add nsw i64 [[TMP21]], [[TMP19]]
				; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP22]]
				; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds i32, ptr [[TMP23]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP24]], align 4, !alias.scope !48
				; CHECK-NEXT: [[TMP25:%.*]] = add nsw i64 [[TMP21]], [[TMP20]]
				; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP25]]
				; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, ptr [[TMP26]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x i32>, ptr [[TMP27]], align 4, !alias.scope !51, !noalias !48
				; CHECK-NEXT: [[TMP28:%.*]] = add nsw <4 x i32> [[WIDE_LOAD4]], [[WIDE_LOAD]]
				; CHECK-NEXT: store <4 x i32> [[TMP28]], ptr [[TMP27]], align 4, !alias.scope !51, !noalias !48
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP29:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP29]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP53:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[INNER_LOOP_EXIT]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[OUTER_LOOP]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[INNER_LOOP:%.*]]
				; CHECK: inner.loop:
				; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INNER_IV_NEXT:%.*]], [[INNER_LOOP]] ]
				; CHECK-NEXT: [[TMP30:%.*]] = add nsw i64 [[INNER_IV]], [[TMP19]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP30]]
				; CHECK-NEXT: [[TMP31:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[TMP32:%.*]] = add nsw i64 [[INNER_IV]], [[TMP20]]
				; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP32]]
				; CHECK-NEXT: [[TMP33:%.*]] = load i32, ptr [[ARRAYIDX8]], align 4
				; CHECK-NEXT: [[ADD9:%.*]] = add nsw i32 [[TMP33]], [[TMP31]]
				; CHECK-NEXT: store i32 [[ADD9]], ptr [[ARRAYIDX8]], align 4
				; CHECK-NEXT: [[INNER_IV_NEXT]] = add nuw nsw i64 [[INNER_IV]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INNER_IV_NEXT]], [[WIDE_TRIP_COUNT]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[INNER_LOOP_EXIT]], label [[INNER_LOOP]], !llvm.loop [[LOOP54:![0-9]+]]
				; CHECK: inner.loop.exit:
				; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i64 [[OUTER_IV]], 1
				; CHECK-NEXT: [[INDVAR_NEXT]] = add i64 [[INDVAR]], 1
				; CHECK-NEXT: br i1 [[CMP]], label [[OUTER_LOOP]], label [[OUTER_LOOP_EXIT:%.*]]
				; CHECK: outer.loop.exit:
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				entry:
				%cmp21 = icmp slt i32 %m, 1
				%cmp2.not18 = icmp slt i32 %n, 0
				%or.cond = or i1 %cmp21, %cmp2.not18
				br i1 %or.cond, label %exit, label %outer.loop.pre

				outer.loop.pre:
				%0 = add nuw i32 %n, 1
				%1 = zext i32 %m to i64
				%2 = sext i32 %stride1 to i64
				%3 = sext i32 %stride2 to i64
				%wide.trip.count = zext i32 %0 to i64
				br label %outer.loop

				outer.loop:
				%outer.iv = phi i64 [ %1, %outer.loop.pre ], [ %outer.iv.next, %inner.loop.exit ]
				%outer.iv.next = add nsw i64 %outer.iv, -1
				%4 = mul nsw i64 %outer.iv, %3
				%5 = mul nsw i64 %outer.iv, %2
				br label %inner.loop

				inner.loop:
				%inner.iv = phi i64 [ 0, %outer.loop ], [ %inner.iv.next, %inner.loop ]
				%6 = add nsw i64 %inner.iv, %4
				%arrayidx = getelementptr inbounds i32, ptr %src, i64 %6
				%7 = load i32, ptr %arrayidx, align 4
				%8 = add nsw i64 %inner.iv, %5
				%arrayidx8 = getelementptr inbounds i32, ptr %dst, i64 %8
				%9 = load i32, ptr %arrayidx8, align 4
				%add9 = add nsw i32 %9, %7
				store i32 %add9, ptr %arrayidx8, align 4
				%inner.iv.next = add nuw nsw i64 %inner.iv, 1
				%exitcond.not = icmp eq i64 %inner.iv.next, %wide.trip.count
				br i1 %exitcond.not, label %inner.loop.exit, label %inner.loop

				inner.loop.exit:
				%cmp = icmp sgt i64 %outer.iv, 1
				br i1 %cmp, label %outer.loop, label %outer.loop.exit

				outer.loop.exit:
				br label %exit

				exit:
				ret void
				}


				; Equivalent example in C:
				; void unknown_inner_stride(int32_t *dst, int32_t *src, int stride1, int stride2, int m, int n) {
				;   for (int i = 0; i < m; i++) {
				;     for (int j = 0; j < n; j++) {
				;       dst[(i * (n + 1)) + (j * stride1)] += src[(i * n) + (j * stride2)];
				;     }
				;   }
				; }


				; DEBUG-LABEL: LAA: Found a loop in unknown_inner_stride:
				; DEBUG: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {%dst,+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%outer.loop> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %dst),+,(4 * (zext i32 (1 + %n) to i64))<nuw><nsw>}<%outer.loop>
				; DEBUG-NEXT: LAA: Adding RT check for range:
				; DEBUG-NEXT: Start: {%src,+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%outer.loop> End: {((4 * (zext i32 %n to i64))<nuw><nsw> + %src),+,(4 * (zext i32 %n to i64))<nuw><nsw>}<%outer.loop>
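				; Informally: both ranges above assume unit inner strides. That assumption is
				; guarded by vector.scevcheck in the expected output below, which falls back to
				; the scalar loop unless stride1 == 1 and stride2 == 1. Under it, outer
				; iteration i accesses, in bytes:
				;   dst: [dst + 4*i*(n + 1), dst + 4*(i*(n + 1) + n))
				;   src: [src + 4*i*n,       src + 4*(i*n + n))
				; For example, with n = 3 and i = 2 the dst range is [32, 44) and the src
				; range is [24, 36), relative to %dst and %src.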

				define void @unknown_inner_stride(ptr nocapture noundef %dst, ptr nocapture noundef readonly %src, i32 noundef %stride1, i32 noundef %stride2, i32 noundef %m, i32 noundef %n) {
				; CHECK-LABEL: define void @unknown_inner_stride
				; CHECK-SAME: (ptr nocapture noundef [[DST:%.*]], ptr nocapture noundef readonly [[SRC:%.*]], i32 noundef [[STRIDE1:%.*]], i32 noundef [[STRIDE2:%.*]], i32 noundef [[M:%.*]], i32 noundef [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP26:%.*]] = icmp sgt i32 [[M]], 0
				; CHECK-NEXT: [[CMP224:%.*]] = icmp sgt i32 [[N]], 0
				; CHECK-NEXT: [[OR_COND:%.*]] = and i1 [[CMP26]], [[CMP224]]
				; CHECK-NEXT: br i1 [[OR_COND]], label [[OUTER_LOOP_PREHEADER:%.*]], label [[EXIT:%.*]]
				; CHECK: outer.loop.preheader:
				; CHECK-NEXT: [[ADD6:%.*]] = add nuw nsw i32 [[N]], 1
				; CHECK-NEXT: [[TMP0:%.*]] = sext i32 [[STRIDE2]] to i64
				; CHECK-NEXT: [[TMP1:%.*]] = sext i32 [[STRIDE1]] to i64
				; CHECK-NEXT: [[TMP2:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[TMP3:%.*]] = zext i32 [[ADD6]] to i64
				; CHECK-NEXT: [[WIDE_TRIP_COUNT39:%.*]] = zext i32 [[M]] to i64
				; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: [[TMP4:%.*]] = shl i64 [[TMP3]], 2
				; CHECK-NEXT: [[TMP5:%.*]] = shl nuw nsw i64 [[WIDE_TRIP_COUNT]], 2
				; CHECK-NEXT: [[TMP6:%.*]] = shl i64 [[WIDE_TRIP_COUNT]], 2
				; CHECK-NEXT: br label [[OUTER_LOOP:%.*]]
				; CHECK: outer.loop:
				; CHECK-NEXT: [[OUTER_IV:%.*]] = phi i64 [ 0, [[OUTER_LOOP_PREHEADER]] ], [ [[OUTER_IV_NEXT:%.*]], [[INNER_LOOP_EXIT:%.*]] ]
				; CHECK-NEXT: [[TMP7:%.*]] = mul i64 [[TMP4]], [[OUTER_IV]]
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP7]]
				; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[TMP5]], [[TMP7]]
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP8]]
				; CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[TMP6]], [[OUTER_IV]]
				; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP9]]
				; CHECK-NEXT: [[TMP10:%.*]] = add i64 [[TMP5]], [[TMP9]]
				; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP10]]
				; CHECK-NEXT: [[TMP11:%.*]] = mul nsw i64 [[OUTER_IV]], [[TMP2]]
				; CHECK-NEXT: [[TMP12:%.*]] = mul nsw i64 [[OUTER_IV]], [[TMP3]]
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
				; CHECK: vector.scevcheck:
				; CHECK-NEXT: [[IDENT_CHECK:%.*]] = icmp ne i32 [[STRIDE1]], 1
				; CHECK-NEXT: [[IDENT_CHECK1:%.*]] = icmp ne i32 [[STRIDE2]], 1
				; CHECK-NEXT: [[TMP13:%.*]] = or i1 [[IDENT_CHECK]], [[IDENT_CHECK1]]
				; CHECK-NEXT: br i1 [[TMP13]], label [[SCALAR_PH]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP4]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP3]], [[SCEVGEP2]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP14:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP15:%.*]] = mul nsw i64 [[TMP14]], [[TMP0]]
				; CHECK-NEXT: [[TMP16:%.*]] = add nsw i64 [[TMP15]], [[TMP11]]
				; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP16]]
				; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, ptr [[TMP17]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP18]], align 4, !alias.scope !55
				; CHECK-NEXT: [[TMP19:%.*]] = mul nsw i64 [[TMP14]], [[TMP1]]
				; CHECK-NEXT: [[TMP20:%.*]] = add nsw i64 [[TMP19]], [[TMP12]]
				; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP20]]
				; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, ptr [[TMP21]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD5:%.*]] = load <4 x i32>, ptr [[TMP22]], align 4, !alias.scope !58, !noalias !55
				; CHECK-NEXT: [[TMP23:%.*]] = add nsw <4 x i32> [[WIDE_LOAD5]], [[WIDE_LOAD]]
				; CHECK-NEXT: store <4 x i32> [[TMP23]], ptr [[TMP22]], align 4, !alias.scope !58, !noalias !55
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP60:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[INNER_LOOP_EXIT]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[OUTER_LOOP]] ], [ 0, [[VECTOR_SCEVCHECK]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[INNER_LOOP:%.*]]
				; CHECK: inner.loop:
				; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INNER_IV_NEXT:%.*]], [[INNER_LOOP]] ]
				; CHECK-NEXT: [[TMP25:%.*]] = mul nsw i64 [[INNER_IV]], [[TMP0]]
				; CHECK-NEXT: [[TMP26:%.*]] = add nsw i64 [[TMP25]], [[TMP11]]
				; CHECK-NEXT: [[ARRAYIDX_US:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[TMP26]]
				; CHECK-NEXT: [[TMP27:%.*]] = load i32, ptr [[ARRAYIDX_US]], align 4
				; CHECK-NEXT: [[TMP28:%.*]] = mul nsw i64 [[INNER_IV]], [[TMP1]]
				; CHECK-NEXT: [[TMP29:%.*]] = add nsw i64 [[TMP28]], [[TMP12]]
				; CHECK-NEXT: [[ARRAYIDX11_US:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP29]]
				; CHECK-NEXT: [[TMP30:%.*]] = load i32, ptr [[ARRAYIDX11_US]], align 4
				; CHECK-NEXT: [[ADD12_US:%.*]] = add nsw i32 [[TMP30]], [[TMP27]]
				; CHECK-NEXT: store i32 [[ADD12_US]], ptr [[ARRAYIDX11_US]], align 4
				; CHECK-NEXT: [[INNER_IV_NEXT]] = add nuw nsw i64 [[INNER_IV]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INNER_IV_NEXT]], [[WIDE_TRIP_COUNT]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[INNER_LOOP_EXIT]], label [[INNER_LOOP]], !llvm.loop [[LOOP61:![0-9]+]]
				; CHECK: inner.loop.exit:
				; CHECK-NEXT: [[OUTER_IV_NEXT]] = add nuw nsw i64 [[OUTER_IV]], 1
				; CHECK-NEXT: [[EXITCOND40_NOT:%.*]] = icmp eq i64 [[OUTER_IV_NEXT]], [[WIDE_TRIP_COUNT39]]
				; CHECK-NEXT: br i1 [[EXITCOND40_NOT]], label [[EXIT_LOOPEXIT:%.*]], label [[OUTER_LOOP]]
				; CHECK: exit.loopexit:
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				entry:
				%cmp26 = icmp sgt i32 %m, 0
				%cmp224 = icmp sgt i32 %n, 0
				%or.cond = and i1 %cmp26, %cmp224
				br i1 %or.cond, label %outer.loop.preheader, label %exit

				outer.loop.preheader:
				%add6 = add nuw nsw i32 %n, 1
				%0 = sext i32 %stride2 to i64
				%1 = sext i32 %stride1 to i64
				%2 = zext i32 %n to i64
				%3 = zext i32 %add6 to i64
				%wide.trip.count39 = zext i32 %m to i64
				%wide.trip.count = zext i32 %n to i64
				br label %outer.loop

				outer.loop:
				%outer.iv = phi i64 [ 0, %outer.loop.preheader ], [ %outer.iv.next, %inner.loop.exit ]
				%4 = mul nsw i64 %outer.iv, %2
				%5 = mul nsw i64 %outer.iv, %3
				br label %inner.loop

				inner.loop:
				%inner.iv = phi i64 [ 0, %outer.loop ], [ %inner.iv.next, %inner.loop ]
				%6 = mul nsw i64 %inner.iv, %0
				%7 = add nsw i64 %6, %4
				%arrayidx.us = getelementptr inbounds i32, ptr %src, i64 %7
				%8 = load i32, ptr %arrayidx.us, align 4
				%9 = mul nsw i64 %inner.iv, %1
				%10 = add nsw i64 %9, %5
				%arrayidx11.us = getelementptr inbounds i32, ptr %dst, i64 %10
				%11 = load i32, ptr %arrayidx11.us, align 4
				%add12.us = add nsw i32 %11, %8
				store i32 %add12.us, ptr %arrayidx11.us, align 4
				%inner.iv.next = add nuw nsw i64 %inner.iv, 1
				%exitcond.not = icmp eq i64 %inner.iv.next, %wide.trip.count
				br i1 %exitcond.not, label %inner.loop.exit, label %inner.loop

				inner.loop.exit:
				%outer.iv.next = add nuw nsw i64 %outer.iv, 1
				%exitcond40.not = icmp eq i64 %outer.iv.next, %wide.trip.count39
				br i1 %exitcond40.not, label %exit, label %outer.loop

				exit:
				ret void
				}