
[LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop

Authored by david-arm on Jun 7 2023, 5:11 AM.



Suppose we have a nested loop like this:

void foo(int32_t *dst, int32_t *src, int m, int n) {
  for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
      dst[(i * n) + j] += src[(i * n) + j];
    }
  }
}

We currently generate runtime memory checks as a precondition for
entering the vectorised version of the inner loop. However, if the
runtime-determined trip count for the inner loop is quite small
then the cost of these checks becomes quite expensive. This patch
attempts to mitigate these costs by adding a new option to
expand the memory ranges being checked to include the outer loop
as well. This leads to runtime checks that can then be hoisted
above the outer loop. For example, rather than looking for a
conflict between the memory ranges:

  1. &dst[(i * n)] -> &dst[(i * n) + n]
  2. &src[(i * n)] -> &src[(i * n) + n]

we can instead look at the expanded ranges:

  1. &dst[0] -> &dst[((m - 1) * n) + n]
  2. &src[0] -> &src[((m - 1) * n) + n]

which are outer-loop-invariant. As with many optimisations there
is a trade-off: with the expanded ranges there is a danger that we
never enter the vectorised inner loop at all, whereas with the
smaller per-iteration ranges we might enter it at least once.
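The hoisted check amounts to a single no-overlap test on the expanded ranges, performed once before entering the outer loop. A minimal C sketch of that test (the helper name and byte-address comparison are illustrative, not the actual generated IR; assumes m >= 1):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One overlap test on the expanded ranges replaces m per-iteration
 * checks: both ranges cover &p[0] .. &p[((m - 1) * n) + n], so the
 * result is invariant in the outer i loop. */
static bool expanded_ranges_conflict(const int32_t *dst, const int32_t *src,
                                     int m, int n) {
  uintptr_t dst_lo = (uintptr_t)dst;
  uintptr_t dst_hi = (uintptr_t)(dst + (size_t)(m - 1) * n + n);
  uintptr_t src_lo = (uintptr_t)src;
  uintptr_t src_hi = (uintptr_t)(src + (size_t)(m - 1) * n + n);
  /* Half-open intervals [lo, hi) intersect iff each low end is below
   * the other interval's high end. */
  return dst_lo < src_hi && src_lo < dst_hi;
}
```

If this test fails (no conflict), the vectorised inner loop is safe for every outer iteration; if it succeeds, execution falls back to the scalar loop, which is the trade-off described above.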

I have added a HoistRuntimeChecks option that is turned off by
default, but can be enabled for workloads where we know this is
guaranteed to be of real benefit. In future, we can also use
PGO to determine if this is worthwhile by using the inner loop
trip count information.

When enabling this option for SPEC2017 on neoverse-v1 with the
flags "-Ofast -mcpu=native -flto" I see an overall geomean
improvement of ~0.5%:

SPEC2017 results (+ is an improvement, - is a regression):
520.omnetpp: +2%
525.x264: +2%
557.xz: +1.2%
GEOMEAN: +0.5%

I suspect the omnetpp and xz differences are noise, but I know the
x264 improvement is real because it has some hot nested loops
with low trip counts where I can see this hoisting is beneficial.

Tests have been added here:


Diff Detail

Event Timeline

david-arm created this revision.Jun 7 2023, 5:11 AM
Herald added a project: Restricted Project.Jun 7 2023, 5:11 AM
david-arm requested review of this revision.Jun 7 2023, 5:11 AM
david-arm added inline comments.Jun 7 2023, 5:17 AM

Hmm, I just noticed these CHECK lines look wrong, since we still create the diff checks here. I'll fix this in a new patch.

david-arm updated this revision to Diff 529272.Jun 7 2023, 5:52 AM
  • Removed the target-specific attributes from the test and fixed the DEBUG check lines to be more accurate.

Gentle ping. :) Hi @fhahn, I don't suppose you have any thoughts about the approach I've taken in this patch, such as whether or not I've missed something critical?

Does this require the extra-vectorizer-passes to hoist the checks out of the loop in the full pipeline? Do you know if we already have sufficient runtime tests in llvm-test-suite for this kind of code? If not, it would be good to add some, perhaps to

Also, could you clean up the block & value names in the tests a bit so they are easier to read and land them separately?

Ayal added a subscriber: Ayal.Jun 27 2023, 11:27 AM
Ayal added inline comments.

This raises a thought, which may deserve a separate patch:
In this example the runtime checks should be optimized into a form independent of the enclosing i loop and hoisted out of it w/o expansion - and w/o asking a flag, as an obvious loop invariant unswitching opportunity.

I.e., instead of checking if intervals dst[i*n : i*n+n) and src[i*n : i*n + n) intersect, suffice to check if dst[0 : n) and src[0 : n) intersect. When checking if the end SCEV point of one precedes the start SCEV of the other, try to cancel common addends, or check if the SCEV of their difference is positive - subject to wrapping?
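Concretely, the cancellation works because both intervals contain the same i*n addend, which shifts both ends equally and so drops out of the comparison. A sketch of the resulting i-independent test (illustrative helper, assuming no wrapping):

```c
#include <stdbool.h>
#include <stdint.h>

/* dst[i*n : i*n + n) intersects src[i*n : i*n + n) exactly when
 * dst[0 : n) intersects src[0 : n): the common addend i*n shifts
 * both intervals by the same amount, so it cancels. */
static bool base_ranges_conflict(const int32_t *dst, const int32_t *src,
                                 int n) {
  uintptr_t dlo = (uintptr_t)dst, dhi = (uintptr_t)(dst + n);
  uintptr_t slo = (uintptr_t)src, shi = (uintptr_t)(src + n);
  return dlo < shi && slo < dhi;
}
```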

This also holds for the motivating memcopy example w/o the additional load, which is also worth adding a test for? Perhaps the first diff_checks() test above can serve as a better example where expansion is indeed necessary in order to loop unswitch a (non invariant) runtime check.

Wonder if the SPEC2017 significant cases are invariant(?)

Does this require the extra-vectorizer-passes to hoist the checks out of the loop in the full pipeline? Do you know if we already have sufficient runtime tests in llvm-test-suite for this kind of code? If not, it would be good to add some, perhaps to

Also, could you clean up the block & value names in the tests a bit so they are easier to read and land them separately?

Hi @fhahn, so I can see there are tests in the LLVM test suite that have examples of nested loops where the inner loop runtime checks could be hoisted out with this patch. For example, see MicroBenchmarks/ImageProcessing/Dither/floydDitherKernel.c and MicroBenchmarks/ImageProcessing/Dither/orderedDitherKernel.c. However, I am looking at trying to add some specific tests to SingleSource/UnitTests/Vectorizer/runtime-checks.cpp as well. Thanks for the suggestion!


Hi @Ayal, that's a good spot and thanks for pointing that out! The tests I wrote don't really bear any resemblance to the specific loops in x264 that benefit from this optimisation, where the inner loop runtime checks are not loop invariant. For example, one loop looks like a bit like this in x264:

for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++)
        out[j] = ... do something with in1[j] + in2[j] ...;
    out += out_stride;
    in1 += in1_stride;
    in2 += in2_stride;
}

However, what you said makes me wonder if the tests I added in this patch are unreliable, because like you said they may get optimised away in future because they are genuinely invariant. I'll look into adding another test.
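For reference, in this strided form each outer iteration shifts the three ranges by different strides, so no common addend cancels and the per-iteration checks are genuinely not invariant. The expanded range the hoisted check would use can be sketched as below (illustrative helper, assuming a non-negative stride and m >= 1):

```c
#include <stddef.h>
#include <stdint.h>

/* Union of the per-iteration ranges of a strided access, assuming a
 * non-negative stride: iteration i touches
 * [base + i*stride, base + i*stride + n), so across m outer
 * iterations the union is [base, base + (m - 1)*stride + n). */
static void expanded_range(const int32_t *base, ptrdiff_t stride, int m,
                           int n, uintptr_t *lo, uintptr_t *hi) {
  *lo = (uintptr_t)base;
  *hi = (uintptr_t)(base + (ptrdiff_t)(m - 1) * stride + n);
}
```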

david-arm updated this revision to Diff 535748.Jun 29 2023, 5:38 AM
  • Add extra full_checks_diff_strides test to ensure the original inner loop runtime checks were not already loop invariant.

Hi @fhahn, when vectorising as part of the normal pipeline we always run LICM after the vectoriser, which ensures the runtime checks are hoisted out. This is why we see the improvement for x264. However, if we vectorise a loop during the LTO pipeline then you're right that LICM is not currently run. This will probably change with D143631 though, since there are other problems with not running LICM after InstCombine in the LTO pipeline.

I've tried cleaning up the tests, and added a new one too. I'll continue working separately on adding a specific test to the llvm test suite, although obviously the test suite won't be run with this new flag enabled.

Hi @fhahn, I've now created some nested loop runtime check tests in the LLVM test suite - see D154719.

Does this require the extra-vectorizer-passes to hoist the checks out of the loop in the full pipeline?

Hi @fhahn, I just realised I forgot to reply to this comment. It does require follow-on passes like LICM to hoist the checks out of the loop, but this does seem to be happening for x264. For both normal and LTO pipelines we do run LICM after the vectoriser, so I think it's fine without the extra-vectorizer-passes flag.

paulwalker-arm added inline comments.Jul 18 2023, 3:23 AM

This doesn't read very well.


Typo, should be "inner".


"This is why the"


Given you evaluate High using the outer loop's exit count shouldn't you also check the following?

cast<SCEVAddRecExpr>(High)->getLoop() == TheLoop->getParentLoop()
david-arm updated this revision to Diff 542014.Jul 19 2023, 7:21 AM
  • Addressed review comments.
david-arm marked 4 inline comments as done.Jul 19 2023, 7:21 AM
paulwalker-arm accepted this revision.Jul 19 2023, 9:36 AM
paulwalker-arm added inline comments.

This space is redundant because you've already got one at the end of the previous string.


Up to you but long NOT lines like this are fragile. I think you'd be better off adding another DEBUG line to tryToCreateDiffCheck for the case when a diff check is created and then explicitly checking for the specific string you expect to see.

This revision is now accepted and ready to land.Jul 19 2023, 9:36 AM
fhahn added inline comments.Jul 19 2023, 10:28 AM

Could you add a test case for this case?


Do we need to check if the AddRecs here don't wrap? If it would wrap, could NewHigh < NewLow?

david-arm added inline comments.Jul 21 2023, 5:35 AM

I can have a look, but the scenario you're thinking about would require 3 levels of nesting. I didn't originally add a test for that because I was worried about introducing so many increasingly complicated negative test cases.


That's an interesting point!

So I've looked into this, and in order to even get to this point we must already have called evaluateAtIteration once to get the original High value. This happens in RuntimePointerChecking::insert, which does something similar:

const SCEV *Ex = PSE.getBackedgeTakenCount();

ScStart = AR->getStart();
ScEnd = AR->evaluateAtIteration(Ex, *SE);

I can't see any existing code that worries about ScEnd being < ScStart here and all I'm doing is performing the same evaluation for the outer loop backedge count. It's quite difficult to follow the whole paper trail of complex code in LoopAccessAnalysis, but it seems that AccessAnalysis::createCheckForAccess only worries about overflow if we failed a dependency check and retry the runtime memory checks.

Since this is under a flag and not enabled by default, would you be happy for now with me adding a TODO here to investigate whether wrapping really is a problem or not?

david-arm updated this revision to Diff 542900.Jul 21 2023, 7:07 AM
  • Rebased off the new triple-nested loop test and added a TODO
david-arm marked an inline comment as done.Jul 21 2023, 7:09 AM
fhahn added inline comments.Jul 25 2023, 12:40 AM

Yeah it looks like this is handled a bit inconsistently at the moment. One other potential case that might be missed here is when the outer induction is decreasing. In that case I think the new high and lows would be swapped?

Since this is under a flag and not enabled by default, would you be happy for now with me adding a TODO here to investigate whether wrapping really is a problem or not?

What is your plan in general to enable the flag by default? I think we want to avoid adding this behind a flag without working towards enabling it by default in the near term.

david-arm added inline comments.Jul 25 2023, 1:23 AM

So I would like to enable this by default in the near term after running a sufficient number of benchmarks on at least AArch64 targets. Having the flag in early though allows other people to test it or make use of it until it's enabled. Since it's under a flag, if we do enable it by default it's easy to turn off again should any problems arise.

With regards to decreasing outer induction variables I think I tried this and the hoisting doesn't happen, but I can double check.

david-arm updated this revision to Diff 545141.Jul 28 2023, 6:54 AM
  • Added support for cases where the stride of the outer loop memory accesses is negative. In such cases we may have to add an extra runtime check that the stride is positive because otherwise the range expansion will be incorrect.
david-arm marked an inline comment as done.Jul 28 2023, 6:58 AM
david-arm added inline comments.

I believe I've now addressed this problem by checking whether the stride is known to be non-negative. If we can't prove that at compile time, we emit an additional runtime check that the stride is positive. The extra check doesn't seem to hurt the x264 benchmark and I still see a nice improvement.
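As a sketch, the extra guard boils down to a runtime sign test on the stride (illustrative helper; in the patch this is an IR comparison emitted as a precondition):

```c
#include <stdbool.h>
#include <stddef.h>

/* The expanded range [base, base + (m - 1)*stride + n) only covers
 * the union of the per-iteration ranges when the stride is
 * non-negative; with a negative stride the lowest address is reached
 * at i = m - 1 and the bounds would be inverted. If the sign cannot
 * be proven at compile time, this test becomes an extra runtime
 * precondition for entering the vector loop. */
static bool stride_check(ptrdiff_t stride) {
  return stride >= 0;
}
```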

paulwalker-arm added inline comments.Aug 3 2023, 10:29 AM

Should this be IsNegativeStride given you're planting A.StrideToCheck < 0?

Perhaps it's worth giving the node a name like "stride.check"?

david-arm updated this revision to Diff 547207.Aug 4 2023, 7:39 AM
  • Renamed IsPositiveStride -> IsNegativeStride and given IR variables more meaningful names.
david-arm marked an inline comment as done.Aug 4 2023, 7:39 AM
paulwalker-arm accepted this revision.Aug 4 2023, 8:28 AM
fhahn added inline comments.Aug 6 2023, 1:08 PM

IIRC for regular runtime checks we create SCEV expressions for the minimum and maximum directly (something like Start = SE->getUMinExpr(Start, End);). Could this be used here as well?

david-arm added inline comments.Aug 21 2023, 7:55 AM

That's a good suggestion. Do you mean something like:

TmpLow = umin(Low, High);
TmpHigh = umax(Low, High);
Low = TmpLow;
High = TmpHigh;

? If so I gave that a try and it comes at a price, since for my simple test case:

void foo(int32_t *dst, int32_t *src, int stride1, int stride2, int m, int n) {
  for (int i = 0; i < m; i++) {
    for (int j = 0; j <= n; j++) {
      dst[(i * stride1) + j] += src[(i * stride2) + j];
    }
  }
}

we end up with 2 extra instructions for the runtime checks in the preheader when building for SVE. For loops with short trip counts like x264 this does matter.
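The min/max normalisation under discussion can be sketched as follows; the two selects are where the extra instructions come from (illustrative helper, not the SCEV-level code):

```c
#include <stdint.h>

/* Normalise the range bounds so *low <= *high regardless of stride
 * direction, at the cost of two select-style operations
 * (umin/umax) per pointer range. */
static void normalize_bounds(uintptr_t *low, uintptr_t *high) {
  uintptr_t lo = (*low < *high) ? *low : *high; /* umin(Low, High) */
  uintptr_t hi = (*low < *high) ? *high : *low; /* umax(Low, High) */
  *low = lo;
  *high = hi;
}
```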

Given this is just an initial patch and likely to be a work in progress are you happy with me landing this version as is for now? We can always revisit this in future if we can figure out a better way of doing this that's just as efficient.

Matt added a subscriber: Matt.Aug 21 2023, 1:02 PM
This revision was landed with ongoing or failed builds.Aug 24 2023, 5:14 AM
This revision was automatically updated to reflect the committed changes.