This is an archive of the discontinued LLVM Phabricator instance.

Differential D119078

[LAA,LV] Add initial support for pointer-diff memory checks.
ClosedPublic

Authored by fhahn on Feb 6 2022, 6:08 AM.

Details

Reviewers

efriedma
reames
Ayal
anemet
dmgreen
lebedev.ri

Commits

rGb7315ffc3c92: [LAA,LV] Add initial support for pointer-diff memory checks.

Summary

This patch adds initial support for a pointer-difference-based runtime check
scheme for vectorization. Where it is applicable, this scheme requires fewer
computations and checks than the existing full overlap checking.

The main idea is to check only whether the source and sink of a dependence are
far enough apart that the accesses won't overlap in the vector loop. To do
so, it is sufficient to compute the difference and compare it to
VF * UF * AccessSize: a backwards dependence in the vector loop with the
given VF and UF is possible only if (Sink - Src) <u VF * UF * AccessSize.
If Src >=u Sink, there is no dependence preventing vectorization, hence the
overflow should not matter and using ULT should be sufficient.
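
As a rough illustration of the comparison described above, here is a minimal C++ sketch on plain integer addresses. The helper name `diffCheckFails` and its parameters are hypothetical and are not part of LLVM; the patch itself builds the check from SCEV expressions and IR, not from raw integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM API): returns true if a source access starting
// at address Src and a later sink access starting at address Sink may overlap
// within one vector iteration covering VF * UF elements.
bool diffCheckFails(std::uint64_t Src, std::uint64_t Sink, std::uint64_t VF,
                    std::uint64_t UF, std::uint64_t AccessSize) {
  assert(VF > 0 && UF > 0 && AccessSize > 0 && "expected non-zero factors");
  // Unsigned subtraction wraps when Src > Sink, yielding a very large value
  // that is never below the threshold, so that case is treated as safe.
  return (Sink - Src) < VF * UF * AccessSize;
}
```

With VF = 4, UF = 2 and 4-byte accesses the threshold is 32 bytes: a sink 16 bytes after the source is flagged, a sink 32 bytes after it is not, and a sink located before the source is never flagged because the difference wraps around. Equal pointers are flagged, which matches the restriction raised in the review discussion.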

Note that the initial version is restricted in multiple ways:

1. Pointers must only either be read or written, by a single instruction (this allows re-constructing source/sink for dependences with the available information).
2. Source and sink pointers must be add-recs, with matching steps.
3. The step must be a constant.
4. abs(step) == AccessSize.

Most of those restrictions can be relaxed in the future.

See https://github.com/llvm/llvm-project/issues/53590.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Feb 6 2022, 6:08 AM

Herald added subscribers: dmgreen, hiraditya. Feb 6 2022, 6:08 AM

fhahn requested review of this revision. Feb 6 2022, 6:08 AM

Herald added a project: Restricted Project. Feb 6 2022, 6:08 AM

Herald added a subscriber: llvm-commits.

lebedev.ri edited the summary of this revision. Feb 6 2022, 6:12 AM

Harbormaster completed remote builds in B147815: Diff 406243. Feb 6 2022, 6:39 AM

peterwaller-arm added a subscriber: peterwaller-arm. Feb 7 2022, 8:38 AM

fhahn mentioned this in rG1049735d0739: [LV] Adjust accesses in test to ensure full RT checks are generated. Feb 7 2022, 10:08 AM

Cleanup, fix failing tests, add comments.

This should now be ready for a first round of reviews.

Harbormaster completed remote builds in B148030: Diff 406517. Feb 7 2022, 12:47 PM

If I'm following this correctly, this is basically doing two things. One, it reduces the minimum distance by basing the required distance on the vector factor, instead of the start/end of the whole loop. Two, it takes advantage of the ordering of different operations: a false dependency is okay as long as you perform the operations in the correct order.

It looks like we end up rejecting at runtime if the two pointers are exactly equal; if I'm following correctly, that's an unnecessary restriction?

Does the code generator actually guarantee that we perform operations in the correct order? Does interleaving matter? Other transforms?

reames added inline comments. Feb 7 2022, 5:05 PM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
310	This bit makes me uncomfortable. The recent direction of changes to SCEV seems to be leading us towards disallowing pointer subtractions involving distinct memory objects. This change very much relies on the historical behavior of doing an implicit inttoptr. See the comment of doing getMinusSCEV. In fact, I think you are accidentally only applying this new logic in the case where we can prove the two accesses share a common base object. If you're okay documenting that restriction, we can punt the pointer_subtract semantics question one step further down the road.
llvm/lib/Transforms/Utils/LoopUtils.cpp
1628	Somewhat an aside, but this is one more place where having a general SCEV node for a predicate would seem useful. In this case, we'd cache the overlap predicate directly, and the emission code wouldn't need to be so closely coupled.

In D119078#3302863, @efriedma wrote:

> If I'm following this correctly, this is basically doing two things. One, it reduces the minimum distance by basing the required distance on the vector factor, instead of the start/end of the whole loop. Two, it takes advantage of the ordering of different operations: a false dependency is okay as long as you perform the operations in the correct order.

That's a great summary, thanks! Those 2 things do not need to be done in a single step/patch. It is also possible (and simpler) to start with 2) - taking advantage of the ordering. I updated the patch to *only* take advantage of the ordering to emit one of the existing checks we generate (instead of 2). (This was also suggested by @Ayal offline)

The latest version of the patch emits the following check for each (src, sink) pair (if they can be classified): there is a conflict if the lower bound of the source < the upper bound of the sink, i.e. if src starts before sink ends.

Changing the checks to use the pointer-difference approach can be done as a follow-up and has the benefit Eli mentioned, as well as removing the requirement to have computable bounds.
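
To make the single-comparison idea concrete, here is a small C++ sketch on plain integer byte ranges. The `AccessRange` struct and both function names are hypothetical; LLVM's actual checks are expanded from SCEV bounds, and this only mirrors the comparison described in the comment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the [Start, End) byte range a pointer covers over
// the whole loop; not an LLVM type.
struct AccessRange {
  std::uint64_t Start;
  std::uint64_t End; // one past the last byte accessed
};

// Full overlap test in the style of the existing checks: two comparisons.
bool fullOverlap(const AccessRange &A, const AccessRange &B) {
  assert(A.Start <= A.End && B.Start <= B.End && "malformed range");
  return A.Start < B.End && B.Start < A.End;
}

// Ordering-aware test as described in the comment: a single comparison,
// "src starts before sink ends".
bool orderedConflict(const AccessRange &Src, const AccessRange &Sink) {
  return Src.Start < Sink.End;
}
```

Note that the single comparison is conservative in one direction: a source range that lies entirely before the sink range (for example [100, 200) versus [300, 400)) is still reported as a conflict, even though the two-comparison test reports no overlap.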

> It looks like we end up rejecting at runtime if the two pointers are exactly equal; if I'm following correctly, that's an unnecessary restriction?

Yes that's an unnecessary restriction of the original patch.

> Does the code generator actually guarantee that we perform operations in the correct order? Does interleaving matter? Other transforms?

The distance also needs to include the interleave count if the vectorizer interleaves the vector loop. I think we also cannot add !noalias metadata with the lightweight checks, but leaving it out should guarantee that other transformations work with the correct aliasing assumptions.

AFAICT we do not rely on the noalias property in any part of the vectorizer code generator at the moment, but there is a patch in flight that does (D110235). We need to be careful there.

fhahn marked an inline comment as done. Feb 8 2022, 12:49 PM

fhahn added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
310	> This bit makes me uncomfortable. The recent direction of changes to SCEV seems to be leading us towards disallowing pointer subtractions involving distinct memory objects. This change very much relies on the historical behavior of doing an implicit inttoptr. See the comment of doing getMinusSCEV.
	Thanks for the heads up, I missed that! The latest version of the patch has the pointer-difference part stripped, but I'll keep that in mind for the follow-up. The only reason to use SCEV here was to take advantage of SCEVExpander for convenience, but the code can also be generated without SCEV.
	> In fact, I think you are accidentally only applying this new logic in the case where we can prove the two accesses share a common base object.
	We should only compute the difference if the bases are may-alias (or the step is not constant). I think `needsChecking` should skip any pointer-group pairs with a common base.
llvm/lib/Transforms/Utils/LoopUtils.cpp
1628	The notion of a predicate in SCEV would indeed be convenient here (and at the other places where we generate runtime checks). It would/should also allow for more convenient checking if the check can be simplified by SCEV.

Harbormaster completed remote builds in B148335: Diff 406940. Feb 8 2022, 4:08 PM

I realized that in the current version the checks are overly restrictive. I'll adjust the change back to the difference check, handle the case when both pointers are equal, and address the original code comments.

bsmith added a subscriber: bsmith. Feb 28 2022, 2:59 AM

fhahn mentioned this in D121008: [MicroBenchmarks] Add benchmarks for vector memory check generation. Mar 4 2022, 9:30 AM

Rebase & ping.

I restored the original diff-check code, with a modification to not create pointer sub expressions using SCEV.

I also put up an initial patch (D121008) that adds a set of microbenchmarks to track the improvements by the patch and guard against future regressions.

Herald added a project: Restricted Project. Mar 4 2022, 11:01 AM

Harbormaster completed remote builds in B152635: Diff 413074. Mar 4 2022, 11:28 AM

dtemirbulatov added a subscriber: dtemirbulatov. Mar 21 2022, 10:21 AM

ping :)

I have a set of MVE routines (that we were using as benchmarks) where this certainly helps a really decent amount.

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
448	-> used to prove
llvm/lib/Analysis/LoopAccessAnalysis.cpp
248	I find it's best not to overuse auto when the type isn't obvious. This is a PointerInfo?
284	Does this assume that both the AllocSize's are the same? If it picks the larger size - I think it would be OK so long as the steps below matched. And the smaller size might need some very strange code to cause problems. It might be worth checking for though.
285	Is this one of the methods removed with opaque pointers?

fhahn mentioned this in rT9d37c04e31a9: [MicroBenchmarks] Add benchmarks for vector memory check generation. Mar 28 2022, 10:21 AM

fhahn mentioned this in rG368d35a89440: [LV] Add addiitonal tests for pointer difference memory checks. Apr 4 2022, 9:59 AM

Address latest comments, thanks!

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
448	Updated, thanks!
llvm/lib/Analysis/LoopAccessAnalysis.cpp
284	> Does this assume that both the AllocSize's are the same?
	Originally yes, but I updated the code to use the max. Added tests to cover those cases in 368d35a89440.
285	Good point, I updated the code to get the stored/loaded type from the actual instructions.

Clean up a bit of code.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
248	Agreed, updated most uses of `auto` that do not use `dyn_cast`.

Harbormaster completed remote builds in B157759: Diff 420224. Apr 4 2022, 10:52 AM

fhahn mentioned this in D109368: [LV] Vectorize cases with larger number of RT checks, execute only if profitable. Apr 5 2022, 9:45 AM

dmgreen added inline comments. Apr 12 2022, 1:17 AM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
282	Is this the same as getLoadStoreType?

xbolva00 added a subscriber: xbolva00. Apr 12 2022, 1:59 AM

Update to use getLoadStoreType.

fhahn marked an inline comment as done. Apr 12 2022, 2:25 AM

fhahn added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
282	Yes, updated, thanks!

Harbormaster completed remote builds in B159175: Diff 422149. Apr 12 2022, 3:38 AM

Thanks. If there are no other comments from anyone, this LGTM

This revision is now accepted and ready to land. Apr 12 2022, 7:58 AM

Thanks, I plan to land this in the next few days, unless there are any further comments :)

Rebased before commit.

Harbormaster completed remote builds in B164620: Diff 429679. May 16 2022, 5:06 AM

This revision was landed with ongoing or failed builds. May 16 2022, 7:27 AM

Closed by commit rGb7315ffc3c92: [LAA,LV] Add initial support for pointer-diff memory checks. (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rGb7315ffc3c92: [LAA,LV] Add initial support for pointer-diff memory checks.

bjope added a subscriber: bjope. May 26 2022, 4:09 AM

bjope added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3090	Hi @fhahn and @dmgreen (and well, anyone else who might happen to read this).
	I've been trying to understand some regressions that we've seen downstream after this patch. It seems to be related to this part, where the new memchecks somehow are weaker(?) in the sense that we can't deduce NoAlias across all iterations (well, that is how I've interpreted the diff here).
	What we've seen happening for a couple of benchmarks is that in our OOT backend some vectorized loops aren't software pipelined any longer. And the SWP scheduler is bailing out since noalias isn't guaranteed. No SWP => quite huge regressions. No idea if this could be a problem for other targets as well.
	So far I haven't figured out what to do downstream in this situation.
	Maybe we should look into the SWP scheduler to see if it can deduce "no alias" in some other way (I'm not sure, but I figure SWP isn't requiring no overlap across all iterations, but depending on how much pipelining it might require no overlap across iteration N and N+1 etc.).
	Maybe we should add some heuristic already in the LoopVectorizer to not use the new kind of memory checks when we think that it would block SWP (an initial heuristic would probably be to use the old kind of checks for our target).
	Here I'm not quite sure about the plans in-tree for this. Are the new memory checks supposed to replace the old checks in the future?
	If anyone has some insight/ideas here, then I'd be happy to read your comments on this.

fhahn marked an inline comment as done. May 26 2022, 6:41 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3090	> It seems to be related to this part, where the new memchecks somehow are weaker(?) in the sense that we can't deduce NoAlias across all iterations (well, that is how I've interpreted the diff here).
	Yes exactly, the cheaper check only rules out dependences for the selected VF * IC, so no alias is not guaranteed for all accesses in the loop.
	> What we've seen happening for a couple of benchmarks is that in our OOT backend some vectorized loops aren't software pipelined any longer. And the SWP scheduler is bailing out since noalias isn't guaranteed. No SWP => quite huge regressions. No idea if this could be a problem for other targets as well.
	That seems like an unfortunate side effect from this patch, but in a way SWP got 'lucky' earlier because the dependence checks by LV were checking more than is required for vectorization. If pipelining is profitable but requires runtime checks, then ideally the software pipeliner would emit them (and replace the LV checks with the stricter checks). If you are talking about `MachinePipeliner`, which runs on machine-functions, this is likely going to be very difficult unfortunately.
	A more crude solution would be to introduce a TTI hook to opt-out of the more lightweight checks. The drawback here is that the backend misses out on cases where the lightweight checks are sufficient because no pipelining is happening.
	If it is enough to rule out aliasing for the pipelined iterations, another option might be for LV to emit the difference checks with a slightly larger distance.
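
The gap between the two kinds of checks can be illustrated with a small C++ sketch on integer addresses. Both helpers and their names are hypothetical, and the sketch assumes a unit-stride loop whose step equals the access size (one of the restrictions listed in the summary); it is not how LLVM emits either check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM API): pointer-difference check. True if the
// accesses may conflict within one vector iteration of VF * IC elements.
bool diffCheckConflict(std::uint64_t Src, std::uint64_t Sink, std::uint64_t VF,
                       std::uint64_t IC, std::uint64_t AccessSize) {
  assert(VF > 0 && IC > 0 && AccessSize > 0 && "expected non-zero factors");
  return (Sink - Src) < VF * IC * AccessSize;
}

// Hypothetical helper (not LLVM API): full-range check. True if the byte
// ranges touched over all TripCount iterations overlap, assuming each
// iteration advances both pointers by AccessSize bytes.
bool fullRangeConflict(std::uint64_t Src, std::uint64_t Sink,
                       std::uint64_t TripCount, std::uint64_t AccessSize) {
  std::uint64_t Bytes = TripCount * AccessSize;
  return Src < Sink + Bytes && Sink < Src + Bytes;
}
```

For example, with Src = 1000, Sink = 1064, VF = 4, IC = 2, 4-byte accesses and 256 iterations, the difference check passes (64 is not below 32) while the full-range check reports an overlap. The vector loop is safe to run, but a later pass cannot conclude that the two pointers never alias across the whole loop, which is the situation described in this exchange.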

uabelho added a subscriber: uabelho. May 30 2022, 1:06 AM

KennethH added a subscriber: KennethH. May 30 2022, 2:43 AM

bjope added inline comments. May 31 2022, 4:31 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3090	> That seems like an unfortunate side effect from this patch, but in a way SWP got 'lucky' earlier because the dependence checks by LV were checking more than is required for vectorization.
	Yes, agree. We've been lucky in the past.
	> If pipelining is profitable but requires runtime checks, then ideally the software pipeliner would emit them (and replace the LV checks with the stricter checks). If you are talking about `MachinePipeliner`, which runs on machine-functions, this is likely going to be very difficult unfortunately.
	We are not using the in-tree MachinePipeliner directly, but we have a software pipelining pass that runs late in the backend (both pre-RA and also later after register allocation). And yes, it is a bit difficult to add stricter checks at that stage (I figure especially without adding a lot more loop versioning for both the vector and scalar loop).
	> A more crude solution would be to introduce a TTI hook to opt-out of the more lightweight checks. The drawback here is that the backend misses out on cases where the lightweight checks are sufficient because no pipelining is happening.
	This is probably what we will aim for (downstream), at least as a starting point to give us some time to analyse the regressions a bit more to see if we can find some heuristics or alternative solutions in the future.
	> If it is enough to rule out aliasing for the pipelined iterations, another option might be for LV to emit the difference checks with a slightly larger distance.
	This is an interesting solution. One problem could be how to set up that distance (I think SWP is trying to pipeline using different distances). A bigger problem is how to analyze the runtime checks that late in the backend to understand for which distances SWP is "safe". Maybe some kind of metadata could be used, but then that metadata must be propagated properly all the way from the vectorizer to the backend.

Revision Contents

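Path	Size
llvm/include/llvm/Analysis/LoopAccessAnalysis.h	53 lines
llvm/include/llvm/Transforms/Utils/LoopUtils.h	7 lines
llvm/lib/Analysis/LoopAccessAnalysis.cpp	98 lines
llvm/lib/Transforms/Utils/LoopUtils.cpp	34 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	42 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll	52 lines
llvm/test/Transforms/LoopVectorize/ARM/mve-qabs.ll	72 lines
llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll	480 lines
llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll	486 lines
llvm/test/Transforms/LoopVectorize/fpsat.ll	40 lines
llvm/test/Transforms/LoopVectorize/multiple-exits-versioning.ll	6 lines
llvm/test/Transforms/LoopVectorize/no_outside_user.ll	4 lines
llvm/test/Transforms/LoopVectorize/runtime-check-readonly.ll	22 lines
llvm/test/Transforms/LoopVectorize/runtime-check-small-clamped-bounds.ll	44 lines
llvm/test/Transforms/LoopVectorize/runtime-check.ll	20 lines
llvm/test/Transforms/LoopVectorize/runtime-checks-difference.ll	64 lines
llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll	16 lines
llvm/test/Transforms/LoopVectorize/tbaa-nodep.ll	14 lines
llvm/test/Transforms/PhaseOrdering/AArch64/hoisting-sinking-required-for-vectorization.ll	27 lines
llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll	40 lines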

Diff 429711

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

@@ (238 lines hidden) @@ DenseMap<Instruction *, unsigned> generateInstructionOrderMap() const {

     return OrderMap;
   }

   /// Find the set of instructions that read or write via \p Ptr.
   SmallVector<Instruction *, 4> getInstructionsForAccess(Value *Ptr,
                                                          bool isWrite) const;

+  /// Return the program order indices for the access location (Ptr, IsWrite).
+  /// Returns an empty ArrayRef if there are no accesses for the location.
+  ArrayRef<unsigned> getOrderForAccess(Value *Ptr, bool IsWrite) const {
+    auto I = Accesses.find({Ptr, IsWrite});
+    if (I != Accesses.end())
+      return I->second;
+    return {};
+  }
+
 private:
   /// A wrapper around ScalarEvolution, used to add runtime SCEV checks, and
   /// applies dynamic knowledge to simplify SCEV expressions and convert them
   /// to a more usable form. We need this in case assumptions about SCEV
   /// expressions need to be made in order to avoid unknown dependences. For
   /// example we might assume a unit stride for a pointer in order to prove
   /// that a memory access is strided and doesn't wrap.
   PredicatedScalarEvolution &PSE;
@@ (99 lines hidden) @@ struct RuntimeCheckingPtrGroup {
   unsigned AddressSpace;
 };

 /// A memcheck which made up of a pair of grouped pointers.
 typedef std::pair<const RuntimeCheckingPtrGroup *,
                   const RuntimeCheckingPtrGroup *>
     RuntimePointerCheck;

+struct PointerDiffInfo {
+  const SCEV *SrcStart;
+  const SCEV *SinkStart;
+  unsigned AccessSize;
+
+  PointerDiffInfo(const SCEV *SrcStart, const SCEV *SinkStart,
+                  unsigned AccessSize)
+      : SrcStart(SrcStart), SinkStart(SinkStart), AccessSize(AccessSize) {}
+};
+
 /// Holds information about the memory runtime legality checks to verify
 /// that a group of pointers do not overlap.
 class RuntimePointerChecking {
   friend struct RuntimeCheckingPtrGroup;

 public:
   struct PointerInfo {
     /// Holds the pointer value that we need to check.
@@ (17 lines hidden) @@ struct PointerInfo {
     PointerInfo(Value *PointerValue, const SCEV *Start, const SCEV *End,
                 bool IsWritePtr, unsigned DependencySetId, unsigned AliasSetId,
                 const SCEV *Expr)
         : PointerValue(PointerValue), Start(Start), End(End),
           IsWritePtr(IsWritePtr), DependencySetId(DependencySetId),
           AliasSetId(AliasSetId), Expr(Expr) {}
   };

-  RuntimePointerChecking(ScalarEvolution *SE) : SE(SE) {}
+  RuntimePointerChecking(MemoryDepChecker &DC, ScalarEvolution *SE)
+      : DC(DC), SE(SE) {}

   /// Reset the state of the pointer runtime information.
   void reset() {
     Need = false;
     Pointers.clear();
     Checks.clear();
   }

@@ (9 lines hidden) @@ public:
   /// No run-time memory checking is necessary.
   bool empty() const { return Pointers.empty(); }

   /// Generate the checks and store it. This also performs the grouping
   /// of pointers to reduce the number of memchecks necessary.
   void generateChecks(MemoryDepChecker::DepCandidates &DepCands,
                       bool UseDependencies);

-  /// Returns the checks that generateChecks created.
+  /// Returns the checks that generateChecks created. They can be used to ensure
+  /// no read/write accesses overlap across all loop iterations.
   const SmallVectorImpl<RuntimePointerCheck> &getChecks() const {
     return Checks;
   }

+  // Returns an optional list of (pointer-difference expressions, access size)
+  // pairs that can be used to prove that there are no vectorization-preventing
   [inline comment] dmgreen: -> used to prove
   [inline comment] fhahn: Updated, thanks!
+  // dependencies at runtime. There are is a vectorization-preventing dependency
+  // if any pointer-difference is <u VF * InterleaveCount * access size. Returns
+  // None if pointer-difference checks cannot be used.
+  Optional<ArrayRef<PointerDiffInfo>> getDiffChecks() const {
+    if (!CanUseDiffCheck)
+      return None;
+    return {DiffChecks};
+  }
+
   /// Decide if we need to add a check between two groups of pointers,
   /// according to needsChecking.
   bool needsChecking(const RuntimeCheckingPtrGroup &M,
                      const RuntimeCheckingPtrGroup &N) const;

   /// Returns the number of run-time checks required according to
   /// needsChecking.
   unsigned getNumberOfChecks() const { return Checks.size(); }
@@ (38 lines hidden) @@ private:
   /// Groups pointers such that a single memcheck is required
   /// between two different groups. This will clear the CheckingGroups vector
   /// and re-compute it. We will only group dependecies if \p UseDependencies
   /// is true, otherwise we will create a separate group for each pointer.
   void groupChecks(MemoryDepChecker::DepCandidates &DepCands,
                    bool UseDependencies);

   /// Generate the checks and return them.
-  SmallVector<RuntimePointerCheck, 4> generateChecks() const;
+  SmallVector<RuntimePointerCheck, 4> generateChecks();

+  /// Try to create add a new (pointer-difference, access size) pair to
+  /// DiffCheck for checking groups \p CGI and \p CGJ. If pointer-difference
+  /// checks cannot be used for the groups, set CanUseDiffCheck to false.
+  void tryToCreateDiffCheck(const RuntimeCheckingPtrGroup &CGI,
+                            const RuntimeCheckingPtrGroup &CGJ);
+
+  MemoryDepChecker &DC;
+
   /// Holds a pointer to the ScalarEvolution analysis.
   ScalarEvolution *SE;

   /// Set of run-time checks required to establish independence of
   /// otherwise may-aliasing pointers in the loop.
   SmallVector<RuntimePointerCheck, 4> Checks;

+  /// Flag indicating if pointer-difference checks can be used
+  bool CanUseDiffCheck = true;
+
+  /// A list of (pointer-difference, access size) pairs that can be used to
+  /// prove that there are no vectorization-preventing dependencies.
+  SmallVector<PointerDiffInfo> DiffChecks;
 };

 /// Drive the analysis of memory accesses in the loop
 ///
 /// This class is responsible for analyzing the memory accesses of a loop. It
 /// collects the accesses and then its main helper the AccessAnalysis class
 /// finds and categorizes the dependences in buildDependenceSets.
 ///
@@ (293 lines hidden) @@

llvm/include/llvm/Transforms/Utils/LoopUtils.h

 //===- llvm/Transforms/Utils/LoopUtils.h - Loop utilities -------*- C++ -*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file defines some loop transformation utilities.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_TRANSFORMS_UTILS_LOOPUTILS_H
 #define LLVM_TRANSFORMS_UTILS_LOOPUTILS_H

 #include "llvm/Analysis/IVDescriptors.h"
+#include "llvm/Analysis/LoopAccessAnalysis.h"
 #include "llvm/Transforms/Utils/ValueMapper.h"

 namespace llvm {

 template <typename T> class DomTreeNodeBase;
 using DomTreeNode = DomTreeNodeBase<BasicBlock>;
 class StringRef;
 class AnalysisUsage;
@@ (469 lines hidden) @@

 /// Add code that checks at runtime if the accessed arrays in \p PointerChecks
 /// overlap. Returns the final comparator value or NULL if no check is needed.
 Value *
 addRuntimeChecks(Instruction *Loc, Loop *TheLoop,
                  const SmallVectorImpl<RuntimePointerCheck> &PointerChecks,
                  SCEVExpander &Expander);

+Value *
+addDiffRuntimeChecks(Instruction *Loc, Loop *TheLoop,
+                     ArrayRef<PointerDiffInfo> Checks, SCEVExpander &Expander,
+                     function_ref<Value *(IRBuilderBase &, unsigned)> GetVF,
+                     unsigned IC);
+
 /// Struct to hold information about a partially invariant condition.
 struct IVConditionInfo {
   /// Instructions that need to be duplicated and checked for the unswitching
   /// condition.
   SmallVector<Instruction *> InstToDuplicate;

   /// Constant to indicate for which value the condition is invariant.
   Constant *KnownValue = nullptr;
@@ (27 lines hidden) @@

llvm/lib/Analysis/LoopAccessAnalysis.cpp

Show First 20 Lines • Show All 226 Lines • ▼ Show 20 Lines	void RuntimePointerChecking::insert(Loop Lp, Value Ptr, const SCEV *PtrExpr,
auto &DL = Lp->getHeader()->getModule()->getDataLayout();		auto &DL = Lp->getHeader()->getModule()->getDataLayout();
Type *IdxTy = DL.getIndexType(Ptr->getType());		Type *IdxTy = DL.getIndexType(Ptr->getType());
const SCEV *EltSizeSCEV = SE->getStoreSizeOfExpr(IdxTy, AccessTy);		const SCEV *EltSizeSCEV = SE->getStoreSizeOfExpr(IdxTy, AccessTy);
ScEnd = SE->getAddExpr(ScEnd, EltSizeSCEV);		ScEnd = SE->getAddExpr(ScEnd, EltSizeSCEV);

Pointers.emplace_back(Ptr, ScStart, ScEnd, WritePtr, DepSetId, ASId, PtrExpr);		Pointers.emplace_back(Ptr, ScStart, ScEnd, WritePtr, DepSetId, ASId, PtrExpr);
}		}

SmallVector<RuntimePointerCheck, 4>		void RuntimePointerChecking::tryToCreateDiffCheck(
RuntimePointerChecking::generateChecks() const {		const RuntimeCheckingPtrGroup &CGI, const RuntimeCheckingPtrGroup &CGJ) {
		if (!CanUseDiffCheck)
		return;

		// If either group contains multiple different pointers, bail out.
		// TODO: Support multiple pointers by using the minimum or maximum pointer,
		// depending on src & sink.
		if (CGI.Members.size() != 1 || CGJ.Members.size() != 1) {
		CanUseDiffCheck = false;
		return;
		}

		PointerInfo *Src = &Pointers[CGI.Members[0]];
		dmgreen [Done]: I find it's best not to overuse auto when the type isn't obvious. This is a PointerInfo?
		fhahn (Author) [Done]: Agreed, updated most uses of `auto` that do not use `dyn_cast`.
		PointerInfo *Sink = &Pointers[CGJ.Members[0]];

		// If either pointer is read and written, multiple checks may be needed. Bail
		// out.
		if (!DC.getOrderForAccess(Src->PointerValue, !Src->IsWritePtr).empty() ||
		!DC.getOrderForAccess(Sink->PointerValue, !Sink->IsWritePtr).empty()) {
		CanUseDiffCheck = false;
		return;
		}

		ArrayRef<unsigned> AccSrc =
		DC.getOrderForAccess(Src->PointerValue, Src->IsWritePtr);
		ArrayRef<unsigned> AccSink =
		DC.getOrderForAccess(Sink->PointerValue, Sink->IsWritePtr);
		// If either pointer is accessed multiple times, there may not be a clear
		// src/sink relation. Bail out for now.
		if (AccSrc.size() != 1 || AccSink.size() != 1) {
		CanUseDiffCheck = false;
		return;
		}
		// If the sink is accessed before src, swap src/sink.
		if (AccSink[0] < AccSrc[0])
		std::swap(Src, Sink);

		auto *SrcAR = dyn_cast<SCEVAddRecExpr>(Src->Expr);
		auto *SinkAR = dyn_cast<SCEVAddRecExpr>(Sink->Expr);
		if (!SrcAR || !SinkAR) {
		CanUseDiffCheck = false;
		return;
		}

		const DataLayout &DL =
		SinkAR->getLoop()->getHeader()->getModule()->getDataLayout();
		SmallVector<Instruction *, 4> SrcInsts =
		dmgreen [Done]: Is this the same as getLoadStoreType?
		fhahn (Author) [Done]: Yes, updated, thanks!
		DC.getInstructionsForAccess(Src->PointerValue, Src->IsWritePtr);
		SmallVector<Instruction *, 4> SinkInsts =
		dmgreen [Done]: Does this assume that both the AllocSize's are the same? If it picks the larger size, I think it would be OK so long as the steps below matched. And the smaller size might need some very strange code to cause problems. It might be worth checking for though.
		fhahn (Author) [Done]:
		> Does this assume that both the AllocSize's are the same?
		Originally yes, but I updated the code to use the max. Added tests to cover those cases in 368d35a89440.
		DC.getInstructionsForAccess(Sink->PointerValue, Sink->IsWritePtr);
		dmgreen [Done]: Is this one of the methods removed with opaque pointers?
		fhahn (Author) [Done]: Good point, I updated the code to get the stored/loaded type from the actual instructions.
		Type *SrcTy = getLoadStoreType(SrcInsts[0]);
		Type *DstTy = getLoadStoreType(SinkInsts[0]);
		if (isa<ScalableVectorType>(SrcTy) || isa<ScalableVectorType>(DstTy))
		return;
		unsigned AllocSize =
		std::max(DL.getTypeAllocSize(SrcTy), DL.getTypeAllocSize(DstTy));
		IntegerType *IntTy =
		IntegerType::get(Src->PointerValue->getContext(),
		DL.getPointerSizeInBits(CGI.AddressSpace));

		// Only matching constant steps matching the AllocSize are supported at the
		// moment. This simplifies the difference computation. Can be extended in the
		// future.
		auto *Step = dyn_cast<SCEVConstant>(SinkAR->getStepRecurrence(*SE));
		if (!Step || Step != SrcAR->getStepRecurrence(*SE) ||
		Step->getAPInt().abs() != AllocSize) {
		CanUseDiffCheck = false;
		return;
		}

		// When counting down, the dependence distance needs to be swapped.
		if (Step->getValue()->isNegative())
		std::swap(SinkAR, SrcAR);

		const SCEV *SinkStartInt = SE->getPtrToIntExpr(SinkAR->getStart(), IntTy);
		reames [Not Done]: This bit makes me uncomfortable. The recent direction of changes to SCEV seems to be leading us towards disallowing pointer subtractions involving distinct memory objects. This change very much relies on the historical behavior of doing an implicit inttoptr. See the comment on getMinusSCEV.
		In fact, I think you are accidentally only applying this new logic in the case where we can prove the two accesses share a common base object. If you're okay documenting that restriction, we can punt the pointer_subtract semantics question one step further down the road.
		fhahn (Author) [Done]:
		> This bit makes me uncomfortable. The recent direction of changes to SCEV seems to be leading us towards disallowing pointer subtractions involving distinct memory objects. This change very much relies on the historical behavior of doing an implicit inttoptr. See the comment on getMinusSCEV.
		Thanks for the heads up, I missed that! The latest version of the patch has the pointer-difference part stripped, but I'll keep that in mind for the follow-up. The only reason to use SCEV here was to take advantage of SCEVExpander for convenience, but the code can also be generated without SCEV.
		> In fact, I think you are accidentally only applying this new logic in the case where we can prove the two accesses share a common base object.
		We should only compute the difference if the bases are may-alias (or the step is not constant). I think `needsChecking` should skip any pointer-group pairs with a common base.
		const SCEV *SrcStartInt = SE->getPtrToIntExpr(SrcAR->getStart(), IntTy);
		if (isa<SCEVCouldNotCompute>(SinkStartInt) ||
		isa<SCEVCouldNotCompute>(SrcStartInt)) {
		CanUseDiffCheck = false;
		return;
		}
		DiffChecks.emplace_back(SrcStartInt, SinkStartInt, AllocSize);
		}
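
The bail-out conditions in `tryToCreateDiffCheck` can be summarized with a small standalone model. This is only a sketch, not LLVM code: `AccessModel`, `DiffCheckModel`, and `modelDiffCheck` are hypothetical names, addresses are plain integers, and the single-member and single-access restrictions are assumed to have been checked already.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for one pointer access; these types do not exist in LLVM.
struct AccessModel {
  uint64_t Start;     // start address as an integer
  int64_t Step;       // constant byte step per iteration
  uint64_t AllocSize; // alloc size of the accessed type
  unsigned Order;     // program order of the access
};

struct DiffCheckModel {
  uint64_t SrcStart;
  uint64_t SinkStart;
  uint64_t AccessSize;
};

// Returns false in the cases where the patch gives up on diff checks.
bool modelDiffCheck(AccessModel Src, AccessModel Sink, DiffCheckModel &Out) {
  // The access that executes first is treated as the source.
  if (Sink.Order < Src.Order)
    std::swap(Src, Sink);

  // Use the larger of the two access sizes.
  uint64_t AllocSize = std::max(Src.AllocSize, Sink.AllocSize);

  // Both steps must be equal and match the access size in magnitude.
  uint64_t AbsStep = Sink.Step < 0 ? 0 - static_cast<uint64_t>(Sink.Step)
                                   : static_cast<uint64_t>(Sink.Step);
  if (Sink.Step != Src.Step || AbsStep != AllocSize)
    return false;

  // When counting down, the two start addresses swap roles.
  if (Sink.Step < 0)
    std::swap(Src, Sink);

  Out = {Src.Start, Sink.Start, AllocSize};
  return true;
}
```

In this model, two accesses with different element sizes but the same step are rejected, because the common step cannot equal the larger of the two sizes unless both sizes match the step.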

		SmallVector<RuntimePointerCheck, 4> RuntimePointerChecking::generateChecks() {
SmallVector<RuntimePointerCheck, 4> Checks;		SmallVector<RuntimePointerCheck, 4> Checks;

for (unsigned I = 0; I < CheckingGroups.size(); ++I) {		for (unsigned I = 0; I < CheckingGroups.size(); ++I) {
for (unsigned J = I + 1; J < CheckingGroups.size(); ++J) {		for (unsigned J = I + 1; J < CheckingGroups.size(); ++J) {
const RuntimeCheckingPtrGroup &CGI = CheckingGroups[I];		const RuntimeCheckingPtrGroup &CGI = CheckingGroups[I];
const RuntimeCheckingPtrGroup &CGJ = CheckingGroups[J];		const RuntimeCheckingPtrGroup &CGJ = CheckingGroups[J];

if (needsChecking(CGI, CGJ))		if (needsChecking(CGI, CGJ)) {
		tryToCreateDiffCheck(CGI, CGJ);
Checks.push_back(std::make_pair(&CGI, &CGJ));		Checks.push_back(std::make_pair(&CGI, &CGJ));
}		}
}		}
		}
return Checks;		return Checks;
}		}

void RuntimePointerChecking::generateChecks(		void RuntimePointerChecking::generateChecks(
MemoryDepChecker::DepCandidates &DepCands, bool UseDependencies) {		MemoryDepChecker::DepCandidates &DepCands, bool UseDependencies) {
assert(Checks.empty() && "Checks is not empty");		assert(Checks.empty() && "Checks is not empty");
groupChecks(DepCands, UseDependencies);		groupChecks(DepCands, UseDependencies);
Checks = generateChecks();		Checks = generateChecks();
▲ Show 20 Lines • Show All 2,070 Lines • ▼ Show 20 Lines	void LoopAccessInfo::collectStridedAccess(Value *MemAccess) {
SymbolicStrides[Ptr] = Stride;		SymbolicStrides[Ptr] = Stride;
StrideSet.insert(Stride);		StrideSet.insert(Stride);
}		}

LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,		LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,
const TargetLibraryInfo *TLI, AAResults *AA,		const TargetLibraryInfo *TLI, AAResults *AA,
DominatorTree *DT, LoopInfo *LI)		DominatorTree *DT, LoopInfo *LI)
: PSE(std::make_unique<PredicatedScalarEvolution>(*SE, *L)),		: PSE(std::make_unique<PredicatedScalarEvolution>(*SE, *L)),
PtrRtChecking(std::make_unique<RuntimePointerChecking>(SE)),		PtrRtChecking(nullptr),
DepChecker(std::make_unique<MemoryDepChecker>(*PSE, L)), TheLoop(L) {		DepChecker(std::make_unique<MemoryDepChecker>(*PSE, L)), TheLoop(L) {
if (canAnalyzeLoop())		PtrRtChecking = std::make_unique<RuntimePointerChecking>(*DepChecker, SE);
		if (canAnalyzeLoop()) {
analyzeLoop(AA, LI, TLI, DT);		analyzeLoop(AA, LI, TLI, DT);
}		}
		}

void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {		void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {
if (CanVecMem) {		if (CanVecMem) {
OS.indent(Depth) << "Memory dependences are safe";		OS.indent(Depth) << "Memory dependences are safe";
if (MaxSafeDepDistBytes != -1ULL)		if (MaxSafeDepDistBytes != -1ULL)
OS << " with a maximum dependence distance of " << MaxSafeDepDistBytes		OS << " with a maximum dependence distance of " << MaxSafeDepDistBytes
<< " bytes";		<< " bytes";
if (PtrRtChecking->Need)		if (PtrRtChecking->Need)
▲ Show 20 Lines • Show All 105 Lines • Show Last 20 Lines

llvm/lib/Transforms/Utils/LoopUtils.cpp

Show First 20 Lines • Show All 1,602 Lines • ▼ Show 20 Lines	if (MemoryRuntimeCheck) {
ChkBuilder.CreateOr(MemoryRuntimeCheck, IsConflict, "conflict.rdx");		ChkBuilder.CreateOr(MemoryRuntimeCheck, IsConflict, "conflict.rdx");
}		}
MemoryRuntimeCheck = IsConflict;		MemoryRuntimeCheck = IsConflict;
}		}

return MemoryRuntimeCheck;		return MemoryRuntimeCheck;
}		}

		Value *llvm::addDiffRuntimeChecks(
		Instruction *Loc, Loop *TheLoop, ArrayRef<PointerDiffInfo> Checks,
		SCEVExpander &Expander,
		function_ref<Value *(IRBuilderBase &, unsigned)> GetVF, unsigned IC) {

		LLVMContext &Ctx = Loc->getContext();
		IRBuilder<InstSimplifyFolder> ChkBuilder(Ctx,
		Loc->getModule()->getDataLayout());
		ChkBuilder.SetInsertPoint(Loc);
		// Our instructions might fold to a constant.
		Value *MemoryRuntimeCheck = nullptr;

		for (auto &C : Checks) {
		Type *Ty = C.SinkStart->getType();
		// Compute VF * IC * AccessSize.
		auto *VFTimesUFTimesSize =
		ChkBuilder.CreateMul(GetVF(ChkBuilder, Ty->getScalarSizeInBits()),
		ConstantInt::get(Ty, IC * C.AccessSize));
		reames [Done]: Somewhat an aside, but this is one more place where having a general SCEV note for a predicate would seem useful. In this case, we'd cache the overlap predicate directly, and the emission code wouldn't need to be so closely coupled.
		fhahn (Author) [Done]: The notion of a predicate in SCEV would indeed be convenient here (and at the other places where we generate runtime checks). It would/should also allow for more convenient checking if the check can be simplified by SCEV.
		Value *Sink = Expander.expandCodeFor(C.SinkStart, Ty, Loc);
		Value *Src = Expander.expandCodeFor(C.SrcStart, Ty, Loc);
		Value *Diff = ChkBuilder.CreateSub(Sink, Src);
		Value *IsConflict =
		ChkBuilder.CreateICmpULT(Diff, VFTimesUFTimesSize, "diff.check");

		if (MemoryRuntimeCheck) {
		IsConflict =
		ChkBuilder.CreateOr(MemoryRuntimeCheck, IsConflict, "conflict.rdx");
		}
		MemoryRuntimeCheck = IsConflict;
		}

		return MemoryRuntimeCheck;
		}
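
As a rough illustration of the comparison that `addDiffRuntimeChecks` emits, the following sketch evaluates one check on plain integers. `diffCheckConflicts` is a hypothetical helper, not an LLVM function, and it assumes a fixed (non-scalable) VF so that `VF * IC * AccessSize` is a compile-time-style constant.

```cpp
#include <cstdint>

// Hypothetical scalar model of a single pointer-diff check (not LLVM API).
// Returns true when the vector loop has to be bypassed.
bool diffCheckConflicts(uint64_t SrcStart, uint64_t SinkStart, uint64_t VF,
                        uint64_t IC, uint64_t AccessSize) {
  // Unsigned subtraction wraps around when SrcStart > SinkStart, which
  // yields a large value and therefore normally no conflict.
  uint64_t Diff = SinkStart - SrcStart;
  return Diff < VF * IC * AccessSize;
}
```

With VF = 4, IC = 2 and 4-byte accesses the threshold is 32 bytes: a sink 16 bytes after the source is flagged, a sink 32 bytes after it is not, and a sink located before the source is not flagged because the difference wraps.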

Optional<IVConditionInfo> llvm::hasPartialIVCondition(Loop &L,		Optional<IVConditionInfo> llvm::hasPartialIVCondition(Loop &L,
unsigned MSSAThreshold,		unsigned MSSAThreshold,
MemorySSA &MSSA,		MemorySSA &MSSA,
AAResults &AA) {		AAResults &AA) {
auto *TI = dyn_cast<BranchInst>(L.getHeader()->getTerminator());		auto *TI = dyn_cast<BranchInst>(L.getHeader()->getTerminator());
if (!TI || !TI->isConditional())		if (!TI || !TI->isConditional())
return {};		return {};

▲ Show 20 Lines • Show All 165 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


Show First 20 Lines • Show All 1,964 Lines • ▼ Show 20 Lines	GeneratedRTChecks(ScalarEvolution &SE, DominatorTree *DT, LoopInfo *LI,
MemCheckExp(SE, DL, "scev.check") {}		MemCheckExp(SE, DL, "scev.check") {}

/// Generate runtime checks in SCEVCheckBlock and MemCheckBlock, so we can		/// Generate runtime checks in SCEVCheckBlock and MemCheckBlock, so we can
/// accurately estimate the cost of the runtime checks. The blocks are		/// accurately estimate the cost of the runtime checks. The blocks are
/// un-linked from the IR and is added back during vector code generation. If		/// un-linked from the IR and is added back during vector code generation. If
/// there is no vector code generation, the check blocks are removed		/// there is no vector code generation, the check blocks are removed
/// completely.		/// completely.
void Create(Loop *L, const LoopAccessInfo &LAI,		void Create(Loop *L, const LoopAccessInfo &LAI,
const SCEVPredicate &Pred) {		const SCEVPredicate &UnionPred, ElementCount VF, unsigned IC) {

BasicBlock *LoopHeader = L->getHeader();		BasicBlock *LoopHeader = L->getHeader();
BasicBlock *Preheader = L->getLoopPreheader();		BasicBlock *Preheader = L->getLoopPreheader();

// Use SplitBlock to create blocks for SCEV & memory runtime checks to		// Use SplitBlock to create blocks for SCEV & memory runtime checks to
// ensure the blocks are properly added to LoopInfo & DominatorTree. Those		// ensure the blocks are properly added to LoopInfo & DominatorTree. Those
// may be used by SCEVExpander. The blocks will be un-linked from their		// may be used by SCEVExpander. The blocks will be un-linked from their
// predecessors and removed from LI & DT at the end of the function.		// predecessors and removed from LI & DT at the end of the function.
if (!Pred.isAlwaysTrue()) {		if (!UnionPred.isAlwaysTrue()) {
SCEVCheckBlock = SplitBlock(Preheader, Preheader->getTerminator(), DT, LI,		SCEVCheckBlock = SplitBlock(Preheader, Preheader->getTerminator(), DT, LI,
nullptr, "vector.scevcheck");		nullptr, "vector.scevcheck");

SCEVCheckCond = SCEVExp.expandCodeForPredicate(		SCEVCheckCond = SCEVExp.expandCodeForPredicate(
&Pred, SCEVCheckBlock->getTerminator());		&UnionPred, SCEVCheckBlock->getTerminator());
}		}

const auto &RtPtrChecking = *LAI.getRuntimePointerChecking();		const auto &RtPtrChecking = *LAI.getRuntimePointerChecking();
if (RtPtrChecking.Need) {		if (RtPtrChecking.Need) {
auto *Pred = SCEVCheckBlock ? SCEVCheckBlock : Preheader;		auto *Pred = SCEVCheckBlock ? SCEVCheckBlock : Preheader;
MemCheckBlock = SplitBlock(Pred, Pred->getTerminator(), DT, LI, nullptr,		MemCheckBlock = SplitBlock(Pred, Pred->getTerminator(), DT, LI, nullptr,
"vector.memcheck");		"vector.memcheck");

		auto DiffChecks = RtPtrChecking.getDiffChecks();
		if (DiffChecks) {
		MemRuntimeCheckCond = addDiffRuntimeChecks(
		MemCheckBlock->getTerminator(), L, *DiffChecks, MemCheckExp,
		[VF](IRBuilderBase &B, unsigned Bits) {
		return getRuntimeVF(B, B.getIntNTy(Bits), VF);
		},
		IC);
		} else {
MemRuntimeCheckCond =		MemRuntimeCheckCond =
addRuntimeChecks(MemCheckBlock->getTerminator(), L,		addRuntimeChecks(MemCheckBlock->getTerminator(), L,
RtPtrChecking.getChecks(), MemCheckExp);		RtPtrChecking.getChecks(), MemCheckExp);
		}
assert(MemRuntimeCheckCond &&		assert(MemRuntimeCheckCond &&
"no RT checks generated although RtPtrChecking "		"no RT checks generated although RtPtrChecking "
"claimed checks are required");		"claimed checks are required");
}		}

if (!MemCheckBlock && !SCEVCheckBlock)		if (!MemCheckBlock && !SCEVCheckBlock)
return;		return;

▲ Show 20 Lines • Show All 1,063 Lines • ▼ Show 20 Lines	ORE->emit([&]() {
"(e.g., adding 'restrict').";		"(e.g., adding 'restrict').";
});		});
}		}

LoopBypassBlocks.push_back(MemCheckBlock);		LoopBypassBlocks.push_back(MemCheckBlock);

AddedSafetyChecks = true;		AddedSafetyChecks = true;

		// Only use noalias metadata when using memory checks guaranteeing no overlap
		// across all iterations.
		if (!Legal->getLAI()->getRuntimePointerChecking()->getDiffChecks()) {
		bjope [Not Done]: Hi @fhahn and @dmgreen (and well, anyone else who might happen to read this).
		I've been trying to understand some regressions that we've seen downstream after this patch. It seems to be related to this part, where the new memchecks somehow are weaker(?) in the sense that we can't deduce NoAlias across all iterations (well, that is how I've interpreted the diff here).
		What we've seen happening for a couple of benchmarks is that in our OOT backend some vectorized loops aren't software pipelined any longer. And the SWP scheduler is bailing out since noalias isn't guaranteed. No SWP => quite huge regressions. No idea if this could be a problem for other targets as well.
		So far I haven't figured out what to do downstream in this situation.
		Maybe we should look into the SWP scheduler to see if it can deduce "no alias" in some other way (I'm not sure, but I figure SWP isn't requiring no overlap across all iterations, but depending on how much pipelining it might require no overlap across iteration N and N+1 etc.).
		Maybe we should add some heuristic already in the LoopVectorizer to not use the new kind of memory checks when we think that it would block SWP (an initial heuristic would probably be to use the old kind of checks for our target). Here I'm not quite sure about the plans in-tree for this. Are the new memory checks supposed to replace the old checks in the future?
		If anyone has some insight/ideas here, then I'd be happy to read your comments on this.
		fhahn (Author) [Done]:
		> It seems to be related to this part, where the new memchecks somehow are weaker(?) in the sense that we can't deduce NoAlias across all iterations (well, that is how I've interpreted the diff here).
		Yes exactly, the cheaper check only rules out dependences for the selected VF * IC, so no alias is not guaranteed for all accesses in the loop.
		> What we've seen happening for a couple of benchmarks is that in our OOT backend some vectorized loops aren't software pipelined any longer. And the SWP scheduler is bailing out since noalias isn't guaranteed. No SWP => quite huge regressions. No idea if this could be a problem for other targets as well.
		That seems like an unfortunate side effect from this patch, but in a way SWP got 'lucky' earlier because the dependence checks by LV were checking more than is required for vectorization.
		If pipelining is profitable but requires runtime checks, then ideally the software pipeliner would emit them (and replace the LV checks with the stricter checks). If you are talking about `MachinePipeliner`, which runs on machine-functions, this is likely going to be very difficult unfortunately.
		A more crude solution would be to introduce a TTI hook to opt-out of the more lightweight checks. The drawback here is that the backend misses out on cases where the lightweight checks are sufficient because no pipelining is happening.
		If it is enough to rule out aliasing for the pipelined iterations another option might be for LV to emit the difference checks with a slightly larger distance.
		bjope [Not Done]:
		> That seems like an unfortunate side effect from this patch, but in a way SWP got 'lucky' earlier because the dependence checks by LV were checking more than is required for vectorization.
		Yes, agree. We've been lucky in the past.
		> If pipelining is profitable but requires runtime checks, then ideally the software pipeliner would emit them (and replace the LV checks with the stricter checks). If you are talking about `MachinePipeliner`, which runs on machine-functions, this is likely going to be very difficult unfortunately.
		We are not using the in-tree MachinePipeliner directly, but we have a software pipelining pass that runs late in the backend (both pre-RA and also later after register allocation). And yes, it is a bit difficult to add stricter checks at that stage (I figure especially without adding a lot more loop versioning for both the vector and scalar loop).
		> A more crude solution would be to introduce a TTI hook to opt-out of the more lightweight checks. The drawback here is that the backend misses out on cases where the lightweight checks are sufficient because no pipelining is happening.
		This is probably what we will aim for (downstream), at least as a starting point to give us some time to analyse the regressions a bit more to see if we can find some heuristics or alternative solutions in the future.
		> If it is enough to rule out aliasing for the pipelined iterations another option might be for LV to emit the difference checks with a slightly larger distance.
		This is an interesting solution. One problem could be how to set up that distance (I think SWP is trying to pipeline using different distances). A bigger problem is how to analyze the runtime checks that late in the backend to understand for which distances SWP is "safe". Maybe some kind of metadata could be used, but then that metadata must be propagated properly all the way from the vectorizer to the backend.
// We currently don't use LoopVersioning for the actual loop cloning but we		// We currently don't use LoopVersioning for the actual loop cloning but we
// still use it to add the noalias metadata.		// still use it to add the noalias metadata.
LVer = std::make_unique<LoopVersioning>(		LVer = std::make_unique<LoopVersioning>(
*Legal->getLAI(),		*Legal->getLAI(),
Legal->getLAI()->getRuntimePointerChecking()->getChecks(), OrigLoop, LI,		Legal->getLAI()->getRuntimePointerChecking()->getChecks(), OrigLoop, LI,
DT, PSE.getSE());		DT, PSE.getSE());
LVer->prepareNoAliasMetadata();		LVer->prepareNoAliasMetadata();
		}
return MemCheckBlock;		return MemCheckBlock;
}		}
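
The reason noalias metadata is skipped for diff checks can be seen by comparing the two kinds of check on the same addresses. The sketch below uses hypothetical helpers (`rangesOverlap`, `diffTooSmall`) on integer addresses; it is a simplified model of the two schemes, not the code LLVM emits.

```cpp
#include <cstdint>

// Hypothetical model of the full overlap check on the half-open byte
// ranges [AStart, AEnd) and [BStart, BEnd) accessed by the whole loop.
bool rangesOverlap(uint64_t AStart, uint64_t AEnd, uint64_t BStart,
                   uint64_t BEnd) {
  return AStart < BEnd && BStart < AEnd;
}

// Hypothetical model of the diff check: only the distance between the
// start addresses is compared against VF * IC * AccessSize.
bool diffTooSmall(uint64_t SrcStart, uint64_t SinkStart,
                  uint64_t VFTimesICTimesSize) {
  return SinkStart - SrcStart < VFTimesICTimesSize;
}
```

Take 100 iterations of 4-byte accesses with the source starting at 1000 and the sink at 1064, and VF * IC * AccessSize = 32. The diff check passes (64 is not below 32), so the vector loop runs, yet the full ranges [1000, 1400) and [1064, 1464) overlap. The accesses may therefore alias across iterations that are further apart, which is why noalias metadata would be unsound in that configuration.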

void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {		void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
LoopScalarBody = OrigLoop->getHeader();		LoopScalarBody = OrigLoop->getHeader();
LoopVectorPreHeader = OrigLoop->getLoopPreheader();		LoopVectorPreHeader = OrigLoop->getLoopPreheader();
assert(LoopVectorPreHeader && "Invalid loop structure");		assert(LoopVectorPreHeader && "Invalid loop structure");
LoopExitBlock = OrigLoop->getUniqueExitBlock(); // may be nullptr		LoopExitBlock = OrigLoop->getUniqueExitBlock(); // may be nullptr
▲ Show 20 Lines • Show All 7,472 Lines • ▼ Show 20 Lines	#endif /* NDEBUG */
MDNode *OrigLoopID = L->getLoopID();		MDNode *OrigLoopID = L->getLoopID();
{		{
// Optimistically generate runtime checks. Drop them if they turn out to not		// Optimistically generate runtime checks. Drop them if they turn out to not
// be profitable. Limit the scope of Checks, so the cleanup happens		// be profitable. Limit the scope of Checks, so the cleanup happens
// immediately after vector codegeneration is done.		// immediately after vector codegeneration is done.
GeneratedRTChecks Checks(*PSE.getSE(), DT, LI,		GeneratedRTChecks Checks(*PSE.getSE(), DT, LI,
F->getParent()->getDataLayout());		F->getParent()->getDataLayout());
if (!VF.Width.isScalar() || IC > 1)		if (!VF.Width.isScalar() || IC > 1)
Checks.Create(L, *LVL.getLAI(), PSE.getPredicate());		Checks.Create(L, *LVL.getLAI(), PSE.getPredicate(), VF.Width, IC);

using namespace ore;		using namespace ore;
if (!VectorizeLoop) {		if (!VectorizeLoop) {
assert(IC > 1 && "interleave count should not be 1 or 0");		assert(IC > 1 && "interleave count should not be 1 or 0");
// If we decided that it is not legal to vectorize the loop, then		// If we decided that it is not legal to vectorize the loop, then
// interleave it.		// interleave it.
InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, AC, ORE, IC, &LVL,		InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, AC, ORE, IC, &LVL,
&CM, BFI, PSI, Checks);		&CM, BFI, PSI, Checks);
▲ Show 20 Lines • Show All 241 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; This is the loop in c++ being vectorize in this file with		; This is the loop in c++ being vectorize in this file with
;experimental.vector.reverse		;experimental.vector.reverse
; #pragma clang loop vectorize_width(8, scalable)		; #pragma clang loop vectorize_width(8, scalable)
; for (int i = N-1; i >= 0; --i)		; for (int i = N-1; i >= 0; --i)
; a[i] = b[i] + 1.0;		; a[i] = b[i] + 1.0;

; RUN: opt -loop-vectorize -dce -instcombine -mtriple aarch64-linux-gnu -S < %s | FileCheck %s		; RUN: opt -loop-vectorize -dce -instcombine -mtriple aarch64-linux-gnu -S < %s | FileCheck %s

define void @vector_reverse_f64(i64 %N, double* %a, double* %b) #0{		define void @vector_reverse_f64(i64 %N, double* %a, double* %b) #0{
; CHECK-LABEL: @vector_reverse_f64(		; CHECK-LABEL: @vector_reverse_f64(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[A2:%.*]] = ptrtoint double* [[A:%.*]] to i64
		; CHECK-NEXT: [[B1:%.*]] = ptrtoint double* [[B:%.*]] to i64
; CHECK-NEXT: [[CMP7:%.*]] = icmp sgt i64 [[N:%.*]], 0		; CHECK-NEXT: [[CMP7:%.*]] = icmp sgt i64 [[N:%.*]], 0
; CHECK-NEXT: br i1 [[CMP7]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]		; CHECK-NEXT: br i1 [[CMP7]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
; CHECK: for.body.preheader:		; CHECK: for.body.preheader:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()		; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[TMP0]], 3		; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[TMP0]], 3
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ugt i64 [[TMP1]], [[N]]		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ugt i64 [[TMP1]], [[N]]
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; CHECK: vector.memcheck:		; CHECK: vector.memcheck:
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr double, double* [[A:%.*]], i64 [[N]]
; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr double, double* [[B:%.*]], i64 [[N]]
; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt double* [[SCEVGEP4]], [[A]]
; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt double* [[SCEVGEP]], [[B]]
; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()		; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[TMP2]], 3		; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[TMP2]], 6
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = shl i64 [[N]], 3
		; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[TMP4]], [[B1]]
		; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[TMP4]], [[A2]]
		; CHECK-NEXT: [[TMP7:%.*]] = sub i64 [[TMP5]], [[TMP6]]
		; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP7]], [[TMP3]]
		; CHECK-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
		; CHECK: vector.ph:
		; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
		; CHECK-NEXT: [[TMP9:%.*]] = shl i64 [[TMP8]], 3
		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP9]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]		; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP4:%.*]] = xor i64 [[INDEX]], -1		; CHECK-NEXT: [[TMP4:%.*]] = xor i64 [[INDEX]], -1
; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[TMP4]], [[N]]		; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[TMP4]], [[N]]
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[TMP5]]		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vscale.i32()		; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vscale.i32()
; CHECK-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP7]], -8		; CHECK-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP7]], -8
; CHECK-NEXT: [[TMP8:%.*]] = or i32 [[DOTNEG]], 1		; CHECK-NEXT: [[TMP8:%.*]] = or i32 [[DOTNEG]], 1
; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64		; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64
; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds double, double* [[TMP6]], i64 [[TMP9]]		; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds double, double* [[TMP6]], i64 [[TMP9]]
; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[TMP10]] to <vscale x 8 x double>*		; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[TMP10]] to <vscale x 8 x double>*
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x double>, <vscale x 8 x double>* [[TMP11]], align 8, !alias.scope !0		; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x double>, <vscale x 8 x double>* [[TMP11]], align 8
; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[TMP5]]		; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[TMP5]]
; CHECK-NEXT: [[TMP13:%.*]] = fadd <vscale x 8 x double> [[WIDE_LOAD]], shufflevector (<vscale x 8 x double> insertelement (<vscale x 8 x double> poison, double 1.000000e+00, i32 0), <vscale x 8 x double> poison, <vscale x 8 x i32> zeroinitializer)		; CHECK-NEXT: [[TMP13:%.*]] = fadd <vscale x 8 x double> [[WIDE_LOAD]], shufflevector (<vscale x 8 x double> insertelement (<vscale x 8 x double> poison, double 1.000000e+00, i32 0), <vscale x 8 x double> poison, <vscale x 8 x i32> zeroinitializer)
; CHECK-NEXT: [[TMP14:%.*]] = call i32 @llvm.vscale.i32()		; CHECK-NEXT: [[TMP14:%.*]] = call i32 @llvm.vscale.i32()
; CHECK-NEXT: [[DOTNEG7:%.*]] = mul i32 [[TMP14]], -8		; CHECK-NEXT: [[DOTNEG7:%.*]] = mul i32 [[TMP14]], -8
; CHECK-NEXT: [[TMP15:%.*]] = or i32 [[DOTNEG7]], 1		; CHECK-NEXT: [[TMP15:%.*]] = or i32 [[DOTNEG7]], 1
; CHECK-NEXT: [[TMP16:%.*]] = sext i32 [[TMP15]] to i64		; CHECK-NEXT: [[TMP16:%.*]] = sext i32 [[TMP15]] to i64
; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds double, double* [[TMP12]], i64 [[TMP16]]		; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds double, double* [[TMP12]], i64 [[TMP16]]
; CHECK-NEXT: [[TMP18:%.*]] = bitcast double* [[TMP17]] to <vscale x 8 x double>*		; CHECK-NEXT: [[TMP18:%.*]] = bitcast double* [[TMP17]] to <vscale x 8 x double>*
; CHECK-NEXT: store <vscale x 8 x double> [[TMP13]], <vscale x 8 x double>* [[TMP18]], align 8, !alias.scope !3, !noalias !0		; CHECK-NEXT: store <vscale x 8 x double> [[TMP13]], <vscale x 8 x double>* [[TMP18]], align 8
; CHECK-NEXT: [[TMP19:%.*]] = call i64 @llvm.vscale.i64()		; CHECK-NEXT: [[TMP19:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP20:%.*]] = shl i64 [[TMP19]], 3		; CHECK-NEXT: [[TMP20:%.*]] = shl i64 [[TMP19]], 3
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP20]]		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP20]]
; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]		; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]		; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
for.body: ; preds = %entry, %for.body
%cmp = icmp sgt i64 %i.08.in, 1		%cmp = icmp sgt i64 %i.08.in, 1
br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !0		br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !0
}		}


define void @vector_reverse_i64(i64 %N, i64* %a, i64* %b) #0 {		define void @vector_reverse_i64(i64 %N, i64* %a, i64* %b) #0 {
; CHECK-LABEL: @vector_reverse_i64(		; CHECK-LABEL: @vector_reverse_i64(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[A2:%.*]] = ptrtoint i64* [[A:%.*]] to i64
		; CHECK-NEXT: [[B1:%.*]] = ptrtoint i64* [[B:%.*]] to i64
; CHECK-NEXT: [[CMP8:%.*]] = icmp sgt i64 [[N:%.*]], 0		; CHECK-NEXT: [[CMP8:%.*]] = icmp sgt i64 [[N:%.*]], 0
; CHECK-NEXT: br i1 [[CMP8]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]		; CHECK-NEXT: br i1 [[CMP8]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
; CHECK: for.body.preheader:		; CHECK: for.body.preheader:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()		; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[TMP0]], 3		; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[TMP0]], 3
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ugt i64 [[TMP1]], [[N]]		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ugt i64 [[TMP1]], [[N]]
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; CHECK: vector.memcheck:		; CHECK: vector.memcheck:
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i64, i64* [[A:%.*]], i64 [[N]]
; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i64, i64* [[B:%.*]], i64 [[N]]
; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt i64* [[SCEVGEP4]], [[A]]
; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt i64* [[SCEVGEP]], [[B]]
; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()		; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[TMP2]], 3		; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[TMP2]], 6
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = shl i64 [[N]], 3
		; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[TMP4]], [[B1]]
		; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[TMP4]], [[A2]]
		; CHECK-NEXT: [[TMP7:%.*]] = sub i64 [[TMP5]], [[TMP6]]
		; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP7]], [[TMP3]]
		; CHECK-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
		; CHECK: vector.ph:
		; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
		; CHECK-NEXT: [[TMP9:%.*]] = shl i64 [[TMP8]], 3
		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP9]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]		; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP4:%.*]] = xor i64 [[INDEX]], -1		; CHECK-NEXT: [[TMP4:%.*]] = xor i64 [[INDEX]], -1
; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[TMP4]], [[N]]		; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[TMP4]], [[N]]
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, i64* [[B]], i64 [[TMP5]]		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, i64* [[B]], i64 [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vscale.i32()		; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vscale.i32()
; CHECK-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP7]], -8		; CHECK-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP7]], -8
; CHECK-NEXT: [[TMP8:%.*]] = or i32 [[DOTNEG]], 1		; CHECK-NEXT: [[TMP8:%.*]] = or i32 [[DOTNEG]], 1
; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64		; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64
; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, i64* [[TMP6]], i64 [[TMP9]]		; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, i64* [[TMP6]], i64 [[TMP9]]
; CHECK-NEXT: [[TMP11:%.*]] = bitcast i64* [[TMP10]] to <vscale x 8 x i64>*		; CHECK-NEXT: [[TMP11:%.*]] = bitcast i64* [[TMP10]] to <vscale x 8 x i64>*
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x i64>, <vscale x 8 x i64>* [[TMP11]], align 8, !alias.scope !9		; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x i64>, <vscale x 8 x i64>* [[TMP11]], align 8
; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 [[TMP5]]		; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 [[TMP5]]
; CHECK-NEXT: [[TMP13:%.*]] = add <vscale x 8 x i64> [[WIDE_LOAD]], shufflevector (<vscale x 8 x i64> insertelement (<vscale x 8 x i64> poison, i64 1, i32 0), <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer)		; CHECK-NEXT: [[TMP13:%.*]] = add <vscale x 8 x i64> [[WIDE_LOAD]], shufflevector (<vscale x 8 x i64> insertelement (<vscale x 8 x i64> poison, i64 1, i32 0), <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer)
; CHECK-NEXT: [[TMP14:%.*]] = call i32 @llvm.vscale.i32()		; CHECK-NEXT: [[TMP14:%.*]] = call i32 @llvm.vscale.i32()
; CHECK-NEXT: [[DOTNEG7:%.*]] = mul i32 [[TMP14]], -8		; CHECK-NEXT: [[DOTNEG7:%.*]] = mul i32 [[TMP14]], -8
; CHECK-NEXT: [[TMP15:%.*]] = or i32 [[DOTNEG7]], 1		; CHECK-NEXT: [[TMP15:%.*]] = or i32 [[DOTNEG7]], 1
; CHECK-NEXT: [[TMP16:%.*]] = sext i32 [[TMP15]] to i64		; CHECK-NEXT: [[TMP16:%.*]] = sext i32 [[TMP15]] to i64
; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i64, i64* [[TMP12]], i64 [[TMP16]]		; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i64, i64* [[TMP12]], i64 [[TMP16]]
; CHECK-NEXT: [[TMP18:%.*]] = bitcast i64* [[TMP17]] to <vscale x 8 x i64>*		; CHECK-NEXT: [[TMP18:%.*]] = bitcast i64* [[TMP17]] to <vscale x 8 x i64>*
; CHECK-NEXT: store <vscale x 8 x i64> [[TMP13]], <vscale x 8 x i64>* [[TMP18]], align 8, !alias.scope !12, !noalias !9		; CHECK-NEXT: store <vscale x 8 x i64> [[TMP13]], <vscale x 8 x i64>* [[TMP18]], align 8
; CHECK-NEXT: [[TMP19:%.*]] = call i64 @llvm.vscale.i64()		; CHECK-NEXT: [[TMP19:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP20:%.*]] = shl i64 [[TMP19]], 3		; CHECK-NEXT: [[TMP20:%.*]] = shl i64 [[TMP19]], 3
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP20]]		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP20]]
; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]		; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]		; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]

llvm/test/Transforms/LoopVectorize/ARM/mve-qabs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -loop-vectorize -instcombine -simplifycfg -simplifycfg-require-and-preserve-domtree=1 < %s -S -o - | FileCheck %s			; RUN: opt -loop-vectorize -instcombine -simplifycfg -simplifycfg-require-and-preserve-domtree=1 < %s -S -o - | FileCheck %s

	target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"			target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
	target triple = "thumbv8.1m.main-arm-none-eabi"			target triple = "thumbv8.1m.main-arm-none-eabi"

	define void @arm_abs_q7(i8* nocapture readonly %pSrc, i8* nocapture %pDst, i32 %blockSize) #0 {			define void @arm_abs_q7(i8* nocapture readonly %pSrc, i8* nocapture %pDst, i32 %blockSize) #0 {
	; CHECK-LABEL: @arm_abs_q7(			; CHECK-LABEL: @arm_abs_q7(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[PSRC2:%.*]] = ptrtoint i8* [[PSRC:%.*]] to i32
				; CHECK-NEXT: [[PDST1:%.*]] = ptrtoint i8* [[PDST:%.*]] to i32
	; CHECK-NEXT: [[CMP_NOT19:%.*]] = icmp eq i32 [[BLOCKSIZE:%.*]], 0			; CHECK-NEXT: [[CMP_NOT19:%.*]] = icmp eq i32 [[BLOCKSIZE:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP_NOT19]], label [[WHILE_END:%.*]], label [[WHILE_BODY_PREHEADER:%.*]]			; CHECK-NEXT: br i1 [[CMP_NOT19]], label [[WHILE_END:%.*]], label [[WHILE_BODY_PREHEADER:%.*]]
	; CHECK: while.body.preheader:			; CHECK: while.body.preheader:
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[BLOCKSIZE]], 16			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[BLOCKSIZE]], 16
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; CHECK-NEXT: [[TMP0:%.*]] = sub i32 [[PDST1]], [[PSRC2]]
	; CHECK: vector.memcheck:			; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i32 [[TMP0]], 16
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, i8* [[PDST:%.*]], i32 [[BLOCKSIZE]]			; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[MIN_ITERS_CHECK]], i1 true, i1 [[DIFF_CHECK]]
	; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, i8* [[PSRC:%.*]], i32 [[BLOCKSIZE]]			; CHECK-NEXT: br i1 [[OR_COND]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt i8* [[SCEVGEP1]], [[PDST]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt i8* [[SCEVGEP]], [[PSRC]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[BLOCKSIZE]], -16			; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[BLOCKSIZE]], -16
	; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, i8* [[PSRC]], i32 [[N_VEC]]			; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, i8* [[PSRC]], i32 [[N_VEC]]
	; CHECK-NEXT: [[IND_END3:%.*]] = and i32 [[BLOCKSIZE]], 15			; CHECK-NEXT: [[IND_END3:%.*]] = and i32 [[BLOCKSIZE]], 15
	; CHECK-NEXT: [[IND_END5:%.*]] = getelementptr i8, i8* [[PDST]], i32 [[N_VEC]]			; CHECK-NEXT: [[IND_END5:%.*]] = getelementptr i8, i8* [[PDST]], i32 [[N_VEC]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, i8* [[PSRC]], i32 [[INDEX]]			; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, i8* [[PSRC]], i32 [[INDEX]]
	; CHECK-NEXT: [[NEXT_GEP6:%.*]] = getelementptr i8, i8* [[PDST]], i32 [[INDEX]]			; CHECK-NEXT: [[NEXT_GEP6:%.*]] = getelementptr i8, i8* [[PDST]], i32 [[INDEX]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[NEXT_GEP]] to <16 x i8>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[NEXT_GEP]] to <16 x i8>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i8>, <16 x i8>* [[TMP0]], align 1, !alias.scope !0			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i8>, <16 x i8>* [[TMP0]], align 1
	; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <16 x i8> [[WIDE_LOAD]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <16 x i8> [[WIDE_LOAD]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = icmp eq <16 x i8> [[WIDE_LOAD]], <i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128>			; CHECK-NEXT: [[TMP2:%.*]] = icmp eq <16 x i8> [[WIDE_LOAD]], <i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128>
	; CHECK-NEXT: [[TMP3:%.*]] = sub <16 x i8> zeroinitializer, [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP3:%.*]] = sub <16 x i8> zeroinitializer, [[WIDE_LOAD]]
	; CHECK-NEXT: [[TMP4:%.*]] = select <16 x i1> [[TMP2]], <16 x i8> <i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127>, <16 x i8> [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = select <16 x i1> [[TMP2]], <16 x i8> <i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127>, <16 x i8> [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = select <16 x i1> [[TMP1]], <16 x i8> [[WIDE_LOAD]], <16 x i8> [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = select <16 x i1> [[TMP1]], <16 x i8> [[WIDE_LOAD]], <16 x i8> [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i8* [[NEXT_GEP6]] to <16 x i8>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i8* [[NEXT_GEP6]] to <16 x i8>*
	; CHECK-NEXT: store <16 x i8> [[TMP5]], <16 x i8>* [[TMP6]], align 1, !alias.scope !3, !noalias !0			; CHECK-NEXT: store <16 x i8> [[TMP5]], <16 x i8>* [[TMP6]], align 1
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 16
	; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP5:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP5:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[BLOCKSIZE]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[BLOCKSIZE]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[WHILE_END]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[WHILE_END]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8* [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[PSRC]], [[WHILE_BODY_PREHEADER]] ], [ [[PSRC]], [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8* [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[PSRC]], [[WHILE_BODY_PREHEADER]] ]
	; CHECK-NEXT: [[BC_RESUME_VAL2:%.*]] = phi i32 [ [[IND_END3]], [[MIDDLE_BLOCK]] ], [ [[BLOCKSIZE]], [[WHILE_BODY_PREHEADER]] ], [ [[BLOCKSIZE]], [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL2:%.*]] = phi i32 [ [[IND_END3]], [[MIDDLE_BLOCK]] ], [ [[BLOCKSIZE]], [[WHILE_BODY_PREHEADER]] ]
	; CHECK-NEXT: [[BC_RESUME_VAL4:%.*]] = phi i8* [ [[IND_END5]], [[MIDDLE_BLOCK]] ], [ [[PDST]], [[WHILE_BODY_PREHEADER]] ], [ [[PDST]], [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL4:%.*]] = phi i8* [ [[IND_END5]], [[MIDDLE_BLOCK]] ], [ [[PDST]], [[WHILE_BODY_PREHEADER]] ]
	; CHECK-NEXT: br label [[WHILE_BODY:%.*]]			; CHECK-NEXT: br label [[WHILE_BODY:%.*]]
	; CHECK: while.body:			; CHECK: while.body:
	; CHECK-NEXT: [[PSRC_ADDR_022:%.*]] = phi i8* [ [[INCDEC_PTR:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[PSRC_ADDR_022:%.*]] = phi i8* [ [[INCDEC_PTR:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[BLKCNT_021:%.*]] = phi i32 [ [[DEC:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL2]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[BLKCNT_021:%.*]] = phi i32 [ [[DEC:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL2]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[PDST_ADDR_020:%.*]] = phi i8* [ [[INCDEC_PTR13:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL4]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[PDST_ADDR_020:%.*]] = phi i8* [ [[INCDEC_PTR13:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL4]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, i8* [[PSRC_ADDR_022]], i32 1			; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, i8* [[PSRC_ADDR_022]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = load i8, i8* [[PSRC_ADDR_022]], align 1			; CHECK-NEXT: [[TMP8:%.*]] = load i8, i8* [[PSRC_ADDR_022]], align 1
	; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i8 [[TMP8]], 0			; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i8 [[TMP8]], 0

	while.end: ; preds = %while.end.loopexit, %entry			while.end: ; preds = %while.end.loopexit, %entry
	ret void			ret void
	}			}

	define void @arm_abs_q15(i16* nocapture readonly %pSrc, i16* nocapture %pDst, i32 %blockSize) #0 {			define void @arm_abs_q15(i16* nocapture readonly %pSrc, i16* nocapture %pDst, i32 %blockSize) #0 {
	; CHECK-LABEL: @arm_abs_q15(			; CHECK-LABEL: @arm_abs_q15(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[PSRC2:%.*]] = ptrtoint i16* [[PSRC:%.*]] to i32
				; CHECK-NEXT: [[PDST1:%.*]] = ptrtoint i16* [[PDST:%.*]] to i32
	; CHECK-NEXT: [[CMP_NOT20:%.*]] = icmp eq i32 [[BLOCKSIZE:%.*]], 0			; CHECK-NEXT: [[CMP_NOT20:%.*]] = icmp eq i32 [[BLOCKSIZE:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP_NOT20]], label [[WHILE_END:%.*]], label [[WHILE_BODY_PREHEADER:%.*]]			; CHECK-NEXT: br i1 [[CMP_NOT20]], label [[WHILE_END:%.*]], label [[WHILE_BODY_PREHEADER:%.*]]
	; CHECK: while.body.preheader:			; CHECK: while.body.preheader:
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[BLOCKSIZE]], 8			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[BLOCKSIZE]], 8
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; CHECK-NEXT: [[TMP0:%.*]] = sub i32 [[PDST1]], [[PSRC2]]
	; CHECK: vector.memcheck:			; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i32 [[TMP0]], 16
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i16, i16* [[PDST:%.*]], i32 [[BLOCKSIZE]]			; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[MIN_ITERS_CHECK]], i1 true, i1 [[DIFF_CHECK]]
	; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i16, i16* [[PSRC:%.*]], i32 [[BLOCKSIZE]]			; CHECK-NEXT: br i1 [[OR_COND]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt i16* [[SCEVGEP4]], [[PDST]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt i16* [[SCEVGEP]], [[PSRC]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[BLOCKSIZE]], -8			; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[BLOCKSIZE]], -8
	; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i16, i16* [[PSRC]], i32 [[N_VEC]]			; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i16, i16* [[PSRC]], i32 [[N_VEC]]
	; CHECK-NEXT: [[IND_END7:%.*]] = and i32 [[BLOCKSIZE]], 7			; CHECK-NEXT: [[IND_END7:%.*]] = and i32 [[BLOCKSIZE]], 7
	; CHECK-NEXT: [[IND_END9:%.*]] = getelementptr i16, i16* [[PDST]], i32 [[N_VEC]]			; CHECK-NEXT: [[IND_END9:%.*]] = getelementptr i16, i16* [[PDST]], i32 [[N_VEC]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i16, i16* [[PSRC]], i32 [[INDEX]]			; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i16, i16* [[PSRC]], i32 [[INDEX]]
	; CHECK-NEXT: [[NEXT_GEP10:%.*]] = getelementptr i16, i16* [[PDST]], i32 [[INDEX]]			; CHECK-NEXT: [[NEXT_GEP10:%.*]] = getelementptr i16, i16* [[PDST]], i32 [[INDEX]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[NEXT_GEP]] to <8 x i16>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[NEXT_GEP]] to <8 x i16>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2, !alias.scope !8			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
	; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <8 x i16> [[WIDE_LOAD]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <8 x i16> [[WIDE_LOAD]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = icmp eq <8 x i16> [[WIDE_LOAD]], <i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768>			; CHECK-NEXT: [[TMP2:%.*]] = icmp eq <8 x i16> [[WIDE_LOAD]], <i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768>
	; CHECK-NEXT: [[TMP3:%.*]] = sub <8 x i16> zeroinitializer, [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP3:%.*]] = sub <8 x i16> zeroinitializer, [[WIDE_LOAD]]
	; CHECK-NEXT: [[TMP4:%.*]] = select <8 x i1> [[TMP2]], <8 x i16> <i16 32767, i16 32767, i16 32767, i16 32767, i16 32767, i16 32767, i16 32767, i16 32767>, <8 x i16> [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = select <8 x i1> [[TMP2]], <8 x i16> <i16 32767, i16 32767, i16 32767, i16 32767, i16 32767, i16 32767, i16 32767, i16 32767>, <8 x i16> [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = select <8 x i1> [[TMP1]], <8 x i16> [[WIDE_LOAD]], <8 x i16> [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = select <8 x i1> [[TMP1]], <8 x i16> [[WIDE_LOAD]], <8 x i16> [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i16* [[NEXT_GEP10]] to <8 x i16>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i16* [[NEXT_GEP10]] to <8 x i16>*
	; CHECK-NEXT: store <8 x i16> [[TMP5]], <8 x i16>* [[TMP6]], align 2, !alias.scope !11, !noalias !8			; CHECK-NEXT: store <8 x i16> [[TMP5]], <8 x i16>* [[TMP6]], align 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
	; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP13:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP13:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[BLOCKSIZE]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[BLOCKSIZE]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[WHILE_END]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[WHILE_END]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16* [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[PSRC]], [[WHILE_BODY_PREHEADER]] ], [ [[PSRC]], [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16* [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[PSRC]], [[WHILE_BODY_PREHEADER]] ]
	; CHECK-NEXT: [[BC_RESUME_VAL6:%.*]] = phi i32 [ [[IND_END7]], [[MIDDLE_BLOCK]] ], [ [[BLOCKSIZE]], [[WHILE_BODY_PREHEADER]] ], [ [[BLOCKSIZE]], [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL6:%.*]] = phi i32 [ [[IND_END7]], [[MIDDLE_BLOCK]] ], [ [[BLOCKSIZE]], [[WHILE_BODY_PREHEADER]] ]
	; CHECK-NEXT: [[BC_RESUME_VAL8:%.*]] = phi i16* [ [[IND_END9]], [[MIDDLE_BLOCK]] ], [ [[PDST]], [[WHILE_BODY_PREHEADER]] ], [ [[PDST]], [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL8:%.*]] = phi i16* [ [[IND_END9]], [[MIDDLE_BLOCK]] ], [ [[PDST]], [[WHILE_BODY_PREHEADER]] ]
	; CHECK-NEXT: br label [[WHILE_BODY:%.*]]			; CHECK-NEXT: br label [[WHILE_BODY:%.*]]
	; CHECK: while.body:			; CHECK: while.body:
	; CHECK-NEXT: [[PSRC_ADDR_023:%.*]] = phi i16* [ [[INCDEC_PTR:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[PSRC_ADDR_023:%.*]] = phi i16* [ [[INCDEC_PTR:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[BLKCNT_022:%.*]] = phi i32 [ [[DEC:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL6]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[BLKCNT_022:%.*]] = phi i32 [ [[DEC:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL6]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[PDST_ADDR_021:%.*]] = phi i16* [ [[INCDEC_PTR13:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL8]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[PDST_ADDR_021:%.*]] = phi i16* [ [[INCDEC_PTR13:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL8]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i16, i16* [[PSRC_ADDR_023]], i32 1			; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i16, i16* [[PSRC_ADDR_023]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = load i16, i16* [[PSRC_ADDR_023]], align 2			; CHECK-NEXT: [[TMP8:%.*]] = load i16, i16* [[PSRC_ADDR_023]], align 2
	; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i16 [[TMP8]], 0			; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i16 [[TMP8]], 0

	while.end: ; preds = %while.end.loopexit, %entry			while.end: ; preds = %while.end.loopexit, %entry
	ret void			ret void
	}			}

	define void @arm_abs_q31(i32* nocapture readonly %pSrc, i32* nocapture %pDst, i32 %blockSize) #0 {			define void @arm_abs_q31(i32* nocapture readonly %pSrc, i32* nocapture %pDst, i32 %blockSize) #0 {
	; CHECK-LABEL: @arm_abs_q31(			; CHECK-LABEL: @arm_abs_q31(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[PSRC2:%.*]] = ptrtoint i32* [[PSRC:%.*]] to i32
				; CHECK-NEXT: [[PDST1:%.*]] = ptrtoint i32* [[PDST:%.*]] to i32
	; CHECK-NEXT: [[CMP_NOT14:%.*]] = icmp eq i32 [[BLOCKSIZE:%.*]], 0			; CHECK-NEXT: [[CMP_NOT14:%.*]] = icmp eq i32 [[BLOCKSIZE:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP_NOT14]], label [[WHILE_END:%.*]], label [[WHILE_BODY_PREHEADER:%.*]]			; CHECK-NEXT: br i1 [[CMP_NOT14]], label [[WHILE_END:%.*]], label [[WHILE_BODY_PREHEADER:%.*]]
	; CHECK: while.body.preheader:			; CHECK: while.body.preheader:
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[BLOCKSIZE]], 4			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[BLOCKSIZE]], 4
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; CHECK-NEXT: [[TMP0:%.*]] = sub i32 [[PDST1]], [[PSRC2]]
	; CHECK: vector.memcheck:			; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i32 [[TMP0]], 16
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[PDST:%.*]], i32 [[BLOCKSIZE]]			; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[MIN_ITERS_CHECK]], i1 true, i1 [[DIFF_CHECK]]
	; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[PSRC:%.*]], i32 [[BLOCKSIZE]]			; CHECK-NEXT: br i1 [[OR_COND]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt i32* [[SCEVGEP4]], [[PDST]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt i32* [[SCEVGEP]], [[PSRC]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[BLOCKSIZE]], -4			; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[BLOCKSIZE]], -4
	; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i32, i32* [[PSRC]], i32 [[N_VEC]]			; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i32, i32* [[PSRC]], i32 [[N_VEC]]
	; CHECK-NEXT: [[IND_END7:%.*]] = and i32 [[BLOCKSIZE]], 3			; CHECK-NEXT: [[IND_END7:%.*]] = and i32 [[BLOCKSIZE]], 3
	; CHECK-NEXT: [[IND_END9:%.*]] = getelementptr i32, i32* [[PDST]], i32 [[N_VEC]]			; CHECK-NEXT: [[IND_END9:%.*]] = getelementptr i32, i32* [[PDST]], i32 [[N_VEC]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[PSRC]], i32 [[INDEX]]			; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[PSRC]], i32 [[INDEX]]
	; CHECK-NEXT: [[NEXT_GEP10:%.*]] = getelementptr i32, i32* [[PDST]], i32 [[INDEX]]			; CHECK-NEXT: [[NEXT_GEP10:%.*]] = getelementptr i32, i32* [[PDST]], i32 [[INDEX]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[NEXT_GEP]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[NEXT_GEP]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4, !alias.scope !15			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[WIDE_LOAD]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[WIDE_LOAD]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD]], <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>			; CHECK-NEXT: [[TMP2:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD]], <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>
	; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <4 x i32> zeroinitializer, [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <4 x i32> zeroinitializer, [[WIDE_LOAD]]
	; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>, <4 x i32> [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>, <4 x i32> [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[WIDE_LOAD]], <4 x i32> [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[WIDE_LOAD]], <4 x i32> [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[NEXT_GEP10]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[NEXT_GEP10]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, !alias.scope !18, !noalias !15			; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
	; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP20:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP20:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[BLOCKSIZE]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[BLOCKSIZE]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[WHILE_END]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[WHILE_END]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32* [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[PSRC]], [[WHILE_BODY_PREHEADER]] ], [ [[PSRC]], [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32* [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[PSRC]], [[WHILE_BODY_PREHEADER]] ]
	; CHECK-NEXT: [[BC_RESUME_VAL6:%.*]] = phi i32 [ [[IND_END7]], [[MIDDLE_BLOCK]] ], [ [[BLOCKSIZE]], [[WHILE_BODY_PREHEADER]] ], [ [[BLOCKSIZE]], [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL6:%.*]] = phi i32 [ [[IND_END7]], [[MIDDLE_BLOCK]] ], [ [[BLOCKSIZE]], [[WHILE_BODY_PREHEADER]] ]
	; CHECK-NEXT: [[BC_RESUME_VAL8:%.*]] = phi i32* [ [[IND_END9]], [[MIDDLE_BLOCK]] ], [ [[PDST]], [[WHILE_BODY_PREHEADER]] ], [ [[PDST]], [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL8:%.*]] = phi i32* [ [[IND_END9]], [[MIDDLE_BLOCK]] ], [ [[PDST]], [[WHILE_BODY_PREHEADER]] ]
	; CHECK-NEXT: br label [[WHILE_BODY:%.*]]			; CHECK-NEXT: br label [[WHILE_BODY:%.*]]
	; CHECK: while.body:			; CHECK: while.body:
	; CHECK-NEXT: [[PSRC_ADDR_017:%.*]] = phi i32* [ [[INCDEC_PTR:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[PSRC_ADDR_017:%.*]] = phi i32* [ [[INCDEC_PTR:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[BLKCNT_016:%.*]] = phi i32 [ [[DEC:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL6]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[BLKCNT_016:%.*]] = phi i32 [ [[DEC:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL6]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[PDST_ADDR_015:%.*]] = phi i32* [ [[INCDEC_PTR7:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL8]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[PDST_ADDR_015:%.*]] = phi i32* [ [[INCDEC_PTR7:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL8]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i32, i32* [[PSRC_ADDR_017]], i32 1			; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i32, i32* [[PSRC_ADDR_017]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[PSRC_ADDR_017]], align 4			; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[PSRC_ADDR_017]], align 4
	; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP8]], 0			; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP8]], 0

llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll
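
The updated vector.memcheck expectations for @foo1 in this file replace the pairwise bound comparisons with a `sub i64` followed by `icmp ult` against a constant: 32 for AVX1, 128 for AVX2 and 256 for AVX512. These constants are consistent with VF * UF * AccessSize from the summary, reading VF and UF off the vector bodies (one <8 x i32> part, four <8 x i32> parts, four <16 x i32> parts) with 4-byte i32 accesses. The following is a minimal sketch of that arithmetic only; `diffCheckConflicts` is a hypothetical helper written for illustration and is not a function in LLVM.

```cpp
#include <cstdint>

// Hypothetical illustration, not LLVM code: returns true when the runtime
// check would branch to the scalar loop, i.e. when
// (Sink - Src) <u VF * UF * AccessSize.
constexpr bool diffCheckConflicts(std::uint64_t SrcAddr, std::uint64_t SinkAddr,
                                  std::uint64_t VF, std::uint64_t UF,
                                  std::uint64_t AccessSize) {
  // Unsigned subtraction wraps when SrcAddr > SinkAddr, producing a large
  // value that is not below the threshold, so no conflict is reported.
  return (SinkAddr - SrcAddr) < VF * UF * AccessSize;
}

// AVX1 expectations: VF = 8, UF = 1, 4-byte accesses, threshold 32.
static_assert(diffCheckConflicts(0x1000, 0x1010, 8, 1, 4),
              "sink 16 bytes ahead of source is too close");
static_assert(!diffCheckConflicts(0x1000, 0x1020, 8, 1, 4),
              "sink 32 bytes ahead of source is far enough");
// Source above sink: the difference wraps and the check passes.
static_assert(!diffCheckConflicts(0x1010, 0x1000, 8, 1, 4),
              "wrapped difference is not below the threshold");
// AVX2 (VF = 8, UF = 4) and AVX512 (VF = 16, UF = 4) thresholds.
static_assert(diffCheckConflicts(0, 127, 8, 4, 4), "127 < 128");
static_assert(!diffCheckConflicts(0, 256, 16, 4, 4), "256 is not < 256");
```

In the IR below, the stored-to pointer %A plays the role of the sink and the loaded-from pointers %trigger and %B play the role of the source, so two such checks are OR-ed into CONFLICT_RDX.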

; A[i] = B[i] + trigger[i];		; A[i] = B[i] + trigger[i];
; }		; }
; }		; }
;}		;}

define void @foo1(i32* nocapture %A, i32* nocapture readonly %B, i32* nocapture readonly %trigger) local_unnamed_addr #0 {		define void @foo1(i32* nocapture %A, i32* nocapture readonly %B, i32* nocapture readonly %trigger) local_unnamed_addr #0 {
; AVX1-LABEL: @foo1(		; AVX1-LABEL: @foo1(
; AVX1-NEXT: entry:		; AVX1-NEXT: entry:
; AVX1-NEXT: [[A1:%.*]] = bitcast i32* [[A:%.*]] to i8*		; AVX1-NEXT: [[B3:%.*]] = ptrtoint i32* [[B:%.*]] to i64
; AVX1-NEXT: [[TRIGGER3:%.*]] = bitcast i32* [[TRIGGER:%.*]] to i8*		; AVX1-NEXT: [[TRIGGER2:%.*]] = ptrtoint i32* [[TRIGGER:%.*]] to i64
; AVX1-NEXT: [[B6:%.*]] = bitcast i32* [[B:%.*]] to i8*		; AVX1-NEXT: [[A1:%.*]] = ptrtoint i32* [[A:%.*]] to i64
; AVX1-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; AVX1-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; AVX1: vector.memcheck:		; AVX1: vector.memcheck:
; AVX1-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[A]], i64 10000		; AVX1-NEXT: [[TMP0:%.*]] = sub i64 [[A1]], [[TRIGGER2]]
; AVX1-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*		; AVX1-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 32
; AVX1-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[TRIGGER]], i64 10000		; AVX1-NEXT: [[TMP1:%.*]] = sub i64 [[A1]], [[B3]]
; AVX1-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*		; AVX1-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP1]], 32
; AVX1-NEXT: [[SCEVGEP7:%.*]] = getelementptr i32, i32* [[B]], i64 10000		; AVX1-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
; AVX1-NEXT: [[SCEVGEP78:%.*]] = bitcast i32* [[SCEVGEP7]] to i8*
; AVX1-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[A1]], [[SCEVGEP45]]
; AVX1-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[TRIGGER3]], [[SCEVGEP2]]
; AVX1-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; AVX1-NEXT: [[BOUND09:%.*]] = icmp ult i8* [[A1]], [[SCEVGEP78]]
; AVX1-NEXT: [[BOUND110:%.*]] = icmp ult i8* [[B6]], [[SCEVGEP2]]
; AVX1-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]]
; AVX1-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
; AVX1-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]		; AVX1-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; AVX1: vector.ph:		; AVX1: vector.ph:
; AVX1-NEXT: br label [[VECTOR_BODY:%.*]]		; AVX1-NEXT: br label [[VECTOR_BODY:%.*]]
; AVX1: vector.body:		; AVX1: vector.body:
; AVX1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; AVX1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; AVX1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; AVX1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; AVX1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]		; AVX1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
; AVX1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0		; AVX1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
; AVX1-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <8 x i32>*		; AVX1-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <8 x i32>*
; AVX1-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32>* [[TMP3]], align 4, !alias.scope !0		; AVX1-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32>* [[TMP3]], align 4
; AVX1-NEXT: [[TMP4:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX1-NEXT: [[TMP4:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX1-NEXT: [[TMP5:%.*]] = getelementptr i32, i32* [[B]], i64 [[TMP0]]		; AVX1-NEXT: [[TMP5:%.*]] = getelementptr i32, i32* [[B]], i64 [[TMP0]]
; AVX1-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[TMP5]], i32 0		; AVX1-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[TMP5]], i32 0
; AVX1-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>*		; AVX1-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>*
; AVX1-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* [[TMP7]], i32 4, <8 x i1> [[TMP4]], <8 x i32> poison), !alias.scope !3		; AVX1-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* [[TMP7]], i32 4, <8 x i1> [[TMP4]], <8 x i32> poison)
; AVX1-NEXT: [[TMP8:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]		; AVX1-NEXT: [[TMP8:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]
; AVX1-NEXT: [[TMP9:%.*]] = getelementptr i32, i32* [[A]], i64 [[TMP0]]		; AVX1-NEXT: [[TMP9:%.*]] = getelementptr i32, i32* [[A]], i64 [[TMP0]]
; AVX1-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[TMP9]], i32 0		; AVX1-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[TMP9]], i32 0
; AVX1-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <8 x i32>*		; AVX1-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <8 x i32>*
; AVX1-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP8]], <8 x i32>* [[TMP11]], i32 4, <8 x i1> [[TMP4]]), !alias.scope !5, !noalias !7		; AVX1-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP8]], <8 x i32>* [[TMP11]], i32 4, <8 x i1> [[TMP4]])
; AVX1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8		; AVX1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; AVX1-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10000		; AVX1-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10000
; AVX1-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]		; AVX1-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; AVX1: middle.block:		; AVX1: middle.block:
; AVX1-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 10000		; AVX1-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 10000
; AVX1-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]		; AVX1-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
; AVX1: scalar.ph:		; AVX1: scalar.ph:
; AVX1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 10000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]		; AVX1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 10000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; AVX1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1		; AVX1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; AVX1-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 10000		; AVX1-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 10000
; AVX1-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]		; AVX1-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
; AVX1: for.end:		; AVX1: for.end:
; AVX1-NEXT: ret void		; AVX1-NEXT: ret void
;		;
; AVX2-LABEL: @foo1(		; AVX2-LABEL: @foo1(
; AVX2-NEXT: entry:		; AVX2-NEXT: entry:
; AVX2-NEXT: [[A1:%.*]] = bitcast i32* [[A:%.*]] to i8*		; AVX2-NEXT: [[B3:%.*]] = ptrtoint i32* [[B:%.*]] to i64
; AVX2-NEXT: [[TRIGGER3:%.*]] = bitcast i32* [[TRIGGER:%.*]] to i8*		; AVX2-NEXT: [[TRIGGER2:%.*]] = ptrtoint i32* [[TRIGGER:%.*]] to i64
; AVX2-NEXT: [[B6:%.*]] = bitcast i32* [[B:%.*]] to i8*		; AVX2-NEXT: [[A1:%.*]] = ptrtoint i32* [[A:%.*]] to i64
; AVX2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; AVX2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; AVX2: vector.memcheck:		; AVX2: vector.memcheck:
; AVX2-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[A]], i64 10000		; AVX2-NEXT: [[TMP0:%.*]] = sub i64 [[A1]], [[TRIGGER2]]
; AVX2-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*		; AVX2-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 128
; AVX2-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[TRIGGER]], i64 10000		; AVX2-NEXT: [[TMP1:%.*]] = sub i64 [[A1]], [[B3]]
; AVX2-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*		; AVX2-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP1]], 128
; AVX2-NEXT: [[SCEVGEP7:%.*]] = getelementptr i32, i32* [[B]], i64 10000		; AVX2-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
; AVX2-NEXT: [[SCEVGEP78:%.*]] = bitcast i32* [[SCEVGEP7]] to i8*
; AVX2-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[A1]], [[SCEVGEP45]]
; AVX2-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[TRIGGER3]], [[SCEVGEP2]]
; AVX2-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; AVX2-NEXT: [[BOUND09:%.*]] = icmp ult i8* [[A1]], [[SCEVGEP78]]
; AVX2-NEXT: [[BOUND110:%.*]] = icmp ult i8* [[B6]], [[SCEVGEP2]]
; AVX2-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]]
; AVX2-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
; AVX2-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]		; AVX2-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; AVX2: vector.ph:		; AVX2: vector.ph:
; AVX2-NEXT: br label [[VECTOR_BODY:%.*]]		; AVX2-NEXT: br label [[VECTOR_BODY:%.*]]
; AVX2: vector.body:		; AVX2: vector.body:
; AVX2-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]		; AVX2-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
; AVX2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; AVX2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; AVX2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 8		; AVX2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 8
; AVX2-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 16		; AVX2-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 16
; AVX2-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 24		; AVX2-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 24
; AVX2-NEXT: [[TMP4:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP0]]		; AVX2-NEXT: [[TMP4:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP0]]
; AVX2-NEXT: [[TMP5:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP1]]		; AVX2-NEXT: [[TMP5:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP1]]
; AVX2-NEXT: [[TMP6:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP2]]		; AVX2-NEXT: [[TMP6:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP2]]
; AVX2-NEXT: [[TMP7:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP3]]		; AVX2-NEXT: [[TMP7:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP3]]
; AVX2-NEXT: [[TMP8:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 0		; AVX2-NEXT: [[TMP8:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 0
; AVX2-NEXT: [[TMP9:%.]] = bitcast i32 [[TMP8]] to <8 x i32>*		; AVX2-NEXT: [[TMP9:%.]] = bitcast i32 [[TMP8]] to <8 x i32>*
; AVX2-NEXT: [[WIDE_LOAD:%.]] = load <8 x i32>, <8 x i32> [[TMP9]], align 4, !alias.scope !0		; AVX2-NEXT: [[WIDE_LOAD:%.]] = load <8 x i32>, <8 x i32> [[TMP9]], align 4
; AVX2-NEXT: [[TMP10:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 8		; AVX2-NEXT: [[TMP10:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 8
; AVX2-NEXT: [[TMP11:%.]] = bitcast i32 [[TMP10]] to <8 x i32>*		; AVX2-NEXT: [[TMP11:%.]] = bitcast i32 [[TMP10]] to <8 x i32>*
; AVX2-NEXT: [[WIDE_LOAD12:%.]] = load <8 x i32>, <8 x i32> [[TMP11]], align 4, !alias.scope !0		; AVX2-NEXT: [[WIDE_LOAD12:%.]] = load <8 x i32>, <8 x i32> [[TMP11]], align 4
; AVX2-NEXT: [[TMP12:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 16		; AVX2-NEXT: [[TMP12:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 16
; AVX2-NEXT: [[TMP13:%.]] = bitcast i32 [[TMP12]] to <8 x i32>*		; AVX2-NEXT: [[TMP13:%.]] = bitcast i32 [[TMP12]] to <8 x i32>*
; AVX2-NEXT: [[WIDE_LOAD13:%.]] = load <8 x i32>, <8 x i32> [[TMP13]], align 4, !alias.scope !0		; AVX2-NEXT: [[WIDE_LOAD13:%.]] = load <8 x i32>, <8 x i32> [[TMP13]], align 4
; AVX2-NEXT: [[TMP14:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 24		; AVX2-NEXT: [[TMP14:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 24
; AVX2-NEXT: [[TMP15:%.]] = bitcast i32 [[TMP14]] to <8 x i32>*		; AVX2-NEXT: [[TMP15:%.]] = bitcast i32 [[TMP14]] to <8 x i32>*
; AVX2-NEXT: [[WIDE_LOAD14:%.]] = load <8 x i32>, <8 x i32> [[TMP15]], align 4, !alias.scope !0		; AVX2-NEXT: [[WIDE_LOAD14:%.]] = load <8 x i32>, <8 x i32> [[TMP15]], align 4
; AVX2-NEXT: [[TMP16:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX2-NEXT: [[TMP16:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX2-NEXT: [[TMP17:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX2-NEXT: [[TMP17:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX2-NEXT: [[TMP18:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX2-NEXT: [[TMP18:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX2-NEXT: [[TMP19:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX2-NEXT: [[TMP19:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX2-NEXT: [[TMP20:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP0]]		; AVX2-NEXT: [[TMP20:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP0]]
; AVX2-NEXT: [[TMP21:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP1]]		; AVX2-NEXT: [[TMP21:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP1]]
; AVX2-NEXT: [[TMP22:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP2]]		; AVX2-NEXT: [[TMP22:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP2]]
; AVX2-NEXT: [[TMP23:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP3]]		; AVX2-NEXT: [[TMP23:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP3]]
; AVX2-NEXT: [[TMP24:%.]] = getelementptr i32, i32 [[TMP20]], i32 0		; AVX2-NEXT: [[TMP24:%.]] = getelementptr i32, i32 [[TMP20]], i32 0
; AVX2-NEXT: [[TMP25:%.]] = bitcast i32 [[TMP24]] to <8 x i32>*		; AVX2-NEXT: [[TMP25:%.]] = bitcast i32 [[TMP24]] to <8 x i32>*
; AVX2-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32> [[TMP25]], i32 4, <8 x i1> [[TMP16]], <8 x i32> poison), !alias.scope !3		; AVX2-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32> [[TMP25]], i32 4, <8 x i1> [[TMP16]], <8 x i32> poison)
; AVX2-NEXT: [[TMP26:%.]] = getelementptr i32, i32 [[TMP20]], i32 8		; AVX2-NEXT: [[TMP26:%.]] = getelementptr i32, i32 [[TMP20]], i32 8
; AVX2-NEXT: [[TMP27:%.]] = bitcast i32 [[TMP26]] to <8 x i32>*		; AVX2-NEXT: [[TMP27:%.]] = bitcast i32 [[TMP26]] to <8 x i32>*
; AVX2-NEXT: [[WIDE_MASKED_LOAD15:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32> [[TMP27]], i32 4, <8 x i1> [[TMP17]], <8 x i32> poison), !alias.scope !3		; AVX2-NEXT: [[WIDE_MASKED_LOAD15:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32> [[TMP27]], i32 4, <8 x i1> [[TMP17]], <8 x i32> poison)
; AVX2-NEXT: [[TMP28:%.]] = getelementptr i32, i32 [[TMP20]], i32 16		; AVX2-NEXT: [[TMP28:%.]] = getelementptr i32, i32 [[TMP20]], i32 16
; AVX2-NEXT: [[TMP29:%.]] = bitcast i32 [[TMP28]] to <8 x i32>*		; AVX2-NEXT: [[TMP29:%.]] = bitcast i32 [[TMP28]] to <8 x i32>*
; AVX2-NEXT: [[WIDE_MASKED_LOAD16:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32> [[TMP29]], i32 4, <8 x i1> [[TMP18]], <8 x i32> poison), !alias.scope !3		; AVX2-NEXT: [[WIDE_MASKED_LOAD16:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32> [[TMP29]], i32 4, <8 x i1> [[TMP18]], <8 x i32> poison)
; AVX2-NEXT: [[TMP30:%.]] = getelementptr i32, i32 [[TMP20]], i32 24		; AVX2-NEXT: [[TMP30:%.]] = getelementptr i32, i32 [[TMP20]], i32 24
; AVX2-NEXT: [[TMP31:%.]] = bitcast i32 [[TMP30]] to <8 x i32>*		; AVX2-NEXT: [[TMP31:%.]] = bitcast i32 [[TMP30]] to <8 x i32>*
; AVX2-NEXT: [[WIDE_MASKED_LOAD17:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32> [[TMP31]], i32 4, <8 x i1> [[TMP19]], <8 x i32> poison), !alias.scope !3		; AVX2-NEXT: [[WIDE_MASKED_LOAD17:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32> [[TMP31]], i32 4, <8 x i1> [[TMP19]], <8 x i32> poison)
; AVX2-NEXT: [[TMP32:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]		; AVX2-NEXT: [[TMP32:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]
; AVX2-NEXT: [[TMP33:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD15]], [[WIDE_LOAD12]]		; AVX2-NEXT: [[TMP33:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD15]], [[WIDE_LOAD12]]
; AVX2-NEXT: [[TMP34:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD16]], [[WIDE_LOAD13]]		; AVX2-NEXT: [[TMP34:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD16]], [[WIDE_LOAD13]]
; AVX2-NEXT: [[TMP35:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD17]], [[WIDE_LOAD14]]		; AVX2-NEXT: [[TMP35:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD17]], [[WIDE_LOAD14]]
; AVX2-NEXT: [[TMP36:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP0]]		; AVX2-NEXT: [[TMP36:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP0]]
; AVX2-NEXT: [[TMP37:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP1]]		; AVX2-NEXT: [[TMP37:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP1]]
; AVX2-NEXT: [[TMP38:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP2]]		; AVX2-NEXT: [[TMP38:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP2]]
; AVX2-NEXT: [[TMP39:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP3]]		; AVX2-NEXT: [[TMP39:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP3]]
; AVX2-NEXT: [[TMP40:%.]] = getelementptr i32, i32 [[TMP36]], i32 0		; AVX2-NEXT: [[TMP40:%.]] = getelementptr i32, i32 [[TMP36]], i32 0
; AVX2-NEXT: [[TMP41:%.]] = bitcast i32 [[TMP40]] to <8 x i32>*		; AVX2-NEXT: [[TMP41:%.]] = bitcast i32 [[TMP40]] to <8 x i32>*
; AVX2-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP32]], <8 x i32>* [[TMP41]], i32 4, <8 x i1> [[TMP16]]), !alias.scope !5, !noalias !7		; AVX2-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP32]], <8 x i32>* [[TMP41]], i32 4, <8 x i1> [[TMP16]])
; AVX2-NEXT: [[TMP42:%.]] = getelementptr i32, i32 [[TMP36]], i32 8		; AVX2-NEXT: [[TMP42:%.]] = getelementptr i32, i32 [[TMP36]], i32 8
; AVX2-NEXT: [[TMP43:%.]] = bitcast i32 [[TMP42]] to <8 x i32>*		; AVX2-NEXT: [[TMP43:%.]] = bitcast i32 [[TMP42]] to <8 x i32>*
; AVX2-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP33]], <8 x i32>* [[TMP43]], i32 4, <8 x i1> [[TMP17]]), !alias.scope !5, !noalias !7		; AVX2-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP33]], <8 x i32>* [[TMP43]], i32 4, <8 x i1> [[TMP17]])
; AVX2-NEXT: [[TMP44:%.]] = getelementptr i32, i32 [[TMP36]], i32 16		; AVX2-NEXT: [[TMP44:%.]] = getelementptr i32, i32 [[TMP36]], i32 16
; AVX2-NEXT: [[TMP45:%.]] = bitcast i32 [[TMP44]] to <8 x i32>*		; AVX2-NEXT: [[TMP45:%.]] = bitcast i32 [[TMP44]] to <8 x i32>*
; AVX2-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP34]], <8 x i32>* [[TMP45]], i32 4, <8 x i1> [[TMP18]]), !alias.scope !5, !noalias !7		; AVX2-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP34]], <8 x i32>* [[TMP45]], i32 4, <8 x i1> [[TMP18]])
; AVX2-NEXT: [[TMP46:%.]] = getelementptr i32, i32 [[TMP36]], i32 24		; AVX2-NEXT: [[TMP46:%.]] = getelementptr i32, i32 [[TMP36]], i32 24
; AVX2-NEXT: [[TMP47:%.]] = bitcast i32 [[TMP46]] to <8 x i32>*		; AVX2-NEXT: [[TMP47:%.]] = bitcast i32 [[TMP46]] to <8 x i32>*
; AVX2-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP35]], <8 x i32>* [[TMP47]], i32 4, <8 x i1> [[TMP19]]), !alias.scope !5, !noalias !7		; AVX2-NEXT: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> [[TMP35]], <8 x i32>* [[TMP47]], i32 4, <8 x i1> [[TMP19]])
; AVX2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 32		; AVX2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 32
; AVX2-NEXT: [[TMP48:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984		; AVX2-NEXT: [[TMP48:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984
; AVX2-NEXT: br i1 [[TMP48]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]		; AVX2-NEXT: br i1 [[TMP48]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; AVX2: middle.block:		; AVX2: middle.block:
; AVX2-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984		; AVX2-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984
; AVX2-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]		; AVX2-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
; AVX2: scalar.ph:		; AVX2: scalar.ph:
; AVX2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 9984, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]		; AVX2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 9984, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; AVX2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1		; AVX2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; AVX2-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 10000		; AVX2-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 10000
; AVX2-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]		; AVX2-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
; AVX2: for.end:		; AVX2: for.end:
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @foo1(		; AVX512-LABEL: @foo1(
; AVX512-NEXT: iter.check:		; AVX512-NEXT: iter.check:
; AVX512-NEXT: [[A1:%.*]] = bitcast i32* [[A:%.*]] to i8*		; AVX512-NEXT: [[B3:%.*]] = ptrtoint i32* [[B:%.*]] to i64
; AVX512-NEXT: [[TRIGGER3:%.*]] = bitcast i32* [[TRIGGER:%.*]] to i8*		; AVX512-NEXT: [[TRIGGER2:%.*]] = ptrtoint i32* [[TRIGGER:%.*]] to i64
; AVX512-NEXT: [[B6:%.*]] = bitcast i32* [[B:%.*]] to i8*		; AVX512-NEXT: [[A1:%.*]] = ptrtoint i32* [[A:%.*]] to i64
; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; AVX512: vector.memcheck:		; AVX512: vector.memcheck:
; AVX512-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[A]], i64 10000		; AVX512-NEXT: [[TMP0:%.*]] = sub i64 [[A1]], [[TRIGGER2]]
; AVX512-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*		; AVX512-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 256
; AVX512-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[TRIGGER]], i64 10000		; AVX512-NEXT: [[TMP1:%.*]] = sub i64 [[A1]], [[B3]]
; AVX512-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*		; AVX512-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP1]], 256
; AVX512-NEXT: [[SCEVGEP7:%.*]] = getelementptr i32, i32* [[B]], i64 10000		; AVX512-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
; AVX512-NEXT: [[SCEVGEP78:%.*]] = bitcast i32* [[SCEVGEP7]] to i8*
; AVX512-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[A1]], [[SCEVGEP45]]
; AVX512-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[TRIGGER3]], [[SCEVGEP2]]
; AVX512-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; AVX512-NEXT: [[BOUND09:%.*]] = icmp ult i8* [[A1]], [[SCEVGEP78]]
; AVX512-NEXT: [[BOUND110:%.*]] = icmp ult i8* [[B6]], [[SCEVGEP2]]
; AVX512-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]]
; AVX512-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
; AVX512-NEXT: br i1 [[CONFLICT_RDX]], label [[VEC_EPILOG_SCALAR_PH]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]		; AVX512-NEXT: br i1 [[CONFLICT_RDX]], label [[VEC_EPILOG_SCALAR_PH]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]
; AVX512: vector.main.loop.iter.check:		; AVX512: vector.main.loop.iter.check:
; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]		; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]
; AVX512: vector.ph:		; AVX512: vector.ph:
; AVX512-NEXT: br label [[VECTOR_BODY:%.*]]		; AVX512-NEXT: br label [[VECTOR_BODY:%.*]]
; AVX512: vector.body:		; AVX512: vector.body:
; AVX512-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]		; AVX512-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
; AVX512-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; AVX512-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; AVX512-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 16		; AVX512-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 16
; AVX512-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 32		; AVX512-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 32
; AVX512-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 48		; AVX512-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 48
; AVX512-NEXT: [[TMP4:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP4:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP5:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP5:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP6:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP6:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP7:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP7:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP8:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 0		; AVX512-NEXT: [[TMP8:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 0
; AVX512-NEXT: [[TMP9:%.]] = bitcast i32 [[TMP8]] to <16 x i32>*		; AVX512-NEXT: [[TMP9:%.]] = bitcast i32 [[TMP8]] to <16 x i32>*
; AVX512-NEXT: [[WIDE_LOAD:%.]] = load <16 x i32>, <16 x i32> [[TMP9]], align 4, !alias.scope !0		; AVX512-NEXT: [[WIDE_LOAD:%.]] = load <16 x i32>, <16 x i32> [[TMP9]], align 4
; AVX512-NEXT: [[TMP10:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 16		; AVX512-NEXT: [[TMP10:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 16
; AVX512-NEXT: [[TMP11:%.]] = bitcast i32 [[TMP10]] to <16 x i32>*		; AVX512-NEXT: [[TMP11:%.]] = bitcast i32 [[TMP10]] to <16 x i32>*
; AVX512-NEXT: [[WIDE_LOAD12:%.]] = load <16 x i32>, <16 x i32> [[TMP11]], align 4, !alias.scope !0		; AVX512-NEXT: [[WIDE_LOAD12:%.]] = load <16 x i32>, <16 x i32> [[TMP11]], align 4
; AVX512-NEXT: [[TMP12:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 32		; AVX512-NEXT: [[TMP12:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 32
; AVX512-NEXT: [[TMP13:%.]] = bitcast i32 [[TMP12]] to <16 x i32>*		; AVX512-NEXT: [[TMP13:%.]] = bitcast i32 [[TMP12]] to <16 x i32>*
; AVX512-NEXT: [[WIDE_LOAD13:%.]] = load <16 x i32>, <16 x i32> [[TMP13]], align 4, !alias.scope !0		; AVX512-NEXT: [[WIDE_LOAD13:%.]] = load <16 x i32>, <16 x i32> [[TMP13]], align 4
; AVX512-NEXT: [[TMP14:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 48		; AVX512-NEXT: [[TMP14:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i32 48
; AVX512-NEXT: [[TMP15:%.]] = bitcast i32 [[TMP14]] to <16 x i32>*		; AVX512-NEXT: [[TMP15:%.]] = bitcast i32 [[TMP14]] to <16 x i32>*
; AVX512-NEXT: [[WIDE_LOAD14:%.]] = load <16 x i32>, <16 x i32> [[TMP15]], align 4, !alias.scope !0		; AVX512-NEXT: [[WIDE_LOAD14:%.]] = load <16 x i32>, <16 x i32> [[TMP15]], align 4
; AVX512-NEXT: [[TMP16:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP16:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP17:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP17:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP18:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP18:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP19:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP19:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP20:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP20:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP21:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP21:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP22:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP22:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP23:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP23:%.]] = getelementptr i32, i32 [[B]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP24:%.]] = getelementptr i32, i32 [[TMP20]], i32 0		; AVX512-NEXT: [[TMP24:%.]] = getelementptr i32, i32 [[TMP20]], i32 0
; AVX512-NEXT: [[TMP25:%.]] = bitcast i32 [[TMP24]] to <16 x i32>*		; AVX512-NEXT: [[TMP25:%.]] = bitcast i32 [[TMP24]] to <16 x i32>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32> [[TMP25]], i32 4, <16 x i1> [[TMP16]], <16 x i32> poison), !alias.scope !3		; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32> [[TMP25]], i32 4, <16 x i1> [[TMP16]], <16 x i32> poison)
; AVX512-NEXT: [[TMP26:%.]] = getelementptr i32, i32 [[TMP20]], i32 16		; AVX512-NEXT: [[TMP26:%.]] = getelementptr i32, i32 [[TMP20]], i32 16
; AVX512-NEXT: [[TMP27:%.]] = bitcast i32 [[TMP26]] to <16 x i32>*		; AVX512-NEXT: [[TMP27:%.]] = bitcast i32 [[TMP26]] to <16 x i32>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD15:%.]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32> [[TMP27]], i32 4, <16 x i1> [[TMP17]], <16 x i32> poison), !alias.scope !3		; AVX512-NEXT: [[WIDE_MASKED_LOAD15:%.]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32> [[TMP27]], i32 4, <16 x i1> [[TMP17]], <16 x i32> poison)
; AVX512-NEXT: [[TMP28:%.]] = getelementptr i32, i32 [[TMP20]], i32 32		; AVX512-NEXT: [[TMP28:%.]] = getelementptr i32, i32 [[TMP20]], i32 32
; AVX512-NEXT: [[TMP29:%.]] = bitcast i32 [[TMP28]] to <16 x i32>*		; AVX512-NEXT: [[TMP29:%.]] = bitcast i32 [[TMP28]] to <16 x i32>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD16:%.]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32> [[TMP29]], i32 4, <16 x i1> [[TMP18]], <16 x i32> poison), !alias.scope !3		; AVX512-NEXT: [[WIDE_MASKED_LOAD16:%.]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32> [[TMP29]], i32 4, <16 x i1> [[TMP18]], <16 x i32> poison)
; AVX512-NEXT: [[TMP30:%.]] = getelementptr i32, i32 [[TMP20]], i32 48		; AVX512-NEXT: [[TMP30:%.]] = getelementptr i32, i32 [[TMP20]], i32 48
; AVX512-NEXT: [[TMP31:%.]] = bitcast i32 [[TMP30]] to <16 x i32>*		; AVX512-NEXT: [[TMP31:%.]] = bitcast i32 [[TMP30]] to <16 x i32>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD17:%.]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32> [[TMP31]], i32 4, <16 x i1> [[TMP19]], <16 x i32> poison), !alias.scope !3		; AVX512-NEXT: [[WIDE_MASKED_LOAD17:%.]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32> [[TMP31]], i32 4, <16 x i1> [[TMP19]], <16 x i32> poison)
; AVX512-NEXT: [[TMP32:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]		; AVX512-NEXT: [[TMP32:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]
; AVX512-NEXT: [[TMP33:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD15]], [[WIDE_LOAD12]]		; AVX512-NEXT: [[TMP33:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD15]], [[WIDE_LOAD12]]
; AVX512-NEXT: [[TMP34:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD16]], [[WIDE_LOAD13]]		; AVX512-NEXT: [[TMP34:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD16]], [[WIDE_LOAD13]]
; AVX512-NEXT: [[TMP35:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD17]], [[WIDE_LOAD14]]		; AVX512-NEXT: [[TMP35:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD17]], [[WIDE_LOAD14]]
; AVX512-NEXT: [[TMP36:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP36:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP37:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP37:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP38:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP38:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP39:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP39:%.]] = getelementptr i32, i32 [[A]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP40:%.]] = getelementptr i32, i32 [[TMP36]], i32 0		; AVX512-NEXT: [[TMP40:%.]] = getelementptr i32, i32 [[TMP36]], i32 0
; AVX512-NEXT: [[TMP41:%.]] = bitcast i32 [[TMP40]] to <16 x i32>*		; AVX512-NEXT: [[TMP41:%.]] = bitcast i32 [[TMP40]] to <16 x i32>*
; AVX512-NEXT: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> [[TMP32]], <16 x i32>* [[TMP41]], i32 4, <16 x i1> [[TMP16]]), !alias.scope !5, !noalias !7		; AVX512-NEXT: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> [[TMP32]], <16 x i32>* [[TMP41]], i32 4, <16 x i1> [[TMP16]])
; AVX512-NEXT: [[TMP42:%.]] = getelementptr i32, i32 [[TMP36]], i32 16		; AVX512-NEXT: [[TMP42:%.]] = getelementptr i32, i32 [[TMP36]], i32 16
; AVX512-NEXT: [[TMP43:%.]] = bitcast i32 [[TMP42]] to <16 x i32>*		; AVX512-NEXT: [[TMP43:%.]] = bitcast i32 [[TMP42]] to <16 x i32>*
; AVX512-NEXT: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> [[TMP33]], <16 x i32>* [[TMP43]], i32 4, <16 x i1> [[TMP17]]), !alias.scope !5, !noalias !7		; AVX512-NEXT: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> [[TMP33]], <16 x i32>* [[TMP43]], i32 4, <16 x i1> [[TMP17]])
; AVX512-NEXT: [[TMP44:%.]] = getelementptr i32, i32 [[TMP36]], i32 32		; AVX512-NEXT: [[TMP44:%.]] = getelementptr i32, i32 [[TMP36]], i32 32
; AVX512-NEXT: [[TMP45:%.]] = bitcast i32 [[TMP44]] to <16 x i32>*		; AVX512-NEXT: [[TMP45:%.]] = bitcast i32 [[TMP44]] to <16 x i32>*
; AVX512-NEXT: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> [[TMP34]], <16 x i32>* [[TMP45]], i32 4, <16 x i1> [[TMP18]]), !alias.scope !5, !noalias !7		; AVX512-NEXT: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> [[TMP34]], <16 x i32>* [[TMP45]], i32 4, <16 x i1> [[TMP18]])
; AVX512-NEXT: [[TMP46:%.]] = getelementptr i32, i32 [[TMP36]], i32 48		; AVX512-NEXT: [[TMP46:%.]] = getelementptr i32, i32 [[TMP36]], i32 48
; AVX512-NEXT: [[TMP47:%.]] = bitcast i32 [[TMP46]] to <16 x i32>*		; AVX512-NEXT: [[TMP47:%.]] = bitcast i32 [[TMP46]] to <16 x i32>*
; AVX512-NEXT: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> [[TMP35]], <16 x i32>* [[TMP47]], i32 4, <16 x i1> [[TMP19]]), !alias.scope !5, !noalias !7		; AVX512-NEXT: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> [[TMP35]], <16 x i32>* [[TMP47]], i32 4, <16 x i1> [[TMP19]])
; AVX512-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 64		; AVX512-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 64
; AVX512-NEXT: [[TMP48:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984		; AVX512-NEXT: [[TMP48:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984
; AVX512-NEXT: br i1 [[TMP48]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]		; AVX512-NEXT: br i1 [[TMP48]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; AVX512: middle.block:		; AVX512: middle.block:
; AVX512-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984		; AVX512-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984
; AVX512-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]		; AVX512-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]
; AVX512: vec.epilog.iter.check:		; AVX512: vec.epilog.iter.check:
; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]		; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]
for.end: ; preds = %for.inc
ret void		ret void
}		}

; The same as @foo1 but all the pointers are address space 1 pointers.		; The same as @foo1 but all the pointers are address space 1 pointers.

define void @foo1_addrspace1(i32 addrspace(1)* nocapture %A, i32 addrspace(1)* nocapture readonly %B, i32 addrspace(1)* nocapture readonly %trigger) local_unnamed_addr #0 {		define void @foo1_addrspace1(i32 addrspace(1)* nocapture %A, i32 addrspace(1)* nocapture readonly %B, i32 addrspace(1)* nocapture readonly %trigger) local_unnamed_addr #0 {
; AVX1-LABEL: @foo1_addrspace1(		; AVX1-LABEL: @foo1_addrspace1(
; AVX1-NEXT: entry:		; AVX1-NEXT: entry:
; AVX1-NEXT: [[A1:%.*]] = bitcast i32 addrspace(1)* [[A:%.*]] to i8 addrspace(1)*		; AVX1-NEXT: [[B3:%.*]] = ptrtoint i32 addrspace(1)* [[B:%.*]] to i64
; AVX1-NEXT: [[TRIGGER3:%.*]] = bitcast i32 addrspace(1)* [[TRIGGER:%.*]] to i8 addrspace(1)*		; AVX1-NEXT: [[TRIGGER2:%.*]] = ptrtoint i32 addrspace(1)* [[TRIGGER:%.*]] to i64
; AVX1-NEXT: [[B6:%.*]] = bitcast i32 addrspace(1)* [[B:%.*]] to i8 addrspace(1)*		; AVX1-NEXT: [[A1:%.*]] = ptrtoint i32 addrspace(1)* [[A:%.*]] to i64
; AVX1-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; AVX1-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; AVX1: vector.memcheck:		; AVX1: vector.memcheck:
; AVX1-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 10000		; AVX1-NEXT: [[TMP0:%.*]] = sub i64 [[A1]], [[TRIGGER2]]
; AVX1-NEXT: [[SCEVGEP2:%.*]] = bitcast i32 addrspace(1)* [[SCEVGEP]] to i8 addrspace(1)*		; AVX1-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 32
; AVX1-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32 addrspace(1)* [[TRIGGER]], i64 10000		; AVX1-NEXT: [[TMP1:%.*]] = sub i64 [[A1]], [[B3]]
; AVX1-NEXT: [[SCEVGEP45:%.*]] = bitcast i32 addrspace(1)* [[SCEVGEP4]] to i8 addrspace(1)*		; AVX1-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP1]], 32
; AVX1-NEXT: [[SCEVGEP7:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 10000		; AVX1-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
; AVX1-NEXT: [[SCEVGEP78:%.*]] = bitcast i32 addrspace(1)* [[SCEVGEP7]] to i8 addrspace(1)*
; AVX1-NEXT: [[BOUND0:%.*]] = icmp ult i8 addrspace(1)* [[A1]], [[SCEVGEP45]]
; AVX1-NEXT: [[BOUND1:%.*]] = icmp ult i8 addrspace(1)* [[TRIGGER3]], [[SCEVGEP2]]
; AVX1-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; AVX1-NEXT: [[BOUND09:%.*]] = icmp ult i8 addrspace(1)* [[A1]], [[SCEVGEP78]]
; AVX1-NEXT: [[BOUND110:%.*]] = icmp ult i8 addrspace(1)* [[B6]], [[SCEVGEP2]]
; AVX1-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]]
; AVX1-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
; AVX1-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]		; AVX1-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; AVX1: vector.ph:		; AVX1: vector.ph:
; AVX1-NEXT: br label [[VECTOR_BODY:%.*]]		; AVX1-NEXT: br label [[VECTOR_BODY:%.*]]
; AVX1: vector.body:		; AVX1: vector.body:
; AVX1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; AVX1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; AVX1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; AVX1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; AVX1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TRIGGER]], i64 [[TMP0]]		; AVX1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TRIGGER]], i64 [[TMP0]]
; AVX1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TMP1]], i32 0		; AVX1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TMP1]], i32 0
; AVX1-NEXT: [[TMP3:%.*]] = bitcast i32 addrspace(1)* [[TMP2]] to <8 x i32> addrspace(1)*		; AVX1-NEXT: [[TMP3:%.*]] = bitcast i32 addrspace(1)* [[TMP2]] to <8 x i32> addrspace(1)*
; AVX1-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32> addrspace(1)* [[TMP3]], align 4, !alias.scope !11		; AVX1-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32> addrspace(1)* [[TMP3]], align 4
; AVX1-NEXT: [[TMP4:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX1-NEXT: [[TMP4:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX1-NEXT: [[TMP5:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 [[TMP0]]		; AVX1-NEXT: [[TMP5:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 [[TMP0]]
; AVX1-NEXT: [[TMP6:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP5]], i32 0		; AVX1-NEXT: [[TMP6:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP5]], i32 0
; AVX1-NEXT: [[TMP7:%.*]] = bitcast i32 addrspace(1)* [[TMP6]] to <8 x i32> addrspace(1)*		; AVX1-NEXT: [[TMP7:%.*]] = bitcast i32 addrspace(1)* [[TMP6]] to <8 x i32> addrspace(1)*
; AVX1-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1)* [[TMP7]], i32 4, <8 x i1> [[TMP4]], <8 x i32> poison), !alias.scope !14		; AVX1-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1)* [[TMP7]], i32 4, <8 x i1> [[TMP4]], <8 x i32> poison)
; AVX1-NEXT: [[TMP8:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]		; AVX1-NEXT: [[TMP8:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]
; AVX1-NEXT: [[TMP9:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 [[TMP0]]		; AVX1-NEXT: [[TMP9:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 [[TMP0]]
; AVX1-NEXT: [[TMP10:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP9]], i32 0		; AVX1-NEXT: [[TMP10:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP9]], i32 0
; AVX1-NEXT: [[TMP11:%.*]] = bitcast i32 addrspace(1)* [[TMP10]] to <8 x i32> addrspace(1)*		; AVX1-NEXT: [[TMP11:%.*]] = bitcast i32 addrspace(1)* [[TMP10]] to <8 x i32> addrspace(1)*
; AVX1-NEXT: call void @llvm.masked.store.v8i32.p1v8i32(<8 x i32> [[TMP8]], <8 x i32> addrspace(1)* [[TMP11]], i32 4, <8 x i1> [[TMP4]]), !alias.scope !16, !noalias !18		; AVX1-NEXT: call void @llvm.masked.store.v8i32.p1v8i32(<8 x i32> [[TMP8]], <8 x i32> addrspace(1)* [[TMP11]], i32 4, <8 x i1> [[TMP4]])
; AVX1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8		; AVX1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; AVX1-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10000		; AVX1-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10000
; AVX1-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]		; AVX1-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]
; AVX1: middle.block:		; AVX1: middle.block:
; AVX1-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 10000		; AVX1-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 10000
; AVX1-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]		; AVX1-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
; AVX1: scalar.ph:		; AVX1: scalar.ph:
; AVX1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 10000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]		; AVX1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 10000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; AVX1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1		; AVX1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; AVX1-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 10000		; AVX1-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 10000
; AVX1-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]		; AVX1-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
; AVX1: for.end:		; AVX1: for.end:
; AVX1-NEXT: ret void		; AVX1-NEXT: ret void
;		;
; AVX2-LABEL: @foo1_addrspace1(		; AVX2-LABEL: @foo1_addrspace1(
; AVX2-NEXT: entry:		; AVX2-NEXT: entry:
; AVX2-NEXT: [[A1:%.*]] = bitcast i32 addrspace(1)* [[A:%.*]] to i8 addrspace(1)*		; AVX2-NEXT: [[B3:%.*]] = ptrtoint i32 addrspace(1)* [[B:%.*]] to i64
; AVX2-NEXT: [[TRIGGER3:%.*]] = bitcast i32 addrspace(1)* [[TRIGGER:%.*]] to i8 addrspace(1)*		; AVX2-NEXT: [[TRIGGER2:%.*]] = ptrtoint i32 addrspace(1)* [[TRIGGER:%.*]] to i64
; AVX2-NEXT: [[B6:%.*]] = bitcast i32 addrspace(1)* [[B:%.*]] to i8 addrspace(1)*		; AVX2-NEXT: [[A1:%.*]] = ptrtoint i32 addrspace(1)* [[A:%.*]] to i64
; AVX2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; AVX2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; AVX2: vector.memcheck:		; AVX2: vector.memcheck:
; AVX2-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 10000		; AVX2-NEXT: [[TMP0:%.*]] = sub i64 [[A1]], [[TRIGGER2]]
; AVX2-NEXT: [[SCEVGEP2:%.*]] = bitcast i32 addrspace(1)* [[SCEVGEP]] to i8 addrspace(1)*		; AVX2-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 128
; AVX2-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32 addrspace(1)* [[TRIGGER]], i64 10000		; AVX2-NEXT: [[TMP1:%.*]] = sub i64 [[A1]], [[B3]]
; AVX2-NEXT: [[SCEVGEP45:%.*]] = bitcast i32 addrspace(1)* [[SCEVGEP4]] to i8 addrspace(1)*		; AVX2-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP1]], 128
; AVX2-NEXT: [[SCEVGEP7:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 10000		; AVX2-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
; AVX2-NEXT: [[SCEVGEP78:%.*]] = bitcast i32 addrspace(1)* [[SCEVGEP7]] to i8 addrspace(1)*
; AVX2-NEXT: [[BOUND0:%.*]] = icmp ult i8 addrspace(1)* [[A1]], [[SCEVGEP45]]
; AVX2-NEXT: [[BOUND1:%.*]] = icmp ult i8 addrspace(1)* [[TRIGGER3]], [[SCEVGEP2]]
; AVX2-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; AVX2-NEXT: [[BOUND09:%.*]] = icmp ult i8 addrspace(1)* [[A1]], [[SCEVGEP78]]
; AVX2-NEXT: [[BOUND110:%.*]] = icmp ult i8 addrspace(1)* [[B6]], [[SCEVGEP2]]
; AVX2-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]]
; AVX2-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
; AVX2-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]		; AVX2-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; AVX2: vector.ph:		; AVX2: vector.ph:
; AVX2-NEXT: br label [[VECTOR_BODY:%.*]]		; AVX2-NEXT: br label [[VECTOR_BODY:%.*]]
; AVX2: vector.body:		; AVX2: vector.body:
; AVX2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; AVX2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; AVX2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; AVX2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; AVX2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 8		; AVX2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 8
; AVX2-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 16		; AVX2-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 16
; AVX2-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 24		; AVX2-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 24
; AVX2-NEXT: [[TMP4:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TRIGGER]], i64 [[TMP0]]		; AVX2-NEXT: [[TMP4:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TRIGGER]], i64 [[TMP0]]
; AVX2-NEXT: [[TMP5:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TRIGGER]], i64 [[TMP1]]		; AVX2-NEXT: [[TMP5:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TRIGGER]], i64 [[TMP1]]
; AVX2-NEXT: [[TMP6:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TRIGGER]], i64 [[TMP2]]		; AVX2-NEXT: [[TMP6:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TRIGGER]], i64 [[TMP2]]
; AVX2-NEXT: [[TMP7:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TRIGGER]], i64 [[TMP3]]		; AVX2-NEXT: [[TMP7:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TRIGGER]], i64 [[TMP3]]
; AVX2-NEXT: [[TMP8:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TMP4]], i32 0		; AVX2-NEXT: [[TMP8:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TMP4]], i32 0
; AVX2-NEXT: [[TMP9:%.]] = bitcast i32 addrspace(1) [[TMP8]] to <8 x i32> addrspace(1)*		; AVX2-NEXT: [[TMP9:%.]] = bitcast i32 addrspace(1) [[TMP8]] to <8 x i32> addrspace(1)*
; AVX2-NEXT: [[WIDE_LOAD:%.]] = load <8 x i32>, <8 x i32> addrspace(1) [[TMP9]], align 4, !alias.scope !11		; AVX2-NEXT: [[WIDE_LOAD:%.]] = load <8 x i32>, <8 x i32> addrspace(1) [[TMP9]], align 4
; AVX2-NEXT: [[TMP10:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TMP4]], i32 8		; AVX2-NEXT: [[TMP10:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TMP4]], i32 8
; AVX2-NEXT: [[TMP11:%.]] = bitcast i32 addrspace(1) [[TMP10]] to <8 x i32> addrspace(1)*		; AVX2-NEXT: [[TMP11:%.]] = bitcast i32 addrspace(1) [[TMP10]] to <8 x i32> addrspace(1)*
; AVX2-NEXT: [[WIDE_LOAD12:%.]] = load <8 x i32>, <8 x i32> addrspace(1) [[TMP11]], align 4, !alias.scope !11		; AVX2-NEXT: [[WIDE_LOAD12:%.]] = load <8 x i32>, <8 x i32> addrspace(1) [[TMP11]], align 4
; AVX2-NEXT: [[TMP12:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TMP4]], i32 16		; AVX2-NEXT: [[TMP12:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TMP4]], i32 16
; AVX2-NEXT: [[TMP13:%.]] = bitcast i32 addrspace(1) [[TMP12]] to <8 x i32> addrspace(1)*		; AVX2-NEXT: [[TMP13:%.]] = bitcast i32 addrspace(1) [[TMP12]] to <8 x i32> addrspace(1)*
; AVX2-NEXT: [[WIDE_LOAD13:%.]] = load <8 x i32>, <8 x i32> addrspace(1) [[TMP13]], align 4, !alias.scope !11		; AVX2-NEXT: [[WIDE_LOAD13:%.]] = load <8 x i32>, <8 x i32> addrspace(1) [[TMP13]], align 4
; AVX2-NEXT: [[TMP14:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TMP4]], i32 24		; AVX2-NEXT: [[TMP14:%.]] = getelementptr inbounds i32, i32 addrspace(1) [[TMP4]], i32 24
; AVX2-NEXT: [[TMP15:%.]] = bitcast i32 addrspace(1) [[TMP14]] to <8 x i32> addrspace(1)*		; AVX2-NEXT: [[TMP15:%.]] = bitcast i32 addrspace(1) [[TMP14]] to <8 x i32> addrspace(1)*
; AVX2-NEXT: [[WIDE_LOAD14:%.]] = load <8 x i32>, <8 x i32> addrspace(1) [[TMP15]], align 4, !alias.scope !11		; AVX2-NEXT: [[WIDE_LOAD14:%.]] = load <8 x i32>, <8 x i32> addrspace(1) [[TMP15]], align 4
; AVX2-NEXT: [[TMP16:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX2-NEXT: [[TMP16:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX2-NEXT: [[TMP17:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX2-NEXT: [[TMP17:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX2-NEXT: [[TMP18:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX2-NEXT: [[TMP18:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX2-NEXT: [[TMP19:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX2-NEXT: [[TMP19:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX2-NEXT: [[TMP20:%.]] = getelementptr i32, i32 addrspace(1) [[B]], i64 [[TMP0]]		; AVX2-NEXT: [[TMP20:%.]] = getelementptr i32, i32 addrspace(1) [[B]], i64 [[TMP0]]
; AVX2-NEXT: [[TMP21:%.]] = getelementptr i32, i32 addrspace(1) [[B]], i64 [[TMP1]]		; AVX2-NEXT: [[TMP21:%.]] = getelementptr i32, i32 addrspace(1) [[B]], i64 [[TMP1]]
; AVX2-NEXT: [[TMP22:%.]] = getelementptr i32, i32 addrspace(1) [[B]], i64 [[TMP2]]		; AVX2-NEXT: [[TMP22:%.]] = getelementptr i32, i32 addrspace(1) [[B]], i64 [[TMP2]]
; AVX2-NEXT: [[TMP23:%.]] = getelementptr i32, i32 addrspace(1) [[B]], i64 [[TMP3]]		; AVX2-NEXT: [[TMP23:%.]] = getelementptr i32, i32 addrspace(1) [[B]], i64 [[TMP3]]
; AVX2-NEXT: [[TMP24:%.]] = getelementptr i32, i32 addrspace(1) [[TMP20]], i32 0		; AVX2-NEXT: [[TMP24:%.]] = getelementptr i32, i32 addrspace(1) [[TMP20]], i32 0
; AVX2-NEXT: [[TMP25:%.]] = bitcast i32 addrspace(1) [[TMP24]] to <8 x i32> addrspace(1)*		; AVX2-NEXT: [[TMP25:%.]] = bitcast i32 addrspace(1) [[TMP24]] to <8 x i32> addrspace(1)*
; AVX2-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1) [[TMP25]], i32 4, <8 x i1> [[TMP16]], <8 x i32> poison), !alias.scope !14		; AVX2-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1) [[TMP25]], i32 4, <8 x i1> [[TMP16]], <8 x i32> poison)
; AVX2-NEXT: [[TMP26:%.]] = getelementptr i32, i32 addrspace(1) [[TMP20]], i32 8		; AVX2-NEXT: [[TMP26:%.]] = getelementptr i32, i32 addrspace(1) [[TMP20]], i32 8
; AVX2-NEXT: [[TMP27:%.]] = bitcast i32 addrspace(1) [[TMP26]] to <8 x i32> addrspace(1)*		; AVX2-NEXT: [[TMP27:%.]] = bitcast i32 addrspace(1) [[TMP26]] to <8 x i32> addrspace(1)*
; AVX2-NEXT: [[WIDE_MASKED_LOAD15:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1) [[TMP27]], i32 4, <8 x i1> [[TMP17]], <8 x i32> poison), !alias.scope !14		; AVX2-NEXT: [[WIDE_MASKED_LOAD15:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1) [[TMP27]], i32 4, <8 x i1> [[TMP17]], <8 x i32> poison)
; AVX2-NEXT: [[TMP28:%.]] = getelementptr i32, i32 addrspace(1) [[TMP20]], i32 16		; AVX2-NEXT: [[TMP28:%.]] = getelementptr i32, i32 addrspace(1) [[TMP20]], i32 16
; AVX2-NEXT: [[TMP29:%.]] = bitcast i32 addrspace(1) [[TMP28]] to <8 x i32> addrspace(1)*		; AVX2-NEXT: [[TMP29:%.]] = bitcast i32 addrspace(1) [[TMP28]] to <8 x i32> addrspace(1)*
; AVX2-NEXT: [[WIDE_MASKED_LOAD16:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1) [[TMP29]], i32 4, <8 x i1> [[TMP18]], <8 x i32> poison), !alias.scope !14		; AVX2-NEXT: [[WIDE_MASKED_LOAD16:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1) [[TMP29]], i32 4, <8 x i1> [[TMP18]], <8 x i32> poison)
; AVX2-NEXT: [[TMP30:%.]] = getelementptr i32, i32 addrspace(1) [[TMP20]], i32 24		; AVX2-NEXT: [[TMP30:%.]] = getelementptr i32, i32 addrspace(1) [[TMP20]], i32 24
; AVX2-NEXT: [[TMP31:%.]] = bitcast i32 addrspace(1) [[TMP30]] to <8 x i32> addrspace(1)*		; AVX2-NEXT: [[TMP31:%.]] = bitcast i32 addrspace(1) [[TMP30]] to <8 x i32> addrspace(1)*
; AVX2-NEXT: [[WIDE_MASKED_LOAD17:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1) [[TMP31]], i32 4, <8 x i1> [[TMP19]], <8 x i32> poison), !alias.scope !14		; AVX2-NEXT: [[WIDE_MASKED_LOAD17:%.]] = call <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1) [[TMP31]], i32 4, <8 x i1> [[TMP19]], <8 x i32> poison)
; AVX2-NEXT: [[TMP32:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]		; AVX2-NEXT: [[TMP32:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]
; AVX2-NEXT: [[TMP33:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD15]], [[WIDE_LOAD12]]		; AVX2-NEXT: [[TMP33:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD15]], [[WIDE_LOAD12]]
; AVX2-NEXT: [[TMP34:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD16]], [[WIDE_LOAD13]]		; AVX2-NEXT: [[TMP34:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD16]], [[WIDE_LOAD13]]
; AVX2-NEXT: [[TMP35:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD17]], [[WIDE_LOAD14]]		; AVX2-NEXT: [[TMP35:%.*]] = add nsw <8 x i32> [[WIDE_MASKED_LOAD17]], [[WIDE_LOAD14]]
; AVX2-NEXT: [[TMP36:%.]] = getelementptr i32, i32 addrspace(1) [[A]], i64 [[TMP0]]		; AVX2-NEXT: [[TMP36:%.]] = getelementptr i32, i32 addrspace(1) [[A]], i64 [[TMP0]]
; AVX2-NEXT: [[TMP37:%.]] = getelementptr i32, i32 addrspace(1) [[A]], i64 [[TMP1]]		; AVX2-NEXT: [[TMP37:%.]] = getelementptr i32, i32 addrspace(1) [[A]], i64 [[TMP1]]
; AVX2-NEXT: [[TMP38:%.]] = getelementptr i32, i32 addrspace(1) [[A]], i64 [[TMP2]]		; AVX2-NEXT: [[TMP38:%.]] = getelementptr i32, i32 addrspace(1) [[A]], i64 [[TMP2]]
; AVX2-NEXT: [[TMP39:%.]] = getelementptr i32, i32 addrspace(1) [[A]], i64 [[TMP3]]		; AVX2-NEXT: [[TMP39:%.]] = getelementptr i32, i32 addrspace(1) [[A]], i64 [[TMP3]]
; AVX2-NEXT: [[TMP40:%.]] = getelementptr i32, i32 addrspace(1) [[TMP36]], i32 0		; AVX2-NEXT: [[TMP40:%.]] = getelementptr i32, i32 addrspace(1) [[TMP36]], i32 0
; AVX2-NEXT: [[TMP41:%.]] = bitcast i32 addrspace(1) [[TMP40]] to <8 x i32> addrspace(1)*		; AVX2-NEXT: [[TMP41:%.]] = bitcast i32 addrspace(1) [[TMP40]] to <8 x i32> addrspace(1)*
; AVX2-NEXT: call void @llvm.masked.store.v8i32.p1v8i32(<8 x i32> [[TMP32]], <8 x i32> addrspace(1)* [[TMP41]], i32 4, <8 x i1> [[TMP16]]), !alias.scope !16, !noalias !18		; AVX2-NEXT: call void @llvm.masked.store.v8i32.p1v8i32(<8 x i32> [[TMP32]], <8 x i32> addrspace(1)* [[TMP41]], i32 4, <8 x i1> [[TMP16]])
; AVX2-NEXT: [[TMP42:%.]] = getelementptr i32, i32 addrspace(1) [[TMP36]], i32 8		; AVX2-NEXT: [[TMP42:%.]] = getelementptr i32, i32 addrspace(1) [[TMP36]], i32 8
; AVX2-NEXT: [[TMP43:%.]] = bitcast i32 addrspace(1) [[TMP42]] to <8 x i32> addrspace(1)*		; AVX2-NEXT: [[TMP43:%.]] = bitcast i32 addrspace(1) [[TMP42]] to <8 x i32> addrspace(1)*
; AVX2-NEXT: call void @llvm.masked.store.v8i32.p1v8i32(<8 x i32> [[TMP33]], <8 x i32> addrspace(1)* [[TMP43]], i32 4, <8 x i1> [[TMP17]]), !alias.scope !16, !noalias !18		; AVX2-NEXT: call void @llvm.masked.store.v8i32.p1v8i32(<8 x i32> [[TMP33]], <8 x i32> addrspace(1)* [[TMP43]], i32 4, <8 x i1> [[TMP17]])
; AVX2-NEXT: [[TMP44:%.]] = getelementptr i32, i32 addrspace(1) [[TMP36]], i32 16		; AVX2-NEXT: [[TMP44:%.]] = getelementptr i32, i32 addrspace(1) [[TMP36]], i32 16
; AVX2-NEXT: [[TMP45:%.]] = bitcast i32 addrspace(1) [[TMP44]] to <8 x i32> addrspace(1)*		; AVX2-NEXT: [[TMP45:%.]] = bitcast i32 addrspace(1) [[TMP44]] to <8 x i32> addrspace(1)*
; AVX2-NEXT: call void @llvm.masked.store.v8i32.p1v8i32(<8 x i32> [[TMP34]], <8 x i32> addrspace(1)* [[TMP45]], i32 4, <8 x i1> [[TMP18]]), !alias.scope !16, !noalias !18		; AVX2-NEXT: call void @llvm.masked.store.v8i32.p1v8i32(<8 x i32> [[TMP34]], <8 x i32> addrspace(1)* [[TMP45]], i32 4, <8 x i1> [[TMP18]])
; AVX2-NEXT: [[TMP46:%.]] = getelementptr i32, i32 addrspace(1) [[TMP36]], i32 24		; AVX2-NEXT: [[TMP46:%.]] = getelementptr i32, i32 addrspace(1) [[TMP36]], i32 24
; AVX2-NEXT: [[TMP47:%.]] = bitcast i32 addrspace(1) [[TMP46]] to <8 x i32> addrspace(1)*		; AVX2-NEXT: [[TMP47:%.]] = bitcast i32 addrspace(1) [[TMP46]] to <8 x i32> addrspace(1)*
; AVX2-NEXT: call void @llvm.masked.store.v8i32.p1v8i32(<8 x i32> [[TMP35]], <8 x i32> addrspace(1)* [[TMP47]], i32 4, <8 x i1> [[TMP19]]), !alias.scope !16, !noalias !18		; AVX2-NEXT: call void @llvm.masked.store.v8i32.p1v8i32(<8 x i32> [[TMP35]], <8 x i32> addrspace(1)* [[TMP47]], i32 4, <8 x i1> [[TMP19]])
; AVX2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 32		; AVX2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 32
; AVX2-NEXT: [[TMP48:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984		; AVX2-NEXT: [[TMP48:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984
; AVX2-NEXT: br i1 [[TMP48]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]		; AVX2-NEXT: br i1 [[TMP48]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]
; AVX2: middle.block:		; AVX2: middle.block:
; AVX2-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984		; AVX2-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984
; AVX2-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]		; AVX2-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
; AVX2: scalar.ph:		; AVX2: scalar.ph:
; AVX2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 9984, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]		; AVX2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 9984, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; AVX2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1		; AVX2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; AVX2-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 10000		; AVX2-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 10000
; AVX2-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]		; AVX2-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
; AVX2: for.end:		; AVX2: for.end:
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @foo1_addrspace1(		; AVX512-LABEL: @foo1_addrspace1(
; AVX512-NEXT: iter.check:		; AVX512-NEXT: iter.check:
; AVX512-NEXT: [[A1:%.*]] = bitcast i32 addrspace(1)* [[A:%.*]] to i8 addrspace(1)*		; AVX512-NEXT: [[B3:%.*]] = ptrtoint i32 addrspace(1)* [[B:%.*]] to i64
; AVX512-NEXT: [[TRIGGER3:%.*]] = bitcast i32 addrspace(1)* [[TRIGGER:%.*]] to i8 addrspace(1)*		; AVX512-NEXT: [[TRIGGER2:%.*]] = ptrtoint i32 addrspace(1)* [[TRIGGER:%.*]] to i64
; AVX512-NEXT: [[B6:%.*]] = bitcast i32 addrspace(1)* [[B:%.*]] to i8 addrspace(1)*		; AVX512-NEXT: [[A1:%.*]] = ptrtoint i32 addrspace(1)* [[A:%.*]] to i64
; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; AVX512: vector.memcheck:		; AVX512: vector.memcheck:
; AVX512-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 10000		; AVX512-NEXT: [[TMP0:%.*]] = sub i64 [[A1]], [[TRIGGER2]]
; AVX512-NEXT: [[SCEVGEP2:%.*]] = bitcast i32 addrspace(1)* [[SCEVGEP]] to i8 addrspace(1)*		; AVX512-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 256
; AVX512-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32 addrspace(1)* [[TRIGGER]], i64 10000		; AVX512-NEXT: [[TMP1:%.*]] = sub i64 [[A1]], [[B3]]
; AVX512-NEXT: [[SCEVGEP45:%.*]] = bitcast i32 addrspace(1)* [[SCEVGEP4]] to i8 addrspace(1)*		; AVX512-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP1]], 256
; AVX512-NEXT: [[SCEVGEP7:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 10000		; AVX512-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
; AVX512-NEXT: [[SCEVGEP78:%.*]] = bitcast i32 addrspace(1)* [[SCEVGEP7]] to i8 addrspace(1)*
; AVX512-NEXT: [[BOUND0:%.*]] = icmp ult i8 addrspace(1)* [[A1]], [[SCEVGEP45]]
; AVX512-NEXT: [[BOUND1:%.*]] = icmp ult i8 addrspace(1)* [[TRIGGER3]], [[SCEVGEP2]]
; AVX512-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; AVX512-NEXT: [[BOUND09:%.*]] = icmp ult i8 addrspace(1)* [[A1]], [[SCEVGEP78]]
; AVX512-NEXT: [[BOUND110:%.*]] = icmp ult i8 addrspace(1)* [[B6]], [[SCEVGEP2]]
; AVX512-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]]
; AVX512-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
; AVX512-NEXT: br i1 [[CONFLICT_RDX]], label [[VEC_EPILOG_SCALAR_PH]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]		; AVX512-NEXT: br i1 [[CONFLICT_RDX]], label [[VEC_EPILOG_SCALAR_PH]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]
; AVX512: vector.main.loop.iter.check:		; AVX512: vector.main.loop.iter.check:
; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]		; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]
; AVX512: vector.ph:		; AVX512: vector.ph:
; AVX512-NEXT: br label [[VECTOR_BODY:%.*]]		; AVX512-NEXT: br label [[VECTOR_BODY:%.*]]
; AVX512: vector.body:		; AVX512: vector.body:
; AVX512-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; AVX512-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; AVX512-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; AVX512-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; AVX512-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 16		; AVX512-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 16
; AVX512-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 32		; AVX512-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 32
; AVX512-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 48		; AVX512-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 48
; AVX512-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TRIGGER]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TRIGGER]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TRIGGER]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TRIGGER]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TRIGGER]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TRIGGER]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TRIGGER]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TRIGGER]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TMP4]], i32 0		; AVX512-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TMP4]], i32 0
; AVX512-NEXT: [[TMP9:%.*]] = bitcast i32 addrspace(1)* [[TMP8]] to <16 x i32> addrspace(1)*		; AVX512-NEXT: [[TMP9:%.*]] = bitcast i32 addrspace(1)* [[TMP8]] to <16 x i32> addrspace(1)*
; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i32>, <16 x i32> addrspace(1)* [[TMP9]], align 4, !alias.scope !13		; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i32>, <16 x i32> addrspace(1)* [[TMP9]], align 4
; AVX512-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TMP4]], i32 16		; AVX512-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TMP4]], i32 16
; AVX512-NEXT: [[TMP11:%.*]] = bitcast i32 addrspace(1)* [[TMP10]] to <16 x i32> addrspace(1)*		; AVX512-NEXT: [[TMP11:%.*]] = bitcast i32 addrspace(1)* [[TMP10]] to <16 x i32> addrspace(1)*
; AVX512-NEXT: [[WIDE_LOAD12:%.*]] = load <16 x i32>, <16 x i32> addrspace(1)* [[TMP11]], align 4, !alias.scope !13		; AVX512-NEXT: [[WIDE_LOAD12:%.*]] = load <16 x i32>, <16 x i32> addrspace(1)* [[TMP11]], align 4
; AVX512-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TMP4]], i32 32		; AVX512-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TMP4]], i32 32
; AVX512-NEXT: [[TMP13:%.*]] = bitcast i32 addrspace(1)* [[TMP12]] to <16 x i32> addrspace(1)*		; AVX512-NEXT: [[TMP13:%.*]] = bitcast i32 addrspace(1)* [[TMP12]] to <16 x i32> addrspace(1)*
; AVX512-NEXT: [[WIDE_LOAD13:%.*]] = load <16 x i32>, <16 x i32> addrspace(1)* [[TMP13]], align 4, !alias.scope !13		; AVX512-NEXT: [[WIDE_LOAD13:%.*]] = load <16 x i32>, <16 x i32> addrspace(1)* [[TMP13]], align 4
; AVX512-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TMP4]], i32 48		; AVX512-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[TMP4]], i32 48
; AVX512-NEXT: [[TMP15:%.*]] = bitcast i32 addrspace(1)* [[TMP14]] to <16 x i32> addrspace(1)*		; AVX512-NEXT: [[TMP15:%.*]] = bitcast i32 addrspace(1)* [[TMP14]] to <16 x i32> addrspace(1)*
; AVX512-NEXT: [[WIDE_LOAD14:%.*]] = load <16 x i32>, <16 x i32> addrspace(1)* [[TMP15]], align 4, !alias.scope !13		; AVX512-NEXT: [[WIDE_LOAD14:%.*]] = load <16 x i32>, <16 x i32> addrspace(1)* [[TMP15]], align 4
; AVX512-NEXT: [[TMP16:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP16:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP17:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP17:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP18:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP18:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP19:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP19:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP20:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP20:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP21:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP21:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP22:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP22:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP23:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP23:%.*]] = getelementptr i32, i32 addrspace(1)* [[B]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP24:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP20]], i32 0		; AVX512-NEXT: [[TMP24:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP20]], i32 0
; AVX512-NEXT: [[TMP25:%.*]] = bitcast i32 addrspace(1)* [[TMP24]] to <16 x i32> addrspace(1)*		; AVX512-NEXT: [[TMP25:%.*]] = bitcast i32 addrspace(1)* [[TMP24]] to <16 x i32> addrspace(1)*
; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p1v16i32(<16 x i32> addrspace(1)* [[TMP25]], i32 4, <16 x i1> [[TMP16]], <16 x i32> poison), !alias.scope !16		; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p1v16i32(<16 x i32> addrspace(1)* [[TMP25]], i32 4, <16 x i1> [[TMP16]], <16 x i32> poison)
; AVX512-NEXT: [[TMP26:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP20]], i32 16		; AVX512-NEXT: [[TMP26:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP20]], i32 16
; AVX512-NEXT: [[TMP27:%.*]] = bitcast i32 addrspace(1)* [[TMP26]] to <16 x i32> addrspace(1)*		; AVX512-NEXT: [[TMP27:%.*]] = bitcast i32 addrspace(1)* [[TMP26]] to <16 x i32> addrspace(1)*
; AVX512-NEXT: [[WIDE_MASKED_LOAD15:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p1v16i32(<16 x i32> addrspace(1)* [[TMP27]], i32 4, <16 x i1> [[TMP17]], <16 x i32> poison), !alias.scope !16		; AVX512-NEXT: [[WIDE_MASKED_LOAD15:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p1v16i32(<16 x i32> addrspace(1)* [[TMP27]], i32 4, <16 x i1> [[TMP17]], <16 x i32> poison)
; AVX512-NEXT: [[TMP28:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP20]], i32 32		; AVX512-NEXT: [[TMP28:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP20]], i32 32
; AVX512-NEXT: [[TMP29:%.*]] = bitcast i32 addrspace(1)* [[TMP28]] to <16 x i32> addrspace(1)*		; AVX512-NEXT: [[TMP29:%.*]] = bitcast i32 addrspace(1)* [[TMP28]] to <16 x i32> addrspace(1)*
; AVX512-NEXT: [[WIDE_MASKED_LOAD16:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p1v16i32(<16 x i32> addrspace(1)* [[TMP29]], i32 4, <16 x i1> [[TMP18]], <16 x i32> poison), !alias.scope !16		; AVX512-NEXT: [[WIDE_MASKED_LOAD16:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p1v16i32(<16 x i32> addrspace(1)* [[TMP29]], i32 4, <16 x i1> [[TMP18]], <16 x i32> poison)
; AVX512-NEXT: [[TMP30:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP20]], i32 48		; AVX512-NEXT: [[TMP30:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP20]], i32 48
; AVX512-NEXT: [[TMP31:%.*]] = bitcast i32 addrspace(1)* [[TMP30]] to <16 x i32> addrspace(1)*		; AVX512-NEXT: [[TMP31:%.*]] = bitcast i32 addrspace(1)* [[TMP30]] to <16 x i32> addrspace(1)*
; AVX512-NEXT: [[WIDE_MASKED_LOAD17:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p1v16i32(<16 x i32> addrspace(1)* [[TMP31]], i32 4, <16 x i1> [[TMP19]], <16 x i32> poison), !alias.scope !16		; AVX512-NEXT: [[WIDE_MASKED_LOAD17:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p1v16i32(<16 x i32> addrspace(1)* [[TMP31]], i32 4, <16 x i1> [[TMP19]], <16 x i32> poison)
; AVX512-NEXT: [[TMP32:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]		; AVX512-NEXT: [[TMP32:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]
; AVX512-NEXT: [[TMP33:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD15]], [[WIDE_LOAD12]]		; AVX512-NEXT: [[TMP33:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD15]], [[WIDE_LOAD12]]
; AVX512-NEXT: [[TMP34:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD16]], [[WIDE_LOAD13]]		; AVX512-NEXT: [[TMP34:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD16]], [[WIDE_LOAD13]]
; AVX512-NEXT: [[TMP35:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD17]], [[WIDE_LOAD14]]		; AVX512-NEXT: [[TMP35:%.*]] = add nsw <16 x i32> [[WIDE_MASKED_LOAD17]], [[WIDE_LOAD14]]
; AVX512-NEXT: [[TMP36:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP36:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP37:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP37:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP38:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP38:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP39:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP39:%.*]] = getelementptr i32, i32 addrspace(1)* [[A]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP40:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP36]], i32 0		; AVX512-NEXT: [[TMP40:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP36]], i32 0
; AVX512-NEXT: [[TMP41:%.*]] = bitcast i32 addrspace(1)* [[TMP40]] to <16 x i32> addrspace(1)*		; AVX512-NEXT: [[TMP41:%.*]] = bitcast i32 addrspace(1)* [[TMP40]] to <16 x i32> addrspace(1)*
; AVX512-NEXT: call void @llvm.masked.store.v16i32.p1v16i32(<16 x i32> [[TMP32]], <16 x i32> addrspace(1)* [[TMP41]], i32 4, <16 x i1> [[TMP16]]), !alias.scope !18, !noalias !20		; AVX512-NEXT: call void @llvm.masked.store.v16i32.p1v16i32(<16 x i32> [[TMP32]], <16 x i32> addrspace(1)* [[TMP41]], i32 4, <16 x i1> [[TMP16]])
; AVX512-NEXT: [[TMP42:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP36]], i32 16		; AVX512-NEXT: [[TMP42:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP36]], i32 16
; AVX512-NEXT: [[TMP43:%.*]] = bitcast i32 addrspace(1)* [[TMP42]] to <16 x i32> addrspace(1)*		; AVX512-NEXT: [[TMP43:%.*]] = bitcast i32 addrspace(1)* [[TMP42]] to <16 x i32> addrspace(1)*
; AVX512-NEXT: call void @llvm.masked.store.v16i32.p1v16i32(<16 x i32> [[TMP33]], <16 x i32> addrspace(1)* [[TMP43]], i32 4, <16 x i1> [[TMP17]]), !alias.scope !18, !noalias !20		; AVX512-NEXT: call void @llvm.masked.store.v16i32.p1v16i32(<16 x i32> [[TMP33]], <16 x i32> addrspace(1)* [[TMP43]], i32 4, <16 x i1> [[TMP17]])
; AVX512-NEXT: [[TMP44:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP36]], i32 32		; AVX512-NEXT: [[TMP44:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP36]], i32 32
; AVX512-NEXT: [[TMP45:%.*]] = bitcast i32 addrspace(1)* [[TMP44]] to <16 x i32> addrspace(1)*		; AVX512-NEXT: [[TMP45:%.*]] = bitcast i32 addrspace(1)* [[TMP44]] to <16 x i32> addrspace(1)*
; AVX512-NEXT: call void @llvm.masked.store.v16i32.p1v16i32(<16 x i32> [[TMP34]], <16 x i32> addrspace(1)* [[TMP45]], i32 4, <16 x i1> [[TMP18]]), !alias.scope !18, !noalias !20		; AVX512-NEXT: call void @llvm.masked.store.v16i32.p1v16i32(<16 x i32> [[TMP34]], <16 x i32> addrspace(1)* [[TMP45]], i32 4, <16 x i1> [[TMP18]])
; AVX512-NEXT: [[TMP46:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP36]], i32 48		; AVX512-NEXT: [[TMP46:%.*]] = getelementptr i32, i32 addrspace(1)* [[TMP36]], i32 48
; AVX512-NEXT: [[TMP47:%.*]] = bitcast i32 addrspace(1)* [[TMP46]] to <16 x i32> addrspace(1)*		; AVX512-NEXT: [[TMP47:%.*]] = bitcast i32 addrspace(1)* [[TMP46]] to <16 x i32> addrspace(1)*
; AVX512-NEXT: call void @llvm.masked.store.v16i32.p1v16i32(<16 x i32> [[TMP35]], <16 x i32> addrspace(1)* [[TMP47]], i32 4, <16 x i1> [[TMP19]]), !alias.scope !18, !noalias !20		; AVX512-NEXT: call void @llvm.masked.store.v16i32.p1v16i32(<16 x i32> [[TMP35]], <16 x i32> addrspace(1)* [[TMP47]], i32 4, <16 x i1> [[TMP19]])
; AVX512-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 64		; AVX512-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 64
; AVX512-NEXT: [[TMP48:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984		; AVX512-NEXT: [[TMP48:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984
; AVX512-NEXT: br i1 [[TMP48]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]		; AVX512-NEXT: br i1 [[TMP48]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]
; AVX512: middle.block:		; AVX512: middle.block:
; AVX512-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984		; AVX512-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984
; AVX512-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]		; AVX512-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]
; AVX512: vec.epilog.iter.check:		; AVX512: vec.epilog.iter.check:
; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]		; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]
; A[i] = B[i] + trigger[i];		; A[i] = B[i] + trigger[i];
; }		; }
; }		; }
;}		;}

define void @foo2(float* nocapture %A, float* nocapture readonly %B, i32* nocapture readonly %trigger) local_unnamed_addr #0 {		define void @foo2(float* nocapture %A, float* nocapture readonly %B, i32* nocapture readonly %trigger) local_unnamed_addr #0 {
; AVX1-LABEL: @foo2(		; AVX1-LABEL: @foo2(
; AVX1-NEXT: entry:		; AVX1-NEXT: entry:
; AVX1-NEXT: [[A1:%.*]] = bitcast float* [[A:%.*]] to i8*		; AVX1-NEXT: [[B3:%.*]] = ptrtoint float* [[B:%.*]] to i64
; AVX1-NEXT: [[TRIGGER3:%.*]] = bitcast i32* [[TRIGGER:%.*]] to i8*		; AVX1-NEXT: [[TRIGGER2:%.*]] = ptrtoint i32* [[TRIGGER:%.*]] to i64
; AVX1-NEXT: [[B6:%.*]] = bitcast float* [[B:%.*]] to i8*		; AVX1-NEXT: [[A1:%.*]] = ptrtoint float* [[A:%.*]] to i64
; AVX1-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; AVX1-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; AVX1: vector.memcheck:		; AVX1: vector.memcheck:
; AVX1-NEXT: [[SCEVGEP:%.*]] = getelementptr float, float* [[A]], i64 10000		; AVX1-NEXT: [[TMP0:%.*]] = sub i64 [[A1]], [[TRIGGER2]]
; AVX1-NEXT: [[SCEVGEP2:%.*]] = bitcast float* [[SCEVGEP]] to i8*		; AVX1-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 32
; AVX1-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[TRIGGER]], i64 10000		; AVX1-NEXT: [[TMP1:%.*]] = sub i64 [[A1]], [[B3]]
; AVX1-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*		; AVX1-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP1]], 32
; AVX1-NEXT: [[SCEVGEP7:%.*]] = getelementptr float, float* [[B]], i64 10000		; AVX1-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
; AVX1-NEXT: [[SCEVGEP78:%.*]] = bitcast float* [[SCEVGEP7]] to i8*
; AVX1-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[A1]], [[SCEVGEP45]]
; AVX1-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[TRIGGER3]], [[SCEVGEP2]]
; AVX1-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; AVX1-NEXT: [[BOUND09:%.*]] = icmp ult i8* [[A1]], [[SCEVGEP78]]
; AVX1-NEXT: [[BOUND110:%.*]] = icmp ult i8* [[B6]], [[SCEVGEP2]]
; AVX1-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]]
; AVX1-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
; AVX1-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]		; AVX1-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; AVX1: vector.ph:		; AVX1: vector.ph:
; AVX1-NEXT: br label [[VECTOR_BODY:%.*]]		; AVX1-NEXT: br label [[VECTOR_BODY:%.*]]
; AVX1: vector.body:		; AVX1: vector.body:
; AVX1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; AVX1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; AVX1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; AVX1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; AVX1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]		; AVX1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
; AVX1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0		; AVX1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
; AVX1-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <8 x i32>*		; AVX1-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <8 x i32>*
; AVX1-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32>* [[TMP3]], align 4, !alias.scope !21		; AVX1-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32>* [[TMP3]], align 4
; AVX1-NEXT: [[TMP4:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX1-NEXT: [[TMP4:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX1-NEXT: [[TMP5:%.*]] = getelementptr float, float* [[B]], i64 [[TMP0]]		; AVX1-NEXT: [[TMP5:%.*]] = getelementptr float, float* [[B]], i64 [[TMP0]]
; AVX1-NEXT: [[TMP6:%.*]] = getelementptr float, float* [[TMP5]], i32 0		; AVX1-NEXT: [[TMP6:%.*]] = getelementptr float, float* [[TMP5]], i32 0
; AVX1-NEXT: [[TMP7:%.*]] = bitcast float* [[TMP6]] to <8 x float>*		; AVX1-NEXT: [[TMP7:%.*]] = bitcast float* [[TMP6]] to <8 x float>*
; AVX1-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP7]], i32 4, <8 x i1> [[TMP4]], <8 x float> poison), !alias.scope !24		; AVX1-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP7]], i32 4, <8 x i1> [[TMP4]], <8 x float> poison)
; AVX1-NEXT: [[TMP8:%.*]] = sitofp <8 x i32> [[WIDE_LOAD]] to <8 x float>		; AVX1-NEXT: [[TMP8:%.*]] = sitofp <8 x i32> [[WIDE_LOAD]] to <8 x float>
; AVX1-NEXT: [[TMP9:%.*]] = fadd <8 x float> [[WIDE_MASKED_LOAD]], [[TMP8]]		; AVX1-NEXT: [[TMP9:%.*]] = fadd <8 x float> [[WIDE_MASKED_LOAD]], [[TMP8]]
; AVX1-NEXT: [[TMP10:%.*]] = getelementptr float, float* [[A]], i64 [[TMP0]]		; AVX1-NEXT: [[TMP10:%.*]] = getelementptr float, float* [[A]], i64 [[TMP0]]
; AVX1-NEXT: [[TMP11:%.*]] = getelementptr float, float* [[TMP10]], i32 0		; AVX1-NEXT: [[TMP11:%.*]] = getelementptr float, float* [[TMP10]], i32 0
; AVX1-NEXT: [[TMP12:%.*]] = bitcast float* [[TMP11]] to <8 x float>*		; AVX1-NEXT: [[TMP12:%.*]] = bitcast float* [[TMP11]] to <8 x float>*
; AVX1-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP9]], <8 x float>* [[TMP12]], i32 4, <8 x i1> [[TMP4]]), !alias.scope !26, !noalias !28		; AVX1-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP9]], <8 x float>* [[TMP12]], i32 4, <8 x i1> [[TMP4]])
; AVX1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8		; AVX1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; AVX1-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10000		; AVX1-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10000
; AVX1-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP29:![0-9]+]]		; AVX1-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP29:![0-9]+]]
; AVX1: middle.block:		; AVX1: middle.block:
; AVX1-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 10000		; AVX1-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 10000
; AVX1-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]		; AVX1-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
; AVX1: scalar.ph:		; AVX1: scalar.ph:
; AVX1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 10000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]		; AVX1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 10000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; AVX1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1		; AVX1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; AVX1-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 10000		; AVX1-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 10000
; AVX1-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]]		; AVX1-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]]
; AVX1: for.end:		; AVX1: for.end:
; AVX1-NEXT: ret void		; AVX1-NEXT: ret void
;		;
; AVX2-LABEL: @foo2(		; AVX2-LABEL: @foo2(
; AVX2-NEXT: entry:		; AVX2-NEXT: entry:
; AVX2-NEXT: [[A1:%.*]] = bitcast float* [[A:%.*]] to i8*		; AVX2-NEXT: [[B3:%.*]] = ptrtoint float* [[B:%.*]] to i64
; AVX2-NEXT: [[TRIGGER3:%.*]] = bitcast i32* [[TRIGGER:%.*]] to i8*		; AVX2-NEXT: [[TRIGGER2:%.*]] = ptrtoint i32* [[TRIGGER:%.*]] to i64
; AVX2-NEXT: [[B6:%.*]] = bitcast float* [[B:%.*]] to i8*		; AVX2-NEXT: [[A1:%.*]] = ptrtoint float* [[A:%.*]] to i64
; AVX2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; AVX2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; AVX2: vector.memcheck:		; AVX2: vector.memcheck:
; AVX2-NEXT: [[SCEVGEP:%.*]] = getelementptr float, float* [[A]], i64 10000		; AVX2-NEXT: [[TMP0:%.*]] = sub i64 [[A1]], [[TRIGGER2]]
; AVX2-NEXT: [[SCEVGEP2:%.*]] = bitcast float* [[SCEVGEP]] to i8*		; AVX2-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 128
; AVX2-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[TRIGGER]], i64 10000		; AVX2-NEXT: [[TMP1:%.*]] = sub i64 [[A1]], [[B3]]
; AVX2-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*		; AVX2-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP1]], 128
; AVX2-NEXT: [[SCEVGEP7:%.*]] = getelementptr float, float* [[B]], i64 10000		; AVX2-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
; AVX2-NEXT: [[SCEVGEP78:%.*]] = bitcast float* [[SCEVGEP7]] to i8*
; AVX2-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[A1]], [[SCEVGEP45]]
; AVX2-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[TRIGGER3]], [[SCEVGEP2]]
; AVX2-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; AVX2-NEXT: [[BOUND09:%.*]] = icmp ult i8* [[A1]], [[SCEVGEP78]]
; AVX2-NEXT: [[BOUND110:%.*]] = icmp ult i8* [[B6]], [[SCEVGEP2]]
; AVX2-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]]
; AVX2-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
; AVX2-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]		; AVX2-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; AVX2: vector.ph:		; AVX2: vector.ph:
; AVX2-NEXT: br label [[VECTOR_BODY:%.*]]		; AVX2-NEXT: br label [[VECTOR_BODY:%.*]]
; AVX2: vector.body:		; AVX2: vector.body:
; AVX2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; AVX2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; AVX2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; AVX2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; AVX2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 8		; AVX2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 8
; AVX2-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 16		; AVX2-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 16
; AVX2-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 24		; AVX2-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 24
; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]		; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]		; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]
; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]		; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]
; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]		; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]
; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0		; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0
; AVX2-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <8 x i32>*		; AVX2-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <8 x i32>*
; AVX2-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32>* [[TMP9]], align 4, !alias.scope !21		; AVX2-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32>* [[TMP9]], align 4
; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 8		; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 8
; AVX2-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <8 x i32>*		; AVX2-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <8 x i32>*
; AVX2-NEXT: [[WIDE_LOAD12:%.*]] = load <8 x i32>, <8 x i32>* [[TMP11]], align 4, !alias.scope !21		; AVX2-NEXT: [[WIDE_LOAD12:%.*]] = load <8 x i32>, <8 x i32>* [[TMP11]], align 4
; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 16		; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 16
; AVX2-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <8 x i32>*		; AVX2-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <8 x i32>*
; AVX2-NEXT: [[WIDE_LOAD13:%.*]] = load <8 x i32>, <8 x i32>* [[TMP13]], align 4, !alias.scope !21		; AVX2-NEXT: [[WIDE_LOAD13:%.*]] = load <8 x i32>, <8 x i32>* [[TMP13]], align 4
; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 24		; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 24
; AVX2-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <8 x i32>*		; AVX2-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <8 x i32>*
; AVX2-NEXT: [[WIDE_LOAD14:%.*]] = load <8 x i32>, <8 x i32>* [[TMP15]], align 4, !alias.scope !21		; AVX2-NEXT: [[WIDE_LOAD14:%.*]] = load <8 x i32>, <8 x i32>* [[TMP15]], align 4
; AVX2-NEXT: [[TMP16:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX2-NEXT: [[TMP16:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX2-NEXT: [[TMP17:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX2-NEXT: [[TMP17:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX2-NEXT: [[TMP18:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX2-NEXT: [[TMP18:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX2-NEXT: [[TMP19:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX2-NEXT: [[TMP19:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX2-NEXT: [[TMP20:%.*]] = getelementptr float, float* [[B]], i64 [[TMP0]]		; AVX2-NEXT: [[TMP20:%.*]] = getelementptr float, float* [[B]], i64 [[TMP0]]
; AVX2-NEXT: [[TMP21:%.*]] = getelementptr float, float* [[B]], i64 [[TMP1]]		; AVX2-NEXT: [[TMP21:%.*]] = getelementptr float, float* [[B]], i64 [[TMP1]]
; AVX2-NEXT: [[TMP22:%.*]] = getelementptr float, float* [[B]], i64 [[TMP2]]		; AVX2-NEXT: [[TMP22:%.*]] = getelementptr float, float* [[B]], i64 [[TMP2]]
; AVX2-NEXT: [[TMP23:%.*]] = getelementptr float, float* [[B]], i64 [[TMP3]]		; AVX2-NEXT: [[TMP23:%.*]] = getelementptr float, float* [[B]], i64 [[TMP3]]
; AVX2-NEXT: [[TMP24:%.*]] = getelementptr float, float* [[TMP20]], i32 0		; AVX2-NEXT: [[TMP24:%.*]] = getelementptr float, float* [[TMP20]], i32 0
; AVX2-NEXT: [[TMP25:%.*]] = bitcast float* [[TMP24]] to <8 x float>*		; AVX2-NEXT: [[TMP25:%.*]] = bitcast float* [[TMP24]] to <8 x float>*
; AVX2-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP25]], i32 4, <8 x i1> [[TMP16]], <8 x float> poison), !alias.scope !24		; AVX2-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP25]], i32 4, <8 x i1> [[TMP16]], <8 x float> poison)
; AVX2-NEXT: [[TMP26:%.*]] = getelementptr float, float* [[TMP20]], i32 8		; AVX2-NEXT: [[TMP26:%.*]] = getelementptr float, float* [[TMP20]], i32 8
; AVX2-NEXT: [[TMP27:%.*]] = bitcast float* [[TMP26]] to <8 x float>*		; AVX2-NEXT: [[TMP27:%.*]] = bitcast float* [[TMP26]] to <8 x float>*
; AVX2-NEXT: [[WIDE_MASKED_LOAD15:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP27]], i32 4, <8 x i1> [[TMP17]], <8 x float> poison), !alias.scope !24		; AVX2-NEXT: [[WIDE_MASKED_LOAD15:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP27]], i32 4, <8 x i1> [[TMP17]], <8 x float> poison)
; AVX2-NEXT: [[TMP28:%.*]] = getelementptr float, float* [[TMP20]], i32 16		; AVX2-NEXT: [[TMP28:%.*]] = getelementptr float, float* [[TMP20]], i32 16
; AVX2-NEXT: [[TMP29:%.*]] = bitcast float* [[TMP28]] to <8 x float>*		; AVX2-NEXT: [[TMP29:%.*]] = bitcast float* [[TMP28]] to <8 x float>*
; AVX2-NEXT: [[WIDE_MASKED_LOAD16:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP29]], i32 4, <8 x i1> [[TMP18]], <8 x float> poison), !alias.scope !24		; AVX2-NEXT: [[WIDE_MASKED_LOAD16:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP29]], i32 4, <8 x i1> [[TMP18]], <8 x float> poison)
; AVX2-NEXT: [[TMP30:%.*]] = getelementptr float, float* [[TMP20]], i32 24		; AVX2-NEXT: [[TMP30:%.*]] = getelementptr float, float* [[TMP20]], i32 24
; AVX2-NEXT: [[TMP31:%.*]] = bitcast float* [[TMP30]] to <8 x float>*		; AVX2-NEXT: [[TMP31:%.*]] = bitcast float* [[TMP30]] to <8 x float>*
; AVX2-NEXT: [[WIDE_MASKED_LOAD17:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP31]], i32 4, <8 x i1> [[TMP19]], <8 x float> poison), !alias.scope !24		; AVX2-NEXT: [[WIDE_MASKED_LOAD17:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP31]], i32 4, <8 x i1> [[TMP19]], <8 x float> poison)
; AVX2-NEXT: [[TMP32:%.*]] = sitofp <8 x i32> [[WIDE_LOAD]] to <8 x float>		; AVX2-NEXT: [[TMP32:%.*]] = sitofp <8 x i32> [[WIDE_LOAD]] to <8 x float>
; AVX2-NEXT: [[TMP33:%.*]] = sitofp <8 x i32> [[WIDE_LOAD12]] to <8 x float>		; AVX2-NEXT: [[TMP33:%.*]] = sitofp <8 x i32> [[WIDE_LOAD12]] to <8 x float>
; AVX2-NEXT: [[TMP34:%.*]] = sitofp <8 x i32> [[WIDE_LOAD13]] to <8 x float>		; AVX2-NEXT: [[TMP34:%.*]] = sitofp <8 x i32> [[WIDE_LOAD13]] to <8 x float>
; AVX2-NEXT: [[TMP35:%.*]] = sitofp <8 x i32> [[WIDE_LOAD14]] to <8 x float>		; AVX2-NEXT: [[TMP35:%.*]] = sitofp <8 x i32> [[WIDE_LOAD14]] to <8 x float>
; AVX2-NEXT: [[TMP36:%.*]] = fadd <8 x float> [[WIDE_MASKED_LOAD]], [[TMP32]]		; AVX2-NEXT: [[TMP36:%.*]] = fadd <8 x float> [[WIDE_MASKED_LOAD]], [[TMP32]]
; AVX2-NEXT: [[TMP37:%.*]] = fadd <8 x float> [[WIDE_MASKED_LOAD15]], [[TMP33]]		; AVX2-NEXT: [[TMP37:%.*]] = fadd <8 x float> [[WIDE_MASKED_LOAD15]], [[TMP33]]
; AVX2-NEXT: [[TMP38:%.*]] = fadd <8 x float> [[WIDE_MASKED_LOAD16]], [[TMP34]]		; AVX2-NEXT: [[TMP38:%.*]] = fadd <8 x float> [[WIDE_MASKED_LOAD16]], [[TMP34]]
; AVX2-NEXT: [[TMP39:%.*]] = fadd <8 x float> [[WIDE_MASKED_LOAD17]], [[TMP35]]		; AVX2-NEXT: [[TMP39:%.*]] = fadd <8 x float> [[WIDE_MASKED_LOAD17]], [[TMP35]]
; AVX2-NEXT: [[TMP40:%.*]] = getelementptr float, float* [[A]], i64 [[TMP0]]		; AVX2-NEXT: [[TMP40:%.*]] = getelementptr float, float* [[A]], i64 [[TMP0]]
; AVX2-NEXT: [[TMP41:%.*]] = getelementptr float, float* [[A]], i64 [[TMP1]]		; AVX2-NEXT: [[TMP41:%.*]] = getelementptr float, float* [[A]], i64 [[TMP1]]
; AVX2-NEXT: [[TMP42:%.*]] = getelementptr float, float* [[A]], i64 [[TMP2]]		; AVX2-NEXT: [[TMP42:%.*]] = getelementptr float, float* [[A]], i64 [[TMP2]]
; AVX2-NEXT: [[TMP43:%.*]] = getelementptr float, float* [[A]], i64 [[TMP3]]		; AVX2-NEXT: [[TMP43:%.*]] = getelementptr float, float* [[A]], i64 [[TMP3]]
; AVX2-NEXT: [[TMP44:%.*]] = getelementptr float, float* [[TMP40]], i32 0		; AVX2-NEXT: [[TMP44:%.*]] = getelementptr float, float* [[TMP40]], i32 0
; AVX2-NEXT: [[TMP45:%.*]] = bitcast float* [[TMP44]] to <8 x float>*		; AVX2-NEXT: [[TMP45:%.*]] = bitcast float* [[TMP44]] to <8 x float>*
; AVX2-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP36]], <8 x float>* [[TMP45]], i32 4, <8 x i1> [[TMP16]]), !alias.scope !26, !noalias !28		; AVX2-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP36]], <8 x float>* [[TMP45]], i32 4, <8 x i1> [[TMP16]])
; AVX2-NEXT: [[TMP46:%.*]] = getelementptr float, float* [[TMP40]], i32 8		; AVX2-NEXT: [[TMP46:%.*]] = getelementptr float, float* [[TMP40]], i32 8
; AVX2-NEXT: [[TMP47:%.*]] = bitcast float* [[TMP46]] to <8 x float>*		; AVX2-NEXT: [[TMP47:%.*]] = bitcast float* [[TMP46]] to <8 x float>*
; AVX2-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP37]], <8 x float>* [[TMP47]], i32 4, <8 x i1> [[TMP17]]), !alias.scope !26, !noalias !28		; AVX2-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP37]], <8 x float>* [[TMP47]], i32 4, <8 x i1> [[TMP17]])
; AVX2-NEXT: [[TMP48:%.*]] = getelementptr float, float* [[TMP40]], i32 16		; AVX2-NEXT: [[TMP48:%.*]] = getelementptr float, float* [[TMP40]], i32 16
; AVX2-NEXT: [[TMP49:%.*]] = bitcast float* [[TMP48]] to <8 x float>*		; AVX2-NEXT: [[TMP49:%.*]] = bitcast float* [[TMP48]] to <8 x float>*
; AVX2-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP38]], <8 x float>* [[TMP49]], i32 4, <8 x i1> [[TMP18]]), !alias.scope !26, !noalias !28		; AVX2-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP38]], <8 x float>* [[TMP49]], i32 4, <8 x i1> [[TMP18]])
; AVX2-NEXT: [[TMP50:%.*]] = getelementptr float, float* [[TMP40]], i32 24		; AVX2-NEXT: [[TMP50:%.*]] = getelementptr float, float* [[TMP40]], i32 24
; AVX2-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP50]] to <8 x float>*		; AVX2-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP50]] to <8 x float>*
; AVX2-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP39]], <8 x float>* [[TMP51]], i32 4, <8 x i1> [[TMP19]]), !alias.scope !26, !noalias !28		; AVX2-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP39]], <8 x float>* [[TMP51]], i32 4, <8 x i1> [[TMP19]])
; AVX2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 32		; AVX2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 32
; AVX2-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984		; AVX2-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984
; AVX2-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP29:![0-9]+]]		; AVX2-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP29:![0-9]+]]
; AVX2: middle.block:		; AVX2: middle.block:
; AVX2-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984		; AVX2-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984
; AVX2-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]		; AVX2-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
; AVX2: scalar.ph:		; AVX2: scalar.ph:
; AVX2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 9984, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]		; AVX2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 9984, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; AVX2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1		; AVX2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; AVX2-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 10000		; AVX2-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 10000
; AVX2-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]]		; AVX2-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]]
; AVX2: for.end:		; AVX2: for.end:
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @foo2(		; AVX512-LABEL: @foo2(
; AVX512-NEXT: iter.check:		; AVX512-NEXT: iter.check:
; AVX512-NEXT: [[A1:%.*]] = bitcast float* [[A:%.*]] to i8*		; AVX512-NEXT: [[B3:%.*]] = ptrtoint float* [[B:%.*]] to i64
; AVX512-NEXT: [[TRIGGER3:%.*]] = bitcast i32* [[TRIGGER:%.*]] to i8*		; AVX512-NEXT: [[TRIGGER2:%.*]] = ptrtoint i32* [[TRIGGER:%.*]] to i64
; AVX512-NEXT: [[B6:%.*]] = bitcast float* [[B:%.*]] to i8*		; AVX512-NEXT: [[A1:%.*]] = ptrtoint float* [[A:%.*]] to i64
; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; AVX512: vector.memcheck:		; AVX512: vector.memcheck:
; AVX512-NEXT: [[SCEVGEP:%.*]] = getelementptr float, float* [[A]], i64 10000		; AVX512-NEXT: [[TMP0:%.*]] = sub i64 [[A1]], [[TRIGGER2]]
; AVX512-NEXT: [[SCEVGEP2:%.*]] = bitcast float* [[SCEVGEP]] to i8*		; AVX512-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 256
; AVX512-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[TRIGGER]], i64 10000		; AVX512-NEXT: [[TMP1:%.*]] = sub i64 [[A1]], [[B3]]
; AVX512-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*		; AVX512-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP1]], 256
; AVX512-NEXT: [[SCEVGEP7:%.*]] = getelementptr float, float* [[B]], i64 10000		; AVX512-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
; AVX512-NEXT: [[SCEVGEP78:%.*]] = bitcast float* [[SCEVGEP7]] to i8*
; AVX512-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[A1]], [[SCEVGEP45]]
; AVX512-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[TRIGGER3]], [[SCEVGEP2]]
; AVX512-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; AVX512-NEXT: [[BOUND09:%.*]] = icmp ult i8* [[A1]], [[SCEVGEP78]]
; AVX512-NEXT: [[BOUND110:%.*]] = icmp ult i8* [[B6]], [[SCEVGEP2]]
; AVX512-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]]
; AVX512-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
; AVX512-NEXT: br i1 [[CONFLICT_RDX]], label [[VEC_EPILOG_SCALAR_PH]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]		; AVX512-NEXT: br i1 [[CONFLICT_RDX]], label [[VEC_EPILOG_SCALAR_PH]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]
; AVX512: vector.main.loop.iter.check:		; AVX512: vector.main.loop.iter.check:
; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]		; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]
; AVX512: vector.ph:		; AVX512: vector.ph:
; AVX512-NEXT: br label [[VECTOR_BODY:%.*]]		; AVX512-NEXT: br label [[VECTOR_BODY:%.*]]
; AVX512: vector.body:		; AVX512: vector.body:
; AVX512-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; AVX512-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; AVX512-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; AVX512-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; AVX512-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 16		; AVX512-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 16
; AVX512-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 32		; AVX512-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 32
; AVX512-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 48		; AVX512-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 48
; AVX512-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0		; AVX512-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0
; AVX512-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <16 x i32>*		; AVX512-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <16 x i32>*
; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i32>, <16 x i32>* [[TMP9]], align 4, !alias.scope !24		; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i32>, <16 x i32>* [[TMP9]], align 4
; AVX512-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 16		; AVX512-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 16
; AVX512-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <16 x i32>*		; AVX512-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <16 x i32>*
; AVX512-NEXT: [[WIDE_LOAD12:%.*]] = load <16 x i32>, <16 x i32>* [[TMP11]], align 4, !alias.scope !24		; AVX512-NEXT: [[WIDE_LOAD12:%.*]] = load <16 x i32>, <16 x i32>* [[TMP11]], align 4
; AVX512-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 32		; AVX512-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 32
; AVX512-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <16 x i32>*		; AVX512-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <16 x i32>*
; AVX512-NEXT: [[WIDE_LOAD13:%.*]] = load <16 x i32>, <16 x i32>* [[TMP13]], align 4, !alias.scope !24		; AVX512-NEXT: [[WIDE_LOAD13:%.*]] = load <16 x i32>, <16 x i32>* [[TMP13]], align 4
; AVX512-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 48		; AVX512-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 48
; AVX512-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <16 x i32>*		; AVX512-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <16 x i32>*
; AVX512-NEXT: [[WIDE_LOAD14:%.*]] = load <16 x i32>, <16 x i32>* [[TMP15]], align 4, !alias.scope !24		; AVX512-NEXT: [[WIDE_LOAD14:%.*]] = load <16 x i32>, <16 x i32>* [[TMP15]], align 4
; AVX512-NEXT: [[TMP16:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP16:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP17:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP17:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP18:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP18:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP19:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP19:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP20:%.*]] = getelementptr float, float* [[B]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP20:%.*]] = getelementptr float, float* [[B]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP21:%.*]] = getelementptr float, float* [[B]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP21:%.*]] = getelementptr float, float* [[B]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP22:%.*]] = getelementptr float, float* [[B]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP22:%.*]] = getelementptr float, float* [[B]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP23:%.*]] = getelementptr float, float* [[B]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP23:%.*]] = getelementptr float, float* [[B]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP24:%.*]] = getelementptr float, float* [[TMP20]], i32 0		; AVX512-NEXT: [[TMP24:%.*]] = getelementptr float, float* [[TMP20]], i32 0
; AVX512-NEXT: [[TMP25:%.*]] = bitcast float* [[TMP24]] to <16 x float>*		; AVX512-NEXT: [[TMP25:%.*]] = bitcast float* [[TMP24]] to <16 x float>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x float> @llvm.masked.load.v16f32.p0v16f32(<16 x float>* [[TMP25]], i32 4, <16 x i1> [[TMP16]], <16 x float> poison), !alias.scope !27		; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x float> @llvm.masked.load.v16f32.p0v16f32(<16 x float>* [[TMP25]], i32 4, <16 x i1> [[TMP16]], <16 x float> poison)
; AVX512-NEXT: [[TMP26:%.*]] = getelementptr float, float* [[TMP20]], i32 16		; AVX512-NEXT: [[TMP26:%.*]] = getelementptr float, float* [[TMP20]], i32 16
; AVX512-NEXT: [[TMP27:%.*]] = bitcast float* [[TMP26]] to <16 x float>*		; AVX512-NEXT: [[TMP27:%.*]] = bitcast float* [[TMP26]] to <16 x float>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD15:%.*]] = call <16 x float> @llvm.masked.load.v16f32.p0v16f32(<16 x float>* [[TMP27]], i32 4, <16 x i1> [[TMP17]], <16 x float> poison), !alias.scope !27		; AVX512-NEXT: [[WIDE_MASKED_LOAD15:%.*]] = call <16 x float> @llvm.masked.load.v16f32.p0v16f32(<16 x float>* [[TMP27]], i32 4, <16 x i1> [[TMP17]], <16 x float> poison)
; AVX512-NEXT: [[TMP28:%.*]] = getelementptr float, float* [[TMP20]], i32 32		; AVX512-NEXT: [[TMP28:%.*]] = getelementptr float, float* [[TMP20]], i32 32
; AVX512-NEXT: [[TMP29:%.*]] = bitcast float* [[TMP28]] to <16 x float>*		; AVX512-NEXT: [[TMP29:%.*]] = bitcast float* [[TMP28]] to <16 x float>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD16:%.*]] = call <16 x float> @llvm.masked.load.v16f32.p0v16f32(<16 x float>* [[TMP29]], i32 4, <16 x i1> [[TMP18]], <16 x float> poison), !alias.scope !27		; AVX512-NEXT: [[WIDE_MASKED_LOAD16:%.*]] = call <16 x float> @llvm.masked.load.v16f32.p0v16f32(<16 x float>* [[TMP29]], i32 4, <16 x i1> [[TMP18]], <16 x float> poison)
; AVX512-NEXT: [[TMP30:%.*]] = getelementptr float, float* [[TMP20]], i32 48		; AVX512-NEXT: [[TMP30:%.*]] = getelementptr float, float* [[TMP20]], i32 48
; AVX512-NEXT: [[TMP31:%.*]] = bitcast float* [[TMP30]] to <16 x float>*		; AVX512-NEXT: [[TMP31:%.*]] = bitcast float* [[TMP30]] to <16 x float>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD17:%.*]] = call <16 x float> @llvm.masked.load.v16f32.p0v16f32(<16 x float>* [[TMP31]], i32 4, <16 x i1> [[TMP19]], <16 x float> poison), !alias.scope !27		; AVX512-NEXT: [[WIDE_MASKED_LOAD17:%.*]] = call <16 x float> @llvm.masked.load.v16f32.p0v16f32(<16 x float>* [[TMP31]], i32 4, <16 x i1> [[TMP19]], <16 x float> poison)
; AVX512-NEXT: [[TMP32:%.*]] = sitofp <16 x i32> [[WIDE_LOAD]] to <16 x float>		; AVX512-NEXT: [[TMP32:%.*]] = sitofp <16 x i32> [[WIDE_LOAD]] to <16 x float>
; AVX512-NEXT: [[TMP33:%.*]] = sitofp <16 x i32> [[WIDE_LOAD12]] to <16 x float>		; AVX512-NEXT: [[TMP33:%.*]] = sitofp <16 x i32> [[WIDE_LOAD12]] to <16 x float>
; AVX512-NEXT: [[TMP34:%.*]] = sitofp <16 x i32> [[WIDE_LOAD13]] to <16 x float>		; AVX512-NEXT: [[TMP34:%.*]] = sitofp <16 x i32> [[WIDE_LOAD13]] to <16 x float>
; AVX512-NEXT: [[TMP35:%.*]] = sitofp <16 x i32> [[WIDE_LOAD14]] to <16 x float>		; AVX512-NEXT: [[TMP35:%.*]] = sitofp <16 x i32> [[WIDE_LOAD14]] to <16 x float>
; AVX512-NEXT: [[TMP36:%.*]] = fadd <16 x float> [[WIDE_MASKED_LOAD]], [[TMP32]]		; AVX512-NEXT: [[TMP36:%.*]] = fadd <16 x float> [[WIDE_MASKED_LOAD]], [[TMP32]]
; AVX512-NEXT: [[TMP37:%.*]] = fadd <16 x float> [[WIDE_MASKED_LOAD15]], [[TMP33]]		; AVX512-NEXT: [[TMP37:%.*]] = fadd <16 x float> [[WIDE_MASKED_LOAD15]], [[TMP33]]
; AVX512-NEXT: [[TMP38:%.*]] = fadd <16 x float> [[WIDE_MASKED_LOAD16]], [[TMP34]]		; AVX512-NEXT: [[TMP38:%.*]] = fadd <16 x float> [[WIDE_MASKED_LOAD16]], [[TMP34]]
; AVX512-NEXT: [[TMP39:%.*]] = fadd <16 x float> [[WIDE_MASKED_LOAD17]], [[TMP35]]		; AVX512-NEXT: [[TMP39:%.*]] = fadd <16 x float> [[WIDE_MASKED_LOAD17]], [[TMP35]]
; AVX512-NEXT: [[TMP40:%.*]] = getelementptr float, float* [[A]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP40:%.*]] = getelementptr float, float* [[A]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP41:%.*]] = getelementptr float, float* [[A]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP41:%.*]] = getelementptr float, float* [[A]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP42:%.*]] = getelementptr float, float* [[A]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP42:%.*]] = getelementptr float, float* [[A]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP43:%.*]] = getelementptr float, float* [[A]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP43:%.*]] = getelementptr float, float* [[A]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP44:%.*]] = getelementptr float, float* [[TMP40]], i32 0		; AVX512-NEXT: [[TMP44:%.*]] = getelementptr float, float* [[TMP40]], i32 0
; AVX512-NEXT: [[TMP45:%.*]] = bitcast float* [[TMP44]] to <16 x float>*		; AVX512-NEXT: [[TMP45:%.*]] = bitcast float* [[TMP44]] to <16 x float>*
; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP36]], <16 x float>* [[TMP45]], i32 4, <16 x i1> [[TMP16]]), !alias.scope !29, !noalias !31		; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP36]], <16 x float>* [[TMP45]], i32 4, <16 x i1> [[TMP16]])
; AVX512-NEXT: [[TMP46:%.*]] = getelementptr float, float* [[TMP40]], i32 16		; AVX512-NEXT: [[TMP46:%.*]] = getelementptr float, float* [[TMP40]], i32 16
; AVX512-NEXT: [[TMP47:%.*]] = bitcast float* [[TMP46]] to <16 x float>*		; AVX512-NEXT: [[TMP47:%.*]] = bitcast float* [[TMP46]] to <16 x float>*
; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP37]], <16 x float>* [[TMP47]], i32 4, <16 x i1> [[TMP17]]), !alias.scope !29, !noalias !31		; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP37]], <16 x float>* [[TMP47]], i32 4, <16 x i1> [[TMP17]])
; AVX512-NEXT: [[TMP48:%.*]] = getelementptr float, float* [[TMP40]], i32 32		; AVX512-NEXT: [[TMP48:%.*]] = getelementptr float, float* [[TMP40]], i32 32
; AVX512-NEXT: [[TMP49:%.*]] = bitcast float* [[TMP48]] to <16 x float>*		; AVX512-NEXT: [[TMP49:%.*]] = bitcast float* [[TMP48]] to <16 x float>*
; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP38]], <16 x float>* [[TMP49]], i32 4, <16 x i1> [[TMP18]]), !alias.scope !29, !noalias !31		; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP38]], <16 x float>* [[TMP49]], i32 4, <16 x i1> [[TMP18]])
; AVX512-NEXT: [[TMP50:%.*]] = getelementptr float, float* [[TMP40]], i32 48		; AVX512-NEXT: [[TMP50:%.*]] = getelementptr float, float* [[TMP40]], i32 48
; AVX512-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP50]] to <16 x float>*		; AVX512-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP50]] to <16 x float>*
; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP39]], <16 x float>* [[TMP51]], i32 4, <16 x i1> [[TMP19]]), !alias.scope !29, !noalias !31		; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP39]], <16 x float>* [[TMP51]], i32 4, <16 x i1> [[TMP19]])
; AVX512-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 64		; AVX512-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 64
; AVX512-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984		; AVX512-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984
; AVX512-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP32:![0-9]+]]		; AVX512-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP32:![0-9]+]]
; AVX512: middle.block:		; AVX512: middle.block:
; AVX512-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984		; AVX512-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984
; AVX512-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]		; AVX512-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]
; AVX512: vec.epilog.iter.check:		; AVX512: vec.epilog.iter.check:
; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]		; AVX512-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]
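For orientation, the new AVX512 `vector.memcheck` block of `@foo2` above replaces the two bound comparisons per pointer pair with a single subtraction and an unsigned compare against 256. That constant appears to be VF * UF * AccessSize = 16 * 4 * 4, judging by the `<16 x i32>` accesses, the four wide loads per iteration and the 4-byte element type. The following is only a minimal C++ sketch of that predicate; the function name `diffCheckConflicts` and its parameters are hypothetical and are not part of LLVM.

```cpp
#include <cstdint>

// Hypothetical helper mirroring one DIFF_CHECK from the test output.
// Returns true when the scalar fallback loop must be taken, i.e. when
// (Sink - Src) <u VF * UF * AccessSize. Unsigned wrap-around is
// intentional: if Src is above Sink the difference becomes very large
// and the check passes, matching the reasoning in the patch summary.
bool diffCheckConflicts(std::uint64_t sink, std::uint64_t src,
                        unsigned vf, unsigned uf, unsigned accessSize) {
  const std::uint64_t threshold =
      static_cast<std::uint64_t>(vf) * uf * accessSize;
  return (sink - src) < threshold;
}
```

With the values assumed above, `diffCheckConflicts(A, TRIGGER, 16, 4, 4)` corresponds to `DIFF_CHECK`, and `CONFLICT_RDX` is the logical OR of that result with the same check for `A` and `B`.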
; AVX-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8		; AVX-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
; AVX-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12		; AVX-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]		; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]		; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]
; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]		; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]
; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]		; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]
; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0		; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0
; AVX-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*		; AVX-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*
; AVX-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4, !alias.scope !31		; AVX-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4, !alias.scope !7
; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 4		; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 4
; AVX-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <4 x i32>*		; AVX-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <4 x i32>*
; AVX-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4, !alias.scope !31		; AVX-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4, !alias.scope !7
; AVX-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 8		; AVX-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 8
; AVX-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <4 x i32>*		; AVX-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <4 x i32>*
; AVX-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x i32>, <4 x i32>* [[TMP13]], align 4, !alias.scope !31		; AVX-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x i32>, <4 x i32>* [[TMP13]], align 4, !alias.scope !7
; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 12		; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 12
; AVX-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <4 x i32>*		; AVX-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <4 x i32>*
; AVX-NEXT: [[WIDE_LOAD14:%.*]] = load <4 x i32>, <4 x i32>* [[TMP15]], align 4, !alias.scope !31		; AVX-NEXT: [[WIDE_LOAD14:%.*]] = load <4 x i32>, <4 x i32>* [[TMP15]], align 4, !alias.scope !7
; AVX-NEXT: [[TMP16:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100>		; AVX-NEXT: [[TMP16:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100>
; AVX-NEXT: [[TMP17:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100>		; AVX-NEXT: [[TMP17:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100>
; AVX-NEXT: [[TMP18:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100>		; AVX-NEXT: [[TMP18:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100>
; AVX-NEXT: [[TMP19:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100>		; AVX-NEXT: [[TMP19:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100>
; AVX-NEXT: [[TMP20:%.*]] = getelementptr double, double* [[B]], i64 [[TMP0]]		; AVX-NEXT: [[TMP20:%.*]] = getelementptr double, double* [[B]], i64 [[TMP0]]
; AVX-NEXT: [[TMP21:%.*]] = getelementptr double, double* [[B]], i64 [[TMP1]]		; AVX-NEXT: [[TMP21:%.*]] = getelementptr double, double* [[B]], i64 [[TMP1]]
; AVX-NEXT: [[TMP22:%.*]] = getelementptr double, double* [[B]], i64 [[TMP2]]		; AVX-NEXT: [[TMP22:%.*]] = getelementptr double, double* [[B]], i64 [[TMP2]]
; AVX-NEXT: [[TMP23:%.*]] = getelementptr double, double* [[B]], i64 [[TMP3]]		; AVX-NEXT: [[TMP23:%.*]] = getelementptr double, double* [[B]], i64 [[TMP3]]
; AVX-NEXT: [[TMP24:%.*]] = getelementptr double, double* [[TMP20]], i32 0		; AVX-NEXT: [[TMP24:%.*]] = getelementptr double, double* [[TMP20]], i32 0
; AVX-NEXT: [[TMP25:%.*]] = bitcast double* [[TMP24]] to <4 x double>*		; AVX-NEXT: [[TMP25:%.*]] = bitcast double* [[TMP24]] to <4 x double>*
; AVX-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP25]], i32 8, <4 x i1> [[TMP16]], <4 x double> poison), !alias.scope !34		; AVX-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP25]], i32 8, <4 x i1> [[TMP16]], <4 x double> poison), !alias.scope !10
; AVX-NEXT: [[TMP26:%.*]] = getelementptr double, double* [[TMP20]], i32 4		; AVX-NEXT: [[TMP26:%.*]] = getelementptr double, double* [[TMP20]], i32 4
; AVX-NEXT: [[TMP27:%.*]] = bitcast double* [[TMP26]] to <4 x double>*		; AVX-NEXT: [[TMP27:%.*]] = bitcast double* [[TMP26]] to <4 x double>*
; AVX-NEXT: [[WIDE_MASKED_LOAD15:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP27]], i32 8, <4 x i1> [[TMP17]], <4 x double> poison), !alias.scope !34		; AVX-NEXT: [[WIDE_MASKED_LOAD15:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP27]], i32 8, <4 x i1> [[TMP17]], <4 x double> poison), !alias.scope !10
; AVX-NEXT: [[TMP28:%.*]] = getelementptr double, double* [[TMP20]], i32 8		; AVX-NEXT: [[TMP28:%.*]] = getelementptr double, double* [[TMP20]], i32 8
; AVX-NEXT: [[TMP29:%.*]] = bitcast double* [[TMP28]] to <4 x double>*		; AVX-NEXT: [[TMP29:%.*]] = bitcast double* [[TMP28]] to <4 x double>*
; AVX-NEXT: [[WIDE_MASKED_LOAD16:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP29]], i32 8, <4 x i1> [[TMP18]], <4 x double> poison), !alias.scope !34		; AVX-NEXT: [[WIDE_MASKED_LOAD16:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP29]], i32 8, <4 x i1> [[TMP18]], <4 x double> poison), !alias.scope !10
; AVX-NEXT: [[TMP30:%.*]] = getelementptr double, double* [[TMP20]], i32 12		; AVX-NEXT: [[TMP30:%.*]] = getelementptr double, double* [[TMP20]], i32 12
; AVX-NEXT: [[TMP31:%.*]] = bitcast double* [[TMP30]] to <4 x double>*		; AVX-NEXT: [[TMP31:%.*]] = bitcast double* [[TMP30]] to <4 x double>*
; AVX-NEXT: [[WIDE_MASKED_LOAD17:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP31]], i32 8, <4 x i1> [[TMP19]], <4 x double> poison), !alias.scope !34		; AVX-NEXT: [[WIDE_MASKED_LOAD17:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP31]], i32 8, <4 x i1> [[TMP19]], <4 x double> poison), !alias.scope !10
; AVX-NEXT: [[TMP32:%.*]] = sitofp <4 x i32> [[WIDE_LOAD]] to <4 x double>		; AVX-NEXT: [[TMP32:%.*]] = sitofp <4 x i32> [[WIDE_LOAD]] to <4 x double>
; AVX-NEXT: [[TMP33:%.*]] = sitofp <4 x i32> [[WIDE_LOAD12]] to <4 x double>		; AVX-NEXT: [[TMP33:%.*]] = sitofp <4 x i32> [[WIDE_LOAD12]] to <4 x double>
; AVX-NEXT: [[TMP34:%.*]] = sitofp <4 x i32> [[WIDE_LOAD13]] to <4 x double>		; AVX-NEXT: [[TMP34:%.*]] = sitofp <4 x i32> [[WIDE_LOAD13]] to <4 x double>
; AVX-NEXT: [[TMP35:%.*]] = sitofp <4 x i32> [[WIDE_LOAD14]] to <4 x double>		; AVX-NEXT: [[TMP35:%.*]] = sitofp <4 x i32> [[WIDE_LOAD14]] to <4 x double>
; AVX-NEXT: [[TMP36:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD]], [[TMP32]]		; AVX-NEXT: [[TMP36:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD]], [[TMP32]]
; AVX-NEXT: [[TMP37:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD15]], [[TMP33]]		; AVX-NEXT: [[TMP37:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD15]], [[TMP33]]
; AVX-NEXT: [[TMP38:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD16]], [[TMP34]]		; AVX-NEXT: [[TMP38:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD16]], [[TMP34]]
; AVX-NEXT: [[TMP39:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD17]], [[TMP35]]		; AVX-NEXT: [[TMP39:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD17]], [[TMP35]]
; AVX-NEXT: [[TMP40:%.*]] = getelementptr double, double* [[A]], i64 [[TMP0]]		; AVX-NEXT: [[TMP40:%.*]] = getelementptr double, double* [[A]], i64 [[TMP0]]
; AVX-NEXT: [[TMP41:%.*]] = getelementptr double, double* [[A]], i64 [[TMP1]]		; AVX-NEXT: [[TMP41:%.*]] = getelementptr double, double* [[A]], i64 [[TMP1]]
; AVX-NEXT: [[TMP42:%.*]] = getelementptr double, double* [[A]], i64 [[TMP2]]		; AVX-NEXT: [[TMP42:%.*]] = getelementptr double, double* [[A]], i64 [[TMP2]]
; AVX-NEXT: [[TMP43:%.*]] = getelementptr double, double* [[A]], i64 [[TMP3]]		; AVX-NEXT: [[TMP43:%.*]] = getelementptr double, double* [[A]], i64 [[TMP3]]
; AVX-NEXT: [[TMP44:%.*]] = getelementptr double, double* [[TMP40]], i32 0		; AVX-NEXT: [[TMP44:%.*]] = getelementptr double, double* [[TMP40]], i32 0
; AVX-NEXT: [[TMP45:%.*]] = bitcast double* [[TMP44]] to <4 x double>*		; AVX-NEXT: [[TMP45:%.*]] = bitcast double* [[TMP44]] to <4 x double>*
; AVX-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP36]], <4 x double>* [[TMP45]], i32 8, <4 x i1> [[TMP16]]), !alias.scope !36, !noalias !38		; AVX-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP36]], <4 x double>* [[TMP45]], i32 8, <4 x i1> [[TMP16]]), !alias.scope !12, !noalias !14
; AVX-NEXT: [[TMP46:%.*]] = getelementptr double, double* [[TMP40]], i32 4		; AVX-NEXT: [[TMP46:%.*]] = getelementptr double, double* [[TMP40]], i32 4
; AVX-NEXT: [[TMP47:%.*]] = bitcast double* [[TMP46]] to <4 x double>*		; AVX-NEXT: [[TMP47:%.*]] = bitcast double* [[TMP46]] to <4 x double>*
; AVX-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP37]], <4 x double>* [[TMP47]], i32 8, <4 x i1> [[TMP17]]), !alias.scope !36, !noalias !38		; AVX-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP37]], <4 x double>* [[TMP47]], i32 8, <4 x i1> [[TMP17]]), !alias.scope !12, !noalias !14
; AVX-NEXT: [[TMP48:%.*]] = getelementptr double, double* [[TMP40]], i32 8		; AVX-NEXT: [[TMP48:%.*]] = getelementptr double, double* [[TMP40]], i32 8
; AVX-NEXT: [[TMP49:%.*]] = bitcast double* [[TMP48]] to <4 x double>*		; AVX-NEXT: [[TMP49:%.*]] = bitcast double* [[TMP48]] to <4 x double>*
; AVX-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP38]], <4 x double>* [[TMP49]], i32 8, <4 x i1> [[TMP18]]), !alias.scope !36, !noalias !38		; AVX-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP38]], <4 x double>* [[TMP49]], i32 8, <4 x i1> [[TMP18]]), !alias.scope !12, !noalias !14
; AVX-NEXT: [[TMP50:%.*]] = getelementptr double, double* [[TMP40]], i32 12		; AVX-NEXT: [[TMP50:%.*]] = getelementptr double, double* [[TMP40]], i32 12
; AVX-NEXT: [[TMP51:%.*]] = bitcast double* [[TMP50]] to <4 x double>*		; AVX-NEXT: [[TMP51:%.*]] = bitcast double* [[TMP50]] to <4 x double>*
; AVX-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP39]], <4 x double>* [[TMP51]], i32 8, <4 x i1> [[TMP19]]), !alias.scope !36, !noalias !38		; AVX-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP39]], <4 x double>* [[TMP51]], i32 8, <4 x i1> [[TMP19]]), !alias.scope !12, !noalias !14
; AVX-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16		; AVX-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
; AVX-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10000		; AVX-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10000
; AVX-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP39:![0-9]+]]		; AVX-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP39:![0-9]+]]
; AVX: middle.block:		; AVX: middle.block:
; AVX-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 10000		; AVX-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 10000
; AVX-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]		; AVX-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
; AVX: scalar.ph:		; AVX: scalar.ph:
; AVX-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 10000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]		; AVX-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 10000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; AVX512-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 16		; AVX512-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 16
; AVX512-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 24		; AVX512-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 24
; AVX512-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0		; AVX512-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0
; AVX512-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <8 x i32>*		; AVX512-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <8 x i32>*
; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32>* [[TMP9]], align 4, !alias.scope !35		; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32>* [[TMP9]], align 4, !alias.scope !11
; AVX512-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 8		; AVX512-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 8
; AVX512-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <8 x i32>*		; AVX512-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <8 x i32>*
; AVX512-NEXT: [[WIDE_LOAD12:%.*]] = load <8 x i32>, <8 x i32>* [[TMP11]], align 4, !alias.scope !35		; AVX512-NEXT: [[WIDE_LOAD12:%.*]] = load <8 x i32>, <8 x i32>* [[TMP11]], align 4, !alias.scope !11
; AVX512-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 16		; AVX512-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 16
; AVX512-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <8 x i32>*		; AVX512-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <8 x i32>*
; AVX512-NEXT: [[WIDE_LOAD13:%.*]] = load <8 x i32>, <8 x i32>* [[TMP13]], align 4, !alias.scope !35		; AVX512-NEXT: [[WIDE_LOAD13:%.*]] = load <8 x i32>, <8 x i32>* [[TMP13]], align 4, !alias.scope !11
; AVX512-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 24		; AVX512-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 24
; AVX512-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <8 x i32>*		; AVX512-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <8 x i32>*
; AVX512-NEXT: [[WIDE_LOAD14:%.*]] = load <8 x i32>, <8 x i32>* [[TMP15]], align 4, !alias.scope !35		; AVX512-NEXT: [[WIDE_LOAD14:%.*]] = load <8 x i32>, <8 x i32>* [[TMP15]], align 4, !alias.scope !11
; AVX512-NEXT: [[TMP16:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP16:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP17:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP17:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD12]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP18:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP18:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD13]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP19:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP19:%.*]] = icmp slt <8 x i32> [[WIDE_LOAD14]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP20:%.*]] = getelementptr double, double* [[B]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP20:%.*]] = getelementptr double, double* [[B]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP21:%.*]] = getelementptr double, double* [[B]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP21:%.*]] = getelementptr double, double* [[B]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP22:%.*]] = getelementptr double, double* [[B]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP22:%.*]] = getelementptr double, double* [[B]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP23:%.*]] = getelementptr double, double* [[B]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP23:%.*]] = getelementptr double, double* [[B]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP24:%.*]] = getelementptr double, double* [[TMP20]], i32 0		; AVX512-NEXT: [[TMP24:%.*]] = getelementptr double, double* [[TMP20]], i32 0
; AVX512-NEXT: [[TMP25:%.*]] = bitcast double* [[TMP24]] to <8 x double>*		; AVX512-NEXT: [[TMP25:%.*]] = bitcast double* [[TMP24]] to <8 x double>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP25]], i32 8, <8 x i1> [[TMP16]], <8 x double> poison), !alias.scope !38		; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP25]], i32 8, <8 x i1> [[TMP16]], <8 x double> poison), !alias.scope !14
; AVX512-NEXT: [[TMP26:%.*]] = getelementptr double, double* [[TMP20]], i32 8		; AVX512-NEXT: [[TMP26:%.*]] = getelementptr double, double* [[TMP20]], i32 8
; AVX512-NEXT: [[TMP27:%.*]] = bitcast double* [[TMP26]] to <8 x double>*		; AVX512-NEXT: [[TMP27:%.*]] = bitcast double* [[TMP26]] to <8 x double>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD15:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP27]], i32 8, <8 x i1> [[TMP17]], <8 x double> poison), !alias.scope !38		; AVX512-NEXT: [[WIDE_MASKED_LOAD15:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP27]], i32 8, <8 x i1> [[TMP17]], <8 x double> poison), !alias.scope !14
; AVX512-NEXT: [[TMP28:%.*]] = getelementptr double, double* [[TMP20]], i32 16		; AVX512-NEXT: [[TMP28:%.*]] = getelementptr double, double* [[TMP20]], i32 16
; AVX512-NEXT: [[TMP29:%.*]] = bitcast double* [[TMP28]] to <8 x double>*		; AVX512-NEXT: [[TMP29:%.*]] = bitcast double* [[TMP28]] to <8 x double>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD16:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP29]], i32 8, <8 x i1> [[TMP18]], <8 x double> poison), !alias.scope !38		; AVX512-NEXT: [[WIDE_MASKED_LOAD16:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP29]], i32 8, <8 x i1> [[TMP18]], <8 x double> poison), !alias.scope !14
; AVX512-NEXT: [[TMP30:%.*]] = getelementptr double, double* [[TMP20]], i32 24		; AVX512-NEXT: [[TMP30:%.*]] = getelementptr double, double* [[TMP20]], i32 24
; AVX512-NEXT: [[TMP31:%.*]] = bitcast double* [[TMP30]] to <8 x double>*		; AVX512-NEXT: [[TMP31:%.*]] = bitcast double* [[TMP30]] to <8 x double>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD17:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP31]], i32 8, <8 x i1> [[TMP19]], <8 x double> poison), !alias.scope !38		; AVX512-NEXT: [[WIDE_MASKED_LOAD17:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP31]], i32 8, <8 x i1> [[TMP19]], <8 x double> poison), !alias.scope !14
; AVX512-NEXT: [[TMP32:%.*]] = sitofp <8 x i32> [[WIDE_LOAD]] to <8 x double>		; AVX512-NEXT: [[TMP32:%.*]] = sitofp <8 x i32> [[WIDE_LOAD]] to <8 x double>
; AVX512-NEXT: [[TMP33:%.*]] = sitofp <8 x i32> [[WIDE_LOAD12]] to <8 x double>		; AVX512-NEXT: [[TMP33:%.*]] = sitofp <8 x i32> [[WIDE_LOAD12]] to <8 x double>
; AVX512-NEXT: [[TMP34:%.*]] = sitofp <8 x i32> [[WIDE_LOAD13]] to <8 x double>		; AVX512-NEXT: [[TMP34:%.*]] = sitofp <8 x i32> [[WIDE_LOAD13]] to <8 x double>
; AVX512-NEXT: [[TMP35:%.*]] = sitofp <8 x i32> [[WIDE_LOAD14]] to <8 x double>		; AVX512-NEXT: [[TMP35:%.*]] = sitofp <8 x i32> [[WIDE_LOAD14]] to <8 x double>
; AVX512-NEXT: [[TMP36:%.*]] = fadd <8 x double> [[WIDE_MASKED_LOAD]], [[TMP32]]		; AVX512-NEXT: [[TMP36:%.*]] = fadd <8 x double> [[WIDE_MASKED_LOAD]], [[TMP32]]
; AVX512-NEXT: [[TMP37:%.*]] = fadd <8 x double> [[WIDE_MASKED_LOAD15]], [[TMP33]]		; AVX512-NEXT: [[TMP37:%.*]] = fadd <8 x double> [[WIDE_MASKED_LOAD15]], [[TMP33]]
; AVX512-NEXT: [[TMP38:%.*]] = fadd <8 x double> [[WIDE_MASKED_LOAD16]], [[TMP34]]		; AVX512-NEXT: [[TMP38:%.*]] = fadd <8 x double> [[WIDE_MASKED_LOAD16]], [[TMP34]]
; AVX512-NEXT: [[TMP39:%.*]] = fadd <8 x double> [[WIDE_MASKED_LOAD17]], [[TMP35]]		; AVX512-NEXT: [[TMP39:%.*]] = fadd <8 x double> [[WIDE_MASKED_LOAD17]], [[TMP35]]
; AVX512-NEXT: [[TMP40:%.*]] = getelementptr double, double* [[A]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP40:%.*]] = getelementptr double, double* [[A]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP41:%.*]] = getelementptr double, double* [[A]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP41:%.*]] = getelementptr double, double* [[A]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP42:%.*]] = getelementptr double, double* [[A]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP42:%.*]] = getelementptr double, double* [[A]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP43:%.*]] = getelementptr double, double* [[A]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP43:%.*]] = getelementptr double, double* [[A]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP44:%.*]] = getelementptr double, double* [[TMP40]], i32 0		; AVX512-NEXT: [[TMP44:%.*]] = getelementptr double, double* [[TMP40]], i32 0
; AVX512-NEXT: [[TMP45:%.*]] = bitcast double* [[TMP44]] to <8 x double>*		; AVX512-NEXT: [[TMP45:%.*]] = bitcast double* [[TMP44]] to <8 x double>*
; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[TMP36]], <8 x double>* [[TMP45]], i32 8, <8 x i1> [[TMP16]]), !alias.scope !40, !noalias !42		; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[TMP36]], <8 x double>* [[TMP45]], i32 8, <8 x i1> [[TMP16]]), !alias.scope !16, !noalias !18
; AVX512-NEXT: [[TMP46:%.*]] = getelementptr double, double* [[TMP40]], i32 8		; AVX512-NEXT: [[TMP46:%.*]] = getelementptr double, double* [[TMP40]], i32 8
; AVX512-NEXT: [[TMP47:%.*]] = bitcast double* [[TMP46]] to <8 x double>*		; AVX512-NEXT: [[TMP47:%.*]] = bitcast double* [[TMP46]] to <8 x double>*
; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[TMP37]], <8 x double>* [[TMP47]], i32 8, <8 x i1> [[TMP17]]), !alias.scope !40, !noalias !42		; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[TMP37]], <8 x double>* [[TMP47]], i32 8, <8 x i1> [[TMP17]]), !alias.scope !16, !noalias !18
; AVX512-NEXT: [[TMP48:%.*]] = getelementptr double, double* [[TMP40]], i32 16		; AVX512-NEXT: [[TMP48:%.*]] = getelementptr double, double* [[TMP40]], i32 16
; AVX512-NEXT: [[TMP49:%.*]] = bitcast double* [[TMP48]] to <8 x double>*		; AVX512-NEXT: [[TMP49:%.*]] = bitcast double* [[TMP48]] to <8 x double>*
; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[TMP38]], <8 x double>* [[TMP49]], i32 8, <8 x i1> [[TMP18]]), !alias.scope !40, !noalias !42		; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[TMP38]], <8 x double>* [[TMP49]], i32 8, <8 x i1> [[TMP18]]), !alias.scope !16, !noalias !18
; AVX512-NEXT: [[TMP50:%.*]] = getelementptr double, double* [[TMP40]], i32 24		; AVX512-NEXT: [[TMP50:%.*]] = getelementptr double, double* [[TMP40]], i32 24
; AVX512-NEXT: [[TMP51:%.*]] = bitcast double* [[TMP50]] to <8 x double>*		; AVX512-NEXT: [[TMP51:%.*]] = bitcast double* [[TMP50]] to <8 x double>*
; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[TMP39]], <8 x double>* [[TMP51]], i32 8, <8 x i1> [[TMP19]]), !alias.scope !40, !noalias !42		; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[TMP39]], <8 x double>* [[TMP51]], i32 8, <8 x i1> [[TMP19]]), !alias.scope !16, !noalias !18
; AVX512-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 32		; AVX512-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 32
; AVX512-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984		; AVX512-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT]], 9984
; AVX512-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP43:![0-9]+]]		; AVX512-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP43:![0-9]+]]
; AVX512: middle.block:		; AVX512: middle.block:
; AVX512-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984		; AVX512-NEXT: [[CMP_N:%.*]] = icmp eq i64 10000, 9984
; AVX512-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]		; AVX512-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
; AVX512: scalar.ph:		; AVX512: scalar.ph:
; AVX512-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 9984, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]		; AVX512-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 9984, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; AVX512-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]		; AVX512-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
; AVX512-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]		; AVX512-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; AVX512: vector.ph:		; AVX512: vector.ph:
; AVX512-NEXT: br label [[VECTOR_BODY:%.*]]		; AVX512-NEXT: br label [[VECTOR_BODY:%.*]]
; AVX512: vector.body:		; AVX512: vector.body:
; AVX512-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; AVX512-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; AVX512-NEXT: [[VEC_IND:%.*]] = phi <8 x i64> [ <i64 0, i64 16, i64 32, i64 48, i64 64, i64 80, i64 96, i64 112>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]		; AVX512-NEXT: [[VEC_IND:%.*]] = phi <8 x i64> [ <i64 0, i64 16, i64 32, i64 48, i64 64, i64 80, i64 96, i64 112>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; AVX512-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], <8 x i64> [[VEC_IND]]		; AVX512-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], <8 x i64> [[VEC_IND]]
; AVX512-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP0]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef), !alias.scope !45		; AVX512-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP0]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef), !alias.scope !21
; AVX512-NEXT: [[TMP1:%.*]] = icmp slt <8 x i32> [[WIDE_MASKED_GATHER]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>		; AVX512-NEXT: [[TMP1:%.*]] = icmp slt <8 x i32> [[WIDE_MASKED_GATHER]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
; AVX512-NEXT: [[TMP2:%.*]] = shl nuw nsw <8 x i64> [[VEC_IND]], <i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1>		; AVX512-NEXT: [[TMP2:%.*]] = shl nuw nsw <8 x i64> [[VEC_IND]], <i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1>
; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds double, double* [[B]], <8 x i64> [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds double, double* [[B]], <8 x i64> [[TMP2]]
; AVX512-NEXT: [[WIDE_MASKED_GATHER12:%.*]] = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x double*> [[TMP3]], i32 8, <8 x i1> [[TMP1]], <8 x double> undef), !alias.scope !48		; AVX512-NEXT: [[WIDE_MASKED_GATHER12:%.*]] = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x double*> [[TMP3]], i32 8, <8 x i1> [[TMP1]], <8 x double> undef), !alias.scope !24
; AVX512-NEXT: [[TMP4:%.*]] = sitofp <8 x i32> [[WIDE_MASKED_GATHER]] to <8 x double>		; AVX512-NEXT: [[TMP4:%.*]] = sitofp <8 x i32> [[WIDE_MASKED_GATHER]] to <8 x double>
; AVX512-NEXT: [[TMP5:%.*]] = fadd <8 x double> [[WIDE_MASKED_GATHER12]], [[TMP4]]		; AVX512-NEXT: [[TMP5:%.*]] = fadd <8 x double> [[WIDE_MASKED_GATHER12]], [[TMP4]]
; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[A]], <8 x i64> [[VEC_IND]]		; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[A]], <8 x i64> [[VEC_IND]]
; AVX512-NEXT: call void @llvm.masked.scatter.v8f64.v8p0f64(<8 x double> [[TMP5]], <8 x double*> [[TMP6]], i32 8, <8 x i1> [[TMP1]]), !alias.scope !50, !noalias !52		; AVX512-NEXT: call void @llvm.masked.scatter.v8f64.v8p0f64(<8 x double> [[TMP5]], <8 x double*> [[TMP6]], i32 8, <8 x i1> [[TMP1]]), !alias.scope !26, !noalias !28
; AVX512-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8		; AVX512-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; AVX512-NEXT: [[VEC_IND_NEXT]] = add <8 x i64> [[VEC_IND]], <i64 128, i64 128, i64 128, i64 128, i64 128, i64 128, i64 128, i64 128>		; AVX512-NEXT: [[VEC_IND_NEXT]] = add <8 x i64> [[VEC_IND]], <i64 128, i64 128, i64 128, i64 128, i64 128, i64 128, i64 128, i64 128>
; AVX512-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 624		; AVX512-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 624
; AVX512-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP53:![0-9]+]]		; AVX512-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP53:![0-9]+]]
; AVX512: middle.block:		; AVX512: middle.block:
; AVX512-NEXT: [[CMP_N:%.*]] = icmp eq i64 625, 624		; AVX512-NEXT: [[CMP_N:%.*]] = icmp eq i64 625, 624
; AVX512-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]		; AVX512-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
; AVX512: scalar.ph:		; AVX512: scalar.ph:
; AVX2-NEXT: [[TMP3:%.*]] = add i64 [[OFFSET_IDX]], -12		; AVX2-NEXT: [[TMP3:%.*]] = add i64 [[OFFSET_IDX]], -12
; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]		; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]		; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]
; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]		; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]
; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]		; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]
; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0		; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0
; AVX2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i32 -3		; AVX2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i32 -3
; AVX2-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP9]] to <4 x i32>*		; AVX2-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP9]] to <4 x i32>*
; AVX2-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP10]], align 4, !alias.scope !41		; AVX2-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP10]], align 4, !alias.scope !17
; AVX2-NEXT: [[REVERSE:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 -4		; AVX2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 -4
; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i32 -3		; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i32 -3
; AVX2-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <4 x i32>*		; AVX2-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <4 x i32>*
; AVX2-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP13]], align 4, !alias.scope !41		; AVX2-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP13]], align 4, !alias.scope !17
; AVX2-NEXT: [[REVERSE13:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD12]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE13:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD12]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 -8		; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 -8
; AVX2-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP14]], i32 -3		; AVX2-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP14]], i32 -3
; AVX2-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP15]] to <4 x i32>*		; AVX2-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP15]] to <4 x i32>*
; AVX2-NEXT: [[WIDE_LOAD14:%.*]] = load <4 x i32>, <4 x i32>* [[TMP16]], align 4, !alias.scope !41		; AVX2-NEXT: [[WIDE_LOAD14:%.*]] = load <4 x i32>, <4 x i32>* [[TMP16]], align 4, !alias.scope !17
; AVX2-NEXT: [[REVERSE15:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD14]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE15:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD14]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 -12		; AVX2-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 -12
; AVX2-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP17]], i32 -3		; AVX2-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP17]], i32 -3
; AVX2-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <4 x i32>*		; AVX2-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <4 x i32>*
; AVX2-NEXT: [[WIDE_LOAD16:%.*]] = load <4 x i32>, <4 x i32>* [[TMP19]], align 4, !alias.scope !41		; AVX2-NEXT: [[WIDE_LOAD16:%.*]] = load <4 x i32>, <4 x i32>* [[TMP19]], align 4, !alias.scope !17
; AVX2-NEXT: [[REVERSE17:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD16]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE17:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD16]], <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP20:%.*]] = icmp sgt <4 x i32> [[REVERSE]], zeroinitializer		; AVX2-NEXT: [[TMP20:%.*]] = icmp sgt <4 x i32> [[REVERSE]], zeroinitializer
; AVX2-NEXT: [[TMP21:%.*]] = icmp sgt <4 x i32> [[REVERSE13]], zeroinitializer		; AVX2-NEXT: [[TMP21:%.*]] = icmp sgt <4 x i32> [[REVERSE13]], zeroinitializer
; AVX2-NEXT: [[TMP22:%.*]] = icmp sgt <4 x i32> [[REVERSE15]], zeroinitializer		; AVX2-NEXT: [[TMP22:%.*]] = icmp sgt <4 x i32> [[REVERSE15]], zeroinitializer
; AVX2-NEXT: [[TMP23:%.*]] = icmp sgt <4 x i32> [[REVERSE17]], zeroinitializer		; AVX2-NEXT: [[TMP23:%.*]] = icmp sgt <4 x i32> [[REVERSE17]], zeroinitializer
; AVX2-NEXT: [[TMP24:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP0]]		; AVX2-NEXT: [[TMP24:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP0]]
; AVX2-NEXT: [[TMP25:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP1]]		; AVX2-NEXT: [[TMP25:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP1]]
; AVX2-NEXT: [[TMP26:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP2]]		; AVX2-NEXT: [[TMP26:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP2]]
; AVX2-NEXT: [[TMP27:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP3]]		; AVX2-NEXT: [[TMP27:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP3]]
; AVX2-NEXT: [[TMP28:%.*]] = getelementptr double, double* [[TMP24]], i32 0		; AVX2-NEXT: [[TMP28:%.*]] = getelementptr double, double* [[TMP24]], i32 0
; AVX2-NEXT: [[TMP29:%.*]] = getelementptr double, double* [[TMP28]], i32 -3		; AVX2-NEXT: [[TMP29:%.*]] = getelementptr double, double* [[TMP28]], i32 -3
; AVX2-NEXT: [[REVERSE18:%.*]] = shufflevector <4 x i1> [[TMP20]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE18:%.*]] = shufflevector <4 x i1> [[TMP20]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP30:%.*]] = bitcast double* [[TMP29]] to <4 x double>*		; AVX2-NEXT: [[TMP30:%.*]] = bitcast double* [[TMP29]] to <4 x double>*
; AVX2-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP30]], i32 8, <4 x i1> [[REVERSE18]], <4 x double> poison), !alias.scope !44		; AVX2-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP30]], i32 8, <4 x i1> [[REVERSE18]], <4 x double> poison), !alias.scope !20
; AVX2-NEXT: [[REVERSE19:%.*]] = shufflevector <4 x double> [[WIDE_MASKED_LOAD]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE19:%.*]] = shufflevector <4 x double> [[WIDE_MASKED_LOAD]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP31:%.*]] = getelementptr double, double* [[TMP24]], i32 -4		; AVX2-NEXT: [[TMP31:%.*]] = getelementptr double, double* [[TMP24]], i32 -4
; AVX2-NEXT: [[TMP32:%.*]] = getelementptr double, double* [[TMP31]], i32 -3		; AVX2-NEXT: [[TMP32:%.*]] = getelementptr double, double* [[TMP31]], i32 -3
; AVX2-NEXT: [[REVERSE20:%.*]] = shufflevector <4 x i1> [[TMP21]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE20:%.*]] = shufflevector <4 x i1> [[TMP21]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP33:%.*]] = bitcast double* [[TMP32]] to <4 x double>*		; AVX2-NEXT: [[TMP33:%.*]] = bitcast double* [[TMP32]] to <4 x double>*
; AVX2-NEXT: [[WIDE_MASKED_LOAD21:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP33]], i32 8, <4 x i1> [[REVERSE20]], <4 x double> poison), !alias.scope !44		; AVX2-NEXT: [[WIDE_MASKED_LOAD21:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP33]], i32 8, <4 x i1> [[REVERSE20]], <4 x double> poison), !alias.scope !20
; AVX2-NEXT: [[REVERSE22:%.*]] = shufflevector <4 x double> [[WIDE_MASKED_LOAD21]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE22:%.*]] = shufflevector <4 x double> [[WIDE_MASKED_LOAD21]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP34:%.*]] = getelementptr double, double* [[TMP24]], i32 -8		; AVX2-NEXT: [[TMP34:%.*]] = getelementptr double, double* [[TMP24]], i32 -8
; AVX2-NEXT: [[TMP35:%.*]] = getelementptr double, double* [[TMP34]], i32 -3		; AVX2-NEXT: [[TMP35:%.*]] = getelementptr double, double* [[TMP34]], i32 -3
; AVX2-NEXT: [[REVERSE23:%.*]] = shufflevector <4 x i1> [[TMP22]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE23:%.*]] = shufflevector <4 x i1> [[TMP22]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP36:%.*]] = bitcast double* [[TMP35]] to <4 x double>*		; AVX2-NEXT: [[TMP36:%.*]] = bitcast double* [[TMP35]] to <4 x double>*
; AVX2-NEXT: [[WIDE_MASKED_LOAD24:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP36]], i32 8, <4 x i1> [[REVERSE23]], <4 x double> poison), !alias.scope !44		; AVX2-NEXT: [[WIDE_MASKED_LOAD24:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP36]], i32 8, <4 x i1> [[REVERSE23]], <4 x double> poison), !alias.scope !20
; AVX2-NEXT: [[REVERSE25:%.*]] = shufflevector <4 x double> [[WIDE_MASKED_LOAD24]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE25:%.*]] = shufflevector <4 x double> [[WIDE_MASKED_LOAD24]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP37:%.*]] = getelementptr double, double* [[TMP24]], i32 -12		; AVX2-NEXT: [[TMP37:%.*]] = getelementptr double, double* [[TMP24]], i32 -12
; AVX2-NEXT: [[TMP38:%.*]] = getelementptr double, double* [[TMP37]], i32 -3		; AVX2-NEXT: [[TMP38:%.*]] = getelementptr double, double* [[TMP37]], i32 -3
; AVX2-NEXT: [[REVERSE26:%.*]] = shufflevector <4 x i1> [[TMP23]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE26:%.*]] = shufflevector <4 x i1> [[TMP23]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP39:%.*]] = bitcast double* [[TMP38]] to <4 x double>*		; AVX2-NEXT: [[TMP39:%.*]] = bitcast double* [[TMP38]] to <4 x double>*
; AVX2-NEXT: [[WIDE_MASKED_LOAD27:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP39]], i32 8, <4 x i1> [[REVERSE26]], <4 x double> poison), !alias.scope !44		; AVX2-NEXT: [[WIDE_MASKED_LOAD27:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* [[TMP39]], i32 8, <4 x i1> [[REVERSE26]], <4 x double> poison), !alias.scope !20
; AVX2-NEXT: [[REVERSE28:%.*]] = shufflevector <4 x double> [[WIDE_MASKED_LOAD27]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE28:%.*]] = shufflevector <4 x double> [[WIDE_MASKED_LOAD27]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP40:%.*]] = fadd <4 x double> [[REVERSE19]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>		; AVX2-NEXT: [[TMP40:%.*]] = fadd <4 x double> [[REVERSE19]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>
; AVX2-NEXT: [[TMP41:%.*]] = fadd <4 x double> [[REVERSE22]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>		; AVX2-NEXT: [[TMP41:%.*]] = fadd <4 x double> [[REVERSE22]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>
; AVX2-NEXT: [[TMP42:%.*]] = fadd <4 x double> [[REVERSE25]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>		; AVX2-NEXT: [[TMP42:%.*]] = fadd <4 x double> [[REVERSE25]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>
; AVX2-NEXT: [[TMP43:%.*]] = fadd <4 x double> [[REVERSE28]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>		; AVX2-NEXT: [[TMP43:%.*]] = fadd <4 x double> [[REVERSE28]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>
; AVX2-NEXT: [[TMP44:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP0]]		; AVX2-NEXT: [[TMP44:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP0]]
; AVX2-NEXT: [[TMP45:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP1]]		; AVX2-NEXT: [[TMP45:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP1]]
; AVX2-NEXT: [[TMP46:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP2]]		; AVX2-NEXT: [[TMP46:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP2]]
; AVX2-NEXT: [[TMP47:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP3]]		; AVX2-NEXT: [[TMP47:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP3]]
; AVX2-NEXT: [[REVERSE29:%.*]] = shufflevector <4 x double> [[TMP40]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE29:%.*]] = shufflevector <4 x double> [[TMP40]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP48:%.*]] = getelementptr double, double* [[TMP44]], i32 0		; AVX2-NEXT: [[TMP48:%.*]] = getelementptr double, double* [[TMP44]], i32 0
; AVX2-NEXT: [[TMP49:%.*]] = getelementptr double, double* [[TMP48]], i32 -3		; AVX2-NEXT: [[TMP49:%.*]] = getelementptr double, double* [[TMP48]], i32 -3
; AVX2-NEXT: [[TMP50:%.*]] = bitcast double* [[TMP49]] to <4 x double>*		; AVX2-NEXT: [[TMP50:%.*]] = bitcast double* [[TMP49]] to <4 x double>*
; AVX2-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[REVERSE29]], <4 x double>* [[TMP50]], i32 8, <4 x i1> [[REVERSE18]]), !alias.scope !46, !noalias !48		; AVX2-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[REVERSE29]], <4 x double>* [[TMP50]], i32 8, <4 x i1> [[REVERSE18]]), !alias.scope !22, !noalias !24
; AVX2-NEXT: [[REVERSE31:%.*]] = shufflevector <4 x double> [[TMP41]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE31:%.*]] = shufflevector <4 x double> [[TMP41]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP51:%.*]] = getelementptr double, double* [[TMP44]], i32 -4		; AVX2-NEXT: [[TMP51:%.*]] = getelementptr double, double* [[TMP44]], i32 -4
; AVX2-NEXT: [[TMP52:%.*]] = getelementptr double, double* [[TMP51]], i32 -3		; AVX2-NEXT: [[TMP52:%.*]] = getelementptr double, double* [[TMP51]], i32 -3
; AVX2-NEXT: [[TMP53:%.*]] = bitcast double* [[TMP52]] to <4 x double>*		; AVX2-NEXT: [[TMP53:%.*]] = bitcast double* [[TMP52]] to <4 x double>*
; AVX2-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[REVERSE31]], <4 x double>* [[TMP53]], i32 8, <4 x i1> [[REVERSE20]]), !alias.scope !46, !noalias !48		; AVX2-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[REVERSE31]], <4 x double>* [[TMP53]], i32 8, <4 x i1> [[REVERSE20]]), !alias.scope !22, !noalias !24
; AVX2-NEXT: [[REVERSE33:%.*]] = shufflevector <4 x double> [[TMP42]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE33:%.*]] = shufflevector <4 x double> [[TMP42]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP54:%.*]] = getelementptr double, double* [[TMP44]], i32 -8		; AVX2-NEXT: [[TMP54:%.*]] = getelementptr double, double* [[TMP44]], i32 -8
; AVX2-NEXT: [[TMP55:%.*]] = getelementptr double, double* [[TMP54]], i32 -3		; AVX2-NEXT: [[TMP55:%.*]] = getelementptr double, double* [[TMP54]], i32 -3
; AVX2-NEXT: [[TMP56:%.*]] = bitcast double* [[TMP55]] to <4 x double>*		; AVX2-NEXT: [[TMP56:%.*]] = bitcast double* [[TMP55]] to <4 x double>*
; AVX2-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[REVERSE33]], <4 x double>* [[TMP56]], i32 8, <4 x i1> [[REVERSE23]]), !alias.scope !46, !noalias !48		; AVX2-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[REVERSE33]], <4 x double>* [[TMP56]], i32 8, <4 x i1> [[REVERSE23]]), !alias.scope !22, !noalias !24
; AVX2-NEXT: [[REVERSE35:%.*]] = shufflevector <4 x double> [[TMP43]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>		; AVX2-NEXT: [[REVERSE35:%.*]] = shufflevector <4 x double> [[TMP43]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
; AVX2-NEXT: [[TMP57:%.*]] = getelementptr double, double* [[TMP44]], i32 -12		; AVX2-NEXT: [[TMP57:%.*]] = getelementptr double, double* [[TMP44]], i32 -12
; AVX2-NEXT: [[TMP58:%.*]] = getelementptr double, double* [[TMP57]], i32 -3		; AVX2-NEXT: [[TMP58:%.*]] = getelementptr double, double* [[TMP57]], i32 -3
; AVX2-NEXT: [[TMP59:%.*]] = bitcast double* [[TMP58]] to <4 x double>*		; AVX2-NEXT: [[TMP59:%.*]] = bitcast double* [[TMP58]] to <4 x double>*
; AVX2-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[REVERSE35]], <4 x double>* [[TMP59]], i32 8, <4 x i1> [[REVERSE26]]), !alias.scope !46, !noalias !48		; AVX2-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[REVERSE35]], <4 x double>* [[TMP59]], i32 8, <4 x i1> [[REVERSE26]]), !alias.scope !22, !noalias !24
; AVX2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16		; AVX2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
; AVX2-NEXT: [[TMP60:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096		; AVX2-NEXT: [[TMP60:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
; AVX2-NEXT: br i1 [[TMP60]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP49:![0-9]+]]		; AVX2-NEXT: br i1 [[TMP60]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP49:![0-9]+]]
; AVX2: middle.block:		; AVX2: middle.block:
; AVX2-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096		; AVX2-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096
; AVX2-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]		; AVX2-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
; AVX2: scalar.ph:		; AVX2: scalar.ph:
; AVX2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ -1, [[MIDDLE_BLOCK]] ], [ 4095, [[ENTRY:%.*]] ], [ 4095, [[VECTOR_MEMCHECK]] ]		; AVX2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ -1, [[MIDDLE_BLOCK]] ], [ 4095, [[ENTRY:%.*]] ], [ 4095, [[VECTOR_MEMCHECK]] ]
; AVX512-NEXT: [[TMP3:%.*]] = add i64 [[OFFSET_IDX]], -24		; AVX512-NEXT: [[TMP3:%.*]] = add i64 [[OFFSET_IDX]], -24
; AVX512-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0		; AVX512-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0
; AVX512-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i32 -7		; AVX512-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i32 -7
; AVX512-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP9]] to <8 x i32>*		; AVX512-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP9]] to <8 x i32>*
; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32>* [[TMP10]], align 4, !alias.scope !55		; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i32>, <8 x i32>* [[TMP10]], align 4, !alias.scope !31
; AVX512-NEXT: [[REVERSE:%.*]] = shufflevector <8 x i32> [[WIDE_LOAD]], <8 x i32> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE:%.*]] = shufflevector <8 x i32> [[WIDE_LOAD]], <8 x i32> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 -8		; AVX512-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 -8
; AVX512-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i32 -7		; AVX512-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i32 -7
; AVX512-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <8 x i32>*		; AVX512-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <8 x i32>*
; AVX512-NEXT: [[WIDE_LOAD12:%.*]] = load <8 x i32>, <8 x i32>* [[TMP13]], align 4, !alias.scope !55		; AVX512-NEXT: [[WIDE_LOAD12:%.*]] = load <8 x i32>, <8 x i32>* [[TMP13]], align 4, !alias.scope !31
; AVX512-NEXT: [[REVERSE13:%.*]] = shufflevector <8 x i32> [[WIDE_LOAD12]], <8 x i32> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE13:%.*]] = shufflevector <8 x i32> [[WIDE_LOAD12]], <8 x i32> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 -16		; AVX512-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 -16
; AVX512-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP14]], i32 -7		; AVX512-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP14]], i32 -7
; AVX512-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP15]] to <8 x i32>*		; AVX512-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP15]] to <8 x i32>*
; AVX512-NEXT: [[WIDE_LOAD14:%.*]] = load <8 x i32>, <8 x i32>* [[TMP16]], align 4, !alias.scope !55		; AVX512-NEXT: [[WIDE_LOAD14:%.*]] = load <8 x i32>, <8 x i32>* [[TMP16]], align 4, !alias.scope !31
; AVX512-NEXT: [[REVERSE15:%.*]] = shufflevector <8 x i32> [[WIDE_LOAD14]], <8 x i32> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE15:%.*]] = shufflevector <8 x i32> [[WIDE_LOAD14]], <8 x i32> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 -24		; AVX512-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 -24
; AVX512-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP17]], i32 -7		; AVX512-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP17]], i32 -7
; AVX512-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <8 x i32>*		; AVX512-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <8 x i32>*
; AVX512-NEXT: [[WIDE_LOAD16:%.*]] = load <8 x i32>, <8 x i32>* [[TMP19]], align 4, !alias.scope !55		; AVX512-NEXT: [[WIDE_LOAD16:%.*]] = load <8 x i32>, <8 x i32>* [[TMP19]], align 4, !alias.scope !31
; AVX512-NEXT: [[REVERSE17:%.*]] = shufflevector <8 x i32> [[WIDE_LOAD16]], <8 x i32> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE17:%.*]] = shufflevector <8 x i32> [[WIDE_LOAD16]], <8 x i32> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP20:%.*]] = icmp sgt <8 x i32> [[REVERSE]], zeroinitializer		; AVX512-NEXT: [[TMP20:%.*]] = icmp sgt <8 x i32> [[REVERSE]], zeroinitializer
; AVX512-NEXT: [[TMP21:%.*]] = icmp sgt <8 x i32> [[REVERSE13]], zeroinitializer		; AVX512-NEXT: [[TMP21:%.*]] = icmp sgt <8 x i32> [[REVERSE13]], zeroinitializer
; AVX512-NEXT: [[TMP22:%.*]] = icmp sgt <8 x i32> [[REVERSE15]], zeroinitializer		; AVX512-NEXT: [[TMP22:%.*]] = icmp sgt <8 x i32> [[REVERSE15]], zeroinitializer
; AVX512-NEXT: [[TMP23:%.*]] = icmp sgt <8 x i32> [[REVERSE17]], zeroinitializer		; AVX512-NEXT: [[TMP23:%.*]] = icmp sgt <8 x i32> [[REVERSE17]], zeroinitializer
; AVX512-NEXT: [[TMP24:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP24:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP25:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP25:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP26:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP26:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP27:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP27:%.*]] = getelementptr double, double* [[IN]], i64 [[TMP3]]
; AVX512-NEXT: [[TMP28:%.*]] = getelementptr double, double* [[TMP24]], i32 0		; AVX512-NEXT: [[TMP28:%.*]] = getelementptr double, double* [[TMP24]], i32 0
; AVX512-NEXT: [[TMP29:%.*]] = getelementptr double, double* [[TMP28]], i32 -7		; AVX512-NEXT: [[TMP29:%.*]] = getelementptr double, double* [[TMP28]], i32 -7
; AVX512-NEXT: [[REVERSE18:%.*]] = shufflevector <8 x i1> [[TMP20]], <8 x i1> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE18:%.*]] = shufflevector <8 x i1> [[TMP20]], <8 x i1> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP30:%.*]] = bitcast double* [[TMP29]] to <8 x double>*		; AVX512-NEXT: [[TMP30:%.*]] = bitcast double* [[TMP29]] to <8 x double>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP30]], i32 8, <8 x i1> [[REVERSE18]], <8 x double> poison), !alias.scope !58		; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP30]], i32 8, <8 x i1> [[REVERSE18]], <8 x double> poison), !alias.scope !34
; AVX512-NEXT: [[REVERSE19:%.*]] = shufflevector <8 x double> [[WIDE_MASKED_LOAD]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE19:%.*]] = shufflevector <8 x double> [[WIDE_MASKED_LOAD]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP31:%.*]] = getelementptr double, double* [[TMP24]], i32 -8		; AVX512-NEXT: [[TMP31:%.*]] = getelementptr double, double* [[TMP24]], i32 -8
; AVX512-NEXT: [[TMP32:%.*]] = getelementptr double, double* [[TMP31]], i32 -7		; AVX512-NEXT: [[TMP32:%.*]] = getelementptr double, double* [[TMP31]], i32 -7
; AVX512-NEXT: [[REVERSE20:%.*]] = shufflevector <8 x i1> [[TMP21]], <8 x i1> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE20:%.*]] = shufflevector <8 x i1> [[TMP21]], <8 x i1> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP33:%.*]] = bitcast double* [[TMP32]] to <8 x double>*		; AVX512-NEXT: [[TMP33:%.*]] = bitcast double* [[TMP32]] to <8 x double>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD21:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP33]], i32 8, <8 x i1> [[REVERSE20]], <8 x double> poison), !alias.scope !58		; AVX512-NEXT: [[WIDE_MASKED_LOAD21:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP33]], i32 8, <8 x i1> [[REVERSE20]], <8 x double> poison), !alias.scope !34
; AVX512-NEXT: [[REVERSE22:%.*]] = shufflevector <8 x double> [[WIDE_MASKED_LOAD21]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE22:%.*]] = shufflevector <8 x double> [[WIDE_MASKED_LOAD21]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP34:%.*]] = getelementptr double, double* [[TMP24]], i32 -16		; AVX512-NEXT: [[TMP34:%.*]] = getelementptr double, double* [[TMP24]], i32 -16
; AVX512-NEXT: [[TMP35:%.*]] = getelementptr double, double* [[TMP34]], i32 -7		; AVX512-NEXT: [[TMP35:%.*]] = getelementptr double, double* [[TMP34]], i32 -7
; AVX512-NEXT: [[REVERSE23:%.*]] = shufflevector <8 x i1> [[TMP22]], <8 x i1> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE23:%.*]] = shufflevector <8 x i1> [[TMP22]], <8 x i1> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP36:%.*]] = bitcast double* [[TMP35]] to <8 x double>*		; AVX512-NEXT: [[TMP36:%.*]] = bitcast double* [[TMP35]] to <8 x double>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD24:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP36]], i32 8, <8 x i1> [[REVERSE23]], <8 x double> poison), !alias.scope !58		; AVX512-NEXT: [[WIDE_MASKED_LOAD24:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP36]], i32 8, <8 x i1> [[REVERSE23]], <8 x double> poison), !alias.scope !34
; AVX512-NEXT: [[REVERSE25:%.*]] = shufflevector <8 x double> [[WIDE_MASKED_LOAD24]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE25:%.*]] = shufflevector <8 x double> [[WIDE_MASKED_LOAD24]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP37:%.*]] = getelementptr double, double* [[TMP24]], i32 -24		; AVX512-NEXT: [[TMP37:%.*]] = getelementptr double, double* [[TMP24]], i32 -24
; AVX512-NEXT: [[TMP38:%.*]] = getelementptr double, double* [[TMP37]], i32 -7		; AVX512-NEXT: [[TMP38:%.*]] = getelementptr double, double* [[TMP37]], i32 -7
; AVX512-NEXT: [[REVERSE26:%.*]] = shufflevector <8 x i1> [[TMP23]], <8 x i1> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE26:%.*]] = shufflevector <8 x i1> [[TMP23]], <8 x i1> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP39:%.*]] = bitcast double* [[TMP38]] to <8 x double>*		; AVX512-NEXT: [[TMP39:%.*]] = bitcast double* [[TMP38]] to <8 x double>*
; AVX512-NEXT: [[WIDE_MASKED_LOAD27:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP39]], i32 8, <8 x i1> [[REVERSE26]], <8 x double> poison), !alias.scope !58		; AVX512-NEXT: [[WIDE_MASKED_LOAD27:%.*]] = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* [[TMP39]], i32 8, <8 x i1> [[REVERSE26]], <8 x double> poison), !alias.scope !34
; AVX512-NEXT: [[REVERSE28:%.*]] = shufflevector <8 x double> [[WIDE_MASKED_LOAD27]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE28:%.*]] = shufflevector <8 x double> [[WIDE_MASKED_LOAD27]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP40:%.*]] = fadd <8 x double> [[REVERSE19]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>		; AVX512-NEXT: [[TMP40:%.*]] = fadd <8 x double> [[REVERSE19]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>
; AVX512-NEXT: [[TMP41:%.*]] = fadd <8 x double> [[REVERSE22]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>		; AVX512-NEXT: [[TMP41:%.*]] = fadd <8 x double> [[REVERSE22]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>
; AVX512-NEXT: [[TMP42:%.*]] = fadd <8 x double> [[REVERSE25]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>		; AVX512-NEXT: [[TMP42:%.*]] = fadd <8 x double> [[REVERSE25]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>
; AVX512-NEXT: [[TMP43:%.*]] = fadd <8 x double> [[REVERSE28]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>		; AVX512-NEXT: [[TMP43:%.*]] = fadd <8 x double> [[REVERSE28]], <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>
; AVX512-NEXT: [[TMP44:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP0]]		; AVX512-NEXT: [[TMP44:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP0]]
; AVX512-NEXT: [[TMP45:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP1]]		; AVX512-NEXT: [[TMP45:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP1]]
; AVX512-NEXT: [[TMP46:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP2]]		; AVX512-NEXT: [[TMP46:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP2]]
; AVX512-NEXT: [[TMP47:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP3]]		; AVX512-NEXT: [[TMP47:%.*]] = getelementptr double, double* [[OUT]], i64 [[TMP3]]
; AVX512-NEXT: [[REVERSE29:%.*]] = shufflevector <8 x double> [[TMP40]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE29:%.*]] = shufflevector <8 x double> [[TMP40]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP48:%.*]] = getelementptr double, double* [[TMP44]], i32 0		; AVX512-NEXT: [[TMP48:%.*]] = getelementptr double, double* [[TMP44]], i32 0
; AVX512-NEXT: [[TMP49:%.*]] = getelementptr double, double* [[TMP48]], i32 -7		; AVX512-NEXT: [[TMP49:%.*]] = getelementptr double, double* [[TMP48]], i32 -7
; AVX512-NEXT: [[TMP50:%.*]] = bitcast double* [[TMP49]] to <8 x double>*		; AVX512-NEXT: [[TMP50:%.*]] = bitcast double* [[TMP49]] to <8 x double>*
; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[REVERSE29]], <8 x double>* [[TMP50]], i32 8, <8 x i1> [[REVERSE18]]), !alias.scope !60, !noalias !62		; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[REVERSE29]], <8 x double>* [[TMP50]], i32 8, <8 x i1> [[REVERSE18]]), !alias.scope !36, !noalias !38
; AVX512-NEXT: [[REVERSE31:%.*]] = shufflevector <8 x double> [[TMP41]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE31:%.*]] = shufflevector <8 x double> [[TMP41]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP51:%.*]] = getelementptr double, double* [[TMP44]], i32 -8		; AVX512-NEXT: [[TMP51:%.*]] = getelementptr double, double* [[TMP44]], i32 -8
; AVX512-NEXT: [[TMP52:%.*]] = getelementptr double, double* [[TMP51]], i32 -7		; AVX512-NEXT: [[TMP52:%.*]] = getelementptr double, double* [[TMP51]], i32 -7
; AVX512-NEXT: [[TMP53:%.*]] = bitcast double* [[TMP52]] to <8 x double>*		; AVX512-NEXT: [[TMP53:%.*]] = bitcast double* [[TMP52]] to <8 x double>*
; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[REVERSE31]], <8 x double>* [[TMP53]], i32 8, <8 x i1> [[REVERSE20]]), !alias.scope !60, !noalias !62		; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[REVERSE31]], <8 x double>* [[TMP53]], i32 8, <8 x i1> [[REVERSE20]]), !alias.scope !36, !noalias !38
; AVX512-NEXT: [[REVERSE33:%.*]] = shufflevector <8 x double> [[TMP42]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE33:%.*]] = shufflevector <8 x double> [[TMP42]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP54:%.*]] = getelementptr double, double* [[TMP44]], i32 -16		; AVX512-NEXT: [[TMP54:%.*]] = getelementptr double, double* [[TMP44]], i32 -16
; AVX512-NEXT: [[TMP55:%.*]] = getelementptr double, double* [[TMP54]], i32 -7		; AVX512-NEXT: [[TMP55:%.*]] = getelementptr double, double* [[TMP54]], i32 -7
; AVX512-NEXT: [[TMP56:%.*]] = bitcast double* [[TMP55]] to <8 x double>*		; AVX512-NEXT: [[TMP56:%.*]] = bitcast double* [[TMP55]] to <8 x double>*
; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[REVERSE33]], <8 x double>* [[TMP56]], i32 8, <8 x i1> [[REVERSE23]]), !alias.scope !60, !noalias !62		; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[REVERSE33]], <8 x double>* [[TMP56]], i32 8, <8 x i1> [[REVERSE23]]), !alias.scope !36, !noalias !38
; AVX512-NEXT: [[REVERSE35:%.*]] = shufflevector <8 x double> [[TMP43]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; AVX512-NEXT: [[REVERSE35:%.*]] = shufflevector <8 x double> [[TMP43]], <8 x double> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; AVX512-NEXT: [[TMP57:%.*]] = getelementptr double, double* [[TMP44]], i32 -24		; AVX512-NEXT: [[TMP57:%.*]] = getelementptr double, double* [[TMP44]], i32 -24
; AVX512-NEXT: [[TMP58:%.*]] = getelementptr double, double* [[TMP57]], i32 -7		; AVX512-NEXT: [[TMP58:%.*]] = getelementptr double, double* [[TMP57]], i32 -7
; AVX512-NEXT: [[TMP59:%.*]] = bitcast double* [[TMP58]] to <8 x double>*		; AVX512-NEXT: [[TMP59:%.*]] = bitcast double* [[TMP58]] to <8 x double>*
; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[REVERSE35]], <8 x double>* [[TMP59]], i32 8, <8 x i1> [[REVERSE26]]), !alias.scope !60, !noalias !62		; AVX512-NEXT: call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> [[REVERSE35]], <8 x double>* [[TMP59]], i32 8, <8 x i1> [[REVERSE26]]), !alias.scope !36, !noalias !38
; AVX512-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 32		; AVX512-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 32
; AVX512-NEXT: [[TMP60:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096		; AVX512-NEXT: [[TMP60:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
; AVX512-NEXT: br i1 [[TMP60]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP63:![0-9]+]]		; AVX512-NEXT: br i1 [[TMP60]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP63:![0-9]+]]
; AVX512: middle.block:		; AVX512: middle.block:
; AVX512-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096		; AVX512-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096
; AVX512-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]		; AVX512-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
; AVX512: scalar.ph:		; AVX512: scalar.ph:
; AVX512-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ -1, [[MIDDLE_BLOCK]] ], [ 4095, [[ENTRY:%.*]] ], [ 4095, [[VECTOR_MEMCHECK]] ]		; AVX512-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ -1, [[MIDDLE_BLOCK]] ], [ 4095, [[ENTRY:%.*]] ], [ 4095, [[VECTOR_MEMCHECK]] ]

llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll

	; b[i] = a[i] + a[i - 1]			; b[i] = a[i] + a[i - 1]
	; }			; }
	;			;
	;			;
	;			;
	define void @recurrence_1(i32* nocapture readonly %a, i32* nocapture %b, i32 %n) {			define void @recurrence_1(i32* nocapture readonly %a, i32* nocapture %b, i32 %n) {
	; CHECK-LABEL: @recurrence_1(			; CHECK-LABEL: @recurrence_1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[A2:%.*]] = ptrtoint i32* [[A:%.*]] to i64
				; CHECK-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; CHECK-NEXT: br label [[FOR_PREHEADER:%.*]]			; CHECK-NEXT: br label [[FOR_PREHEADER:%.*]]
	; CHECK: for.preheader:			; CHECK: for.preheader:
	; CHECK-NEXT: [[PRE_LOAD:%.*]] = load i32, i32* [[A:%.*]], align 4			; CHECK-NEXT: [[PRE_LOAD:%.*]] = load i32, i32* [[A]], align 4
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[N:%.*]], -1			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[N:%.*]], -1
	; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64			; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
	; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1			; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 3			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 3
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; CHECK: vector.memcheck:			; CHECK: vector.memcheck:
	; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[N]], -1			; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[A2]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64			; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[B1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], 1			; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP4]], 16
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B:%.*]], i64 [[TMP5]]			; CHECK-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i32, i32* [[A]], i64 1
	; CHECK-NEXT: [[TMP6:%.*]] = add nuw nsw i64 [[TMP4]], 2
	; CHECK-NEXT: [[SCEVGEP5:%.*]] = getelementptr i32, i32* [[A]], i64 [[TMP6]]
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt i32* [[SCEVGEP5]], [[B]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i32* [[SCEVGEP3]], [[SCEVGEP]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP2]], 8589934588			; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP2]], 8589934588
	; CHECK-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, i32 [[PRE_LOAD]], i64 3			; CHECK-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, i32 [[PRE_LOAD]], i64 3
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[INDEX]], 1			; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4, !alias.scope !0			; CHECK-NEXT: [[WIDE_LOAD]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP12:%.*]] = add <4 x i32> [[WIDE_LOAD]], [[TMP10]]			; CHECK-NEXT: [[TMP12:%.*]] = add <4 x i32> [[WIDE_LOAD]], [[TMP10]]
	; CHECK-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*			; CHECK-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* [[TMP13]], align 4, !alias.scope !3, !noalias !0			; CHECK-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* [[TMP13]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[WIDE_LOAD]], i64 3			; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[WIDE_LOAD]], i64 3
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; CHECK: for.exit:			; CHECK: for.exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
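The `vector.memcheck` change above replaces the two-sided bounds overlap test with a single unsigned comparison of the pointer difference. Below is a minimal sketch of that comparison in plain C++, not LLVM code: `needsScalarFallback` and `checkExamples` are hypothetical names, and the thresholds assume 4-byte `i32` accesses with VF = 4 and UF = 1 (giving 16) or UF = 2 (giving 32), which is consistent with the constants in the `DIFF_CHECK` lines of the CHECK and UNROLL expectations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API. Returns true when the vector loop has
// to be skipped, i.e. when (Sink - Src) <u VF * UF * AccessSize.
bool needsScalarFallback(std::uint64_t SinkStart, std::uint64_t SrcStart,
                         unsigned VF, unsigned UF, unsigned AccessSize) {
  // Unsigned subtraction wraps to a very large value when SrcStart is above
  // SinkStart, so that case never takes the fallback.
  std::uint64_t Diff = SinkStart - SrcStart;
  return Diff < static_cast<std::uint64_t>(VF) * UF * AccessSize;
}

// Illustrative addresses only: the source starts at A + 4 (the a[i] load for
// i = 1), the sink at B, mirroring `add i64 A2, 4` and `sub i64 B1, TMP3`.
void checkExamples() {
  const std::uint64_t A = 1000;
  const std::uint64_t Src = A + 4;
  assert(!needsScalarFallback(Src + 16, Src, 4, 1, 4)); // exactly 16 bytes apart
  assert(needsScalarFallback(Src + 4, Src, 4, 1, 4));   // only 4 bytes apart
  assert(!needsScalarFallback(A, Src, 4, 1, 4));        // sink below source, wraps
  assert(needsScalarFallback(Src + 16, Src, 4, 2, 4));  // UF = 2 raises the bound to 32
}
```

The sketch only models the comparison itself; it does not model how LoopAccessAnalysis decides whether the scheme is applicable.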
	; UNROLL-LABEL: @recurrence_1(			; UNROLL-LABEL: @recurrence_1(
	; UNROLL-NEXT: entry:			; UNROLL-NEXT: entry:
				; UNROLL-NEXT: [[A2:%.*]] = ptrtoint i32* [[A:%.*]] to i64
				; UNROLL-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; UNROLL-NEXT: br label [[FOR_PREHEADER:%.*]]			; UNROLL-NEXT: br label [[FOR_PREHEADER:%.*]]
	; UNROLL: for.preheader:			; UNROLL: for.preheader:
	; UNROLL-NEXT: [[PRE_LOAD:%.*]] = load i32, i32* [[A:%.*]], align 4			; UNROLL-NEXT: [[PRE_LOAD:%.*]] = load i32, i32* [[A]], align 4
	; UNROLL-NEXT: [[TMP0:%.*]] = add i32 [[N:%.*]], -1			; UNROLL-NEXT: [[TMP0:%.*]] = add i32 [[N:%.*]], -1
	; UNROLL-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64			; UNROLL-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
	; UNROLL-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1			; UNROLL-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; UNROLL-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 7			; UNROLL-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 7
	; UNROLL-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; UNROLL-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; UNROLL: vector.memcheck:			; UNROLL: vector.memcheck:
	; UNROLL-NEXT: [[TMP3:%.*]] = add i32 [[N]], -1			; UNROLL-NEXT: [[TMP3:%.*]] = add i64 [[A2]], 4
	; UNROLL-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64			; UNROLL-NEXT: [[TMP4:%.*]] = sub i64 [[B1]], [[TMP3]]
	; UNROLL-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], 1			; UNROLL-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP4]], 32
	; UNROLL-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B:%.*]], i64 [[TMP5]]			; UNROLL-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL-NEXT: [[SCEVGEP3:%.*]] = getelementptr i32, i32* [[A]], i64 1
	; UNROLL-NEXT: [[TMP6:%.*]] = add nuw nsw i64 [[TMP4]], 2
	; UNROLL-NEXT: [[SCEVGEP5:%.*]] = getelementptr i32, i32* [[A]], i64 [[TMP6]]
	; UNROLL-NEXT: [[BOUND0:%.*]] = icmp ugt i32* [[SCEVGEP5]], [[B]]
	; UNROLL-NEXT: [[BOUND1:%.*]] = icmp ult i32* [[SCEVGEP3]], [[SCEVGEP]]
	; UNROLL-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; UNROLL-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL: vector.ph:			; UNROLL: vector.ph:
	; UNROLL-NEXT: [[N_VEC:%.*]] = and i64 [[TMP2]], 8589934584			; UNROLL-NEXT: [[N_VEC:%.*]] = and i64 [[TMP2]], 8589934584
	; UNROLL-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, i32 [[PRE_LOAD]], i64 3			; UNROLL-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, i32 [[PRE_LOAD]], i64 3
	; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD7:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD7:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[TMP7:%.*]] = or i64 [[INDEX]], 1			; UNROLL-NEXT: [[TMP7:%.*]] = or i64 [[INDEX]], 1
	; UNROLL-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP7]]			; UNROLL-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP7]]
	; UNROLL-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*			; UNROLL-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*
	; UNROLL-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4, !alias.scope !0			; UNROLL-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4
	; UNROLL-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i64 4			; UNROLL-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i64 4
	; UNROLL-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <4 x i32>*			; UNROLL-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <4 x i32>*
	; UNROLL-NEXT: [[WIDE_LOAD7]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4, !alias.scope !0			; UNROLL-NEXT: [[WIDE_LOAD7]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4
	; UNROLL-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NEXT: [[TMP13:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD]], <4 x i32> [[WIDE_LOAD7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NEXT: [[TMP13:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD]], <4 x i32> [[WIDE_LOAD7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]			; UNROLL-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]
	; UNROLL-NEXT: [[TMP15:%.*]] = add <4 x i32> [[WIDE_LOAD]], [[TMP12]]			; UNROLL-NEXT: [[TMP15:%.*]] = add <4 x i32> [[WIDE_LOAD]], [[TMP12]]
	; UNROLL-NEXT: [[TMP16:%.*]] = add <4 x i32> [[WIDE_LOAD7]], [[TMP13]]			; UNROLL-NEXT: [[TMP16:%.*]] = add <4 x i32> [[WIDE_LOAD7]], [[TMP13]]
	; UNROLL-NEXT: [[TMP17:%.*]] = bitcast i32* [[TMP14]] to <4 x i32>*			; UNROLL-NEXT: [[TMP17:%.*]] = bitcast i32* [[TMP14]] to <4 x i32>*
	; UNROLL-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP17]], align 4, !alias.scope !3, !noalias !0			; UNROLL-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP17]], align 4
	; UNROLL-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP14]], i64 4			; UNROLL-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP14]], i64 4
	; UNROLL-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <4 x i32>*			; UNROLL-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <4 x i32>*
	; UNROLL-NEXT: store <4 x i32> [[TMP16]], <4 x i32>* [[TMP19]], align 4, !alias.scope !3, !noalias !0			; UNROLL-NEXT: store <4 x i32> [[TMP16]], <4 x i32>* [[TMP19]], align 4
	; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; UNROLL-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; UNROLL-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; UNROLL: middle.block:			; UNROLL: middle.block:
	; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; UNROLL-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[WIDE_LOAD7]], i64 3			; UNROLL-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[WIDE_LOAD7]], i64 3
	; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]			; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]
	; UNROLL: scalar.ph:			; UNROLL: scalar.ph:
	[12 unchanged lines elided]
	; UNROLL-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; UNROLL-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; UNROLL-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]			; UNROLL-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]
	; UNROLL-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; UNROLL-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; UNROLL: for.exit:			; UNROLL: for.exit:
	; UNROLL-NEXT: ret void			; UNROLL-NEXT: ret void
	;			;
	; UNROLL-NO-IC-LABEL: @recurrence_1(			; UNROLL-NO-IC-LABEL: @recurrence_1(
	; UNROLL-NO-IC-NEXT: entry:			; UNROLL-NO-IC-NEXT: entry:
	; UNROLL-NO-IC-NEXT: [[B1:%.*]] = bitcast i32* [[B:%.*]] to i8*			; UNROLL-NO-IC-NEXT: [[A2:%.*]] = ptrtoint i32* [[A:%.*]] to i64
				; UNROLL-NO-IC-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; UNROLL-NO-IC-NEXT: br label [[FOR_PREHEADER:%.*]]			; UNROLL-NO-IC-NEXT: br label [[FOR_PREHEADER:%.*]]
	; UNROLL-NO-IC: for.preheader:			; UNROLL-NO-IC: for.preheader:
	; UNROLL-NO-IC-NEXT: [[ARRAYIDX_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 0			; UNROLL-NO-IC-NEXT: [[ARRAYIDX_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 0
	; UNROLL-NO-IC-NEXT: [[PRE_LOAD:%.*]] = load i32, i32* [[ARRAYIDX_PHI_TRANS_INSERT]], align 4			; UNROLL-NO-IC-NEXT: [[PRE_LOAD:%.*]] = load i32, i32* [[ARRAYIDX_PHI_TRANS_INSERT]], align 4
	; UNROLL-NO-IC-NEXT: [[TMP0:%.*]] = add i32 [[N:%.*]], -1			; UNROLL-NO-IC-NEXT: [[TMP0:%.*]] = add i32 [[N:%.*]], -1
	; UNROLL-NO-IC-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64			; UNROLL-NO-IC-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
	; UNROLL-NO-IC-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1			; UNROLL-NO-IC-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; UNROLL-NO-IC-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 8			; UNROLL-NO-IC-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 8
	; UNROLL-NO-IC-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; UNROLL-NO-IC-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; UNROLL-NO-IC: vector.memcheck:			; UNROLL-NO-IC: vector.memcheck:
	; UNROLL-NO-IC-NEXT: [[TMP3:%.*]] = add i32 [[N]], -1			; UNROLL-NO-IC-NEXT: [[TMP3:%.*]] = add i64 [[A2]], 4
	; UNROLL-NO-IC-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64			; UNROLL-NO-IC-NEXT: [[TMP4:%.*]] = sub i64 [[B1]], [[TMP3]]
	; UNROLL-NO-IC-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], 1			; UNROLL-NO-IC-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP4]], 32
	; UNROLL-NO-IC-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B]], i64 [[TMP5]]			; UNROLL-NO-IC-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL-NO-IC-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*
	; UNROLL-NO-IC-NEXT: [[SCEVGEP3:%.*]] = getelementptr i32, i32* [[A]], i64 1
	; UNROLL-NO-IC-NEXT: [[SCEVGEP34:%.*]] = bitcast i32* [[SCEVGEP3]] to i8*
	; UNROLL-NO-IC-NEXT: [[TMP6:%.*]] = add nuw nsw i64 [[TMP4]], 2
	; UNROLL-NO-IC-NEXT: [[SCEVGEP5:%.*]] = getelementptr i32, i32* [[A]], i64 [[TMP6]]
	; UNROLL-NO-IC-NEXT: [[SCEVGEP56:%.*]] = bitcast i32* [[SCEVGEP5]] to i8*
	; UNROLL-NO-IC-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[B1]], [[SCEVGEP56]]
	; UNROLL-NO-IC-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[SCEVGEP34]], [[SCEVGEP2]]
	; UNROLL-NO-IC-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; UNROLL-NO-IC-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL-NO-IC: vector.ph:			; UNROLL-NO-IC: vector.ph:
	; UNROLL-NO-IC-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 8			; UNROLL-NO-IC-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 8
	; UNROLL-NO-IC-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]			; UNROLL-NO-IC-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, i32 [[PRE_LOAD]], i32 3			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, i32 [[PRE_LOAD]], i32 3
	; UNROLL-NO-IC-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NO-IC-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL-NO-IC: vector.body:			; UNROLL-NO-IC: vector.body:
	; UNROLL-NO-IC-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-IC-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD7:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD7:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-IC-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 0			; UNROLL-NO-IC-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 0
	; UNROLL-NO-IC-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 4			; UNROLL-NO-IC-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 4
	; UNROLL-NO-IC-NEXT: [[TMP9:%.*]] = add nuw nsw i64 [[TMP7]], 1			; UNROLL-NO-IC-NEXT: [[TMP9:%.*]] = add nuw nsw i64 [[TMP7]], 1
	; UNROLL-NO-IC-NEXT: [[TMP10:%.*]] = add nuw nsw i64 [[TMP8]], 1			; UNROLL-NO-IC-NEXT: [[TMP10:%.*]] = add nuw nsw i64 [[TMP8]], 1
	; UNROLL-NO-IC-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP9]]			; UNROLL-NO-IC-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP9]]
	; UNROLL-NO-IC-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP10]]			; UNROLL-NO-IC-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP10]]
	; UNROLL-NO-IC-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP13]] to <4 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP13]] to <4 x i32>*
	; UNROLL-NO-IC-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP14]], align 4, !alias.scope !0			; UNROLL-NO-IC-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP14]], align 4
	; UNROLL-NO-IC-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i32 4			; UNROLL-NO-IC-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i32 4
	; UNROLL-NO-IC-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP15]] to <4 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP15]] to <4 x i32>*
	; UNROLL-NO-IC-NEXT: [[WIDE_LOAD7]] = load <4 x i32>, <4 x i32>* [[TMP16]], align 4, !alias.scope !0			; UNROLL-NO-IC-NEXT: [[WIDE_LOAD7]] = load <4 x i32>, <4 x i32>* [[TMP16]], align 4
	; UNROLL-NO-IC-NEXT: [[TMP17:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NO-IC-NEXT: [[TMP17:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NO-IC-NEXT: [[TMP18:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD]], <4 x i32> [[WIDE_LOAD7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NO-IC-NEXT: [[TMP18:%.*]] = shufflevector <4 x i32> [[WIDE_LOAD]], <4 x i32> [[WIDE_LOAD7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NO-IC-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP7]]			; UNROLL-NO-IC-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP7]]
	; UNROLL-NO-IC-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP8]]			; UNROLL-NO-IC-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP8]]
	; UNROLL-NO-IC-NEXT: [[TMP21:%.*]] = add <4 x i32> [[WIDE_LOAD]], [[TMP17]]			; UNROLL-NO-IC-NEXT: [[TMP21:%.*]] = add <4 x i32> [[WIDE_LOAD]], [[TMP17]]
	; UNROLL-NO-IC-NEXT: [[TMP22:%.*]] = add <4 x i32> [[WIDE_LOAD7]], [[TMP18]]			; UNROLL-NO-IC-NEXT: [[TMP22:%.*]] = add <4 x i32> [[WIDE_LOAD7]], [[TMP18]]
	; UNROLL-NO-IC-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP24:%.*]] = bitcast i32* [[TMP23]] to <4 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP24:%.*]] = bitcast i32* [[TMP23]] to <4 x i32>*
	; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP21]], <4 x i32>* [[TMP24]], align 4, !alias.scope !3, !noalias !0			; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP21]], <4 x i32>* [[TMP24]], align 4
	; UNROLL-NO-IC-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i32 4			; UNROLL-NO-IC-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i32 4
	; UNROLL-NO-IC-NEXT: [[TMP26:%.*]] = bitcast i32* [[TMP25]] to <4 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP26:%.*]] = bitcast i32* [[TMP25]] to <4 x i32>*
	; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP22]], <4 x i32>* [[TMP26]], align 4, !alias.scope !3, !noalias !0			; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP22]], <4 x i32>* [[TMP26]], align 4
	; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; UNROLL-NO-IC-NEXT: [[TMP27:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-IC-NEXT: [[TMP27:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-IC-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; UNROLL-NO-IC-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; UNROLL-NO-IC: middle.block:			; UNROLL-NO-IC: middle.block:
	; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[WIDE_LOAD7]], i32 3			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[WIDE_LOAD7]], i32 3
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i32> [[WIDE_LOAD7]], i32 2			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i32> [[WIDE_LOAD7]], i32 2
	; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]
	[13 unchanged lines elided]
	; UNROLL-NO-IC-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; UNROLL-NO-IC-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; UNROLL-NO-IC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]			; UNROLL-NO-IC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]
	; UNROLL-NO-IC-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; UNROLL-NO-IC-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; UNROLL-NO-IC: for.exit:			; UNROLL-NO-IC: for.exit:
	; UNROLL-NO-IC-NEXT: ret void			; UNROLL-NO-IC-NEXT: ret void
	;			;
	; UNROLL-NO-VF-LABEL: @recurrence_1(			; UNROLL-NO-VF-LABEL: @recurrence_1(
	; UNROLL-NO-VF-NEXT: entry:			; UNROLL-NO-VF-NEXT: entry:
	; UNROLL-NO-VF-NEXT: [[B1:%.*]] = bitcast i32* [[B:%.*]] to i8*			; UNROLL-NO-VF-NEXT: [[A2:%.*]] = ptrtoint i32* [[A:%.*]] to i64
				; UNROLL-NO-VF-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; UNROLL-NO-VF-NEXT: br label [[FOR_PREHEADER:%.*]]			; UNROLL-NO-VF-NEXT: br label [[FOR_PREHEADER:%.*]]
	; UNROLL-NO-VF: for.preheader:			; UNROLL-NO-VF: for.preheader:
	; UNROLL-NO-VF-NEXT: [[ARRAYIDX_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 0			; UNROLL-NO-VF-NEXT: [[ARRAYIDX_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 0
	; UNROLL-NO-VF-NEXT: [[PRE_LOAD:%.*]] = load i32, i32* [[ARRAYIDX_PHI_TRANS_INSERT]], align 4			; UNROLL-NO-VF-NEXT: [[PRE_LOAD:%.*]] = load i32, i32* [[ARRAYIDX_PHI_TRANS_INSERT]], align 4
	; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = add i32 [[N:%.*]], -1			; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = add i32 [[N:%.*]], -1
	; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64			; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
	; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1			; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; UNROLL-NO-VF-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 2			; UNROLL-NO-VF-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 2
	; UNROLL-NO-VF-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; UNROLL-NO-VF-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; UNROLL-NO-VF: vector.memcheck:			; UNROLL-NO-VF: vector.memcheck:
	; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = add i32 [[N]], -1			; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = add i64 [[A2]], 4
	; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64			; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = sub i64 [[B1]], [[TMP3]]
	; UNROLL-NO-VF-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], 1			; UNROLL-NO-VF-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP4]], 8
	; UNROLL-NO-VF-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B]], i64 [[TMP5]]			; UNROLL-NO-VF-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL-NO-VF-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*
	; UNROLL-NO-VF-NEXT: [[SCEVGEP3:%.*]] = getelementptr i32, i32* [[A]], i64 1
	; UNROLL-NO-VF-NEXT: [[SCEVGEP34:%.*]] = bitcast i32* [[SCEVGEP3]] to i8*
	; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = add nuw nsw i64 [[TMP4]], 2
	; UNROLL-NO-VF-NEXT: [[SCEVGEP5:%.*]] = getelementptr i32, i32* [[A]], i64 [[TMP6]]
	; UNROLL-NO-VF-NEXT: [[SCEVGEP56:%.*]] = bitcast i32* [[SCEVGEP5]] to i8*
	; UNROLL-NO-VF-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[B1]], [[SCEVGEP56]]
	; UNROLL-NO-VF-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[SCEVGEP34]], [[SCEVGEP2]]
	; UNROLL-NO-VF-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; UNROLL-NO-VF-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL-NO-VF: vector.ph:			; UNROLL-NO-VF: vector.ph:
	; UNROLL-NO-VF-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 2			; UNROLL-NO-VF-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 2
	; UNROLL-NO-VF-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]			; UNROLL-NO-VF-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
	; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL-NO-VF: vector.body:			; UNROLL-NO-VF: vector.body:
	; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i32 [ [[PRE_LOAD]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i32 [ [[PRE_LOAD]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[INDUCTION:%.*]] = add i64 [[INDEX]], 0			; UNROLL-NO-VF-NEXT: [[INDUCTION:%.*]] = add i64 [[INDEX]], 0
	; UNROLL-NO-VF-NEXT: [[INDUCTION7:%.*]] = add i64 [[INDEX]], 1			; UNROLL-NO-VF-NEXT: [[INDUCTION7:%.*]] = add i64 [[INDEX]], 1
	; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = add nuw nsw i64 [[INDUCTION]], 1			; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = add nuw nsw i64 [[INDUCTION]], 1
	; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = add nuw nsw i64 [[INDUCTION7]], 1			; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = add nuw nsw i64 [[INDUCTION7]], 1
	; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP7]]			; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP7]]
	; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP8]]			; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP8]]
	; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP9]], align 4, !alias.scope !0			; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP9]], align 4
	; UNROLL-NO-VF-NEXT: [[TMP12]] = load i32, i32* [[TMP10]], align 4, !alias.scope !0			; UNROLL-NO-VF-NEXT: [[TMP12]] = load i32, i32* [[TMP10]], align 4
	; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION]]			; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION]]
	; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION7]]			; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION7]]
	; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = add i32 [[TMP11]], [[VECTOR_RECUR]]			; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = add i32 [[TMP11]], [[VECTOR_RECUR]]
	; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = add i32 [[TMP12]], [[TMP11]]			; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = add i32 [[TMP12]], [[TMP11]]
	; UNROLL-NO-VF-NEXT: store i32 [[TMP15]], i32* [[TMP13]], align 4, !alias.scope !3, !noalias !0			; UNROLL-NO-VF-NEXT: store i32 [[TMP15]], i32* [[TMP13]], align 4
	; UNROLL-NO-VF-NEXT: store i32 [[TMP16]], i32* [[TMP14]], align 4, !alias.scope !3, !noalias !0			; UNROLL-NO-VF-NEXT: store i32 [[TMP16]], i32* [[TMP14]], align 4
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[PRE_LOAD]], [[VECTOR_MEMCHECK]] ], [ [[PRE_LOAD]], [[FOR_PREHEADER]] ], [ [[TMP12]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[PRE_LOAD]], [[VECTOR_MEMCHECK]] ], [ [[PRE_LOAD]], [[FOR_PREHEADER]] ], [ [[TMP12]], [[MIDDLE_BLOCK]] ]
	[11 unchanged lines elided]
	; UNROLL-NO-VF-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; UNROLL-NO-VF-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]			; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]
	; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; UNROLL-NO-VF: for.exit:			; UNROLL-NO-VF: for.exit:
	; UNROLL-NO-VF-NEXT: ret void			; UNROLL-NO-VF-NEXT: ret void
	;			;
	; SINK-AFTER-LABEL: @recurrence_1(			; SINK-AFTER-LABEL: @recurrence_1(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: [[B1:%.*]] = bitcast i32* [[B:%.*]] to i8*			; SINK-AFTER-NEXT: [[A2:%.*]] = ptrtoint i32* [[A:%.*]] to i64
				; SINK-AFTER-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; SINK-AFTER-NEXT: br label [[FOR_PREHEADER:%.*]]			; SINK-AFTER-NEXT: br label [[FOR_PREHEADER:%.*]]
	; SINK-AFTER: for.preheader:			; SINK-AFTER: for.preheader:
	; SINK-AFTER-NEXT: [[ARRAYIDX_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 0			; SINK-AFTER-NEXT: [[ARRAYIDX_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 0
	; SINK-AFTER-NEXT: [[PRE_LOAD:%.*]] = load i32, i32* [[ARRAYIDX_PHI_TRANS_INSERT]], align 4			; SINK-AFTER-NEXT: [[PRE_LOAD:%.*]] = load i32, i32* [[ARRAYIDX_PHI_TRANS_INSERT]], align 4
	; SINK-AFTER-NEXT: [[TMP0:%.*]] = add i32 [[N:%.*]], -1			; SINK-AFTER-NEXT: [[TMP0:%.*]] = add i32 [[N:%.*]], -1
	; SINK-AFTER-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64			; SINK-AFTER-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
	; SINK-AFTER-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1			; SINK-AFTER-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; SINK-AFTER-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 4			; SINK-AFTER-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 4
	; SINK-AFTER-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; SINK-AFTER-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; SINK-AFTER: vector.memcheck:			; SINK-AFTER: vector.memcheck:
	; SINK-AFTER-NEXT: [[TMP3:%.*]] = add i32 [[N]], -1			; SINK-AFTER-NEXT: [[TMP3:%.*]] = add i64 [[A2]], 4
	; SINK-AFTER-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64			; SINK-AFTER-NEXT: [[TMP4:%.*]] = sub i64 [[B1]], [[TMP3]]
	; SINK-AFTER-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], 1			; SINK-AFTER-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP4]], 16
	; SINK-AFTER-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B]], i64 [[TMP5]]			; SINK-AFTER-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; SINK-AFTER-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*
	; SINK-AFTER-NEXT: [[SCEVGEP3:%.*]] = getelementptr i32, i32* [[A]], i64 1
	; SINK-AFTER-NEXT: [[SCEVGEP34:%.*]] = bitcast i32* [[SCEVGEP3]] to i8*
	; SINK-AFTER-NEXT: [[TMP6:%.*]] = add nuw nsw i64 [[TMP4]], 2
	; SINK-AFTER-NEXT: [[SCEVGEP5:%.*]] = getelementptr i32, i32* [[A]], i64 [[TMP6]]
	; SINK-AFTER-NEXT: [[SCEVGEP56:%.*]] = bitcast i32* [[SCEVGEP5]] to i8*
	; SINK-AFTER-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[B1]], [[SCEVGEP56]]
	; SINK-AFTER-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[SCEVGEP34]], [[SCEVGEP2]]
	; SINK-AFTER-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; SINK-AFTER-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; SINK-AFTER: vector.ph:			; SINK-AFTER: vector.ph:
	; SINK-AFTER-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 4			; SINK-AFTER-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 4
	; SINK-AFTER-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]			; SINK-AFTER-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, i32 [[PRE_LOAD]], i32 3			; SINK-AFTER-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, i32 [[PRE_LOAD]], i32 3
	; SINK-AFTER-NEXT: br label [[VECTOR_BODY:%.*]]			; SINK-AFTER-NEXT: br label [[VECTOR_BODY:%.*]]
	; SINK-AFTER: vector.body:			; SINK-AFTER: vector.body:
	; SINK-AFTER-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; SINK-AFTER-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; SINK-AFTER-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]			; SINK-AFTER-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]
	; SINK-AFTER-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 0			; SINK-AFTER-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 0
	; SINK-AFTER-NEXT: [[TMP8:%.*]] = add nuw nsw i64 [[TMP7]], 1			; SINK-AFTER-NEXT: [[TMP8:%.*]] = add nuw nsw i64 [[TMP7]], 1
	; SINK-AFTER-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP8]]			; SINK-AFTER-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP8]]
	; SINK-AFTER-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP9]], i32 0			; SINK-AFTER-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP9]], i32 0
	; SINK-AFTER-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <4 x i32>*			; SINK-AFTER-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <4 x i32>*
	; SINK-AFTER-NEXT: [[WIDE_LOAD]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4, !alias.scope !0			; SINK-AFTER-NEXT: [[WIDE_LOAD]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4
	; SINK-AFTER-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; SINK-AFTER-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; SINK-AFTER-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP7]]			; SINK-AFTER-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP7]]
	; SINK-AFTER-NEXT: [[TMP14:%.*]] = add <4 x i32> [[WIDE_LOAD]], [[TMP12]]			; SINK-AFTER-NEXT: [[TMP14:%.*]] = add <4 x i32> [[WIDE_LOAD]], [[TMP12]]
	; SINK-AFTER-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP13]], i32 0			; SINK-AFTER-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP13]], i32 0
	; SINK-AFTER-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP15]] to <4 x i32>*			; SINK-AFTER-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP15]] to <4 x i32>*
	; SINK-AFTER-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP16]], align 4, !alias.scope !3, !noalias !0			; SINK-AFTER-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP16]], align 4
	; SINK-AFTER-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; SINK-AFTER-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; SINK-AFTER-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; SINK-AFTER-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; SINK-AFTER-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; SINK-AFTER-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; SINK-AFTER: middle.block:			; SINK-AFTER: middle.block:
	; SINK-AFTER-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; SINK-AFTER-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[WIDE_LOAD]], i32 3			; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[WIDE_LOAD]], i32 3
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i32> [[WIDE_LOAD]], i32 2			; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i32> [[WIDE_LOAD]], i32 2
	; SINK-AFTER-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]			; SINK-AFTER-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT]], <4 x double> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT]], <4 x double> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = or i64 [[INDEX]], 1			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = or i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <4 x i16>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <4 x i16>*
	; CHECK-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP10]], align 2, !alias.scope !11			; CHECK-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP10]], align 2
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; CHECK-NEXT: [[TMP12:%.*]] = sitofp <4 x i16> [[WIDE_LOAD]] to <4 x double>			; CHECK-NEXT: [[TMP12:%.*]] = sitofp <4 x i16> [[WIDE_LOAD]] to <4 x double>
	; CHECK-NEXT: [[TMP13:%.*]] = sitofp <4 x i16> [[TMP11]] to <4 x double>			; CHECK-NEXT: [[TMP13:%.*]] = sitofp <4 x i16> [[TMP11]] to <4 x double>
	; CHECK-NEXT: [[TMP14:%.*]] = fmul fast <4 x double> [[BROADCAST_SPLAT]], [[TMP13]]			; CHECK-NEXT: [[TMP14:%.*]] = fmul fast <4 x double> [[BROADCAST_SPLAT]], [[TMP13]]
	; CHECK-NEXT: [[TMP15:%.*]] = fsub fast <4 x double> [[TMP12]], [[TMP14]]			; CHECK-NEXT: [[TMP15:%.*]] = fsub fast <4 x double> [[TMP12]], [[TMP14]]
	; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[TMP17:%.*]] = bitcast double* [[TMP16]] to <4 x double>*			; CHECK-NEXT: [[TMP17:%.*]] = bitcast double* [[TMP16]] to <4 x double>*
	; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP17]], align 8, !alias.scope !14, !noalias !11			; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP17]], align 8
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
	; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i64 3			; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i64 3
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; UNROLL-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT9]], <4 x double> poison, <4 x i32> zeroinitializer			; UNROLL-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT9]], <4 x double> poison, <4 x i32> zeroinitializer
	; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD8:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD8:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[OFFSET_IDX:%.*]] = or i64 [[INDEX]], 1			; UNROLL-NEXT: [[OFFSET_IDX:%.*]] = or i64 [[INDEX]], 1
	; UNROLL-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[OFFSET_IDX]]			; UNROLL-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[OFFSET_IDX]]
	; UNROLL-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <4 x i16>*			; UNROLL-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <4 x i16>*
	; UNROLL-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP10]], align 2, !alias.scope !11			; UNROLL-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP10]], align 2
	; UNROLL-NEXT: [[TMP11:%.*]] = getelementptr inbounds i16, i16* [[TMP9]], i64 4			; UNROLL-NEXT: [[TMP11:%.*]] = getelementptr inbounds i16, i16* [[TMP9]], i64 4
	; UNROLL-NEXT: [[TMP12:%.*]] = bitcast i16* [[TMP11]] to <4 x i16>*			; UNROLL-NEXT: [[TMP12:%.*]] = bitcast i16* [[TMP11]] to <4 x i16>*
	; UNROLL-NEXT: [[WIDE_LOAD8]] = load <4 x i16>, <4 x i16>* [[TMP12]], align 2, !alias.scope !11			; UNROLL-NEXT: [[WIDE_LOAD8]] = load <4 x i16>, <4 x i16>* [[TMP12]], align 2
	; UNROLL-NEXT: [[TMP13:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NEXT: [[TMP13:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NEXT: [[TMP14:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD8]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NEXT: [[TMP14:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD8]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NEXT: [[TMP15:%.*]] = sitofp <4 x i16> [[WIDE_LOAD]] to <4 x double>			; UNROLL-NEXT: [[TMP15:%.*]] = sitofp <4 x i16> [[WIDE_LOAD]] to <4 x double>
	; UNROLL-NEXT: [[TMP16:%.*]] = sitofp <4 x i16> [[WIDE_LOAD8]] to <4 x double>			; UNROLL-NEXT: [[TMP16:%.*]] = sitofp <4 x i16> [[WIDE_LOAD8]] to <4 x double>
	; UNROLL-NEXT: [[TMP17:%.*]] = sitofp <4 x i16> [[TMP13]] to <4 x double>			; UNROLL-NEXT: [[TMP17:%.*]] = sitofp <4 x i16> [[TMP13]] to <4 x double>
	; UNROLL-NEXT: [[TMP18:%.*]] = sitofp <4 x i16> [[TMP14]] to <4 x double>			; UNROLL-NEXT: [[TMP18:%.*]] = sitofp <4 x i16> [[TMP14]] to <4 x double>
	; UNROLL-NEXT: [[TMP19:%.*]] = fmul fast <4 x double> [[BROADCAST_SPLAT]], [[TMP17]]			; UNROLL-NEXT: [[TMP19:%.*]] = fmul fast <4 x double> [[BROADCAST_SPLAT]], [[TMP17]]
	; UNROLL-NEXT: [[TMP20:%.*]] = fmul fast <4 x double> [[BROADCAST_SPLAT10]], [[TMP18]]			; UNROLL-NEXT: [[TMP20:%.*]] = fmul fast <4 x double> [[BROADCAST_SPLAT10]], [[TMP18]]
	; UNROLL-NEXT: [[TMP21:%.*]] = fsub fast <4 x double> [[TMP15]], [[TMP19]]			; UNROLL-NEXT: [[TMP21:%.*]] = fsub fast <4 x double> [[TMP15]], [[TMP19]]
	; UNROLL-NEXT: [[TMP22:%.*]] = fsub fast <4 x double> [[TMP16]], [[TMP20]]			; UNROLL-NEXT: [[TMP22:%.*]] = fsub fast <4 x double> [[TMP16]], [[TMP20]]
	; UNROLL-NEXT: [[TMP23:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[OFFSET_IDX]]			; UNROLL-NEXT: [[TMP23:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[OFFSET_IDX]]
	; UNROLL-NEXT: [[TMP24:%.*]] = bitcast double* [[TMP23]] to <4 x double>*			; UNROLL-NEXT: [[TMP24:%.*]] = bitcast double* [[TMP23]] to <4 x double>*
	; UNROLL-NEXT: store <4 x double> [[TMP21]], <4 x double>* [[TMP24]], align 8, !alias.scope !14, !noalias !11			; UNROLL-NEXT: store <4 x double> [[TMP21]], <4 x double>* [[TMP24]], align 8
	; UNROLL-NEXT: [[TMP25:%.*]] = getelementptr inbounds double, double* [[TMP23]], i64 4			; UNROLL-NEXT: [[TMP25:%.*]] = getelementptr inbounds double, double* [[TMP23]], i64 4
	; UNROLL-NEXT: [[TMP26:%.*]] = bitcast double* [[TMP25]] to <4 x double>*			; UNROLL-NEXT: [[TMP26:%.*]] = bitcast double* [[TMP25]] to <4 x double>*
	; UNROLL-NEXT: store <4 x double> [[TMP22]], <4 x double>* [[TMP26]], align 8, !alias.scope !14, !noalias !11			; UNROLL-NEXT: store <4 x double> [[TMP22]], <4 x double>* [[TMP26]], align 8
	; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; UNROLL-NEXT: [[TMP27:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NEXT: [[TMP27:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]			; UNROLL-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
	; UNROLL: middle.block:			; UNROLL: middle.block:
	; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]			; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
	; UNROLL-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD8]], i64 3			; UNROLL-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD8]], i64 3
	; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; UNROLL: scalar.ph:			; UNROLL: scalar.ph:
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD8:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD8:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-IC-NEXT: [[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]			; UNROLL-NO-IC-NEXT: [[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]
	; UNROLL-NO-IC-NEXT: [[TMP7:%.*]] = add i64 [[OFFSET_IDX]], 0			; UNROLL-NO-IC-NEXT: [[TMP7:%.*]] = add i64 [[OFFSET_IDX]], 0
	; UNROLL-NO-IC-NEXT: [[TMP8:%.*]] = add i64 [[OFFSET_IDX]], 4			; UNROLL-NO-IC-NEXT: [[TMP8:%.*]] = add i64 [[OFFSET_IDX]], 4
	; UNROLL-NO-IC-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP7]]			; UNROLL-NO-IC-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP7]]
	; UNROLL-NO-IC-NEXT: [[TMP10:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP8]]			; UNROLL-NO-IC-NEXT: [[TMP10:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP8]]
	; UNROLL-NO-IC-NEXT: [[TMP11:%.*]] = getelementptr inbounds i16, i16* [[TMP9]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP11:%.*]] = getelementptr inbounds i16, i16* [[TMP9]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP12:%.*]] = bitcast i16* [[TMP11]] to <4 x i16>*			; UNROLL-NO-IC-NEXT: [[TMP12:%.*]] = bitcast i16* [[TMP11]] to <4 x i16>*
	; UNROLL-NO-IC-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP12]], align 2, !alias.scope !11			; UNROLL-NO-IC-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP12]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP13:%.*]] = getelementptr inbounds i16, i16* [[TMP9]], i32 4			; UNROLL-NO-IC-NEXT: [[TMP13:%.*]] = getelementptr inbounds i16, i16* [[TMP9]], i32 4
	; UNROLL-NO-IC-NEXT: [[TMP14:%.*]] = bitcast i16* [[TMP13]] to <4 x i16>*			; UNROLL-NO-IC-NEXT: [[TMP14:%.*]] = bitcast i16* [[TMP13]] to <4 x i16>*
	; UNROLL-NO-IC-NEXT: [[WIDE_LOAD8]] = load <4 x i16>, <4 x i16>* [[TMP14]], align 2, !alias.scope !11			; UNROLL-NO-IC-NEXT: [[WIDE_LOAD8]] = load <4 x i16>, <4 x i16>* [[TMP14]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP15:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NO-IC-NEXT: [[TMP15:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NO-IC-NEXT: [[TMP16:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD8]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NO-IC-NEXT: [[TMP16:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD8]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NO-IC-NEXT: [[TMP17:%.*]] = sitofp <4 x i16> [[WIDE_LOAD]] to <4 x double>			; UNROLL-NO-IC-NEXT: [[TMP17:%.*]] = sitofp <4 x i16> [[WIDE_LOAD]] to <4 x double>
	; UNROLL-NO-IC-NEXT: [[TMP18:%.*]] = sitofp <4 x i16> [[WIDE_LOAD8]] to <4 x double>			; UNROLL-NO-IC-NEXT: [[TMP18:%.*]] = sitofp <4 x i16> [[WIDE_LOAD8]] to <4 x double>
	; UNROLL-NO-IC-NEXT: [[TMP19:%.*]] = sitofp <4 x i16> [[TMP15]] to <4 x double>			; UNROLL-NO-IC-NEXT: [[TMP19:%.*]] = sitofp <4 x i16> [[TMP15]] to <4 x double>
	; UNROLL-NO-IC-NEXT: [[TMP20:%.*]] = sitofp <4 x i16> [[TMP16]] to <4 x double>			; UNROLL-NO-IC-NEXT: [[TMP20:%.*]] = sitofp <4 x i16> [[TMP16]] to <4 x double>
	; UNROLL-NO-IC-NEXT: [[TMP21:%.*]] = fmul fast <4 x double> [[TMP19]], [[BROADCAST_SPLAT]]			; UNROLL-NO-IC-NEXT: [[TMP21:%.*]] = fmul fast <4 x double> [[TMP19]], [[BROADCAST_SPLAT]]
	; UNROLL-NO-IC-NEXT: [[TMP22:%.*]] = fmul fast <4 x double> [[TMP20]], [[BROADCAST_SPLAT10]]			; UNROLL-NO-IC-NEXT: [[TMP22:%.*]] = fmul fast <4 x double> [[TMP20]], [[BROADCAST_SPLAT10]]
	; UNROLL-NO-IC-NEXT: [[TMP23:%.*]] = fsub fast <4 x double> [[TMP17]], [[TMP21]]			; UNROLL-NO-IC-NEXT: [[TMP23:%.*]] = fsub fast <4 x double> [[TMP17]], [[TMP21]]
	; UNROLL-NO-IC-NEXT: [[TMP24:%.*]] = fsub fast <4 x double> [[TMP18]], [[TMP22]]			; UNROLL-NO-IC-NEXT: [[TMP24:%.*]] = fsub fast <4 x double> [[TMP18]], [[TMP22]]
	; UNROLL-NO-IC-NEXT: [[TMP25:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[TMP7]]			; UNROLL-NO-IC-NEXT: [[TMP25:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[TMP7]]
	; UNROLL-NO-IC-NEXT: [[TMP26:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[TMP8]]			; UNROLL-NO-IC-NEXT: [[TMP26:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[TMP8]]
	; UNROLL-NO-IC-NEXT: [[TMP27:%.*]] = getelementptr inbounds double, double* [[TMP25]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP27:%.*]] = getelementptr inbounds double, double* [[TMP25]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP28:%.*]] = bitcast double* [[TMP27]] to <4 x double>*			; UNROLL-NO-IC-NEXT: [[TMP28:%.*]] = bitcast double* [[TMP27]] to <4 x double>*
	; UNROLL-NO-IC-NEXT: store <4 x double> [[TMP23]], <4 x double>* [[TMP28]], align 8, !alias.scope !14, !noalias !11			; UNROLL-NO-IC-NEXT: store <4 x double> [[TMP23]], <4 x double>* [[TMP28]], align 8
	; UNROLL-NO-IC-NEXT: [[TMP29:%.*]] = getelementptr inbounds double, double* [[TMP25]], i32 4			; UNROLL-NO-IC-NEXT: [[TMP29:%.*]] = getelementptr inbounds double, double* [[TMP25]], i32 4
	; UNROLL-NO-IC-NEXT: [[TMP30:%.*]] = bitcast double* [[TMP29]] to <4 x double>*			; UNROLL-NO-IC-NEXT: [[TMP30:%.*]] = bitcast double* [[TMP29]] to <4 x double>*
	; UNROLL-NO-IC-NEXT: store <4 x double> [[TMP24]], <4 x double>* [[TMP30]], align 8, !alias.scope !14, !noalias !11			; UNROLL-NO-IC-NEXT: store <4 x double> [[TMP24]], <4 x double>* [[TMP30]], align 8
	; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; UNROLL-NO-IC-NEXT: [[TMP31:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-IC-NEXT: [[TMP31:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-IC-NEXT: br i1 [[TMP31]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]			; UNROLL-NO-IC-NEXT: br i1 [[TMP31]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
	; UNROLL-NO-IC: middle.block:			; UNROLL-NO-IC: middle.block:
	; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]			; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD8]], i32 3			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD8]], i32 3
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD8]], i32 2			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD8]], i32 2
	; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: vector.body:			; UNROLL-NO-VF: vector.body:
	; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i16 [ [[TMP0]], [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i16 [ [[TMP0]], [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]			; UNROLL-NO-VF-NEXT: [[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]
	; UNROLL-NO-VF-NEXT: [[INDUCTION:%.*]] = add i64 [[OFFSET_IDX]], 0			; UNROLL-NO-VF-NEXT: [[INDUCTION:%.*]] = add i64 [[OFFSET_IDX]], 0
	; UNROLL-NO-VF-NEXT: [[INDUCTION8:%.*]] = add i64 [[OFFSET_IDX]], 1			; UNROLL-NO-VF-NEXT: [[INDUCTION8:%.*]] = add i64 [[OFFSET_IDX]], 1
	; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[INDUCTION]]			; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[INDUCTION]]
	; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[INDUCTION8]]			; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[INDUCTION8]]
	; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = load i16, i16* [[TMP7]], align 2, !alias.scope !10			; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = load i16, i16* [[TMP7]], align 2
	; UNROLL-NO-VF-NEXT: [[TMP10]] = load i16, i16* [[TMP8]], align 2, !alias.scope !10			; UNROLL-NO-VF-NEXT: [[TMP10]] = load i16, i16* [[TMP8]], align 2
	; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = sitofp i16 [[TMP9]] to double			; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = sitofp i16 [[TMP9]] to double
	; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = sitofp i16 [[TMP10]] to double			; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = sitofp i16 [[TMP10]] to double
	; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = sitofp i16 [[VECTOR_RECUR]] to double			; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = sitofp i16 [[VECTOR_RECUR]] to double
	; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = sitofp i16 [[TMP9]] to double			; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = sitofp i16 [[TMP9]] to double
	; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = fmul fast double [[TMP13]], [[CONV1]]			; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = fmul fast double [[TMP13]], [[CONV1]]
	; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = fmul fast double [[TMP14]], [[CONV1]]			; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = fmul fast double [[TMP14]], [[CONV1]]
	; UNROLL-NO-VF-NEXT: [[TMP17:%.*]] = fsub fast double [[TMP11]], [[TMP15]]			; UNROLL-NO-VF-NEXT: [[TMP17:%.*]] = fsub fast double [[TMP11]], [[TMP15]]
	; UNROLL-NO-VF-NEXT: [[TMP18:%.*]] = fsub fast double [[TMP12]], [[TMP16]]			; UNROLL-NO-VF-NEXT: [[TMP18:%.*]] = fsub fast double [[TMP12]], [[TMP16]]
	; UNROLL-NO-VF-NEXT: [[TMP19:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[INDUCTION]]			; UNROLL-NO-VF-NEXT: [[TMP19:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[INDUCTION]]
	; UNROLL-NO-VF-NEXT: [[TMP20:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[INDUCTION8]]			; UNROLL-NO-VF-NEXT: [[TMP20:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[INDUCTION8]]
	; UNROLL-NO-VF-NEXT: store double [[TMP17]], double* [[TMP19]], align 8, !alias.scope !13, !noalias !10			; UNROLL-NO-VF-NEXT: store double [[TMP17]], double* [[TMP19]], align 8
	; UNROLL-NO-VF-NEXT: store double [[TMP18]], double* [[TMP20]], align 8, !alias.scope !13, !noalias !10			; UNROLL-NO-VF-NEXT: store double [[TMP18]], double* [[TMP20]], align 8
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP15:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP15:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[TMP0]], [[VECTOR_MEMCHECK]] ], [ [[TMP0]], [[FOR_PREHEADER]] ], [ [[TMP10]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[TMP0]], [[VECTOR_MEMCHECK]] ], [ [[TMP0]], [[FOR_PREHEADER]] ], [ [[TMP10]], [[MIDDLE_BLOCK]] ]
	; SINK-AFTER: vector.body:			; SINK-AFTER: vector.body:
	; SINK-AFTER-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; SINK-AFTER-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; SINK-AFTER-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]			; SINK-AFTER-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]
	; SINK-AFTER-NEXT: [[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]			; SINK-AFTER-NEXT: [[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]
	; SINK-AFTER-NEXT: [[TMP7:%.*]] = add i64 [[OFFSET_IDX]], 0			; SINK-AFTER-NEXT: [[TMP7:%.*]] = add i64 [[OFFSET_IDX]], 0
	; SINK-AFTER-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP7]]			; SINK-AFTER-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP7]]
	; SINK-AFTER-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP8]], i32 0			; SINK-AFTER-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP8]], i32 0
	; SINK-AFTER-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <4 x i16>*			; SINK-AFTER-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <4 x i16>*
	; SINK-AFTER-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP10]], align 2, !alias.scope !11			; SINK-AFTER-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP10]], align 2
	; SINK-AFTER-NEXT: [[TMP11:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; SINK-AFTER-NEXT: [[TMP11:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; SINK-AFTER-NEXT: [[TMP12:%.*]] = sitofp <4 x i16> [[WIDE_LOAD]] to <4 x double>			; SINK-AFTER-NEXT: [[TMP12:%.*]] = sitofp <4 x i16> [[WIDE_LOAD]] to <4 x double>
	; SINK-AFTER-NEXT: [[TMP13:%.*]] = sitofp <4 x i16> [[TMP11]] to <4 x double>			; SINK-AFTER-NEXT: [[TMP13:%.*]] = sitofp <4 x i16> [[TMP11]] to <4 x double>
	; SINK-AFTER-NEXT: [[TMP14:%.*]] = fmul fast <4 x double> [[TMP13]], [[BROADCAST_SPLAT]]			; SINK-AFTER-NEXT: [[TMP14:%.*]] = fmul fast <4 x double> [[TMP13]], [[BROADCAST_SPLAT]]
	; SINK-AFTER-NEXT: [[TMP15:%.*]] = fsub fast <4 x double> [[TMP12]], [[TMP14]]			; SINK-AFTER-NEXT: [[TMP15:%.*]] = fsub fast <4 x double> [[TMP12]], [[TMP14]]
	; SINK-AFTER-NEXT: [[TMP16:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[TMP7]]			; SINK-AFTER-NEXT: [[TMP16:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[TMP7]]
	; SINK-AFTER-NEXT: [[TMP17:%.*]] = getelementptr inbounds double, double* [[TMP16]], i32 0			; SINK-AFTER-NEXT: [[TMP17:%.*]] = getelementptr inbounds double, double* [[TMP16]], i32 0
	; SINK-AFTER-NEXT: [[TMP18:%.*]] = bitcast double* [[TMP17]] to <4 x double>*			; SINK-AFTER-NEXT: [[TMP18:%.*]] = bitcast double* [[TMP17]] to <4 x double>*
	; SINK-AFTER-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP18]], align 8, !alias.scope !14, !noalias !11			; SINK-AFTER-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP18]], align 8
	; SINK-AFTER-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; SINK-AFTER-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; SINK-AFTER-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; SINK-AFTER-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; SINK-AFTER-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]			; SINK-AFTER-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
	; SINK-AFTER: middle.block:			; SINK-AFTER: middle.block:
	; SINK-AFTER-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]			; SINK-AFTER-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3			; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2			; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2
	; SINK-AFTER-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; SINK-AFTER-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3			; CHECK-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[INDEX]], 1			; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*
	; CHECK-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2, !alias.scope !26			; CHECK-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; CHECK-NEXT: [[TMP7:%.*]] = sext <4 x i16> [[TMP6]] to <4 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = sext <4 x i16> [[TMP6]] to <4 x i32>
	; CHECK-NEXT: [[TMP8:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>			; CHECK-NEXT: [[TMP8:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>
	; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP8]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP8]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <4 x i32>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* [[TMP11]], align 4, !alias.scope !29, !noalias !26			; CHECK-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* [[TMP11]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP31:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP31:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
	; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i64 3			; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i64 3
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; UNROLL-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3			; UNROLL-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3
	; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD7:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD7:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[TMP3:%.*]] = or i64 [[INDEX]], 1			; UNROLL-NEXT: [[TMP3:%.*]] = or i64 [[INDEX]], 1
	; UNROLL-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP3]]			; UNROLL-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP3]]
	; UNROLL-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*			; UNROLL-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*
	; UNROLL-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2, !alias.scope !26			; UNROLL-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2
	; UNROLL-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[TMP4]], i64 4			; UNROLL-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[TMP4]], i64 4
	; UNROLL-NEXT: [[TMP7:%.*]] = bitcast i16* [[TMP6]] to <4 x i16>*			; UNROLL-NEXT: [[TMP7:%.*]] = bitcast i16* [[TMP6]] to <4 x i16>*
	; UNROLL-NEXT: [[WIDE_LOAD7]] = load <4 x i16>, <4 x i16>* [[TMP7]], align 2, !alias.scope !26			; UNROLL-NEXT: [[WIDE_LOAD7]] = load <4 x i16>, <4 x i16>* [[TMP7]], align 2
	; UNROLL-NEXT: [[TMP8:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NEXT: [[TMP8:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NEXT: [[TMP9:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NEXT: [[TMP9:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NEXT: [[TMP10:%.*]] = sext <4 x i16> [[TMP8]] to <4 x i32>			; UNROLL-NEXT: [[TMP10:%.*]] = sext <4 x i16> [[TMP8]] to <4 x i32>
	; UNROLL-NEXT: [[TMP11:%.*]] = sext <4 x i16> [[TMP9]] to <4 x i32>			; UNROLL-NEXT: [[TMP11:%.*]] = sext <4 x i16> [[TMP9]] to <4 x i32>
	; UNROLL-NEXT: [[TMP12:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>			; UNROLL-NEXT: [[TMP12:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>
	; UNROLL-NEXT: [[TMP13:%.*]] = sext <4 x i16> [[WIDE_LOAD7]] to <4 x i32>			; UNROLL-NEXT: [[TMP13:%.*]] = sext <4 x i16> [[WIDE_LOAD7]] to <4 x i32>
	; UNROLL-NEXT: [[TMP14:%.*]] = mul nsw <4 x i32> [[TMP12]], [[TMP10]]			; UNROLL-NEXT: [[TMP14:%.*]] = mul nsw <4 x i32> [[TMP12]], [[TMP10]]
	; UNROLL-NEXT: [[TMP15:%.*]] = mul nsw <4 x i32> [[TMP13]], [[TMP11]]			; UNROLL-NEXT: [[TMP15:%.*]] = mul nsw <4 x i32> [[TMP13]], [[TMP11]]
	; UNROLL-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]			; UNROLL-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]
	; UNROLL-NEXT: [[TMP17:%.*]] = bitcast i32* [[TMP16]] to <4 x i32>*			; UNROLL-NEXT: [[TMP17:%.*]] = bitcast i32* [[TMP16]] to <4 x i32>*
	; UNROLL-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP17]], align 4, !alias.scope !29, !noalias !26			; UNROLL-NEXT: store <4 x i32> [[TMP14]], <4 x i32>* [[TMP17]], align 4
	; UNROLL-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP16]], i64 4			; UNROLL-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP16]], i64 4
	; UNROLL-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <4 x i32>*			; UNROLL-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <4 x i32>*
	; UNROLL-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP19]], align 4, !alias.scope !29, !noalias !26			; UNROLL-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP19]], align 4
	; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; UNROLL-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP31:![0-9]+]]			; UNROLL-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP31:![0-9]+]]
	; UNROLL: middle.block:			; UNROLL: middle.block:
	; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]			; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
	; UNROLL-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD7]], i64 3			; UNROLL-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD7]], i64 3
	; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL: scalar.ph:			; UNROLL: scalar.ph:
	; UNROLL-NO-IC-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0			; UNROLL-NO-IC-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0
	; UNROLL-NO-IC-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 4			; UNROLL-NO-IC-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 4
	; UNROLL-NO-IC-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP1]], 1			; UNROLL-NO-IC-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; UNROLL-NO-IC-NEXT: [[TMP4:%.*]] = add nuw nsw i64 [[TMP2]], 1			; UNROLL-NO-IC-NEXT: [[TMP4:%.*]] = add nuw nsw i64 [[TMP2]], 1
	; UNROLL-NO-IC-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP3]]			; UNROLL-NO-IC-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP3]]
	; UNROLL-NO-IC-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP4]]			; UNROLL-NO-IC-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP4]]
	; UNROLL-NO-IC-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[TMP5]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[TMP5]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP8:%.*]] = bitcast i16* [[TMP7]] to <4 x i16>*			; UNROLL-NO-IC-NEXT: [[TMP8:%.*]] = bitcast i16* [[TMP7]] to <4 x i16>*
	; UNROLL-NO-IC-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP8]], align 2, !alias.scope !26			; UNROLL-NO-IC-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP8]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP5]], i32 4			; UNROLL-NO-IC-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP5]], i32 4
	; UNROLL-NO-IC-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <4 x i16>*			; UNROLL-NO-IC-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <4 x i16>*
	; UNROLL-NO-IC-NEXT: [[WIDE_LOAD7]] = load <4 x i16>, <4 x i16>* [[TMP10]], align 2, !alias.scope !26			; UNROLL-NO-IC-NEXT: [[WIDE_LOAD7]] = load <4 x i16>, <4 x i16>* [[TMP10]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP11:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NO-IC-NEXT: [[TMP11:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NO-IC-NEXT: [[TMP12:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NO-IC-NEXT: [[TMP12:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NO-IC-NEXT: [[TMP13:%.*]] = sext <4 x i16> [[TMP11]] to <4 x i32>			; UNROLL-NO-IC-NEXT: [[TMP13:%.*]] = sext <4 x i16> [[TMP11]] to <4 x i32>
	; UNROLL-NO-IC-NEXT: [[TMP14:%.*]] = sext <4 x i16> [[TMP12]] to <4 x i32>			; UNROLL-NO-IC-NEXT: [[TMP14:%.*]] = sext <4 x i16> [[TMP12]] to <4 x i32>
	; UNROLL-NO-IC-NEXT: [[TMP15:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>			; UNROLL-NO-IC-NEXT: [[TMP15:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>
	; UNROLL-NO-IC-NEXT: [[TMP16:%.*]] = sext <4 x i16> [[WIDE_LOAD7]] to <4 x i32>			; UNROLL-NO-IC-NEXT: [[TMP16:%.*]] = sext <4 x i16> [[WIDE_LOAD7]] to <4 x i32>
	; UNROLL-NO-IC-NEXT: [[TMP17:%.*]] = mul nsw <4 x i32> [[TMP15]], [[TMP13]]			; UNROLL-NO-IC-NEXT: [[TMP17:%.*]] = mul nsw <4 x i32> [[TMP15]], [[TMP13]]
	; UNROLL-NO-IC-NEXT: [[TMP18:%.*]] = mul nsw <4 x i32> [[TMP16]], [[TMP14]]			; UNROLL-NO-IC-NEXT: [[TMP18:%.*]] = mul nsw <4 x i32> [[TMP16]], [[TMP14]]
	; UNROLL-NO-IC-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]]			; UNROLL-NO-IC-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]]
	; UNROLL-NO-IC-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP2]]			; UNROLL-NO-IC-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP2]]
	; UNROLL-NO-IC-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP22:%.*]] = bitcast i32* [[TMP21]] to <4 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP22:%.*]] = bitcast i32* [[TMP21]] to <4 x i32>*
	; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP17]], <4 x i32>* [[TMP22]], align 4, !alias.scope !29, !noalias !26			; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP17]], <4 x i32>* [[TMP22]], align 4
	; UNROLL-NO-IC-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i32 4			; UNROLL-NO-IC-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i32 4
	; UNROLL-NO-IC-NEXT: [[TMP24:%.*]] = bitcast i32* [[TMP23]] to <4 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP24:%.*]] = bitcast i32* [[TMP23]] to <4 x i32>*
	; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP18]], <4 x i32>* [[TMP24]], align 4, !alias.scope !29, !noalias !26			; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP18]], <4 x i32>* [[TMP24]], align 4
	; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; UNROLL-NO-IC-NEXT: [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-IC-NEXT: [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-IC-NEXT: br i1 [[TMP25]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP31:![0-9]+]]			; UNROLL-NO-IC-NEXT: br i1 [[TMP25]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP31:![0-9]+]]
	; UNROLL-NO-IC: middle.block:			; UNROLL-NO-IC: middle.block:
	; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]			; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD7]], i32 3			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD7]], i32 3
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD7]], i32 2			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD7]], i32 2
	; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[INDUCTION:%.*]] = add i64 [[INDEX]], 0			; UNROLL-NO-VF-NEXT: [[INDUCTION:%.*]] = add i64 [[INDEX]], 0
	; UNROLL-NO-VF-NEXT: [[INDUCTION7:%.*]] = add i64 [[INDEX]], 1			; UNROLL-NO-VF-NEXT: [[INDUCTION7:%.*]] = add i64 [[INDEX]], 1
	; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = add nuw nsw i64 [[INDUCTION]], 1			; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = add nuw nsw i64 [[INDUCTION]], 1
	; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[INDUCTION7]], 1			; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[INDUCTION7]], 1
	; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP1]]			; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP1]]
	; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP2]]			; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP2]]
	; UNROLL-NO-VF-NEXT: [[TMP5:%.*]] = load i16, i16* [[TMP3]], align 2, !alias.scope !25			; UNROLL-NO-VF-NEXT: [[TMP5:%.*]] = load i16, i16* [[TMP3]], align 2
	; UNROLL-NO-VF-NEXT: [[TMP6]] = load i16, i16* [[TMP4]], align 2, !alias.scope !25			; UNROLL-NO-VF-NEXT: [[TMP6]] = load i16, i16* [[TMP4]], align 2
	; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = sext i16 [[VECTOR_RECUR]] to i32			; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = sext i16 [[VECTOR_RECUR]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = sext i16 [[TMP5]] to i32			; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = sext i16 [[TMP5]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = sext i16 [[TMP5]] to i32			; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = sext i16 [[TMP5]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = sext i16 [[TMP6]] to i32			; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = sext i16 [[TMP6]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = mul nsw i32 [[TMP9]], [[TMP7]]			; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = mul nsw i32 [[TMP9]], [[TMP7]]
	; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = mul nsw i32 [[TMP10]], [[TMP8]]			; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = mul nsw i32 [[TMP10]], [[TMP8]]
	; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION]]			; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION]]
	; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION7]]			; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION7]]
	; UNROLL-NO-VF-NEXT: store i32 [[TMP11]], i32* [[TMP13]], align 4, !alias.scope !28, !noalias !25			; UNROLL-NO-VF-NEXT: store i32 [[TMP11]], i32* [[TMP13]], align 4
	; UNROLL-NO-VF-NEXT: store i32 [[TMP12]], i32* [[TMP14]], align 4, !alias.scope !28, !noalias !25			; UNROLL-NO-VF-NEXT: store i32 [[TMP12]], i32* [[TMP14]], align 4
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_MEMCHECK]] ], [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP6]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_MEMCHECK]] ], [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP6]], [[MIDDLE_BLOCK]] ]
	; SINK-AFTER: vector.body:			; SINK-AFTER: vector.body:
	; SINK-AFTER-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; SINK-AFTER-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; SINK-AFTER-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]			; SINK-AFTER-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]
	; SINK-AFTER-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0			; SINK-AFTER-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0
	; SINK-AFTER-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1			; SINK-AFTER-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; SINK-AFTER-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP2]]			; SINK-AFTER-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP2]]
	; SINK-AFTER-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[TMP3]], i32 0			; SINK-AFTER-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[TMP3]], i32 0
	; SINK-AFTER-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*			; SINK-AFTER-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*
	; SINK-AFTER-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2, !alias.scope !26			; SINK-AFTER-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2
	; SINK-AFTER-NEXT: [[TMP6:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; SINK-AFTER-NEXT: [[TMP6:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; SINK-AFTER-NEXT: [[TMP7:%.*]] = sext <4 x i16> [[TMP6]] to <4 x i32>			; SINK-AFTER-NEXT: [[TMP7:%.*]] = sext <4 x i16> [[TMP6]] to <4 x i32>
	; SINK-AFTER-NEXT: [[TMP8:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>			; SINK-AFTER-NEXT: [[TMP8:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>
	; SINK-AFTER-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP8]], [[TMP7]]			; SINK-AFTER-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP8]], [[TMP7]]
	; SINK-AFTER-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]]			; SINK-AFTER-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]]
	; SINK-AFTER-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP10]], i32 0			; SINK-AFTER-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP10]], i32 0
	; SINK-AFTER-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*			; SINK-AFTER-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*
	; SINK-AFTER-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* [[TMP12]], align 4, !alias.scope !29, !noalias !26			; SINK-AFTER-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* [[TMP12]], align 4
	; SINK-AFTER-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; SINK-AFTER-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; SINK-AFTER-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; SINK-AFTER-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; SINK-AFTER-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP31:![0-9]+]]			; SINK-AFTER-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP31:![0-9]+]]
	; SINK-AFTER: middle.block:			; SINK-AFTER: middle.block:
	; SINK-AFTER-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]			; SINK-AFTER-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3			; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2			; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2
	; SINK-AFTER-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; SINK-AFTER-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	define void @PR34711([2 x i16]* %a, i32* %b, i32* %c, i64 %n) {			define void @PR34711([2 x i16]* %a, i32* %b, i32* %c, i64 %n) {
	; CHECK-LABEL: @PR34711(			; CHECK-LABEL: @PR34711(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[PRE_INDEX:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A:%.*]], i64 0, i64 0			; CHECK-NEXT: [[PRE_INDEX:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A:%.*]], i64 0, i64 0
	; CHECK-NEXT: [[DOTPRE:%.*]] = load i16, i16* [[PRE_INDEX]], align 2			; CHECK-NEXT: [[DOTPRE:%.*]] = load i16, i16* [[PRE_INDEX]], align 2
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; CHECK: vector.memcheck:			; CHECK: vector.memcheck:
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[C:%.*]], i64 [[N]]			; CHECK-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[B:%.*]], i64 [[N]]			; CHECK-NEXT: [[C2:%.*]] = ptrtoint i32* [[C:%.*]] to i64
	; CHECK-NEXT: [[SCEVGEP6:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A]], i64 0, i64 1			; CHECK-NEXT: [[A3:%.*]] = ptrtoint [2 x i16]* [[A]] to i64
	; CHECK-NEXT: [[SCEVGEP8:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A]], i64 [[N]], i64 0			; CHECK-NEXT: [[TMP0:%.*]] = sub i64 [[B1]], [[C2]]
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt i32* [[SCEVGEP4]], [[C]]			; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 16
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt i32* [[SCEVGEP]], [[B]]			; CHECK-NEXT: [[TMP1:%.*]] = add nuw i64 [[A3]], 2
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; CHECK-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], [[C2]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i16* [[SCEVGEP8]] to i32*			; CHECK-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP2]], 16
	; CHECK-NEXT: [[BOUND010:%.*]] = icmp ugt i32* [[TMP0]], [[C]]			; CHECK-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[SCEVGEP]] to i16*			; CHECK-NEXT: [[TMP3:%.*]] = sub i64 [[B1]], [[TMP1]]
	; CHECK-NEXT: [[BOUND111:%.*]] = icmp ult i16* [[SCEVGEP6]], [[TMP1]]			; CHECK-NEXT: [[DIFF_CHECK5:%.*]] = icmp ult i64 [[TMP3]], 16
	; CHECK-NEXT: [[FOUND_CONFLICT12:%.*]] = and i1 [[BOUND010]], [[BOUND111]]			; CHECK-NEXT: [[CONFLICT_RDX6:%.*]] = or i1 [[CONFLICT_RDX]], [[DIFF_CHECK5]]
	; CHECK-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT12]]			; CHECK-NEXT: br i1 [[CONFLICT_RDX6]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16* [[SCEVGEP8]] to i32*
	; CHECK-NEXT: [[BOUND013:%.*]] = icmp ugt i32* [[TMP2]], [[B]]
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[SCEVGEP4]] to i16*
	; CHECK-NEXT: [[BOUND114:%.*]] = icmp ult i16* [[SCEVGEP6]], [[TMP3]]
	; CHECK-NEXT: [[FOUND_CONFLICT15:%.*]] = and i1 [[BOUND013]], [[BOUND114]]
	; CHECK-NEXT: [[CONFLICT_RDX16:%.*]] = or i1 [[CONFLICT_RDX]], [[FOUND_CONFLICT15]]
	; CHECK-NEXT: br i1 [[CONFLICT_RDX16]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -4			; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -4
	; CHECK-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3			; CHECK-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[INDEX]], 1			; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[INDEX]], 2			; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[INDEX]], 2
	; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[INDEX]], 3			; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[INDEX]], 3
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[INDEX]], i64 1			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[INDEX]], i64 1
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP4]], i64 1			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP4]], i64 1
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP5]], i64 1			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP5]], i64 1
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP6]], i64 1			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP6]], i64 1
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP7]] to <4 x i32>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP7]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32>* [[TMP12]], align 4, !alias.scope !33, !noalias !36			; CHECK-NEXT: store <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32>* [[TMP12]], align 4
	; CHECK-NEXT: [[TMP13:%.*]] = load i16, i16* [[TMP8]], align 2, !alias.scope !39			; CHECK-NEXT: [[TMP13:%.*]] = load i16, i16* [[TMP8]], align 2
	; CHECK-NEXT: [[TMP14:%.*]] = load i16, i16* [[TMP9]], align 2, !alias.scope !39			; CHECK-NEXT: [[TMP14:%.*]] = load i16, i16* [[TMP9]], align 2
	; CHECK-NEXT: [[TMP15:%.*]] = load i16, i16* [[TMP10]], align 2, !alias.scope !39			; CHECK-NEXT: [[TMP15:%.*]] = load i16, i16* [[TMP10]], align 2
	; CHECK-NEXT: [[TMP16:%.*]] = load i16, i16* [[TMP11]], align 2, !alias.scope !39			; CHECK-NEXT: [[TMP16:%.*]] = load i16, i16* [[TMP11]], align 2
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x i16> poison, i16 [[TMP13]], i64 0			; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x i16> poison, i16 [[TMP13]], i64 0
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <4 x i16> [[TMP17]], i16 [[TMP14]], i64 1			; CHECK-NEXT: [[TMP18:%.*]] = insertelement <4 x i16> [[TMP17]], i16 [[TMP14]], i64 1
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x i16> [[TMP18]], i16 [[TMP15]], i64 2			; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x i16> [[TMP18]], i16 [[TMP15]], i64 2
	; CHECK-NEXT: [[TMP20]] = insertelement <4 x i16> [[TMP19]], i16 [[TMP16]], i64 3			; CHECK-NEXT: [[TMP20]] = insertelement <4 x i16> [[TMP19]], i16 [[TMP16]], i64 3
	; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[TMP19]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[TMP19]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; CHECK-NEXT: [[TMP22:%.*]] = sext <4 x i16> [[TMP21]] to <4 x i32>			; CHECK-NEXT: [[TMP22:%.*]] = sext <4 x i16> [[TMP21]] to <4 x i32>
	; CHECK-NEXT: [[TMP23:%.*]] = sext <4 x i16> [[TMP20]] to <4 x i32>			; CHECK-NEXT: [[TMP23:%.*]] = sext <4 x i16> [[TMP20]] to <4 x i32>
	; CHECK-NEXT: [[TMP24:%.*]] = mul nsw <4 x i32> [[TMP23]], [[TMP22]]			; CHECK-NEXT: [[TMP24:%.*]] = mul nsw <4 x i32> [[TMP23]], [[TMP22]]
	; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP26:%.*]] = bitcast i32* [[TMP25]] to <4 x i32>*			; CHECK-NEXT: [[TMP26:%.*]] = bitcast i32* [[TMP25]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP24]], <4 x i32>* [[TMP26]], align 4, !alias.scope !40, !noalias !39			; CHECK-NEXT: store <4 x i32> [[TMP24]], <4 x i32>* [[TMP26]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP27:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP27:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_MEMCHECK]] ], [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP16]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_MEMCHECK]] ], [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP16]], [[MIDDLE_BLOCK]] ]
	;			;
	; UNROLL-LABEL: @PR34711(			; UNROLL-LABEL: @PR34711(
	; UNROLL-NEXT: entry:			; UNROLL-NEXT: entry:
	; UNROLL-NEXT: [[PRE_INDEX:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A:%.*]], i64 0, i64 0			; UNROLL-NEXT: [[PRE_INDEX:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A:%.*]], i64 0, i64 0
	; UNROLL-NEXT: [[DOTPRE:%.*]] = load i16, i16* [[PRE_INDEX]], align 2			; UNROLL-NEXT: [[DOTPRE:%.*]] = load i16, i16* [[PRE_INDEX]], align 2
	; UNROLL-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 8			; UNROLL-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 8
	; UNROLL-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; UNROLL-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; UNROLL: vector.memcheck:			; UNROLL: vector.memcheck:
	; UNROLL-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[C:%.*]], i64 [[N]]			; UNROLL-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; UNROLL-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[B:%.*]], i64 [[N]]			; UNROLL-NEXT: [[C2:%.*]] = ptrtoint i32* [[C:%.*]] to i64
	; UNROLL-NEXT: [[SCEVGEP6:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A]], i64 0, i64 1			; UNROLL-NEXT: [[A3:%.*]] = ptrtoint [2 x i16]* [[A]] to i64
	; UNROLL-NEXT: [[SCEVGEP8:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A]], i64 [[N]], i64 0			; UNROLL-NEXT: [[TMP0:%.*]] = sub i64 [[B1]], [[C2]]
	; UNROLL-NEXT: [[BOUND0:%.*]] = icmp ugt i32* [[SCEVGEP4]], [[C]]			; UNROLL-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 32
	; UNROLL-NEXT: [[BOUND1:%.*]] = icmp ugt i32* [[SCEVGEP]], [[B]]			; UNROLL-NEXT: [[TMP1:%.*]] = add nuw i64 [[A3]], 2
	; UNROLL-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; UNROLL-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], [[C2]]
	; UNROLL-NEXT: [[TMP0:%.*]] = bitcast i16* [[SCEVGEP8]] to i32*			; UNROLL-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP2]], 32
	; UNROLL-NEXT: [[BOUND010:%.*]] = icmp ugt i32* [[TMP0]], [[C]]			; UNROLL-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
	; UNROLL-NEXT: [[TMP1:%.*]] = bitcast i32* [[SCEVGEP]] to i16*			; UNROLL-NEXT: [[TMP3:%.*]] = sub i64 [[B1]], [[TMP1]]
	; UNROLL-NEXT: [[BOUND111:%.*]] = icmp ult i16* [[SCEVGEP6]], [[TMP1]]			; UNROLL-NEXT: [[DIFF_CHECK5:%.*]] = icmp ult i64 [[TMP3]], 32
	; UNROLL-NEXT: [[FOUND_CONFLICT12:%.*]] = and i1 [[BOUND010]], [[BOUND111]]			; UNROLL-NEXT: [[CONFLICT_RDX6:%.*]] = or i1 [[CONFLICT_RDX]], [[DIFF_CHECK5]]
	; UNROLL-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT12]]
	; UNROLL-NEXT: [[TMP2:%.*]] = bitcast i16* [[SCEVGEP8]] to i32*
	; UNROLL-NEXT: [[BOUND013:%.*]] = icmp ugt i32* [[TMP2]], [[B]]
	; UNROLL-NEXT: [[TMP3:%.*]] = bitcast i32* [[SCEVGEP4]] to i16*
	; UNROLL-NEXT: [[BOUND114:%.*]] = icmp ult i16* [[SCEVGEP6]], [[TMP3]]
	; UNROLL-NEXT: [[FOUND_CONFLICT15:%.*]] = and i1 [[BOUND013]], [[BOUND114]]
	; UNROLL-NEXT: [[CONFLICT_RDX16:%.*]] = or i1 [[CONFLICT_RDX]], [[FOUND_CONFLICT15]]
	; UNROLL-NEXT: br i1 [[CONFLICT_RDX16]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL: vector.ph:			; UNROLL: vector.ph:
	; UNROLL-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -8			; UNROLL-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -8
	; UNROLL-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3			; UNROLL-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3
	; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[TMP38:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[TMP38:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[TMP4:%.*]] = or i64 [[INDEX]], 1			; UNROLL-NEXT: [[TMP4:%.*]] = or i64 [[INDEX]], 1
	; UNROLL-NEXT: [[TMP5:%.*]] = or i64 [[INDEX]], 2			; UNROLL-NEXT: [[TMP5:%.*]] = or i64 [[INDEX]], 2
	; UNROLL-NEXT: [[TMP6:%.*]] = or i64 [[INDEX]], 3			; UNROLL-NEXT: [[TMP6:%.*]] = or i64 [[INDEX]], 3
	; UNROLL-NEXT: [[TMP7:%.*]] = or i64 [[INDEX]], 4			; UNROLL-NEXT: [[TMP7:%.*]] = or i64 [[INDEX]], 4
	; UNROLL-NEXT: [[TMP8:%.*]] = or i64 [[INDEX]], 5			; UNROLL-NEXT: [[TMP8:%.*]] = or i64 [[INDEX]], 5
	; UNROLL-NEXT: [[TMP9:%.*]] = or i64 [[INDEX]], 6			; UNROLL-NEXT: [[TMP9:%.*]] = or i64 [[INDEX]], 6
	; UNROLL-NEXT: [[TMP10:%.*]] = or i64 [[INDEX]], 7			; UNROLL-NEXT: [[TMP10:%.*]] = or i64 [[INDEX]], 7
	; UNROLL-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[INDEX]]			; UNROLL-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[INDEX]]
	; UNROLL-NEXT: [[TMP12:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[INDEX]], i64 1			; UNROLL-NEXT: [[TMP12:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[INDEX]], i64 1
	; UNROLL-NEXT: [[TMP13:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP4]], i64 1			; UNROLL-NEXT: [[TMP13:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP4]], i64 1
	; UNROLL-NEXT: [[TMP14:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP5]], i64 1			; UNROLL-NEXT: [[TMP14:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP5]], i64 1
	; UNROLL-NEXT: [[TMP15:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP6]], i64 1			; UNROLL-NEXT: [[TMP15:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP6]], i64 1
	; UNROLL-NEXT: [[TMP16:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP7]], i64 1			; UNROLL-NEXT: [[TMP16:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP7]], i64 1
	; UNROLL-NEXT: [[TMP17:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP8]], i64 1			; UNROLL-NEXT: [[TMP17:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP8]], i64 1
	; UNROLL-NEXT: [[TMP18:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP9]], i64 1			; UNROLL-NEXT: [[TMP18:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP9]], i64 1
	; UNROLL-NEXT: [[TMP19:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP10]], i64 1			; UNROLL-NEXT: [[TMP19:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP10]], i64 1
	; UNROLL-NEXT: [[TMP20:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*			; UNROLL-NEXT: [[TMP20:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*
	; UNROLL-NEXT: store <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32>* [[TMP20]], align 4, !alias.scope !33, !noalias !36			; UNROLL-NEXT: store <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32>* [[TMP20]], align 4
	; UNROLL-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i64 4			; UNROLL-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i64 4
	; UNROLL-NEXT: [[TMP22:%.*]] = bitcast i32* [[TMP21]] to <4 x i32>*			; UNROLL-NEXT: [[TMP22:%.*]] = bitcast i32* [[TMP21]] to <4 x i32>*
	; UNROLL-NEXT: store <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32>* [[TMP22]], align 4, !alias.scope !33, !noalias !36			; UNROLL-NEXT: store <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32>* [[TMP22]], align 4
	; UNROLL-NEXT: [[TMP23:%.*]] = load i16, i16* [[TMP12]], align 2, !alias.scope !39			; UNROLL-NEXT: [[TMP23:%.*]] = load i16, i16* [[TMP12]], align 2
	; UNROLL-NEXT: [[TMP24:%.*]] = load i16, i16* [[TMP13]], align 2, !alias.scope !39			; UNROLL-NEXT: [[TMP24:%.*]] = load i16, i16* [[TMP13]], align 2
	; UNROLL-NEXT: [[TMP25:%.*]] = load i16, i16* [[TMP14]], align 2, !alias.scope !39			; UNROLL-NEXT: [[TMP25:%.*]] = load i16, i16* [[TMP14]], align 2
	; UNROLL-NEXT: [[TMP26:%.*]] = load i16, i16* [[TMP15]], align 2, !alias.scope !39			; UNROLL-NEXT: [[TMP26:%.*]] = load i16, i16* [[TMP15]], align 2
	; UNROLL-NEXT: [[TMP27:%.*]] = insertelement <4 x i16> poison, i16 [[TMP23]], i64 0			; UNROLL-NEXT: [[TMP27:%.*]] = insertelement <4 x i16> poison, i16 [[TMP23]], i64 0
	; UNROLL-NEXT: [[TMP28:%.*]] = insertelement <4 x i16> [[TMP27]], i16 [[TMP24]], i64 1			; UNROLL-NEXT: [[TMP28:%.*]] = insertelement <4 x i16> [[TMP27]], i16 [[TMP24]], i64 1
	; UNROLL-NEXT: [[TMP29:%.*]] = insertelement <4 x i16> [[TMP28]], i16 [[TMP25]], i64 2			; UNROLL-NEXT: [[TMP29:%.*]] = insertelement <4 x i16> [[TMP28]], i16 [[TMP25]], i64 2
	; UNROLL-NEXT: [[TMP30:%.*]] = insertelement <4 x i16> [[TMP29]], i16 [[TMP26]], i64 3			; UNROLL-NEXT: [[TMP30:%.*]] = insertelement <4 x i16> [[TMP29]], i16 [[TMP26]], i64 3
	; UNROLL-NEXT: [[TMP31:%.*]] = load i16, i16* [[TMP16]], align 2, !alias.scope !39			; UNROLL-NEXT: [[TMP31:%.*]] = load i16, i16* [[TMP16]], align 2
	; UNROLL-NEXT: [[TMP32:%.*]] = load i16, i16* [[TMP17]], align 2, !alias.scope !39			; UNROLL-NEXT: [[TMP32:%.*]] = load i16, i16* [[TMP17]], align 2
	; UNROLL-NEXT: [[TMP33:%.*]] = load i16, i16* [[TMP18]], align 2, !alias.scope !39			; UNROLL-NEXT: [[TMP33:%.*]] = load i16, i16* [[TMP18]], align 2
	; UNROLL-NEXT: [[TMP34:%.*]] = load i16, i16* [[TMP19]], align 2, !alias.scope !39			; UNROLL-NEXT: [[TMP34:%.*]] = load i16, i16* [[TMP19]], align 2
	; UNROLL-NEXT: [[TMP35:%.*]] = insertelement <4 x i16> poison, i16 [[TMP31]], i64 0			; UNROLL-NEXT: [[TMP35:%.*]] = insertelement <4 x i16> poison, i16 [[TMP31]], i64 0
	; UNROLL-NEXT: [[TMP36:%.*]] = insertelement <4 x i16> [[TMP35]], i16 [[TMP32]], i64 1			; UNROLL-NEXT: [[TMP36:%.*]] = insertelement <4 x i16> [[TMP35]], i16 [[TMP32]], i64 1
	; UNROLL-NEXT: [[TMP37:%.*]] = insertelement <4 x i16> [[TMP36]], i16 [[TMP33]], i64 2			; UNROLL-NEXT: [[TMP37:%.*]] = insertelement <4 x i16> [[TMP36]], i16 [[TMP33]], i64 2
	; UNROLL-NEXT: [[TMP38]] = insertelement <4 x i16> [[TMP37]], i16 [[TMP34]], i64 3			; UNROLL-NEXT: [[TMP38]] = insertelement <4 x i16> [[TMP37]], i16 [[TMP34]], i64 3
	; UNROLL-NEXT: [[TMP39:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[TMP29]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NEXT: [[TMP39:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[TMP29]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NEXT: [[TMP40:%.*]] = shufflevector <4 x i16> [[TMP30]], <4 x i16> [[TMP37]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NEXT: [[TMP40:%.*]] = shufflevector <4 x i16> [[TMP30]], <4 x i16> [[TMP37]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NEXT: [[TMP41:%.*]] = sext <4 x i16> [[TMP39]] to <4 x i32>			; UNROLL-NEXT: [[TMP41:%.*]] = sext <4 x i16> [[TMP39]] to <4 x i32>
	; UNROLL-NEXT: [[TMP42:%.*]] = sext <4 x i16> [[TMP40]] to <4 x i32>			; UNROLL-NEXT: [[TMP42:%.*]] = sext <4 x i16> [[TMP40]] to <4 x i32>
	; UNROLL-NEXT: [[TMP43:%.*]] = sext <4 x i16> [[TMP30]] to <4 x i32>			; UNROLL-NEXT: [[TMP43:%.*]] = sext <4 x i16> [[TMP30]] to <4 x i32>
	; UNROLL-NEXT: [[TMP44:%.*]] = sext <4 x i16> [[TMP38]] to <4 x i32>			; UNROLL-NEXT: [[TMP44:%.*]] = sext <4 x i16> [[TMP38]] to <4 x i32>
	; UNROLL-NEXT: [[TMP45:%.*]] = mul nsw <4 x i32> [[TMP43]], [[TMP41]]			; UNROLL-NEXT: [[TMP45:%.*]] = mul nsw <4 x i32> [[TMP43]], [[TMP41]]
	; UNROLL-NEXT: [[TMP46:%.*]] = mul nsw <4 x i32> [[TMP44]], [[TMP42]]			; UNROLL-NEXT: [[TMP46:%.*]] = mul nsw <4 x i32> [[TMP44]], [[TMP42]]
	; UNROLL-NEXT: [[TMP47:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]			; UNROLL-NEXT: [[TMP47:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]
	; UNROLL-NEXT: [[TMP48:%.*]] = bitcast i32* [[TMP47]] to <4 x i32>*			; UNROLL-NEXT: [[TMP48:%.*]] = bitcast i32* [[TMP47]] to <4 x i32>*
	; UNROLL-NEXT: store <4 x i32> [[TMP45]], <4 x i32>* [[TMP48]], align 4, !alias.scope !40, !noalias !39			; UNROLL-NEXT: store <4 x i32> [[TMP45]], <4 x i32>* [[TMP48]], align 4
	; UNROLL-NEXT: [[TMP49:%.*]] = getelementptr inbounds i32, i32* [[TMP47]], i64 4			; UNROLL-NEXT: [[TMP49:%.*]] = getelementptr inbounds i32, i32* [[TMP47]], i64 4
	; UNROLL-NEXT: [[TMP50:%.*]] = bitcast i32* [[TMP49]] to <4 x i32>*			; UNROLL-NEXT: [[TMP50:%.*]] = bitcast i32* [[TMP49]] to <4 x i32>*
	; UNROLL-NEXT: store <4 x i32> [[TMP46]], <4 x i32>* [[TMP50]], align 4, !alias.scope !40, !noalias !39			; UNROLL-NEXT: store <4 x i32> [[TMP46]], <4 x i32>* [[TMP50]], align 4
	; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; UNROLL-NEXT: [[TMP51:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NEXT: [[TMP51:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NEXT: br i1 [[TMP51]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]			; UNROLL-NEXT: br i1 [[TMP51]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]
	; UNROLL: middle.block:			; UNROLL: middle.block:
	; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]			; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
	; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL: scalar.ph:			; UNROLL: scalar.ph:
	; UNROLL-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_MEMCHECK]] ], [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP34]], [[MIDDLE_BLOCK]] ]			; UNROLL-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_MEMCHECK]] ], [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP34]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; UNROLL-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]			; UNROLL-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
	; UNROLL-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP42:![0-9]+]]			; UNROLL-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP42:![0-9]+]]
	; UNROLL: for.end:			; UNROLL: for.end:
	; UNROLL-NEXT: ret void			; UNROLL-NEXT: ret void
	;			;
	; UNROLL-NO-IC-LABEL: @PR34711(			; UNROLL-NO-IC-LABEL: @PR34711(
	; UNROLL-NO-IC-NEXT: entry:			; UNROLL-NO-IC-NEXT: entry:
	; UNROLL-NO-IC-NEXT: [[C1:%.*]] = bitcast i32* [[C:%.*]] to i8*			; UNROLL-NO-IC-NEXT: [[A3:%.*]] = ptrtoint [2 x i16]* [[A:%.*]] to i64
	; UNROLL-NO-IC-NEXT: [[B3:%.*]] = bitcast i32* [[B:%.*]] to i8*			; UNROLL-NO-IC-NEXT: [[C2:%.*]] = ptrtoint i32* [[C:%.*]] to i64
				; UNROLL-NO-IC-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; UNROLL-NO-IC-NEXT: [[PRE_INDEX:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A:%.*]], i64 0, i64 0			; UNROLL-NO-IC-NEXT: [[PRE_INDEX:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A:%.*]], i64 0, i64 0
	; UNROLL-NO-IC-NEXT: [[DOTPRE:%.*]] = load i16, i16* [[PRE_INDEX]], align 2			; UNROLL-NO-IC-NEXT: [[DOTPRE:%.*]] = load i16, i16* [[PRE_INDEX]], align 2
	; UNROLL-NO-IC-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 8			; UNROLL-NO-IC-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 8
	; UNROLL-NO-IC-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; UNROLL-NO-IC-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; UNROLL-NO-IC: vector.memcheck:			; UNROLL-NO-IC: vector.memcheck:
	; UNROLL-NO-IC-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[C]], i64 [[N]]			; UNROLL-NO-IC-NEXT: [[TMP0:%.*]] = sub i64 [[B1]], [[C2]]
	; UNROLL-NO-IC-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*			; UNROLL-NO-IC-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 32
	; UNROLL-NO-IC-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[B]], i64 [[N]]			; UNROLL-NO-IC-NEXT: [[TMP1:%.*]] = add nuw i64 [[A3]], 2
	; UNROLL-NO-IC-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*			; UNROLL-NO-IC-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], [[C2]]
	; UNROLL-NO-IC-NEXT: [[SCEVGEP6:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A]], i64 0, i64 1			; UNROLL-NO-IC-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP2]], 32
	; UNROLL-NO-IC-NEXT: [[SCEVGEP67:%.*]] = bitcast i16* [[SCEVGEP6]] to i8*			; UNROLL-NO-IC-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
	; UNROLL-NO-IC-NEXT: [[SCEVGEP8:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A]], i64 [[N]], i64 0			; UNROLL-NO-IC-NEXT: [[TMP3:%.*]] = sub i64 [[B1]], [[TMP1]]
	; UNROLL-NO-IC-NEXT: [[SCEVGEP89:%.*]] = bitcast i16* [[SCEVGEP8]] to i8*			; UNROLL-NO-IC-NEXT: [[DIFF_CHECK5:%.*]] = icmp ult i64 [[TMP3]], 32
	; UNROLL-NO-IC-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[C1]], [[SCEVGEP45]]			; UNROLL-NO-IC-NEXT: [[CONFLICT_RDX6:%.*]] = or i1 [[CONFLICT_RDX]], [[DIFF_CHECK5]]
	; UNROLL-NO-IC-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[B3]], [[SCEVGEP2]]			; UNROLL-NO-IC-NEXT: br i1 [[CONFLICT_RDX6]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL-NO-IC-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; UNROLL-NO-IC-NEXT: [[BOUND010:%.*]] = icmp ult i8* [[C1]], [[SCEVGEP89]]
	; UNROLL-NO-IC-NEXT: [[BOUND111:%.*]] = icmp ult i8* [[SCEVGEP67]], [[SCEVGEP2]]
	; UNROLL-NO-IC-NEXT: [[FOUND_CONFLICT12:%.*]] = and i1 [[BOUND010]], [[BOUND111]]
	; UNROLL-NO-IC-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT12]]
	; UNROLL-NO-IC-NEXT: [[BOUND013:%.*]] = icmp ult i8* [[B3]], [[SCEVGEP89]]
	; UNROLL-NO-IC-NEXT: [[BOUND114:%.*]] = icmp ult i8* [[SCEVGEP67]], [[SCEVGEP45]]
	; UNROLL-NO-IC-NEXT: [[FOUND_CONFLICT15:%.*]] = and i1 [[BOUND013]], [[BOUND114]]
	; UNROLL-NO-IC-NEXT: [[CONFLICT_RDX16:%.*]] = or i1 [[CONFLICT_RDX]], [[FOUND_CONFLICT15]]
	; UNROLL-NO-IC-NEXT: br i1 [[CONFLICT_RDX16]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL-NO-IC: vector.ph:			; UNROLL-NO-IC: vector.ph:
	; UNROLL-NO-IC-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 8			; UNROLL-NO-IC-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 8
	; UNROLL-NO-IC-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]			; UNROLL-NO-IC-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i32 3			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i32 3
	; UNROLL-NO-IC-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NO-IC-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL-NO-IC: vector.body:			; UNROLL-NO-IC: vector.body:
	; UNROLL-NO-IC-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-IC-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[TMP37:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[TMP37:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-IC-NEXT: [[TMP12:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP2]], i64 1			; UNROLL-NO-IC-NEXT: [[TMP12:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP2]], i64 1
	; UNROLL-NO-IC-NEXT: [[TMP13:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP3]], i64 1			; UNROLL-NO-IC-NEXT: [[TMP13:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP3]], i64 1
	; UNROLL-NO-IC-NEXT: [[TMP14:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP4]], i64 1			; UNROLL-NO-IC-NEXT: [[TMP14:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP4]], i64 1
	; UNROLL-NO-IC-NEXT: [[TMP15:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP5]], i64 1			; UNROLL-NO-IC-NEXT: [[TMP15:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP5]], i64 1
	; UNROLL-NO-IC-NEXT: [[TMP16:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP6]], i64 1			; UNROLL-NO-IC-NEXT: [[TMP16:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP6]], i64 1
	; UNROLL-NO-IC-NEXT: [[TMP17:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP7]], i64 1			; UNROLL-NO-IC-NEXT: [[TMP17:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP7]], i64 1
	; UNROLL-NO-IC-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <4 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <4 x i32>*
	; UNROLL-NO-IC-NEXT: store <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32>* [[TMP19]], align 4, !alias.scope !33, !noalias !36			; UNROLL-NO-IC-NEXT: store <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32>* [[TMP19]], align 4
	; UNROLL-NO-IC-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i32 4			; UNROLL-NO-IC-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i32 4
	; UNROLL-NO-IC-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <4 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <4 x i32>*
	; UNROLL-NO-IC-NEXT: store <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32>* [[TMP21]], align 4, !alias.scope !33, !noalias !36			; UNROLL-NO-IC-NEXT: store <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32>* [[TMP21]], align 4
	; UNROLL-NO-IC-NEXT: [[TMP22:%.*]] = load i16, i16* [[TMP10]], align 2, !alias.scope !39			; UNROLL-NO-IC-NEXT: [[TMP22:%.*]] = load i16, i16* [[TMP10]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP23:%.*]] = load i16, i16* [[TMP11]], align 2, !alias.scope !39			; UNROLL-NO-IC-NEXT: [[TMP23:%.*]] = load i16, i16* [[TMP11]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP24:%.*]] = load i16, i16* [[TMP12]], align 2, !alias.scope !39			; UNROLL-NO-IC-NEXT: [[TMP24:%.*]] = load i16, i16* [[TMP12]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP25:%.*]] = load i16, i16* [[TMP13]], align 2, !alias.scope !39			; UNROLL-NO-IC-NEXT: [[TMP25:%.*]] = load i16, i16* [[TMP13]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP26:%.*]] = insertelement <4 x i16> poison, i16 [[TMP22]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP26:%.*]] = insertelement <4 x i16> poison, i16 [[TMP22]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP27:%.*]] = insertelement <4 x i16> [[TMP26]], i16 [[TMP23]], i32 1			; UNROLL-NO-IC-NEXT: [[TMP27:%.*]] = insertelement <4 x i16> [[TMP26]], i16 [[TMP23]], i32 1
	; UNROLL-NO-IC-NEXT: [[TMP28:%.*]] = insertelement <4 x i16> [[TMP27]], i16 [[TMP24]], i32 2			; UNROLL-NO-IC-NEXT: [[TMP28:%.*]] = insertelement <4 x i16> [[TMP27]], i16 [[TMP24]], i32 2
	; UNROLL-NO-IC-NEXT: [[TMP29:%.*]] = insertelement <4 x i16> [[TMP28]], i16 [[TMP25]], i32 3			; UNROLL-NO-IC-NEXT: [[TMP29:%.*]] = insertelement <4 x i16> [[TMP28]], i16 [[TMP25]], i32 3
	; UNROLL-NO-IC-NEXT: [[TMP30:%.*]] = load i16, i16* [[TMP14]], align 2, !alias.scope !39			; UNROLL-NO-IC-NEXT: [[TMP30:%.*]] = load i16, i16* [[TMP14]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP31:%.*]] = load i16, i16* [[TMP15]], align 2, !alias.scope !39			; UNROLL-NO-IC-NEXT: [[TMP31:%.*]] = load i16, i16* [[TMP15]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP32:%.*]] = load i16, i16* [[TMP16]], align 2, !alias.scope !39			; UNROLL-NO-IC-NEXT: [[TMP32:%.*]] = load i16, i16* [[TMP16]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP33:%.*]] = load i16, i16* [[TMP17]], align 2, !alias.scope !39			; UNROLL-NO-IC-NEXT: [[TMP33:%.*]] = load i16, i16* [[TMP17]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP34:%.*]] = insertelement <4 x i16> poison, i16 [[TMP30]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP34:%.*]] = insertelement <4 x i16> poison, i16 [[TMP30]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP35:%.*]] = insertelement <4 x i16> [[TMP34]], i16 [[TMP31]], i32 1			; UNROLL-NO-IC-NEXT: [[TMP35:%.*]] = insertelement <4 x i16> [[TMP34]], i16 [[TMP31]], i32 1
	; UNROLL-NO-IC-NEXT: [[TMP36:%.*]] = insertelement <4 x i16> [[TMP35]], i16 [[TMP32]], i32 2			; UNROLL-NO-IC-NEXT: [[TMP36:%.*]] = insertelement <4 x i16> [[TMP35]], i16 [[TMP32]], i32 2
	; UNROLL-NO-IC-NEXT: [[TMP37]] = insertelement <4 x i16> [[TMP36]], i16 [[TMP33]], i32 3			; UNROLL-NO-IC-NEXT: [[TMP37]] = insertelement <4 x i16> [[TMP36]], i16 [[TMP33]], i32 3
	; UNROLL-NO-IC-NEXT: [[TMP38:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[TMP29]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NO-IC-NEXT: [[TMP38:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[TMP29]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NO-IC-NEXT: [[TMP39:%.*]] = shufflevector <4 x i16> [[TMP29]], <4 x i16> [[TMP37]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NO-IC-NEXT: [[TMP39:%.*]] = shufflevector <4 x i16> [[TMP29]], <4 x i16> [[TMP37]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NO-IC-NEXT: [[TMP40:%.*]] = sext <4 x i16> [[TMP38]] to <4 x i32>			; UNROLL-NO-IC-NEXT: [[TMP40:%.*]] = sext <4 x i16> [[TMP38]] to <4 x i32>
	; UNROLL-NO-IC-NEXT: [[TMP41:%.*]] = sext <4 x i16> [[TMP39]] to <4 x i32>			; UNROLL-NO-IC-NEXT: [[TMP41:%.*]] = sext <4 x i16> [[TMP39]] to <4 x i32>
	; UNROLL-NO-IC-NEXT: [[TMP42:%.*]] = sext <4 x i16> [[TMP29]] to <4 x i32>			; UNROLL-NO-IC-NEXT: [[TMP42:%.*]] = sext <4 x i16> [[TMP29]] to <4 x i32>
	; UNROLL-NO-IC-NEXT: [[TMP43:%.*]] = sext <4 x i16> [[TMP37]] to <4 x i32>			; UNROLL-NO-IC-NEXT: [[TMP43:%.*]] = sext <4 x i16> [[TMP37]] to <4 x i32>
	; UNROLL-NO-IC-NEXT: [[TMP44:%.*]] = mul nsw <4 x i32> [[TMP42]], [[TMP40]]			; UNROLL-NO-IC-NEXT: [[TMP44:%.*]] = mul nsw <4 x i32> [[TMP42]], [[TMP40]]
	; UNROLL-NO-IC-NEXT: [[TMP45:%.*]] = mul nsw <4 x i32> [[TMP43]], [[TMP41]]			; UNROLL-NO-IC-NEXT: [[TMP45:%.*]] = mul nsw <4 x i32> [[TMP43]], [[TMP41]]
	; UNROLL-NO-IC-NEXT: [[TMP46:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP0]]			; UNROLL-NO-IC-NEXT: [[TMP46:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP0]]
	; UNROLL-NO-IC-NEXT: [[TMP47:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP4]]			; UNROLL-NO-IC-NEXT: [[TMP47:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP4]]
	; UNROLL-NO-IC-NEXT: [[TMP48:%.*]] = getelementptr inbounds i32, i32* [[TMP46]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP48:%.*]] = getelementptr inbounds i32, i32* [[TMP46]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP49:%.*]] = bitcast i32* [[TMP48]] to <4 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP49:%.*]] = bitcast i32* [[TMP48]] to <4 x i32>*
	; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP44]], <4 x i32>* [[TMP49]], align 4, !alias.scope !40, !noalias !39			; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP44]], <4 x i32>* [[TMP49]], align 4
	; UNROLL-NO-IC-NEXT: [[TMP50:%.*]] = getelementptr inbounds i32, i32* [[TMP46]], i32 4			; UNROLL-NO-IC-NEXT: [[TMP50:%.*]] = getelementptr inbounds i32, i32* [[TMP46]], i32 4
	; UNROLL-NO-IC-NEXT: [[TMP51:%.*]] = bitcast i32* [[TMP50]] to <4 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP51:%.*]] = bitcast i32* [[TMP50]] to <4 x i32>*
	; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP45]], <4 x i32>* [[TMP51]], align 4, !alias.scope !40, !noalias !39			; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP45]], <4 x i32>* [[TMP51]], align 4
	; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; UNROLL-NO-IC-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-IC-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-IC-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]			; UNROLL-NO-IC-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]
	; UNROLL-NO-IC: middle.block:			; UNROLL-NO-IC: middle.block:
	; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]			; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[TMP37]], i32 3			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[TMP37]], i32 3
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[TMP37]], i32 2			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[TMP37]], i32 2
	; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-IC-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NO-IC-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; UNROLL-NO-IC-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]			; UNROLL-NO-IC-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
	; UNROLL-NO-IC-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP42:![0-9]+]]			; UNROLL-NO-IC-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP42:![0-9]+]]
	; UNROLL-NO-IC: for.end:			; UNROLL-NO-IC: for.end:
	; UNROLL-NO-IC-NEXT: ret void			; UNROLL-NO-IC-NEXT: ret void
	;			;
	; UNROLL-NO-VF-LABEL: @PR34711(			; UNROLL-NO-VF-LABEL: @PR34711(
	; UNROLL-NO-VF-NEXT: entry:			; UNROLL-NO-VF-NEXT: entry:
	; UNROLL-NO-VF-NEXT: [[C1:%.*]] = bitcast i32* [[C:%.*]] to i8*			; UNROLL-NO-VF-NEXT: [[A3:%.*]] = ptrtoint [2 x i16]* [[A:%.*]] to i64
	; UNROLL-NO-VF-NEXT: [[B3:%.*]] = bitcast i32* [[B:%.*]] to i8*			; UNROLL-NO-VF-NEXT: [[C2:%.*]] = ptrtoint i32* [[C:%.*]] to i64
				; UNROLL-NO-VF-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; UNROLL-NO-VF-NEXT: [[PRE_INDEX:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A:%.*]], i64 0, i64 0			; UNROLL-NO-VF-NEXT: [[PRE_INDEX:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A:%.*]], i64 0, i64 0
	; UNROLL-NO-VF-NEXT: [[DOTPRE:%.*]] = load i16, i16* [[PRE_INDEX]], align 2			; UNROLL-NO-VF-NEXT: [[DOTPRE:%.*]] = load i16, i16* [[PRE_INDEX]], align 2
	; UNROLL-NO-VF-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 2			; UNROLL-NO-VF-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 2
	; UNROLL-NO-VF-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; UNROLL-NO-VF-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; UNROLL-NO-VF: vector.memcheck:			; UNROLL-NO-VF: vector.memcheck:
	; UNROLL-NO-VF-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[C]], i64 [[N]]			; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = sub i64 [[B1]], [[C2]]
	; UNROLL-NO-VF-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*			; UNROLL-NO-VF-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 8
	; UNROLL-NO-VF-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[B]], i64 [[N]]			; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = add nuw i64 [[A3]], 2
	; UNROLL-NO-VF-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*			; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], [[C2]]
	; UNROLL-NO-VF-NEXT: [[SCEVGEP6:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A]], i64 0, i64 1			; UNROLL-NO-VF-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP2]], 8
	; UNROLL-NO-VF-NEXT: [[SCEVGEP67:%.*]] = bitcast i16* [[SCEVGEP6]] to i8*			; UNROLL-NO-VF-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
	; UNROLL-NO-VF-NEXT: [[SCEVGEP8:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A]], i64 [[N]], i64 0			; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = sub i64 [[B1]], [[TMP1]]
	; UNROLL-NO-VF-NEXT: [[SCEVGEP89:%.*]] = bitcast i16* [[SCEVGEP8]] to i8*			; UNROLL-NO-VF-NEXT: [[DIFF_CHECK5:%.*]] = icmp ult i64 [[TMP3]], 8
	; UNROLL-NO-VF-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[C1]], [[SCEVGEP45]]			; UNROLL-NO-VF-NEXT: [[CONFLICT_RDX6:%.*]] = or i1 [[CONFLICT_RDX]], [[DIFF_CHECK5]]
	; UNROLL-NO-VF-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[B3]], [[SCEVGEP2]]			; UNROLL-NO-VF-NEXT: br i1 [[CONFLICT_RDX6]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL-NO-VF-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; UNROLL-NO-VF-NEXT: [[BOUND010:%.*]] = icmp ult i8* [[C1]], [[SCEVGEP89]]
	; UNROLL-NO-VF-NEXT: [[BOUND111:%.*]] = icmp ult i8* [[SCEVGEP67]], [[SCEVGEP2]]
	; UNROLL-NO-VF-NEXT: [[FOUND_CONFLICT12:%.*]] = and i1 [[BOUND010]], [[BOUND111]]
	; UNROLL-NO-VF-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT12]]
	; UNROLL-NO-VF-NEXT: [[BOUND013:%.*]] = icmp ult i8* [[B3]], [[SCEVGEP89]]
	; UNROLL-NO-VF-NEXT: [[BOUND114:%.*]] = icmp ult i8* [[SCEVGEP67]], [[SCEVGEP45]]
	; UNROLL-NO-VF-NEXT: [[FOUND_CONFLICT15:%.*]] = and i1 [[BOUND013]], [[BOUND114]]
	; UNROLL-NO-VF-NEXT: [[CONFLICT_RDX16:%.*]] = or i1 [[CONFLICT_RDX]], [[FOUND_CONFLICT15]]
	; UNROLL-NO-VF-NEXT: br i1 [[CONFLICT_RDX16]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL-NO-VF: vector.ph:			; UNROLL-NO-VF: vector.ph:
	; UNROLL-NO-VF-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 2			; UNROLL-NO-VF-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 2
	; UNROLL-NO-VF-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]			; UNROLL-NO-VF-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
	; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL-NO-VF: vector.body:			; UNROLL-NO-VF: vector.body:
	; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[INDUCTION:%.*]] = add i64 [[INDEX]], 0			; UNROLL-NO-VF-NEXT: [[INDUCTION:%.*]] = add i64 [[INDEX]], 0
	; UNROLL-NO-VF-NEXT: [[INDUCTION17:%.*]] = add i64 [[INDEX]], 1			; UNROLL-NO-VF-NEXT: [[INDUCTION17:%.*]] = add i64 [[INDEX]], 1
	; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[INDUCTION]]			; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[INDUCTION]]
	; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[INDUCTION17]]			; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[INDUCTION17]]
	; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[INDUCTION]], i64 1			; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[INDUCTION]], i64 1
	; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[INDUCTION17]], i64 1			; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[INDUCTION17]], i64 1
	; UNROLL-NO-VF-NEXT: store i32 7, i32* [[TMP0]], align 4, !alias.scope !32, !noalias !35			; UNROLL-NO-VF-NEXT: store i32 7, i32* [[TMP0]], align 4
	; UNROLL-NO-VF-NEXT: store i32 7, i32* [[TMP1]], align 4, !alias.scope !32, !noalias !35			; UNROLL-NO-VF-NEXT: store i32 7, i32* [[TMP1]], align 4
	; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = load i16, i16* [[TMP2]], align 2, !alias.scope !38			; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = load i16, i16* [[TMP2]], align 2
	; UNROLL-NO-VF-NEXT: [[TMP5]] = load i16, i16* [[TMP3]], align 2, !alias.scope !38			; UNROLL-NO-VF-NEXT: [[TMP5]] = load i16, i16* [[TMP3]], align 2
	; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = sext i16 [[VECTOR_RECUR]] to i32			; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = sext i16 [[VECTOR_RECUR]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = sext i16 [[TMP4]] to i32			; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = sext i16 [[TMP4]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = sext i16 [[TMP4]] to i32			; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = sext i16 [[TMP4]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = sext i16 [[TMP5]] to i32			; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = sext i16 [[TMP5]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = mul nsw i32 [[TMP8]], [[TMP6]]			; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = mul nsw i32 [[TMP8]], [[TMP6]]
	; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = mul nsw i32 [[TMP9]], [[TMP7]]			; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = mul nsw i32 [[TMP9]], [[TMP7]]
	; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION]]			; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION]]
	; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION17]]			; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION17]]
	; UNROLL-NO-VF-NEXT: store i32 [[TMP10]], i32* [[TMP12]], align 4, !alias.scope !39, !noalias !38			; UNROLL-NO-VF-NEXT: store i32 [[TMP10]], i32* [[TMP12]], align 4
	; UNROLL-NO-VF-NEXT: store i32 [[TMP11]], i32* [[TMP13]], align 4, !alias.scope !39, !noalias !38			; UNROLL-NO-VF-NEXT: store i32 [[TMP11]], i32* [[TMP13]], align 4
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP40:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP40:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_MEMCHECK]] ], [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP5]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_MEMCHECK]] ], [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP5]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NO-VF-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]			; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
	; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]
	; UNROLL-NO-VF: for.end:			; UNROLL-NO-VF: for.end:
	; UNROLL-NO-VF-NEXT: ret void			; UNROLL-NO-VF-NEXT: ret void
	;			;
	; SINK-AFTER-LABEL: @PR34711(			; SINK-AFTER-LABEL: @PR34711(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: [[C1:%.*]] = bitcast i32* [[C:%.*]] to i8*			; SINK-AFTER-NEXT: [[A3:%.*]] = ptrtoint [2 x i16]* [[A:%.*]] to i64
	; SINK-AFTER-NEXT: [[B3:%.*]] = bitcast i32* [[B:%.*]] to i8*			; SINK-AFTER-NEXT: [[C2:%.*]] = ptrtoint i32* [[C:%.*]] to i64
				; SINK-AFTER-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; SINK-AFTER-NEXT: [[PRE_INDEX:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A:%.*]], i64 0, i64 0			; SINK-AFTER-NEXT: [[PRE_INDEX:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A:%.*]], i64 0, i64 0
	; SINK-AFTER-NEXT: [[DOTPRE:%.*]] = load i16, i16* [[PRE_INDEX]], align 2			; SINK-AFTER-NEXT: [[DOTPRE:%.*]] = load i16, i16* [[PRE_INDEX]], align 2
	; SINK-AFTER-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4			; SINK-AFTER-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
	; SINK-AFTER-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; SINK-AFTER-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; SINK-AFTER: vector.memcheck:			; SINK-AFTER: vector.memcheck:
	; SINK-AFTER-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[C]], i64 [[N]]			; SINK-AFTER-NEXT: [[TMP0:%.*]] = sub i64 [[B1]], [[C2]]
	; SINK-AFTER-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*			; SINK-AFTER-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 16
	; SINK-AFTER-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[B]], i64 [[N]]			; SINK-AFTER-NEXT: [[TMP1:%.*]] = add nuw i64 [[A3]], 2
	; SINK-AFTER-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*			; SINK-AFTER-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], [[C2]]
	; SINK-AFTER-NEXT: [[SCEVGEP6:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A]], i64 0, i64 1			; SINK-AFTER-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP2]], 16
	; SINK-AFTER-NEXT: [[SCEVGEP67:%.*]] = bitcast i16* [[SCEVGEP6]] to i8*			; SINK-AFTER-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
	; SINK-AFTER-NEXT: [[SCEVGEP8:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A]], i64 [[N]], i64 0			; SINK-AFTER-NEXT: [[TMP3:%.*]] = sub i64 [[B1]], [[TMP1]]
	; SINK-AFTER-NEXT: [[SCEVGEP89:%.*]] = bitcast i16* [[SCEVGEP8]] to i8*			; SINK-AFTER-NEXT: [[DIFF_CHECK5:%.*]] = icmp ult i64 [[TMP3]], 16
	; SINK-AFTER-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[C1]], [[SCEVGEP45]]			; SINK-AFTER-NEXT: [[CONFLICT_RDX6:%.*]] = or i1 [[CONFLICT_RDX]], [[DIFF_CHECK5]]
	; SINK-AFTER-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[B3]], [[SCEVGEP2]]			; SINK-AFTER-NEXT: br i1 [[CONFLICT_RDX6]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; SINK-AFTER-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; SINK-AFTER-NEXT: [[BOUND010:%.*]] = icmp ult i8* [[C1]], [[SCEVGEP89]]
	; SINK-AFTER-NEXT: [[BOUND111:%.*]] = icmp ult i8* [[SCEVGEP67]], [[SCEVGEP2]]
	; SINK-AFTER-NEXT: [[FOUND_CONFLICT12:%.*]] = and i1 [[BOUND010]], [[BOUND111]]
	; SINK-AFTER-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT12]]
	; SINK-AFTER-NEXT: [[BOUND013:%.*]] = icmp ult i8* [[B3]], [[SCEVGEP89]]
	; SINK-AFTER-NEXT: [[BOUND114:%.*]] = icmp ult i8* [[SCEVGEP67]], [[SCEVGEP45]]
	; SINK-AFTER-NEXT: [[FOUND_CONFLICT15:%.*]] = and i1 [[BOUND013]], [[BOUND114]]
	; SINK-AFTER-NEXT: [[CONFLICT_RDX16:%.*]] = or i1 [[CONFLICT_RDX]], [[FOUND_CONFLICT15]]
	; SINK-AFTER-NEXT: br i1 [[CONFLICT_RDX16]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; SINK-AFTER: vector.ph:			; SINK-AFTER: vector.ph:
	; SINK-AFTER-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4			; SINK-AFTER-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
	; SINK-AFTER-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]			; SINK-AFTER-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i32 3			; SINK-AFTER-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i32 3
	; SINK-AFTER-NEXT: br label [[VECTOR_BODY:%.*]]			; SINK-AFTER-NEXT: br label [[VECTOR_BODY:%.*]]
	; SINK-AFTER: vector.body:			; SINK-AFTER: vector.body:
	; SINK-AFTER-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; SINK-AFTER-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; SINK-AFTER-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]			; SINK-AFTER-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]
	; SINK-AFTER-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; SINK-AFTER-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; SINK-AFTER-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1			; SINK-AFTER-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
	; SINK-AFTER-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2			; SINK-AFTER-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
	; SINK-AFTER-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3			; SINK-AFTER-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
	; SINK-AFTER-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[TMP0]]			; SINK-AFTER-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[TMP0]]
	; SINK-AFTER-NEXT: [[TMP5:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP0]], i64 1			; SINK-AFTER-NEXT: [[TMP5:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP0]], i64 1
	; SINK-AFTER-NEXT: [[TMP6:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP1]], i64 1			; SINK-AFTER-NEXT: [[TMP6:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP1]], i64 1
	; SINK-AFTER-NEXT: [[TMP7:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP2]], i64 1			; SINK-AFTER-NEXT: [[TMP7:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP2]], i64 1
	; SINK-AFTER-NEXT: [[TMP8:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP3]], i64 1			; SINK-AFTER-NEXT: [[TMP8:%.*]] = getelementptr inbounds [2 x i16], [2 x i16]* [[A]], i64 [[TMP3]], i64 1
	; SINK-AFTER-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0			; SINK-AFTER-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0
	; SINK-AFTER-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP9]] to <4 x i32>*			; SINK-AFTER-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP9]] to <4 x i32>*
	; SINK-AFTER-NEXT: store <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32>* [[TMP10]], align 4, !alias.scope !33, !noalias !36			; SINK-AFTER-NEXT: store <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32>* [[TMP10]], align 4
	; SINK-AFTER-NEXT: [[TMP11:%.*]] = load i16, i16* [[TMP5]], align 2, !alias.scope !39			; SINK-AFTER-NEXT: [[TMP11:%.*]] = load i16, i16* [[TMP5]], align 2
	; SINK-AFTER-NEXT: [[TMP12:%.*]] = load i16, i16* [[TMP6]], align 2, !alias.scope !39			; SINK-AFTER-NEXT: [[TMP12:%.*]] = load i16, i16* [[TMP6]], align 2
	; SINK-AFTER-NEXT: [[TMP13:%.*]] = load i16, i16* [[TMP7]], align 2, !alias.scope !39			; SINK-AFTER-NEXT: [[TMP13:%.*]] = load i16, i16* [[TMP7]], align 2
	; SINK-AFTER-NEXT: [[TMP14:%.*]] = load i16, i16* [[TMP8]], align 2, !alias.scope !39			; SINK-AFTER-NEXT: [[TMP14:%.*]] = load i16, i16* [[TMP8]], align 2
	; SINK-AFTER-NEXT: [[TMP15:%.*]] = insertelement <4 x i16> poison, i16 [[TMP11]], i32 0			; SINK-AFTER-NEXT: [[TMP15:%.*]] = insertelement <4 x i16> poison, i16 [[TMP11]], i32 0
	; SINK-AFTER-NEXT: [[TMP16:%.*]] = insertelement <4 x i16> [[TMP15]], i16 [[TMP12]], i32 1			; SINK-AFTER-NEXT: [[TMP16:%.*]] = insertelement <4 x i16> [[TMP15]], i16 [[TMP12]], i32 1
	; SINK-AFTER-NEXT: [[TMP17:%.*]] = insertelement <4 x i16> [[TMP16]], i16 [[TMP13]], i32 2			; SINK-AFTER-NEXT: [[TMP17:%.*]] = insertelement <4 x i16> [[TMP16]], i16 [[TMP13]], i32 2
	; SINK-AFTER-NEXT: [[TMP18]] = insertelement <4 x i16> [[TMP17]], i16 [[TMP14]], i32 3			; SINK-AFTER-NEXT: [[TMP18]] = insertelement <4 x i16> [[TMP17]], i16 [[TMP14]], i32 3
	; SINK-AFTER-NEXT: [[TMP19:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[TMP18]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; SINK-AFTER-NEXT: [[TMP19:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[TMP18]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; SINK-AFTER-NEXT: [[TMP20:%.*]] = sext <4 x i16> [[TMP19]] to <4 x i32>			; SINK-AFTER-NEXT: [[TMP20:%.*]] = sext <4 x i16> [[TMP19]] to <4 x i32>
	; SINK-AFTER-NEXT: [[TMP21:%.*]] = sext <4 x i16> [[TMP18]] to <4 x i32>			; SINK-AFTER-NEXT: [[TMP21:%.*]] = sext <4 x i16> [[TMP18]] to <4 x i32>
	; SINK-AFTER-NEXT: [[TMP22:%.*]] = mul nsw <4 x i32> [[TMP21]], [[TMP20]]			; SINK-AFTER-NEXT: [[TMP22:%.*]] = mul nsw <4 x i32> [[TMP21]], [[TMP20]]
	; SINK-AFTER-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP0]]			; SINK-AFTER-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP0]]
	; SINK-AFTER-NEXT: [[TMP24:%.*]] = getelementptr inbounds i32, i32* [[TMP23]], i32 0			; SINK-AFTER-NEXT: [[TMP24:%.*]] = getelementptr inbounds i32, i32* [[TMP23]], i32 0
	; SINK-AFTER-NEXT: [[TMP25:%.*]] = bitcast i32* [[TMP24]] to <4 x i32>*			; SINK-AFTER-NEXT: [[TMP25:%.*]] = bitcast i32* [[TMP24]] to <4 x i32>*
	; SINK-AFTER-NEXT: store <4 x i32> [[TMP22]], <4 x i32>* [[TMP25]], align 4, !alias.scope !40, !noalias !39			; SINK-AFTER-NEXT: store <4 x i32> [[TMP22]], <4 x i32>* [[TMP25]], align 4
	; SINK-AFTER-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; SINK-AFTER-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; SINK-AFTER-NEXT: [[TMP26:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; SINK-AFTER-NEXT: [[TMP26:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; SINK-AFTER-NEXT: br i1 [[TMP26]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]			; SINK-AFTER-NEXT: br i1 [[TMP26]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]
	; SINK-AFTER: middle.block:			; SINK-AFTER: middle.block:
	; SINK-AFTER-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]			; SINK-AFTER-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[TMP18]], i32 3			; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[TMP18]], i32 3
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[TMP18]], i32 2			; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[TMP18]], i32 2
	; SINK-AFTER-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; SINK-AFTER-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
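For orientation, the vector.memcheck change in the SINK-AFTER output above (left column: range-overlap checks, right column: pointer-difference checks) can be sketched in C++. This is an illustrative sketch only, not code from the patch: the helper names diffCheckConflicts and overlapCheckConflicts are hypothetical, and reading the constant 16 as VF (4) * UF (1) * a 4-byte access size is an assumption drawn from the patch summary and the `urem i64 [[N]], 4` line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API). Mirrors the new check shape
//   %d = sub i64 Sink, Src ; %c = icmp ult i64 %d, Bound
// and returns true when the scalar loop must be taken.
constexpr bool diffCheckConflicts(std::uint64_t SinkAddr, std::uint64_t SrcAddr,
                                  std::uint64_t VF, std::uint64_t UF,
                                  std::uint64_t AccessSize) {
  // Unsigned subtraction wraps when SrcAddr > SinkAddr, producing a very
  // large value that is not below the bound, so no conflict is reported.
  return (SinkAddr - SrcAddr) < VF * UF * AccessSize;
}

// Hypothetical helper (not an LLVM API). Mirrors the old check shape
//   and (icmp ult AStart, BEnd), (icmp ult BStart, AEnd)
// on the half-open byte ranges [AStart, AEnd) and [BStart, BEnd).
constexpr bool overlapCheckConflicts(std::uint64_t AStart, std::uint64_t AEnd,
                                     std::uint64_t BStart, std::uint64_t BEnd) {
  return AStart < BEnd && BStart < AEnd;
}
```

Under these assumptions, the diff check needs one subtraction and one compare per pointer pair and no trip-count-dependent end pointers, whereas the overlap check needs both range ends and two compares per pair.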
	; CHECK-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3			; CHECK-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[INDEX]], 1			; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*
	; CHECK-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2, !alias.scope !43			; CHECK-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; CHECK-NEXT: [[TMP7:%.*]] = sext <4 x i16> [[TMP6]] to <4 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = sext <4 x i16> [[TMP6]] to <4 x i32>
	; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP7]], <i32 2, i32 2, i32 2, i32 2>			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP7]], <i32 2, i32 2, i32 2, i32 2>
	; CHECK-NEXT: [[TMP9:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>			; CHECK-NEXT: [[TMP9:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>
	; CHECK-NEXT: [[TMP10:%.*]] = mul nsw <4 x i32> [[TMP8]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = mul nsw <4 x i32> [[TMP8]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP12]], align 4, !alias.scope !46, !noalias !43			; CHECK-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP12]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP48:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP48:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
	; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i64 3			; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i64 3
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; UNROLL-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3			; UNROLL-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3
	; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD7:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD7:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[TMP3:%.*]] = or i64 [[INDEX]], 1			; UNROLL-NEXT: [[TMP3:%.*]] = or i64 [[INDEX]], 1
	; UNROLL-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP3]]			; UNROLL-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP3]]
	; UNROLL-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*			; UNROLL-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*
	; UNROLL-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2, !alias.scope !43			; UNROLL-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2
	; UNROLL-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[TMP4]], i64 4			; UNROLL-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[TMP4]], i64 4
	; UNROLL-NEXT: [[TMP7:%.*]] = bitcast i16* [[TMP6]] to <4 x i16>*			; UNROLL-NEXT: [[TMP7:%.*]] = bitcast i16* [[TMP6]] to <4 x i16>*
	; UNROLL-NEXT: [[WIDE_LOAD7]] = load <4 x i16>, <4 x i16>* [[TMP7]], align 2, !alias.scope !43			; UNROLL-NEXT: [[WIDE_LOAD7]] = load <4 x i16>, <4 x i16>* [[TMP7]], align 2
	; UNROLL-NEXT: [[TMP8:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NEXT: [[TMP8:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NEXT: [[TMP9:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NEXT: [[TMP9:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NEXT: [[TMP10:%.*]] = sext <4 x i16> [[TMP8]] to <4 x i32>			; UNROLL-NEXT: [[TMP10:%.*]] = sext <4 x i16> [[TMP8]] to <4 x i32>
	; UNROLL-NEXT: [[TMP11:%.*]] = sext <4 x i16> [[TMP9]] to <4 x i32>			; UNROLL-NEXT: [[TMP11:%.*]] = sext <4 x i16> [[TMP9]] to <4 x i32>
	; UNROLL-NEXT: [[TMP12:%.*]] = add nsw <4 x i32> [[TMP10]], <i32 2, i32 2, i32 2, i32 2>			; UNROLL-NEXT: [[TMP12:%.*]] = add nsw <4 x i32> [[TMP10]], <i32 2, i32 2, i32 2, i32 2>
	; UNROLL-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP11]], <i32 2, i32 2, i32 2, i32 2>			; UNROLL-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP11]], <i32 2, i32 2, i32 2, i32 2>
	; UNROLL-NEXT: [[TMP14:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>			; UNROLL-NEXT: [[TMP14:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>
	; UNROLL-NEXT: [[TMP15:%.*]] = sext <4 x i16> [[WIDE_LOAD7]] to <4 x i32>			; UNROLL-NEXT: [[TMP15:%.*]] = sext <4 x i16> [[WIDE_LOAD7]] to <4 x i32>
	; UNROLL-NEXT: [[TMP16:%.*]] = mul nsw <4 x i32> [[TMP12]], [[TMP14]]			; UNROLL-NEXT: [[TMP16:%.*]] = mul nsw <4 x i32> [[TMP12]], [[TMP14]]
	; UNROLL-NEXT: [[TMP17:%.*]] = mul nsw <4 x i32> [[TMP13]], [[TMP15]]			; UNROLL-NEXT: [[TMP17:%.*]] = mul nsw <4 x i32> [[TMP13]], [[TMP15]]
	; UNROLL-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]			; UNROLL-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]
	; UNROLL-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <4 x i32>*			; UNROLL-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <4 x i32>*
	; UNROLL-NEXT: store <4 x i32> [[TMP16]], <4 x i32>* [[TMP19]], align 4, !alias.scope !46, !noalias !43			; UNROLL-NEXT: store <4 x i32> [[TMP16]], <4 x i32>* [[TMP19]], align 4
	; UNROLL-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP18]], i64 4			; UNROLL-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP18]], i64 4
	; UNROLL-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <4 x i32>*			; UNROLL-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <4 x i32>*
	; UNROLL-NEXT: store <4 x i32> [[TMP17]], <4 x i32>* [[TMP21]], align 4, !alias.scope !46, !noalias !43			; UNROLL-NEXT: store <4 x i32> [[TMP17]], <4 x i32>* [[TMP21]], align 4
	; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; UNROLL-NEXT: [[TMP22:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NEXT: [[TMP22:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NEXT: br i1 [[TMP22]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP48:![0-9]+]]			; UNROLL-NEXT: br i1 [[TMP22]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP48:![0-9]+]]
	; UNROLL: middle.block:			; UNROLL: middle.block:
	; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]			; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
	; UNROLL-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD7]], i64 3			; UNROLL-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD7]], i64 3
	; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL: scalar.ph:			; UNROLL: scalar.ph:
	; UNROLL-NO-IC-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0			; UNROLL-NO-IC-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0
	; UNROLL-NO-IC-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 4			; UNROLL-NO-IC-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 4
	; UNROLL-NO-IC-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP1]], 1			; UNROLL-NO-IC-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; UNROLL-NO-IC-NEXT: [[TMP4:%.*]] = add nuw nsw i64 [[TMP2]], 1			; UNROLL-NO-IC-NEXT: [[TMP4:%.*]] = add nuw nsw i64 [[TMP2]], 1
	; UNROLL-NO-IC-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP3]]			; UNROLL-NO-IC-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP3]]
	; UNROLL-NO-IC-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP4]]			; UNROLL-NO-IC-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP4]]
	; UNROLL-NO-IC-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[TMP5]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[TMP5]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP8:%.*]] = bitcast i16* [[TMP7]] to <4 x i16>*			; UNROLL-NO-IC-NEXT: [[TMP8:%.*]] = bitcast i16* [[TMP7]] to <4 x i16>*
	; UNROLL-NO-IC-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP8]], align 2, !alias.scope !43			; UNROLL-NO-IC-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP8]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP5]], i32 4			; UNROLL-NO-IC-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP5]], i32 4
	; UNROLL-NO-IC-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <4 x i16>*			; UNROLL-NO-IC-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <4 x i16>*
	; UNROLL-NO-IC-NEXT: [[WIDE_LOAD7]] = load <4 x i16>, <4 x i16>* [[TMP10]], align 2, !alias.scope !43			; UNROLL-NO-IC-NEXT: [[WIDE_LOAD7]] = load <4 x i16>, <4 x i16>* [[TMP10]], align 2
	; UNROLL-NO-IC-NEXT: [[TMP11:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NO-IC-NEXT: [[TMP11:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NO-IC-NEXT: [[TMP12:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; UNROLL-NO-IC-NEXT: [[TMP12:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; UNROLL-NO-IC-NEXT: [[TMP13:%.*]] = sext <4 x i16> [[TMP11]] to <4 x i32>			; UNROLL-NO-IC-NEXT: [[TMP13:%.*]] = sext <4 x i16> [[TMP11]] to <4 x i32>
	; UNROLL-NO-IC-NEXT: [[TMP14:%.*]] = sext <4 x i16> [[TMP12]] to <4 x i32>			; UNROLL-NO-IC-NEXT: [[TMP14:%.*]] = sext <4 x i16> [[TMP12]] to <4 x i32>
	; UNROLL-NO-IC-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 2, i32 2, i32 2, i32 2>			; UNROLL-NO-IC-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP13]], <i32 2, i32 2, i32 2, i32 2>
	; UNROLL-NO-IC-NEXT: [[TMP16:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 2, i32 2, i32 2, i32 2>			; UNROLL-NO-IC-NEXT: [[TMP16:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 2, i32 2, i32 2, i32 2>
	; UNROLL-NO-IC-NEXT: [[TMP17:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>			; UNROLL-NO-IC-NEXT: [[TMP17:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>
	; UNROLL-NO-IC-NEXT: [[TMP18:%.*]] = sext <4 x i16> [[WIDE_LOAD7]] to <4 x i32>			; UNROLL-NO-IC-NEXT: [[TMP18:%.*]] = sext <4 x i16> [[WIDE_LOAD7]] to <4 x i32>
	; UNROLL-NO-IC-NEXT: [[TMP19:%.*]] = mul nsw <4 x i32> [[TMP15]], [[TMP17]]			; UNROLL-NO-IC-NEXT: [[TMP19:%.*]] = mul nsw <4 x i32> [[TMP15]], [[TMP17]]
	; UNROLL-NO-IC-NEXT: [[TMP20:%.*]] = mul nsw <4 x i32> [[TMP16]], [[TMP18]]			; UNROLL-NO-IC-NEXT: [[TMP20:%.*]] = mul nsw <4 x i32> [[TMP16]], [[TMP18]]
	; UNROLL-NO-IC-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]]			; UNROLL-NO-IC-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]]
	; UNROLL-NO-IC-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP2]]			; UNROLL-NO-IC-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP2]]
	; UNROLL-NO-IC-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, i32* [[TMP21]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, i32* [[TMP21]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP24:%.*]] = bitcast i32* [[TMP23]] to <4 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP24:%.*]] = bitcast i32* [[TMP23]] to <4 x i32>*
	; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP19]], <4 x i32>* [[TMP24]], align 4, !alias.scope !46, !noalias !43			; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP19]], <4 x i32>* [[TMP24]], align 4
	; UNROLL-NO-IC-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, i32* [[TMP21]], i32 4			; UNROLL-NO-IC-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, i32* [[TMP21]], i32 4
	; UNROLL-NO-IC-NEXT: [[TMP26:%.*]] = bitcast i32* [[TMP25]] to <4 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP26:%.*]] = bitcast i32* [[TMP25]] to <4 x i32>*
	; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP20]], <4 x i32>* [[TMP26]], align 4, !alias.scope !46, !noalias !43			; UNROLL-NO-IC-NEXT: store <4 x i32> [[TMP20]], <4 x i32>* [[TMP26]], align 4
	; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; UNROLL-NO-IC-NEXT: [[TMP27:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-IC-NEXT: [[TMP27:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-IC-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP48:![0-9]+]]			; UNROLL-NO-IC-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP48:![0-9]+]]
	; UNROLL-NO-IC: middle.block:			; UNROLL-NO-IC: middle.block:
	; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]			; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD7]], i32 3			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD7]], i32 3
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD7]], i32 2			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD7]], i32 2
	; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[INDUCTION:%.*]] = add i64 [[INDEX]], 0			; UNROLL-NO-VF-NEXT: [[INDUCTION:%.*]] = add i64 [[INDEX]], 0
	; UNROLL-NO-VF-NEXT: [[INDUCTION7:%.*]] = add i64 [[INDEX]], 1			; UNROLL-NO-VF-NEXT: [[INDUCTION7:%.*]] = add i64 [[INDEX]], 1
	; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = add nuw nsw i64 [[INDUCTION]], 1			; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = add nuw nsw i64 [[INDUCTION]], 1
	; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[INDUCTION7]], 1			; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[INDUCTION7]], 1
	; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP1]]			; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP1]]
	; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP2]]			; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP2]]
	; UNROLL-NO-VF-NEXT: [[TMP5:%.*]] = load i16, i16* [[TMP3]], align 2, !alias.scope !42			; UNROLL-NO-VF-NEXT: [[TMP5:%.*]] = load i16, i16* [[TMP3]], align 2
	; UNROLL-NO-VF-NEXT: [[TMP6]] = load i16, i16* [[TMP4]], align 2, !alias.scope !42			; UNROLL-NO-VF-NEXT: [[TMP6]] = load i16, i16* [[TMP4]], align 2
	; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = sext i16 [[VECTOR_RECUR]] to i32			; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = sext i16 [[VECTOR_RECUR]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = sext i16 [[TMP5]] to i32			; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = sext i16 [[TMP5]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = add nsw i32 [[TMP7]], 2			; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = add nsw i32 [[TMP7]], 2
	; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = add nsw i32 [[TMP8]], 2			; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = add nsw i32 [[TMP8]], 2
	; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = sext i16 [[TMP5]] to i32			; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = sext i16 [[TMP5]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = sext i16 [[TMP6]] to i32			; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = sext i16 [[TMP6]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = mul nsw i32 [[TMP9]], [[TMP11]]			; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = mul nsw i32 [[TMP9]], [[TMP11]]
	; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = mul nsw i32 [[TMP10]], [[TMP12]]			; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = mul nsw i32 [[TMP10]], [[TMP12]]
	; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION]]			; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION]]
	; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION7]]			; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION7]]
	; UNROLL-NO-VF-NEXT: store i32 [[TMP13]], i32* [[TMP15]], align 4, !alias.scope !45, !noalias !42			; UNROLL-NO-VF-NEXT: store i32 [[TMP13]], i32* [[TMP15]], align 4
	; UNROLL-NO-VF-NEXT: store i32 [[TMP14]], i32* [[TMP16]], align 4, !alias.scope !45, !noalias !42			; UNROLL-NO-VF-NEXT: store i32 [[TMP14]], i32* [[TMP16]], align 4
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP47:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP47:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_MEMCHECK]] ], [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP6]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[VECTOR_MEMCHECK]] ], [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP6]], [[MIDDLE_BLOCK]] ]
	; SINK-AFTER: vector.body:			; SINK-AFTER: vector.body:
	; SINK-AFTER-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; SINK-AFTER-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; SINK-AFTER-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]			; SINK-AFTER-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[WIDE_LOAD:%.*]], [[VECTOR_BODY]] ]
	; SINK-AFTER-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0			; SINK-AFTER-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0
	; SINK-AFTER-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1			; SINK-AFTER-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; SINK-AFTER-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP2]]			; SINK-AFTER-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, i16* [[A]], i64 [[TMP2]]
	; SINK-AFTER-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[TMP3]], i32 0			; SINK-AFTER-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[TMP3]], i32 0
	; SINK-AFTER-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*			; SINK-AFTER-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>*
	; SINK-AFTER-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2, !alias.scope !43			; SINK-AFTER-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2
	; SINK-AFTER-NEXT: [[TMP6:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; SINK-AFTER-NEXT: [[TMP6:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; SINK-AFTER-NEXT: [[TMP7:%.*]] = sext <4 x i16> [[TMP6]] to <4 x i32>			; SINK-AFTER-NEXT: [[TMP7:%.*]] = sext <4 x i16> [[TMP6]] to <4 x i32>
	; SINK-AFTER-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP7]], <i32 2, i32 2, i32 2, i32 2>			; SINK-AFTER-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP7]], <i32 2, i32 2, i32 2, i32 2>
	; SINK-AFTER-NEXT: [[TMP9:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>			; SINK-AFTER-NEXT: [[TMP9:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>
	; SINK-AFTER-NEXT: [[TMP10:%.*]] = mul nsw <4 x i32> [[TMP8]], [[TMP9]]			; SINK-AFTER-NEXT: [[TMP10:%.*]] = mul nsw <4 x i32> [[TMP8]], [[TMP9]]
	; SINK-AFTER-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]]			; SINK-AFTER-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]]
	; SINK-AFTER-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i32 0			; SINK-AFTER-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i32 0
	; SINK-AFTER-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <4 x i32>*			; SINK-AFTER-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <4 x i32>*
	; SINK-AFTER-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP13]], align 4, !alias.scope !46, !noalias !43			; SINK-AFTER-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP13]], align 4
	; SINK-AFTER-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; SINK-AFTER-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; SINK-AFTER-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; SINK-AFTER-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; SINK-AFTER-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP48:![0-9]+]]			; SINK-AFTER-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP48:![0-9]+]]
	; SINK-AFTER: middle.block:			; SINK-AFTER: middle.block:
	; SINK-AFTER-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]			; SINK-AFTER-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3			; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2			; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2
	; SINK-AFTER-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; SINK-AFTER-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]

llvm/test/Transforms/LoopVectorize/fpsat.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S | FileCheck %s		; RUN: opt %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S | FileCheck %s

define void @signed(ptr %x, ptr %y, i32 %n) {		define void @signed(ptr %x, ptr %y, i32 %n) {
; CHECK-LABEL: @signed(		; CHECK-LABEL: @signed(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[X2:%.*]] = ptrtoint ptr [[X:%.*]] to i64
		; CHECK-NEXT: [[Y1:%.*]] = ptrtoint ptr [[Y:%.*]] to i64
; CHECK-NEXT: [[CMP6:%.*]] = icmp sgt i32 [[N:%.*]], 0		; CHECK-NEXT: [[CMP6:%.*]] = icmp sgt i32 [[N:%.*]], 0
; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]		; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
; CHECK: for.body.preheader:		; CHECK: for.body.preheader:
; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64		; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], 4		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], 4
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; CHECK: vector.memcheck:		; CHECK: vector.memcheck:
; CHECK-NEXT: [[TMP0:%.*]] = shl nuw nsw i64 [[WIDE_TRIP_COUNT]], 2		; CHECK-NEXT: [[TMP0:%.*]] = sub i64 [[Y1]], [[X2]]
; CHECK-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, ptr [[Y:%.*]], i64 [[TMP0]]		; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 16
; CHECK-NEXT: [[UGLYGEP1:%.*]] = getelementptr i8, ptr [[X:%.*]], i64 [[TMP0]]		; CHECK-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[Y]], [[UGLYGEP1]]
; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[X]], [[UGLYGEP]]
; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], 4		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], 4
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]		; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0		; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, ptr [[X]], i64 [[TMP1]]		; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, ptr [[X]], i64 [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, ptr [[TMP2]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, ptr [[TMP2]], i32 0
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP3]], align 4, !alias.scope !0		; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP3]], align 4
; CHECK-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.fptosi.sat.v4i32.v4f32(<4 x float> [[WIDE_LOAD]])		; CHECK-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.fptosi.sat.v4i32.v4f32(<4 x float> [[WIDE_LOAD]])
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[TMP1]]		; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[TMP1]]
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[TMP5]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[TMP5]], i32 0
; CHECK-NEXT: store <4 x i32> [[TMP4]], ptr [[TMP6]], align 4, !alias.scope !3, !noalias !0		; CHECK-NEXT: store <4 x i32> [[TMP4]], ptr [[TMP6]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]		; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]		; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ 0, [[VECTOR_MEMCHECK]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.cond.cleanup.loopexit:		; CHECK: for.cond.cleanup.loopexit:
; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]		; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
; CHECK: for.cond.cleanup:		; CHECK: for.cond.cleanup:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]		; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[X]], i64 [[INDVARS_IV]]		; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[X]], i64 [[INDVARS_IV]]
; CHECK-NEXT: [[TMP8:%.*]] = load float, ptr [[ARRAYIDX]], align 4		; CHECK-NEXT: [[TMP8:%.*]] = load float, ptr [[ARRAYIDX]], align 4
; CHECK-NEXT: [[TMP9:%.*]] = tail call i32 @llvm.fptosi.sat.i32.f32(float [[TMP8]])		; CHECK-NEXT: [[TMP9:%.*]] = tail call i32 @llvm.fptosi.sat.i32.f32(float [[TMP8]])
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[INDVARS_IV]]		; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[INDVARS_IV]]
; CHECK-NEXT: store i32 [[TMP9]], ptr [[ARRAYIDX2]], align 4		; CHECK-NEXT: store i32 [[TMP9]], ptr [[ARRAYIDX2]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1		; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]		; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]		; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
;		;
entry:		entry:
%cmp6 = icmp sgt i32 %n, 0		%cmp6 = icmp sgt i32 %n, 0
br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup		br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader: ; preds = %entry		for.body.preheader: ; preds = %entry
%wide.trip.count = zext i32 %n to i64		%wide.trip.count = zext i32 %n to i64
br label %for.body		br label %for.body
		for.body: ; preds = %for.body.preheader, %for.body
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count		%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond.not, label %for.cond.cleanup, label %for.body		br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}		}
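
As a reading aid for the new `vector.memcheck` block above, the following is a minimal C++ sketch of the arithmetic the right-hand column checks for: `sub i64 [[Y1]], [[X2]]` followed by `icmp ult ..., 16`. The function name `needsScalarFallback` and its parameters are hypothetical and are not part of the patch; the threshold follows the `VF * UF * AccessSize` formula from the summary, which gives 4 * 1 * 4 = 16 for this test's RUN line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM API): returns true when the runtime check
// would branch to scalar.ph instead of vector.ph.
bool needsScalarFallback(std::uint64_t SinkAddr, std::uint64_t SrcAddr,
                         unsigned VF, unsigned UF, unsigned AccessSize) {
  assert(VF > 0 && UF > 0 && AccessSize > 0 && "factors must be non-zero");
  // Bytes touched by one vector iteration: VF * UF * AccessSize.
  std::uint64_t Threshold = std::uint64_t(VF) * UF * AccessSize;
  // Unsigned subtraction wraps when SrcAddr > SinkAddr, yielding a large
  // value that is never below the threshold, so a single ULT suffices.
  return SinkAddr - SrcAddr < Threshold;
}
```

With `%y` as the sink (store) and `%x` as the source (load), a sink 8 bytes after the source falls back to the scalar loop, while a sink 16 or more bytes after it, or anywhere before it, takes the vector path.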

define void @unsigned(ptr %x, ptr %y, i32 %n) {		define void @unsigned(ptr %x, ptr %y, i32 %n) {
; CHECK-LABEL: @unsigned(		; CHECK-LABEL: @unsigned(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[X2:%.*]] = ptrtoint ptr [[X:%.*]] to i64
		; CHECK-NEXT: [[Y1:%.*]] = ptrtoint ptr [[Y:%.*]] to i64
; CHECK-NEXT: [[CMP6:%.*]] = icmp sgt i32 [[N:%.*]], 0		; CHECK-NEXT: [[CMP6:%.*]] = icmp sgt i32 [[N:%.*]], 0
; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]		; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
; CHECK: for.body.preheader:		; CHECK: for.body.preheader:
; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64		; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], 4		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[WIDE_TRIP_COUNT]], 4
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
; CHECK: vector.memcheck:		; CHECK: vector.memcheck:
; CHECK-NEXT: [[TMP0:%.*]] = shl nuw nsw i64 [[WIDE_TRIP_COUNT]], 2		; CHECK-NEXT: [[TMP0:%.*]] = sub i64 [[Y1]], [[X2]]
; CHECK-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, ptr [[Y:%.*]], i64 [[TMP0]]		; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 16
; CHECK-NEXT: [[UGLYGEP1:%.*]] = getelementptr i8, ptr [[X:%.*]], i64 [[TMP0]]		; CHECK-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[Y]], [[UGLYGEP1]]
; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[X]], [[UGLYGEP]]
; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], 4		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], 4
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]		; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0		; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, ptr [[X]], i64 [[TMP1]]		; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, ptr [[X]], i64 [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, ptr [[TMP2]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, ptr [[TMP2]], i32 0
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP3]], align 4, !alias.scope !8		; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP3]], align 4
; CHECK-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.fptoui.sat.v4i32.v4f32(<4 x float> [[WIDE_LOAD]])		; CHECK-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.fptoui.sat.v4i32.v4f32(<4 x float> [[WIDE_LOAD]])
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[TMP1]]		; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[TMP1]]
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[TMP5]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[TMP5]], i32 0
; CHECK-NEXT: store <4 x i32> [[TMP4]], ptr [[TMP6]], align 4, !alias.scope !11, !noalias !8		; CHECK-NEXT: store <4 x i32> [[TMP4]], ptr [[TMP6]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]		; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[WIDE_TRIP_COUNT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]		; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ 0, [[VECTOR_MEMCHECK]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.cond.cleanup.loopexit:		; CHECK: for.cond.cleanup.loopexit:
; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]		; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
; CHECK: for.cond.cleanup:		; CHECK: for.cond.cleanup:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]		; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[X]], i64 [[INDVARS_IV]]		; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[X]], i64 [[INDVARS_IV]]
; CHECK-NEXT: [[TMP8:%.*]] = load float, ptr [[ARRAYIDX]], align 4		; CHECK-NEXT: [[TMP8:%.*]] = load float, ptr [[ARRAYIDX]], align 4
; CHECK-NEXT: [[TMP9:%.*]] = tail call i32 @llvm.fptoui.sat.i32.f32(float [[TMP8]])		; CHECK-NEXT: [[TMP9:%.*]] = tail call i32 @llvm.fptoui.sat.i32.f32(float [[TMP8]])
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[INDVARS_IV]]		; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr [[Y]], i64 [[INDVARS_IV]]
; CHECK-NEXT: store i32 [[TMP9]], ptr [[ARRAYIDX2]], align 4		; CHECK-NEXT: store i32 [[TMP9]], ptr [[ARRAYIDX2]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1		; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]		; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]		; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
;		;
entry:		entry:
%cmp6 = icmp sgt i32 %n, 0		%cmp6 = icmp sgt i32 %n, 0
br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup		br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader: ; preds = %entry		for.body.preheader: ; preds = %entry
%wide.trip.count = zext i32 %n to i64		%wide.trip.count = zext i32 %n to i64
br label %for.body		br label %for.body

llvm/test/Transforms/LoopVectorize/multiple-exits-versioning.ll

	; RUN: opt -loop-vectorize -force-vector-width=2 -S %s | FileCheck %s			; RUN: opt -loop-vectorize -force-vector-width=2 -S %s | FileCheck %s

	; Test cases to make sure LV & loop versioning can handle loops with			; Test cases to make sure LV & loop versioning can handle loops with
	; multiple exiting branches.			; multiple exiting branches.

	; Multiple branches exiting the loop to a unique exit block. The loop should			; Multiple branches exiting the loop to a unique exit block. The loop should
	; be vectorized with versioning & noalias metadata should be added.			; be vectorized with versioning.
	define void @multiple_exits_unique_exit_block(i32* %A, i32* %B, i64 %N) {			define void @multiple_exits_unique_exit_block(i32* %A, i32* %B, i64 %N) {
	; CHECK-LABEL: @multiple_exits_unique_exit_block			; CHECK-LABEL: @multiple_exits_unique_exit_block
	; CHECK: vector.memcheck:			; CHECK: vector.memcheck:
	; CHECK-LABEL: vector.body:			; CHECK-LABEL: vector.body:
	; CHECK: %wide.load = load <2 x i32>, <2 x i32>* {{.*}}, align 4, !alias.scope			; CHECK: %wide.load = load <2 x i32>, <2 x i32>* {{.*}}, align 4
	; CHECK: store <2 x i32> %wide.load, <2 x i32>* {{.*}}, align 4, !alias.scope			; CHECK: store <2 x i32> %wide.load, <2 x i32>* {{.*}}, align 4
	; CHECK: br			; CHECK: br
	;			;
	entry:			entry:
	br label %loop.header			br label %loop.header

	loop.header:			loop.header:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%cond.0 = icmp eq i64 %iv, %N			%cond.0 = icmp eq i64 %iv, %N

llvm/test/Transforms/LoopVectorize/no_outside_user.ll

	; vectorize c[i] = a[i] + b[i] loop where result of c[i] is used outside the			; vectorize c[i] = a[i] + b[i] loop where result of c[i] is used outside the
	; loop			; loop
	; CHECK-LABEL: sum_arrays_outside_use(			; CHECK-LABEL: sum_arrays_outside_use(
	; CHECK-LABEL: vector.memcheck:			; CHECK-LABEL: vector.memcheck:
	; CHECK: br i1 %conflict.rdx, label %scalar.ph, label %vector.ph			; CHECK: br i1 %conflict.rdx, label %scalar.ph, label %vector.ph

	; CHECK-LABEL: vector.body:			; CHECK-LABEL: vector.body:
	; CHECK: %wide.load = load <2 x i32>, <2 x i32>*			; CHECK: %wide.load = load <2 x i32>, <2 x i32>*
	; CHECK: %wide.load16 = load <2 x i32>, <2 x i32>*			; CHECK: %wide.load5 = load <2 x i32>, <2 x i32>*
	; CHECK: [[ADD:%[a-zA-Z0-9.]+]] = add nsw <2 x i32> %wide.load, %wide.load16			; CHECK: [[ADD:%[a-zA-Z0-9.]+]] = add nsw <2 x i32> %wide.load, %wide.load5
	; CHECK: store <2 x i32>			; CHECK: store <2 x i32>

	; CHECK-LABEL: middle.block:			; CHECK-LABEL: middle.block:
	; CHECK: [[E1:%[a-zA-Z0-9.]+]] = extractelement <2 x i32> [[ADD]], i32 1			; CHECK: [[E1:%[a-zA-Z0-9.]+]] = extractelement <2 x i32> [[ADD]], i32 1

	; CHECK-LABEL: f1.exit.loopexit:			; CHECK-LABEL: f1.exit.loopexit:
	; CHECK: %.lcssa = phi i32 [ %sum, %.lr.ph.i ], [ [[E1]], %middle.block ]			; CHECK: %.lcssa = phi i32 [ %sum, %.lr.ph.i ], [ [[E1]], %middle.block ]
	define i32 @sum_arrays_outside_use(i32* %B, i32* %A, i32* %C, i32 %N) {			define i32 @sum_arrays_outside_use(i32* %B, i32* %A, i32* %C, i32 %N) {

llvm/test/Transforms/LoopVectorize/runtime-check-readonly.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	define void @add_ints(i32* nocapture %A, i32* nocapture %B, i32* nocapture %C) {			define void @add_ints(i32* nocapture %A, i32* nocapture %B, i32* nocapture %C) {
	; CHECK-LABEL: @add_ints(			; CHECK-LABEL: @add_ints(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; CHECK-LABEL: vector.memcheck:			; CHECK-LABEL: vector.memcheck:
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[A:%.*]], i64 200			; CHECK-NEXT: [[A1:%.*]] = ptrtoint i32* [[A:%.*]] to i64
	; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[B:%.*]], i64 200			; CHECK-NEXT: [[B2:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; CHECK-NEXT: [[SCEVGEP7:%.*]] = getelementptr i32, i32* [[C:%.*]], i64 200			; CHECK-NEXT: [[C3:%.*]] = ptrtoint i32* [[C:%.*]] to i64
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt i32* [[SCEVGEP4]], [[A]]			; CHECK-NEXT: [[TMP0:%.*]] = sub i64 [[A1]], [[B2]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt i32* [[SCEVGEP]], [[B]]			; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 16
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; CHECK-NEXT: [[TMP1:%.*]] = sub i64 [[A1]], [[C3]]
	; CHECK-NEXT: [[BOUND09:%.*]] = icmp ugt i32* [[SCEVGEP7]], [[A]]			; CHECK-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP1]], 16
	; CHECK-NEXT: [[BOUND110:%.*]] = icmp ugt i32* [[SCEVGEP]], [[C]]			; CHECK-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
	; CHECK-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]]			; CHECK-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
	; CHECK-NEXT: br i1 [[CONFLICT_RDX]], label %scalar.ph, label %vector.ph
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label %vector.body			; CHECK-NEXT: br label %vector.body
	; CHECK: vector.body:			; CHECK: vector.body:
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
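
The `add_ints` check lines above combine two difference checks (`[[DIFF_CHECK]]` and `[[DIFF_CHECK4]]`) with an `or` into `[[CONFLICT_RDX]]`. A minimal C++ sketch of that reduction follows, assuming a hypothetical `DiffCheck` pair type and `anyConflict` helper that do not exist in LLVM; it only mirrors the arithmetic visible in the CHECK lines.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pair of addresses for one (sink, source) dependence.
struct DiffCheck {
  std::uint64_t Sink; // address written (A in add_ints)
  std::uint64_t Src;  // address read (B or C in add_ints)
};

// Hypothetical helper: OR-reduces the per-pair checks, like conflict.rdx.
bool anyConflict(const std::vector<DiffCheck> &Checks,
                 std::uint64_t Threshold) {
  bool Conflict = false;
  for (const DiffCheck &C : Checks)
    Conflict |= (C.Sink - C.Src) < Threshold; // sub + icmp ult per pair
  return Conflict;
}
```

In the left-hand column each pointer pair costs two bound comparisons and an `and`; in the right-hand column each pair is one `sub` and one `icmp ult` against the constant 16.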

llvm/test/Transforms/LoopVectorize/runtime-check-small-clamped-bounds.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=loop-vectorize -force-vector-width=2 -S %s | FileCheck %s			; RUN: opt -passes=loop-vectorize -force-vector-width=2 -S %s | FileCheck %s

	; Tests where the indices of some accesses are clamped to a small range.			; Tests where the indices of some accesses are clamped to a small range.

	; FIXME: At the moment, the runtime checks require that the indices do not wrap			; FIXME: At the moment, the runtime checks require that the indices do not wrap
	; and runtime checks are emitted to ensure that. The clamped indices do			; and runtime checks are emitted to ensure that. The clamped indices do
	; wrap, so the vector loops are dead at the moment. But it is still			; wrap, so the vector loops are dead at the moment. But it is still
	; possible to compute the bounds of the accesses and generate proper			; possible to compute the bounds of the accesses and generate proper
	; runtime checks.			; runtime checks.

	; The relevant bounds for %gep.A are [%A, %A+4).			; The relevant bounds for %gep.A are [%A, %A+4).
	define void @load_clamped_index(i32* %A, i32* %B, i32 %N) {			define void @load_clamped_index(i32* %A, i32* %B, i32 %N) {
	; CHECK-LABEL: @load_clamped_index(			; CHECK-LABEL: @load_clamped_index(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[B1:%.*]] = bitcast i32* [[B:%.*]] to i8*			; CHECK-NEXT: [[A2:%.*]] = ptrtoint i32* [[A:%.*]] to i64
	; CHECK-NEXT: [[A3:%.*]] = bitcast i32* [[A:%.*]] to i8*			; CHECK-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N:%.*]], 2			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N:%.*]], 2
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
	; CHECK: vector.scevcheck:			; CHECK: vector.scevcheck:
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[N]], -1			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[N]], -1
	; CHECK-NEXT: [[TMP7:%.*]] = icmp ugt i32 [[TMP0]], 3			; CHECK-NEXT: [[TMP7:%.*]] = icmp ugt i32 [[TMP0]], 3
	; CHECK-NEXT: br i1 [[TMP7]], label [[SCALAR_PH]], label [[VECTOR_MEMCHECK:%.*]]			; CHECK-NEXT: br i1 [[TMP7]], label [[SCALAR_PH]], label [[VECTOR_MEMCHECK:%.*]]
	; CHECK: vector.memcheck:			; CHECK: vector.memcheck:
	; CHECK-NEXT: [[TMP10:%.*]] = add i32 [[N]], -1			; CHECK-NEXT: [[TMP2:%.*]] = sub i64 [[B1]], [[A2]]
	; CHECK-NEXT: [[TMP11:%.*]] = zext i32 [[TMP10]] to i64			; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP2]], 8
	; CHECK-NEXT: [[TMP12:%.*]] = add nuw nsw i64 [[TMP11]], 1			; CHECK-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B]], i64 [[TMP12]]
	; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*
	; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[A]], i64 [[TMP12]]
	; CHECK-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[B1]], [[SCEVGEP45]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[A3]], [[SCEVGEP2]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N]], 2			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N]], 2
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP13:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP13:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[TMP14:%.*]] = urem i32 [[TMP13]], 4			; CHECK-NEXT: [[TMP14:%.*]] = urem i32 [[TMP13]], 4
	; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[TMP14]]			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[TMP14]]
	; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP15]], i32 0			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP15]], i32 0
	; CHECK-NEXT: [[TMP17:%.*]] = bitcast i32* [[TMP16]] to <2 x i32>*			; CHECK-NEXT: [[TMP17:%.*]] = bitcast i32* [[TMP16]] to <2 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP17]], align 4, !alias.scope !0			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP17]], align 4
	; CHECK-NEXT: [[TMP18:%.*]] = add <2 x i32> [[WIDE_LOAD]], <i32 10, i32 10>			; CHECK-NEXT: [[TMP18:%.*]] = add <2 x i32> [[WIDE_LOAD]], <i32 10, i32 10>
	; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[B]], i32 [[TMP13]]			; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[B]], i32 [[TMP13]]
	; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i32 0			; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i32 0
	; CHECK-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <2 x i32>*			; CHECK-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <2 x i32>*
	; CHECK-NEXT: store <2 x i32> [[TMP18]], <2 x i32>* [[TMP21]], align 4, !alias.scope !3, !noalias !0			; CHECK-NEXT: store <2 x i32> [[TMP18]], <2 x i32>* [[TMP21]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
	; CHECK-NEXT: [[TMP22:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP22:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP22]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP22]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ], [ 0, [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ], [ 0, [[VECTOR_MEMCHECK]] ]
	exit:			exit:
	ret void			ret void
	}			}

	; The relevant bounds for %gep.A are [%A, %A+4).			; The relevant bounds for %gep.A are [%A, %A+4).
	define void @store_clamped_index(i32* %A, i32* %B, i32 %N) {			define void @store_clamped_index(i32* %A, i32* %B, i32 %N) {
	; CHECK-LABEL: @store_clamped_index(			; CHECK-LABEL: @store_clamped_index(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[B1:%.*]] = bitcast i32* [[B:%.*]] to i8*			; CHECK-NEXT: [[B2:%.*]] = ptrtoint i32* [[B:%.*]] to i64
	; CHECK-NEXT: [[A3:%.*]] = bitcast i32* [[A:%.*]] to i8*			; CHECK-NEXT: [[A1:%.*]] = ptrtoint i32* [[A:%.*]] to i64
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N:%.*]], 2			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N:%.*]], 2
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
	; CHECK: vector.scevcheck:			; CHECK: vector.scevcheck:
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[N]], -1			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[N]], -1
	; CHECK-NEXT: [[TMP7:%.*]] = icmp ugt i32 [[TMP0]], 3			; CHECK-NEXT: [[TMP7:%.*]] = icmp ugt i32 [[TMP0]], 3
	; CHECK-NEXT: br i1 [[TMP7]], label [[SCALAR_PH]], label [[VECTOR_MEMCHECK:%.*]]			; CHECK-NEXT: br i1 [[TMP7]], label [[SCALAR_PH]], label [[VECTOR_MEMCHECK:%.*]]
	; CHECK: vector.memcheck:			; CHECK: vector.memcheck:
	; CHECK-NEXT: [[TMP10:%.*]] = add i32 [[N]], -1			; CHECK-NEXT: [[TMP2:%.*]] = sub i64 [[A1]], [[B2]]
	; CHECK-NEXT: [[TMP11:%.*]] = zext i32 [[TMP10]] to i64			; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP2]], 8
	; CHECK-NEXT: [[TMP12:%.*]] = add nuw nsw i64 [[TMP11]], 1			; CHECK-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B]], i64 [[TMP12]]
	; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*
	; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[A]], i64 [[TMP12]]
	; CHECK-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[B1]], [[SCEVGEP45]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[A3]], [[SCEVGEP2]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N]], 2			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N]], 2
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP13:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP13:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[TMP14:%.*]] = urem i32 [[TMP13]], 4			; CHECK-NEXT: [[TMP14:%.*]] = urem i32 [[TMP13]], 4
	; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[B]], i32 [[TMP13]]			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[B]], i32 [[TMP13]]
	; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP15]], i32 0			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP15]], i32 0
	; CHECK-NEXT: [[TMP17:%.*]] = bitcast i32* [[TMP16]] to <2 x i32>*			; CHECK-NEXT: [[TMP17:%.*]] = bitcast i32* [[TMP16]] to <2 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP17]], align 4, !alias.scope !8, !noalias !11			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP17]], align 4
	; CHECK-NEXT: [[TMP18:%.*]] = add <2 x i32> [[WIDE_LOAD]], <i32 10, i32 10>			; CHECK-NEXT: [[TMP18:%.*]] = add <2 x i32> [[WIDE_LOAD]], <i32 10, i32 10>
	; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[TMP14]]			; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[TMP14]]
	; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i32 0			; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP19]], i32 0
	; CHECK-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <2 x i32>*			; CHECK-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <2 x i32>*
	; CHECK-NEXT: store <2 x i32> [[TMP18]], <2 x i32>* [[TMP21]], align 4, !alias.scope !11			; CHECK-NEXT: store <2 x i32> [[TMP18]], <2 x i32>* [[TMP21]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
	; CHECK-NEXT: [[TMP22:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP22:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP22]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP22]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ], [ 0, [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ], [ 0, [[VECTOR_MEMCHECK]] ]

llvm/test/Transforms/LoopVectorize/runtime-check.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S | FileCheck %s
	; RUN: opt < %s -loop-vectorize -disable-basic-aa -S -pass-remarks-analysis='loop-vectorize' 2>&1 | FileCheck %s -check-prefix=FORCED_OPTSIZE			; RUN: opt < %s -loop-vectorize -disable-basic-aa -S -pass-remarks-analysis='loop-vectorize' 2>&1 | FileCheck %s -check-prefix=FORCED_OPTSIZE

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; Make sure we vectorize this loop:			; Make sure we vectorize this loop:
	; int foo(float *a, float *b, int n) {			; int foo(float *a, float *b, int n) {
	; for (int i=0; i<n; ++i)			; for (int i=0; i<n; ++i)
	; a[i] = b[i] * 3;			; a[i] = b[i] * 3;
	; }			; }

	define i32 @foo(float* nocapture %a, float* nocapture %b, i32 %n) nounwind uwtable ssp {			define i32 @foo(float* nocapture %a, float* nocapture %b, i32 %n) nounwind uwtable ssp {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP6:%.*]] = icmp sgt i32 [[N:%.*]], 0, [[DBG4:!dbg !.*]]			; CHECK-NEXT: [[B2:%.*]] = ptrtoint float* [[B:%.*]] to i64, [[DBG4:!dbg !.*]]
				; CHECK-NEXT: [[A1:%.*]] = ptrtoint float* [[A:%.*]] to i64, [[DBG4]]
				; CHECK-NEXT: [[CMP6:%.*]] = icmp sgt i32 [[N:%.*]], 0, [[DBG4]]
	; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]], [[DBG4]]			; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]], [[DBG4]]
	; CHECK: for.body.preheader:			; CHECK: for.body.preheader:
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[N]], -1, [[DBG9:!dbg !.*]]			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[N]], -1, [[DBG9:!dbg !.*]]
	; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64, [[DBG9]]			; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64, [[DBG9]]
	; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1, [[DBG9]]			; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1, [[DBG9]]
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 3, [[DBG9]]			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 3, [[DBG9]]
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]], [[DBG9]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]], [[DBG9]]
	; CHECK: vector.memcheck:			; CHECK: vector.memcheck:
	; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[N]], -1, [[DBG9]]			; CHECK-NEXT: [[TMP3:%.*]] = sub i64 [[A1]], [[B2]], [[DBG9]]
	; CHECK-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64, [[DBG9]]			; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP3]], 16, [[DBG9]]
	; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], 1, [[DBG9]]			; CHECK-NEXT: br i1 [[DIFF_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]], [[DBG9]]
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr float, float* [[A:%.*]], i64 [[TMP5]], [[DBG9]]
	; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr float, float* [[B:%.*]], i64 [[TMP5]], [[DBG9]]
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt float* [[SCEVGEP4]], [[A]], [[DBG9]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt float* [[SCEVGEP]], [[B]], [[DBG9]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]], [[DBG9]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]], [[DBG9]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP2]], 8589934588, [[DBG9]]			; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP2]], 8589934588, [[DBG9]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]], [[DBG9]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]], [[DBG9]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ], [[DBG9]]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ], [[DBG9]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDEX]], [[DBG9]]			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDEX]], [[DBG9]]
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[TMP6]] to <4 x float>*, [[DBG9]]			; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[TMP6]] to <4 x float>*, [[DBG9]]
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4, [[DBG9]], !alias.scope !10			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4, [[DBG9]]
	; CHECK-NEXT: [[TMP8:%.*]] = fmul <4 x float> [[WIDE_LOAD]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>, [[DBG9]]			; CHECK-NEXT: [[TMP8:%.*]] = fmul <4 x float> [[WIDE_LOAD]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>, [[DBG9]]
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDEX]], [[DBG9]]			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDEX]], [[DBG9]]
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast float* [[TMP9]] to <4 x float>*, [[DBG9]]			; CHECK-NEXT: [[TMP10:%.*]] = bitcast float* [[TMP9]] to <4 x float>*, [[DBG9]]
	; CHECK-NEXT: store <4 x float> [[TMP8]], <4 x float>* [[TMP10]], align 4, [[DBG9]], !alias.scope !13, !noalias !10			; CHECK-NEXT: store <4 x float> [[TMP8]], <4 x float>* [[TMP10]], align 4, [[DBG9]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4, [[DBG9]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4, [[DBG9]]
	; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]], [[DBG9]]			; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]], [[DBG9]]
	; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[DBG9]], [[LOOP15:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[DBG9]], [[LOOP15:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]], [[DBG9]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]], [[DBG9]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]], [[DBG9]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]], [[DBG9]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ 0, [[VECTOR_MEMCHECK]] ], [[DBG9]]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ 0, [[VECTOR_MEMCHECK]] ], [[DBG9]]

llvm/test/Transforms/LoopVectorize/runtime-checks-difference.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt %s -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S | FileCheck %s		; RUN: opt %s -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S | FileCheck %s

target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

define void @same_step_and_size(i32* %a, i32* %b, i64 %n) {		define void @same_step_and_size(i32* %a, i32* %b, i64 %n) {
; CHECK-LABEL: @same_step_and_size(		; CHECK-LABEL: @same_step_and_size(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[B1:%.*]] = bitcast i32* [[B:%.*]] to i8*		; CHECK-NEXT: [[A2:%.*]] = ptrtoint i32* [[A:%.*]] to i64
; CHECK-NEXT: [[A3:%.*]] = bitcast i32* [[A:%.*]] to i8*		; CHECK-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %scalar.ph, label %vector.memcheck		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %scalar.ph, label %vector.memcheck
; CHECK: vector.memcheck:		; CHECK: vector.memcheck:
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B]], i64 [[N]]		; CHECK-NEXT: [[TMP0:%.*]] = sub i64 [[B1]], [[A2]]
; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*		; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 16
; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[A]], i64 [[N]]		; CHECK-NEXT: br i1 [[DIFF_CHECK]], label %scalar.ph, label %vector.ph
; CHECK-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*
; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[B1]], [[SCEVGEP45]]
; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[A3]], [[SCEVGEP2]]
; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label %scalar.ph, label %vector.ph
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
%gep.a = getelementptr inbounds i32, i32* %a, i64 %iv		%gep.a = getelementptr inbounds i32, i32* %a, i64 %iv
%l = load i32, i32* %gep.a		%l = load i32, i32* %gep.a
%mul = mul nsw i32 %l, 3		%mul = mul nsw i32 %l, 3
%gep.b = getelementptr inbounds i32, i32* %b, i64 %iv		%gep.b = getelementptr inbounds i32, i32* %b, i64 %iv
store i32 %mul, i32* %gep.b		store i32 %mul, i32* %gep.b
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, %n		%exitcond = icmp eq i64 %iv.next, %n
br i1 %exitcond, label %exit, label %loop		br i1 %exitcond, label %exit, label %loop

exit:		exit:
ret void		ret void
}		}

define void @same_step_and_size_no_dominance_between_accesses(i32* %a, i32* %b, i64 %n, i64 %x) {		define void @same_step_and_size_no_dominance_between_accesses(i32* %a, i32* %b, i64 %n, i64 %x) {
; CHECK-LABEL: @same_step_and_size_no_dominance_between_accesses(		; CHECK-LABEL: @same_step_and_size_no_dominance_between_accesses(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[B1:%.*]] = bitcast i32* [[B:%.*]] to i8*		; CHECK-NEXT: [[B2:%.*]] = ptrtoint i32* [[B:%.*]] to i64
; CHECK-NEXT: [[A3:%.*]] = bitcast i32* [[A:%.*]] to i8*		; CHECK-NEXT: [[A1:%.*]] = ptrtoint i32* [[A:%.*]] to i64
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %scalar.ph, label %vector.memcheck		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %scalar.ph, label %vector.memcheck
; CHECK: vector.memcheck:		; CHECK: vector.memcheck:
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B]], i64 [[N]]		; CHECK-NEXT: [[TMP0:%.*]] = sub i64 [[A1]], [[B2]]
; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*		; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 16
; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[A]], i64 [[N]]		; CHECK-NEXT: br i1 [[DIFF_CHECK]], label %scalar.ph, label %vector.ph
; CHECK-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*
; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[B1]], [[SCEVGEP45]]
; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[A3]], [[SCEVGEP2]]
; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label %scalar.ph, label %vector.ph
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
%cmp = icmp ne i64 %iv, %x		%cmp = icmp ne i64 %iv, %x
br i1 %cmp, label %then, label %else		br i1 %cmp, label %then, label %else

exit:		exit:
ret void		ret void
}		}

define void @steps_match_but_different_access_sizes_1([2 x i16]* %a, i32* %b, i64 %n) {		define void @steps_match_but_different_access_sizes_1([2 x i16]* %a, i32* %b, i64 %n) {
; CHECK-LABEL: @steps_match_but_different_access_sizes_1(		; CHECK-LABEL: @steps_match_but_different_access_sizes_1(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[B1:%.*]] = bitcast i32* [[B:%.*]] to i8*		; CHECK-NEXT: [[A2:%.*]] = ptrtoint [2 x i16]* [[A:%.*]] to i64
		; CHECK-NEXT: [[B1:%.*]] = ptrtoint i32* [[B:%.*]] to i64
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %scalar.ph, label %vector.memcheck		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %scalar.ph, label %vector.memcheck
; CHECK: vector.memcheck:		; CHECK: vector.memcheck:
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B]], i64 [[N]]		; CHECK-NEXT: [[TMP0:%.*]] = add nuw i64 [[A2]], 2
; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*		; CHECK-NEXT: [[TMP1:%.*]] = sub i64 [[B1]], [[TMP0]]
; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A:%.*]], i64 0, i64 1		; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP1]], 16
; CHECK-NEXT: [[SCEVGEP34:%.*]] = bitcast i16* [[SCEVGEP3]] to i8*		; CHECK-NEXT: br i1 [[DIFF_CHECK]], label %scalar.ph, label %vector.ph
; CHECK-NEXT: [[SCEVGEP5:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A]], i64 [[N]], i64 0
; CHECK-NEXT: [[SCEVGEP56:%.*]] = bitcast i16* [[SCEVGEP5]] to i8*
; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[B1]], [[SCEVGEP56]]
; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[SCEVGEP34]], [[SCEVGEP2]]
; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label %scalar.ph, label %vector.ph
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
%gep.a = getelementptr inbounds [2 x i16], [2 x i16]* %a, i64 %iv, i64 1		%gep.a = getelementptr inbounds [2 x i16], [2 x i16]* %a, i64 %iv, i64 1
%l = load i16, i16* %gep.a		%l = load i16, i16* %gep.a
exit:		exit:
ret void		ret void
}		}

; Same as @steps_match_but_different_access_sizes_1, but with source and sink		; Same as @steps_match_but_different_access_sizes_1, but with source and sink
; accesses flipped.		; accesses flipped.
define void @steps_match_but_different_access_sizes_2([2 x i16]* %a, i32* %b, i64 %n) {		define void @steps_match_but_different_access_sizes_2([2 x i16]* %a, i32* %b, i64 %n) {
; CHECK-LABEL: @steps_match_but_different_access_sizes_2(		; CHECK-LABEL: @steps_match_but_different_access_sizes_2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[B4:%.*]] = bitcast i32* [[B:%.*]] to i8*		; CHECK-NEXT: [[B2:%.*]] = ptrtoint i32* [[B:%.*]] to i64
		; CHECK-NEXT: [[A1:%.*]] = ptrtoint [2 x i16]* [[A:%.*]] to i64
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %scalar.ph, label %vector.memcheck		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %scalar.ph, label %vector.memcheck
; CHECK: vector.memcheck:		; CHECK: vector.memcheck:
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A:%.*]], i64 0, i64 1		; CHECK-NEXT: [[TMP0:%.*]] = add nuw i64 [[A1]], 2
; CHECK-NEXT: [[SCEVGEP1:%.*]] = bitcast i16* [[SCEVGEP]] to i8*		; CHECK-NEXT: [[TMP1:%.*]] = sub i64 [[TMP0]], [[B2]]
; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr [2 x i16], [2 x i16]* [[A]], i64 [[N]], i64 0		; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP1]], 16
; CHECK-NEXT: [[SCEVGEP23:%.*]] = bitcast i16* [[SCEVGEP2]] to i8*		; CHECK-NEXT: br i1 [[DIFF_CHECK]], label %scalar.ph, label %vector.ph
; CHECK-NEXT: [[SCEVGEP5:%.*]] = getelementptr i32, i32* [[B]], i64 [[N]]
; CHECK-NEXT: [[SCEVGEP56:%.*]] = bitcast i32* [[SCEVGEP5]] to i8*
; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[SCEVGEP1]], [[SCEVGEP56]]
; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[B4]], [[SCEVGEP23]]
; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label %scalar.ph, label %vector.ph
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
%gep.b = getelementptr inbounds i32, i32* %b, i64 %iv		%gep.b = getelementptr inbounds i32, i32* %b, i64 %iv
%l = load i32, i32* %gep.b		%l = load i32, i32* %gep.b
Show All 11 Lines
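
Reading aid (not part of the patch): the new `vector.memcheck` blocks in runtime-checks-difference.ll reduce to one `sub` and one `icmp ult` against a constant. The sketch below restates that check in plain C++ to show where the constant 16 appears to come from, assuming it is VF * UF * AccessSize as described in the patch summary (4 * 1 * 4 bytes for `-force-vector-width=4 -force-vector-interleave=1` and i32 accesses). The function name `diffCheckConflicts` and its parameters are hypothetical and do not exist in LLVM.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only. It mirrors the shape of the
// emitted IR:
//   %diff       = sub i64 %sink, %src
//   %diff.check = icmp ult i64 %diff, VF * UF * AccessSize
// A true result corresponds to branching to scalar.ph; a false result
// corresponds to branching to vector.ph.
bool diffCheckConflicts(std::uint64_t SinkAddr, std::uint64_t SrcAddr,
                        std::uint64_t VF, std::uint64_t UF,
                        std::uint64_t AccessSizeInBytes) {
  // Unsigned subtraction wraps around when SrcAddr > SinkAddr, giving a very
  // large value that fails the "less than" test, so the vector loop is taken.
  const std::uint64_t Diff = SinkAddr - SrcAddr;
  return Diff < VF * UF * AccessSizeInBytes;
}
```

Under the same assumption, the constant 8 in `@store_clamped_index` is consistent with VF = 2, UF = 1 and 4-byte i32 accesses.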

llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll

	; CHECKUF1-DAG: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2			; CHECKUF1-DAG: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2
	; CHECKUF1-DAG: %n.mod.vf = urem i64 %wide.trip.count, %[[VSCALEX4]]			; CHECKUF1-DAG: %n.mod.vf = urem i64 %wide.trip.count, %[[VSCALEX4]]
	; CHECKUF1: %n.vec = sub nsw i64 %wide.trip.count, %n.mod.vf			; CHECKUF1: %n.vec = sub nsw i64 %wide.trip.count, %n.mod.vf

	; CHECKUF1: vector.body:			; CHECKUF1: vector.body:
	; CHECKUF1: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECKUF1: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECKUF1: %[[IDXB:.*]] = getelementptr inbounds double, double* %b, i64 %index			; CHECKUF1: %[[IDXB:.*]] = getelementptr inbounds double, double* %b, i64 %index
	; CHECKUF1: %[[IDXB_CAST:.*]] = bitcast double* %[[IDXB]] to <vscale x 4 x double>*			; CHECKUF1: %[[IDXB_CAST:.*]] = bitcast double* %[[IDXB]] to <vscale x 4 x double>*
	; CHECKUF1: %wide.load = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_CAST]], align 8, !alias.scope !0			; CHECKUF1: %wide.load = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_CAST]], align 8
	; CHECKUF1: %[[FADD:.*]] = fadd <vscale x 4 x double> %wide.load, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> poison, double 1.000000e+00, i32 0), <vscale x 4 x double> poison, <vscale x 4 x i32> zeroinitializer)			; CHECKUF1: %[[FADD:.*]] = fadd <vscale x 4 x double> %wide.load, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> poison, double 1.000000e+00, i32 0), <vscale x 4 x double> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECKUF1: %[[IDXA:.*]] = getelementptr inbounds double, double* %a, i64 %index			; CHECKUF1: %[[IDXA:.*]] = getelementptr inbounds double, double* %a, i64 %index
	; CHECKUF1: %[[IDXA_CAST:.*]] = bitcast double* %[[IDXA]] to <vscale x 4 x double>*			; CHECKUF1: %[[IDXA_CAST:.*]] = bitcast double* %[[IDXA]] to <vscale x 4 x double>*
	; CHECKUF1: store <vscale x 4 x double> %[[FADD]], <vscale x 4 x double>* %[[IDXA_CAST]], align 8, !alias.scope !3, !noalias !0			; CHECKUF1: store <vscale x 4 x double> %[[FADD]], <vscale x 4 x double>* %[[IDXA_CAST]], align 8
	; CHECKUF1: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()			; CHECKUF1: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
	; CHECKUF1: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2			; CHECKUF1: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2
	; CHECKUF1: %index.next = add nuw i64 %index, %[[VSCALEX4]]			; CHECKUF1: %index.next = add nuw i64 %index, %[[VSCALEX4]]
	; CHECKUF1: %[[CMP:.*]] = icmp eq i64 %index.next, %n.vec			; CHECKUF1: %[[CMP:.*]] = icmp eq i64 %index.next, %n.vec
	; CHECKUF1: br i1 %[[CMP]], label %middle.block, label %vector.body, !llvm.loop !5			; CHECKUF1: br i1 %[[CMP]], label %middle.block, label %vector.body, !llvm.loop !0


	; For an interleave factor of 2, vscale is scaled by 8 instead of 4 (and thus shifted left by 3 instead of 2).			; For an interleave factor of 2, vscale is scaled by 8 instead of 4 (and thus shifted left by 3 instead of 2).
	; There is also the increment for the next iteration, e.g. instead of indexing IDXB, it indexes at IDXB + vscale * 4.			; There is also the increment for the next iteration, e.g. instead of indexing IDXB, it indexes at IDXB + vscale * 4.

	; CHECKUF2: for.body.preheader:			; CHECKUF2: for.body.preheader:
	; CHECKUF2-DAG: %wide.trip.count = zext i32 %N to i64			; CHECKUF2-DAG: %wide.trip.count = zext i32 %N to i64
	; CHECKUF2-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()			; CHECKUF2-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
	; CHECKUF2-DAG: %[[VSCALEX8:.*]] = shl i64 %[[VSCALE]], 3			; CHECKUF2-DAG: %[[VSCALEX8:.*]] = shl i64 %[[VSCALE]], 3
	; CHECKUF2-DAG: %min.iters.check = icmp ugt i64 %[[VSCALEX8]], %wide.trip.count			; CHECKUF2-DAG: %min.iters.check = icmp ugt i64 %[[VSCALEX8]], %wide.trip.count

	; CHECKUF2: vector.ph:			; CHECKUF2: vector.ph:
	; CHECKUF2-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()			; CHECKUF2-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
	; CHECKUF2-DAG: %[[VSCALEX8:.*]] = shl i64 %[[VSCALE]], 3			; CHECKUF2-DAG: %[[VSCALEX8:.*]] = shl i64 %[[VSCALE]], 3
	; CHECKUF2-DAG: %n.mod.vf = urem i64 %wide.trip.count, %[[VSCALEX8]]			; CHECKUF2-DAG: %n.mod.vf = urem i64 %wide.trip.count, %[[VSCALEX8]]
	; CHECKUF2: %n.vec = sub nsw i64 %wide.trip.count, %n.mod.vf			; CHECKUF2: %n.vec = sub nsw i64 %wide.trip.count, %n.mod.vf

	; CHECKUF2: vector.body:			; CHECKUF2: vector.body:
	; CHECKUF2: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECKUF2: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECKUF2: %[[IDXB:.*]] = getelementptr inbounds double, double* %b, i64 %index			; CHECKUF2: %[[IDXB:.*]] = getelementptr inbounds double, double* %b, i64 %index
	; CHECKUF2: %[[IDXB_CAST:.*]] = bitcast double* %[[IDXB]] to <vscale x 4 x double>*			; CHECKUF2: %[[IDXB_CAST:.*]] = bitcast double* %[[IDXB]] to <vscale x 4 x double>*
	; CHECKUF2: %wide.load = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_CAST]], align 8, !alias.scope !0			; CHECKUF2: %wide.load = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_CAST]], align 8
	; CHECKUF2: %[[VSCALE:.*]] = call i32 @llvm.vscale.i32()			; CHECKUF2: %[[VSCALE:.*]] = call i32 @llvm.vscale.i32()
	; CHECKUF2: %[[VSCALE2:.*]] = shl i32 %[[VSCALE]], 2			; CHECKUF2: %[[VSCALE2:.*]] = shl i32 %[[VSCALE]], 2
	; CHECKUF2: %[[VSCALE2_EXT:.*]] = sext i32 %[[VSCALE2]] to i64			; CHECKUF2: %[[VSCALE2_EXT:.*]] = sext i32 %[[VSCALE2]] to i64
	; CHECKUF2: %[[IDXB_NEXT:.*]] = getelementptr inbounds double, double* %[[IDXB]], i64 %[[VSCALE2_EXT]]			; CHECKUF2: %[[IDXB_NEXT:.*]] = getelementptr inbounds double, double* %[[IDXB]], i64 %[[VSCALE2_EXT]]
	; CHECKUF2: %[[IDXB_NEXT_CAST:.*]] = bitcast double* %[[IDXB_NEXT]] to <vscale x 4 x double>*			; CHECKUF2: %[[IDXB_NEXT_CAST:.*]] = bitcast double* %[[IDXB_NEXT]] to <vscale x 4 x double>*
	; CHECKUF2: %wide.load{{[0-9]+}} = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_NEXT_CAST]], align 8, !alias.scope !0			; CHECKUF2: %wide.load{{[0-9]+}} = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_NEXT_CAST]], align 8
	; CHECKUF2: %[[FADD:.*]] = fadd <vscale x 4 x double> %wide.load, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> poison, double 1.000000e+00, i32 0), <vscale x 4 x double> poison, <vscale x 4 x i32> zeroinitializer)			; CHECKUF2: %[[FADD:.*]] = fadd <vscale x 4 x double> %wide.load, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> poison, double 1.000000e+00, i32 0), <vscale x 4 x double> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECKUF2: %[[FADD_NEXT:.*]] = fadd <vscale x 4 x double> %wide.load{{[0-9]+}}, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> poison, double 1.000000e+00, i32 0), <vscale x 4 x double> poison, <vscale x 4 x i32> zeroinitializer)			; CHECKUF2: %[[FADD_NEXT:.*]] = fadd <vscale x 4 x double> %wide.load{{[0-9]+}}, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> poison, double 1.000000e+00, i32 0), <vscale x 4 x double> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECKUF2: %[[IDXA:.*]] = getelementptr inbounds double, double* %a, i64 %index			; CHECKUF2: %[[IDXA:.*]] = getelementptr inbounds double, double* %a, i64 %index
	; CHECKUF2: %[[IDXA_CAST:.*]] = bitcast double* %[[IDXA]] to <vscale x 4 x double>*			; CHECKUF2: %[[IDXA_CAST:.*]] = bitcast double* %[[IDXA]] to <vscale x 4 x double>*
	; CHECKUF2: store <vscale x 4 x double> %[[FADD]], <vscale x 4 x double>* %[[IDXA_CAST]], align 8, !alias.scope !3, !noalias !0			; CHECKUF2: store <vscale x 4 x double> %[[FADD]], <vscale x 4 x double>* %[[IDXA_CAST]], align 8
	; CHECKUF2: %[[VSCALE:.*]] = call i32 @llvm.vscale.i32()			; CHECKUF2: %[[VSCALE:.*]] = call i32 @llvm.vscale.i32()
	; CHECKUF2: %[[VSCALE2:.*]] = shl i32 %[[VSCALE]], 2			; CHECKUF2: %[[VSCALE2:.*]] = shl i32 %[[VSCALE]], 2
	; CHECKUF2: %[[VSCALE2_EXT:.*]] = sext i32 %[[VSCALE2]] to i64			; CHECKUF2: %[[VSCALE2_EXT:.*]] = sext i32 %[[VSCALE2]] to i64
	; CHECKUF2: %[[IDXA_NEXT:.*]] = getelementptr inbounds double, double* %[[IDXA]], i64 %[[VSCALE2_EXT]]			; CHECKUF2: %[[IDXA_NEXT:.*]] = getelementptr inbounds double, double* %[[IDXA]], i64 %[[VSCALE2_EXT]]
	; CHECKUF2: %[[IDXA_NEXT_CAST:.*]] = bitcast double* %[[IDXA_NEXT]] to <vscale x 4 x double>*			; CHECKUF2: %[[IDXA_NEXT_CAST:.*]] = bitcast double* %[[IDXA_NEXT]] to <vscale x 4 x double>*
	; CHECKUF2: store <vscale x 4 x double> %[[FADD_NEXT]], <vscale x 4 x double>* %[[IDXA_NEXT_CAST]], align 8, !alias.scope !3, !noalias !0			; CHECKUF2: store <vscale x 4 x double> %[[FADD_NEXT]], <vscale x 4 x double>* %[[IDXA_NEXT_CAST]], align 8
	; CHECKUF2: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()			; CHECKUF2: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
	; CHECKUF2: %[[VSCALEX8:.*]] = shl i64 %[[VSCALE]], 3			; CHECKUF2: %[[VSCALEX8:.*]] = shl i64 %[[VSCALE]], 3
	; CHECKUF2: %index.next = add nuw i64 %index, %[[VSCALEX8]]			; CHECKUF2: %index.next = add nuw i64 %index, %[[VSCALEX8]]
	; CHECKUF2: %[[CMP:.*]] = icmp eq i64 %index.next, %n.vec			; CHECKUF2: %[[CMP:.*]] = icmp eq i64 %index.next, %n.vec
	; CHECKUF2: br i1 %[[CMP]], label %middle.block, label %vector.body, !llvm.loop !5			; CHECKUF2: br i1 %[[CMP]], label %middle.block, label %vector.body, !llvm.loop !0

	define void @loop(i32 %N, double* nocapture %a, double* nocapture readonly %b) {			define void @loop(i32 %N, double* nocapture %a, double* nocapture readonly %b) {
	entry:			entry:
	%cmp7 = icmp sgt i32 %N, 0			%cmp7 = icmp sgt i32 %N, 0
	br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup			br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

	for.body.preheader: ; preds = %entry			for.body.preheader: ; preds = %entry
	%wide.trip.count = zext i32 %N to i64			%wide.trip.count = zext i32 %N to i64
	[... 19 lines not shown ...]

llvm/test/Transforms/LoopVectorize/tbaa-nodep.ll

[... 11 lines not shown ...]

; CHECK: load <4 x float>, <4 x float>* %{{.*}}, align 4, !tbaa		; CHECK: load <4 x float>, <4 x float>* %{{.*}}, align 4, !tbaa
; CHECK: store <4 x i32> %{{.*}}, <4 x i32>* %{{.*}}, align 4, !tbaa		; CHECK: store <4 x i32> %{{.*}}, <4 x i32>* %{{.*}}, align 4, !tbaa

; CHECK: ret i32 0		; CHECK: ret i32 0

; CHECK-NOTBAA-LABEL: @test1		; CHECK-NOTBAA-LABEL: @test1
; CHECK-NOTBAA: entry:		; CHECK-NOTBAA: entry:
; CHECK-NOTBAA: icmp ugt i32*		; CHECK-NOTBAA: icmp ult i64
; CHECK-NOTBAA: icmp ugt float*
; CHECK-NOTBAA-NOT: icmp		; CHECK-NOTBAA-NOT: icmp
; CHECK-NOTBAA: br i1 {{.+}}, label %for.body, label %vector.body		; CHECK-NOTBAA: br i1 {{.+}}, label %for.body, label %vector.body

; CHECK-NOTBAA: load <4 x float>, <4 x float>* %{{.*}}, align 4, !tbaa		; CHECK-NOTBAA: load <4 x float>, <4 x float>* %{{.*}}, align 4, !tbaa
; CHECK-NOTBAA: store <4 x i32> %{{.*}}, <4 x i32>* %{{.*}}, align 4, !tbaa		; CHECK-NOTBAA: store <4 x i32> %{{.*}}, <4 x i32>* %{{.*}}, align 4, !tbaa

; CHECK-NOTBAA: ret i32 0		; CHECK-NOTBAA: ret i32 0

[... 15 lines not shown ...]
for.end: ; preds = %for.body
ret i32 0		ret i32 0
}		}

; This test is like the first, except here there is still one runtime check		; This test is like the first, except here there is still one runtime check
; required. Without TBAA, however, two checks are required.		; required. Without TBAA, however, two checks are required.
define i32 @test2(i32* nocapture readonly %a, float* nocapture readonly %b, float* nocapture %c) {		define i32 @test2(i32* nocapture readonly %a, float* nocapture readonly %b, float* nocapture %c) {
; CHECK-LABEL: @test2		; CHECK-LABEL: @test2
; CHECK: entry:		; CHECK: entry:
; CHECK: icmp ugt float*		; CHECK: icmp ult i64
; CHECK: icmp ugt float*		; CHECK-NOT: icmp
; CHECK-NOT: icmp uge i32*
; CHECK: br i1 {{.+}}, label %for.body, label %vector.body		; CHECK: br i1 {{.+}}, label %for.body, label %vector.body

; CHECK: load <4 x float>, <4 x float>* %{{.*}}, align 4, !tbaa		; CHECK: load <4 x float>, <4 x float>* %{{.*}}, align 4, !tbaa
; CHECK: store <4 x float> %{{.*}}, <4 x float>* %{{.*}}, align 4, !tbaa		; CHECK: store <4 x float> %{{.*}}, <4 x float>* %{{.*}}, align 4, !tbaa

; CHECK: ret i32 0		; CHECK: ret i32 0

; CHECK-NOTBAA-LABEL: @test2		; CHECK-NOTBAA-LABEL: @test2
; CHECK-NOTBAA: entry:		; CHECK-NOTBAA: entry:
; CHECK-NOTBAA: icmp ugt float*		; CHECK-NOTBAA: icmp ult i64
; CHECK-NOTBAA: icmp ugt float*		; CHECK-NOTBAA: icmp ult i64
; CHECK-NOTBAA-DAG: icmp ugt float*
; CHECK-NOTBAA-DAG: icmp ugt i32*
; CHECK-NOTBAA-NOT: icmp		; CHECK-NOTBAA-NOT: icmp
; CHECK-NOTBAA: br i1 {{.+}}, label %for.body, label %vector.body		; CHECK-NOTBAA: br i1 {{.+}}, label %for.body, label %vector.body

; CHECK-NOTBAA: load <4 x float>, <4 x float>* %{{.*}}, align 4, !tbaa		; CHECK-NOTBAA: load <4 x float>, <4 x float>* %{{.*}}, align 4, !tbaa
; CHECK-NOTBAA: store <4 x float> %{{.*}}, <4 x float>* %{{.*}}, align 4, !tbaa		; CHECK-NOTBAA: store <4 x float> %{{.*}}, <4 x float>* %{{.*}}, align 4, !tbaa

; CHECK-NOTBAA: ret i32 0		; CHECK-NOTBAA: ret i32 0

[... 27 lines not shown ...]

llvm/test/Transforms/PhaseOrdering/AArch64/hoisting-sinking-required-for-vectorization.ll

	[... 35 lines not shown ...]
	return: ; preds = %if.end3, %if.then2, %if.then			return: ; preds = %if.end3, %if.then2, %if.then
	%3 = load double, double* %retval, align 8			%3 = load double, double* %retval, align 8
	ret double %3			ret double %3
	}			}

	define void @loop(double* %X, double* %Y) {			define void @loop(double* %X, double* %Y) {
	; CHECK-LABEL: @loop(			; CHECK-LABEL: @loop(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr double, double* [[X:%.*]], i64 20000			; CHECK-NEXT: [[X6:%.*]] = ptrtoint double* [[X:%.*]] to i64
	; CHECK-NEXT: [[SCEVGEP9:%.*]] = getelementptr double, double* [[Y:%.*]], i64 20000			; CHECK-NEXT: [[Y7:%.*]] = ptrtoint double* [[Y:%.*]] to i64
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt double* [[SCEVGEP9]], [[X]]			; CHECK-NEXT: [[TMP0:%.*]] = sub i64 [[X6]], [[Y7]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt double* [[SCEVGEP]], [[Y]]			; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 32
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; CHECK-NEXT: br i1 [[DIFF_CHECK]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY:%.*]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[INDEX]] to i64			; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[INDEX]] to i64
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[TMP1]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[TMP1]] to <2 x double>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8, !alias.scope !0			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds double, double* [[TMP1]], i64 2			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds double, double* [[TMP1]], i64 2
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TMP3]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TMP3]] to <2 x double>*
	; CHECK-NEXT: [[WIDE_LOAD11:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8, !alias.scope !0			; CHECK-NEXT: [[WIDE_LOAD11:%.*]] = load <2 x double>, <2 x double>* [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = fcmp olt <2 x double> [[WIDE_LOAD]], zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = fcmp olt <2 x double> [[WIDE_LOAD]], zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = fcmp olt <2 x double> [[WIDE_LOAD11]], zeroinitializer			; CHECK-NEXT: [[TMP6:%.*]] = fcmp olt <2 x double> [[WIDE_LOAD11]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = fcmp ogt <2 x double> [[WIDE_LOAD]], <double 6.000000e+00, double 6.000000e+00>			; CHECK-NEXT: [[TMP7:%.*]] = fcmp ogt <2 x double> [[WIDE_LOAD]], <double 6.000000e+00, double 6.000000e+00>
	; CHECK-NEXT: [[TMP8:%.*]] = fcmp ogt <2 x double> [[WIDE_LOAD11]], <double 6.000000e+00, double 6.000000e+00>			; CHECK-NEXT: [[TMP8:%.*]] = fcmp ogt <2 x double> [[WIDE_LOAD11]], <double 6.000000e+00, double 6.000000e+00>
	; CHECK-NEXT: [[TMP9:%.*]] = select <2 x i1> [[TMP7]], <2 x double> <double 6.000000e+00, double 6.000000e+00>, <2 x double> [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP9:%.*]] = select <2 x i1> [[TMP7]], <2 x double> <double 6.000000e+00, double 6.000000e+00>, <2 x double> [[WIDE_LOAD]]
	; CHECK-NEXT: [[TMP10:%.*]] = select <2 x i1> [[TMP8]], <2 x double> <double 6.000000e+00, double 6.000000e+00>, <2 x double> [[WIDE_LOAD11]]			; CHECK-NEXT: [[TMP10:%.*]] = select <2 x i1> [[TMP8]], <2 x double> <double 6.000000e+00, double 6.000000e+00>, <2 x double> [[WIDE_LOAD11]]
	; CHECK-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP5]], <2 x double> zeroinitializer, <2 x double> [[TMP9]]			; CHECK-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP5]], <2 x double> zeroinitializer, <2 x double> [[TMP9]]
	; CHECK-NEXT: [[TMP12:%.*]] = select <2 x i1> [[TMP6]], <2 x double> zeroinitializer, <2 x double> [[TMP10]]			; CHECK-NEXT: [[TMP12:%.*]] = select <2 x i1> [[TMP6]], <2 x double> zeroinitializer, <2 x double> [[TMP10]]
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[TMP13]] to <2 x double>*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[TMP13]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP14]], align 8, !alias.scope !3, !noalias !0			; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP14]], align 8
	; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds double, double* [[TMP13]], i64 2			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds double, double* [[TMP13]], i64 2
	; CHECK-NEXT: [[TMP16:%.*]] = bitcast double* [[TMP15]] to <2 x double>*			; CHECK-NEXT: [[TMP16:%.*]] = bitcast double* [[TMP15]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP12]], <2 x double>* [[TMP16]], align 8, !alias.scope !3, !noalias !0			; CHECK-NEXT: store <2 x double> [[TMP12]], <2 x double>* [[TMP16]], align 8
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
	; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i32 [[INDEX_NEXT]], 20000			; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i32 [[INDEX_NEXT]], 20000
	; CHECK-NEXT: br i1 [[TMP17]], label [[FOR_COND_CLEANUP:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP17]], label [[FOR_COND_CLEANUP:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_05:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY]] ]			; CHECK-NEXT: [[I_05:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY]] ]
	; CHECK-NEXT: [[IDXPROM:%.*]] = zext i32 [[I_05]] to i64			; CHECK-NEXT: [[IDXPROM:%.*]] = zext i32 [[I_05]] to i64
	[... 75 lines not shown ...]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <4 x float> poison, float [[X:%.]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <4 x float> poison, float [[X:%.]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, i32 [[C]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, i32 [[C]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP3:%.]] = bitcast i32 [[TMP2]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.]] = bitcast i32 [[TMP2]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x i32>, <4 x i32> [[TMP3]], align 4, !alias.scope !8			; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x i32>, <4 x i32> [[TMP3]], align 4, !alias.scope !3
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD]], <i32 20, i32 20, i32 20, i32 20>			; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD]], <i32 20, i32 20, i32 20, i32 20>
	; CHECK-NEXT: [[TMP5:%.]] = getelementptr inbounds float, float [[A]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP5:%.]] = getelementptr inbounds float, float [[A]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP6:%.]] = bitcast float [[TMP5]] to <4 x float>*			; CHECK-NEXT: [[TMP6:%.]] = bitcast float [[TMP5]] to <4 x float>*
	; CHECK-NEXT: [[WIDE_LOAD14:%.]] = load <4 x float>, <4 x float> [[TMP6]], align 4, !alias.scope !11			; CHECK-NEXT: [[WIDE_LOAD14:%.]] = load <4 x float>, <4 x float> [[TMP6]], align 4, !alias.scope !6
	; CHECK-NEXT: [[TMP7:%.*]] = fmul <4 x float> [[WIDE_LOAD14]], [[BROADCAST_SPLAT]]			; CHECK-NEXT: [[TMP7:%.*]] = fmul <4 x float> [[WIDE_LOAD14]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP8:%.]] = getelementptr float, float [[B]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP8:%.]] = getelementptr float, float [[B]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP9:%.]] = bitcast float [[TMP8]] to <4 x float>*			; CHECK-NEXT: [[TMP9:%.]] = bitcast float [[TMP8]] to <4 x float>*
	; CHECK-NEXT: [[WIDE_LOAD15:%.]] = load <4 x float>, <4 x float> [[TMP9]], align 4, !alias.scope !13, !noalias !15			; CHECK-NEXT: [[WIDE_LOAD15:%.]] = load <4 x float>, <4 x float> [[TMP9]], align 4, !alias.scope !8, !noalias !10
	; CHECK-NEXT: [[TMP10:%.*]] = select <4 x i1> [[TMP4]], <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, <4 x float> [[WIDE_LOAD15]]			; CHECK-NEXT: [[TMP10:%.*]] = select <4 x i1> [[TMP4]], <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, <4 x float> [[WIDE_LOAD15]]
	; CHECK-NEXT: [[PREDPHI:%.*]] = fadd <4 x float> [[TMP7]], [[TMP10]]			; CHECK-NEXT: [[PREDPHI:%.*]] = fadd <4 x float> [[TMP7]], [[TMP10]]
	; CHECK-NEXT: [[TMP11:%.]] = bitcast float [[TMP8]] to <4 x float>*			; CHECK-NEXT: [[TMP11:%.]] = bitcast float [[TMP8]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[PREDPHI]], <4 x float>* [[TMP11]], align 4, !alias.scope !13, !noalias !15			; CHECK-NEXT: store <4 x float> [[PREDPHI]], <4 x float>* [[TMP11]], align 4, !alias.scope !8, !noalias !10
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10000			; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10000
	; CHECK-NEXT: br i1 [[TMP12]], label [[EXIT:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP12]], label [[EXIT:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
	; CHECK: loop.body:			; CHECK: loop.body:
	; CHECK-NEXT: [[IV1:%.]] = phi i64 [ [[IV_NEXT:%.]], [[LOOP_LATCH:%.]] ], [ 0, [[ENTRY:%.]] ]			; CHECK-NEXT: [[IV1:%.]] = phi i64 [ [[IV_NEXT:%.]], [[LOOP_LATCH:%.]] ], [ 0, [[ENTRY:%.]] ]
	; CHECK-NEXT: [[C_GEP:%.]] = getelementptr inbounds i32, i32 [[C]], i64 [[IV1]]			; CHECK-NEXT: [[C_GEP:%.]] = getelementptr inbounds i32, i32 [[C]], i64 [[IV1]]
	; CHECK-NEXT: [[C_LV:%.]] = load i32, i32 [[C_GEP]], align 4			; CHECK-NEXT: [[C_LV:%.]] = load i32, i32 [[C_GEP]], align 4
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[C_LV]], 20			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[C_LV]], 20
	[... 61 lines not shown ...]

llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll

	[... 11 lines not shown ...]
	target triple = "x86_64-apple-macosx10.15.0"			target triple = "x86_64-apple-macosx10.15.0"

	define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 {			define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 {
	; CHECK-LABEL: @vdiv(			; CHECK-LABEL: @vdiv(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[N:%.*]], 0			; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[N:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]]
	; CHECK: for.body.preheader:			; CHECK: for.body.preheader:
				; CHECK-NEXT: [[X4:%.*]] = ptrtoint double* [[X:%.*]] to i64
				; CHECK-NEXT: [[Y5:%.*]] = ptrtoint double* [[Y:%.*]] to i64
	; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64			; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 16			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 16
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER18:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; CHECK-NEXT: [[TMP0:%.*]] = sub i64 [[X4]], [[Y5]]
	; CHECK: vector.memcheck:			; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 128
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr double, double* [[X:%.*]], i64 [[WIDE_TRIP_COUNT]]			; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[MIN_ITERS_CHECK]], i1 true, i1 [[DIFF_CHECK]]
	; CHECK-NEXT: [[SCEVGEP7:%.*]] = getelementptr double, double* [[Y:%.*]], i64 [[WIDE_TRIP_COUNT]]			; CHECK-NEXT: br i1 [[OR_COND]], label [[FOR_BODY_PREHEADER15:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt double* [[SCEVGEP7]], [[X]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt double* [[SCEVGEP]], [[Y]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY_PREHEADER18]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 4294967280			; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 4294967280
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <4 x double> poison, double [[A:%.]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <4 x double> poison, double [[A:%.]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT]], <4 x double> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT]], <4 x double> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT12:%.*]] = insertelement <4 x double> poison, double [[A]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT12:%.*]] = insertelement <4 x double> poison, double [[A]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT13:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT12]], <4 x double> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT13:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT12]], <4 x double> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT14:%.*]] = insertelement <4 x double> poison, double [[A]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT14:%.*]] = insertelement <4 x double> poison, double [[A]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT15:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT14]], <4 x double> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT15:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT14]], <4 x double> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT16:%.*]] = insertelement <4 x double> poison, double [[A]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT16:%.*]] = insertelement <4 x double> poison, double [[A]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT17:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT16]], <4 x double> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT17:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT16]], <4 x double> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP0:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, [[BROADCAST_SPLAT]]			; CHECK-NEXT: [[TMP0:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP1:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, [[BROADCAST_SPLAT13]]			; CHECK-NEXT: [[TMP1:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, [[BROADCAST_SPLAT13]]
	; CHECK-NEXT: [[TMP2:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, [[BROADCAST_SPLAT15]]			; CHECK-NEXT: [[TMP2:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, [[BROADCAST_SPLAT15]]
	; CHECK-NEXT: [[TMP3:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, [[BROADCAST_SPLAT17]]			; CHECK-NEXT: [[TMP3:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, [[BROADCAST_SPLAT17]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP4:%.]] = getelementptr inbounds double, double [[Y]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP4:%.]] = getelementptr inbounds double, double [[Y]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP5:%.]] = bitcast double [[TMP4]] to <4 x double>*			; CHECK-NEXT: [[TMP5:%.]] = bitcast double [[TMP4]] to <4 x double>*
	; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x double>, <4 x double> [[TMP5]], align 8, !tbaa [[TBAA3:![0-9]+]], !alias.scope !7			; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x double>, <4 x double> [[TMP5]], align 8, !tbaa [[TBAA3:![0-9]+]]
	; CHECK-NEXT: [[TMP6:%.]] = getelementptr inbounds double, double [[TMP4]], i64 4			; CHECK-NEXT: [[TMP6:%.]] = getelementptr inbounds double, double [[TMP4]], i64 4
	; CHECK-NEXT: [[TMP7:%.]] = bitcast double [[TMP6]] to <4 x double>*			; CHECK-NEXT: [[TMP7:%.]] = bitcast double [[TMP6]] to <4 x double>*
	; CHECK-NEXT: [[WIDE_LOAD9:%.]] = load <4 x double>, <4 x double> [[TMP7]], align 8, !tbaa [[TBAA3]], !alias.scope !7			; CHECK-NEXT: [[WIDE_LOAD9:%.]] = load <4 x double>, <4 x double> [[TMP7]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP8:%.]] = getelementptr inbounds double, double [[TMP4]], i64 8			; CHECK-NEXT: [[TMP8:%.]] = getelementptr inbounds double, double [[TMP4]], i64 8
	; CHECK-NEXT: [[TMP9:%.]] = bitcast double [[TMP8]] to <4 x double>*			; CHECK-NEXT: [[TMP9:%.]] = bitcast double [[TMP8]] to <4 x double>*
	; CHECK-NEXT: [[WIDE_LOAD10:%.]] = load <4 x double>, <4 x double> [[TMP9]], align 8, !tbaa [[TBAA3]], !alias.scope !7			; CHECK-NEXT: [[WIDE_LOAD10:%.]] = load <4 x double>, <4 x double> [[TMP9]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP10:%.]] = getelementptr inbounds double, double [[TMP4]], i64 12			; CHECK-NEXT: [[TMP10:%.]] = getelementptr inbounds double, double [[TMP4]], i64 12
	; CHECK-NEXT: [[TMP11:%.]] = bitcast double [[TMP10]] to <4 x double>*			; CHECK-NEXT: [[TMP11:%.]] = bitcast double [[TMP10]] to <4 x double>*
	; CHECK-NEXT: [[WIDE_LOAD11:%.]] = load <4 x double>, <4 x double> [[TMP11]], align 8, !tbaa [[TBAA3]], !alias.scope !7			; CHECK-NEXT: [[WIDE_LOAD11:%.]] = load <4 x double>, <4 x double> [[TMP11]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <4 x double> [[WIDE_LOAD]], [[TMP0]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <4 x double> [[WIDE_LOAD]], [[TMP0]]
	; CHECK-NEXT: [[TMP13:%.*]] = fmul fast <4 x double> [[WIDE_LOAD9]], [[TMP1]]			; CHECK-NEXT: [[TMP13:%.*]] = fmul fast <4 x double> [[WIDE_LOAD9]], [[TMP1]]
	; CHECK-NEXT: [[TMP14:%.*]] = fmul fast <4 x double> [[WIDE_LOAD10]], [[TMP2]]			; CHECK-NEXT: [[TMP14:%.*]] = fmul fast <4 x double> [[WIDE_LOAD10]], [[TMP2]]
	; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <4 x double> [[WIDE_LOAD11]], [[TMP3]]			; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <4 x double> [[WIDE_LOAD11]], [[TMP3]]
	; CHECK-NEXT: [[TMP16:%.]] = getelementptr inbounds double, double [[X]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP16:%.]] = getelementptr inbounds double, double [[X]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP17:%.]] = bitcast double [[TMP16]] to <4 x double>*			; CHECK-NEXT: [[TMP17:%.]] = bitcast double [[TMP16]] to <4 x double>*
	; CHECK-NEXT: store <4 x double> [[TMP12]], <4 x double>* [[TMP17]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7			; CHECK-NEXT: store <4 x double> [[TMP12]], <4 x double>* [[TMP17]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP18:%.]] = getelementptr inbounds double, double [[TMP16]], i64 4			; CHECK-NEXT: [[TMP18:%.]] = getelementptr inbounds double, double [[TMP16]], i64 4
	; CHECK-NEXT: [[TMP19:%.]] = bitcast double [[TMP18]] to <4 x double>*			; CHECK-NEXT: [[TMP19:%.]] = bitcast double [[TMP18]] to <4 x double>*
	; CHECK-NEXT: store <4 x double> [[TMP13]], <4 x double>* [[TMP19]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7			; CHECK-NEXT: store <4 x double> [[TMP13]], <4 x double>* [[TMP19]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP20:%.]] = getelementptr inbounds double, double [[TMP16]], i64 8			; CHECK-NEXT: [[TMP20:%.]] = getelementptr inbounds double, double [[TMP16]], i64 8
	; CHECK-NEXT: [[TMP21:%.]] = bitcast double [[TMP20]] to <4 x double>*			; CHECK-NEXT: [[TMP21:%.]] = bitcast double [[TMP20]] to <4 x double>*
	; CHECK-NEXT: store <4 x double> [[TMP14]], <4 x double>* [[TMP21]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7			; CHECK-NEXT: store <4 x double> [[TMP14]], <4 x double>* [[TMP21]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP22:%.]] = getelementptr inbounds double, double [[TMP16]], i64 12			; CHECK-NEXT: [[TMP22:%.]] = getelementptr inbounds double, double [[TMP16]], i64 12
	; CHECK-NEXT: [[TMP23:%.]] = bitcast double [[TMP22]] to <4 x double>*			; CHECK-NEXT: [[TMP23:%.]] = bitcast double [[TMP22]] to <4 x double>*
	; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP23]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7			; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP23]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[WIDE_TRIP_COUNT]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[WIDE_TRIP_COUNT]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER18]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER15]]
	; CHECK: for.body.preheader18:			; CHECK: for.body.preheader15:
	; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: [[TMP25:%.*]] = xor i64 [[INDVARS_IV_PH]], -1			; CHECK-NEXT: [[TMP25:%.*]] = xor i64 [[INDVARS_IV_PH]], -1
	; CHECK-NEXT: [[TMP26:%.*]] = add nsw i64 [[TMP25]], [[WIDE_TRIP_COUNT]]			; CHECK-NEXT: [[TMP26:%.*]] = add nsw i64 [[TMP25]], [[WIDE_TRIP_COUNT]]
	; CHECK-NEXT: [[XTRAITER:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 3			; CHECK-NEXT: [[XTRAITER:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 3
	; CHECK-NEXT: [[LCMP_MOD_NOT:%.*]] = icmp eq i64 [[XTRAITER]], 0			; CHECK-NEXT: [[LCMP_MOD_NOT:%.*]] = icmp eq i64 [[XTRAITER]], 0
	; CHECK-NEXT: br i1 [[LCMP_MOD_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.]], label [[FOR_BODY_PROL_PREHEADER:%.]]			; CHECK-NEXT: br i1 [[LCMP_MOD_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.]], label [[FOR_BODY_PROL_PREHEADER:%.]]
	; CHECK: for.body.prol.preheader:			; CHECK: for.body.prol.preheader:
	; CHECK-NEXT: [[TMP27:%.*]] = fdiv fast double 1.000000e+00, [[A]]			; CHECK-NEXT: [[TMP27:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: br label [[FOR_BODY_PROL:%.*]]			; CHECK-NEXT: br label [[FOR_BODY_PROL:%.*]]
	; CHECK: for.body.prol:			; CHECK: for.body.prol:
	; CHECK-NEXT: [[INDVARS_IV_PROL:%.]] = phi i64 [ [[INDVARS_IV_NEXT_PROL:%.]], [[FOR_BODY_PROL]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PROL_PREHEADER]] ]			; CHECK-NEXT: [[INDVARS_IV_PROL:%.]] = phi i64 [ [[INDVARS_IV_NEXT_PROL:%.]], [[FOR_BODY_PROL]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PROL_PREHEADER]] ]
	; CHECK-NEXT: [[PROL_ITER:%.]] = phi i64 [ [[PROL_ITER_NEXT:%.]], [[FOR_BODY_PROL]] ], [ 0, [[FOR_BODY_PROL_PREHEADER]] ]			; CHECK-NEXT: [[PROL_ITER:%.]] = phi i64 [ [[PROL_ITER_NEXT:%.]], [[FOR_BODY_PROL]] ], [ 0, [[FOR_BODY_PROL_PREHEADER]] ]
	; CHECK-NEXT: [[ARRAYIDX_PROL:%.]] = getelementptr inbounds double, double [[Y]], i64 [[INDVARS_IV_PROL]]			; CHECK-NEXT: [[ARRAYIDX_PROL:%.]] = getelementptr inbounds double, double [[Y]], i64 [[INDVARS_IV_PROL]]
	; CHECK-NEXT: [[T0_PROL:%.]] = load double, double [[ARRAYIDX_PROL]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[T0_PROL:%.]] = load double, double [[ARRAYIDX_PROL]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP28:%.*]] = fmul fast double [[T0_PROL]], [[TMP27]]			; CHECK-NEXT: [[TMP28:%.*]] = fmul fast double [[T0_PROL]], [[TMP27]]
	; CHECK-NEXT: [[ARRAYIDX2_PROL:%.]] = getelementptr inbounds double, double [[X]], i64 [[INDVARS_IV_PROL]]			; CHECK-NEXT: [[ARRAYIDX2_PROL:%.]] = getelementptr inbounds double, double [[X]], i64 [[INDVARS_IV_PROL]]
	; CHECK-NEXT: store double [[TMP28]], double* [[ARRAYIDX2_PROL]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store double [[TMP28]], double* [[ARRAYIDX2_PROL]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT_PROL]] = add nuw nsw i64 [[INDVARS_IV_PROL]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT_PROL]] = add nuw nsw i64 [[INDVARS_IV_PROL]], 1
	; CHECK-NEXT: [[PROL_ITER_NEXT]] = add i64 [[PROL_ITER]], 1			; CHECK-NEXT: [[PROL_ITER_NEXT]] = add i64 [[PROL_ITER]], 1
	; CHECK-NEXT: [[PROL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[PROL_ITER_NEXT]], [[XTRAITER]]			; CHECK-NEXT: [[PROL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[PROL_ITER_NEXT]], [[XTRAITER]]
	; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], !llvm.loop [[LOOP14:![0-9]+]]			; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], !llvm.loop [[LOOP14:![0-9]+]]
	; CHECK: for.body.prol.loopexit:			; CHECK: for.body.prol.loopexit:
	; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER18]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ]			; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER15]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ]
	; CHECK-NEXT: [[TMP29:%.*]] = icmp ult i64 [[TMP26]], 3			; CHECK-NEXT: [[TMP29:%.*]] = icmp ult i64 [[TMP26]], 3
	; CHECK-NEXT: br i1 [[TMP29]], label [[FOR_END]], label [[FOR_BODY_PREHEADER18_NEW:%.*]]			; CHECK-NEXT: br i1 [[TMP29]], label [[FOR_END]], label [[FOR_BODY_PREHEADER18_NEW:%.*]]
	; CHECK: for.body.preheader18.new:			; CHECK: for.body.preheader15.new:
	; CHECK-NEXT: [[TMP30:%.*]] = fdiv fast double 1.000000e+00, [[A]]			; CHECK-NEXT: [[TMP30:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: [[TMP31:%.*]] = fdiv fast double 1.000000e+00, [[A]]			; CHECK-NEXT: [[TMP31:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: [[TMP32:%.*]] = fdiv fast double 1.000000e+00, [[A]]			; CHECK-NEXT: [[TMP32:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: [[TMP33:%.*]] = fdiv fast double 1.000000e+00, [[A]]			; CHECK-NEXT: [[TMP33:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER18_NEW]] ], [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER18_NEW]] ], [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV]]

Revision Contents

Diff 429711

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

llvm/include/llvm/Transforms/Utils/LoopUtils.h

llvm/lib/Analysis/LoopAccessAnalysis.cpp

llvm/lib/Transforms/Utils/LoopUtils.cpp

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll

llvm/test/Transforms/LoopVectorize/ARM/mve-qabs.ll

llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll

llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll

llvm/test/Transforms/LoopVectorize/fpsat.ll

llvm/test/Transforms/LoopVectorize/multiple-exits-versioning.ll

llvm/test/Transforms/LoopVectorize/no_outside_user.ll

llvm/test/Transforms/LoopVectorize/runtime-check-readonly.ll

llvm/test/Transforms/LoopVectorize/runtime-check-small-clamped-bounds.ll

llvm/test/Transforms/LoopVectorize/runtime-check.ll

llvm/test/Transforms/LoopVectorize/runtime-checks-difference.ll

llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll

llvm/test/Transforms/LoopVectorize/tbaa-nodep.ll

llvm/test/Transforms/PhaseOrdering/AArch64/hoisting-sinking-required-for-vectorization.ll

llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll