This is an archive of the discontinued LLVM Phabricator instance.

[LV] Allow scalable vectorization with vscale = 1
ClosedPublic

Authored by reames on Jun 24 2022, 11:27 AM.

Details

Summary

This change is a bit subtle. If we have a type like <vscale x 1 x i64>, the vectorizer will currently reject vectorization. The reason is that a type like <1 x i64> is likely to simply be rescalarized, and the vectorizer doesn't want to be in the game of simple unrolling.

(I've given the example in terms of 1 x types which use a single register, but the same issue exists for any N x types which use N registers. e.g. RISCV LMULs.)

This change distinguishes scalable types from fixed types under the reasoning that converting to a scalable type isn't unrolling. Because the actual vscale isn't known until runtime, using a scalable type is potentially very profitable.

This makes an important, but unchecked, assumption. Specifically, the scalable type is assumed to only be legal per the cost model if there's actually a scalable register class which is distinct from the scalar domain. This is, to my knowledge, true for all targets which return non-invalid costs for scalable vector ops today, but in theory, we could have a target decide to lower scalable vectors to fixed-length vector or even scalar registers. If that ever happens, we'd need to revisit this code.

In practice, this patch unblocks scalable vectorization for ELEN types on RISCV.

Let me sketch one alternate implementation I considered. We could have restricted this to when we know a minimum value for vscale. Specifically, for the default +v extension for RISCV, we actually know that vscale >= 2 for ELEN types. However, doing it this way means we can't generate scalable vectors when using the various embedded vector extensions which have a minimum vscale of 1.

If folks don't like the unchecked assumption above, I can go ahead and add the min-vscale check here. That would at least get us the most common +v extension.

Diff Detail

Event Timeline

reames created this revision.Jun 24 2022, 11:27 AM
reames requested review of this revision.Jun 24 2022, 11:27 AM
reames edited the summary of this revision. (Show Details)Jun 24 2022, 11:31 AM

This makes sense to me.

sdesmalen accepted this revision.Jun 27 2022, 3:00 AM

Thanks for fixing this @reames. The approach seems sensible: vscale is likely to be larger than 1, so the LV shouldn't conservatively assume at this point that scalarisation will happen. If vscale could be 1 at runtime and the codegen for <vscale x 1 x eltty> is less efficient than for (scalar) eltty, then the cost-model should probably reflect that. Note that you can distinguish the vscale value to tune for with TargetTransformInfo::getVScaleForTuning(). This is different from the range of vscale values the generated code will run on (albeit possibly inefficiently), which comes from the vscale_range attribute. For Arm we've implemented this TTI function to use the information from -mcpu.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6719–6721

I think this comment can be removed. If this is ever the case, the LV shouldn't be using scalable vectors, but fixed-size vectors instead and leave it to the code-generator to choose the right register class and instructions. This is actually what we do for SVE when we compile for a specific vector-width; we vectorize using fixed-width vectors and map them to SVE registers instead of NEON.

6725

nit: unnecessary curly braces.

This revision is now accepted and ready to land.Jun 27 2022, 3:00 AM
This revision was landed with ongoing or failed builds.Jun 27 2022, 1:39 PM
This revision was automatically updated to reflect the committed changes.