This is an archive of the discontinued LLVM Phabricator instance.

Differential D105817

[LV] Enable vectorization of multiple exit loops w/computable exit counts
Closed, Public

Authored by reames on Jul 12 2021, 7:56 AM.

Details

Reviewers

Ayal
fhahn
anna

Commits

rG95346ba87740: [LV] Enable vectorization of multiple exit loops w/computable exit counts

Summary

This change enables vectorization of multiple exit loops when the exit count is statically computable. That requirement - shared with the rest of LV - in turn requires each exit to be analyzable and to dominate the latch.

The majority of the work to support this was done in a set of previous patches. In particular, 72314466 avoids having multiple edges from the middle block to the exits, and 4b33b2387 added support for non-latch single exits and for multiple exits with a single exiting block. As a result, this change is basically just removing a bailout and adjusting some tests now that the prerequisite work is done and has stuck in tree for a bit.
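
As a rough illustration of the kind of loop this affects, here is a C++ sketch modeled on the multiple_exit_blocks test in loop-form.ll. The helper names header_trip_count and vector_trip_count are made up for this sketch and are not LLVM APIs; they only mirror the arithmetic visible in the updated CHECK lines (which use a vectorization factor of 2), under the assumption that the vector loop always leaves at least one iteration for the scalar loop so that the actual exit is taken there.

```cpp
#include <algorithm>
#include <cstdint>

// Scalar loop with two exits that branch to distinct exit blocks.
// Exit A is taken when i >= n; exit B is taken after the store when i >= 2096.
int multiple_exit_blocks(int16_t *p, int32_t n) {
  for (int32_t i = 0;; ++i) {
    if (!(i < n))
      return 0; // exit A, corresponds to if.end in the test
    p[i] = 0;
    if (!(i < 2096))
      return 1; // exit B, corresponds to if.end2 in the test
  }
}

// Hypothetical helper: number of times the loop header runs, i.e.
// umin(smax(n, 0), 2096) + 1 as computed in the vectorized preheader.
uint32_t header_trip_count(int32_t n) {
  uint32_t clamped = static_cast<uint32_t>(std::max<int32_t>(n, 0));
  return std::min<uint32_t>(clamped, 2096u) + 1;
}

// Hypothetical helper: header iterations covered by the vector loop.
// When tc is a multiple of vf, a full vf iterations are left for the
// scalar loop instead of zero. Assumes tc >= 1, which always holds for
// header_trip_count above.
uint32_t vector_trip_count(uint32_t tc, uint32_t vf) {
  uint32_t rem = tc % vf;
  return tc - (rem == 0 ? vf : rem);
}
```

For n = 5 the header runs 6 times, so with a vectorization factor of 2 the vector loop covers 4 iterations and the scalar loop handles the remaining 2, including whichever exit is actually taken.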

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Jul 12 2021, 7:56 AM

Herald added subscribers: bollu, hiraditya, mcrosier. Jul 12 2021, 7:56 AM

reames requested review of this revision. Jul 12 2021, 7:56 AM

Herald added a project: Restricted Project. Jul 12 2021, 7:56 AM

Harbormaster completed remote builds in B113508: Diff 357944. Jul 12 2021, 8:30 AM

ping

LGTM. Thanks for getting this finally for LV :) Perhaps wait for other reviewers a day or so?

Didn't see any target-specific LV tests failing in the list of "failed tests" above, so it looks like we're fine.

LGTM, thanks Philip!

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1133	IIUC the update logic for phis has been generalised already.

This revision is now accepted and ready to land. Jul 15 2021, 7:08 AM

This revision was landed with ongoing or failed builds. Jul 15 2021, 8:53 AM

Closed by commit rG95346ba87740: [LV] Enable vectorization of multiple exit loops w/computable exit counts (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG95346ba87740: [LV] Enable vectorization of multiple exit loops w/computable exit counts.

@reames This commit has broken our 2 stage A64FX SVE bot. (sorry for the late report, took me a while to bisect it)

https://lab.llvm.org/buildbot/#/builders/176/builds/88

This bot builds stage 1 with -mcpu=a64fx then uses that to build stage 2 with flags -mcpu=a64fx -mllvm -aarch64-sve-vector-bits-min=512 and doing so leads to this error from clang-tblgen:

[3689/7201] Building AbstractBasicReader.inc...
FAILED: tools/clang/include/clang/AST/AbstractBasicReader.inc 
cd /home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/stage2 && /home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/stage2/bin/clang-tblgen -gen-clang-basic-reader -I /home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/llvm/clang/include/clang/AST -I/home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/llvm/clang/include -I/home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/stage2/tools/clang/include -I/home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/stage2/include -I/home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/llvm/llvm/include /home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/llvm/clang/include/clang/AST/PropertiesBase.td --write-if-changed -o tools/clang/include/clang/AST/AbstractBasicReader.inc -d tools/clang/include/clang/AST/AbstractBasicReader.inc.d
/home/tcwg-buildslave/worker/clang-aarch64-sve-vls-2stage/llvm/clang/include/clang/AST/PropertiesBase.td:362:3: error: creation code for Array doesn't refer to property "totalLength"
  def : Creator<[{
  ^

I haven't seen this on any other bot, the code itself seems fine (though first time reading it myself) and if I remove -mllvm -aarch64-sve-vector-bits-min=512 the build works. So I assume what's happening is something is being vectorised incorrectly, or doing so is uncovering a different issue. It could be particular to SVE.

Obviously "build all of llvm and some of clang" isn't a great reproducer so I'm going to see if I can emit some vectorisation info with the names/locations of the functions that were previously not vectorised and now are.

Any other ideas welcome.

In D105817#2896422, @DavidSpickett wrote:

Any other ideas welcome.

@DavidSpickett - I took a skim through the SVE code to see if anything obvious fell out, but I didn't spot anything. Unfortunately, you've got a combination of an architecture I'm unfamiliar with, and an IR feature I'm unfamiliar with. I don't think I'll be able to help you much without a reproducer and some context.

A couple ideas for you:

If you use smaller min bit widths, do you still see the problem?
Does setting that flag change the number of loops which get vectorized? If not, it must change *how* they get vectorized.
Is there a known bug with scalable vectorization and epilogue loops? This change causes a lot more epilogue loops to be generated.
If you change the control knob for preferring fixed vs scalable with SVE registers, does the symptom change?

Does setting that flag change the number of loops which get vectorized? If not, it must change *how* they get vectorized.

I found a single loop in StringRef::find was vectorised with this patch applied. I've opened https://bugs.llvm.org/show_bug.cgi?id=51182 to track me investigating that.

That doesn't tell us if it's an SVE specific issue so I will look at the IR produced then at the SVE assembly itself, with some of the steps you suggested.

Oh, and I'll update the bot to work around this issue until we can find the cause.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp	15 lines
llvm/test/Transforms/LoopVectorize/loop-form.ll	120 lines
llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll	17 lines
llvm/test/Transforms/LoopVectorize/remarks-multi-exit-loops.ll	2 lines

Diff 358997

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

[... 1,119 lines not shown ...]
reportVectorizationFailure("The loop must have a single backedge",
"loop control flow is not understood by vectorizer",		"loop control flow is not understood by vectorizer",
"CFGNotUnderstood", ORE, TheLoop);		"CFGNotUnderstood", ORE, TheLoop);
if (DoExtraAnalysis)		if (DoExtraAnalysis)
Result = false;		Result = false;
else		else
return false;		return false;
}		}

// We currently must have a single "exit block" after the loop. Note that
// multiple "exiting blocks" inside the loop are allowed, provided they all
// reach the single exit block.
// TODO: This restriction can be relaxed in the near future, it's here solely
// to allow separation of changes for review. We need to generalize the phi
// update logic in a number of places.
Inline comment from fhahn: IIUC the update logic for phis has been generalised already.
if (!Lp->getUniqueExitBlock()) {
reportVectorizationFailure("The loop must have a unique exit block",
"loop control flow is not understood by vectorizer",
"CFGNotUnderstood", ORE, TheLoop);
if (DoExtraAnalysis)
Result = false;
else
return false;
}
return Result;		return Result;
}		}

bool LoopVectorizationLegality::canVectorizeLoopNestCFG(		bool LoopVectorizationLegality::canVectorizeLoopNestCFG(
Loop *Lp, bool UseVPlanNativePath) {		Loop *Lp, bool UseVPlanNativePath) {
// Store the result and return it at the end instead of exiting early, in case		// Store the result and return it at the end instead of exiting early, in case
// allowExtraAnalysis is used to report multiple reasons for not vectorizing.		// allowExtraAnalysis is used to report multiple reasons for not vectorizing.
bool Result = true;		bool Result = true;
[... 168 lines not shown ...]

llvm/test/Transforms/LoopVectorize/loop-form.ll

[... 508 lines not shown ...]
if.end:
%exit = phi i32 [0, %for.cond], [1, %for.body]		%exit = phi i32 [0, %for.cond], [1, %for.body]
ret i32 %exit		ret i32 %exit
}		}

; multiple exits w/distinct target blocks		; multiple exits w/distinct target blocks
define i32 @multiple_exit_blocks(i16* %p, i32 %n) {		define i32 @multiple_exit_blocks(i16* %p, i32 %n) {
; CHECK-LABEL: @multiple_exit_blocks(		; CHECK-LABEL: @multiple_exit_blocks(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[N:%.*]], i32 0)
		; CHECK-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[SMAX]], i32 2096)
		; CHECK-NEXT: [[TMP0:%.*]] = add nuw nsw i32 [[UMIN]], 1
		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ule i32 [[TMP0]], 2
		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK: vector.ph:
		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP0]], 2
		; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[N_MOD_VF]], 0
		; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 2, i32 [[N_MOD_VF]]
		; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP0]], [[TMP2]]
		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK: vector.body:
		; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 0
		; CHECK-NEXT: [[TMP4:%.*]] = add i32 [[INDEX]], 1
		; CHECK-NEXT: [[TMP5:%.*]] = sext i32 [[TMP3]] to i64
		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[TMP5]]
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[TMP6]], i32 0
		; CHECK-NEXT: [[TMP8:%.*]] = bitcast i16* [[TMP7]] to <2 x i16>*
		; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP8]], align 4
		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
		; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
		; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
		; CHECK: middle.block:
		; CHECK-NEXT: br label [[SCALAR_PH]]
		; CHECK: scalar.ph:
		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: br label [[FOR_COND:%.*]]		; CHECK-NEXT: br label [[FOR_COND:%.*]]
; CHECK: for.cond:		; CHECK: for.cond:
; CHECK-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]		; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64		; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]		; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
; CHECK-NEXT: store i16 0, i16* [[B]], align 4		; CHECK-NEXT: store i16 0, i16* [[B]], align 4
; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1		; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096		; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]]		; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]], !llvm.loop [[LOOP13:![0-9]+]]
; CHECK: if.end:		; CHECK: if.end:
; CHECK-NEXT: ret i32 0		; CHECK-NEXT: ret i32 0
; CHECK: if.end2:		; CHECK: if.end2:
; CHECK-NEXT: ret i32 1		; CHECK-NEXT: ret i32 1
;		;
; TAILFOLD-LABEL: @multiple_exit_blocks(		; TAILFOLD-LABEL: @multiple_exit_blocks(
; TAILFOLD-NEXT: entry:		; TAILFOLD-NEXT: entry:
; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]		; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
[... 35 lines not shown ...]
if.end2:		if.end2:
ret i32 1		ret i32 1
}		}

; LCSSA, common value each exit		; LCSSA, common value each exit
define i32 @multiple_exit_blocks2(i16* %p, i32 %n) {		define i32 @multiple_exit_blocks2(i16* %p, i32 %n) {
; CHECK-LABEL: @multiple_exit_blocks2(		; CHECK-LABEL: @multiple_exit_blocks2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[N:%.*]], i32 0)
		; CHECK-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[SMAX]], i32 2096)
		; CHECK-NEXT: [[TMP0:%.*]] = add nuw nsw i32 [[UMIN]], 1
		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ule i32 [[TMP0]], 2
		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK: vector.ph:
		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP0]], 2
		; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[N_MOD_VF]], 0
		; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 2, i32 [[N_MOD_VF]]
		; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP0]], [[TMP2]]
		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK: vector.body:
		; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 0
		; CHECK-NEXT: [[TMP4:%.*]] = add i32 [[INDEX]], 1
		; CHECK-NEXT: [[TMP5:%.*]] = sext i32 [[TMP3]] to i64
		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[TMP5]]
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[TMP6]], i32 0
		; CHECK-NEXT: [[TMP8:%.*]] = bitcast i16* [[TMP7]] to <2 x i16>*
		; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP8]], align 4
		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
		; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
		; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
		; CHECK: middle.block:
		; CHECK-NEXT: br label [[SCALAR_PH]]
		; CHECK: scalar.ph:
		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: br label [[FOR_COND:%.*]]		; CHECK-NEXT: br label [[FOR_COND:%.*]]
; CHECK: for.cond:		; CHECK: for.cond:
; CHECK-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]		; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64		; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]		; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
; CHECK-NEXT: store i16 0, i16* [[B]], align 4		; CHECK-NEXT: store i16 0, i16* [[B]], align 4
; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1		; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096		; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]]		; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]], !llvm.loop [[LOOP15:![0-9]+]]
; CHECK: if.end:		; CHECK: if.end:
; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ]		; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ]
; CHECK-NEXT: ret i32 [[I_LCSSA]]		; CHECK-NEXT: ret i32 [[I_LCSSA]]
; CHECK: if.end2:		; CHECK: if.end2:
; CHECK-NEXT: [[I_LCSSA1:%.*]] = phi i32 [ [[I]], [[FOR_BODY]] ]		; CHECK-NEXT: [[I_LCSSA1:%.*]] = phi i32 [ [[I]], [[FOR_BODY]] ]
; CHECK-NEXT: ret i32 [[I_LCSSA1]]		; CHECK-NEXT: ret i32 [[I_LCSSA1]]
;		;
; TAILFOLD-LABEL: @multiple_exit_blocks2(		; TAILFOLD-LABEL: @multiple_exit_blocks2(
[... 39 lines not shown ...]
if.end2:		if.end2:
ret i32 %i		ret i32 %i
}		}

; LCSSA, distinct value each exit		; LCSSA, distinct value each exit
define i32 @multiple_exit_blocks3(i16* %p, i32 %n) {		define i32 @multiple_exit_blocks3(i16* %p, i32 %n) {
; CHECK-LABEL: @multiple_exit_blocks3(		; CHECK-LABEL: @multiple_exit_blocks3(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[N:%.*]], i32 0)
		; CHECK-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[SMAX]], i32 2096)
		; CHECK-NEXT: [[TMP0:%.*]] = add nuw nsw i32 [[UMIN]], 1
		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ule i32 [[TMP0]], 2
		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK: vector.ph:
		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP0]], 2
		; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[N_MOD_VF]], 0
		; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 2, i32 [[N_MOD_VF]]
		; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP0]], [[TMP2]]
		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK: vector.body:
		; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 0
		; CHECK-NEXT: [[TMP4:%.*]] = add i32 [[INDEX]], 1
		; CHECK-NEXT: [[TMP5:%.*]] = sext i32 [[TMP3]] to i64
		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[TMP5]]
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[TMP6]], i32 0
		; CHECK-NEXT: [[TMP8:%.*]] = bitcast i16* [[TMP7]] to <2 x i16>*
		; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP8]], align 4
		; CHECK-NEXT: [[TMP9:%.*]] = add nsw <2 x i32> [[VEC_IND]], <i32 1, i32 1>
		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
		; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
		; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
		; CHECK: middle.block:
		; CHECK-NEXT: br label [[SCALAR_PH]]
		; CHECK: scalar.ph:
		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: br label [[FOR_COND:%.*]]		; CHECK-NEXT: br label [[FOR_COND:%.*]]
; CHECK: for.cond:		; CHECK: for.cond:
; CHECK-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]		; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64		; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]		; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
; CHECK-NEXT: store i16 0, i16* [[B]], align 4		; CHECK-NEXT: store i16 0, i16* [[B]], align 4
; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1		; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096		; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]]		; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]], !llvm.loop [[LOOP17:![0-9]+]]
; CHECK: if.end:		; CHECK: if.end:
; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ]		; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ]
; CHECK-NEXT: ret i32 [[I_LCSSA]]		; CHECK-NEXT: ret i32 [[I_LCSSA]]
; CHECK: if.end2:		; CHECK: if.end2:
; CHECK-NEXT: [[INC_LCSSA:%.*]] = phi i32 [ [[INC]], [[FOR_BODY]] ]		; CHECK-NEXT: [[INC_LCSSA:%.*]] = phi i32 [ [[INC]], [[FOR_BODY]] ]
; CHECK-NEXT: ret i32 [[INC_LCSSA]]		; CHECK-NEXT: ret i32 [[INC_LCSSA]]
;		;
; TAILFOLD-LABEL: @multiple_exit_blocks3(		; TAILFOLD-LABEL: @multiple_exit_blocks3(
[... 311 lines not shown ...]
; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 1		; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 1
; CHECK-NEXT: [[TMP9:%.*]] = getelementptr float, float* [[ADDR]], i64 [[TMP8]]		; CHECK-NEXT: [[TMP9:%.*]] = getelementptr float, float* [[ADDR]], i64 [[TMP8]]
; CHECK-NEXT: store float 1.000000e+01, float* [[TMP9]], align 4		; CHECK-NEXT: store float 1.000000e+01, float* [[TMP9]], align 4
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE2]]		; CHECK-NEXT: br label [[PRED_STORE_CONTINUE2]]
; CHECK: pred.store.continue2:		; CHECK: pred.store.continue2:
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>		; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>
; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 200		; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 200
; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: br label [[SCALAR_PH]]		; CHECK-NEXT: br label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 200, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 200, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]		; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
; CHECK: loop.header:		; CHECK: loop.header:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]		; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
; CHECK-NEXT: [[GEP:%.*]] = getelementptr float, float* [[ADDR]], i64 [[IV]]		; CHECK-NEXT: [[GEP:%.*]] = getelementptr float, float* [[ADDR]], i64 [[IV]]
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 200		; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 200
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT:%.*]], label [[LOOP_BODY:%.*]]		; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT:%.*]], label [[LOOP_BODY:%.*]]
; CHECK: loop.body:		; CHECK: loop.body:
; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[GEP]], align 4		; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[GEP]], align 4
; CHECK-NEXT: [[PRED:%.*]] = fcmp oeq float [[TMP11]], 0.000000e+00		; CHECK-NEXT: [[PRED:%.*]] = fcmp oeq float [[TMP11]], 0.000000e+00
; CHECK-NEXT: br i1 [[PRED]], label [[LOOP_LATCH]], label [[THEN:%.*]]		; CHECK-NEXT: br i1 [[PRED]], label [[LOOP_LATCH]], label [[THEN:%.*]]
; CHECK: then:		; CHECK: then:
; CHECK-NEXT: store float 1.000000e+01, float* [[GEP]], align 4		; CHECK-NEXT: store float 1.000000e+01, float* [[GEP]], align 4
; CHECK-NEXT: br label [[LOOP_LATCH]]		; CHECK-NEXT: br label [[LOOP_LATCH]]
; CHECK: loop.latch:		; CHECK: loop.latch:
; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1		; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
; CHECK-NEXT: br label [[LOOP_HEADER]], !llvm.loop [[LOOP13:![0-9]+]]		; CHECK-NEXT: br label [[LOOP_HEADER]], !llvm.loop [[LOOP19:![0-9]+]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; TAILFOLD-LABEL: @scalar_predication(		; TAILFOLD-LABEL: @scalar_predication(
; TAILFOLD-NEXT: entry:		; TAILFOLD-NEXT: entry:
; TAILFOLD-NEXT: br label [[LOOP_HEADER:%.*]]		; TAILFOLD-NEXT: br label [[LOOP_HEADER:%.*]]
; TAILFOLD: loop.header:		; TAILFOLD: loop.header:
; TAILFOLD-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]		; TAILFOLD-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
[... 54 lines not shown ...]
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, i32* [[ADDR:%.*]], i64 [[TMP0]]		; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, i32* [[ADDR:%.*]], i64 [[TMP0]]
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[TMP2]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[TMP2]], i32 0
; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <2 x i32>*		; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <2 x i32>*
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP4]], align 4		; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP4]], align 4
; CHECK-NEXT: [[TMP5]] = add <2 x i32> [[VEC_PHI]], [[WIDE_LOAD]]		; CHECK-NEXT: [[TMP5]] = add <2 x i32> [[VEC_PHI]], [[WIDE_LOAD]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>		; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>
; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 200		; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 200
; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.add.v2i32(<2 x i32> [[TMP5]])		; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.add.v2i32(<2 x i32> [[TMP5]])
; CHECK-NEXT: br label [[SCALAR_PH]]		; CHECK-NEXT: br label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 200, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 200, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]		; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]		; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
; CHECK: loop.header:		; CHECK: loop.header:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]		; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ACCUM_NEXT:%.*]], [[LOOP_LATCH]] ]		; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ACCUM_NEXT:%.*]], [[LOOP_LATCH]] ]
; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, i32* [[ADDR]], i64 [[IV]]		; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, i32* [[ADDR]], i64 [[IV]]
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 200		; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 200
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT:%.*]], label [[LOOP_LATCH]]		; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT:%.*]], label [[LOOP_LATCH]]
; CHECK: loop.latch:		; CHECK: loop.latch:
; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[GEP]], align 4		; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[GEP]], align 4
; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[TMP8]]		; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[TMP8]]
; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1		; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
; CHECK-NEXT: [[EXITCOND2_NOT:%.*]] = icmp eq i64 [[IV]], 400		; CHECK-NEXT: [[EXITCOND2_NOT:%.*]] = icmp eq i64 [[IV]], 400
; CHECK-NEXT: br i1 [[EXITCOND2_NOT]], label [[EXIT]], label [[LOOP_HEADER]], !llvm.loop [[LOOP15:![0-9]+]]		; CHECK-NEXT: br i1 [[EXITCOND2_NOT]], label [[EXIT]], label [[LOOP_HEADER]], !llvm.loop [[LOOP21:![0-9]+]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: [[LCSSA:%.*]] = phi i32 [ 0, [[LOOP_HEADER]] ], [ [[ACCUM_NEXT]], [[LOOP_LATCH]] ]		; CHECK-NEXT: [[LCSSA:%.*]] = phi i32 [ 0, [[LOOP_HEADER]] ], [ [[ACCUM_NEXT]], [[LOOP_LATCH]] ]
; CHECK-NEXT: ret i32 [[LCSSA]]		; CHECK-NEXT: ret i32 [[LCSSA]]
;		;
; TAILFOLD-LABEL: @me_reduction(		; TAILFOLD-LABEL: @me_reduction(
; TAILFOLD-NEXT: entry:		; TAILFOLD-NEXT: entry:
; TAILFOLD-NEXT: br label [[LOOP_HEADER:%.*]]		; TAILFOLD-NEXT: br label [[LOOP_HEADER:%.*]]
; TAILFOLD: loop.header:		; TAILFOLD: loop.header:
[... 98 lines not shown ...]

llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll

	; RUN: opt < %s -loop-vectorize -debug-only=loop-vectorize -S -disable-output 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -debug-only=loop-vectorize -S -disable-output 2>&1 | FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts

	; Make sure LV legal bails out when there is no exiting block
	; CHECK-LABEL: "no_exiting_block"
	; CHECK: LV: Not vectorizing: The loop must have a unique exit block.
	define i32 @no_exiting_block() {
	entry:
	br label %for.body

	for.body:
	%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ], [%inc, %for.second]
	%inc = add nsw i32 %i.02, 1
	%cmp = icmp slt i32 %inc, 16
	br i1 %cmp, label %for.body, label %for.second

	for.second:
	br label %for.body
	}

	; Make sure LV legal bails out when there is a non-int, non-ptr phi			; Make sure LV legal bails out when there is a non-int, non-ptr phi
	; CHECK-LABEL: "invalid_phi_types"			; CHECK-LABEL: "invalid_phi_types"
	; CHECK: LV: Not vectorizing: Found a non-int non-pointer PHI.			; CHECK: LV: Not vectorizing: Found a non-int non-pointer PHI.
	define i32 @invalid_phi_types() {			define i32 @invalid_phi_types() {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	[... 33 lines not shown ...]

llvm/test/Transforms/LoopVectorize/remarks-multi-exit-loops.ll

	; RUN: opt -disable-output -loop-vectorize -pass-remarks-analysis='.*' -force-vector-width=2 2>&1 %s | FileCheck %s			; RUN: opt -disable-output -loop-vectorize -pass-remarks-analysis='.*' -force-vector-width=2 2>&1 %s | FileCheck %s

	; Make sure LV does not crash when generating remarks for loops with non-unique			; Make sure LV does not crash when generating remarks for loops with non-unique
	; exit blocks.			; exit blocks.
	define i32 @test_non_unique_exit_blocks(i32* nocapture readonly align 4 dereferenceable(1024) %data, i32 %x) {			define i32 @test_non_unique_exit_blocks(i32* nocapture readonly align 4 dereferenceable(1024) %data, i32 %x) {
	; CHECK: loop not vectorized: loop control flow is not understood by vectorizer			; CHECK: loop not vectorized: could not determine number of loop iterations
	;			;
	entry:			entry:
	br label %for.header			br label %for.header

	for.header: ; preds = %for.cond.lr.ph, %for.body			for.header: ; preds = %for.cond.lr.ph, %for.body
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.latch ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.latch ]
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond.not = icmp eq i64 %iv.next, 256			%exitcond.not = icmp eq i64 %iv.next, 256
	[... 14 lines not shown ...]