Diff 70437

lib/Transforms/Vectorize/LoopVectorize.cpp


[2,254 lines not shown]
void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
// We shouldn't have to build scalar steps if we aren't vectorizing.		// We shouldn't have to build scalar steps if we aren't vectorizing.
assert(VF > 1 && "VF should be greater than one");		assert(VF > 1 && "VF should be greater than one");

// Get the value type and ensure it and the step have the same integer type.		// Get the value type and ensure it and the step have the same integer type.
Type *ScalarIVTy = ScalarIV->getType()->getScalarType();		Type *ScalarIVTy = ScalarIV->getType()->getScalarType();
assert(ScalarIVTy->isIntegerTy() && ScalarIVTy == Step->getType() &&		assert(ScalarIVTy->isIntegerTy() && ScalarIVTy == Step->getType() &&
"Val and Step should have the same integer type");		"Val and Step should have the same integer type");

		auto scalarUserIsUniform = [&](User *U) -> bool {
		mkuper: Why "scalar"?
		mssimpso (author): This probably makes more sense with the response below. We're checking to see if all the scalar users of the IV we are scalarizing are also uniform. We can scalarize an IV if it has both vector and scalar users (ending up with two versions).
		mkuper: Oh, ok, I understand now - this checks "isScalar() implies isUniform()" by computing "!isScalar() || isUniform()". I read "scalar users" as "pre-vectorization users", not "users that are going to stay scalar", and got confused.
		auto *I = cast<Instruction>(U);
		return !OrigLoop->contains(I) || !Legal->isScalarAfterVectorization(I) ||
		mkuper: Why not return (!OrigLoop->contains(I) || !Legal->isScalarAfterVectorization(I) || Legal->isUniformAfterVectorization(I)) ?
		mssimpso (author): That should work!
		Legal->isUniformAfterVectorization(I);
		};

		// Determine the number of scalars we need to generate for each unroll
		// iteration. If EntryVal is uniform or all its scalar users are uniform, we
		// only need to generate the first lane. Otherwise, we generate all VF
		// values. We are essentially determining if the induction variable has no
		// "multi-scalar" (non-uniform scalar) users.
		unsigned Lanes = VF;
		if (Legal->isUniformAfterVectorization(cast<Instruction>(EntryVal)) ||
		all_of(EntryVal->users(), scalarUserIsUniform))
		Lanes = 1;

		mkuper: Why do we need this? Wouldn't we already mark EntryVal as uniform in collectLoopUniforms(), if all its users are uniform / consecutive pointers?
		mssimpso (author): This is for the cases in which we want both a vector and a scalar version of an IV. We scalarize an IV that's not marked scalar after vectorization when it has at least one scalar user. For such an IV, if all its scalar users are also uniform, we will only ever need the first lane of the scalar IV.
		mkuper: Ah, got it, thanks! Sentences like "all its scalar users are uniform" still confuse me, because of the terminology we've adopted. My personal preference is still "uniform" for what we call "uniform scalar" and "multi-scalar" for what we call "non-uniform scalar" - then we could say that we're looking for an IV that has no multi-scalar users. But maybe I just need to get used to the current state. :-)
		mssimpso (author): I agree the terminology is a bit confusing. At the very least, I'll add a more detailed comment here.
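The lane-count rule discussed in this thread can be modeled outside LLVM. The following is a minimal stand-alone sketch, not the patch's code: `UserInfo`, `scalarUserIsUniform`, and `lanesToGenerate` are hypothetical names, and the three booleans are assumed stand-ins for `OrigLoop->contains(I)`, `Legal->isScalarAfterVectorization(I)`, and `Legal->isUniformAfterVectorization(I)`.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for one user of the induction variable.
struct UserInfo {
  bool InLoop;  // models OrigLoop->contains(I)
  bool Scalar;  // models Legal->isScalarAfterVectorization(I)
  bool Uniform; // models Legal->isUniformAfterVectorization(I)
};

// "An in-loop scalar user implies a uniform user", written as
// !InLoop || !Scalar || Uniform.
inline bool scalarUserIsUniform(const UserInfo &U) {
  return !U.InLoop || !U.Scalar || U.Uniform;
}

// One lane suffices if the IV itself is uniform or it has no
// "multi-scalar" (non-uniform scalar) users; otherwise all VF lanes.
inline unsigned lanesToGenerate(bool EntryIsUniform,
                                const std::vector<UserInfo> &Users,
                                unsigned VF) {
  if (EntryIsUniform ||
      std::all_of(Users.begin(), Users.end(), scalarUserIsUniform))
    return 1;
  return VF;
}
```

In this model, a vector user (`Scalar == false`) never forces extra lanes; only an in-loop user that is scalar and not uniform does.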
// Compute the scalar steps and save the results in VectorLoopValueMap.		// Compute the scalar steps and save the results in VectorLoopValueMap.
ScalarParts Entry(UF);		ScalarParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Entry[Part].resize(VF);		Entry[Part].resize(VF);
for (unsigned Lane = 0; Lane < VF; ++Lane) {		for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
auto *StartIdx = ConstantInt::get(ScalarIVTy, VF * Part + Lane);		auto *StartIdx = ConstantInt::get(ScalarIVTy, VF * Part + Lane);
auto *Mul = Builder.CreateMul(StartIdx, Step);		auto *Mul = Builder.CreateMul(StartIdx, Step);
auto *Add = Builder.CreateAdd(ScalarIV, Mul);		auto *Add = Builder.CreateAdd(ScalarIV, Mul);
Entry[Part][Lane] = Add;		Entry[Part][Lane] = Add;
}		}
}		}
VectorLoopValueMap.initScalar(EntryVal, Entry);		VectorLoopValueMap.initScalar(EntryVal, Entry);
}		}
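The arithmetic in the loop above can be checked with plain integers. The sketch below is an assumption-laden model, not LLVM code: `buildScalarStepsModel` is a hypothetical name, `int64_t` values replace `Value *`, and slots that the patch leaves unset hold 0 here. It assumes `Lanes <= VF`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical integer model of the scalar-step computation:
// Entry[Part][Lane] = ScalarIV + (VF * Part + Lane) * Step for Lane < Lanes.
inline std::vector<std::vector<int64_t>>
buildScalarStepsModel(int64_t ScalarIV, int64_t Step, unsigned UF, unsigned VF,
                      unsigned Lanes) {
  std::vector<std::vector<int64_t>> Entry(UF);
  for (unsigned Part = 0; Part < UF; ++Part) {
    Entry[Part].resize(VF); // still sized for VF lanes, as in the patch
    for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
      int64_t StartIdx = static_cast<int64_t>(VF) * Part + Lane;
      Entry[Part][Lane] = ScalarIV + StartIdx * Step;
    }
  }
  return Entry;
}
```

With `UF = 2`, `VF = 4`, and `Step = -1`, the full-lane case yields offsets 0 through -7 from the start value, while `Lanes = 1` yields only offsets 0 and -4, one per unroll part.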
[166 lines not shown]
Value *InnerLoopVectorizer::getScalarValue(Value *V, unsigned Part,
// vectorization factor is one), there is no need to generate an		// vectorization factor is one), there is no need to generate an
// extractelement instruction.		// extractelement instruction.
auto *U = getVectorValue(V)[Part];		auto *U = getVectorValue(V)[Part];
if (!U->getType()->isVectorTy()) {		if (!U->getType()->isVectorTy()) {
assert(VF == 1 && "Value not scalarized has non-vector type");		assert(VF == 1 && "Value not scalarized has non-vector type");
return U;		return U;
}		}

		assert(Lane > 0 ? !Legal->isUniformAfterVectorization(cast<Instruction>(V))
		: true && "Uniform values only have lane zero");

// Otherwise, the value from the original loop has been vectorized and is		// Otherwise, the value from the original loop has been vectorized and is
// represented by UF vector values. Extract and return the requested scalar		// represented by UF vector values. Extract and return the requested scalar
// value from the appropriate vector lane.		// value from the appropriate vector lane.
return Builder.CreateExtractElement(U, Builder.getInt32(Lane));		return Builder.CreateExtractElement(U, Builder.getInt32(Lane));
}		}

Value *InnerLoopVectorizer::reverseVector(Value *Vec) {		Value *InnerLoopVectorizer::reverseVector(Value *Vec) {
assert(Vec->getType()->isVectorTy() && "Invalid type");		assert(Vec->getType()->isVectorTy() && "Invalid type");
[496 lines not shown]
void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr,

// Initialize a new scalar map entry.		// Initialize a new scalar map entry.
ScalarParts Entry(UF);		ScalarParts Entry(UF);

VectorParts Cond;		VectorParts Cond;
if (IfPredicateInstr)		if (IfPredicateInstr)
Cond = createBlockInMask(Instr->getParent());		Cond = createBlockInMask(Instr->getParent());

		// Determine the number of scalars we need to generate for each unroll
		// iteration. If the instruction is uniform, we only need to generate the
		// first lane. Otherwise, we generate all VF values.
		unsigned Lanes = Legal->isUniformAfterVectorization(Instr) ? 1 : VF;

// For each vector unroll 'part':		// For each vector unroll 'part':
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Entry[Part].resize(VF);		Entry[Part].resize(VF);
// For each scalar that we create:		// For each scalar that we create:
for (unsigned Width = 0; Width < VF; ++Width) {		for (unsigned Width = 0; Width < Lanes; ++Width) {

// Start if-block.		// Start if-block.
Value *Cmp = nullptr;		Value *Cmp = nullptr;
if (IfPredicateInstr) {		if (IfPredicateInstr) {
Cmp = Builder.CreateExtractElement(Cond[Part], Builder.getInt32(Width));		Cmp = Builder.CreateExtractElement(Cond[Part], Builder.getInt32(Width));
Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cmp,		Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cmp,
ConstantInt::get(Cmp->getType(), 1));		ConstantInt::get(Cmp->getType(), 1));
}		}
[1,557 lines not shown]
static bool mayDivideByZero(Instruction &I) {
Value *Divisor = I.getOperand(1);		Value *Divisor = I.getOperand(1);
auto *CInt = dyn_cast<ConstantInt>(Divisor);		auto *CInt = dyn_cast<ConstantInt>(Divisor);
return !CInt || CInt->isZero();		return !CInt || CInt->isZero();
}		}

void InnerLoopVectorizer::vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV) {		void InnerLoopVectorizer::vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV) {
// For each instruction in the old loop.		// For each instruction in the old loop.
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {

		// Scalarize instructions that should remain scalar after vectorization.
		mkuper: This scares me a bit. :-) As I said on the PR, I definitely think we should do it, but it still scares me. Perhaps add a bit more testing? Regardless, there's (at least?) one more special case I see below - DbgInfoIntrinsic.
		hfinkel: > As I said on the PR, I definitely think we should do it, but it still scares me. Perhaps add a bit more testing?
		Which PR?
		mkuper: My bad, not a PR, meant "on the other review". :-) https://reviews.llvm.org/D23509
		mssimpso (author): Right! Thanks for pointing out the debug intrinsics. Perhaps it would make better sense to handle all of the cases individually, instead of all together at the top?
		mkuper: Hmpf. I think I still prefer handling this at the top - duplicating this check into each case seems silly.
		mssimpso (author): Agreed.
		if (!(isa<BranchInst>(&I) || isa<PHINode>(&I) ||
		isa<DbgInfoIntrinsic>(&I)) &&
		Legal->isScalarAfterVectorization(&I)) {
		scalarizeInstruction(&I);
		continue;
		}
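The up-front check above can be restated as a small predicate. This is only an illustrative sketch under stated assumptions: `Kind` and `scalarizeUpFront` are hypothetical names, and the enum stands in for the `isa<BranchInst>`, `isa<PHINode>`, and `isa<DbgInfoIntrinsic>` tests.

```cpp
// Hypothetical classification of an instruction in the original loop.
enum class Kind { Branch, Phi, DbgInfo, Other };

// True when the instruction would be scalarized before reaching the
// opcode switch: it is marked scalar after vectorization and is not one
// of the special cases that keep their own handling.
inline bool scalarizeUpFront(Kind K, bool ScalarAfterVectorization) {
  bool Special = K == Kind::Branch || K == Kind::Phi || K == Kind::DbgInfo;
  return !Special && ScalarAfterVectorization;
}
```

Keeping the test in one place, as agreed in the thread, avoids repeating it in every case of the switch.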

switch (I.getOpcode()) {		switch (I.getOpcode()) {
case Instruction::Br:		case Instruction::Br:
// Nothing to do for PHIs and BR, since we already took care of the		// Nothing to do for PHIs and BR, since we already took care of the
// loop control flow instructions.		// loop control flow instructions.
continue;		continue;
case Instruction::PHI: {		case Instruction::PHI: {
// Vectorize PHINodes.		// Vectorize PHINodes.
widenPHIInstruction(&I, UF, VF, PV);		widenPHIInstruction(&I, UF, VF, PV);
[2,637 lines not shown]

test/Transforms/LoopVectorize/PowerPC/small-loop-rdx.ll

	; RUN: opt < %s -loop-vectorize -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -S | FileCheck %s

				; CHECK: vector.body:
	; CHECK: fadd			; CHECK: fadd
	; CHECK-NEXT: fadd			; CHECK-NEXT: fadd
	; CHECK-NEXT: fadd			; CHECK-NEXT: fadd
	; CHECK-NEXT: fadd			; CHECK-NEXT: fadd
	; CHECK-NEXT: fadd			; CHECK-NEXT: fadd
	; CHECK-NEXT: fadd			; CHECK-NEXT: fadd
	; CHECK-NEXT: fadd			; CHECK-NEXT: fadd
	; CHECK-NEXT: fadd			; CHECK-NEXT: fadd
	; CHECK-NEXT: fadd			; CHECK-NEXT: fadd
	; CHECK-NEXT: fadd			; CHECK-NEXT: fadd
	; CHECK-NEXT: fadd			; CHECK-NEXT: fadd
	; CHECK-NEXT: fadd			; CHECK-NEXT: fadd
	; CHECK-NEXT: =
	; CHECK-NOT: fadd			; CHECK-NOT: fadd
	; CHECK-SAME: >			; CHECK: middle.block

	target datalayout = "e-m:e-i64:64-n32:64"			target datalayout = "e-m:e-i64:64-n32:64"
	target triple = "powerpc64le-ibm-linux-gnu"			target triple = "powerpc64le-ibm-linux-gnu"

	define void @test(double* nocapture readonly %arr, i32 signext %len) #0 {			define void @test(double* nocapture readonly %arr, i32 signext %len) #0 {
	entry:			entry:
	%cmp4 = icmp sgt i32 %len, 0			%cmp4 = icmp sgt i32 %len, 0
	br i1 %cmp4, label %for.body.lr.ph, label %for.end			br i1 %cmp4, label %for.body.lr.ph, label %for.end
[24 lines not shown]

test/Transforms/LoopVectorize/PowerPC/vsx-tsvc-s173.ll

[37 lines not shown]
for.end: ; preds = %for.body3
%cmp = icmp slt i32 %inc11, %mul		%cmp = icmp slt i32 %inc11, %mul
br i1 %cmp, label %for.cond1.preheader, label %for.end12		br i1 %cmp, label %for.cond1.preheader, label %for.end12

for.end12: ; preds = %for.end, %entry		for.end12: ; preds = %for.end, %entry
ret i32 0		ret i32 0

; CHECK-LABEL: @s173		; CHECK-LABEL: @s173
; CHECK: load <4 x float>, <4 x float>*		; CHECK: load <4 x float>, <4 x float>*
; CHECK: add i64 %index, 16000		; CHECK: add nsw i64 %index, 16000
; CHECK: ret i32 0		; CHECK: ret i32 0
}		}

attributes #0 = { nounwind }		attributes #0 = { nounwind }

test/Transforms/LoopVectorize/global_alias.ll

[381 lines not shown]
	; /// Different objects, negative induction, shortening slide			; /// Different objects, negative induction, shortening slide
	; int noAlias08 (int a) {			; int noAlias08 (int a) {
	; int i;			; int i;
	; for (i=0; i<SIZE-10; i++)			; for (i=0; i<SIZE-10; i++)
	; Foo.A[SIZE-i-1] = Foo.B[SIZE-i-10] + a;			; Foo.A[SIZE-i-1] = Foo.B[SIZE-i-10] + a;
	; return Foo.A[a];			; return Foo.A[a];
	; }			; }
	; CHECK-LABEL: define i32 @noAlias08(			; CHECK-LABEL: define i32 @noAlias08(
	; CHECK: sub <4 x i32>			; CHECK: load <4 x i32>
	; CHECK: ret			; CHECK: ret

	define i32 @noAlias08(i32 %a) #0 {			define i32 @noAlias08(i32 %a) #0 {
	entry:			entry:
	%a.addr = alloca i32, align 4			%a.addr = alloca i32, align 4
	%i = alloca i32, align 4			%i = alloca i32, align 4
	store i32 %a, i32* %a.addr, align 4			store i32 %a, i32* %a.addr, align 4
	store i32 0, i32* %i, align 4			store i32 0, i32* %i, align 4
[35 lines not shown]
	; /// Different objects, negative induction, widening slide			; /// Different objects, negative induction, widening slide
	; int noAlias09 (int a) {			; int noAlias09 (int a) {
	; int i;			; int i;
	; for (i=0; i<SIZE; i++)			; for (i=0; i<SIZE; i++)
	; Foo.A[SIZE-i-10] = Foo.B[SIZE-i-1] + a;			; Foo.A[SIZE-i-10] = Foo.B[SIZE-i-1] + a;
	; return Foo.A[a];			; return Foo.A[a];
	; }			; }
	; CHECK-LABEL: define i32 @noAlias09(			; CHECK-LABEL: define i32 @noAlias09(
	; CHECK: sub <4 x i32>			; CHECK: load <4 x i32>
	; CHECK: ret			; CHECK: ret

	define i32 @noAlias09(i32 %a) #0 {			define i32 @noAlias09(i32 %a) #0 {
	entry:			entry:
	%a.addr = alloca i32, align 4			%a.addr = alloca i32, align 4
	%i = alloca i32, align 4			%i = alloca i32, align 4
	store i32 %a, i32* %a.addr, align 4			store i32 %a, i32* %a.addr, align 4
	store i32 0, i32* %i, align 4			store i32 0, i32* %i, align 4
[265 lines not shown]
	; /// Same objects, negative induction, constant distance, just enough for vector size			; /// Same objects, negative induction, constant distance, just enough for vector size
	; int noAlias14 (int a) {			; int noAlias14 (int a) {
	; int i;			; int i;
	; for (i=0; i<SIZE; i++)			; for (i=0; i<SIZE; i++)
	; Foo.A[SIZE-i-1] = Foo.A[SIZE-i-5] + a;			; Foo.A[SIZE-i-1] = Foo.A[SIZE-i-5] + a;
	; return Foo.A[a];			; return Foo.A[a];
	; }			; }
	; CHECK-LABEL: define i32 @noAlias14(			; CHECK-LABEL: define i32 @noAlias14(
	; CHECK: sub <4 x i32>			; CHECK: load <4 x i32>
	; CHECK: ret			; CHECK: ret

	define i32 @noAlias14(i32 %a) #0 {			define i32 @noAlias14(i32 %a) #0 {
	entry:			entry:
	%a.addr = alloca i32, align 4			%a.addr = alloca i32, align 4
	%i = alloca i32, align 4			%i = alloca i32, align 4
	store i32 %a, i32* %a.addr, align 4			store i32 %a, i32* %a.addr, align 4
	store i32 0, i32* %i, align 4			store i32 0, i32* %i, align 4
[345 lines not shown]

test/Transforms/LoopVectorize/induction.ll

[72 lines not shown]
	;			;
	; for (int i = 0; i < n; ++i)			; for (int i = 0; i < n; ++i)
	; sum += a[i];			; sum += a[i];
	;			;
	; CHECK-LABEL: @scalarize_induction_variable_01(			; CHECK-LABEL: @scalarize_induction_variable_01(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %[[i0:.+]] = add i64 %index, 0			; CHECK: %[[i0:.+]] = add i64 %index, 0
	; CHECK: %[[i1:.+]] = add i64 %index, 1
	; CHECK: getelementptr inbounds i64, i64* %a, i64 %[[i0]]			; CHECK: getelementptr inbounds i64, i64* %a, i64 %[[i0]]
	; CHECK: getelementptr inbounds i64, i64* %a, i64 %[[i1]]
	;			;
	; UNROLL-NO-IC-LABEL: @scalarize_induction_variable_01(			; UNROLL-NO-IC-LABEL: @scalarize_induction_variable_01(
	; UNROLL-NO-IC: vector.body:			; UNROLL-NO-IC: vector.body:
	; UNROLL-NO-IC: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; UNROLL-NO-IC: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; UNROLL-NO-IC: %[[i0:.+]] = add i64 %index, 0			; UNROLL-NO-IC: %[[i0:.+]] = add i64 %index, 0
	; UNROLL-NO-IC: %[[i1:.+]] = add i64 %index, 1
	; UNROLL-NO-IC: %[[i2:.+]] = add i64 %index, 2			; UNROLL-NO-IC: %[[i2:.+]] = add i64 %index, 2
	; UNROLL-NO-IC: %[[i3:.+]] = add i64 %index, 3
	; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i0]]			; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i0]]
	; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i1]]
	; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i2]]			; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i2]]
	; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i3]]
	;			;
	; IND-LABEL: @scalarize_induction_variable_01(			; IND-LABEL: @scalarize_induction_variable_01(
	; IND: vector.body:			; IND: vector.body:
	; IND: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; IND: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; IND-NOT: add i64 {{.*}}, 2			; IND-NOT: add i64 {{.*}}, 2
	; IND: getelementptr inbounds i64, i64* %a, i64 %index			; IND: getelementptr inbounds i64, i64* %a, i64 %index
	;			;
	; UNROLL-LABEL: @scalarize_induction_variable_01(			; UNROLL-LABEL: @scalarize_induction_variable_01(
[502 lines not shown]
	; CHECK: %[[INSERT:.*]] = insertelement <2 x i32> undef, i32 %i, i32 0			; CHECK: %[[INSERT:.*]] = insertelement <2 x i32> undef, i32 %i, i32 0
	; CHECK: %[[SPLAT:.*]] = shufflevector <2 x i32> %[[INSERT]], <2 x i32> undef, <2 x i32> zeroinitializer			; CHECK: %[[SPLAT:.*]] = shufflevector <2 x i32> %[[INSERT]], <2 x i32> undef, <2 x i32> zeroinitializer
	; CHECK: %[[START:.*]] = add <2 x i32> %[[SPLAT]], <i32 0, i32 1>			; CHECK: %[[START:.*]] = add <2 x i32> %[[SPLAT]], <i32 0, i32 1>
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %vec.ind = phi <2 x i32> [ %[[START]], %vector.ph ], [ %vec.ind.next, %vector.body ]			; CHECK: %vec.ind = phi <2 x i32> [ %[[START]], %vector.ph ], [ %vec.ind.next, %vector.body ]
	; CHECK: %offset.idx = add i32 %i, %index			; CHECK: %offset.idx = add i32 %i, %index
	; CHECK: %[[A1:.*]] = add i32 %offset.idx, 0			; CHECK: %[[A1:.*]] = add i32 %offset.idx, 0
	; CHECK: %[[A2:.*]] = add i32 %offset.idx, 1
	; CHECK: %[[G1:.*]] = getelementptr inbounds i32, i32* %a, i32 %[[A1]]			; CHECK: %[[G1:.*]] = getelementptr inbounds i32, i32* %a, i32 %[[A1]]
	; CHECK: %[[G2:.*]] = getelementptr inbounds i32, i32* %a, i32 %[[A2]]
	; CHECK: %[[G3:.*]] = getelementptr i32, i32* %[[G1]], i32 0			; CHECK: %[[G3:.*]] = getelementptr i32, i32* %[[G1]], i32 0
	; CHECK: %[[B1:.*]] = bitcast i32* %[[G3]] to <2 x i32>*			; CHECK: %[[B1:.*]] = bitcast i32* %[[G3]] to <2 x i32>*
	; CHECK: store <2 x i32> %vec.ind, <2 x i32>* %[[B1]]			; CHECK: store <2 x i32> %vec.ind, <2 x i32>* %[[B1]]
	; CHECK: %index.next = add i32 %index, 2			; CHECK: %index.next = add i32 %index, 2
	; CHECK: %vec.ind.next = add <2 x i32> %vec.ind, <i32 2, i32 2>			; CHECK: %vec.ind.next = add <2 x i32> %vec.ind, <i32 2, i32 2>
	; CHECK: %[[CMP:.*]] = icmp eq i32 %index.next, %n.vec			; CHECK: %[[CMP:.*]] = icmp eq i32 %index.next, %n.vec
	; CHECK: br i1 %[[CMP]]			; CHECK: br i1 %[[CMP]]
	;			;
[54 lines not shown]

test/Transforms/LoopVectorize/induction_plus.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	@array = common global [1024 x i32] zeroinitializer, align 16			@array = common global [1024 x i32] zeroinitializer, align 16

	;CHECK-LABEL: @array_at_plus_one(			;CHECK-LABEL: @array_at_plus_one(
	;CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			;CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	;CHECK: %vec.ind = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, %vector.ph ], [ %vec.ind.next, %vector.body ]			;CHECK: %vec.ind = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, %vector.ph ], [ %vec.ind.next, %vector.body ]
	;CHECK: %vec.ind1 = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, %vector.ph ], [ %vec.ind.next2, %vector.body ]			;CHECK: %vec.ind1 = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, %vector.ph ], [ %vec.ind.next2, %vector.body ]
	;CHECK: add nsw <4 x i64> %vec.ind, <i64 12, i64 12, i64 12, i64 12>			;CHECK: %[[T1:.+]] = add i64 %index, 0
				;CHECK: %[[T2:.+]] = add nsw i64 %[[T1]], 12
				;CHECK: getelementptr inbounds [1024 x i32], [1024 x i32]* @array, i64 0, i64 %[[T2]]
	;CHECK: %vec.ind.next = add <4 x i64> %vec.ind, <i64 4, i64 4, i64 4, i64 4>			;CHECK: %vec.ind.next = add <4 x i64> %vec.ind, <i64 4, i64 4, i64 4, i64 4>
	;CHECK: %vec.ind.next2 = add <4 x i32> %vec.ind1, <i32 4, i32 4, i32 4, i32 4>			;CHECK: %vec.ind.next2 = add <4 x i32> %vec.ind1, <i32 4, i32 4, i32 4, i32 4>
	;CHECK: ret i32			;CHECK: ret i32
	define i32 @array_at_plus_one(i32 %n) nounwind uwtable ssp {			define i32 @array_at_plus_one(i32 %n) nounwind uwtable ssp {
	%1 = icmp sgt i32 %n, 0			%1 = icmp sgt i32 %n, 0
	br i1 %1, label %.lr.ph, label %._crit_edge			br i1 %1, label %.lr.ph, label %._crit_edge

	.lr.ph: ; preds = %0, %.lr.ph			.lr.ph: ; preds = %0, %.lr.ph
[13 lines not shown]

test/Transforms/LoopVectorize/reverse_induction.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; Make sure consecutive vector generates correct negative indices.			; Make sure consecutive vector generates correct negative indices.
	; PR15882			; PR15882

	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i64 %startval, %index			; CHECK: %offset.idx = sub i64 %startval, %index
	; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0
	; CHECK: %[[a1:.+]] = add i64 %offset.idx, -1
	; CHECK: %[[a2:.+]] = add i64 %offset.idx, -2
	; CHECK: %[[a3:.+]] = add i64 %offset.idx, -3
	; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4
	; CHECK: %[[a5:.+]] = add i64 %offset.idx, -5
	; CHECK: %[[a6:.+]] = add i64 %offset.idx, -6
	; CHECK: %[[a7:.+]] = add i64 %offset.idx, -7

	define i32 @reverse_induction_i64(i64 %startval, i32 * %ptr) {			define i32 @reverse_induction_i64(i64 %startval, i32 * %ptr) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.i7 = phi i64 [ %startval, %entry ], [ %add.i, %for.body ]			%add.i7 = phi i64 [ %startval, %entry ], [ %add.i, %for.body ]
	%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]			%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]
[9 lines not shown]
	loopend:			loopend:
	ret i32 %inc.redux			ret i32 %inc.redux
	}			}

	; CHECK-LABEL: @reverse_induction_i128(			; CHECK-LABEL: @reverse_induction_i128(
	; CHECK: %index = phi i128 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i128 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i128 %startval, %index			; CHECK: %offset.idx = sub i128 %startval, %index
	; CHECK: %[[a0:.+]] = add i128 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i128 %offset.idx, 0
	; CHECK: %[[a1:.+]] = add i128 %offset.idx, -1
	; CHECK: %[[a2:.+]] = add i128 %offset.idx, -2
	; CHECK: %[[a3:.+]] = add i128 %offset.idx, -3
	; CHECK: %[[a4:.+]] = add i128 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i128 %offset.idx, -4
	; CHECK: %[[a5:.+]] = add i128 %offset.idx, -5
	; CHECK: %[[a6:.+]] = add i128 %offset.idx, -6
	; CHECK: %[[a7:.+]] = add i128 %offset.idx, -7

	define i32 @reverse_induction_i128(i128 %startval, i32 * %ptr) {			define i32 @reverse_induction_i128(i128 %startval, i32 * %ptr) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.i7 = phi i128 [ %startval, %entry ], [ %add.i, %for.body ]			%add.i7 = phi i128 [ %startval, %entry ], [ %add.i, %for.body ]
	%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]			%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]
[9 lines not shown]
	loopend:			loopend:
	ret i32 %inc.redux			ret i32 %inc.redux
	}			}

	; CHECK-LABEL: @reverse_induction_i16(			; CHECK-LABEL: @reverse_induction_i16(
	; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i16 %startval, {{.*}}			; CHECK: %offset.idx = sub i16 %startval, {{.*}}
	; CHECK: %[[a0:.+]] = add i16 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i16 %offset.idx, 0
	; CHECK: %[[a1:.+]] = add i16 %offset.idx, -1
	; CHECK: %[[a2:.+]] = add i16 %offset.idx, -2
	; CHECK: %[[a3:.+]] = add i16 %offset.idx, -3
	; CHECK: %[[a4:.+]] = add i16 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i16 %offset.idx, -4
	; CHECK: %[[a5:.+]] = add i16 %offset.idx, -5
	; CHECK: %[[a6:.+]] = add i16 %offset.idx, -6
	; CHECK: %[[a7:.+]] = add i16 %offset.idx, -7

	define i32 @reverse_induction_i16(i16 %startval, i32 * %ptr) {			define i32 @reverse_induction_i16(i16 %startval, i32 * %ptr) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.i7 = phi i16 [ %startval, %entry ], [ %add.i, %for.body ]			%add.i7 = phi i16 [ %startval, %entry ], [ %add.i, %for.body ]
	%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]			%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]
[26 lines not shown]
	; --reverse_induction;			; --reverse_induction;
	; }			; }
	; }			; }

	; CHECK-LABEL: @reverse_forward_induction_i64_i8(			; CHECK-LABEL: @reverse_forward_induction_i64_i8(
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i64 1023, %index			; CHECK: %offset.idx = sub i64 1023, %index
	; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0
	; CHECK: %[[a1:.+]] = add i64 %offset.idx, -1
	; CHECK: %[[a2:.+]] = add i64 %offset.idx, -2
	; CHECK: %[[a3:.+]] = add i64 %offset.idx, -3
	; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4
	; CHECK: %[[a5:.+]] = add i64 %offset.idx, -5
	; CHECK: %[[a6:.+]] = add i64 %offset.idx, -6
	; CHECK: %[[a7:.+]] = add i64 %offset.idx, -7

	define void @reverse_forward_induction_i64_i8() {			define void @reverse_forward_induction_i64_i8() {
	entry:			entry:
	br label %while.body			br label %while.body

	while.body:			while.body:
	%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %while.body ]			%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %while.body ]
	%forward_induction.05 = phi i8 [ 0, %entry ], [ %inc, %while.body ]			%forward_induction.05 = phi i8 [ 0, %entry ], [ %inc, %while.body ]
[9 lines not shown]
	while.end:			while.end:
	ret void			ret void
	}			}

	; CHECK-LABEL: @reverse_forward_induction_i64_i8_signed(			; CHECK-LABEL: @reverse_forward_induction_i64_i8_signed(
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i64 1023, %index			; CHECK: %offset.idx = sub i64 1023, %index
	; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0
	; CHECK: %[[a1:.+]] = add i64 %offset.idx, -1
	; CHECK: %[[a2:.+]] = add i64 %offset.idx, -2
	; CHECK: %[[a3:.+]] = add i64 %offset.idx, -3
	; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4
	; CHECK: %[[a5:.+]] = add i64 %offset.idx, -5
	; CHECK: %[[a6:.+]] = add i64 %offset.idx, -6
	; CHECK: %[[a7:.+]] = add i64 %offset.idx, -7

	define void @reverse_forward_induction_i64_i8_signed() {			define void @reverse_forward_induction_i64_i8_signed() {
	entry:			entry:
	br label %while.body			br label %while.body

	while.body:			while.body:
	%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %while.body ]			%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %while.body ]
	%forward_induction.05 = phi i8 [ -127, %entry ], [ %inc, %while.body ]			%forward_induction.05 = phi i8 [ -127, %entry ], [ %inc, %while.body ]
[12 lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

[LV] Scalarize instructions marked scalar after vectorization
ClosedPublic

