This is an archive of the discontinued LLVM Phabricator instance.


Differential D10281

Extend LoopVectorizationLegality::isConsecutivePtr to handle multiple level GEPs
Abandoned (Public)

Authored by wmi on Jun 5 2015, 12:02 PM.

Details

Reviewers

qcolombet
nadav
hfinkel

Summary

Whether or not GEP merging takes effect, the analysis should not depend on seeing only a single-level GEP. DecomposeGEPExpression already works this way.

The patch extends the consecutive-access analysis in the LoopVectorizer pass to handle multi-level GEPs. It is a follow-up to http://reviews.llvm.org/D9865.

I also tried to solve the problem more generally by generating a temporary merged GEP each time a GEP is analyzed and removing it afterwards, but that failed. Much of the existing analysis requires a GEP to be a valid instruction inserted in the function, so a dangling GEP created only for the analysis does not work. We would have to insert the temporary combined GEP into the original BB, run the analysis, and then delete it, which leaves the IR messy while the analysis runs. Another option is to represent the combined GEP as a kind of metadata used only for analysis, but I am not sure how much effort that would take, because the metadata would need to be updated from time to time.

Diff Detail

Repository: rL LLVM

Event Timeline

wmi updated this revision to Diff 27216. Jun 5 2015, 12:02 PM

wmi retitled this revision from to Extend LoopVectorizationLegality::isConsecutivePtr to handle multiple level GEPs.

wmi updated this object.

wmi edited the test plan for this revision.

wmi added reviewers: qcolombet, hfinkel, nadav.

wmi set the repository for this revision to rL LLVM.

wmi added subscribers: davidxl, Unknown Object (MLST).

Hi Wei,

I assume that since you found that limitation, you have a test case that exposes it. Could you add the test case to the patch please?

Thanks,
-Quentin

I assume that since you found that limitation, you have a test case that exposes it. Could you add the test case to the patch please?

I added the test case, which is the same as test/Transforms/InstCombine/gep-merge1.ll in http://reviews.llvm.org/D9865.

Thanks,
Wei.

Hi Wei,

Please find some questions inline. Also, could you please clarify, if this patch depends on D9865 or not?

Thanks,
Michael

lib/Transforms/Vectorize/LoopVectorize.cpp
1738–1739	Could we use `GepPtrInst = dyn_cast_or_null<Instruction>(Gep->getPointerOperand())` instead of `(GepPtr = Gep->getPointerOperand()) && (GepPtrInst = dyn_cast<Instruction>(GepPtr))`?
1740	I don't understand why this is correct. Could you please clarify the logic behind it? Originally the condition was true when the pointer operand was an induction variable. Now it can be true for an arbitrary non-invariant expression that happens to have a specific GEP structure.
test/Transforms/LoopVectorize/pr23580.ll
47–69	Why do we need such complicated loop body, if we're basically only interested in gep+gep+load? Also, the control flow in this test is strange, and I'm not sure if it's necessary for the purpose of the test. Could we simplify it?

Michael, Thanks for the review.

could you please clarify, if this patch depends on D9865 or not?

The patch doesn't depend on D9865. The motivation is to make the analysis more robust on its own, i.e., to keep it valid even when GEP-related IR is not in the expected shape.

lib/Transforms/Vectorize/LoopVectorize.cpp
1738–1739	Yes, it is better.
1740	Originally there were two cases in which a load/store is consecutive. Case 1: the pointer operand of the GEP is a phi that is an induction variable. Case 2: the pointer operand is invariant, exactly one index operand of the GEP is a loop induction variable, and every index operand to the right of that variant index is 0. The patch adds case 3: the pointer operand of the GEP (call it gep_a) is another GEP (call it gep_b). For such a load/store to be consecutive, all index operands of gep_a must be 0, and gep_b must itself be case 1, case 2, or another recursive GEP. In case 1 and case 3 the pointer operand of the original GEP has a constant stride, so it is loop variant. In case 2 the pointer operand of the original GEP is loop invariant. That is why case 3 can reuse the same logic as case 1 in InnerLoopVectorizer::vectorizeMemoryInstruction.
test/Transforms/LoopVectorize/pr23580.ll
47–69	I will simplify the test.
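
The three cases in the reply above can be modelled with a small toy classifier. This is only a sketch: `Ptr`, `Kind`, `Index`, and `isConsecutive` are hypothetical stand-ins rather than LLVM classes, and the real isConsecutivePtr also checks element types and strides. The requirement that case 1 has only loop-invariant indices is taken from the existing code shown in the diff.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical toy model of a pointer expression; these are not LLVM types.
enum class Kind { InductionPhi, Invariant, Gep, Other };

// How a single GEP index behaves with respect to the loop.
enum class Index { Zero, Invariant, Induction };

struct Ptr {
  Kind kind;
  const Ptr *base;            // pointer operand, meaningful only for Kind::Gep
  std::vector<Index> indices; // index operands, meaningful only for Kind::Gep

  Ptr(Kind k, const Ptr *b = nullptr, std::vector<Index> idx = {})
      : kind(k), base(b), indices(std::move(idx)) {}
};

static bool allZero(const std::vector<Index> &idx) {
  for (Index i : idx)
    if (i != Index::Zero)
      return false;
  return true;
}

static bool allInvariant(const std::vector<Index> &idx) {
  for (Index i : idx)
    if (i == Index::Induction)
      return false;
  return true;
}

// Case 2 helper: exactly one induction index, and every later index is 0.
static bool oneInductionThenZeros(const std::vector<Index> &idx) {
  int seen = 0;
  for (Index i : idx) {
    if (i == Index::Induction) {
      ++seen;
      continue;
    }
    if (seen > 0 && i != Index::Zero)
      return false;
  }
  return seen == 1;
}

bool isConsecutive(const Ptr &gep) {
  if (gep.kind != Kind::Gep || gep.base == nullptr)
    return false;
  switch (gep.base->kind) {
  case Kind::InductionPhi: // case 1: induction pointer, invariant indices
    return allInvariant(gep.indices);
  case Kind::Invariant:    // case 2: invariant base, one induction index
    return oneInductionThenZeros(gep.indices);
  case Kind::Gep:          // case 3: all-zero indices on top of another GEP
    return allZero(gep.indices) && isConsecutive(*gep.base);
  default:
    return false;
  }
}
```

In this model, the gep+gep pattern from the test is an outer `Ptr` with indices `{Zero, Zero}` whose base is an inner `Ptr` with a single `Induction` index over an invariant base.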

mzolotukhin added inline comments. Jun 17 2015, 4:51 PM

test/Transforms/LoopVectorize/pr23580.ll
2	Please also use only needed passes (-loop-vectorize + maybe something else) instead of '-O2'.

I made a fix to use dyn_cast_or_null and simplified the test. PTAL.

Thanks for addressing my comments.

Am I getting it right, that in the test we want to be able to tell that the following two gep-expressions result in the same address?

%arrayidx16 = getelementptr inbounds %struct.B, %struct.B* %add.ptr, i64 %idxprom15
%ival = getelementptr inbounds %struct.B, %struct.B* %arrayidx16, i32 0, i32 0

If that's so, can't we just use SCEV for checking it? It would be more general, than checking for all operands being 0 etc.
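
To illustrate why a SCEV-style view treats the two addresses as equal, here is a toy byte-offset calculation. It is a sketch and not LLVM's ScalarEvolution: `Affine`, `gepArrayIndex`, and `gepStructField` are hypothetical names, and the 2-byte struct size is an assumption derived from `%struct.B = type { i16 }` in the test.

```cpp
#include <cassert>
#include <cstdint>

// Toy affine recurrence: address = start + step * iteration, in bytes.
struct Affine {
  int64_t start;
  int64_t step;
  bool operator==(const Affine &o) const {
    return start == o.start && step == o.step;
  }
};

constexpr int64_t SizeOfB = 2; // assumed size of %struct.B = type { i16 }

// Models `getelementptr %struct.B, %struct.B* base, i64 iv`, where the
// index iv is itself the recurrence {ivStart,+,ivStep}.
Affine gepArrayIndex(int64_t baseAddr, int64_t ivStart, int64_t ivStep) {
  return {baseAddr + ivStart * SizeOfB, ivStep * SizeOfB};
}

// Models `getelementptr %struct.B, %struct.B* p, i32 0, i32 field`, which
// only adds the constant byte offset of the selected field.
Affine gepStructField(Affine p, int64_t fieldOffset) {
  return {p.start + fieldOffset, p.step};
}
```

With field offset 0, `gepStructField` returns its input unchanged, which mirrors `%arrayidx16` and `%ival` collapsing to one expression. The access is then consecutive when the step equals the size of the loaded element.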

test/Transforms/LoopVectorize/pr23580.ll
8–9	This looks unused.
15	This looks unused.

mzolotukhin added inline comments. Jun 17 2015, 6:02 PM

test/Transforms/LoopVectorize/pr23580.ll
3	Can we just run `-loop-rotate` manually on the test and use the output as the new test (we won't need to run `-loop-rotate` there)? And why do we need `-instcombine`? Could we just check vectorizer's output?

wmi added inline comments. Jun 17 2015, 11:57 PM

test/Transforms/LoopVectorize/pr23580.ll
3	I fixed it and now use the IR after -loop-rotate. I keep -instcombine because it makes the generated IR a lot simpler and may be helpful for checking the output.
8–9	Fixed.
15	Fixed. Thanks for catching it.

can't we just use SCEV for checking it? It would be more general, than checking for all operands being 0 etc.

I tried that and it seems to work. The code looks simpler this way. I haven't done much testing yet; the LLVM unit tests pass. I will test it more.

The patch using SCEV caused a regression in MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt. The fix is: when there is a symbolic stride, don't return 0 but try the original LAA logic.

Hi Wei,

Thanks for your work! Please find more comments below:

Originally there are two cases where a load/store is consecutive:
case1. The pointer operand of gep is a phi and it is an induction variable.
case2. The pointer operand is invariant, only one index operand of the gep is a loop induction variable and all the other index operands on the right hand side of the variant index operand are all 0.

The one more case (case 3) added in the patch is when the pointer operand of gep (named as gep_a) is another gep (named as gep_b). For such load/store to be consecutive, all the index operands of gep_a are all 0, and gep_b should be case 1 or 2 or another recursive gep.

These are all details that SCEV should cover. That is, once you use SCEV for this analysis, you no longer need to care whether the pointer operand is a PHI, a GEP, or something else. You also don't need to handle cases like `%gep2 = getelementptr %gep1, i64 0, i64 0` specially, since both values should yield the same SCEV expression.

Thus, I expect that this patch should make a lot of code in isConsecutivePtr redundant - probably the only code we need there is the one you are about to add. For instance, I suspect that we won't need these checks:

// We can emit wide load/stores only if the last non-zero index is the
// induction variable.
...

...if we use SCEV properly.

I keep -instcombine because it makes the generated IR a lot simpler and may be helpful for checking the output.

I'm not convinced here. Please take a look at, for instance, test/Transforms/LoopVectorize/X86/powof2div.ll. To figure out whether vectorization takes place, we only need to look for a vector type (like <4 x i32>) in the output. -instcombine is unnecessary for this. Also, I'm pretty sure that the test can and should be reduced further: we don't need so many basic blocks and instructions to test this feature (again, you could take a look at powof2div.ll as an example).

lib/Transforms/Vectorize/LoopVectorize.cpp
1621	Please use `dyn_cast_or_null` here. It'll be much more compact.
1622–1624	I'd rewrite this as `if (auto C = dyn_cast_or_null<SCEVConstant>(PtrAddRec->getStepRecurrence(*SE))) {`
test/Transforms/LoopVectorize/pr23580.ll
28	Why do we need this call? If we just need some external variable, I'd rather replace it with function argument. That'll hopefully allow us to reduce the test even further.
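
The `if (auto C = dyn_cast_or_null<...>(...))` shape suggested above scopes the cast result to the branch that uses it. The sketch below imitates that shape with standard `dynamic_cast`. `Expr`, `Constant`, `Unknown`, `dynCastOrNull`, and `stepOrZero` are hypothetical stand-ins, and LLVM's real casting templates rely on its own RTTI scheme rather than `dynamic_cast`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical expression hierarchy, standing in for SCEV node types.
struct Expr {
  virtual ~Expr() = default;
};
struct Constant : Expr {
  int64_t value;
  explicit Constant(int64_t v) : value(v) {}
};
struct Unknown : Expr {};

// Hypothetical helper: behaves like dynamic_cast but tolerates a null input.
template <typename To, typename From>
const To *dynCastOrNull(const From *p) {
  return p ? dynamic_cast<const To *>(p) : nullptr;
}

// Returns the constant step, or 0 when the step is missing or not constant.
int64_t stepOrZero(const Expr *step) {
  if (auto *c = dynCastOrNull<Constant>(step))
    return c->value;
  return 0;
}
```

Declaring `c` inside the condition keeps it out of scope on the fall-through path, so the null case and the wrong-type case share a single return.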

Thanks for the comments. The Phabricator server seems to be down, so I am posting the updated patch directly as an attachment.

patch.txt (190 KB)

Re-pasted the SCEV-based patch in Phabricator.

One problem I noticed with using SCEV to check consecutiveness is that it may admit a new case in which both the pointer operand of the GEP and one of its index operands are variant. For this case, InnerLoopVectorizer::vectorizeMemoryInstruction may generate incorrect code. Previously, isConsecutivePtr returned true only when either the pointer operand of the GEP was invariant or all the other operands of the GEP were invariant. If we simply check the SCEV in isConsecutivePtr, we effectively move the complexity from isConsecutivePtr to vectorizeMemoryInstruction.

Please make sure to upload patches with full context.

lib/Transforms/Vectorize/LoopVectorize.cpp
1562	This can just be: int64_t StepVal = C->getValue()->getSExtValue();
1567	Don't put an 'else' if the 'if' unconditionally returns. http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return
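
As an illustration of the early-return style requested above, the sketch below maps a constant step to a consecutive direction without any `else` after a `return`. It assumes the step has already been extracted as a signed byte count; `consecutiveDirection` and its parameters are hypothetical names, and the 1 / -1 / 0 result mirrors the direction values that isConsecutivePtr returns.

```cpp
#include <cassert>
#include <cstdint>

// Returns 1 for a forward consecutive access, -1 for a reverse consecutive
// access, and 0 otherwise. Each branch returns, so no `else` is needed.
int consecutiveDirection(int64_t stepBytes, int64_t elemBytes) {
  assert(elemBytes > 0 && "element size must be positive");
  if (stepBytes == elemBytes)
    return 1;
  if (stepBytes == -elemBytes)
    return -1;
  return 0;
}
```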

I uploaded the patch with full context (Sorry) and addressed Hal's comments.

hfinkel added inline comments. Jun 22 2015, 5:15 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
1563	Does this give you what you want if you have nested loops? You only want that part of the recurrence that refers to the inner loop, right?

wmi added inline comments. Jun 22 2015, 6:13 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
1563	Yes, I want the recurrence referring to the inner loop. I just tried a small test case and found that the loop inside the SCEVAddRecExpr may refer to an outer loop if the SCEVAddRecExpr is invariant in the inner loop. I will check whether the loop of the SCEVAddRecExpr is identical to the loop in LoopVectorizationLegality.
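
The nested-loop check described above can be sketched with a toy recurrence that records which loop it steps in. `Loop`, `AddRec`, and `strideIn` are hypothetical types and names, not the SCEV API. The point is only that the step is trusted when the recurrence's loop is the loop being vectorized.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical loop node; parent is null for an outermost loop.
struct Loop {
  const Loop *parent = nullptr;
};

// Hypothetical add-recurrence: a value that advances by `step` on each
// iteration of `loop`.
struct AddRec {
  const Loop *loop;
  int64_t step;
};

// Returns the stride of `ar` in `theLoop`, or 0 when the recurrence belongs
// to a different loop (for example an outer loop, where the value is
// invariant across iterations of `theLoop`).
int64_t strideIn(const AddRec &ar, const Loop *theLoop) {
  if (ar.loop != theLoop)
    return 0;
  return ar.step;
}
```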

Drop this revision in favor of D21861.

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	76 lines
test/Transforms/LoopVectorize/pr23580.ll	76 lines

Diff 27851

lib/Transforms/Vectorize/LoopVectorize.cpp

if (Phi && Inductions.count(Phi)) {
InductionInfo II = Inductions[Phi];		InductionInfo II = Inductions[Phi];
return II.getConsecutiveDirection();		return II.getConsecutiveDirection();
}		}

GetElementPtrInst *Gep = dyn_cast_or_null<GetElementPtrInst>(Ptr);		GetElementPtrInst *Gep = dyn_cast_or_null<GetElementPtrInst>(Ptr);
if (!Gep)		if (!Gep)
return 0;		return 0;

unsigned NumOperands = Gep->getNumOperands();		const DataLayout &DL = Gep->getModule()->getDataLayout();
		unsigned GEPAllocSize = DL.getTypeAllocSize(
		cast<PointerType>(Gep->getType()->getScalarType())->getElementType());

		unsigned NumOperands;
		while (Gep) {
		NumOperands = Gep->getNumOperands();
Value *GpPtr = Gep->getPointerOperand();		Value *GpPtr = Gep->getPointerOperand();
// If this GEP value is a consecutive pointer induction variable and all of
// the indices are constant then we know it is consecutive. We can
Phi = dyn_cast<PHINode>(GpPtr);		Phi = dyn_cast<PHINode>(GpPtr);
if (Phi && Inductions.count(Phi)) {		if (Phi && Inductions.count(Phi)) {
		// If this GEP value is a consecutive pointer induction variable and
		// all of the indices are constant then we know it is consecutive.

// Make sure that the pointer does not point to structs.		// Make sure that the pointer does not point to structs.
PointerType *GepPtrType = cast<PointerType>(GpPtr->getType());		PointerType *GepPtrType = cast<PointerType>(GpPtr->getType());
if (GepPtrType->getElementType()->isAggregateType())
return 0;

// Make sure that all of the index operands are loop invariant.		// Make sure that all of the index operands are loop invariant.
for (unsigned i = 1; i < NumOperands; ++i)		for (unsigned i = 1; i < NumOperands; ++i)
if (!SE->isLoopInvariant(SE->getSCEV(Gep->getOperand(i)), TheLoop))		if (!SE->isLoopInvariant(SE->getSCEV(Gep->getOperand(i)), TheLoop))
return 0;		return 0;

		if (GepPtrType->getElementType()->isAggregateType()) {
		if ((Gep = dyn_cast<GetElementPtrInst>(GpPtr)))
		continue;
		else
		return 0;
		}
InductionInfo II = Inductions[Phi];		InductionInfo II = Inductions[Phi];
return II.getConsecutiveDirection();		return II.getConsecutiveDirection();
		} else {
		// If the pointer operand of the GEP is a SCEVAddRecExpr, and all the
		// other operand is 0, and the pointer operand is another
		// GetElementPtrInst, recursively find the induction variable in the
		// pointer operand.
		const SCEV *PtrScev = SE->getSCEV(GpPtr);
		if (dyn_cast<SCEVAddRecExpr>(PtrScev)) {
		for (unsigned i = 1; i < NumOperands; ++i)
		if (!match(Gep->getOperand(i), m_Zero()))
		return 0;

		Gep = dyn_cast<GetElementPtrInst>(GpPtr);
		if (!Gep)
		return 0;

		unsigned NewAllocSize = DL.getTypeAllocSize(
		cast<PointerType>(Gep->getType()->getScalarType())
		->getElementType());
		if (GEPAllocSize != NewAllocSize)
		return 0;

		continue;
		} else {
		break;
		}
		}
}		}

unsigned InductionOperand = getGEPInductionOperand(Gep);		unsigned InductionOperand = getGEPInductionOperand(Gep);

// Check that all of the gep indices are uniform except for our induction		// Check that all of the gep indices are uniform except for our induction
// operand.		// operand.
for (unsigned i = 0; i != NumOperands; ++i)		for (unsigned i = 0; i != NumOperands; ++i)
if (i != InductionOperand &&		if (i != InductionOperand &&
!SE->isLoopInvariant(SE->getSCEV(Gep->getOperand(i)), TheLoop))		!SE->isLoopInvariant(SE->getSCEV(Gep->getOperand(i)), TheLoop))
return 0;		return 0;

// We can emit wide load/stores only if the last non-zero index is the		// We can emit wide load/stores only if the last non-zero index is the
// induction variable.		// induction variable.
const SCEV *Last = nullptr;		const SCEV *Last = nullptr;
if (!Strides.count(Gep))		if (!Strides.count(Gep))
Last = SE->getSCEV(Gep->getOperand(InductionOperand));		Last = SE->getSCEV(Gep->getOperand(InductionOperand));
else {		else {
// Because of the multiplication by a stride we can have a s/zext cast.		// Because of the multiplication by a stride we can have a s/zext cast.
// We are going to replace this stride by 1 so the cast is safe to ignore.		// We are going to replace this stride by 1 so the cast is safe to ignore.
void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {
bool UniformLoad = LI && Legal->isUniform(Ptr);		bool UniformLoad = LI && Legal->isUniform(Ptr);
if (!ConsecutiveStride || UniformLoad)		if (!ConsecutiveStride || UniformLoad)
return scalarizeInstruction(Instr);		return scalarizeInstruction(Instr);

Constant *Zero = Builder.getInt32(0);		Constant *Zero = Builder.getInt32(0);
VectorParts &Entry = WidenMap.get(Instr);		VectorParts &Entry = WidenMap.get(Instr);

// Handle consecutive loads/stores.		// Handle consecutive loads/stores.
		Value *GepPtr;
		Instruction *GepPtrInst;
GetElementPtrInst *Gep = dyn_cast<GetElementPtrInst>(Ptr);		GetElementPtrInst *Gep = dyn_cast<GetElementPtrInst>(Ptr);
if (Gep && Legal->isInductionVariable(Gep->getPointerOperand())) {		if (Gep && (GepPtr = Gep->getPointerOperand()) &&
		(GepPtrInst = dyn_cast<Instruction>(GepPtr)) &&
		!SE->isLoopInvariant(SE->getSCEV(GepPtrInst), OrigLoop)) {
		// The case Gep->getPointerOperand() is an induction variable
		// or a SCEVAddRecExpr.
setDebugLocFromInst(Builder, Gep);		setDebugLocFromInst(Builder, Gep);
Value *PtrOperand = Gep->getPointerOperand();		Value *PtrOperand = Gep->getPointerOperand();
Value *FirstBasePtr = getVectorValue(PtrOperand)[0];		Value *FirstBasePtr = getVectorValue(PtrOperand)[0];
FirstBasePtr = Builder.CreateExtractElement(FirstBasePtr, Zero);		FirstBasePtr = Builder.CreateExtractElement(FirstBasePtr, Zero);

// Create the new GEP with the new induction variable.		// Create the new GEP with the new induction variable.
GetElementPtrInst *Gep2 = cast<GetElementPtrInst>(Gep->clone());		GetElementPtrInst *Gep2 = cast<GetElementPtrInst>(Gep->clone());
Gep2->setOperand(0, FirstBasePtr);		Gep2->setOperand(0, FirstBasePtr);

test/Transforms/LoopVectorize/pr23580.ll

				; PR23580
				; RUN: opt < %s -O2 -S | FileCheck %s
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				%struct.anon = type { [0 x %class.C] }
				%class.C = type { i8 }
				%struct.B = type { i16 }
				%class.G = type <{ %struct.F, [2 x i32], i8, [7 x i8] }>
				%struct.F = type { i8, i8, i8, i16, i32* }

				@a = global i32 0, align 4
				@d = internal global %struct.anon zeroinitializer, align 1

				declare %struct.B* @_ZN1C5m_fn1Ev(%class.C*)

				; Check geps inside for.body are merged so loop vectorizer can recognize loads
				; inside for.body to be inter-iterations consecutive, and generate %wide.loads.
				;
				; CHECK-LABEL: @fn2(
				; CHECK: %wide.load{{[0-9]*}} =
				; CHECK: %wide.load{{[0-9]*}} =
				; CHECK: %wide.load{{[0-9]*}} =

				define void @fn2(%class.G* nocapture readonly %this, i1 zeroext %arg) align 2 {
				entry:
				br label %for.cond

				for.cond:
				%tmp1 = load i32, i32* @a, align 4
				%idxprom = sext i32 %tmp1 to i64
				%arrayidx3 = getelementptr inbounds [0 x %class.C], [0 x %class.C]* getelementptr inbounds (%struct.anon, %struct.anon* @d, i32 0, i32 0), i32 0, i64 %idxprom
				%call = call %struct.B* @_ZN1C5m_fn1Ev(%class.C* %arrayidx3)
				%tmp4 = load i32, i32* @a, align 4
				%idx.ext = sext i32 %tmp4 to i64
				%add.ptr = getelementptr inbounds %struct.B, %struct.B* %call, i64 %idx.ext
				%tobool = trunc i32 %tmp4 to i1
				br i1 %tobool, label %for.cond13, label %if.else30

				for.cond13: ; preds = %for.body, %if.then
				%k.0 = phi i32 [ 1, %for.cond ], [ %add, %for.body ]
				%cmp14 = icmp slt i32 %k.0, %tmp4
				br i1 %cmp14, label %for.body, label %if.else30, !llvm.loop !0

				for.body: ; preds = %for.cond13
				%idxprom15 = sext i32 %k.0 to i64
				%arrayidx16 = getelementptr inbounds %struct.B, %struct.B* %add.ptr, i64 %idxprom15
				%ival = getelementptr inbounds %struct.B, %struct.B* %arrayidx16, i32 0, i32 0
				%tmp9 = load i16, i16* %ival, align 2
				%conv17 = sext i16 %tmp9 to i32
				%add = add nsw i32 %k.0, 1
				%idxprom18 = sext i32 %add to i64
				%arrayidx19 = getelementptr inbounds %struct.B, %struct.B* %add.ptr, i64 %idxprom18
				%ival20 = getelementptr inbounds %struct.B, %struct.B* %arrayidx19, i32 0, i32 0
				%tmp10 = load i16, i16* %ival20, align 2
				%conv21 = sext i16 %tmp10 to i32
				%add22 = add nsw i32 %conv17, %conv21
				%mul = mul nsw i32 %tmp1, %add22
				%add23 = add nsw i32 %tmp4, %mul
				%shr = ashr i32 %add23, %tmp4
				%arrayidx25 = getelementptr inbounds %struct.B, %struct.B* %call, i64 %idxprom15
				%ival26 = getelementptr inbounds %struct.B, %struct.B* %arrayidx25, i32 0, i32 0
				%tmp11 = load i16, i16* %ival26, align 2
				%conv27 = sext i16 %tmp11 to i32
				%sub = sub nsw i32 %conv27, %shr
				%conv28 = trunc i32 %sub to i16
				store i16 %conv28, i16* %ival26, align 2
				br label %for.cond13

				if.else30:
				br label %for.cond
				}

				!0 = distinct !{!0, !1}
				!1 = !{!"llvm.loop.vectorize.width", i32 4}
