Details

Reviewers

qcolombet
nadav
hfinkel

Summary

Whether or not gep merging takes effect, it is better for the analysis not to depend on having only a single-level GEP, just as DecomposeGEPExpression does right now.

The patch extends the consecutive-pointer analysis in the LoopVectorizer pass to handle multi-level GEPs. This is a follow-up patch to http://reviews.llvm.org/D9865.

I also tried to solve the problem more generally by generating a temporary merged GEP every time a GEP is analyzed and removing it after the analysis, but that failed. Many existing analyses require the GEP to be a valid instruction inserted in the function, so a dangling GEP instruction created just for the analysis doesn't work. We would need to insert the temporary combined GEP into the original BB, do the analysis, and then delete it, which makes the IR messy during the analysis. Another way is to represent the combined GEP as a kind of metadata used only for analysis, but I am not sure how much effort that would cost because the metadata needs to be updated from time to time.

Diff Detail

Repository: rL LLVM

Event Timeline

wmi updated this revision to Diff 27216. Jun 5 2015, 12:02 PM

wmi retitled this revision to Extend LoopVectorizationLegality::isConsecutivePtr to handle multiple level GEPs.

wmi updated this object.

wmi edited the test plan for this revision.

wmi added reviewers: qcolombet, hfinkel, nadav.

wmi set the repository for this revision to rL LLVM.

wmi added subscribers: davidxl, Unknown Object (MLST).

Hi Wei,

I assume that since you found that limitation, you have a test case that exposes it. Could you add the test case to the patch please?

Thanks,
-Quentin

I assume that since you found that limitation, you have a test case that exposes it. Could you add the test case to the patch please?

I added the test case, which is the same as test/Transforms/InstCombine/gep-merge1.ll in http://reviews.llvm.org/D9865.

Thanks,
Wei.

Hi Wei,

Please find some questions inline. Also, could you please clarify, if this patch depends on D9865 or not?

Thanks,
Michael

lib/Transforms/Vectorize/LoopVectorize.cpp
1723–1728	Could we use `GepPtrInst = dyn_cast_or_null<Instruction>(Gep->getPointerOperand())` instead of `(GepPtr = Gep->getPointerOperand()) && (GepPtrInst = dyn_cast<Instruction>(GepPtr))`?
1725	I don't understand why it's correct. Could you please clarify the logic behind it? Originally the condition was true when the pointer operand was an induction variable. Now it can be true for an arbitrary non-invariant expression that happens to have a specific gep-structure.
test/Transforms/LoopVectorize/pr23580.ll
48–70	Why do we need such a complicated loop body if we're basically only interested in gep+gep+load? Also, the control flow in this test is strange, and I'm not sure if it's necessary for the purpose of the test. Could we simplify it?

Michael, Thanks for the review.

could you please clarify, if this patch depends on D9865 or not?

The patch doesn't depend on D9865. The motivation is to enhance the analysis independently, i.e., even when GEP-related IR is not in the expected shape, the analysis can still be valid.

lib/Transforms/Vectorize/LoopVectorize.cpp
1723–1728	Yes, it is better.
1725	Originally there were two cases where a load/store is consecutive. Case 1: the pointer operand of the gep is a phi and it is an induction variable. Case 2: the pointer operand is invariant, only one index operand of the gep is a loop induction variable, and all the index operands to the right of the variant index operand are 0. The additional case (case 3) added in the patch is when the pointer operand of the gep (call it gep_a) is another gep (call it gep_b). For such a load/store to be consecutive, all the index operands of gep_a must be 0, and gep_b must be case 1, case 2, or another recursive gep. For both case 1 and case 3, the pointer operand of the original gep has a constant stride, so it is loop variant. For case 2, the pointer operand of the original gep is loop invariant. That is why case 3 can reuse the same logic as case 1 in InnerLoopVectorizer::vectorizeMemoryInstruction.
test/Transforms/LoopVectorize/pr23580.ll
48–70	I will simplify the test.
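
The three cases in the reply above can be sketched outside LLVM as a small classifier. This is only a toy model and not code from the patch: `PtrKind`, `ToyIndex`, `ToyPtr`, and `isConsecutiveToy` are hypothetical names, and the model ignores element sizes and strides, which the real analysis also has to check.

```cpp
#include <cassert>
#include <vector>

// Toy model only: none of these types exist in LLVM.
enum class PtrKind { InductionPhi, Invariant, Gep, Other };

struct ToyIndex {
  bool IsInduction; // the index is the loop induction variable
  bool IsZero;      // the index is the constant 0
  bool IsInvariant; // the index does not change inside the loop
};

struct ToyPtr {
  PtrKind Kind;
  const ToyPtr *Base;            // pointer operand, meaningful for Gep only
  std::vector<ToyIndex> Indices; // index operands, meaningful for Gep only
};

bool isConsecutiveToy(const ToyPtr &P) {
  if (P.Kind != PtrKind::Gep || P.Base == nullptr)
    return false;
  const ToyPtr &Base = *P.Base;

  // Case 1: the pointer operand is an induction phi and every index is
  // loop invariant.
  if (Base.Kind == PtrKind::InductionPhi) {
    for (const ToyIndex &I : P.Indices)
      if (!I.IsInvariant)
        return false;
    return true;
  }

  // Case 2: the pointer operand is invariant, exactly one index is the
  // induction variable, and every index to its right is 0.
  if (Base.Kind == PtrKind::Invariant) {
    bool SeenInduction = false;
    for (const ToyIndex &I : P.Indices) {
      if (I.IsInduction) {
        if (SeenInduction)
          return false;
        SeenInduction = true;
      } else if (SeenInduction ? !I.IsZero : !I.IsInvariant) {
        return false;
      }
    }
    return SeenInduction;
  }

  // Case 3: the pointer operand is another gep (gep_b); all indices of this
  // gep (gep_a) are 0 and gep_b is itself consecutive.
  if (Base.Kind == PtrKind::Gep) {
    for (const ToyIndex &I : P.Indices)
      if (!I.IsZero)
        return false;
    return isConsecutiveToy(Base);
  }
  return false;
}

// Usage: a gep of a gep, loosely modelled on the shape in pr23580.ll.
void checkGepOfGep() {
  ToyPtr AddPtr{PtrKind::Invariant, nullptr, {}};
  ToyPtr ArrayIdx{PtrKind::Gep, &AddPtr, {ToyIndex{true, false, false}}};
  ToyPtr IVal{PtrKind::Gep, &ArrayIdx,
              {ToyIndex{false, true, true}, ToyIndex{false, true, true}}};
  assert(isConsecutiveToy(ArrayIdx)); // case 2
  assert(isConsecutiveToy(IVal));     // case 3 layered on case 2
}
```

The recursion in case 3 is what lets the check see through any number of stacked zero-index geps.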

mzolotukhin added inline comments. Jun 17 2015, 4:51 PM

test/Transforms/LoopVectorize/pr23580.ll
3	Please also use only needed passes (-loop-vectorize + maybe something else) instead of '-O2'.

I made a fix to use dyn_cast_or_null and simplified the test. PTAL.

Thanks for addressing my comments.

Am I getting it right, that in the test we want to be able to tell that the following two gep-expressions result in the same address?

%arrayidx16 = getelementptr inbounds %struct.B, %struct.B* %add.ptr, i64 %idxprom15
%ival = getelementptr inbounds %struct.B, %struct.B* %arrayidx16, i32 0, i32 0

If that's so, can't we just use SCEV for checking it? It would be more general, than checking for all operands being 0 etc.

test/Transforms/LoopVectorize/pr23580.ll
8–9	This looks unused.
15	This looks unused.
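
The question above, whether the two getelementptr expressions yield the same address, can be checked by hand with plain byte arithmetic. The sketch below is an illustration and not part of the patch: `B` mirrors `%struct.B = type { i16 }` from the test, while `elementOffset`, `fieldOffset`, and `checkSameAddress` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for %struct.B = type { i16 } from the test.
struct B {
  int16_t ival;
};

// Byte offset produced by the first gep: &base[Idx].
int64_t elementOffset(int64_t Idx) {
  return Idx * static_cast<int64_t>(sizeof(B));
}

// Byte offset produced by the second gep with indices (0, 0): the first
// index adds 0 elements, the second selects field 0 of the struct.
int64_t fieldOffset(int64_t Idx) {
  return elementOffset(Idx) + 0 * static_cast<int64_t>(sizeof(B)) +
         static_cast<int64_t>(offsetof(B, ival));
}

void checkSameAddress() {
  B Arr[8] = {};
  for (int64_t I = 0; I < 8; ++I) {
    // The computed offsets agree for every index.
    assert(elementOffset(I) == fieldOffset(I));
    // The real pointers agree as well.
    assert(reinterpret_cast<char *>(&Arr[I]) ==
           reinterpret_cast<char *>(&Arr[I].ival));
  }
  // The address advances by exactly one element per iteration.
  assert(fieldOffset(1) - fieldOffset(0) == static_cast<int64_t>(sizeof(B)));
}
```

Because both offsets reduce to the same expression in the index, an analysis that works on the computed address rather than on the gep structure does not need a special case for the zero-index gep.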

mzolotukhin added inline comments. Jun 17 2015, 6:02 PM

test/Transforms/LoopVectorize/pr23580.ll
3	Can we just run `-loop-rotate` manually on the test and use the output as the new test (we won't need to run `-loop-rotate` there)? And why do we need `-instcombine`? Could we just check vectorizer's output?

wmi added inline comments. Jun 17 2015, 11:57 PM

test/Transforms/LoopVectorize/pr23580.ll
3	I fixed it and now use the IR after -loop-rotate. I keep -instcombine because it makes the generated IR a lot simpler and may be helpful for checking the output.
8–9	Fixed.
15	Fixed. Thanks for catching it.

can't we just use SCEV for checking it? It would be more general, than checking for all operands being 0 etc.

I tried that and it seems to work. The code looks simpler this way. I haven't done much testing yet; the LLVM unit tests pass. I will do more testing.

The patch using SCEV caused a regression in MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt. The fix is: when there is a symbolic stride, don't return 0 but try the original LAA logic.

Hi Wei,

Thanks for your work! Please find more comments below:

Originally there were two cases where a load/store is consecutive:
Case 1: the pointer operand of the gep is a phi and it is an induction variable.
Case 2: the pointer operand is invariant, only one index operand of the gep is a loop induction variable, and all the index operands to the right of the variant index operand are 0.

The additional case (case 3) added in the patch is when the pointer operand of the gep (call it gep_a) is another gep (call it gep_b). For such a load/store to be consecutive, all the index operands of gep_a must be 0, and gep_b must be case 1, case 2, or another recursive gep.

These are all details that should be covered by SCEV. That is, once you use SCEV for such analysis, you no longer need to care whether the pointer operand is a PHI, a GEP, or something else. And you don't need to specifically handle cases like %gep2 = getelementptr %gep1, i64 0, i64 0, since both pointers should give you the same SCEV expression.

Thus, I expect that this patch should make a lot of code in isConsecutivePtr redundant - probably the only code we need there is the one you are about to add. For instance, I suspect that we won't need these checks:

// We can emit wide load/stores only if the last non-zero index is the
// induction variable.
...

...if we use SCEV properly.

I keep -instcombine because it makes the generated IR a lot simpler and may be helpful for checking the output.

I'm not convinced here. Please take a look at, for instance, test/Transforms/LoopVectorize/X86/powof2div.ll. To figure out whether vectorization takes place, we just need to look for a vector type (like <4 x i32>) in the output. -instcombine is unnecessary for this. Also, I'm pretty sure that the test can and should be reduced further: we don't need so many basic blocks and instructions to test this feature (again, you could take a look at powof2div.ll as an example).

lib/Transforms/Vectorize/LoopVectorize.cpp
1585	Please use `dyn_cast_or_null` here. It'll be much more compact.
1586–1588	I'd rewrite this as `if (auto C = dyn_cast_or_null<SCEVConstant>(PtrAddRec->getStepRecurrence(*SE))) {`
test/Transforms/LoopVectorize/pr23580.ll
28	Why do we need this call? If we just need some external variable, I'd rather replace it with function argument. That'll hopefully allow us to reduce the test even further.

Thanks for the comments. The Phabricator server seems to be down, so I am posting the updated patch directly as an attachment.

patch.txt (190 KB)

I re-pasted the patch using SCEV in Phabricator.

One problem I noticed with using SCEV to check consecutiveness is that it may admit a new case in which both the pointer operand of the gep and one index operand of the gep are variant. For this case, InnerLoopVectorizer::vectorizeMemoryInstruction may generate incorrect code. Previously, isConsecutivePtr only returned true when either the pointer operand of the gep was invariant or all the other operands of the gep were invariant. If we simply check SCEV in isConsecutivePtr, we effectively move the complexity from isConsecutivePtr to vectorizeMemoryInstruction.

Please make sure to upload patches with full context.

lib/Transforms/Vectorize/LoopVectorize.cpp
1562	This can just be: int64_t StepVal = C->getValue()->getSExtValue();
1567	Don't put an 'else' if the 'if' unconditionally returns. http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return
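
For illustration, the stride comparison from the patch can be written with early returns, as the coding standard linked above asks. This is a standalone sketch rather than the patch itself: `consecutiveDirection` and `checkDirections` are hypothetical names, the parameters stand in for the `StepVal` and `ElemSize` values computed in the diff, and the guard against a non-positive element size is an added assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns 1 for a forward-consecutive access, -1 for a
// reverse-consecutive access, and 0 otherwise.
int consecutiveDirection(int64_t StepVal, int64_t ElemSize) {
  // Assumption for this sketch: a non-positive element size is never
  // treated as consecutive.
  if (ElemSize <= 0)
    return 0;
  if (StepVal == ElemSize)
    return 1;
  if (StepVal == -ElemSize)
    return -1;
  return 0;
}

void checkDirections() {
  assert(consecutiveDirection(2, 2) == 1);   // stride equals element size
  assert(consecutiveDirection(-2, 2) == -1); // reverse stride
  assert(consecutiveDirection(4, 2) == 0);   // strided, not consecutive
}
```

Each branch returns unconditionally, so no `else` is needed after it.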

I uploaded the patch with full context (Sorry) and addressed Hal's comments.

hfinkel added inline comments. Jun 22 2015, 5:15 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
1563	Does this give you what you want if you have nested loops? You only want that part of the recurrence that refers to the inner loop, right?

wmi added inline comments. Jun 22 2015, 6:13 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
1563	Yes, I want the recurrence referring to the inner loop. I just tried a small test case and found that the Loop inside a SCEVAddRecExpr may refer to an outer loop if the SCEVAddRecExpr is invariant in the inner loop. I will check whether the loop of the SCEVAddRecExpr is identical to the loop in LoopVectorizationLegality.
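
The loop-identity check proposed in the reply above can be sketched with a toy model. Nothing here is LLVM code: `ToyLoop`, `ToyAddRec`, `consecutiveDirectionIn`, and `checkNestedLoops` are hypothetical names; in LLVM itself the loop of a recurrence is available through `SCEVAddRecExpr::getLoop()`.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for a loop nest and an add-recurrence.
struct ToyLoop {
  const ToyLoop *Parent; // enclosing loop, or nullptr for the outermost
};

struct ToyAddRec {
  const ToyLoop *L; // the loop this recurrence steps in
  int64_t StepVal;  // constant step in bytes
};

// Returns 1 or -1 only when the recurrence steps in the loop being
// vectorized; a recurrence that belongs to any other loop yields 0.
int consecutiveDirectionIn(const ToyAddRec &AR, const ToyLoop &TheLoop,
                           int64_t ElemSize) {
  if (AR.L != &TheLoop)
    return 0;
  if (AR.StepVal == ElemSize)
    return 1;
  if (AR.StepVal == -ElemSize)
    return -1;
  return 0;
}

void checkNestedLoops() {
  ToyLoop Outer{nullptr};
  ToyLoop Inner{&Outer};
  ToyAddRec StepsInInner{&Inner, 2};
  ToyAddRec StepsInOuter{&Outer, 2};
  assert(consecutiveDirectionIn(StepsInInner, Inner, 2) == 1);
  // Same step, but the recurrence refers to the outer loop, so the pointer
  // is invariant in the inner loop and must not be reported as consecutive.
  assert(consecutiveDirectionIn(StepsInOuter, Inner, 2) == 0);
}
```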

Drop this revision in favor of D21861.

Diff 27919

lib/Transforms/Vectorize/LoopVectorize.cpp

 int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {
   ...
   GetElementPtrInst *Gep = dyn_cast_or_null<GetElementPtrInst>(Ptr);
   if (!Gep)
     return 0;

   unsigned NumOperands = Gep->getNumOperands();
   Value *GpPtr = Gep->getPointerOperand();
   // If this GEP value is a consecutive pointer induction variable and all of
   // the indices are constant then we know it is consecutive. We can
   Phi = dyn_cast<PHINode>(GpPtr);
   if (Phi && Inductions.count(Phi)) {

     // Make sure that the pointer does not point to structs.
     PointerType *GepPtrType = cast<PointerType>(GpPtr->getType());
     if (GepPtrType->getElementType()->isAggregateType())
       return 0;

     // Make sure that all of the index operands are loop invariant.
     for (unsigned i = 1; i < NumOperands; ++i)
       if (!SE->isLoopInvariant(SE->getSCEV(Gep->getOperand(i)), TheLoop))
         return 0;

     InductionInfo II = Inductions[Phi];
     return II.getConsecutiveDirection();
   }

+  const SCEV *PtrScev = SE->getSCEV(Ptr);
+  const SCEVAddRecExpr *PtrAddRec = nullptr;
+  const DataLayout &DL = Gep->getModule()->getDataLayout();
+  // If this Ptr value is SCEVAddRecExpr, has constant stride and the stride
+  // equals to the size of pointer element type, we know it is consecutive.
+  if (PtrScev && (PtrAddRec = dyn_cast<SCEVAddRecExpr>(PtrScev))) {
+    const SCEV *Step = PtrAddRec->getStepRecurrence(*SE);
+    const SCEVConstant *C = dyn_cast<SCEVConstant>(Step);
+    if (!C)
+      return 0;
+    const APInt &APStepVal = C->getValue()->getValue();
+    int64_t StepVal = APStepVal.getSExtValue();
+    int64_t ElemSize =
+        DL.getTypeAllocSize(Ptr->getType()->getPointerElementType());
+    if (StepVal == ElemSize)
+      return 1;
+    else if (StepVal == -ElemSize)
+      return -1;
+    else
+      return 0;
+  }
+
   unsigned InductionOperand = getGEPInductionOperand(Gep);

   // Check that all of the gep indices are uniform except for our induction
   // operand.
   for (unsigned i = 0; i != NumOperands; ++i)
     if (i != InductionOperand &&
         !SE->isLoopInvariant(SE->getSCEV(Gep->getOperand(i)), TheLoop))
       return 0;
   ...
 void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {
   ...
   bool UniformLoad = LI && Legal->isUniform(Ptr);
   if (!ConsecutiveStride || UniformLoad)
     return scalarizeInstruction(Instr);

   Constant *Zero = Builder.getInt32(0);
   VectorParts &Entry = WidenMap.get(Instr);

   // Handle consecutive loads/stores.
+  Instruction *GepPtrInst = nullptr;
   GetElementPtrInst *Gep = dyn_cast<GetElementPtrInst>(Ptr);
-  if (Gep && Legal->isInductionVariable(Gep->getPointerOperand())) {
+  if (Gep &&
+      (GepPtrInst = dyn_cast_or_null<Instruction>(Gep->getPointerOperand())) &&
+      !SE->isLoopInvariant(SE->getSCEV(GepPtrInst), OrigLoop)) {
+    // The case Gep->getPointerOperand() is an induction variable
+    // or a SCEVAddRecExpr.
     setDebugLocFromInst(Builder, Gep);
     Value *PtrOperand = Gep->getPointerOperand();
     Value *FirstBasePtr = getVectorValue(PtrOperand)[0];
     FirstBasePtr = Builder.CreateExtractElement(FirstBasePtr, Zero);

     // Create the new GEP with the new induction variable.
     GetElementPtrInst *Gep2 = cast<GetElementPtrInst>(Gep->clone());
     Gep2->setOperand(0, FirstBasePtr);
     Gep2->setName("gep.indvar.base");
   ...

test/Transforms/LoopVectorize/pr23580.ll

; PR23580
; RUN: opt < %s -loop-vectorize -instcombine -S | FileCheck %s

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.B = type { i16 }
%class.G = type <{ %struct.F, [2 x i32], i8, [7 x i8] }>
%struct.F = type { i8, i8, i8, i16, i32* }

@a = global i32 0, align 4

declare %struct.B* @_ZN1C5m_fn1Ev()

; Check geps inside for.body are merged so loop vectorizer can recognize loads
; inside for.body to be inter-iterations consecutive, and generate %wide.loads.
;
; CHECK-LABEL: @fn2(
; CHECK: %wide.load{{[0-9]*}} =
; CHECK: %wide.load{{[0-9]*}} =

define void @fn2(%class.G* nocapture readonly %this) align 2 {
entry:
  br label %for.preheader

for.preheader: ; preds = %entry
  %call = call %struct.B* @_ZN1C5m_fn1Ev()
  %tmp4 = load i32, i32* @a, align 4
  %idx.ext = sext i32 %tmp4 to i64
  %add.ptr = getelementptr inbounds %struct.B, %struct.B* %call, i64 %idx.ext
  %cmp14.1 = icmp slt i32 1, %tmp4
  br i1 %cmp14.1, label %for.body.lr.ph, label %for.end, !llvm.loop !0

for.body.lr.ph: ; preds = %for.preheader
  br label %for.body

for.body: ; preds = %for.body.lr.ph, %for.body
  %k.02 = phi i32 [ 1, %for.body.lr.ph ], [ %add, %for.body ]
  %idxprom15 = sext i32 %k.02 to i64
  %arrayidx16 = getelementptr inbounds %struct.B, %struct.B* %add.ptr, i64 %idxprom15
  %ival = getelementptr inbounds %struct.B, %struct.B* %arrayidx16, i32 0, i32 0
  %tmp9 = load i16, i16* %ival, align 2
  %add = add nsw i32 %k.02, 1
  %arrayidx25 = getelementptr inbounds %struct.B, %struct.B* %call, i64 %idxprom15
  %ival26 = getelementptr inbounds %struct.B, %struct.B* %arrayidx25, i32 0, i32 0
  store i16 %tmp9, i16* %ival26, align 2
  %cmp14 = icmp slt i32 %add, %tmp4
  br i1 %cmp14, label %for.body, label %for.cond.for.end_crit_edge, !llvm.loop !0

for.cond.for.end_crit_edge: ; preds = %for.body
  br label %for.end

for.end: ; preds = %for.cond.for.end_crit_edge, %for.preheader
  ret void
}

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.vectorize.width", i32 4}

This is an archive of the discontinued LLVM Phabricator instance.

Extend LoopVectorizationLegality::isConsecutivePtr to handle multiple level GEPs
AbandonedPublic
