This is an archive of the discontinued LLVM Phabricator instance.

Extend LoopVectorizationLegality::isConsecutivePtr to handle multiple level GEPs
AbandonedPublic

Authored by wmi on Jun 5 2015, 12:02 PM.

Details

Summary

No matter whether gep merging takes effect or not, it is better for the analysis not to depend on having only one level GEP, just as DecomposeGEPExpression does right now.

The patch extends consecutive analysis in LoopVectorizer pass to handle multiple level GEPs. This is a following patch for http://reviews.llvm.org/D9865.

I also tried other way to solve the problem more generally by generating a temporarily merged GEP everytime when analyzing a GEP and removing it after the analysis, but it failed. A lot of existing analysis requires GEP to be a valid inst inserted in the function. We need to insert the temporarily combined GEP into the original BB, do the analysis, then delete it -- making a dangling GEP insn just for the analysis doesn't work. But it makes the IR during the analysis messy this way. Another way is to make the combined GEP kind of meta data just for analysis, but I am not sure how much effort it will cost because the meta data needs to be updated from time to time.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi updated this revision to Diff 27216.Jun 5 2015, 12:02 PM
wmi retitled this revision from to Extend LoopVectorizationLegality::isConsecutivePtr to handle multiple level GEPs.
wmi updated this object.
wmi edited the test plan for this revision. (Show Details)
wmi added reviewers: qcolombet, hfinkel, nadav.
wmi set the repository for this revision to rL LLVM.
wmi added subscribers: davidxl, Unknown Object (MLST).
qcolombet edited edge metadata.Jun 17 2015, 11:08 AM

Hi Wei,

I assume that since you found that limitation, you have a test case that exposes it. Could you add the test case to the patch please?

Thanks,
-Quentin

wmi updated this revision to Diff 27851.Jun 17 2015, 11:29 AM
wmi edited edge metadata.

I assume that since you found that limitation, you have a test case that exposes it. Could you add the test case to the patch please?

I add the testcase which is the same one as test/Transforms/InstCombine/gep-merge1.ll in http://reviews.llvm.org/D9865.

Thanks,
Wei.

mzolotukhin added a subscriber: mzolotukhin.

Hi Wei,

Please find some questions inline. Also, could you please clarify, if this patch depends on D9865 or not?

Thanks,
Michael

lib/Transforms/Vectorize/LoopVectorize.cpp
1738–1739

Could we use GepPtrInst = dyn_cast_or_null(Gep->getPointerOperand() instead of (GepPtr = Gep->getPointerOperand()) && (GepPtrInst = dyn_cast<Instruction>(GepPtr)?

1740

I don't understand why it's correct. Could you please clarify the logic behind it?

Originally the condition was true when the pointer operand was an induction variable. Now it can be true for an arbitrary non-invariant expression that happen to have a specific gep-structure.

test/Transforms/LoopVectorize/pr23580.ll
47–69

Why do we need such complicated loop body, if we're basically only interested in gep+gep+load? Also, the control flow in this test is strange, and I'm not sure if it's necessary for the purpose of the test.

Could we simplify it?

wmi added a comment.Jun 17 2015, 4:04 PM

Michael, Thanks for the review.

could you please clarify, if this patch depends on D9865 or not?

The patch doesn't depend on D9865. The motivation is to enhance the analysis independently, .i.e, even when gep related IR is not in an expected shape, the analysis can still be valid.

lib/Transforms/Vectorize/LoopVectorize.cpp
1738–1739

Yes, it is better.

1740

Originally there are two cases where a load/store is consecutive:
case1. The pointer operand of gep is a phi and it is an induction variable.
case2. The pointer operand is invariant, only one index operand of the gep is a loop induction variable and all the other index operands on the right hand side of the variant index operand are all 0.

The one more case (case 3) added in the patch is when the pointer operand of gep (named as gep_a) is another gep (named as gep_b). For such load/store to be consecutive, all the index operands of gep_a are all 0 , and gep_b should be case 1 or 2 or another recurisive gep.

For both case1 and case3, the pointer operand of the original gep has const stride so it is loop variant. For case2, the pointer operand of the original gep is loop invariant. That is why case3 can reuse the same logic as case1 in InnerLoopVectorizer::vectorizeMemoryInstruction.

test/Transforms/LoopVectorize/pr23580.ll
47–69

I will simplify the test.

mzolotukhin added inline comments.Jun 17 2015, 4:51 PM
test/Transforms/LoopVectorize/pr23580.ll
2

Please also use only needed passes (-loop-vectorize + maybe something else) instead of '-O2'.

wmi updated this revision to Diff 27898.Jun 17 2015, 5:01 PM

I made a fix to use dyn_cast_or_null and simplified the test. PTAL.

Thanks for addressing my comments.

Am I getting it right, that in the test we want to be able to tell that the following two gep-expressions result in the same address?

%arrayidx16 = getelementptr inbounds %struct.B, %struct.B* %add.ptr, i64 %idxprom15
%ival = getelementptr inbounds %struct.B, %struct.B* %arrayidx16, i32 0, i32 0

If that's so, can't we just use SCEV for checking it? It would be more general, than checking for all operands being 0 etc.

test/Transforms/LoopVectorize/pr23580.ll
8–9

This looks unused.

15

This looks unused.

mzolotukhin added inline comments.Jun 17 2015, 6:02 PM
test/Transforms/LoopVectorize/pr23580.ll
3

Can we just run -loop-rotate manually on the test and use the output as the new test (we won't need to run -loop-rotate there)?
And why do we need -instcombine? Could we just check vectorizer's output?

wmi added inline comments.Jun 17 2015, 11:57 PM
test/Transforms/LoopVectorize/pr23580.ll
3

I fix it and use the IR after -loop-rotate. I keep -instcombine because it makes the generated IR a lot simpler and may be helpful for checking the output.

8–9

Fixed.

15

Fixed. Thanks for catching it.

wmi updated this revision to Diff 27919.Jun 17 2015, 11:59 PM

can't we just use SCEV for checking it? It would be more general, than checking for all operands being 0 etc.

I try that and it seems works. The code looks simpler this way. I havn't done much test. llvm unit test is ok. I will do more testing for it.

wmi updated this revision to Diff 27945.Jun 18 2015, 10:36 AM

The patch using SCEV caused a regression in MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt. The fix is: when there is a symbolic stride, don't return 0 but try the original LAA logic.

Hi Wei,

Thanks for your work! Please find more comments below:

Originally there are two cases where a load/store is consecutive:
case1. The pointer operand of gep is a phi and it is an induction variable.
case2. The pointer operand is invariant, only one index operand of the gep is a loop induction variable and all the other index operands on the right hand side of the variant index operand are all 0.

The one more case (case 3) added in the patch is when the pointer operand of gep (named as gep_a) is another gep (named as gep_b). For such load/store to be consecutive, all the index operands of gep_a are all 0 , and gep_b should be case 1 or 2 or another recurisive gep.

These all are details that should be covered by SCEV. That is, once you use SCEV for such analysis, you no longer need to bother about whether the pointer operand is PHI, GEP, or something else. And, you don't need to specifically handle the cases like %gep2 = getelementptr %gep1, i64 0, i64 0 since in terms of SCEV they should give you the same SCEV expression.

Thus, I expect that this patch should make a lot of code in isConsecutivePtr redundant - probably the only code we need there is the one you are about to add. For instance, I suspect that we won't need these checks:

// We can emit wide load/stores only if the last non-zero index is the
// induction variable.
...

...if we use SCEV properly.

I keep -instcombine because it makes the generated IR a lot simpler and may be helpful for checking the output.

I'm not convinced here. Please take a look at, for instance, test/Transforms/LoopVectorize/X86/powof2div.ll. In order to figure out whether the vectorization takes place, we need to just look for a vector type (like <4 x i32>) in the output. -inst-combine is unnecessary for this. Also, I'm pretty sure that the test can and should be reduced further - we don't need so many basic blocks and instructions to test this feature (again, you could take a look at powof2div.ll as an example).

lib/Transforms/Vectorize/LoopVectorize.cpp
1621

Please use dyn_cast_or_null here. It'll be much more compact.

1622–1624

I'd rewrite this as

if (auto C = dyn_cast_or_null<SCEVConstant>(PtrAddRec->getStepRecurrence(*SE))) {
test/Transforms/LoopVectorize/pr23580.ll
28

Why do we need this call? If we just need some external variable, I'd rather replace it with function argument. That'll hopefully allow us to reduce the test even further.

wmi added a comment.Jun 19 2015, 12:04 PM

Thanks for the comments. phabricator server seems down, so I post the
updated patch directly in attachment.

wmi updated this revision to Diff 28132.Jun 22 2015, 11:34 AM

repaste the patch using SCEV in phabricator.

One problem I noticed about using SCEV to check consecutiveness is that it may add a new case that: both pointer operand of gep and one operand of gep are variant. For this case, InnerLoopVectorizer::vectorizeMemoryInstruction may generate incorrect code. Previously, isConsecutive only return true for the case either pointer operand of gep is invariant, or all the other operands of gep are invariant. If we simply check SCEV in isConsecutive, It is like we move the complexity from isConsecutive to vectorizeMemoryInstruction.

hfinkel edited edge metadata.Jun 22 2015, 4:20 PM

Please make sure to upload patches with full context.

lib/Transforms/Vectorize/LoopVectorize.cpp
1562

This can just be:

int64_t StepVal = C->getValue()->getSExtValue();
1567

Don't put an 'else' if the 'if' unconditionally returns.

http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return

wmi updated this revision to Diff 28174.Jun 22 2015, 5:03 PM
wmi edited edge metadata.

I uploaded the patch with full context (Sorry) and addressed Hal's comments.

hfinkel added inline comments.Jun 22 2015, 5:15 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
1563

Does this give you what you want if you have nested loops? You only want that part of the recurrence that refers to the inner loop, right?

wmi added inline comments.Jun 22 2015, 6:13 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
1563

Yes, I want the recurrence refering to the inner loop. I just tried a small testcase and found the Loop inside SCEVAddRecExpr may refer to outside loop if the SCEVAddRecExpr is invariant for the inside loop. I will check whether the loop of SCEVAddRecExpr is identical with the loop in LoopVectorizationLegality.