This is an archive of the discontinued LLVM Phabricator instance.

[Loop Vectorizer] Simplified GEP cloning. NFC.
ClosedPublic

Authored by delena on Sep 14 2016, 6:26 AM.

Download Raw Diff

Details

Reviewers

Ayal
mkuper

Commits

rGa1a0e7ddbec0: [Loop vectorizer] Simplified GEP cloning. NFC.
rL281851: [Loop vectorizer] Simplified GEP cloning. NFC.

Summary

I simplified GEP cloning in vectorizeMemoryInstruction().

Diff Detail

Repository: rL LLVM

Event Timeline

delena updated this revision to Diff 71341.Sep 14 2016, 6:26 AM

delena retitled this revision from to [Loop Vectorizer] Simplified GEP cloning. NFC..

delena updated this object.

delena added reviewers: mkuper, Ayal.

delena set the repository for this revision to rL LLVM.

delena added a subscriber: llvm-commits.

Herald added a subscriber: mzolotukhin. · View Herald TranscriptSep 14 2016, 6:26 AM

Hi Elena,

This looks good to me. Thanks! At first I was a bit surprised by this simplification, but it makes sense. If we have a consecutive GEP for a memory instruction that's not scalarized, we only need to compute it's value at VF=0. For the unroll iterations, we compute the offset from this GEP (at UF=0) later on.

../lib/Transforms/Vectorize/LoopVectorize.cpp
2794 ↗	(On Diff #71341)	Did you mean to add a blank line here?

Thanks Elena, this looks like a significant improvement.

Just to make sure I understood correctly, the reason this works is that for a consecutive GEP, each operand must be either uniform or consecutive (this holds for both ptr and non-ptr IVs), so getScalarValue(0, 0) always does the right thing?

The only thing I'm a bit concerned about is removing all the asserts. Perhaps it's worth adding a loop under #ifndef NDEBUG, that verifies this? But if you/Ayal/Matt don't think it's useful, I won't insist.
Other than that, LGTM.

This revision is now accepted and ready to land.Sep 14 2016, 10:42 AM

In D24557#542757, @mkuper wrote:

Just to make sure I understood correctly, the reason this works is that for a consecutive GEP, each operand must be either uniform or consecutive (this holds for both ptr and non-ptr IVs), so getScalarValue(0, 0) always does the right thing?

Right, for a consecutive GEP, only the induction operand (as in getGEPInductionOperand) should be loop-varying. The other operands should be uniform.

The only thing I'm a bit concerned about is removing all the asserts. Perhaps it's worth adding a loop under #ifndef NDEBUG, that verifies this? But if you/Ayal/Matt don't think it's useful, I won't insist.
Other than that, LGTM.

Yeah, I agree about the asserts. We check these conditions in isConsecutivePtr. But just to mirror what what you said above, I think we should be able to verify them with following assert in the existing loop:

assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i))

What do you think?

In D24557#542843, @mssimpso wrote:
Yeah, I agree about the asserts. We check these conditions in isConsecutivePtr. But just to mirror what what you said above, I think we should be able to verify them with following assert in the existing loop:
assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i))
What do you think?

This covers the "else" branch of the original if, but I think it doesn't cover the pointer IV case (since getGEPInductionOperand() doesn't quite do what its name implies. :-\ ).
It should probably be something like:

assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i) || Legal->isInductionVariable(Gep->getOperand(i)))

Regardless, I thought it would be clearer for it to be a separate NDEBUG loop, but I'm totally ok with it being in the existing loop.

In D24557#542867, @mkuper wrote:

This covers the "else" branch of the original if, but I think it doesn't cover the pointer IV case (since getGEPInductionOperand() doesn't quite do what its name implies. :-\ ).

Ah, right! Thanks, Michael.

assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i) || Legal->isInductionVariable(Gep->getOperand(i)))

I'm not sure that this is the right check. My problem in https://reviews.llvm.org/D20789 was in these assertions that started to fail after more deep GEP analysis.
Re-checking GEP means checking correctness of isConsecutivePtr(Ptr).
In the new version (20789) isConsecutivePtr() uses PSE, which includes versioning and picks up more GEPs than we do now.
And we can just rely on correctness of this function.
GEP cloning does not depend on index role inside the loop and getScalarValue() covers all of them.

In D24557#542905, @delena wrote:

assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i) || Legal->isInductionVariable(Gep->getOperand(i)))

I'm not sure that this is the right check. My problem in https://reviews.llvm.org/D20789 was in these assertions that started to fail after more deep GEP analysis.
Re-checking GEP means checking correctness of isConsecutivePtr(Ptr).
In the new version (20789) isConsecutivePtr() uses PSE, which includes versioning and picks up more GEPs than we do now.
And we can just rely on correctness of this function.
GEP cloning does not depend on index role inside the loop and getScalarValue() covers all of them.

I want D20789 to go back in (there's a real performance issue I have that's affected by it :) ), but this scares me.
In theory, the assert I wrote above (or something similar) should not be failing. If it's failing, we are either missing a case, or there's a real subtle bug somewhere. I'd really like to know which one it is. We can't just say "we don't understand why this assert is failing, so let's not have it there". Once we do understand it, and it turns out that, say, the real assertion is very complex and requires replicating the logic of isConsecutivePtr() - I'll be perfectly OK with not adding it.

To quote G.K. Chesterton:

In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.

I would also like to see a comment in the code. Something along the lines of @mkuper or @mssimpso's explanation as to why getScalarValue(0, 0) works for all GEP operands.

Regarding the "uniform" assertion, I had been assuming this is equivalent to Legal->isUniform(), but I took a look at the implementation of isUniform in LAI. Over there, we have something like:

PSE.getSE()->isLoopInvariant(PSE.getSE()->getSCEV(Value), Loop);

Whereas here, we have something like:

PSE.getSE()->isLoopInvariant(PSE.getSCEV(Value), Loop);

I'm not familiar enough with PSE to know what, if any, difference this might make.

In D24557#543702, @mssimpso wrote:
Regarding the "uniform" assertion, I had been assuming this is equivalent to Legal->isUniform(), but I took a look at the implementation of isUniform in LAI. Over there, we have something like:
PSE.getSE()->isLoopInvariant(PSE.getSE()->getSCEV(Value), Loop);
Whereas here, we have something like:
PSE.getSE()->isLoopInvariant(PSE.getSCEV(Value), Loop);
I'm not familiar enough with PSE to know what, if any, difference this might make.

Yes, they are different. The second form can assume the predicates from PSE and produce a SCEV that's more friendly to optimizations.

Let's look at this GEP:

getelementptr inbounds i8, i8* %ptr, i64 %ind

May %ptr be not-loop-invariant and not-induction-variable?
It may be uniform, right?
In this case
assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i) || Legal->isInductionVariable(Gep->getOperand(i)))
should work, but
assert(i == getGEPInductionOperand(Gep) || PSE.getSE()->isLoopInvariant(PSE.getSCEV(Gep->getOperand(i)) || Legal->isInductionVariable(Gep->getOperand(i)))
will not.

I have a more high-level question. Is it safe to allow loop-variant uniform values here for the non-induction operands?

Uniformity is a property of how the value is used and it does not state anything about loop-variance. I would be more comfortable formulating this in terms of a single induction and loop-invariant operands.

In D24557#543796, @delena wrote:

Let's look at this GEP:

getelementptr inbounds i8, i8* %ptr, i64 %ind

May %ptr be not-loop-invariant and not-induction-variable?
It may be uniform, right?
In this case
assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i) || Legal->isInductionVariable(Gep->getOperand(i)))
should work, but
assert(i == getGEPInductionOperand(Gep) || PSE.getSE()->isLoopInvariant(PSE.getSCEV(Gep->getOperand(i)) || Legal->isInductionVariable(Gep->getOperand(i)))
will not.

I'm not sure I understand what you're saying.

assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i) || Legal->isInductionVariable(Gep->getOperand(i)))

Is exactly the check I suggested.
If you're saying it should work in all cases, then what's the problem? Or are you saying there are other cases where it's the wrong check? Or did I completely misunderstand you?

Ugh. Ignore my last comment, this is wrong.

We have *almost* fixed what "uniform" means in the vectorizer (thanks again, Matt), but not quite.

(a) LVL::isUniform() and LAI::isUniform() are basically equivalent to isLoopInvariant(). They imply the value is equal on all lanes.
(b) LVL::isUniformAfterVectorization() refers to what @anemet is talking about - it's a property of how the value is used, specifically that we only use the lane 0 value.

So:
@anemet: this is already phrased in terms of loop invariance.
@delena: I no longer agree with your example.

If we have:

%foo = getelementptr inbounds i8, i8* %ptr, i64 %ind

And %foo is consecutive, then one of the following holds:

%ptr is loop-invariant (uniform), and %ind is consecutive.
%ptr itself is consecutive, and %ind is loop-invariant (uniform).

So, I think I understand why the assert fails now.
For case 1, where %ind is consecutive, the assert will pass, because %ind is the "induction operand" (that is, it's in the "right place" in the GEP).
For case 2, where %ptr is consecutive, the assert may fail, because %ptr has to be consecutive, but doesn't really have to be the IV.

This also possibly explains why the assert started failing after you improved consecutivity detection. If we have something like this:

%p = getelementptr inbounds i8, i8* %iv, i64 0
%q = getelementptr inbounds i8, i8* %p, i64 1

Then as long as only %p is considered consecutive, everything is fine, but if %q is also considered consecutive, the assert should fail.

Does this make sense?

In D24557#543837, @mkuper wrote:

Ugh. Ignore my last comment, this is wrong.

We have *almost* fixed what "uniform" means in the vectorizer (thanks again, Matt), but not quite.

(a) LVL::isUniform() and LAI::isUniform() are basically equivalent to isLoopInvariant(). They imply the value is equal on all lanes.

We should really remove this guy. It's totally confusing now.

(b) LVL::isUniformAfterVectorization() refers to what @anemet is talking about - it's a property of how the value is used, specifically that we only use the lane 0 value.

So:
@anemet: this is already phrased in terms of loop invariance.

Sure, what I meant was that at some point in this discussion, we started formulating the assert in terms of uniformity vs. loop-invariance which I found confusing.

@delena: I no longer agree with your example.

If we have:
%foo = getelementptr inbounds i8, i8* %ptr, i64 %ind
And %foo is consecutive, then one of the following holds:

%ptr is loop-invariant (uniform), and %ind is consecutive.

%ptr itself is consecutive, and %ind is loop-invariant (uniform).

So, I think I understand why the assert fails now.
For case 1, where %ind is consecutive, the assert will pass, because %ind is the "induction operand" (that is, it's in the "right place" in the GEP).
For case 2, where %ptr is consecutive, the assert may fail, because %ptr has to be consecutive, but doesn't really have to be the IV.

This also possibly explains why the assert started failing after you improved consecutivity detection. If we have something like this:
%p = getelementptr inbounds i8, i8* %iv, i64 0
%q = getelementptr inbounds i8, i8* %p, i64 1
Then as long as only %p is considered consecutive, everything is fine, but if %q is also considered consecutive, the assert should fail.

Does this make sense?

Makes sense to me. Sounds like the assert should state that one of the operands needs to be a loop-variant uniform while the others all loop-invariant. Agreed?

I would still like @delena to confirm that this is indeed what was happening!

In D24557#543837, @mkuper wrote:

(a) LVL::isUniform() and LAI::isUniform() are basically equivalent to isLoopInvariant(). They imply the value is equal on all lanes.
(b) LVL::isUniformAfterVectorization() refers to what @anemet is talking about - it's a property of how the value is used, specifically that we only use the lane 0 value.

Yes, this makes sense to me. Thanks, Michael. Another nuance is that isLoopInvariant here means that the value is the same on every iteration. This is different from Loop::isLoopInvariant, which implies an instruction is not contained in the loop body.

Does this make sense?

Makes sense to me as well.

In D24557#543857, @anemet wrote:

Sounds like the assert should state that one of the operands needs to be a loop-variant uniform while the others all loop-invariant. Agreed?

The "loop-variant uniform" part sounds strange to me, since IV's don't have to be uniform. Don't we mean: one operand "is an induction variable or consecutive pointer" and all other operands are loop-invariant?

In D24557#543887, @mssimpso wrote:

In D24557#543857, @anemet wrote:

Sounds like the assert should state that one of the operands needs to be a loop-variant uniform while the others all loop-invariant. Agreed?

The "loop-variant uniform" part sounds strange to me, since IV's don't have to be uniform. Don't we mean: one operand "is an induction variable or consecutive pointer" and all other operands are loop-invariant?

WFM.

I would still like @delena to confirm that this is indeed what was happening!

I don't have a reproducer. Otherwise I was adding a test case and closing the issue.

I can put an assert that checks that the GEP has only one loop-variant operand. If it fails after the second commit, I'll fetch the case, at least.

In D24557#543964, @delena wrote:

I would still like @delena to confirm that this is indeed what was happening!

I don't have a reproducer. Otherwise I was adding a test case and closing the issue.

I can put an assert that checks that the GEP has only one loop-variant operand. If it fails after the second commit, I'll fetch the case, at least.

That sounds good to me.

anemet added inline comments.Sep 16 2016, 10:32 AM

../lib/Transforms/Vectorize/LoopVectorize.cpp
2790–2791 ↗	(On Diff #71341)	@delena, to be a bit more specific, the comment I would except to see here is something like: Let's create the uniform value of this GEP (i.e. lane 0 value). The operands are either loop-invariant or consecutive (GEP pointer or induction-variable index). @mssimpso, please check that I am not oversimplifying/missing some subtleties here again.

mssimpso added inline comments.Sep 16 2016, 11:08 AM

../lib/Transforms/Vectorize/LoopVectorize.cpp
2790–2791 ↗	(On Diff #71341)	Sounds mostly good to me! For "GEP pointer or induction variable index", I may be taking things a bit too far than is useful, but I think the consecutive pointer operand (if non-pointer IV) doesn't technically have to be a GEP, since we still look through bitcasts (with getGEPInstruction) in isConsecutivePtr. What about something like: "One operand is either a non-pointer induction variable or consecutive pointer (pointer induction variables are also consecutive), and the other operands are loop-invariant." I think it would also be useful to comment on the fact that this GEP is created for the first unroll iteration and that the GEPs for the rest of the unroll iterations are computed below as an offset from this GEP.

anemet added inline comments.Sep 16 2016, 12:21 PM

../lib/Transforms/Vectorize/LoopVectorize.cpp
2790–2791 ↗	(On Diff #71341)	Sounds mostly good to me! For "GEP pointer or induction variable index", I may be taking things a bit too far than is useful, but I think the consecutive pointer operand (if non-pointer IV) doesn't technically have to be a GEP, since we still look through bitcasts (with getGEPInstruction) in isConsecutivePtr. Are you talking about the pointer operand of this 'Gep' (as the variable in the code)? I didn't mean to suggest that that is produced by a GEP, I was just referring to the pointer operand of the variable 'Gep'. "One operand is either a non-pointer induction variable or consecutive pointer (pointer induction variables are also consecutive), and the other operands are loop-invariant." I think it would also be useful to comment on the fact that this GEP is created for the first unroll iteration and that the GEPs for the rest of the unroll iterations are computed below as an offset from this GEP. Agreed and I think that is an implication that this is a uniform use of the GEP value (that is why I mentioned it in my version of the comment). Uniform only states that for the vectorized value we can use the lane 0 scalar value.

mssimpso added inline comments.Sep 16 2016, 12:25 PM

../lib/Transforms/Vectorize/LoopVectorize.cpp
2790–2791 ↗	(On Diff #71341)	Are you talking about the pointer operand of this 'Gep' (as the variable in the code)? I didn't mean to suggest that that is produced by a GEP, I was just referring to the pointer operand of the variable 'Gep'. Ah, we're on the same page then. Thanks, Adam!

Added a debug mode check for consecutive GEP, as we talked in review.
Added comments around a new GEP and about the safe usage of getScalarValue().

Closed by commit rL281851: [Loop vectorizer] Simplified GEP cloning. NFC. (authored by delena). · Explain WhySep 18 2016, 2:31 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

trunk/

lib/

Transforms/

Vectorize/

LoopVectorize.cpp

59 lines

Diff 71745

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 2,777 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {
bool CreateGatherScatter =		bool CreateGatherScatter =
!ConsecutiveStride && Legal->isLegalGatherOrScatter(Instr);		!ConsecutiveStride && Legal->isLegalGatherOrScatter(Instr);

VectorParts VectorGep;		VectorParts VectorGep;

// Handle consecutive loads/stores.		// Handle consecutive loads/stores.
GetElementPtrInst *Gep = getGEPInstruction(Ptr);		GetElementPtrInst *Gep = getGEPInstruction(Ptr);
if (ConsecutiveStride) {		if (ConsecutiveStride) {
if (Gep && Legal->isInductionVariable(Gep->getPointerOperand())) {		if (Gep) {
setDebugLocFromInst(Builder, Gep);
auto *FirstBasePtr = getScalarValue(Gep->getPointerOperand(), 0, 0);

// Create the new GEP with the new induction variable.
GetElementPtrInst *Gep2 = cast<GetElementPtrInst>(Gep->clone());
Gep2->setOperand(0, FirstBasePtr);
Gep2->setName("gep.indvar.base");
Ptr = Builder.Insert(Gep2);
} else if (Gep) {
setDebugLocFromInst(Builder, Gep);
assert(PSE.getSE()->isLoopInvariant(PSE.getSCEV(Gep->getPointerOperand()),
OrigLoop) &&
"Base ptr must be invariant");
// The last index does not have to be the induction. It can be
// consecutive and be a function of the index. For example A[I+1];
unsigned NumOperands = Gep->getNumOperands();		unsigned NumOperands = Gep->getNumOperands();
unsigned InductionOperand = getGEPInductionOperand(Gep);		#ifndef NDEBUG
// Create the new GEP with the new induction variable.		// The original GEP that identified as a consecutive memory access
		// should have only one loop-variant operand.
		unsigned NumOfLoopVariantOps = 0;
		for (unsigned i = 0; i < NumOperands; ++i)
		if (!PSE.getSE()->isLoopInvariant(PSE.getSCEV(Gep->getOperand(i)),
		OrigLoop))
		NumOfLoopVariantOps++;
		assert(NumOfLoopVariantOps == 1 &&
		"Consecutive GEP should have only one loop-variant operand");
		#endif
GetElementPtrInst *Gep2 = cast<GetElementPtrInst>(Gep->clone());		GetElementPtrInst *Gep2 = cast<GetElementPtrInst>(Gep->clone());
		Gep2->setName("gep.indvar");

for (unsigned i = 0; i < NumOperands; ++i) {		// A new GEP is created for a 0-lane value of the first unroll iteration.
Value *GepOperand = Gep->getOperand(i);		// The GEPs for the rest of the unroll iterations are computed below as an
Instruction *GepOperandInst = dyn_cast<Instruction>(GepOperand);		// offset from this GEP.
		for (unsigned i = 0; i < NumOperands; ++i)
// Update last index or loop invariant instruction anchored in loop.		// We can apply getScalarValue() for all GEP indices. It returns an
if (i == InductionOperand \|\|		// original value for loop-invariant operand and 0-lane for consecutive
(GepOperandInst && OrigLoop->contains(GepOperandInst))) {		// operand.
assert((i == InductionOperand \|\|		Gep2->setOperand(i, getScalarValue(Gep->getOperand(i),
PSE.getSE()->isLoopInvariant(PSE.getSCEV(GepOperandInst),		0, /* First unroll iteration */
OrigLoop)) &&		0 /* 0-lane of the vector */ ));
"Must be last index or loop invariant");		setDebugLocFromInst(Builder, Gep);

Gep2->setOperand(i, getScalarValue(GepOperand, 0, 0));
Gep2->setName("gep.indvar.idx");
}
}
Ptr = Builder.Insert(Gep2);		Ptr = Builder.Insert(Gep2);

} else { // No GEP		} else { // No GEP
// Use the induction element ptr.		// Use the induction element ptr.
assert(isa<PHINode>(Ptr) && "Invalid induction ptr");		assert(isa<PHINode>(Ptr) && "Invalid induction ptr");
setDebugLocFromInst(Builder, Ptr);		setDebugLocFromInst(Builder, Ptr);
Ptr = getScalarValue(Ptr, 0, 0);		Ptr = getScalarValue(Ptr, 0, 0);
}		}
} else {		} else {
// At this point we should vector version of GEP for Gather or Scatter		// At this point we should vector version of GEP for Gather or Scatter
▲ Show 20 Lines • Show All 4,413 Lines • Show Last 20 Lines