I simplified GEP cloning in vectorizeMemoryInstruction().
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Hi Elena,
This looks good to me. Thanks! At first I was a bit surprised by this simplification, but it makes sense. If we have a consecutive GEP for a memory instruction that's not scalarized, we only need to compute it's value at VF=0. For the unroll iterations, we compute the offset from this GEP (at UF=0) later on.
../lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2794 | Did you mean to add a blank line here? |
Thanks Elena, this looks like a significant improvement.
Just to make sure I understood correctly, the reason this works is that for a consecutive GEP, each operand must be either uniform or consecutive (this holds for both ptr and non-ptr IVs), so getScalarValue(0, 0) always does the right thing?
The only thing I'm a bit concerned about is removing all the asserts. Perhaps it's worth adding a loop under #ifndef NDEBUG, that verifies this? But if you/Ayal/Matt don't think it's useful, I won't insist.
Other than that, LGTM.
Right, for a consecutive GEP, only the induction operand (as in getGEPInductionOperand) should be loop-varying. The other operands should be uniform.
The only thing I'm a bit concerned about is removing all the asserts. Perhaps it's worth adding a loop under #ifndef NDEBUG, that verifies this? But if you/Ayal/Matt don't think it's useful, I won't insist.
Other than that, LGTM.
Yeah, I agree about the asserts. We check these conditions in isConsecutivePtr. But just to mirror what what you said above, I think we should be able to verify them with following assert in the existing loop:
assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i))
What do you think?
This covers the "else" branch of the original if, but I think it doesn't cover the pointer IV case (since getGEPInductionOperand() doesn't quite do what its name implies. :-\ ).
It should probably be something like:
assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i) || Legal->isInductionVariable(Gep->getOperand(i)))
Regardless, I thought it would be clearer for it to be a separate NDEBUG loop, but I'm totally ok with it being in the existing loop.
assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i) || Legal->isInductionVariable(Gep->getOperand(i)))
I'm not sure that this is the right check. My problem in https://reviews.llvm.org/D20789 was in these assertions that started to fail after more deep GEP analysis.
Re-checking GEP means checking correctness of isConsecutivePtr(Ptr).
In the new version (20789) isConsecutivePtr() uses PSE, which includes versioning and picks up more GEPs than we do now.
And we can just rely on correctness of this function.
GEP cloning does not depend on index role inside the loop and getScalarValue() covers all of them.
I want D20789 to go back in (there's a real performance issue I have that's affected by it :) ), but this scares me.
In theory, the assert I wrote above (or something similar) should not be failing. If it's failing, we are either missing a case, or there's a real subtle bug somewhere. I'd really like to know which one it is. We can't just say "we don't understand why this assert is failing, so let's not have it there". Once we do understand it, and it turns out that, say, the real assertion is very complex and requires replicating the logic of isConsecutivePtr() - I'll be perfectly OK with not adding it.
To quote G.K. Chesterton:
In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.
Regarding the "uniform" assertion, I had been assuming this is equivalent to Legal->isUniform(), but I took a look at the implementation of isUniform in LAI. Over there, we have something like:
PSE.getSE()->isLoopInvariant(PSE.getSE()->getSCEV(Value), Loop);
Whereas here, we have something like:
PSE.getSE()->isLoopInvariant(PSE.getSCEV(Value), Loop);
I'm not familiar enough with PSE to know what, if any, difference this might make.
Yes, they are different. The second form can assume the predicates from PSE and produce a SCEV that's more friendly to optimizations.
Let's look at this GEP:
getelementptr inbounds i8, i8* %ptr, i64 %ind
May %ptr be not-loop-invariant and not-induction-variable?
It may be uniform, right?
In this case
assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i) || Legal->isInductionVariable(Gep->getOperand(i)))
should work, but
assert(i == getGEPInductionOperand(Gep) || PSE.getSE()->isLoopInvariant(PSE.getSCEV(Gep->getOperand(i)) || Legal->isInductionVariable(Gep->getOperand(i)))
will not.
I have a more high-level question. Is it safe to allow loop-variant uniform values here for the non-induction operands?
Uniformity is a property of how the value is used and it does not state anything about loop-variance. I would be more comfortable formulating this in terms of a single induction and loop-invariant operands.
I'm not sure I understand what you're saying.
assert(i == getGEPInductionOperand(Gep) || Legal->isUniform(Gep->getOperand(i) || Legal->isInductionVariable(Gep->getOperand(i)))
Is exactly the check I suggested.
If you're saying it should work in all cases, then what's the problem? Or are you saying there are other cases where it's the wrong check? Or did I completely misunderstand you?
Ugh. Ignore my last comment, this is wrong.
We have *almost* fixed what "uniform" means in the vectorizer (thanks again, Matt), but not quite.
(a) LVL::isUniform() and LAI::isUniform() are basically equivalent to isLoopInvariant(). They imply the value is equal on all lanes.
(b) LVL::isUniformAfterVectorization() refers to what @anemet is talking about - it's a property of how the value is used, specifically that we only use the lane 0 value.
So:
@anemet: this is already phrased in terms of loop invariance.
@delena: I no longer agree with your example.
If we have:
%foo = getelementptr inbounds i8, i8* %ptr, i64 %ind
And %foo is consecutive, then one of the following holds:
- %ptr is loop-invariant (uniform), and %ind is consecutive.
- %ptr itself is consecutive, and %ind is loop-invariant (uniform).
So, I think I understand why the assert fails now.
For case 1, where %ind is consecutive, the assert will pass, because %ind is the "induction operand" (that is, it's in the "right place" in the GEP).
For case 2, where %ptr is consecutive, the assert may fail, because %ptr has to be consecutive, but doesn't really have to be the IV.
This also possibly explains why the assert started failing after you improved consecutivity detection. If we have something like this:
%p = getelementptr inbounds i8, i8* %iv, i64 0 %q = getelementptr inbounds i8, i8* %p, i64 1
Then as long as only %p is considered consecutive, everything is fine, but if %q is also considered consecutive, the assert should fail.
Does this make sense?
We should really remove this guy. It's totally confusing now.
(b) LVL::isUniformAfterVectorization() refers to what @anemet is talking about - it's a property of how the value is used, specifically that we only use the lane 0 value.
So:
@anemet: this is already phrased in terms of loop invariance.
Sure, what I meant was that at some point in this discussion, we started formulating the assert in terms of uniformity vs. loop-invariance which I found confusing.
@delena: I no longer agree with your example.
If we have:
%foo = getelementptr inbounds i8, i8* %ptr, i64 %indAnd %foo is consecutive, then one of the following holds:
- %ptr is loop-invariant (uniform), and %ind is consecutive.
- %ptr itself is consecutive, and %ind is loop-invariant (uniform).
So, I think I understand why the assert fails now.
For case 1, where %ind is consecutive, the assert will pass, because %ind is the "induction operand" (that is, it's in the "right place" in the GEP).
For case 2, where %ptr is consecutive, the assert may fail, because %ptr has to be consecutive, but doesn't really have to be the IV.This also possibly explains why the assert started failing after you improved consecutivity detection. If we have something like this:
%p = getelementptr inbounds i8, i8* %iv, i64 0 %q = getelementptr inbounds i8, i8* %p, i64 1Then as long as only %p is considered consecutive, everything is fine, but if %q is also considered consecutive, the assert should fail.
Does this make sense?
Makes sense to me. Sounds like the assert should state that one of the operands needs to be a loop-variant uniform while the others all loop-invariant. Agreed?
I would still like @delena to confirm that this is indeed what was happening!
Yes, this makes sense to me. Thanks, Michael. Another nuance is that isLoopInvariant here means that the value is the same on every iteration. This is different from Loop::isLoopInvariant, which implies an instruction is not contained in the loop body.
Does this make sense?
Makes sense to me as well.
The "loop-variant uniform" part sounds strange to me, since IV's don't have to be uniform. Don't we mean: one operand "is an induction variable or consecutive pointer" and all other operands are loop-invariant?
I would still like @delena to confirm that this is indeed what was happening!
I don't have a reproducer. Otherwise I was adding a test case and closing the issue.
I can put an assert that checks that the GEP has only one loop-variant operand. If it fails after the second commit, I'll fetch the case, at least.
../lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2790–2791 | @delena, to be a bit more specific, the comment I would except to see here is something like:
@mssimpso, please check that I am not oversimplifying/missing some subtleties here again. |
../lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2790–2791 | Sounds mostly good to me! For "GEP pointer or induction variable index", I may be taking things a bit too far than is useful, but I think the consecutive pointer operand (if non-pointer IV) doesn't technically have to be a GEP, since we still look through bitcasts (with getGEPInstruction) in isConsecutivePtr. What about something like: "One operand is either a non-pointer induction variable or consecutive pointer (pointer induction variables are also consecutive), and the other operands are loop-invariant." I think it would also be useful to comment on the fact that this GEP is created for the first unroll iteration and that the GEPs for the rest of the unroll iterations are computed below as an offset from this GEP. |
../lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2790–2791 |
Are you talking about the pointer operand of this 'Gep' (as the variable in the code)? I didn't mean to suggest that that is produced by a GEP, I was just referring to the pointer operand of the variable 'Gep'.
Agreed and I think that is an implication that this is a uniform use of the GEP value (that is why I mentioned it in my version of the comment). Uniform only states that for the *vectorized* value we can use the lane 0 scalar value. |
../lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2790–2791 |
Ah, we're on the same page then. Thanks, Adam! |
Added a debug mode check for consecutive GEP, as we talked in review.
Added comments around a new GEP and about the safe usage of getScalarValue().
@delena, to be a bit more specific, the comment I would except to see here is something like:
@mssimpso, please check that I am not oversimplifying/missing some subtleties here again.