This patch ensures that we actually scalarize instructions marked scalar after vectorization. Previously, such instructions may have been vectorized.
Details
Diff Detail
Event Timeline
Would it make sense to commit the different changes here separately?
That is:
Patch 1: collectLoopUniforms()
Patch 2: The parts that only emit instructions for the low lane of uniforms.
Patch 3: vectorizeBlockInLoop()
If I'm not mistaken both (1) and (2) should be good independently. (1) because it will make other things that care about uniformity slightly more correct, and (2) because it may help some instructions we already scalarize today (although I'm not sure we actually have such cases now. So maybe 2+3 should go in together).
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2263 | Why "scalar"? | |
2265 | Why not return (!OrigLoop->contains(I) || !Legal->isScalarAfterVectorization(I) || Legal->isUniformAfterVectorization(I)) ? | |
2278 | Why do we need this? | |
4576 | This scares me a bit. :-) Regardless, there's (at least?) one more special case I see below - DbgInfoIntrinsic. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
4576 |
Which PR? |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
4576 | My bad, not a PR, meant "on the other review". :-) |
Right, that's what I was thinking. I wanted to included everything here in the review (at least initially) for context. We can split it up exactly as you suggested. Regarding an independent patch 2, it should be NFC post-instcombine. Pre-instcombine, we would at least no longer generate unnecessary steps for uniform IVs.
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2263 | This probably makes more sense with the response below. We're checking to see if all the scalar users of the IV we are scalarizing are also uniform. We can scalarize an IV if it has both vector and scalar users (ending up with two versions). | |
2265 | That should work! | |
2278 | This is for the cases in which we both want a vector and a scalar version of an IV. We scalarize an IV that's not marked scalar after vectorization when it has at least one scalar user. For such an IV, if all it's scalar users are also uniform, we will only ever need the first lane of the scalar IV. | |
4576 | Right! Thanks for pointing out the debug intrinsics. Perhaps it would make better sense to handle all of the cases individually, instead all together at the top? |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2263 | Oh, ok, I understand now - this checks "isScalar() implies isUniform()" by computing "!isScalar() || isUniform()". | |
2278 | Ah, got it, thanks! Sentences like "all its scalar users are uniform" still confuse me, because of the terminology we've adopted. | |
4576 | Hmpf. I think I still prefer handling this at the top - duplicating this check into each case seems silly. |
Thanks for the quick feedback, Michael! I'll start addressing your comments by first pulling out the changes to collectLoopUniforms in a separate review, and adding some specific tests for it.
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2278 | I agree the terminology is a bit confusing. At the very least, I'll add a more detailed comment here. | |
4576 | Agreed. |
I pulled out the changes to collectLoopUniforms (Patch 1) in D24271 and rebased. I also addressed Michael's comments on this existing patch. Next, I'll pull out the changes to uniform instruction emission (Patch 2) in a separate review.
I pulled out the changes to uniform instruction emission (Patch 2) in D24275 and rebased.
Why "scalar"?