For non-uniform instructions marked for scalarization, we should update VectorTy when computing instruction costs to reflect the scalar type. In addition to determining instruction costs, this type is also used to signal that all instructions in the loop will be scalarized. This currently affects memory instructions and non-pointer induction variables and their updates. (We also mark GEPs scalar after vectorization, but their cost is computed together with memory instructions.) For scalarized induction updates, this patch also scales the scalar cost by the vectorization factor, corresponding to each induction step.
Diff Detail
- Build Status
Buildable 6697 Build 6697: arc lint + arc unit
Event Timeline
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7159 | isVectorTy() implies !isVoidTy(), so checking the latter becomes redundant. Does the VF > 1 check also become redundant? |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7159 | Right, I thought there may be some redundancy here. So we should replace "!VectorTy->isVoidTy()" with "VectorTy->isVectorTy()". Checking that the type is actually a vector is what we really want. I'm not quite sure about "VF > 1". We only care about scalarization if we are vectorizing. I think it might be needed for the "TTI.getNumberOfParts(VectorTy) < VF" check, which wouldn't make sense if VF == 1. |
Addressed Ayal's comment.
- Removed !isVoidTy() check since this is implied by isVectorTy().
Thanks, Ayal! The change is mostly noise on the hardware (AArch64 Kryo/Falkor) and benchmarks (spec, test-suite) I tested.
For some context, I had been experimenting with a patch that changes the cost model's "tie-break" rule when I noticed the issue. Right now we select the smallest VF with the best score (so scalar wins out if there's no benefit from vectorization). But I was testing a patch that changed this. If any VF > 1 was better than VF == 1, it selected the largest VF having the best score. So we would prefer wider vectors in a tie if any vectorization was better than scalar.
For some loops, this interacted badly with the MaxVF determination based on types. Currently, if we know a load or store will be scalarized, we don't consider it when collecting type sizes, which determine the MaxVF. So with the above experimental patch, I was seeing loops where nothing was vectorized being effectively unrolled by the MaxVF, because VectorTy wasn't being set properly.
Sorry for the long explanation!
isVectorTy() implies !isVoidTy(), so checking the latter becomes redundant. Does the VF > 1 check also become redundant?