ARMTargetTransformInfo contains a stride check that rules out tail predication for a loop if any memory access in it has a stride other than 1. To enable tail predication for loops containing gathers/scatters, this patch takes a more detailed approach when the EnableMaskedGatherScatters flag is true, and also adds more detailed debug messages.
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1634 | else ifs? | |
1639 | I think this would be easier to read if it were organised in stride order, with the gather/scatter cases separated from the consecutive accesses. When !EnableMaskedGatherScatters, getPtrStride should only ever be 1, right? So I don't think we have to track the 'NextStride' business. |
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1634 | A load or a store can be a vld2 or a vst2, neither of which can be tail folded, unfortunately. |
Widened the range of allowed strides to also include loop-invariant expressions.
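For readers unfamiliar with the term, here is a small source-level illustration of the difference between a unit stride and a loop-invariant stride. The function names are invented for this example and do not come from the patch. The second loop steps through memory by `n` elements, a value that is unknown at compile time but fixed for the whole loop, which is the kind of access that would need a gather when vectorized.

```cpp
#include <cstddef>
#include <cstdint>

// Unit stride: consecutive elements, the only case the original check accepted.
int32_t sumContiguous(const int32_t *a, std::size_t count) {
  int32_t sum = 0;
  for (std::size_t i = 0; i < count; ++i)
    sum += a[i];
  return sum;
}

// Loop-invariant stride: n is only known at run time, but it does not change
// between iterations, so the address still advances by a fixed step.
int32_t sumStrided(const int32_t *a, std::size_t count, std::size_t n) {
  int32_t sum = 0;
  for (std::size_t i = 0; i < count; ++i)
    sum += a[i * n];
  return sum;
}
```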
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1634 | Good point. | |
1634 | Right, I forgot about the vst2's. In that case we can never allow a stride of 2 here, as the only instructions that get us here are loads and stores. | |
1639 | Good point. We may as well get rid of that. |
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1634–1639 | Can you change this condition to something like: if (NextStride == -1 || (NextStride == 2 && MVEMaxSupportedInterleaveFactor >= 2) || (NextStride == 4 && MVEMaxSupportedInterleaveFactor >= 4))? That should hopefully make it more future-proof, and specifically rule out reverse loads even if the vectorizer changes to support them. | |
1649–1650 | Perhaps if (auto AR = dyn_cast<SCEVAddRecExpr>(PtrScev)) { |
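As a rough, standalone model of how the points raised in this review might fit together (not the patch itself), the sketch below classifies a single load or store. `AccessInfo` and `accessAllowsTailPredication` are hypothetical names, the interleave factor of 4 is an assumption made for the example, and the two fields stand in for what the real code would obtain from stride and SCEV analysis.

```cpp
#include <cstdint>

// Assumption for this sketch: interleaving by up to 4 is supported.
constexpr unsigned MVEMaxSupportedInterleaveFactor = 4;

// Hypothetical summary of one load or store.
struct AccessInfo {
  int64_t Stride;           // constant stride in elements, 0 if not constant
  bool StepIsLoopInvariant; // pointer advances by a loop-invariant step
};

// Returns true if this access, taken alone, leaves tail predication possible.
bool accessAllowsTailPredication(const AccessInfo &A,
                                 bool EnableMaskedGatherScatters) {
  // Consecutive accesses are always fine.
  if (A.Stride == 1)
    return true;

  // Reverse and interleaved accesses are rejected outright, mirroring the
  // condition suggested in the comment on lines 1634-1639.
  if (A.Stride == -1 ||
      (A.Stride == 2 && MVEMaxSupportedInterleaveFactor >= 2) ||
      (A.Stride == 4 && MVEMaxSupportedInterleaveFactor >= 4))
    return false;

  // Anything else would have to become a gather/scatter, which is only an
  // option when the flag is set and the step does not vary inside the loop.
  return EnableMaskedGatherScatters && A.StepIsLoopInvariant;
}
```

Ordering the checks by stride, with the gather/scatter case last, follows the readability suggestion made earlier in the review.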
Thanks. Looks good to me.
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
31 | Do you need to include this? |
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
31 | Oops, no. Will remove for commit. |