I would like to extend the SLP vectorizer to support overlapping vector
loads. This allows vectorizing cases where we operate on overlapping
vectors that can be loaded efficiently.
The simplest C example is something like the snippet below, where we add
<s[0], s[1], s[2], s[3]> and <s[1], s[2], s[3], s[4]>. Those vectors can
be directly loaded from &s[0] and &s[1]. The problem is that overlapping
bundles are currently not allowed, so the second vector has to be
gathered, which is not profitable on AArch64.

    void test(int *s, int *__restrict__ d) {
      for (int x = 0; x < 4; x++, s++) {
        d[x] = s[0] + s[1];
      }
    }

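To make the intended result concrete, here is a rough hand-written sketch of what the vectorized form would conceptually compute: two 4-wide loads from &s[0] and &s[1] that overlap in three elements, one vector add, and one store. This is illustration only and not output of the patch. The names test_scalar, test_overlapping and check_forms_agree are made up for this sketch, and memcpy into small arrays stands in for the actual vector loads and stores.

```c
#include <assert.h>
#include <string.h>

/* Scalar reference with the same semantics as test() above:
   d[x] = s[x] + s[x + 1] for x in 0..3. */
static void test_scalar(const int *s, int *restrict d) {
  for (int x = 0; x < 4; x++, s++)
    d[x] = s[0] + s[1];
}

/* Hypothetical hand-vectorized form using two overlapping loads.
   memcpy models an unaligned 4 x i32 vector load/store. */
static void test_overlapping(const int *s, int *restrict d) {
  int v0[4], v1[4];
  memcpy(v0, &s[0], sizeof v0); /* <s[0], s[1], s[2], s[3]> */
  memcpy(v1, &s[1], sizeof v1); /* <s[1], s[2], s[3], s[4]>, shares 3 lanes with v0 */
  for (int i = 0; i < 4; i++)
    v0[i] += v1[i];             /* lane-wise add, models a single vector add */
  memcpy(d, v0, sizeof v0);     /* single vector store to d */
}

/* Checks that both forms produce the same result.
   s must point to at least 5 readable ints. */
static void check_forms_agree(const int *s) {
  int a[4], b[4];
  test_scalar(s, a);
  test_overlapping(s, b);
  assert(memcmp(a, b, sizeof a) == 0);
}
```

Both loads read consecutive memory, so neither needs a gather; the only unusual part is that s[1], s[2] and s[3] are members of two bundles at once.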
The invariant that bundles should not overlap seems to be relied on and
encoded in multiple places. In this patch, I mostly tried to disable
various checks and assertions. It effectively allows overlapping
bundles, iff the first entry in Scalars is unique.
This clearly is not a proper solution, but I am hoping that sharing the
patch can be the start of a discussion on how to properly address the
limitations. It would be great if you could share your thoughts.