Hi All,
This patch improves the cost model in SLPVectorizer for division by a power of 2. Currently the code below is not vectorized by clang at -O3, although gcc is able to vectorize it.
void f(int *restrict a, int *restrict b, int *restrict c) {
  a[0] = (b[0] + c[0]) / 2;
  a[1] = (b[1] + c[1]) / 2;
  a[2] = (b[2] + c[2]) / 2;
  a[3] = (b[3] + c[3]) / 2;
}
The problem is that SLPVectorizer estimates the cost of the vector divide as too high to be profitable and gives up on vectorization.
But in cases such as the above, where we are dividing by a power of 2, the cost is in fact much lower, as the backend converts the divide into shift instructions such as psrad/psraw on X86 targets.
The current patch updates the cost model when we divide by power of 2 to enable vectorization in such cases.
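To illustrate why a signed divide by a power of 2 is cheap, here is a minimal scalar sketch of the usual shift-based expansion. The helper name sdiv_pow2 is hypothetical and not part of the patch, and the code assumes that >> on a negative int32_t is an arithmetic shift (true on the common compilers, but implementation-defined in C). The actual sequence the backend emits may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: computes x / 2^k with the
 * same round-toward-zero result as C's '/' operator, using only shifts
 * and an add. Assumes '>>' on a negative int32_t is an arithmetic shift. */
static int32_t sdiv_pow2(int32_t x, unsigned k) {
    assert(k >= 1 && k <= 30);

    /* All ones when x is negative, zero otherwise. */
    int32_t sign = x >> 31;

    /* 2^k - 1 when x is negative, 0 otherwise. Adding this bias makes the
     * final arithmetic shift round toward zero instead of toward -inf. */
    int32_t bias = (int32_t)((uint32_t)sign >> (32 - k));

    return (x + bias) >> k;
}
```

For example, sdiv_pow2(b[0] + c[0], 1) gives the same value as (b[0] + c[0]) / 2 in the function above, as long as the addition itself does not overflow.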
Please let me know your inputs on the same.
Thanks and Regards
Karthik Bhat
I wonder if this is the right design:
Having some optional feature bits seems better. I imagine we will want more special properties like this in the future.
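As a rough sketch of what I mean by feature bits (all names and cost numbers below are made up for illustration and are not LLVM's actual interface), the operand properties could be passed to the cost query as a bit mask, so that new properties can be added later without changing the signature:

```c
/* Hypothetical operand property bits; more can be added as new bits. */
enum operand_props {
    OP_NONE       = 0,
    OP_POWER_OF_2 = 1u << 0   /* operand is a constant power of 2 */
};

/* Hypothetical cost query for a vector signed divide. The numbers are
 * placeholders chosen only to show the cheap and expensive paths. */
static unsigned vector_sdiv_cost(unsigned divisor_props) {
    if (divisor_props & OP_POWER_OF_2)
        return 4;    /* lowered to a short shift/add sequence */
    return 20;       /* general divide, typically scalarized */
}
```

A caller that knows the divisor is a constant power of 2 would pass OP_POWER_OF_2 and get the lower cost; every other caller passes OP_NONE and is unaffected.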