Consider a loop with a uniform load:
float inc = 0.5;
void foo(float *A, unsigned N) {
  for (int i = 0; i < N; i++) {
    A[i] += inc; // Uniform load of inc
  }
}
If the uniform load is not hoisted before vectorization, its cost should be "scalar load + broadcast".
The current version does not calculate this correctly: it assigns a huge cost to the single splat vector, and that cost prevents the loop from being vectorized.
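For illustration only, here is a rough hand-written model in plain C of what "scalar load + broadcast" means for this loop. It is a sketch, not the vectorizer's actual output: the name foo_vectorized, the fixed factor VF = 4, and the splat array standing in for a vector register are all hypothetical, and it assumes A does not overlap inc.

```c
#define VF 4 /* hypothetical vectorization factor, chosen for illustration */

float inc = 0.5f;

/* Hand-written model of the vectorized loop. Assumes A does not alias inc. */
void foo_vectorized(float *A, unsigned N) {
  unsigned vec_end = N - (N % VF); /* iterations covered by whole vectors */
  unsigned i = 0;

  for (; i < vec_end; i += VF) {
    float s = inc;                 /* one scalar load of the uniform value */
    float splat[VF];
    for (unsigned l = 0; l < VF; l++)
      splat[l] = s;                /* broadcast: copy the scalar to every lane */
    for (unsigned l = 0; l < VF; l++)
      A[i + l] += splat[l];        /* models a single vector add */
  }

  for (; i < N; i++)               /* scalar remainder loop */
    A[i] += inc;
}
```

Per vector iteration the uniform load contributes one scalar load and one broadcast, which is the cost the summary above says should be charged.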
This can just be + instead of +=. Otherwise, this LGTM.