Scalarization can expose optimization opportunities for the individual
elements of a vector, and can therefore be beneficial on targets like
GPUs that tend to operate on scalars anyway.
However, notably for 16-bit operations it is often beneficial to keep
<2 x i16> / <2 x half> vectors around, since targets often provide packed
instructions for those.
Refactor the code to operate on "fragments" of split vectors. The
fragments are usually scalars, but may themselves be smaller vectors
when the scalarizer-min-bits option is used. If the split is uneven,
the last fragment is a shorter remainder.
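To make the fragment layout concrete, here is a small hedged sketch (the helper below is hypothetical, not code from the pass): given an <N x iBits> vector and the scalarizer-min-bits threshold, each fragment groups enough elements to reach the threshold, and an uneven split leaves a shorter remainder fragment at the end.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical illustration of the fragment layout described above.
// Fragments hold 1 element each (plain scalarization) unless MinBits
// forces small elements to stay grouped in short sub-vectors. If the
// element count is not a multiple of the fragment size, the last
// fragment is a shorter remainder.
std::vector<unsigned> fragmentSizes(unsigned NumElems, unsigned ElemBits,
                                    unsigned MinBits) {
  // Elements per fragment: enough to reach MinBits, at least 1.
  unsigned PerFrag = MinBits <= ElemBits ? 1 : MinBits / ElemBits;
  std::vector<unsigned> Sizes;
  for (unsigned I = 0; I < NumElems; I += PerFrag)
    Sizes.push_back(std::min(PerFrag, NumElems - I));
  return Sizes;
}
```

For example, with min-bits=32, a <7 x i16> vector would split into three <2 x i16> fragments plus a single i16 remainder.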
This is almost NFC when the new option is unused, but it happens to
clean up some code in the fully scalarized case as well.
I think I misunderstood this at first.
My interpretation was that setting this to 16 would stop it from scalarizing vectors with element sizes up to 16 bits.
So it wouldn't scalarize <16 x i8> or <4 x i16>, while it would scalarize <2 x i24> and <2 x i32>.
But this size does not map to the element size, right?
We could now get some kind of vector split/re-partition from <16 x i8> to <2 x i8>. So it is not really scalarizing, as the value will still be a vector.
Not sure exactly how to rephrase it to make that clearer (considering that I misunderstood this to be an element size).
Maybe I got fooled by the slogan for this patch, "limit scalarization for small element types". I actually expected to see something that prevented scalarization from happening when the element size was smaller than a threshold. But what the patch actually seems to do is prevent full scalarization of large vectors of small elements: it splits them into smaller vectors instead of scalars.
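To make that distinction concrete, here is a hedged sketch (the helper is hypothetical, not code from the pass): the fragment width is derived from the min-bits threshold relative to the element size, so with scalarizer-min-bits=16 a <16 x i8> vector is split into <2 x i8> sub-vector fragments rather than being left alone, while <2 x i32> is still fully scalarized.

```cpp
#include <string>

// Hypothetical helper: describe the fragment type an <N x iBits>
// vector would be split into under a given min-bits setting. With
// MinBits = 16, i8 elements stay grouped in <2 x i8> fragments (still
// vectors), while i16 and wider elements become scalar fragments.
std::string fragmentType(unsigned ElemBits, unsigned MinBits) {
  unsigned PerFrag = MinBits <= ElemBits ? 1 : MinBits / ElemBits;
  if (PerFrag == 1)
    return "i" + std::to_string(ElemBits); // fully scalarized
  return "<" + std::to_string(PerFrag) + " x i" +
         std::to_string(ElemBits) + ">";
}
```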
So everywhere in this pass where it says "scalarize", I guess one should read it as "split" (or "resize" or something similar). For example, code comments saying "Perform actual scalarization" could be followed by code that emits vector operations.