This change attempts to produce vectorized integer expressions in bit widths
that are narrower than their scalar counterparts. By reducing the bit width
where possible, we can pack more isomorphic expressions into a single vector
and increase parallelism. The need for demotion arises especially on
architectures in which the small integer types (e.g., i8 and i16) are not legal
for scalar operations but can still be used in vectors.

Like similar work done within the loop vectorizer, we rely on InstCombine to
perform the actual type-shrinking. Here, we only insert the truncations that
are needed to seed InstCombine's type demotion. This introduces the limitation
that we can only rewrite single-use chains (every instruction in the expression
can have at most one use). We further limit ourselves to chains that are rooted
by instructions other than stores, since we cannot change the width of vector
memory operations. With these restrictions, only expression roots can be used
externally, and we sign-extend them back to their original type after we
extract them from the vectors.

We use ComputeNumSignBits from ValueTracking to determine the minimum required
bit width of an expression. We update cost estimates to account for the
narrower types and sign extensions we add to the vectorized code.