This change attempts to produce vectorized integer expressions in bit widths
that are narrower than their scalar counterparts. By reducing the bit width
where possible, we can pack more isomorphic expressions into a single vector
and increase parallelism. The need for demotion arises especially on
architectures in which the small integer types (e.g., i8 and i16) are not legal
for scalar operations but can still be used in vectors.

Like similar work done within the loop vectorizer, we rely on InstCombine to
perform the actual type-shrinking. Here, we only insert the truncations that
are needed to seed InstCombine's type demotion. This introduces the limitation
that we can only rewrite single-use chains (every instruction in the expression
can have at most one use). We further limit ourselves to chains that are rooted
by instructions other than stores, since we cannot change the width of vector
memory operations. With these restrictions, only expression roots can be used
externally, and we sign extend them back to their original type after we
extract them from the vectors.

We use ComputeNumSignBits from ValueTracking to determine the minimum required
bit width of an expression. We update cost estimates to account for the
narrower types and sign extensions we add to the vectorized code.