Single instructions exist for i8 and i16 comparisons of memory against a small immediate.
This patch ensures that when the load in such a case has a single user (the ICmp), the load gets a cost of 0 (it is folded into the compare) and the ICmp gets a cost of 1.
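The costing rule can be illustrated with a small standalone sketch. This is not the code from the patch and does not use LLVM's classes: `Instr`, `loadFoldsIntoCompare`, `costOf`, the immediate range in `isSmallImmediate`, and the fallback cost `kUnfoldedCompareCost` are all hypothetical placeholders chosen only to show the shape of the check.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for IR instructions (not LLVM's classes).
enum class Opcode { Load, ICmp, Other };

struct Instr {
    Opcode op;
    unsigned bits;            // scalar width of the loaded / compared value
    const Instr *operand;     // for an ICmp: the instruction feeding its first operand
    bool hasImm;              // for an ICmp: second operand is a constant
    std::int64_t imm;         // value of that constant
    const Instr *singleUser;  // the only user, or nullptr if zero or several users
};

// Assumption: placeholder range for "small immediate"; the real limits depend
// on the target instruction and are not stated in this description.
bool isSmallImmediate(std::int64_t v) { return v >= -32768 && v <= 32767; }

// Assumption: placeholder cost for a compare that cannot fold its load.
const unsigned kUnfoldedCompareCost = 2;

// True if `load` is an i8/i16 load whose only user is an ICmp against a small
// immediate, i.e. the pair could be emitted as one compare-with-memory.
bool loadFoldsIntoCompare(const Instr &load) {
    const Instr *user = load.singleUser;
    if (load.op != Opcode::Load || user == nullptr)
        return false;
    if (load.bits != 8 && load.bits != 16)
        return false;
    return user->op == Opcode::ICmp && user->operand == &load &&
           user->hasImm && isSmallImmediate(user->imm);
}

// Folded load costs 0 and its compare costs 1; everything else falls back to
// the placeholder costs above.
unsigned costOf(const Instr &I) {
    switch (I.op) {
    case Opcode::Load:
        return loadFoldsIntoCompare(I) ? 0 : 1;
    case Opcode::ICmp:
        if (I.operand != nullptr && loadFoldsIntoCompare(*I.operand))
            return 1;
        return kUnfoldedCompareCost;
    default:
        return 1;
    }
}
```

In this sketch the load and the compare are costed independently but agree on the same predicate, so the pair sums to 1 when folding applies.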
On SPEC, 7 loops in 4 files are vectorized differently, and 10 files differ in total.