Final goal:
Detect that, for an i16 data type (assuming a little-endian layout, with x a byte address), the read-modify-write
i16 M[x] = (M[x] & 0xFF00) | ((M[x] >> 8) & 0xFF)
can be reduced to a single i8 data movement:
i8 M[x] = i8 M[x+1]
The existing code handles neither shifted masks nor truncating stores and sext loads.
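The claimed equivalence can be checked exhaustively outside the compiler. The following is only a sketch in Python, not part of the patch: it models memory as a little-endian byte array, and the helper names (rewrite_i16, move_i8, forms_agree) are made up for illustration.

```python
def rewrite_i16(mem: bytearray, x: int) -> None:
    """Original form: i16 M[x] = (M[x] & 0xFF00) | ((M[x] >> 8) & 0xFF)."""
    v = int.from_bytes(mem[x:x + 2], "little")   # i16 load (little-endian assumed)
    v = (v & 0xFF00) | ((v >> 8) & 0xFF)         # keep high byte, copy it into low byte
    mem[x:x + 2] = v.to_bytes(2, "little")       # i16 store


def move_i8(mem: bytearray, x: int) -> None:
    """Reduced form: i8 M[x] = i8 M[x+1]."""
    mem[x] = mem[x + 1]


def forms_agree() -> bool:
    """Compare both forms on every possible 16-bit memory content."""
    for v in range(1 << 16):
        a = bytearray(v.to_bytes(2, "little"))
        b = bytearray(a)
        rewrite_i16(a, 0)
        move_i8(b, 0)
        if a != b:
            return False
    return True
```

On a big-endian layout the high byte sits at M[x] instead, so the reduced form would have to copy in the other direction; the sketch deliberately covers only the little-endian case.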
Step 1: Width reduction of loads masked by shifted masks
The operation
and i16 (load i16, [M]), 0xFF00
can be replaced, on a little-endian target, by
shl i16 (load i8, [M+1], zext i16), 8
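As a sanity check of this rewrite, here is a small Python sketch that evaluates both forms over all 16-bit memory contents. It assumes little-endian byte order; the function names are hypothetical and do not correspond to anything in the patch.

```python
def masked_wide_load(mem: bytes, m: int) -> int:
    """and i16 (load i16, [M]), 0xFF00"""
    return int.from_bytes(mem[m:m + 2], "little") & 0xFF00


def shifted_narrow_load(mem: bytes, m: int) -> int:
    """shl i16 (load i8, [M+1], zext i16), 8"""
    narrow = mem[m + 1]             # i8 load; indexing bytes yields 0..255, i.e. zero-extended
    return (narrow << 8) & 0xFFFF   # shift left by 8 within an i16


def step1_equivalent() -> bool:
    """Check that both forms produce the same i16 value for every input."""
    for v in range(1 << 16):
        mem = v.to_bytes(2, "little")
        if masked_wide_load(mem, 0) != shifted_narrow_load(mem, 0):
            return False
    return True
```

The byte offset (+1) and the shift amount (8) both follow from the position of the mask's lowest set bit, which is why a mask such as 0xFF00 maps onto a whole-byte narrow load.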
Is "Redudant" a typo here (for "Redundant")?