Pre-SSE41 targets tended to have weak (serial) GPR<->VEC moves, meaning we only allowed a single extraction before spilling the vector to the stack and loading the element instead. But this didn't take advantage of the fact that the DWORD/WORD extraction we had to use could extract multiple i8 elements at the same time.
This patch attempts to determine if all uses of a vector are element extractions, and works out whether all the extractions share the same WORD or (lowest) DWORD, in which case we can perform a single extraction and just shift/truncate the individual elements.
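The shift/truncate step can be illustrated with a small scalar model. This is only a sketch and not the patch's SelectionDAG code: the type `V16i8` and the helpers `extractLowDword`, `extractWord` and `byteFromScalar` are hypothetical names invented for this illustration, and little-endian lane order is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a v16i8 vector register.
using V16i8 = std::array<std::uint8_t, 16>;

// Models a single GPR<-VEC move of the lowest DWORD (elements 0..3),
// assuming little-endian lane order.
std::uint32_t extractLowDword(const V16i8 &v) {
  return std::uint32_t(v[0]) | (std::uint32_t(v[1]) << 8) |
         (std::uint32_t(v[2]) << 16) | (std::uint32_t(v[3]) << 24);
}

// Models a single WORD extraction of word index w (elements 2w and 2w+1).
std::uint16_t extractWord(const V16i8 &v, unsigned w) {
  assert(w < 8 && "word index out of range");
  return static_cast<std::uint16_t>(v[2 * w] | (v[2 * w + 1] << 8));
}

// Recovers one i8 element from an already-extracted scalar:
// shift right by 8 * byte offset, then truncate to 8 bits.
std::uint8_t byteFromScalar(std::uint32_t scalar, unsigned byteInScalar) {
  assert(byteInScalar < 4 && "byte offset out of range");
  return static_cast<std::uint8_t>(scalar >> (8 * byteInScalar));
}
```

With this model, elements 0 to 3 all come from one `extractLowDword` call, and elements 6 and 7 both come from one `extractWord(v, 3)` call, so the vector only crosses to the GPR domain once per group of uses.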
ZEXT_MOVL too, no?