This uses a really simple approach of converting to an i8 vector
and extracting. This is probably not the best approach especially
if you know the index is constant.
Other ideas:
-Store to stack temporary using vse1, load as scalar and shift.
-Sort of bitcast the vector to a vector of i8, slide down the
appropriate 8 bit element, copy to scalar, shift down the
correct bit within the 8 bits we extracted. Not exactly sure
how to describe such a bitcast from i1 vector to i8 vector
within the type system for elements less than 8.