[CodeGen] Fixed de-optimization of legalize subvector extract
The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU
3-dword memory instructions, caused a de-optimization problem for code
with such a load that then bitcasts via vector of i8, because v12i8 is
not an MVT so it legalizes the bitcast by widening it.
This commit adds the ability to widen a bitcast using extract_subvector
on the result, so the value does not need to go via memory.
Differential Revision: https://reviews.llvm.org/D60457