The recent introduction of v3i32 and similar types as MVTs, and their use
in AMDGPU 3-dword memory instructions, caused a de-optimization in code
that performs such a load and then bitcasts via a vector of i8. Because
v12i8 is not an MVT, the bitcast is legalized by widening, which sends
the value through memory.
This commit adds the ability to widen a bitcast using extract_subvector
on the result, so the value does not need to go via memory.
Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64
So if I'm following this correctly, this takes a cast like <12 x i8> -> <3 x i32>, and turns it into <16 x i8> -> <4 x i32>? That makes sense, but please add a comment describing it.
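To make that reading concrete, here is a rough Python model of the transformation as described above. It is only a sketch, not the SelectionDAG code: the function names are hypothetical, little-endian lane order is assumed, and the four padding lanes are filled with zero here, whereas in the real legalizer they would presumably be undefined.

```python
import struct


def bitcast_v16i8_to_v4i32(lanes):
    """Model a legal <16 x i8> -> <4 x i32> bitcast (little-endian assumed)."""
    assert len(lanes) == 16
    return list(struct.unpack("<4I", bytes(lanes)))


def bitcast_v12i8_to_v3i32_direct(lanes):
    """Reference result: reinterpret 12 bytes as three 32-bit words."""
    assert len(lanes) == 12
    return list(struct.unpack("<3I", bytes(lanes)))


def bitcast_v12i8_to_v3i32_widened(lanes):
    """Model the widened form: pad, bitcast at the wide type, extract."""
    assert len(lanes) == 12
    # Widen the operand from 12 to 16 lanes. Zero padding is an assumption
    # of this sketch; the extra lanes never reach the result.
    widened_in = list(lanes) + [0] * 4
    # Bitcast at the widened, legal type.
    widened_out = bitcast_v16i8_to_v4i32(widened_in)
    # Model extract_subvector at index 0: keep the first three i32 lanes.
    return widened_out[:3]


if __name__ == "__main__":
    sample = list(range(12))
    print(bitcast_v12i8_to_v3i32_widened(sample))
```

Because the extract keeps only the low three lanes, the padding bytes cannot affect the result, which is why the widened form should match the direct bitcast without a round trip through memory.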