When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)
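
As a rough illustration of the existing behaviour (plain C++, not LLVM
code; the function names loadPerElement and loadAsI32 are made up for
this sketch), loading four i8 lanes with one 32-bit access and then
reinterpreting the bits yields the same lanes as four separate byte
loads:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Per-element strategy: four separate 8-bit loads.
std::array<std::uint8_t, 4> loadPerElement(const std::uint8_t *p) {
    assert(p != nullptr);
    return {p[0], p[1], p[2], p[3]};
}

// Single-access strategy: one 32-bit load, then a bit reinterpretation
// back into four 8-bit lanes (the analogue of a bitcast i32 -> v4i8).
std::array<std::uint8_t, 4> loadAsI32(const std::uint8_t *p) {
    assert(p != nullptr);
    std::uint32_t word;
    std::memcpy(&word, p, sizeof word);            // the single wide load
    std::array<std::uint8_t, 4> lanes;
    std::memcpy(lanes.data(), &word, sizeof word); // split back into lanes
    return lanes;
}
```

Both copies go through memcpy, so the lane order is preserved regardless
of the host's endianness.
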
This patch extends that behaviour to vector types. If the vector type
is TypePromoteInteger, its element type is TypePromoteInteger as well,
so the legalizer can emit a single promoting load rather than N
individual promoting loads. For instance, a v3i1 is now loaded with one
v4i1 load instead of three i1 loads.
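
The decision described above can be sketched as a toy model. This is
not the legalizer's actual code: the Action enum, the countLoads
function and the "patched" flag are all hypothetical names, and the
model only counts memory operations under the assumptions stated in
this message.

```cpp
#include <cassert>

// Toy stand-in for a per-type legalization action; not LLVM's real enum.
enum class Action { Legal, Promote, Other };

// Number of loads this toy model emits for a vector of numElts elements.
//   intAction: action for the integer type with the same total bit width
//   vecAction: action for the (widened) vector type itself
//   patched:   whether the behaviour added by this patch is enabled
unsigned countLoads(unsigned numElts, Action intAction, Action vecAction,
                    bool patched) {
    assert(numElts > 0);
    // Pre-existing rule: a legal or promoted integer type of the same
    // width allows a single load (e.g. v4i8 loaded as i32).
    if (intAction == Action::Legal || intAction == Action::Promote)
        return 1;
    // Rule added by the patch: a promoted vector type allows a single
    // promoting vector load (e.g. v3i1 loaded as v4i1).
    if (patched && vecAction == Action::Promote)
        return 1;
    // Otherwise fall back to one load per original element.
    return numElts;
}
```

For the v3i1 case, assuming no usable integer type is available, the
model returns three loads without the patch and one load with it.
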
I am not familiar with AMDGPU, and this commit appears to produce less
performant code for the R600 architecture. I would appreciate it if
someone who knows that architecture could tell me how bad the changes
are.