This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.
InstCombine already does this for the hasOneUse case. The target hook
exists to fix a handful of tests that would otherwise break (e.g.
ARM/vmov.ll): they change from a single vector instruction that
materializes a repeated immediate to a constant vector load with more
scalar copies from it.
I think the isTypeLegal check should be !LegalTypes || isTypeLegal(), but there
seems to be an intentionally added bug in getConstant where a build_vector is
created with mismatched result type and element type, so on x86 an i1 vector
type is the result type of a build_vector with i8 input elements.
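
For reference, the guard suggested above can be written as a standalone
predicate. This is only a toy sketch, not the code in the patch: the function
name mayFold and both parameters are hypothetical stand-ins for the combiner's
LegalTypes flag and the result of the target's isTypeLegal() query.

```cpp
// Toy model of the proposed guard; hypothetical names, not LLVM code.
// legalTypes:  true once type legalization has already run.
// typeIsLegal: what the target's isTypeLegal() query would return for
//              the type produced by the combine.
bool mayFold(bool legalTypes, bool typeIsLegal) {
  // Before type legalization any type may be created, because the
  // legalizer will clean it up later. After it, only legal types may
  // be introduced.
  return !legalTypes || typeIsLegal;
}
```

Under this form the combine is only blocked when types have already been
legalized and the type in question is not legal.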
By 'use' do you just mean 'directly access' or do you foresee other possibilities?