[AMDGPU] Custom lower INSERT_SUBVECTOR v3, v4, v5, v8
Since the changes to introduce vec3 and vec5, INSERT_VECTOR for these
sizes has been marked "expand", which made LegalizeDAG lower it to loads
and stores via a stack slot. The code got optimized a bit later, but the
now-unused stack slot was never deleted.
This commit avoids that problem by custom lowering INSERT_SUBVECTOR into
an EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT for each element in the
subvector to insert.
V2: Addressed review comments re test.
Differential Revision: https://reviews.llvm.org/D63160