If we store a value back to the address it was loaded from, with no
intervening writes, the store is considered dead and is removed. As a
result, when legalizing a fixed-length vector store of an insertelement,
scalarization sometimes produces better code than keeping the vector.

An example is the following:

%a = load <4 x i64>, ptr %x
%b = insertelement <4 x i64> %a, i64 %y, i32 2
store <4 x i64> %b, ptr %x

If this is scalarized, DAGCombine removes the 3 of the 4 stores that are
dead, and on RISC-V we get:

sd a1, 16(a0)

However, if we make the vector type legal (-mattr=+v), we lose the
optimisation because the store is no longer scalarized.

This patch attempts to recover the optimisation for vectors by
identifying patterns where we store a load with a single insert
in between, and replacing the vector store with a scalar store of the
inserted element.
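
For illustration only, the effect on the example above can be sketched at
the IR level as a single scalar store to element 2 of the vector. This is
a rough equivalent, not the output of the patch: the function name is
hypothetical, and the actual rewrite presumably happens during instruction
selection rather than on IR.

```llvm
; Hypothetical IR-level equivalent of the load/insertelement/store example.
; Element 2 of a <4 x i64> is at byte offset 2 * 8 = 16 from %x, which
; matches the offset in "sd a1, 16(a0)".
define void @insert_store_sketch(ptr %x, i64 %y) {
  %p = getelementptr inbounds i64, ptr %x, i64 2
  store i64 %y, ptr %p
  ret void
}
```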