This is just a draft, with a lot of dirty hacks, intended as a proof of concept
for the revectorization of partially vectorized instructions.
The main motivation comes from here: http://llvm.org/pr42022.
We need to merge two store <2 x float> instructions into a single store <4 x float>.
Such cases happen, for instance, when the SLP vectorizer runs on inlined code that
has already been SLP-vectorized.
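
As an illustration of the intended merge, here is a minimal C++ model (not LLVM code, and not taken from the patch). The names Vec2, Vec4, storeHalves, concatHalves and storeFull are hypothetical, and the sketch assumes the two narrow stores write to adjacent addresses. It only shows that two half-width stores and one full-width store of the concatenated value leave memory in the same state.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins for <2 x float> and <4 x float>.
using Vec2 = std::array<float, 2>;
using Vec4 = std::array<float, 4>;

// Models two adjacent `store <2 x float>` instructions.
void storeHalves(float *dst, const Vec2 &lo, const Vec2 &hi) {
  for (std::size_t i = 0; i < 2; ++i)
    dst[i] = lo[i];
  for (std::size_t i = 0; i < 2; ++i)
    dst[2 + i] = hi[i];
}

// Models concatenating the two halves into one wide value.
Vec4 concatHalves(const Vec2 &lo, const Vec2 &hi) {
  return {lo[0], lo[1], hi[0], hi[1]};
}

// Models the single merged `store <4 x float>`.
void storeFull(float *dst, const Vec4 &v) {
  for (std::size_t i = 0; i < 4; ++i)
    dst[i] = v[i];
}
```

Calling storeHalves(buf, lo, hi) and storeFull(buf, concatHalves(lo, hi)) should write the same four floats; the second form is the shape the revectorization aims for.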
The solution is to permit insertelement instructions as vectorization tree nodes, but this
should be done carefully: insertelement instructions cannot be scheduled (they have inner deps).
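
One way to read the "inner deps" remark is that each insertelement takes the result of the previous one as its vector operand, so a sequence of them forms a chain rather than a bundle of independent instructions. The sketch below is a plain C++ model of that reading, not LLVM code; Vec4f, insertElement and buildVector are hypothetical names, and a zero vector stands in for the undef starting value.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for <4 x float>.
using Vec4f = std::array<float, 4>;

// Models `insertelement <4 x float> %vec, float %val, i32 idx`:
// returns a copy of vec with one lane replaced.
Vec4f insertElement(Vec4f vec, float val, std::size_t idx) {
  vec[idx] = val;
  return vec;
}

// Builds a vector lane by lane. Each step consumes the previous
// step's result, so the four inserts depend on one another.
Vec4f buildVector(float a, float b, float c, float d) {
  Vec4f v0{};                          // assumption: zero models undef
  Vec4f v1 = insertElement(v0, a, 0);  // depends on v0
  Vec4f v2 = insertElement(v1, b, 1);  // depends on v1
  Vec4f v3 = insertElement(v2, c, 2);  // depends on v2
  Vec4f v4 = insertElement(v3, d, 3);  // depends on v3
  return v4;
}
```

Under this reading, the inserts cannot be treated like independent scalar instructions in a bundle, because each one is an operand of the next.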
Early feedback is appreciated, since this unfinished work may affect other revisions
and bug reports (like D44067 or PR35732).