This patch attempts to replace the insertion of zero scalars with a vector blend with zero. Not only does this avoid the integer insertion instructions (which are particularly slow on many targets), but it also allows multiple insertions to be merged into a single blend.
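To illustrate why the blend form allows merging, here is a minimal scalar model of the idea. It is only a sketch and not the patch's lowering code: `Vec4`, `insertZero` and `blendWithZero` are hypothetical names, and the 4-lane, 32-bit layout is an assumption made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4 x i32 vector used only for this illustration.
using Vec4 = std::array<std::int32_t, 4>;

// Models a scalar insertion: overwrite lane Pos with zero.
Vec4 insertZero(Vec4 V, std::size_t Pos) {
  assert(Pos < V.size() && "lane index out of range");
  V[Pos] = 0;
  return V;
}

// Models a blend with an all-zero vector: each lane whose bit is set in
// ZeroMask takes zero, every other lane keeps the value from V.
Vec4 blendWithZero(const Vec4 &V, unsigned ZeroMask) {
  Vec4 R{};
  for (std::size_t I = 0; I != V.size(); ++I)
    R[I] = (ZeroMask & (1u << I)) ? 0 : V[I];
  return R;
}
```

In this model, inserting zero into lanes 1 and 3 one after the other gives the same result as a single blend with mask `0xA`, which is the bitwise OR of the two single-lane masks. That is the sense in which a chain of insertions can collapse into one blend.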
There are two parts to this patch: the lowering of zero insertions to shuffles, and the combining of target shuffles into blends with zero. If accepted, these will be committed separately.
Note: Support for blending in other rematerializable constants (e.g. insertion of all-bits) would be easy to add, but multiple insertions wouldn't be merged until we can support combining of binary shuffles. I can add support for the insertion stage now if people think it's worth it?
It may be useful to annotate the arguments with inline comments, e.g. /*Pos*/, etc.
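A small sketch of what that could look like at a call site. The helper `zeroLanes` and its parameters are hypothetical and not taken from the patch. The `/*Name=*/` spelling follows the form used in the LLVM coding standards; a plain `/*Pos*/` conveys the same intent.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<int, 4>;

// Hypothetical helper: zero Count lanes of V starting at lane Pos.
Vec4 zeroLanes(Vec4 V, std::size_t Pos, std::size_t Count) {
  for (std::size_t I = Pos; I < Pos + Count && I < V.size(); ++I)
    V[I] = 0;
  return V;
}

// Without the inline comments, a reader has to look up which literal is
// the position and which is the count.
Vec4 zeroMiddleLanes(const Vec4 &V) {
  return zeroLanes(V, /*Pos=*/1, /*Count=*/2);
}
```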