If the first insertelement instruction has multiple users and inserts at
position 0, we can re-use this instruction when folding a chain of
insertelement instructions. As we need to generate the first
insertelement instruction anyway, this should be a strict improvement.
We could get rid of the restriction to inserting at position 0 by
creating a different shuffle mask, but I think it is probably worth
keeping the requirement that the first insertelement instruction inserts
at position 0, as that is easier to do efficiently than inserting at
other positions.
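A toy Python model of the fold may help make this concrete. It assumes, as in a splat fold, that every insertelement in the chain writes the same scalar into a vector whose base is all undef; vectors are lists and `None` stands in for an undef lane. `insertelement`, `shufflevector` and `fold_chain` are hypothetical helpers mimicking the IR semantics, not LLVM APIs. Reusing a first insert at lane `k` only needs a mask that reads lane `k` everywhere; `k = 0` gives the all-zero splat mask.

```python
UNDEF = None  # stand-in for an undef lane


def insertelement(vec, scalar, idx):
    """Model insertelement: a copy of vec with lane idx replaced by scalar."""
    out = list(vec)
    out[idx] = scalar
    return out


def shufflevector(v1, v2, mask):
    """Model shufflevector: each mask entry indexes v1 ++ v2; UNDEF gives an undef lane."""
    both = list(v1) + list(v2)
    return [UNDEF if m is UNDEF else both[m] for m in mask]


def fold_chain(first, first_idx, lanes):
    """Hypothetical fold: reuse the first insertelement as the shuffle source and
    splat its lane first_idx into every lane written by the chain."""
    width = len(first)
    mask = [first_idx if lane in lanes else UNDEF for lane in range(width)]
    return shufflevector(first, [UNDEF] * width, mask)


# Chain: insert x into lanes 0..3 of an undef <4 x float>.
x = 7.0
first = insertelement([UNDEF] * 4, x, 0)  # the multi-use insertelement
chain = first
for lane in (1, 2, 3):
    chain = insertelement(chain, x, lane)

folded = fold_chain(first, 0, {0, 1, 2, 3})
print(folded)  # [7.0, 7.0, 7.0, 7.0]
assert folded == chain

# Starting at lane 2 instead only changes the mask to [2, 2, 2, 2].
first2 = insertelement([UNDEF] * 4, x, 2)
assert fold_chain(first2, 2, {0, 1, 2, 3}) == [x] * 4
```

In this model the first insertelement survives unchanged and the whole chain collapses into one shuffle of it, which is why reusing it adds no extra instruction.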
Hi Florian,
this looks good to me too, but shouldn't you also check that the transformation only fires when every lane of %ins1 other than the first one is undef? Or is this case covered somewhere else?
Francesco
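The concern can be illustrated in a toy Python model (lists with `None` for undef). `only_lane_defined` and `splat_fold` are hypothetical names for illustration, not LLVM helpers, and the model assumes a splat-style fold in which lanes not written by the chain become undef. If %ins1 inserts at lane 0 into a vector whose other lanes are defined, and the chain never overwrites some of them, the folded result loses those values.

```python
UNDEF = None  # stand-in for an undef lane


def insertelement(vec, scalar, idx):
    """Model insertelement: a copy of vec with lane idx replaced by scalar."""
    out = list(vec)
    out[idx] = scalar
    return out


def only_lane_defined(vec, idx):
    """The condition from the comment: every lane of the first insertelement
    other than idx is undef."""
    return all(v is UNDEF for i, v in enumerate(vec) if i != idx)


def splat_fold(first, idx, lanes):
    """Hypothetical splat fold: lanes written by the chain read lane idx of first,
    all other lanes become undef."""
    return [first[idx] if lane in lanes else UNDEF for lane in range(len(first))]


x = 7.0
# %ins1 inserts x at lane 0 of a vector whose other lanes are defined.
ins1 = insertelement([1.0, 2.0, 3.0, 4.0], x, 0)
chain = insertelement(ins1, x, 1)  # the chain never writes lanes 2 and 3

print(only_lane_defined(ins1, 0))   # False
print(splat_fold(ins1, 0, {0, 1}))  # [7.0, 7.0, None, None]
print(chain)                        # [7.0, 7.0, 3.0, 4.0]
```

Here lanes 2 and 3 would change from 3.0 and 4.0 to undef, so in the model the fold is only valid when a guard like `only_lane_defined` holds (or the chain overwrites every such lane).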