This patch generalizes the lowering of shuffles as zero extensions to allow extensions that don't start from the first element. It now recognises extensions starting anywhere in the lower 128-bits or at the start of any higher 128-bit lane.
The motivation was to reduce the number of high cost pshufb calls, but it also improves the SSE2 case as well.
Could you explain why in the comment?