If we know the two halves of an oversized zext-in-reg are the same, don't create those halves independently.
I tried several different approaches to fold this, but it's difficult to get right during legalization. In the default path, we are creating a generic shuffle that looks like an unpack high, but it can get transformed into a different mask (a blend), so it's not straightforward to match that. If we try to fold after it actually becomes an X86ISD::UNPCKH node, we can't be sure what the operand node is - it might be a generic shuffle, or it could be some x86-specific op.
I thought we had some utility to determine if a mask had an any-size splat subset pattern, but I don't see it, so I wrote a small mask-matching helper for this one case.
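For illustration only, the kind of check described above might look like the following standalone sketch. It is not the helper from the patch: the name `isRepeatedSubMask` is hypothetical, it uses `std::vector<int>` rather than LLVM's mask types, and it assumes undef elements are encoded as -1 and compared literally (no wildcard matching).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an LLVM API: returns true if Mask consists of
// identical consecutive chunks of SubSize elements (a "splat" of a sub-mask).
// Undef elements (-1) are compared literally, so this strict version does
// not treat undef as matching anything else.
bool isRepeatedSubMask(const std::vector<int> &Mask, std::size_t SubSize) {
  assert(SubSize != 0 && "sub-mask size must be non-zero");
  if (Mask.empty() || Mask.size() % SubSize != 0)
    return false;
  // Every element must equal the element one chunk earlier.
  for (std::size_t I = SubSize; I < Mask.size(); ++I)
    if (Mask[I] != Mask[I - SubSize])
      return false;
  return true;
}
```

With `SubSize` equal to half the mask length, this reduces to the "both halves are the same" check that the fold needs.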
From the test output, we should be doing something like this for SSE4.1 as well, but I'd rather leave that as a follow-up since it involves changing lowering actions.
This might fail if one half has an undef mask value where the other half has a defined one, and we then use the half that contains the undef. We need to return a merged mask in which each such element is set to the non-undef value. IIRC we used to have a helper to do this, but it evolved into isRepeatedShuffleMask, which assumes sublane offsets.
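A minimal sketch of the merge being asked for, under stated assumptions: the name `mergeMaskHalves` is hypothetical, undef is assumed to be encoded as -1, and plain `std::vector<int>` stands in for LLVM's mask containers. It is not the old helper mentioned above, just one way the undef-tolerant comparison could be written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed encoding of an undef mask element for this sketch.
constexpr int UndefElt = -1;

// Hypothetical helper, not an LLVM API: compares the low and high halves of
// Mask element by element, treating undef as compatible with anything.
// On success, returns true and fills Merged with the half-width mask that
// prefers the defined value wherever only one half is undef.
bool mergeMaskHalves(const std::vector<int> &Mask, std::vector<int> &Merged) {
  assert(Mask.size() % 2 == 0 && "mask must split into two equal halves");
  const std::size_t Half = Mask.size() / 2;
  Merged.assign(Half, UndefElt);
  for (std::size_t I = 0; I < Half; ++I) {
    const int Lo = Mask[I];
    const int Hi = Mask[I + Half];
    // Two defined but different values mean the halves really differ.
    if (Lo != UndefElt && Hi != UndefElt && Lo != Hi)
      return false;
    Merged[I] = (Lo != UndefElt) ? Lo : Hi;
  }
  return true;
}
```

For example, the halves `{4, -1, 6, 7}` and `{4, 5, -1, 7}` merge to `{4, 5, 6, 7}`, so the caller never ends up using a half that still carries an undef the other half had defined.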