After thinking about it more, the extra code in D56450 doesn't add much value if we can't remove the existing matcher. So this is a minimal alternative to that patch.
The existing code is safe and correct for 128-bit ops, but we need to adjust the outputs to account for undefs regardless.
I still need to convince myself that the last section, where we match 256-bit ops, is always safe, but this does fix all of the known patterns from PR40243:
https://bugs.llvm.org/show_bug.cgi?id=40243