The PMULDQ/PMULUDQ vXi64 instructions only read the even-numbered elements of their v2Xi32 inputs, which SimplifyDemandedVectorElts should try to take advantage of.
We can't do much with undef demanded elts - we should probably only support the (mul X, undef) -> 0 pattern, the same as regular integer multiplies. I can add support for it if you guys want, but I can't see it being used by real-world code. The same goes for constant folding support.
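To illustrate the even-element behaviour, here is a minimal standalone sketch of the 128-bit forms. It is a scalar model, not the code in this patch: the function names `pmuludq128`, `pmuldq128` and `demandedOperandElts` are hypothetical, and the demanded mask is a plain `uint32_t` bitmask rather than LLVM's `APInt`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of 128-bit PMULUDQ. The operands are viewed as
// v4i32 and the result as v2i64. Only 32-bit elements 0 and 2 are read;
// elements 1 and 3 never affect the result.
std::array<uint64_t, 2> pmuludq128(const std::array<uint32_t, 4> &A,
                                   const std::array<uint32_t, 4> &B) {
  return {uint64_t(A[0]) * uint64_t(B[0]), uint64_t(A[2]) * uint64_t(B[2])};
}

// Hypothetical scalar model of 128-bit PMULDQ: same elements, but each
// 32-bit input is sign-extended before the 64-bit multiply.
std::array<int64_t, 2> pmuldq128(const std::array<int32_t, 4> &A,
                                 const std::array<int32_t, 4> &B) {
  return {int64_t(A[0]) * int64_t(B[0]), int64_t(A[2]) * int64_t(B[2])};
}

// Translate a demanded mask over the NumResultElts i64 results into a
// demanded mask over the 2 * NumResultElts i32 operand elements.
// Result element I only needs operand element 2 * I.
// Assumes NumResultElts <= 16 so the operand mask fits in 32 bits.
uint32_t demandedOperandElts(uint32_t DemandedResultElts,
                             unsigned NumResultElts) {
  uint32_t OperandMask = 0;
  for (unsigned I = 0; I != NumResultElts; ++I)
    if (DemandedResultElts & (1u << I))
      OperandMask |= 1u << (2 * I);
  return OperandMask;
}
```

With both results demanded, only operand elements 0 and 2 end up in the operand mask, so whatever feeds the odd elements is free to be simplified away.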
Can we just call SimplifyDemandedVectorElts on the Intrinsic node here with an all-ones demanded mask? That would allow reuse of the code in InstCombineSimplifyDemanded. See the code at line 1879 in this file in this diff.