As explained on D159533, I'm trying to generalize the "(zext (trunc x)) -> x iff the upper bits are known zero" fold in getNode(), and I was hitting assertions in the AArch64 mull matching code, which assumes that the 'zero-extend-inreg' patterns created earlier in LowerMUL will still be present.
Instead I've updated selectUmullSmull/skipExtensionForVectorMULL to just use value tracking to detect when the upper bits are known zero, and to insert the truncation nodes later if necessary.
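For anyone skimming, the condition behind the fold can be sketched in a few lines of standalone C++. This is only an illustration on plain 64-bit integers: `ToyKnownBits`, `zextOfTrunc` and `upperBitsKnownZero` are hypothetical names made up for this sketch, not LLVM's KnownBits or anything in the patch.

```cpp
#include <cstdint>

// Toy stand-in for known-bits info: a set bit in Zero means that bit of the
// value is proven to be 0. (Hypothetical type, not LLVM's KnownBits.)
struct ToyKnownBits {
  uint64_t Zero;
};

// Models zext(trunc x): keep the low NarrowBits bits of a 64-bit value and
// clear everything above them. NarrowBits is assumed to be in [1, 63].
constexpr uint64_t zextOfTrunc(uint64_t X, unsigned NarrowBits) {
  return X & ((uint64_t{1} << NarrowBits) - 1);
}

// True when every bit at or above NarrowBits is known zero, which is the
// condition under which zext(trunc x) can be replaced by x.
constexpr bool upperBitsKnownZero(ToyKnownBits Known, unsigned NarrowBits) {
  const uint64_t UpperMask = ~((uint64_t{1} << NarrowBits) - 1);
  return (Known.Zero & UpperMask) == UpperMask;
}
```

When `upperBitsKnownZero` holds, `zextOfTrunc(X, NarrowBits)` returns `X` unchanged, so the extend/truncate pair carries no information and the matcher only needs the known-zero fact rather than the literal pattern.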
I really don't like creating SDValue(N, 0) on the fly from an SDNode, as technically we could be using any result index from these nodes, so I've ended up cleaning up a lot of the mul code to use SDValue directly instead of peeking through to the SDNode. I'm happy to undo this and just rely on SDValue(N, 0) if there's resistance, but this is much cleaner imo. I'd push this change as a pre-commit NFC.
(Sorry for still using Phab, but I'm frantically trying to get my local backlog dealt with before moving over to using GitHub branches.)
I believe this should always be a 128-bit vector at this point.