We can sometimes get code that does:
    xe = zext i16 x to i32
    ye = zext i16 y to i32
    m = mul i32 xe, ye
    me = zext i32 m to i64
    r = vecreduce.add(me)
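This "double extend" can trip up the reduction identification, even though it gives the same result as extending x and y directly to i64: the product of two zero-extended i16 values always fits in i32, so the outer zext loses nothing.
This patch extends the pattern matching to handle the double-extend form as well.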
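As a sanity check on the "identical results" claim, here is a minimal scalar sketch in plain C++ that models one lane of the two forms and compares them. The function names (`mulDoubleExtend`, `mulSingleExtend`, `formsAgreeOnSample`) are made up for illustration and are not part of the patch. It only covers the zext/zext case shown above.

```cpp
#include <cstdint>

// Models one lane of: zext i16 -> i32, mul i32, zext i32 -> i64.
uint64_t mulDoubleExtend(uint16_t x, uint16_t y) {
  uint32_t xe = x;       // zext i16 x to i32
  uint32_t ye = y;       // zext i16 y to i32
  uint32_t m = xe * ye;  // mul i32; 65535 * 65535 < 2^32, so no wrap
  return static_cast<uint64_t>(m);  // zext i32 m to i64
}

// Models one lane of the form the matcher already expects:
// extend both operands straight to i64, then multiply.
uint64_t mulSingleExtend(uint16_t x, uint16_t y) {
  return static_cast<uint64_t>(x) * static_cast<uint64_t>(y);
}

// Compares the two forms on a strided sample of the i16 x i16 input space.
// stride must be non-zero; a stride of 1 is exhaustive but slow.
bool formsAgreeOnSample(uint32_t stride) {
  for (uint32_t x = 0; x <= 0xFFFF; x += stride)
    for (uint32_t y = 0; y <= 0xFFFF; y += stride)
      if (mulDoubleExtend(static_cast<uint16_t>(x), static_cast<uint16_t>(y)) !=
          mulSingleExtend(static_cast<uint16_t>(x), static_cast<uint16_t>(y)))
        return false;
  return true;
}
```

A stride of 257 divides 65535, so that sample includes both extremes, 0 and 0xFFFF.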
A minor followup suggestion: you might want to look at using ComputeNumSignBits etc instead of requiring a specific extension opcode. This would help if, for example, one of the operands is a constant.
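To illustrate why a sign-bit count would cover the constant-operand case, here is a rough standalone sketch. It is not LLVM's `ComputeNumSignBits` and does not use LLVM types; `numSignBits32` and `behavesLikeSExtFrom` are hypothetical helpers that model the idea on a plain 32-bit constant. A value with enough redundant sign bits can be treated as if it had been sign-extended from the narrow type, even when no extension instruction is present.

```cpp
#include <cstdint>

// Hypothetical helper: number of leading bits equal to the sign bit,
// counting the sign bit itself (so the result is in the range 1..32).
unsigned numSignBits32(int32_t v) {
  uint32_t u = static_cast<uint32_t>(v);
  if (v < 0)
    u = ~u;  // flip so that the sign-bit run becomes a run of leading zeros
  unsigned n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (u & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// Hypothetical helper: true if v could have come from a sext of a
// narrowBits-wide value, i.e. the top (32 - narrowBits + 1) bits all agree.
bool behavesLikeSExtFrom(int32_t v, unsigned narrowBits) {
  return numSignBits32(v) >= 32 - narrowBits + 1;
}
```

With this model, a constant such as 1000 passes `behavesLikeSExtFrom(1000, 16)`, while 32768 does not, since it needs 17 bits as a signed value.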