Fix an off-by-one error in the extended umul expansion for WebGPU.
Revert to the long multiplication algorithm originally added to wide
integer emulation, which was deleted in D139776. The correctness of
long multiplication is much easier to verify.
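
For reference, a minimal sketch of schoolbook long multiplication on
16-bit halves, written in C++ rather than as the actual lowering
pattern. The function name `umulExtendedSketch` and the choice of 16-bit
digits are assumptions made for illustration; the patch itself may
split and combine the partial products differently.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: returns the {low, high} 32-bit halves of the
// 64-bit product a * b using only 32-bit arithmetic.
std::pair<uint32_t, uint32_t> umulExtendedSketch(uint32_t a, uint32_t b) {
  const uint32_t mask = 0xFFFFu;

  // Split each operand into two 16-bit digits.
  uint32_t a0 = a & mask, a1 = a >> 16;
  uint32_t b0 = b & mask, b1 = b >> 16;

  // Each partial product fits in 32 bits, since (2^16 - 1)^2 < 2^32.
  uint32_t p00 = a0 * b0; // weight 2^0
  uint32_t p01 = a0 * b1; // weight 2^16
  uint32_t p10 = a1 * b0; // weight 2^16
  uint32_t p11 = a1 * b1; // weight 2^32

  // Column for result bits 16..31. The sum of three values below 2^16
  // cannot overflow 32 bits; its upper half is the carry into `high`.
  uint32_t mid = (p00 >> 16) + (p01 & mask) + (p10 & mask);

  uint32_t low = (p00 & mask) | (mid << 16);
  uint32_t high = p11 + (p01 >> 16) + (p10 >> 16) + (mid >> 16);
  return {low, high};
}
```

Every carry is explicit here, so each column can be checked against
pencil-and-paper multiplication.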
Add runtime tests based on mlir-vulkan-runner. These run both with
and without the umul expansion.