This will cause the operation to be repeated in both a masked form and another
masked or unmasked form. This can be a waste of execution resources.
Unfortunately, a lot of our intrinsic tests exercise both masked and unmasked
operations with the same inputs, and we now emit the op part only once. I'll try
to clean
this up, but wanted to get feedback on the patch itself first.