This is an updated version of Chandler's patch D7402, which was accepted but never committed and has since bit-rotted.
I've updated the execution domain declarations to match the approach of the packed templates and also added some extra scalar unary tests.
Note that the extra tests demonstrate that scalar unary ops aren't aware of the 'pass through' nature of the remaining vector lanes. This can be fixed in a future patch by adding explicit patterns, as is done for the scalar binary instructions.
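For anyone unfamiliar with the 'pass through' behaviour, here is a minimal sketch of the lane semantics in Python. It is only a model, not code from the patch: the helper name scalar_unary, the 4-lane float vector and the use of sqrt as the representative unary op are all assumptions made for illustration.

```python
import math

def scalar_unary(op, passthru, src):
    """Model of a scalar unary op on a vector (hypothetical helper).

    Only lane 0 is computed, from lane 0 of `src`. All remaining lanes
    are copied unchanged from `passthru`.
    """
    return [op(src[0])] + list(passthru[1:])

# Assumed example values: 4 lanes of floats.
passthru = [1.0, 2.0, 3.0, 4.0]
src = [16.0, 0.0, 0.0, 0.0]

result = scalar_unary(math.sqrt, passthru, src)
print(result)  # [4.0, 2.0, 3.0, 4.0]
```

When passthru and src are the same vector, the whole operation amounts to a single scalar instruction applied in place, which is the case explicit patterns would be expected to recognise.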
Including Sanjay + Elena as they have both worked on the scalar operations recently.