While we might think of these as not executing in the "packed" domain,
in fact they do. They use the same set of resources as the packed
floating point operations in every sense, and we want to ensure that we
use the correct shuffle, load, and store instructions when interacting
with them.

Fixing this today is largely aesthetic because we already ended up
using 'packed single' as the fallback default. But it removes a false
domain break, and it ensures that if the default ever becomes the
integer domain, that won't interfere with producing good floating point
code.

Note that this uses a 'let ExeDomain = ... in' mechanism rather than
threading it through all of the multiclasses the way I did in r228135.
I'm interested in which approach seems more appropriate, and I'm happy
to make them both converge on that.