Add the missing domain equivalences for the movss, movsd, movd and movq zero-extending load instructions.
The penalties are minor (and non-existent on some of the latest architectures), but they are definitely present on pre-AVX targets. This change does let us stay consistent along an instruction chain, which isn't a bad thing. By allowing domain switching we also encourage float domain instructions, which often have shorter encodings.
I'm working on getting some confirmation on the latest ones, but most current Core architectures suffer a 1-clk penalty when switching between the fp and int domains. This doesn't include the Atom line, which can do it for free.
The 1 clk isn't insignificant if you're latency bound and you do a lot of switching on the critical path. I'm not familiar with the code that decides to switch, but can it take architectures and maybe code size into consideration (i.e. favor smaller encoding with Os/Oz)?
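To put a rough number on the latency concern, here is a toy model of a dependency chain in which every fp/int domain crossing adds one cycle. The function name `chain_latency`, the per-instruction latencies and the example chains are all made up for illustration; none of them are measurements from a real core.

```python
# Toy model only: latencies and the 1-cycle penalty are illustrative assumptions.
def chain_latency(chain, crossing_penalty=1):
    """Sum latencies along a dependency chain of (domain, latency) pairs,
    adding crossing_penalty each time consecutive instructions differ in domain."""
    total = 0
    prev_domain = None
    for domain, latency in chain:
        if prev_domain is not None and domain != prev_domain:
            total += crossing_penalty  # bypass delay between fp and int domains
        total += latency
        prev_domain = domain
    return total

# Four single-cycle ops that alternate domains vs. four that stay in one domain.
mixed = [("int", 1), ("fp", 1), ("int", 1), ("fp", 1)]
uniform = [("fp", 1)] * 4

print(chain_latency(mixed), chain_latency(uniform))
```

In this sketch the alternating chain pays three crossings on top of four cycles of work, so the penalty nearly doubles the critical path, while a target with free crossings (`crossing_penalty=0`) sees no difference.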
Float domain is the default because we assume that float instructions are at least as small as the equivalent double/integer alternatives (this was true in SSE days; I'm not so certain about the latest instruction sets), which is why most domain-agnostic code ends up using floats. Through that we get some optsize behaviour automatically without requiring Os/Oz. There is nothing to ensure we always use the shortest instruction (domain switches be damned).
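For reference, the size argument can be checked against the legacy SSE encodings of a couple of domain-equivalent instructions. The table below lists the register-to-register forms (xmm0, xmm1, no REX prefix); the `ENCODINGS` and `EQUIVALENTS` names and the `shortest` helper are just for this sketch and are not part of any LLVM API.

```python
# Legacy SSE encodings, reg-reg form with ModRM 0xC1 (xmm0, xmm1), no REX prefix.
# The double and integer variants carry a mandatory 0x66 prefix; the float ones don't.
ENCODINGS = {
    "movaps": bytes([0x0F, 0x28, 0xC1]),
    "movapd": bytes([0x66, 0x0F, 0x28, 0xC1]),
    "movdqa": bytes([0x66, 0x0F, 0x6F, 0xC1]),
    "xorps":  bytes([0x0F, 0x57, 0xC1]),
    "xorpd":  bytes([0x66, 0x0F, 0x57, 0xC1]),
    "pxor":   bytes([0x66, 0x0F, 0xEF, 0xC1]),
}

# Groups of instructions that do the same job in the float/double/int domains.
EQUIVALENTS = [
    ("movaps", "movapd", "movdqa"),
    ("xorps", "xorpd", "pxor"),
]

def shortest(group):
    """Return the mnemonic with the fewest encoding bytes in a group."""
    return min(group, key=lambda mnemonic: len(ENCODINGS[mnemonic]))

for group in EQUIVALENTS:
    sizes = {m: len(ENCODINGS[m]) for m in group}
    print(sizes, "-> shortest:", shortest(group))
```

The float forms come out one byte shorter here. Under VEX encoding the 0x66 prefix is folded into the VEX prefix, so the three variants are the same length and the size advantage goes away.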
We don't do much for specific architectures - we currently filter just by a target's instruction set - as the code is really only there to try and maintain a particular domain as long as possible.
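As a very rough illustration of "maintain a particular domain as long as possible", the sketch below lets each domain-agnostic instruction adopt the domain of the instruction before it, falling back to float when nothing has been seen yet. This is a deliberately simplified toy, not the actual pass: `assign_domains`, the tuple layout and the sample sequence are all hypothetical, and the real code also considers which domains each instruction can legally use on the target's instruction set.

```python
# Toy sketch only; not the real domain-fixing logic.
def assign_domains(instrs, default="fp"):
    """instrs is a list of (name, domain) pairs, where domain is "fp", "int",
    or None for a domain-agnostic instruction. Agnostic instructions inherit
    the current domain so that the chain avoids unnecessary crossings."""
    current = None
    assigned = []
    for name, domain in instrs:
        if domain is None:
            # Stay in whatever domain we are already in; default to float.
            domain = current if current is not None else default
        assigned.append((name, domain))
        current = domain
    return assigned

sequence = [("load", None), ("paddd", "int"), ("mov", None), ("addps", "fp")]
print(assign_domains(sequence))
```

Note that this forward-only version still picks float for the leading load even though its consumer is an integer instruction, which is exactly the kind of case where looking at more of the chain would pay off.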