Split off from D46031.
In the masked merge case, this degrades IPC by decreasing the instruction count.
The next patch should be able to recover and improve this.
This also affects the transform @spatel added in D27489 / rL289738,
for which X86 test coverage was missing.
After adding that coverage and looking at the changes in MCA, I'm somewhat confused:
I'd say this "regression" is actually an improvement, since IPC increased in that case?
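
For reference, a minimal sketch of the two equivalent scalar forms of a masked merge, which is the pattern the IPC remark above is about. The function names are hypothetical and are not taken from the patch; the sketch only shows that the forms compute the same value and does not say which one this patch produces on a given target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, for illustration only.
// Both functions take the bits of x where m is set and the bits of y elsewhere.

// xor form: xor, and, xor (each operation depends on the previous one).
constexpr std::uint32_t mergeXor(std::uint32_t x, std::uint32_t y,
                                 std::uint32_t m) {
  return ((x ^ y) & m) ^ y;
}

// and-not form: the two 'and' halves do not depend on each other.
constexpr std::uint32_t mergeAndNot(std::uint32_t x, std::uint32_t y,
                                    std::uint32_t m) {
  return (x & m) | (y & ~m);
}

// Compare the two forms over all 4-bit inputs. The operations are purely
// bitwise, so agreement on small inputs carries over to wider values.
inline bool formsAgree() {
  for (std::uint32_t x = 0; x < 16; ++x)
    for (std::uint32_t y = 0; y < 16; ++y)
      for (std::uint32_t m = 0; m < 16; ++m)
        if (mergeXor(x, y, m) != mergeAndNot(x, y, m))
          return false;
  return true;
}

// High half comes from x, low half from y.
static_assert(mergeXor(0xAAAA5555u, 0x12345678u, 0xFFFF0000u) == 0xAAAA5678u,
              "xor form");
static_assert(mergeAndNot(0xAAAA5555u, 0x12345678u, 0xFFFF0000u) == 0xAAAA5678u,
              "and-not form");
```

The forms trade instruction count against dependency-chain length differently depending on whether the target has an and-not instruction, so instruction count and IPC can move in opposite directions.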