We had no tests for this, and we couldn't perform the optimization because of a bad use count check. We need to know how many non-undef elements of the build vector were filled in and ensure that the use count equals that number. In the shuffle combine version, however, the use count needs to be 2.
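A minimal sketch of the two checks described above, under stated assumptions. This is not the X86 DAG combine code. The build vector is modeled as a list of operands that are either undef (`std::nullopt`) or the same scalar, and `countNonUndef`, `buildVectorUseCountOK` and `shuffleUseCountOK` are hypothetical names for illustration only.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: each build vector operand is either undef (std::nullopt)
// or a reference to the same scalar node (represented here by an int id).
using BuildVectorOps = std::vector<std::optional<int>>;

// Count how many elements of the build vector were actually filled in.
std::size_t countNonUndef(const BuildVectorOps &Ops) {
  std::size_t N = 0;
  for (const auto &Op : Ops)
    if (Op.has_value())
      ++N;
  return N;
}

// Build vector path: the scalar's use count must equal the number of
// non-undef elements, not a fixed constant.
bool buildVectorUseCountOK(const BuildVectorOps &Ops, std::size_t ScalarUses) {
  return ScalarUses == countNonUndef(Ops);
}

// Shuffle combine path: per the description, the expected use count is 2.
bool shuffleUseCountOK(std::size_t ScalarUses) { return ScalarUses == 2; }
```

For example, `{7, std::nullopt, 7, 7}` has three filled-in elements, so a use count of 3 passes the build vector check and a use count of 4 does not.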
The missing coverage was noticed during the review of D40335.
Doesn't ExpectedUses have to be 2/4/8 for xmm/ymm/zmm with double elements, and 4/8/16 with float elements?
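For reference, the numbers in the question are the lane counts of a full register (register width divided by element width, with xmm/ymm/zmm being 128/256/512 bits). They are therefore the upper bound on the number of non-undef elements in a build vector of that type. `laneCount` is a hypothetical helper used only to show the arithmetic.

```cpp
#include <cstddef>

// Number of lanes = register width in bits / element width in bits.
constexpr std::size_t laneCount(std::size_t RegBits, std::size_t EltBits) {
  return RegBits / EltBits;
}

// double (64-bit elements): xmm, ymm, zmm.
static_assert(laneCount(128, 64) == 2 && laneCount(256, 64) == 4 &&
                  laneCount(512, 64) == 8,
              "double lane counts");

// float (32-bit elements): xmm, ymm, zmm.
static_assert(laneCount(128, 32) == 4 && laneCount(256, 32) == 8 &&
                  laneCount(512, 32) == 16,
              "float lane counts");
```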