I suggest replacing rL221429 with the fix described in D6321.
The bug in SimplifyDemandedBits (using DemandedMask instead of the single-use-guarded NewMask) should be fixed regardless of the specific X86 blend optimization. Once that fix is applied, handling multiple uses in the X86-specific code is no longer needed for correctness.
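To make the distinction concrete, here is a toy model of the guard as I understand it. It is not the LLVM code: ToyNode, guardedMask and mayDropBits are made-up names, and a plain 32-bit integer stands in for APInt. The assumption is that when the node being simplified has more than one use, the guarded mask is widened to all bits, since other users may read bits this caller does not demand.

```cpp
#include <cstdint>

// Toy stand-in for a DAG node (hypothetical, not an LLVM type).
struct ToyNode {
  unsigned NumUses;
};

// Mask that is safe to act on when simplifying node N.
// With a single use, only the bits the caller demands matter.
// With several uses, assume any bit may be read elsewhere.
uint32_t guardedMask(const ToyNode &N, uint32_t DemandedMask) {
  uint32_t NewMask = DemandedMask;
  if (N.NumUses != 1)
    NewMask = ~uint32_t{0};
  return NewMask;
}

// A rewrite that changes BitsDropped is sound only if none of those
// bits are demanded under the guarded mask. Checking against the raw
// DemandedMask instead is the kind of mistake described above.
bool mayDropBits(const ToyNode &N, uint32_t DemandedMask,
                 uint32_t BitsDropped) {
  return (guardedMask(N, DemandedMask) & BitsDropped) == 0;
}
```

In this sketch a node with two uses never permits dropping bits, which is why no extra multi-use handling would be needed in the target-specific code once the generic check uses the guarded mask.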
While rL221429 offers a way to perform the optimization even under multiple uses, the current implementation would not get a chance to do so because of the multi-use guard. Also note that test3 in vselect-avx.ll actually benefits from not performing the optimization: the sext-in-reg is removed altogether if the node is left untouched. Perhaps this approach is worth generalizing to other multi-use cases?