This is a modified version of D57888, which was reverted.
It now uses PBLENDW for 128-bit integer blends, with SDNodeXForms rewriting the immediate. This keeps the instruction in the integer domain and avoids the possibility of it being commuted and becoming movss/movsd when optsize is enabled. See D57888 for more information.
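For reviewers unfamiliar with the immediate rewrite, here is a rough standalone sketch of the idea. It is not the code in this patch. It assumes the original blend immediate has one bit per 32-bit or 64-bit element, while the PBLENDW immediate has one bit per 16-bit word, so each source bit is replicated across the words it covers. The helper name `scaleBlendImm` and its parameters are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): widen a blend immediate so that
// each bit selecting one wide element becomes `Scale` adjacent bits selecting
// the 16-bit words that make up that element.
//   NumElts = number of wide elements in the 128-bit vector (4 or 2)
//   Scale   = 16-bit words per wide element (2 for 32-bit, 4 for 64-bit)
constexpr uint8_t scaleBlendImm(uint8_t Imm, unsigned NumElts, unsigned Scale) {
  uint8_t Result = 0;
  for (unsigned I = 0; I != NumElts; ++I)
    if (Imm & (1u << I))
      Result |= static_cast<uint8_t>(((1u << Scale) - 1) << (I * Scale));
  return Result;
}

// 4 x i32 mask 0b0101 selects dwords 0 and 2, i.e. words 0,1 and 4,5.
static_assert(scaleBlendImm(0x5, 4, 2) == 0x33, "dword mask to word mask");
// 2 x i64 mask 0b10 selects qword 1, i.e. words 4..7.
static_assert(scaleBlendImm(0x2, 2, 4) == 0xF0, "qword mask to word mask");
```

The resulting 8-bit value selects exactly the same bytes as the original mask, which is why the blend can be expressed as a PBLENDW without changing behavior.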
I plan to reduce the test case that came with the revert and include it, but it's late here and I wanted to get this patch up so Simon could look at it. I'll also make a separate patch to fix the underlying issue in the two-address instruction pass.