Noticed while looking at the D49562 codegen: we can avoid a large constant mask load and a slow VPBLENDVB select op by using VPBLENDW+VPBLENDD instead.
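
As a rough illustration of why the immediate blends can stand in for a constant-mask VPBLENDVB on v16i16, here is a scalar C++ sketch. It only models instruction semantics and is not the lowering code from this patch; `V16`, `vpblendw`, `vpblendd`, `blend_ref`, `blend_two_step` and `check_all_masks` are hypothetical names used for illustration. The key constraint it shows is that the 256-bit VPBLENDW reuses one imm8 for both 128-bit lanes, so a mask that differs between lanes needs a per-lane word blend followed by a VPBLENDD to pick the right lane from each result.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V16 = std::array<uint16_t, 16>; // models one 256-bit v16i16 register

// Scalar model of 256-bit VPBLENDW: imm8 bit (i % 8) selects b[i], and the
// same 8-bit immediate is applied to both 128-bit lanes.
V16 vpblendw(const V16 &a, const V16 &b, uint8_t imm) {
  V16 r{};
  for (int i = 0; i < 16; ++i)
    r[i] = ((imm >> (i % 8)) & 1) ? b[i] : a[i];
  return r;
}

// Scalar model of 256-bit VPBLENDD: imm8 bit d selects dword d, i.e. the
// 16-bit words 2d and 2d+1, from b.
V16 vpblendd(const V16 &a, const V16 &b, uint8_t imm) {
  V16 r{};
  for (int i = 0; i < 16; ++i)
    r[i] = ((imm >> (i / 2)) & 1) ? b[i] : a[i];
  return r;
}

// Reference result: an arbitrary per-word select, which is what VPBLENDVB
// computes when given a matching constant byte mask.
V16 blend_ref(const V16 &a, const V16 &b, uint16_t mask) {
  V16 r{};
  for (int i = 0; i < 16; ++i)
    r[i] = ((mask >> i) & 1) ? b[i] : a[i];
  return r;
}

// Sketch: one word blend per lane mask, then a dword blend that keeps the
// low lane of `lo` and the high lane of `hi`. No mask constant is loaded.
V16 blend_two_step(const V16 &a, const V16 &b, uint16_t mask) {
  V16 lo = vpblendw(a, b, static_cast<uint8_t>(mask & 0xFF));
  V16 hi = vpblendw(a, b, static_cast<uint8_t>(mask >> 8));
  return vpblendd(lo, hi, 0xF0); // dwords 4..7 (words 8..15) come from hi
}

// Exhaustively compare the two-step form against the reference blend.
void check_all_masks(const V16 &a, const V16 &b) {
  for (uint32_t m = 0; m <= 0xFFFF; ++m)
    assert(blend_two_step(a, b, static_cast<uint16_t>(m)) ==
           blend_ref(a, b, static_cast<uint16_t>(m)));
}
```

In this model, when one lane's 8-bit mask is all zeros or all ones, the matching word blend is just a copy of one input, which leaves a single VPBLENDW followed by a VPBLENDD.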
TODO: We should investigate adding VPBLENDVB handling to target shuffle combining as well.
Should we be preferring VPBLENDVB/VSELECT for AVX512 targets?