Instead of a variable-blend instruction, form a blend with an immediate mask, because immediate blends are always cheaper.
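To illustrate why the two forms are interchangeable when the mask is a constant, here is a minimal scalar sketch in Python. It is not the patch's code: it only models the lane-selection rules of a BLENDVPS-style variable blend and a BLENDPS-style immediate blend on four 32-bit lanes, and all helper names (`blend_variable`, `blend_immediate`, `mask_to_immediate`) are hypothetical.

```python
# Scalar model of two x86 blend flavors on 4 x 32-bit lanes.
# All names here are hypothetical; this is an illustration, not the patch.

def blend_variable(a, b, mask):
    """Variable blend (BLENDVPS-style): lane i is taken from b when the
    sign bit of 32-bit mask lane i is set, otherwise from a."""
    return [bi if (mi >> 31) & 1 else ai for ai, bi, mi in zip(a, b, mask)]

def blend_immediate(a, b, imm):
    """Immediate blend (BLENDPS-style): lane i is taken from b when
    bit i of the immediate is set, otherwise from a."""
    return [bi if (imm >> i) & 1 else ai for i, (ai, bi) in enumerate(zip(a, b))]

def mask_to_immediate(mask):
    """Fold a constant vector mask into an immediate:
    bit i of the result is the sign bit of mask lane i."""
    imm = 0
    for i, m in enumerate(mask):
        imm |= ((m >> 31) & 1) << i
    return imm

# Assumed example data: pass-through value, loaded value, constant mask.
passthru = [10, 11, 12, 13]
loaded = [20, 21, 22, 23]
const_mask = [0xFFFFFFFF, 0, 0, 0xFFFFFFFF]  # unsigned 32-bit lanes

imm = mask_to_immediate(const_mask)
via_variable = blend_variable(passthru, loaded, const_mask)
via_immediate = blend_immediate(passthru, loaded, imm)
```

With a constant mask, `via_variable` and `via_immediate` select the same lanes, which is the property that lets the mask register be replaced by an immediate operand.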
Note the FIXME for AVX512: with this change applied, I saw masked loads followed by blends, whereas currently no blend is emitted at all. I assume that's because the blend is part of the AVX512 masked load itself? I don't know enough about AVX512 to be sure how to solve it, so I've enabled this only for AVX1/AVX2 for now.