We only really need avx512bw for masking the 256-bit or 512-bit GFNI
instructions, because those need a v32i1 or v64i1 mask.
I wanted to enable 128-bit intrinsics with avx512vl, but the
__builtin_ia32_selectb_128 used in the header file requires avx512bw.
The codegen test for these intrinsics also does not use a masked
instruction, because a vselect with a v16i1 mask and v16i8 operands is
not legal and so is expanded before isel. To fix these issues we need a
mask-specific builtin and a mask-specific ISD opcode.
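
For reference, a scalar sketch of the per-byte select that the masked
128-bit form performs. This is only an illustrative model, not the
builtin or the lowering in this patch; the name select_bytes_128 is
hypothetical. One mask bit controls one byte lane, so 128 bits need 16
mask bits (v16i1), while the 256-bit and 512-bit forms need 32 and 64.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked byte select on a 128-bit vector.
 * Bit i of `mask` picks a[i] when set and b[i] when clear, which mirrors
 * a vselect with a v16i1 mask and v16i8 operands. */
static void select_bytes_128(uint16_t mask, const uint8_t a[16],
                             const uint8_t b[16], uint8_t out[16]) {
    for (size_t i = 0; i < 16; ++i)
        out[i] = ((mask >> i) & 1u) ? a[i] : b[i];
}
```

In a merge-masked intrinsic, a would be the result of the GFNI operation
and b the passthrough source.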
Fixes PR58687.
Since we are able to lower the masked versions of the intrinsics, we have 3 choices for the FE support:
I slightly prefer 1). I think the intention of the EVEX design is to use the masked instructions directly, so in practice it should always imply AVX512BW. But I'm fine with any of them.